System status request

I’d like to make a formal request to DH based on some discussion in a previous thread. First, I’d like to see if there is popular support for my request.

One of the problems I have with DH’s current announcement system is that problems aren’t announced until after they are fixed. It is great that DH acknowledges problems and provides information about them via these email announcements, but by the time the email arrives the problem is already resolved.

So, if your site server goes down or emails sent to your domains are bouncing, you may not find out until long after the incident. This makes it very difficult to work around issues by pointing to a backup server or posting an alternative email address that might work.

This also makes it difficult to know if DH is aware of the problem you are experiencing, which raises questions: Am I the only one with this problem? Does DH know about this issue? Are they working on it? I imagine this results in quite a few duplicate support tickets being filed for the same issue, which decreases efficiency on DH’s side. I’d rather their support staff focus on fixing the problems than wade through their queue looking for duplicate tickets.

It seems only logical that DH should provide optional email announcements and/or a status web page that reports serious issues when they are discovered, before they are fixed. Sure, they currently have the status page in the control panel, but we all know that is useless. I don’t necessarily want something automatically generated; I want to know which active critical issues are being worked on.

If emails sent to my domain are bouncing, then I think DH has an obligation to tell me while it is still an issue. Once it is fixed, it isn’t a problem anymore. It is while the issue is known but still unresolved that I most need to know inbound email may not be reaching me.


matt bendiksen

First of all, I’d like to apologize for the recent problems; there have been a few threads here, and it’s certainly hard to show my face around here right now. Letting our domain expire is obviously an extremely embarrassing thing to have happen, and was totally inexcusable (in reply to a comment in another thread, we will definitely be renewing it for 10 years in advance or something, after we’ve transferred it away from Network Solutions / Verisign). Ironically, we have always been hesitant to do the transfer exactly because we were worried about problems coming up.

I do hear the comments that our service has gotten worse lately. I do realize that there have been some (significant) problems, but I also think that we’ve actually improved our reliability overall since I’ve been with the company (almost 4 years now). I do think there are some improvements that can be made in terms of making sure that problems are identified and resolved quickly.

Problems have been there in the past, but I think that certain types of problems now tend to have a more global effect (because certain services are centralized) and thus more people notice when there are problems. I think that certain types of problems may also have been identified more quickly in the past.

Both support and the admin team (the team I work with) do tend to keep an eye on stuff that’s low in the queue, and try to identify problems as they come in. In addition, we have someone “on call” at all times, which means they’re watching our internal notification system. This identifies most problems very early, and we’re able to fix them proactively (and announce them, if necessary).

I do think it would be really interesting to write a system that essentially looks for trends (x number of users on a machine, or sharing a group of mail machines, with a particular keyword in the support request). At this time, we don’t have the development resources to write such an “intelligent” system.
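
To make the idea concrete, here is a rough sketch of the kind of trend check described above. It is purely illustrative and not an existing DH tool: the ticket fields (user, machine, text), the keyword list, and the threshold are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical keywords and threshold; real values would need tuning.
KEYWORDS = ("bounce", "timeout", "down")
MIN_USERS = 3


def find_trends(tickets, keywords=KEYWORDS, min_users=MIN_USERS):
    """Return {(machine, keyword): user_count} for groups at or above min_users.

    Each ticket is a (user, machine, text) tuple. Users are counted once per
    (machine, keyword) pair, so repeat tickets from one person do not inflate
    the count.
    """
    users = defaultdict(set)
    for user, machine, text in tickets:
        lowered = text.lower()
        for keyword in keywords:
            if keyword in lowered:
                users[(machine, keyword)].add(user)
    return {key: len(seen) for key, seen in users.items() if len(seen) >= min_users}
```

Calling find_trends on the open queue every few minutes and alerting whoever is on call when it returns anything would be one simple way to use it; plain substring matching is crude, but it keeps the sketch short.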

I know we look pretty dumb right now; all I can do is ask that people stick with us and at least give us a chance to do better.