Please clarify scope of status.dreamhost.com


#1

Hello Dreamhost. I couldn’t find a statement of what we should expect from status.dreamhost.com, in terms of how serious an outage needs to be in order for it to be documented there.

According to SiteUpTime, one of my Dreamhost sites was down for more than an hour from around 13:00 on 14th December, then up for a short while, then down again for more than an hour from around 14:30.

According to Pingdom (which has a finer resolution), another of my Dreamhost sites was “UP again at 12/14/2012 02:13:09PM, after 1h 7m of downtime”, and then “UP again at 12/14/2012 04:20:09PM, after 1h 33m of downtime”.

While the sites were down, I was puzzled not to see anything about any problem on status.dreamhost.com.

I understand that status.dreamhost.com is not intended to document every tiny outage, only major ones.

Please could you give an indication of how major is major?

Thanks
~Tom


#2

Along with announcements about service changes, that’s pretty much the gist of it. “Major” means outages that affect more than one server: typically network outages, or issues with shared services (like mail) that would affect a significant fraction of our customers.


#3

Thank you, Andrew, that’s helpful.

I note on September 3rd there was a status.dreamhost.com announcement about a single server (Hanjin) being offline for hardware replacement. But that was an outage of several hours, so clearly it deserved to be categorised as major.

It would be reassuring if you could give an indication of roughly how long a single-server outage has to be in order for it to count as major.

~Tom


#4

hanjin is a bit of a special case: it’s our personal backups server, and as such is shared between all customers. Outages that affect only a single web/MySQL server are communicated directly to the affected customers (via email and a banner on the panel), rather than cluttering up the status blog.


#5

Thank you for clarifying slightly, but this is still puzzling. Has there been a change of policy, and if so, when and how were we told? Looking back a bit further, on March 10th there was an entry on dreamhoststatus.com with the proviso “this is regarding the shared web server ‘dubhe’ and no other services (email, MySQL) or servers are affected”.

Also on March 10th, “this is regarding the shared web server ‘kingston’ and no other services (email, MySQL) or servers are affected”.

On March 8th, “this is regarding the shared web server ‘izar’ and no other services (email, MySQL) or servers are affected”.

On March 4th, “this is regarding the shared web server windhoek only, and no other servers or services (such as mail or mysql) have been affected”.

On March 2nd, “this is regarding the shared web server ‘castor’ and no other services (email, MySQL) or servers are affected”.

Also on March 2nd, “this is regarding the shared web server ‘port-au-prince’ and no other services (email, MySQL) or servers are affected”.

On March 1st, “this is regarding the shared web server ‘singapore’, and no other services (email, MySQL) or web servers are affected”.

On Feb 28th, “this is regarding the shared web server ‘mystique’ and no other services (email, MySQL) or servers are affected”.

On Feb 23rd, “this is regarding the shared web server ‘stampeders’ and no other services (email, MySQL) or servers are affected”.

On Feb 22nd, “this is regarding the shared server ‘michaelkelly’ and no other services (email, MySQL) or servers are affected” … and so on.


#6

Those status posts all date from before we’d solidified this policy. 🙂


#7

Then I liked your liquid policies and hope your solid ones are even better. Anyway, my question stands (“How major is major?”) but needs rephrasing.

Taking the outage from my original post (two and a half hours) as an example, is that something one would expect to be notified about (either through dreamhoststatus.com, if multi-server, or through email, if single-server)?

I’m trying to get a feel for quantities. I don’t know whether that kind of outage would reasonably be regarded as trivial, or whether your solidified policy is that customers should be notified of it.

(I didn’t get any email notification, and I’ve just checked the spam folder … nothing there either. Funnily enough, the spam folder contained my Pingdom monthly report for November, which says 3 outages adding up to 10 minutes, which feels trivial.)

~Tom


#8

Tom

When we used status posts for individual server outages, there were actually constant complaints that email notifications were better. “Major” means an issue affecting 1% of our customer base. Issues with a particular server are more contained, and we can generally reach the affected customers in a timely manner by email. So if only your server is having an issue, it’s more efficient to email you directly. We will also put a status message directly on your DreamHost Web Panel.

A two-hour outage would warrant an email. An ongoing issue would get you an email and a status message on your DreamHost Web Panel. An issue affecting more than one machine will lead to a post on the status site.

We are, however, always willing to improve our system. We are already investigating a better email system that lets us do everything we can to confirm receipt. We never have policy just for policy’s sake; we constantly evaluate and try to make things the best they can be for our customers. So the scope of the status site has changed and will change again as we try to make the tool more useful for our customers. Changes to the status page are already in the works, and a new email system for announcements is currently being tested. I hope this helps, and if you have any suggestions I’ll be happy to check them out.