We will be upgrading the power strips in our racks today which will result in a short period of downtime for all services (ie HTTP, MySQL, Telnet, SSH, FTP, etc…). Your sites and servers should only be down for approximately half an hour. If you have any questions or concerns regarding this upgrade please do not hesitate to contact our Tech Support Team.
At what time will this take place? (Pacific time)
Which IDC? / Which servers?
C’mon-- You guys are going back to your old habit of sending out vague information. Please clarify.
And… half an hour? It’s not possible to install redundantly and cut over?
Sorry to follow up to myself, but this needs to be discussed publicly – and I’ve been encouraged by one particular member of Dream Host staff to post these things publicly.
This is my response to a reply from Support (personnel name omitted).
I seem to have written this letter before…
[quote]Well it’s hard to be specific when you’re unplugging 65 different
machines one at a time over a period of hours. The actual downtime for
each machine was in the range of 3-5 minutes, so it’s not as if there
was a really significant period of downtime.
Most of our machines don’t have dual power supplies, so they have to be
shut down and then plugged into the new power strips.
I can’t really ask the folks at the office to make announcements for
each machine individually at a particular data center for a whole bunch
of different machines (nor is it possible to predict the time that we’ll
take an individual machine down).[/quote]
You just gave me the details. That wasn’t so hard, was it?
But you sound miffed.
Do you think users don’t have the right to know what’s going on? Do you think we don’t have the right to know when and for how long our clients are going to be out of business? It sounds as though you think that asking to be informed of incidents in a professional way is completely unreasonable.
Listen, I’m not asking for an individual announcement for every machine. All I’m asking for is clarity.
You had the details, and you gave them to me above. Why weren’t those details put into the announcement? There’s no reason for it.
Dream Host staff have sent out vague announcement after vague announcement. I griped about it, and the announcements became clearer-- for a while. Now they’re vague again.
All you have to do to write a proper announcement is to provide specific information, as you did above. If you’re able to provide me with details like the above, then the people at the office should be able to as well. Give them the details to give to us.
The announcement said the downtime would be 1/2 hour. But above, you say the downtime will be 3-5 minutes. That’s a very, very large difference.
It’s interesting-- You say it’s hard to be specific, and then to provide a reason why you can’t be specific, you provide specifics.
If you gave us the specifics in the original announcement, you wouldn’t have to give us the specifics when we’re forced to ask for them later.
Your patronizing tone says that you don’t think users need to know specifics – and that we wouldn’t have any idea what to do with them if you did give them to us.
Dream Host signs its announcements with “Happy This” or “Happy That”. Hey, it’s nice to have a friendly tone in company communications… But when the announcements are completely vague and uninformative, and then are signed with a flip, fluffy one-liner, it says – true or not – that Dream Host’s attitude toward business is anything but professional. When you tell the users “don’t worry-- be happy” and then tell them that they don’t need details, you show contempt for them.
“Happy Power Upgrade Team” my ass.
We need the details. We deserve the details. Give us the details.
I’ve said it before: I’m going to keep on hammering you guys about this until you figure it out.
A bit harsh, don’t you think Bob? I’m sure a more encouraging and friendly message “asking” rather than “demanding” would go down a lot better.
As a System Administrator myself, I can say these kinds of messages do not go down well, and are most likely ignored. This kind of tone is certainly not appreciated.
I understand your argument, and I hear what you’re trying to say. But bashing and publicly insulting someone’s work is hardly going to encourage them to change to your liking.
I’m personally glad they put the 30 minutes head on the maintenance task. This allows room for error, and if different services are hosted on various machines, then the outage would have amounted to 30 minutes or so. I’ve got mail, ftp, web, web-redirects, mysql running on 5 different machines. That’s 25 minutes downtime according to the calculations provided by the support staff. That sounds just about right for all users I should imagine.
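To make that arithmetic explicit, here is a minimal sketch. It assumes the five machines are taken down one after another with no overlap, so a user relying on all of them sees the individual outages add up; the function name `cumulative_downtime` is made up for illustration.

```python
def cumulative_downtime(machine_count, per_machine_minutes):
    """Total outage range if each machine goes down separately, one after another."""
    low, high = per_machine_minutes
    return machine_count * low, machine_count * high


# 5 machines, each down for the 3-5 minutes quoted by support.
low, high = cumulative_downtime(5, (3, 5))
print(f"{low} to {high} minutes")  # 15 to 25 minutes
```

The 25-minute figure is the worst case of this range; if the machines were unplugged at the same time, the total would be closer to the per-machine figure.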
All communications issues aside, I DO think Bob has a good point about making announcements clear and informative. We understand that many of our customers run businesses every bit as important as ours is to us, and this stuff makes it a lot easier for them to prepare for outages, etc.
I also understand that while we may see messages from our customers about downtime, we don’t see the messages from THEIR clients to them. That’s something we should always keep in mind. Far more people depend on us than those we collect payment from.
On the other hand, I also see an extraordinary Admin team that puts in longer hours than anyone in our company, and overall does an excellent job at what they do. I’d be begging for change on a street corner if it weren’t for them - our company couldn’t survive more than a day or two without their talents.
I guess the main thing to remember is, if our Admin folk had the communicative skills that my team (Marketing) does, they’d probably be sitting at my desk instead of some cold data center somewhere. We hired them mostly based on their ability to resurrect dead servers and keep them operational, and that’s what they do best.
But - while they are great at what they do, we also need to find new (and especially, time efficient!) ways to help them get the word out to customers without taking them away from their most important tasks, such as upgrades and ‘putting out fires’.
It’s easy for me to take the time to write informative, friendly messages to customers. It’s a lot harder to write something spelled correctly - let alone useful - after 10 straight hours fighting fires at a data center when you can hardly keep your eyes open.
Any constructive suggestions you all can think of to do that will be more than appreciated, and will be passed on to our happy Admin people. Really, we DO want to get this right…
Nope. Not at all, considering the amount of time this problem has existed, and the patronizing tone of the admin’s response.
[quote]I’m sure a more encouraging and friendly
message “asking” rather than “demanding” would go down a lot better.[/quote]
Wil, there is substantial history involved here, of which you are apparently not aware.
I have encouraged, and kidded, and begged, and asked and pleaded for what? months? for the announcements to include information that is necessary for me and, no doubt, many other users to conduct business.
I have been assured several times by Dream Host management that communication would be improved, and that the announcements would contain all the pertinent information.
The announcements got better-- temporarily… And then they started going back to being vague.
I asked Support to clarify, and Support acted as though I was being unreasonable.
[quote]As a System Administrator myself, I can say these kinds of messages do not go down
well, and are most likely ignored.[/quote]
As a lower-level system administrator myself, I know that the announcements that are being sent out are completely inadequate.
There is no way I would ever send out a notice to my users that said their machines would be going down for half an hour sometime in the next 24 hours, when the reality was that they would probably be going down for 3-5 minutes sometime within a certain hour or hours.
As a lower-level admin myself, I also know that the tone of the response that prompted my response – the one that offends you – was unnecessarily patronizing, and that acting like a BOFH is going to get a reply like mine a majority of the time.
And responding to a BOFH attitude with strong words is completely understandable, and is sometimes the only way to get the attention of some admins.
Acting like a snivelling, lowly, totally unknowledgeable user only encourages some admins to maintain their BOFH persona.
And I don’t like bullies – especially bullies who are paid to be helpful. An admin who bullies me will get a response like that every time, and his superiors will be CC’d, and/or the post will be made public.
I would never talk to one of my clients the way that one particular admin talks to me and other users. I expect to be treated with the kind of respect – and to be given the kind of information – that I give my own clients.
[quote]This kind of tone is certainly not appreciated.[/quote]
The admin’s patronizing tone is certainly not appreciated.
Re-read the response I got from the admin.
His tone is the reason for my tone.
It’s clear that at least one admin at Dream Host is suffering from a case of “High Priest Syndrome”.
This is a very old problem in the computing world. The problem arises from an admin attitude that says “Users don’t need the arcane information that we possess; you are not knowledgable enough to handle it… And what would you need it for, anyway? All you have to do is to put your faith in us, and we will take care of everything.”
The parallel to this is “Black Box Syndrome.” It’s based on the belief that users don’t need to know how a piece of software or hardware works… they just need to blindly trust the makers.
It’s easy to fall into this pattern, and it doesn’t necessarily result from a lack of caring about quality on the part of the admin/support person. It’s usually used – consciously or not – as a kind of defense mechanism.
Admins can be flooded with questions or complaints, but instead of offering as much information as is possible, they sometimes do the opposite, and close off communication, elevating themselves above the users, trying to maintain a “trust us-- we know what we’re doing” kind of persona. This persona can be optionally warm and friendly (a la the “Happy” announcements) or snippy and condescending (a la the admin’s response).
The problems with these syndromes are sometimes subtle, sometimes not.
The announcement said:
The announcement does not give any of these necessary points:
Which IDC? All of them?
If just certain IDCs, which machines? All of them?
When will this service window begin?
When will this service window end?
In addition, the announcement says that the servers should be down for only [sic] half an hour… It is not saying how long each individual server is expected to be down, although it’s making it sound as though a single server could be down for 1/2 hour… when the actual downtime, according to the admin’s response, would probably only be 3 to 5 minutes.
The problem here is that the people who are writing the announcements – and/or the admins who ask them to send out an announcement – are forgetting that the recipients of the announcements aren’t the only ones affected.
I have clients who need to know what’s going on. They need to know how long they’re going to be out of business, and when.
There is a huge difference between my sending an email to my client that says this:
“Hello. Sometime within the next 24 hours, your website and other services will be unavailable for up to half an hour.”
…and sending an email that says this:
“Hello. Sometime between 6pm and 6:30pm, your website and other services will be unavailable for approximately 3 to 5 minutes.”
A huge difference… and the specific information – which the admin apparently had – should have been in the original announcement. If the window needed to be longer, fine. Just tell us “between 6pm and 9pm” or whatever’s appropriate.
The admin maintained that he was unable to send out details, but in saying that he couldn’t send out details, he gave me the details:
The original announcement said 1/2 hour. Now he says 3-5 minutes. Why didn’t the original announcement say that?
And why didn’t the original announcement say that 65 machines would be affected? More importantly, does he mean that all Dream Host servers are involved? If that’s true, why not say so?
Since the original announcement said that this was a power strip upgrade, and it would take a half an hour, I asked why it would take so long to do that, and if it wouldn’t be possible to install redundantly and then just cut over. To which the admin replied:
Reasonable enough. But this information wasn’t in the original announcement. The thing is, though, that I would not have asked that question if the original announcement had contained the correct estimated downtime of 3 to 5 minutes.
As for being more IDC- or server-specific, the admin said:
In the “High Priest” world, it’s all or nothing.
I didn’t ask for a separate announcement for each machine, yet the admin jumps to that conclusion, and tries to make it look like I’m being totally unreasonable.
All I asked for was the information they have-- the information that should have been included in the original announcement.
The time window in the original announcement could have been – depending on the time of day it was written – as long as 24 hours.
The original announcement didn’t say how many machines were involved. It could have been two. Since they chose not to tell us, how could I have known that 65 were involved? The admin acts as though I should have somehow magically known that many machines were involved, and that predicting individual downtimes wouldn’t be safe or practical.
All I’m asking for is that the announcements contain the specific information that they do have – most of which the admin was able to give me in his reply.
I shouldn’t have to write back to TS after an announcement and get the information that the admin apparently did have, and which should have been in the announcement in the first place.
This has happened time after time after time.
And Dream Host staff agrees with me that this is a problem.
[quote]I understand your argument, and I hear what you’re trying to say. But
bashing and publicly insulting someone’s work is hardly going to encourage
them to change to your liking.[/quote]
I’m not insulting someone’s work-- I’m criticising it.
The admin (in repeated incidents) insults me with his patronizing tone, and then I guess I’m supposed to cower off in a corner, saying “Thank you sir, may I have another”? Please.
I don’t have a short fuse. But when this kind of problem continues for months, and the response is repeated browbeating, I don’t expect to have to apologize for reacting as I did.
If the admin is insulted because they think they’re doing a perfect job, and there’s no reason to give the users the information they need and have a right to, there’s not much I can do about that.
I have talked to Dream Host privately – and nicely – for long enough.
The longer this inadequacy goes on, and the more times I am forced to write to TS just to get some minimal information out of them, the more one or two certain admins are able to privately dismiss me as some sort of crank/malcontent.
It’s already apparent from another response yesterday that more than one support person is reading my messages. One remarked (without my prompting) that he knows “how much you like details.”
This says to me that the support people think that users who ask for information are a PITA, or at least strange. Otherwise, why would he mention that at all?
All users want and need information about what’s going on. My asking for information is completely unremarkable, but somehow the admin felt a need to express – in not so many words – that I am not normal in asking for details. High Priest Syndrome once again.
By airing these difficulties publicly, others will know exactly what’s going on, as will Dream Host management.
And, as I mentioned, I have been encouraged to post publicly about this problem, and other problems, by Dream Host management.
A member of staff has told me that if there is a message in the forum that he can show to his higher-ups, he can point to it and say, “See? There’s a problem we need to take care of.”
Posting publicly is entirely appropriate, and apparently entirely necessary in this situation.
Dream Host has asked me to do it, so I’ll keep doing it.
[quote]I’m personally glad they put the 30 minutes head on the maintenance task.
This allows room for error, and if different services are hosted on
various machines, then the outage would have amounted to 30 minutes or so.
I’ve got mail, ftp, web, web-redirects, mysql running on 5 different
machines. That’s 25 minutes downtime according to the calculations
provided by the support staff. That sounds just about right for all users
I should imagine.[/quote]
I think that’s an inaccurate way to look at the announcement.
All the announcement said was 1/2 hour. That makes it sound as though it was likely for each server/domain to be down for half an hour, when that just wasn’t the case. The real downtime was estimated to be 3-5 minutes.
Sure, it’s always necessary to offer a worst-case scenario, but that’s not the way in which 1/2 hour was couched.
Saying when the downtime will occur – within a window that’s smaller than 24 hours – combined with an estimated minimum and maximum downtime is the right way to do it, and it’s information that we as users need and deserve.
And yes, you apparently do have a hard-working admin staff.
Please let them all know – even the butthead(s) – that I feel that way.
My point is that all the technical excellence in the world can be blown out of the water when communication breaks down, or when admins don’t think users need information.
And you made a very important point that I brought up in my reply to Wil – that we users have clients of our own, and we have to be able to let them know when and for how long they’re going to be out of business, or what’s being done to fix a problem.
As for suggestions-- when this problem occurred on another occasion, I posted a checklist of the items that should be required in every announcement. Here’s a repost from memory:
A brief description of the incident. Is it planned or was
Whether the entire DH system is affected, or…
The particular IDC(s) affected if it’s not DH-wide, or…
The machine(s) affected, if it’s not an IDC-wide issue.
The start time of the incident.
If it’s unplanned, when did the outage start?
If it’s a planned upgrade/etc., when is it scheduled
The end time of the incident.
If it’s unplanned, when was the outage resolved? Or…
When is the outage estimated to be resolved?
If it’s a planned upgrade/etc., when is it scheduled to
Optionally – as a follow-up message – the actual start or
end/resolution time for the planned or unplanned outage, if
it differed substantially from the estimated time.
If this is a recurring problem, what’s being done to finally
fix it? (a la the “helper” added to Hoggle, or dealing
with the connectivity problems at Softaware)
A somewhat more detailed description of the problem. This
area can be used to offer any details that aren’t
appropriate for the other fields.
The number of legs – collectively – belonging to any cats
which may inhabit the residence of the announcement writer.
This is mandatory.
(I get the feeling I left something out. Let me know…)
It would be easy enough to print this and tack it up above the monitor of anyone who initiates or writes announcements.
But easier still would be the other suggestion I made a while back – an HTML form that’s used to generate announcements. Using a form would allow enforcement of the checklist, and would make writing an announcement easy. Popup lists of incident types, services, IDCs, machines, dates, times, etc. would make accuracy easy.
This form could be used as a standalone, well before the trouble-ticket system you folks have mentioned is actually built. It can optionally be plugged into your list server, or it can just generate text that can be copied and pasted into a mail form, etc.
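As a rough illustration of what the text-generating half of such a form might do, here is a minimal Python sketch. It is not the form itself and not anything Dream Host uses: the `Announcement` class, its field names, and the `render` function are all hypothetical, and the 6pm to 9pm window is only the example window mentioned earlier in the thread. It enforces a subset of the checklist by refusing to produce text while a required item is blank.

```python
from dataclasses import dataclass
from typing import List

# Checklist items that must be filled in before an announcement goes out.
REQUIRED_FIELDS = ("description", "scope", "window_start", "window_end")


@dataclass
class Announcement:
    description: str           # brief description of the incident
    planned: bool              # planned maintenance or unplanned outage
    scope: str                 # whole system, IDC(s), or machine(s) affected
    window_start: str          # start time, including time zone
    window_end: str            # end or estimated end time
    min_downtime_minutes: int  # best-case downtime per machine
    max_downtime_minutes: int  # worst-case downtime per machine
    details: str = ""          # optional longer explanation


def missing_fields(a: Announcement) -> List[str]:
    """Return the names of required fields that were left blank."""
    return [name for name in REQUIRED_FIELDS if not getattr(a, name).strip()]


def render(a: Announcement) -> str:
    """Build the announcement text, refusing to render an incomplete one."""
    missing = missing_fields(a)
    if missing:
        raise ValueError("announcement is missing: " + ", ".join(missing))
    if a.min_downtime_minutes > a.max_downtime_minutes:
        raise ValueError("minimum downtime exceeds maximum downtime")
    kind = "Planned maintenance" if a.planned else "Unplanned outage"
    lines = [
        f"{kind}: {a.description}",
        f"Affected: {a.scope}",
        f"Window: {a.window_start} to {a.window_end}",
        "Expected downtime per machine: "
        f"{a.min_downtime_minutes}-{a.max_downtime_minutes} minutes",
    ]
    if a.details.strip():
        lines.append(f"Details: {a.details}")
    return "\n".join(lines)


# Hypothetical values: the machine count and downtime come from the thread,
# the window is the illustrative one suggested above.
example = Announcement(
    description="Power strip upgrade",
    planned=True,
    scope="65 machines at one IDC (IDC name is a placeholder)",
    window_start="6pm Pacific",
    window_end="9pm Pacific",
    min_downtime_minutes=3,
    max_downtime_minutes=5,
)
print(render(example))
```

The output is plain text, so it could be pasted into a mail form or handed to a list server, as described above. Popup lists for IDCs and machines would sit in front of this step and simply supply the `scope` value.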
And it would be extremely easy to set up. As a matter of fact, I volunteer to create the first draft of the form. I’ll send you a URL when it’s done, and you and the admins and other staff can tell me what needs to be added, changed, etc.
If you want to, you could email me a complete list of your IDCs, and machines at each, if you feel comfortable doing so. If not, you can always add them in-house.
Hey, I won’t be doing this just for you (or just to get better announcements out of you) – I’ll also use it to send out announcements to my own clients. I’ll be doing this within the next week. And don’t try to talk me out of it.
See? I’m not a malcontent by definition-- Malcontents don’t offer to help fix things.
Seriously, I think that if you go back and read a sampling of my posts over time, you’ll see what I mean. And you’ll see that I started out addressing the issue with humor (the “Communication Clinic” posts #1 and #2).
Over time, though, I’ve just gotten frustrated with the lack of responsiveness on the part of one or two admins, which is the complete opposite of the responsiveness of others… in addition to the vagueness relapse.
So, if you’ve read all my posts, I think you’ll see that my pissoffedness was a long time coming.
I’m creating an HTML form that will hopefully save admins time, while allowing them to create announcements that actually have more detailed and more accurate info.
I don’t like griping; I much prefer getting things fixed. So hey, I’m putting my code where my mouth is.
Yeah. I did not realise or appreciate the full extent of the situation when composing my original reply. However, I am reluctant to pass further judgment without knowing the full details from both parties - which, to be honest, I haven’t the time or energy to be even vaguely interested in. And besides, I see that people are reacting to your unhappiness, and I’m pretty sure that nothing I would or could do would alter the situation. The problem is being dealt with by far better and more capable hands.
I think investing in an announcement ticket system would be far better, Bob. I’ve seen this in use at my uplink provider, and it works effectively. It’s reassuring getting that “Ticket closed” message in my mailbox to inform me that the problem has been fixed and all is well!
Yeah, a full-fledged ticketing system would be optimal, but until DH gets that up and running (they’ve said they’re working on it? thinking about it?), a form will hopefully get the information out better, and save them time in the process.
i don’t really feel the need to get sucked further into this argument. my position is this: we were simply trying to be nice and mention the possibility that there might be some downtime. in reality if we hadn’t said anything in the first place, it’s likely that very few people would have noticed at all.
we mentioned a time of ‘30 minutes’ since that’s the maximum time we expected. in reality, the actual downtime on almost all machines was around 3 or 4 minutes.
the changes that we were making benefit our customers more than anything else; we’re making everything so that we can remotely power-cycle machines and get a serial connection to machines (also remotely).
i am not always the most tactful of people, but I do care a great deal about our customers and our company. I work hard to fix problems, but there are always a lot of problems to fix. I try to prioritize, and frankly, the 3 minutes of downtime (or even the 30 minutes of downtime) will be worth it in the long run since it will result in a great reduction of future downtime. pretty much everyone on the admin team puts in way over 40 hours a week, with weekends on call, late night / overnight shifts, constant pager duty (with about 300 messages or more per day), phone calls in the middle of the night, and constant threats to our already pathetic social lives. So it’s fair to say that we’re sometimes a bit grumpy. so yes, it’s understandable that some other people in the company are a bit nicer… they’re not the ones waking up in the middle of the night when your server goes down.
It’s simply not possible for us to predict what machine we’ll have to unplug when, nor are we able to access anything except a text console at the data center. That means that we’re at the mercy of other people to actually make announcements. This ended up resulting in some communication problems.
Honestly, though, i’d much rather have a company that’s working to reduce downtime than one that is good about announcing their downtime. We make an effort to make our announcements as detailed as possible, but there are cases where it’s simply not possible or practical to do so. We were trying to do the right thing; in the future I’m just not going to announce minor downtimes like this since it seems to cause so much controversy.
Personally, I try to be results oriented. There are a number of customers who respect this and treat me and my time with respect, and I appreciate this and work my hardest to fix their problems. I can at times be short with people, but a good amount of the time I actually FIX their problem. I enjoy challenges, and I’ve certainly gone far out of my way for many customers. I’m certainly far from perfect; I make mistakes all the time (that’s the only way to learn really).
I don’t think it’s at all fair to say that i have a BOFH attitude… however bob - what did you say your username was?? i think i can fix that ‘over quota’ problem you were having… devilish grin
[oh yes the admin in the comments above is me in case anyone was wondering]
[also note that i’ve spent a 9 hour day in the data center today]
[also note that i had spent a 10 hour day in the data center before writing my initial message to bob_w]
[further note that both of these days were fueled by too much coffee and not enough sleep]
a postscript… please don’t take my previous comments to mean that the rest of the dreamhost team doesn’t work long / strange hours, or doesn’t put up with a lot of the same stuff that the admin team does.
in a small company such as ours, pretty much everyone has to pitch in and help out with all kinds of work, and almost everyone has to put up with some (or most) of the things i mentioned regardless of their job title.
also i noticed in an earlier post the implication that we don’t think that the users need to know information that we’re somehow privy to. the announcement that went out had about as much information as we knew regarding which machines, approximate downtime etc. sometimes it’s not possible to predict these things absolutely. while we may have accidentally omitted a detail or two, the omission was not because we think our users can’t understand it.
that said, we do need to balance the need to satisfy some peoples’ insatiable curiosity with the need to keep it simple for many (most) of our customers.
there is some information that’s simply not relevant to most users that was omitted. this isn’t taken out because we think our customers are stupid; however i don’t see a point in giving information that has the potential to confuse more people than it enlightens. a lot of the language we’d have to use would be confusing not because it’s technical, but because it involves a knowledge of the way our system works specifically that most / all of our users don’t have.
in any event, i don’t really feel the need to continue this discussion further, nor do i think further discussion will be constructive.
you said the magic word! (beer) i’ll certainly let you buy me one if i’m in your neck of the woods as long as i can buy you one back for patiently answering so many questions on the discussion board and being an all around nice guy.
Will, I stand by every point I made in my deleted and previous posts, but…
I do apologize for being so rough on you.
I should know better than to reply to a thread like this immediately after reading.
I’m just hoping that you’ll come to understand that communication and technical excellence aren’t mutually exclusive. And if you don’t arrive at that conclusion, I’ll just have to learn to live with less information than I’m used to getting from a host.
Thanks for being willing to post in the forums. I hope this won’t discourage you and others from continuing to do that.
If you have a favorite local beer hall that sells gift certificates, let me know, and I’ll send you one.