One of our two offices is located in the same building as the data center, albeit on a different floor. On any given day you can probably find one of our administrators doing something within the data center itself (we have full, 24/7 access).
This doesn't affect most issues, though. It's only in (relatively rare) cases of outright hardware failure that we need actual "hands on" access to the servers. We can do much of the usual stuff (running commands, etc.) from anywhere in the world. You've not lived until you've power-cycled a server while stuck in LA traffic.
The 40+ minute delay you speak of is mostly made up of the time it takes for us to notice the downtime, plus (in the event of a power-cycle) some time, maybe 10-15 minutes or so, for the server to actually come back up. In some cases it can take a little longer if, for example, an fsck is occurring.
So, really, most of that time is split between A) being notified that a given server/service is down and B) finishing up whatever we're working on at the time (i.e., another problem).
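To put rough numbers on that, here's a quick back-of-the-envelope sketch in Python. The function name is made up for illustration, the 40- and 60-minute totals are just the figures mentioned in this thread, and it assumes a power-cycle was needed and no fsck ran. It is not a description of any actual monitoring tool.

```python
def time_before_reboot(total_minutes, reboot_minutes):
    """Minutes left over for noticing the outage and finishing other work.

    Illustrative helper: total outage time minus the time the server
    spends coming back up after a power-cycle.
    """
    return total_minutes - reboot_minutes


# Reboot time quoted above: roughly 10-15 minutes (assumes no fsck).
REBOOT_RANGE = (10, 15)

for total in (40, 60):
    longest = time_before_reboot(total, REBOOT_RANGE[0])
    shortest = time_before_reboot(total, REBOOT_RANGE[1])
    print(f"{total}-minute outage: {shortest}-{longest} minutes before the reboot starts")
```

In other words, for a 40-minute outage with a power-cycle, something like 25-30 minutes would have gone to A) and B) combined.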
Some issues receive higher priority, of course. If an entire server is down it will be resolved more quickly than if, for example, a single Apache instance has gone offline.
Anyhow... I would say that 40-60 minutes is longer than normal. If a server is dead in the water, we usually find out about it and get it fixed a lot quicker than that. Even dead Apaches don't usually take that long to fix.
- Jeff @ DreamHost
- DH Discussion Forum Admin