System Status in Panel?!


#1

I’ve been checking out the system status in the control panel for the last few months and I’m not sure it’s entirely accurate, for example: I have a site on charm that just went down (http ftp telnet ssh) for at least 14 minutes (no idea how long it was out before I noticed). 14 minutes out of a day would be 99.03% uptime right? However the status in the control panel is showing 100% for this day.
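As a quick check of the arithmetic above, here is a minimal sketch in Python. The function name `uptime_percent` is made up for illustration, and it assumes the panel's daily figure is simply downtime divided by a 24-hour day, which the thread does not confirm.

```python
def uptime_percent(down_minutes: float, period_minutes: float = 24 * 60) -> float:
    """Return uptime as a percentage of the period (default: one day)."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    # Clamp downtime into [0, period] so the result stays within 0..100.
    down = min(max(down_minutes, 0.0), period_minutes)
    return 100.0 * (1.0 - down / period_minutes)


# 14 minutes down out of 1440 minutes in a day
print(f"{uptime_percent(14):.2f}%")  # prints 99.03%
```

So 14 minutes of downtime does work out to roughly 99.03% for the day, and a panel figure of 100% would mean the outage was not recorded at all.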

Can one of you good people gimme an idea of how the System Status works, how accurate it is? And if it’s not accurate, why bother, or better yet, when’s the new, more accurate version coming?

thanks

[color=#0000CC]jason[/color]


#2

Well, I haven’t quite thought about this until now. I just set up a newly registered domain yesterday (31 December 2002) but the Historical Uptime Status for http is “Up for 10 days 4 hours”!! What gives?


#3

The uptime is for the machine or webserver, not for the domain itself.


#4

But then the System Status page is listed by domain! And isn’t it actually the case that MOST clients have ALL their domains on the same machine?


Alleged Cybersquatter - knows a thing or two about domain names
http://www.nameslave.com


#5

It’s by domain since it would presumably not make sense to list it by machine (many customers don’t know their machine name).

In most cases, customer domains will be on the same machine, but that’s certainly not always the case, which is why the system status page displays things this way.

w/r/t accuracy, our stats should be pretty accurate from our perspective. Obviously if there’s a network problem between you and us, our monitoring systems may not detect it.


#6

Is there a minimum “floor value” for downtime? For instance, all my domains were inaccessible again just now (around :10 p.m. Eastern time) for a couple of minutes. As a check, I was able to reach yahoo.com and many other websites during that period. What gives?


#7

I’m not sure exactly how often each machine / webserver is checked, but I think it’s every minute or two. It’s possible that a brief outage wouldn’t be noticed. It’s also possible that there was a network outage between you and us.
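To illustrate how a check that runs every minute or two can miss a brief outage, here is a small simulation. The 120-second interval, the function name, and the sample outage times are assumptions for illustration only, not a description of how the actual monitoring system works.

```python
def observed_downtime(outages, interval, period):
    """Seconds of downtime a fixed-interval poller would report.

    outages  -- list of (start, end) times in seconds, end exclusive
    interval -- seconds between probes (hypothetical value)
    period   -- total seconds being monitored
    """
    failed_probes = 0
    for t in range(0, period, interval):
        # A probe only sees the outage if it fires while the site is down.
        if any(start <= t < end for start, end in outages):
            failed_probes += 1
    # Each failed probe is counted as one full interval of downtime.
    return failed_probes * interval


# A 100-second outage that falls between two probes goes unreported.
print(observed_downtime([(130, 230)], interval=120, period=3600))  # prints 0

# The same length of outage, shifted slightly, is reported as 120 seconds.
print(observed_downtime([(100, 200)], interval=120, period=3600))  # prints 120
```

Under these assumptions, any outage shorter than the probe interval can be either missed entirely or rounded up to a full interval, depending on when it starts.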

A few things to look for next time…

  1. What’s the exact error? Is the connection being refused, or is it not responding at all?

  2. Can you ping your site (at the time)? Traceroute?
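If you want to tell "connection refused" apart from "not responding at all" without reading browser error pages, a rough sketch using Python's standard `socket` module could look like this. The function name, the default port 80, and the 5-second timeout are assumptions chosen for illustration.

```python
import socket


def classify_connection(host, port=80, timeout=5.0):
    """Try a TCP connection and describe what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # something accepted the connection
    except ConnectionRefusedError:
        return "refused"           # host reachable, nothing listening on the port
    except socket.timeout:
        return "timeout"           # no answer at all within the timeout
    except socket.gaierror:
        return "dns-failure"       # the hostname could not be resolved
    except OSError as exc:
        return f"error: {exc}"     # e.g. network unreachable


if __name__ == "__main__":
    # Replace with your own domain when the site appears to be down.
    print(classify_connection("example.com", 80))
```

Roughly speaking, "refused" suggests the machine is up but the web server is not, while "timeout" points more toward the machine or the network path being down.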


#8

It’s happening AGAIN as we speak (10:52 a.m. Eastern Time). I’m not that familiar with ping or traceroute, but I ran a couple of traces at http://www.bungi.com/cgi-bin/trace:

Traceroute Output

FROM www.bungi.com TO nameslave.com.

traceroute to nameslave.com (66.33.209.92): 1-30 hops, 38 byte packets
1 main.bungi.com (207.126.97.9) 1.95 ms 1.58 ms 1.55 ms
2 ser3-5-0.orpa1.pf.isc.org (192.83.249.250) 4.3 ms 3.59 ms 3.54 ms
3 pos1-0-0.orrc5.pf.isc.org (192.5.4.162) 4.48 ms 3.67 ms 3.67 ms
4 t3-0-2-0.orsf2.pf.isc.org (192.5.4.249) 16.1 ms 11.8 ms 15.3 ms
5 GigabitEthernet1-0-107.edge1.paix-sjo1.Level3.net (209.245.146.249) 11.6 ms 11.5 ms 17.9 ms
6 GigabitEthernet3-0.core1.SanJose1.Level3.net (209.244.3.245) 11.9 ms 15.9 ms 17.2 ms
7 ae0-55.mp1.SanJose1.Level3.net (64.159.2.129) 29.0 ms 12.6 ms 12.2 ms
8 so-2-0-0.mp1.LosAngeles1.Level3.net (209.247.9.113) 28.2 ms 27.8 ms 34.1 ms
9 pos8-0.core1.LosAngeles1.Level3.net (209.247.10.194) 20.0 ms 34.8 ms 31.1 ms
10 6-0.ipcolo1.LosAngeles1.Level3.net (209.244.10.42) 19.7 ms 20.3 ms 26.2 ms
11 gw-l3.sd.dreamhost.com (63.208.231.118) 21.9 ms 23.0 ms 23.1 ms
12 basic-pat.apok.dreamhost.com (66.33.209.92) 29.7 ms 68.9 ms 36.9 ms

FROM www.bungi.com TO megacity.com.

traceroute to megacity.com (66.33.210.180): 1-30 hops, 38 byte packets
1 main.bungi.com (207.126.97.9) 1.88 ms 1.41 ms 2.66 ms
2 ser3-5-0.orpa1.pf.isc.org (192.83.249.250) 53.5 ms 27.8 ms 25.9 ms
3 pos1-0-0.orrc5.pf.isc.org (192.5.4.162) 22.1 ms 33.4 ms 3.66 ms
4 t3-0-2-0.orsf2.pf.isc.org (192.5.4.249) 13.9 ms 10.8 ms 10.6 ms
5 GigabitEthernet1-0-107.edge1.paix-sjo1.Level3.net (209.245.146.249) 11.3 ms 11.3 ms 11.3 ms
6 GigabitEthernet3-0.core1.SanJose1.Level3.net (209.244.3.245) 12.5 ms 11.6 ms 11.6 ms
7 ae0-53.mp1.SanJose1.Level3.net (64.159.2.65) 12.1 ms 12.4 ms 12.7 ms
8 so-2-0-0.mp1.LosAngeles1.Level3.net (209.247.9.113) 20.0 ms 19.5 ms 19.3 ms
9 pos8-0.core1.LosAngeles1.Level3.net (209.247.10.194) 19.5 ms 20.2 ms 22.1 ms
10 6-0.ipcolo1.LosAngeles1.Level3.net (209.244.10.42) 19.4 ms 19.3 ms 19.4 ms
11 gw-l3.sd.dreamhost.com (63.208.231.118) 201 ms 237 ms 259 ms
12 basic-vat.apok.dreamhost.com (66.33.210.180) 21.8 ms 22.0 ms 21.6 ms

I don’t know how to interpret these, except that for megacity.com there seems to be a noticeably longer delay at the DreamHost gateway (hop 11, 201-259 ms versus about 22 ms for nameslave.com). Please advise. Thanks.
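One rough way to read output like this is to average the three round-trip times on each hop and look for the hop that stands out. The sketch below does that for the last three hops of the megacity.com trace; the regular expressions and the function name are my own assumptions about this particular output format, not a general traceroute parser.

```python
import re

# Matches lines such as: "11 gw-l3.sd.dreamhost.com (63.208.231.118) 201 ms 237 ms 259 ms"
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+(.*)$")
RTT_RE = re.compile(r"([\d.]+)\s*ms")


def slowest_hop(traceroute_text):
    """Return (hop number, hostname, average RTT in ms) for the slowest hop."""
    worst = None
    for line in traceroute_text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # skip headers and blank lines
        rtts = [float(value) for value in RTT_RE.findall(match.group(4))]
        if not rtts:
            continue  # hop with no timing data
        average = sum(rtts) / len(rtts)
        if worst is None or average > worst[2]:
            worst = (int(match.group(1)), match.group(2), average)
    return worst


sample = """\
10 6-0.ipcolo1.LosAngeles1.Level3.net (209.244.10.42) 19.4 ms 19.3 ms 19.4 ms
11 gw-l3.sd.dreamhost.com (63.208.231.118) 201 ms 237 ms 259 ms
12 basic-vat.apok.dreamhost.com (66.33.210.180) 21.8 ms 22.0 ms 21.6 ms
"""

hop, name, avg = slowest_hop(sample)
print(hop, name, round(avg, 1))  # prints 11 gw-l3.sd.dreamhost.com 232.3
```

Note that the final hop here still answers in about 22 ms, so the spike at hop 11 did not carry through to the destination. A slow reply from a single intermediate router often just means it gives low priority to answering traceroute probes, so on its own it does not prove the site was unreachable.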

ALL my domains are still not accessible after writing all this.


#9

prufrock! If you could put the downtime info in the other thread ("…SIGNIFICANTLY") and the panel problems here, that’d keep the answers to the problems separate if we get any solid ones. And… hi! =)

yep, down for between 35 and 40 minutes, and the web panel ended up reporting it as 18 minutes or so (I was gonna check for a solid number but the panel’s not coming up =o )

Hep!

(not being a harpy old woman, just hoping this helps get the stuff back together)

[color=#0000CC]jason[/color]


#10

I’m currently getting the ‘how ironic’ message in the status now for a site on charm.

Should be interesting to see what it reports as I’ve documented a pantload of downtime in the past 2 weeks. With what I’ve personally witnessed (site down while I’m sitting there) it’s down to 99.1% in the last 2 weeks.

[color=#0000CC]Can a grizzled veteran tell me what’s up with the system status in the control panel[/color]? I hadn’t seen this weird behavior with it until the last 2 weeks or so. Now it’s commonplace for the percentage to be off, and for it to not pick up downtime while it’s happening or adjust the percentages properly well after the fact. And there’s that ironic message I’ve seen 3 or 4 times in the last few weeks.

thanks

[color=#0000CC]jason[/color]


#11

Hey Jason,

Just letting you know the system status in the panel is “fixed” now… i.e. it shouldn’t give that “how ironic” message anymore at least. The problem was that we recently made some improvements to our monitoring system, switching to a new server, a new database, and new software. But the web panel page hadn’t been updated to look at the new database!

I’m not sure the uptime information given there is very accurate right now, since we’ve only started using the new database pretty recently… but it should get better as time goes by.

Of course, there will always be some downtime that our system can’t report… things like network problems upstream from us and internal misconfigurations that result in downtime for a specific site but not the whole service in general. But this should still be a somewhat useful tool for the monitoring it does provide.

josh!