Accurate bandwidth tracking and weblog analysis

It appears DreamHost relies solely on weblog analysis to determine your bandwidth usage, and not a lower-level mechanism?

Is this not potentially very inaccurate?

For example, if a user requests a large file and cancels soon after, the weblog will show a “200 OK” message with the full file size, even though a much smaller amount of data was actually transferred. With this in mind, the presence of any large files on a DreamHost customer’s site could be abused to create wildly inaccurate bandwidth reports.

I just did a simple test: I downloaded a couple of files from one of my domains and cancelled the downloads before completion. The access logs showed the actual bytes transferred, not the full file size.


Save [color=#CC0000]$50[/color] on DreamHost hosting using promo code [color=#CC0000]SAVEMONEY[/color] ( Click for promo code details )

What browser were you using?




I tried with both Firefox and IE and got the same result.

[quote]- samurai [05/Sep/2006:06:19:41 -0700] "GET /test/ HTTP/1.1" 200 336 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
- samurai [05/Sep/2006:06:19:53 -0700] "GET /test/test.iso HTTP/1.1" 200 678873088 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"[/quote]
Notice I cancelled the 650 MB file just 12 seconds after I got the index listing, but the full file size shows.

Strange. I just tried again using both Firefox and IE 6, with the same result as my previous test.

[quote]- - [05/Sep/2006:06:29:29 -0700] "GET / HTTP/1.1" 200 2061636 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20060728 Firefox/"
- - [05/Sep/2006:06:30:00 -0700] "GET / HTTP/1.1" 200 2256944 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"[/quote]
I stopped both downloads after about 2 MB; the full file size is 23.6 MB.



I want to know why your logs are more accurate than mine.

Perhaps it is due to different configurations on the servers themselves; my sites are on the ‘Bixel’ server.

I would be interested in hearing from others, to determine whether yours or mine is the default behaviour for DreamHost servers.



My site is on ‘look’.

- gordaen [06/Sep/2006:09:55:09 -0700] "GET /testmr2/mt baker road.mpg HTTP/1.1" 200 833840 "-" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv: Gecko/20060508 Firefox/"
- gordaen [06/Sep/2006:09:55:25 -0700] "GET /testmr2/mt baker road.mpg HTTP/1.1" 200 71864 "-" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/418.8 (KHTML, like Gecko) Safari/419.3"
- gordaen [06/Sep/2006:09:59:50 -0700] "GET /testmr2/mt baker road.mpg HTTP/1.1" 200 1551492 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)"
- gordaen [06/Sep/2006:09:59:11 -0700] "GET /testmr2/mt baker road.mpg HTTP/1.1" 200 666488 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20060728 Firefox/"

On “rollo”

The file is actually 42.6 MB; I stopped it after a few seconds each time, in four different browsers on two different operating systems.

Check out Gordaen’s Knowledge, the blog, and the MR2 page.

Should I file a low priority support ticket to inquire as to why my logs report this way?

That might be a good idea; it’s possible your particular server is misconfigured.



Sent! I’ll make sure to keep you guys updated when I get a response.

Hello Jonathan,

[quote]I’ve discovered that the Apache access logs on my server will report
the full file size of a file that is only partially transferred to a
user (due to user disconnect or cancellation, for example). This
causes Analog to report inflated statistics regarding bandwidth use.[/quote]


My apologies for the delay in getting to your ticket. We have been a bit
swamped and are working hard to catch up.

I have also had experience with Apache and how weirdly it logs large files
that are only partially downloaded. I’m afraid it isn’t just a config on
your server, because all servers are configured exactly the same with us.
Apache does at least put in a special status code for such content, 206
Partial Content. It is up to the stats program to work with the status
code and filter out the junk. Analog, I’m afraid, doesn’t do this, and
neither does Webalizer. The only program I’ve seen do this so far is
AWStats. For example, on a site that I host, which has lots of Linux
source packages and live CD ISOs:

Webalizer shows 2TB of traffic this month, which is just insane and not
true. It doesn’t filter out the 206 status code logs:

AWStats, on the other hand, is smart enough to know this, and will sort the
206 logs into a different category:
There it shows that there is really only 90GB of downloads, while the 206
status codes make up over 1800MB of junk.

That is on a normally configured Apache 2 system as well. So there really
isn’t anything we can do for this, I’m afraid. I’ve searched on my own
before for a way to fix this but haven’t had any luck. If you need to
have dead-on perfect total download calculations, you might need to
switch to a different stats program.

As for the actual traffic used, according to the admins this is
calculated over the network and so this shouldn’t have any negative
effect on your traffic numbers with us at DreamHost. As for others not
having this problem, they either don’t have that many large files, they
aren’t often downloaded or they are more often downloaded to completion.

Let me know if you have further problems or questions.
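
For anyone who wants to see how their own traffic splits up before switching stats programs, here is a rough Python sketch that totals the logged byte counts per status code, so 206 (Partial Content) entries can be looked at apart from 200 entries. It assumes Common/Combined Log Format lines; the function name `bytes_by_status`, the sample addresses, and the `ExampleAgent` user agent are made up for illustration and are not taken from anyone's real logs. It only sums what the log says, so it cannot tell whether a logged size reflects what was really sent.

```python
import re
from collections import defaultdict

# Matches the quoted request line followed by the status code and the
# byte-count field, as they appear in Common/Combined Log Format.
LINE_RE = re.compile(r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)')


def bytes_by_status(lines):
    """Return {status_code: total logged bytes} for the given log lines."""
    totals = defaultdict(int)
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        # Apache writes "-" instead of 0 when no body bytes were logged.
        size = 0 if match.group("bytes") == "-" else int(match.group("bytes"))
        totals[int(match.group("status"))] += size
    return dict(totals)


# Hypothetical sample lines (documentation-range IPs, invented user agent).
SAMPLE = [
    '192.0.2.1 - - [05/Sep/2006:06:19:41 -0700] "GET /test/ HTTP/1.1" 200 336 "-" "ExampleAgent/1.0"',
    '192.0.2.1 - - [05/Sep/2006:06:19:53 -0700] "GET /test/test.iso HTTP/1.1" 200 678873088 "-" "ExampleAgent/1.0"',
    '192.0.2.2 - - [05/Sep/2006:06:25:00 -0700] "GET /test/test.iso HTTP/1.1" 206 1048576 "-" "ExampleAgent/1.0"',
]

if __name__ == "__main__":
    for status, total in sorted(bytes_by_status(SAMPLE).items()):
        print(status, total)
```

To run it against a real log, pass an open file object instead of `SAMPLE`, for example `bytes_by_status(open("access.log"))`, where the file name is whatever your own log is called.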


[14:54] * Now talking in #apache
[14:56] If a user begins downloading a file and cancels before it is completed without attempting to resume the download, should access.log report the full size of the file that was partially transferred or only the bytes transferred?
[14:57] mod_log_io
[14:57] you need to use that
[14:57] fajita: mod_logio
[14:57] chipig: mod_logio is
[14:58] mod-logio logs the real number of bytes transfered
[14:58] otherwise, it is only the full file size

Oh the mystery!

“Note that in httpd 2.0, unlike 1.3, the %b and %B format strings do not represent the number of bytes sent to the client, but simply the size in bytes of the HTTP response (which will differ, for instance, if the connection is aborted, or if SSL is used). The %O format provided by mod_logio will log the actual number of bytes sent over the network.”

Mystery has been solved! Thanks to Sabrejack and JustinRK from the DreamHost chatroom on freenode. The logging behavior of Apache 1 and Apache 2 differs, and without the use of mod_logio it will continue to differ in this way.
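
If you run your own Apache 2 with mod_logio loaded, one way to spot cancelled downloads is to log both `%b` and `%O` and compare them. The sketch below assumes a custom log format whose lines end with `%>s %b %O`; that format is my own assumption for illustration, and nothing in this thread says DreamHost logs this way. Since `%O` counts headers as well as the body, a finished transfer should show `%O` slightly larger than `%b`, while an aborted one shows `%O` smaller. The function name `find_truncated` and the sample lines are invented.

```python
import re

# Assumed line ending: "<request>" <status> <%b> <%O>
# %b is the response body size; %O (from mod_logio) is the number of bytes
# actually sent on the wire, headers included.
LINE_RE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body>\d+|-) (?P<sent>\d+|-)$'
)


def find_truncated(lines):
    """Return (request, body_size, bytes_sent) for entries that look aborted."""
    truncated = []
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match is None:
            continue  # not in the assumed format
        if match.group("body") == "-" or match.group("sent") == "-":
            continue  # nothing to compare
        body = int(match.group("body"))
        sent = int(match.group("sent"))
        # Fewer bytes on the wire than the body size means the client
        # went away before the whole response was delivered.
        if sent < body:
            truncated.append((match.group("request"), body, sent))
    return truncated


# Hypothetical sample lines in the assumed format.
SAMPLE = [
    '192.0.2.1 - - [05/Sep/2006:06:19:41 -0700] "GET /test/ HTTP/1.1" 200 336 612',
    '192.0.2.1 - - [05/Sep/2006:06:19:53 -0700] "GET /test/test.iso HTTP/1.1" 200 678873088 2097452',
]

if __name__ == "__main__":
    for request, body, sent in find_truncated(SAMPLE):
        print(f"{request}: logged {body} bytes, sent {sent} bytes")
```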