FYI - annoying spider


#1

A possibly unidentified Baidu Japan spider is messing up my stats.
#reqs 372 #pages 372 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
Or I have many fans in Japan who use Windows Vista, but what are the odds of that?
IP range:
37 2.04% 119.63.193.131
33 1.45% 119.63.193.130
33 1.00% 119.63.193.195
29 1.02% 119.63.193.132
28 0.55% 119.63.193.194
26 0.69% 119.63.193.196
The spider that identifies itself is more in line with other search engines, and it is also in China rather than Japan.
ref:
https://www.google.com/search?q=119.63.193&aq=f&oq=119.63.193&aqs=chrome.0.57.523j0&sourceid=chrome&ie=UTF-8


#2

Take a look at the user-agent in your http logs.


#3

I noted the user agent above, taken from the Analog stats:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
User agents are easy to spoof.


#4

Also, no normal web browser makes exactly 1 request for every 1 page served. It’s almost always more than 1, because a browser also fetches the images, stylesheets, and scripts that each page references.
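
A rough sketch of that check in Python, in case it helps. The function name and both thresholds (`min_requests`, `max_ratio`) are my own assumptions, not anything Analog provides, and the 120/20 figure is a made-up example of an ordinary browser.

```python
def looks_like_crawler(requests: int, pages: int,
                       min_requests: int = 50,
                       max_ratio: float = 1.05) -> bool:
    """Flag a client whose requests-per-page ratio is close to 1.

    Thresholds are illustrative assumptions, not tuned values.
    """
    # Too little traffic to judge, or no pages at all.
    if pages == 0 or requests < min_requests:
        return False
    return requests / pages <= max_ratio


# Figure quoted in this thread for the odd user agent (reqs, pages).
print(looks_like_crawler(372, 372))  # True
# Hypothetical browser that pulls images, CSS, and scripts per page.
print(looks_like_crawler(120, 20))   # False
```

A ratio near 1 is only a hint. A text-only browser or a client with everything cached could look the same.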


#5

I asked because I’m curious whether you see the user-agent changing on each IP.
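
One way to check this from the raw logs, as a minimal sketch. It assumes the server writes the common "combined" log format, which may not match your setup. The sample lines, timestamps, and the 192.0.2.10 address are made up for illustration, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed "combined" log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def agents_by_ip(lines):
    """Map each client IP to the set of user agents it sent."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:  # skip lines that do not fit the assumed format
            seen[match.group("ip")].add(match.group("agent"))
    return dict(seen)


def ips_with_changing_agents(lines):
    """Return the IPs that used more than one user agent, sorted."""
    return sorted(ip for ip, agents in agents_by_ip(lines).items()
                  if len(agents) > 1)


# Made-up sample lines, for illustration only.
sample = [
    '119.63.193.131 - - [01/May/2013:10:00:00 +0000] "GET /a.html HTTP/1.1" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"',
    '119.63.193.131 - - [01/May/2013:10:05:00 +0000] "GET /b.html HTTP/1.1" 200 734 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"',
    '192.0.2.10 - - [01/May/2013:10:06:00 +0000] "GET /a.html HTTP/1.1" 200 512 "-" "ExampleAgent/1.0"',
    '192.0.2.10 - - [01/May/2013:10:07:00 +0000] "GET /b.html HTTP/1.1" 200 734 "-" "ExampleAgent/2.0"',
]

print(ips_with_changing_agents(sample))
```

If the suspect IPs all come back with a single, identical agent string, that points to one crawler rather than a crowd of real visitors.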


#6

The anomalous agent in the ‘Browser Report’ has been the same for at least a couple of weeks. It’s up to 416/416 now, while the number for ‘MSIE 7’ in the ‘Browser Summary’ stands at 450/440 (reqs/pages). In the ‘Operating System Report’, ‘Unknown Windows’ is at 449/439. I believe this user agent is legitimate for MSIE 7 on Windows Vista, and Vista is not listed in the OS report. The noted 5 IP addresses only add up to 165 reqs, but there may be others using the same spider, or others in the IP range that don’t show up in the top 20.

The top 5 IPs in the ‘Host Report’ are 1) the ISP where I’m located, 2) MIT, 3) an ISP near me, 4) msnbot, and 5) the Yandex bot. Numbers 6 and 8-13 are ‘baidujp’ in the suspected IP range. Number 7 is a company called ‘bezeqint’ in Israel. The host for the suspect IPs is listed by number only at http://ip-address-lookup-v4.com/ but shows up as baidujp at http://www.ip-tracker.org/lookup/whois-lookup.php . In the ‘Organization Report’ there are 208 reqs for 202.46.x.x and 208 for 119.x.x.x, so that is an exact match to the 416 for the odd user agent. 202.46.x.x doesn’t show up in my top 20 hosts though, so they must be using more IPs.
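
For anyone who wants to do the same tally outside Analog, here is a small sketch that groups per-host request counts by network prefix. The counts are the six hosts from my first post. The function name and the choice of a /24 prefix are my own assumptions, not how Analog builds its Organization Report.

```python
import ipaddress
from collections import Counter


def requests_per_network(host_counts, prefix_len=24):
    """Sum request counts per network of the given prefix length."""
    totals = Counter()
    for host, count in host_counts.items():
        # strict=False lets us pass a host address and get its network.
        network = ipaddress.ip_network(f"{host}/{prefix_len}", strict=False)
        totals[str(network)] += count
    return dict(totals)


# Request counts for the six hosts listed in post #1.
host_counts = {
    "119.63.193.131": 37,
    "119.63.193.130": 33,
    "119.63.193.195": 33,
    "119.63.193.132": 29,
    "119.63.193.194": 28,
    "119.63.193.196": 26,
}

print(requests_per_network(host_counts))  # {'119.63.193.0/24': 186}

# The two Organization Report totals add up to the odd agent's count.
print(208 + 208 == 416)  # True
```

Running it over the full host list rather than just the top 20 would show how many requests the remaining 202.46.x.x and 119.x.x.x addresses account for.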

All the 202.46.x.x’s in my Analog Daily Report are registered to Shenzhen Sunrise Technologies by Max Ma. http://sstef3782.en.china.cn/
My primary UA, BTW, is “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31”, and the others I use would be Firefox on Mac, or Bodhi Linux / Ubuntu Studio, so I’m not getting confused by my own stats here.