Not even online yet and getting bad bots and hackers


#1

Already picking up bad bot traffic and hackers, and the site isn't even online yet.
I don't have a robots.txt yet. It will probably be handled in .htaccess.
The robots.txt will be a whitelist only, meaning it will allow only the big three search engines and deny the rest.
That way no directories are exposed in robots.txt.
I don't have WordPress.
Opinions would be appreciated.

178.137.87.242 - - [18/Feb/2016:08:48:27 -0800] "GET /robots.txt HTTP/1.1" 301 526 "-" "Mozilla/5.0 (Windows 10; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0"
178.137.87.242 - - [18/Feb/2016:08:48:27 -0800] "GET / HTTP/1.1" 301 505 "-" "Mozilla/5.0 (Windows 10; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0"
178.137.87.242 - - [18/Feb/2016:08:48:27 -0800] "GET /xmlrpc.php?rsd HTTP/1.1" 301 533 "-" "Mozilla/5.0 (Windows 10; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0"
178.137.87.242 - - [18/Feb/2016:08:48:27 -0800] "GET /blog/robots.txt HTTP/1.1" 301 535 "-" "Mozilla/5.0 (Windows 10; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0"
178.137.87.242 - - [18/Feb/2016:08:48:28 -0800] "GET /blog/ HTTP/1.1" 301 515 "-" "Mozilla/5.0 (Windows 10; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0"
178.137.87.242 - - [18/Feb/2016:08:48:28 -0800] "GET /wordpress/ HTTP/1.1" 301 525 "-" "Mozilla/5.0 (Windows 10; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0"
178.137.87.242 - - [18/Feb/2016:08:48:28 -0800] "GET /wp/ HTTP/1.1" 301 511 "-" "Mozilla/5.0 (Windows 10; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0"
94.23.3.191 - - [18/Feb/2016:09:17:19 -0800] "GET /index.php/admin/ HTTP/1.1" 301 518 "-" "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/23.0.1271.6 Safari/537.11"
187.34.13.183 - - [18/Feb/2016:11:21:29 -0800] "GET / HTTP/1.1" 301 489 "http://top1-seo-service.com/try.php?u=http://domain.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
162.243.162.87 - - [17/Feb/2016:14:24:30 -0800] "GET / HTTP/1.1" 301 478 "-" "CRAZYWEBCRAWLER 0.9.10, http://www dot crazywebcrawler dot com"
179.184.46.56 - - [15/Feb/2016:10:04:46 -0800] "GET /?author=1 HTTP/1.1" 301 504 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0"
179.184.46.56 - - [15/Feb/2016:10:04:47 -0800] "GET /administrator/index.php HTTP/1.1" 301 530 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0"
192.99.107.178 - - [15/Feb/2016:10:40:36 -0800] "GET /robots.txt HTTP/1.1" 301 449 "-" "Mozilla/5.0 (compatible; meanpathbot/1.0; +http://www.meanpath.com/meanpathbot.html)"
192.99.107.178 - - [15/Feb/2016:10:40:38 -0800] "GET / HTTP/1.1" 301 433 "-" "Mozilla/5.0 (compatible; meanpathbot/1.0; +http://www.meanpath.com/meanpathbot.html)"
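Since the site has no WordPress, any request for WordPress or Joomla-style paths in a log like this is a probe. A rough sketch of pulling those requests out of an Apache "combined" access log with Python; the `PROBE_MARKERS` list and the `find_probes` helper are hypothetical, and the regex assumes straight double quotes around the request, referer and user agent, as Apache writes them:

```python
import re
from collections import defaultdict

# Apache "combined" format: ip ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# HYPOTHETICAL list of path fragments that only make sense on a CMS
# (WordPress, Joomla, ...). On a site without one, any hit is a probe.
PROBE_MARKERS = ("xmlrpc.php", "/wp/", "/wordpress/", "/wp-login.php",
                 "/administrator/", "/admin/", "?author=")


def find_probes(lines):
    """Map each client IP to the list of probe paths it requested."""
    probes = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        path = m.group("path")
        if any(marker in path for marker in PROBE_MARKERS):
            probes[m.group("ip")].append(path)
    return dict(probes)
```

Feeding it the lines of an access log (for example, the lines of an opened log file) returns something like `{"178.137.87.242": ["/xmlrpc.php?rsd", ...]}`, which is a starting list of candidates for blocking.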


#2

[quote]Robots.txt will be a whitelist only
[/quote]

Keep in mind that it is up to search engines to honor any requests you make here. It is not a foolproof method.

If a bad bot chooses to ignore your robots.txt file, there is nothing you can do about it via robots.txt.
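Python's standard-library `urllib.robotparser` shows what a whitelist like the one described in the first post means for a crawler that does choose to check it. A minimal sketch, assuming the "big three" are Google, Bing and Yahoo and that their crawlers identify as `Googlebot`, `Bingbot` and `Slurp`; the `ROBOTS_TXT` content and the `build_parser` helper are hypothetical, not a vetted production file:

```python
from urllib.robotparser import RobotFileParser

# Whitelist-style robots.txt. ASSUMPTION: the "big three" crawlers are
# Googlebot, Bingbot and Slurp. An empty "Disallow:" allows everything;
# the final "*" group blocks every other crawler that honors the file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text the way a well-behaved crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("Googlebot", "Bingbot", "Slurp", "meanpathbot", "CRAZYWEBCRAWLER"):
        print(agent, rp.can_fetch(agent, "https://example.com/"))
```

All of this logic runs inside the crawler: a bot that never fetches robots.txt, or fetches it and ignores the answer, is not affected at all, which is why robots.txt alone cannot keep bad bots out.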

I’m not sure how you are picking up bots and hackers if you aren’t online yet.


#3

Thanks, yes, I recently learned that and a few other things about robots.txt.
Sorry, to be clear: the start page is online.
I have the site locked down, with no indexing at all, just the “start page”.
Only example.com is visible in the URL; I checked. None of the legitimate files
or directories are indexed, and nothing beyond the domain shows up in the URL.
You can see in the log someone manually probing for files
and directories, if you call that hacking.
I don't know whether the log shows the full GET request that was sent.
I figure it's a script, since they made 7 requests in two seconds, if I am reading
that log right.
Others have no idea what's running on the site and are just searching for admin pages,
if I am reading it right.
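The "7 requests in two seconds" reading can be checked programmatically with a sliding-window count per IP. A minimal sketch, assuming Apache's default `[dd/Mon/yyyy:HH:MM:SS zone]` timestamps; the `find_bursts` name and its `threshold`/`window_seconds` defaults are illustrative assumptions, not tuned values:

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Pull the client IP and the [timestamp] from the start of an access-log line.
LINE_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\]')
TIME_FMT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 18/Feb/2016:08:48:27 -0800


def find_bursts(lines, threshold=5, window_seconds=2):
    """Return {ip: peak request count} for IPs that reach `threshold`
    requests within any `window_seconds` span. Defaults are ASSUMPTIONS."""
    times = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            times[m.group("ip")].append(datetime.strptime(m.group("time"), TIME_FMT))

    window = timedelta(seconds=window_seconds)
    bursts = {}
    for ip, stamps in times.items():
        stamps.sort()
        recent = deque()
        peak = 0
        for t in stamps:
            recent.append(t)
            # Drop requests that are older than the window relative to t.
            while t - recent[0] > window:
                recent.popleft()
            peak = max(peak, len(recent))
        if peak >= threshold:
            bursts[ip] = peak
    return bursts
```

Run over the log in the first post, this would flag only `178.137.87.242` with the default settings, since its seven requests all fall within two seconds. A human clicking around rarely produces that pattern, which supports the "it's a script" reading.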


#4

[quote]Some others have no idea whats running on the site and they are searching for admin
If I am looking at it right.[/quote]

Possibly because they know you are running WordPress and they know that’s a legitimate file.

This isn’t necessarily a bad thing. If you type the name of almost any website plus “login” into Google, it will return the login page for you. This is useful (supposedly, I guess) for taking you straight to the login page if that’s where you want to go.


#5

If you are offline, it’s not possible to be hacked, even if you had been connected once before.


#6

Dreamhost has some information on blocking robots with the robots.txt file here: https://help.dreamhost.com/hc/en-us/articles/216105077-How-can-I-control-bots-spiders-and-crawlers- Again, though, it is up to the crawler to honor the settings you set.

If you find that some bots are too abusive or look malicious, you can block their IPs in your .htaccess file to stop them from crawling your site(s). Unlike robots.txt, these rules are enforced by the server, so the bot does not get a choice.

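Example .htaccess rules to block IPs:

```
# Block malicious IPs (Apache 2.2 syntax; on Apache 2.4 these directives need mod_access_compat)
Order Deny,Allow
# IPv4 blocks
Deny from 1.2.3.4
Deny from 5.6.7.8
# IPv6 blocks (a single address, then a /48 range)
Deny from 1234:5678:a:bcd::ef:1234:1234
Deny from 1234:5678:a::/48
```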

Dreamhost has some more information on this here https://help.dreamhost.com/hc/en-us/articles/216363167-How-do-I-deny-access-to-my-site-with-an-htaccess-file-
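Before adding entries like these, it can help to check which addresses from your log a deny list would actually cover. A sketch using Python's standard `ipaddress` module; the `DENY_LIST` entries are placeholders (the IPv6 range is from the `2001:db8::/32` documentation block), and `build_denylist`/`is_denied` are hypothetical helpers, not a description of how Apache matches internally:

```python
import ipaddress

# HYPOTHETICAL deny list: placeholder single IPs and CIDR ranges.
DENY_LIST = ["1.2.3.4", "5.6.7.0/24", "2001:db8:a::/48"]


def build_denylist(entries):
    """Turn strings (single IPs or CIDR ranges) into network objects."""
    # A bare IP becomes a /32 (IPv4) or /128 (IPv6) network.
    # strict=False accepts ranges written with host bits set, e.g. 5.6.7.8/24.
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_denied(ip, denylist):
    """True if `ip` falls inside any network in `denylist`."""
    addr = ipaddress.ip_address(ip)
    # Testing an IPv4 address against an IPv6 network simply returns False.
    return any(addr in net for net in denylist)


if __name__ == "__main__":
    deny = build_denylist(DENY_LIST)
    for ip in ("1.2.3.4", "5.6.7.99", "2001:db8:a:1::5", "178.137.87.242"):
        print(ip, "denied" if is_denied(ip, deny) else "allowed")
```

Because a bare address is treated as a single-host network and mixed IPv4/IPv6 comparisons just return False, one list can hold both kinds of entries.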

Hope this helps!