00:00:00 baiduspider 220.127.116.11 ~ China 01:18:27pm 01:18:27pm No
Last URL: /2010/page/13/
User Agent: Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
Host: (n/a) n/a
So the above is what has been scraping my site continuously for the last 3 weeks! I’ve resorted to banning the IP range and have made several entries in my robots.txt file (although I don’t believe this bot pays any attention to them).
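For anyone wanting to sanity-check their own robots.txt entries, here is a minimal sketch using Python's standard `urllib.robotparser`. The two rules shown are an assumption for illustration only (the actual entries on this site are not reproduced here), and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, assumed for illustration;
# not the actual entries used on this site.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /
"""


def is_allowed(crawler_name: str, path: str) -> bool:
    """Return True if the rules above permit crawler_name to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(crawler_name, path)


if __name__ == "__main__":
    # The crawler's short name is passed here, not the full User-Agent header.
    print(is_allowed("Baiduspider", "/2010/page/13/"))
    print(is_allowed("SomeOtherBot", "/2010/page/13/"))
```

This only tells you what a well-behaved crawler should do with the file; it does nothing to stop a bot that ignores robots.txt.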
I’m not overly concerned about its presence, but the spam comment hit rate has skyrocketed since it started interrogating my site. China, with its known history of human rights abuse, appears to be scraping the internet, but what is being done with all this information? (Not that I’d say there’s anything of any importance on this site!)
I admit that blocking such traffic goes against the grain of how I feel about internet freedom, but when a spider is crawling your site without disclosing a reason (and, as I said, my spam hits have increased), I’m not going to allow such traffic, as it has the potential to affect other services.
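As a rough sketch of what banning an IP range involves, the snippet below turns a first/last address pair (the form whois reports in its `inetnum` field) into CIDR blocks and tests whether a visitor falls inside them, using Python's standard `ipaddress` module. The addresses are placeholders from the reserved documentation range, not Baidu's real allocation, and `range_to_networks` and `is_blocked` are hypothetical helper names.

```python
import ipaddress


def range_to_networks(first: str, last: str) -> list:
    """Convert a first/last address pair into the CIDR blocks that cover it."""
    return list(
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)
        )
    )


# Placeholder range (reserved for documentation), NOT the crawler's real range.
BLOCKED_NETWORKS = range_to_networks("192.0.2.0", "192.0.2.255")


def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(BLOCKED_NETWORKS)
    print(is_blocked("192.0.2.77"))
    print(is_blocked("198.51.100.5"))
```

The resulting CIDR blocks are the form most firewalls and web server deny rules expect, so the conversion step is usually the useful part.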
[scott@archbang-netbook ~]$ whois 18.104.22.168
% [whois.apnic.net node-5]
% Whois data copyright terms
inetnum: 22.214.171.124 - 126.96.36.199
descr: Beijing Baidu Netcom Science and Technology Co., Ltd.
descr: Baidu Plaza, No.10, Shangdi 10th street,Haidian District Beijing,100080
status: ALLOCATED PORTABLE
changed: email@example.com 20090715
person: Nan Wang
address: Baidu Plaza, No.10, Shangdi 10th street,Haidian District Beijing,100080
changed: firstname.lastname@example.org 20100322
person: Jacky Chang
address: 10th Floor No.6 2nd North Street Haidian District Beijing,100080
changed: email@example.com 20071227
Now, I’m no expert when it comes to APNIC requirements, but just take a look at some of the fields… I have my doubts that this is even a legit company. 😐
[scott@archbang-netbook ~]$ dig 188.8.131.52
; <<>> DiG 9.8.1 <<>> 184.108.40.206
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 49764
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;220.127.116.11. IN A
;; AUTHORITY SECTION:
. 10756 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2011111001 1800 900 604800 86400
;; Query time: 63 msec
;; SERVER: 192.168.1.254#53(192.168.1.254)
;; WHEN: Fri Nov 11 13:48:49 2011
;; MSG SIZE rcvd: 105
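One caveat on that output: the question section shows an A query with the address treated as a hostname, which would explain the NXDOMAIN from the root servers. A reverse lookup asks for a PTR record under in-addr.arpa instead (with dig, that is the `-x` option). As a small sketch, Python's standard `ipaddress` module can show the name such a lookup would query. The address below is a placeholder from the documentation range, and `reverse_lookup_name` is a hypothetical helper name.

```python
import ipaddress


def reverse_lookup_name(addr: str) -> str:
    """Return the in-addr.arpa name that a reverse (PTR) lookup queries."""
    return ipaddress.ip_address(addr).reverse_pointer


if __name__ == "__main__":
    # Placeholder address (reserved for documentation), not the crawler's IP.
    print(reverse_lookup_name("192.0.2.77"))  # 77.2.0.192.in-addr.arpa
```

Whether a PTR record actually exists for a given address is up to whoever controls that range, so an empty answer there would not prove much on its own.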
So only time will tell if I got it right...
© 2011, Scott Evans.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.