Baiduspider China

00:00:00 baiduspider ~ China 01:18:27pm 01:18:27pm No
Last URL: /2010/page/13/
User Agent: Mozilla/5.0 (compatible; Baiduspider/2.0; +
Host: (n/a) n/a

The entry above is what has been scraping my site continuously for the last 3 weeks! So I’ve resorted to banning the IP range and have added several entries to my robots.txt file (although I don’t believe this bot is complying with it).
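For anyone wanting to sanity-check their own robots.txt entries, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content below is hypothetical (my actual entries are not reproduced here), and `is_allowed` is just an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The parser matches on the bot's name token, so pass "Baiduspider"
    # rather than the full User-Agent header.
    print(is_allowed(ROBOTS_TXT, "Baiduspider", "/2010/page/13/"))   # False
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "/2010/page/13/"))  # True
```

Bear in mind that robots.txt is purely advisory: this only shows what a well-behaved crawler should do with the rules, not what a misbehaving one actually does.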

I’m not overly concerned about its presence, but the spam comment hit rate has skyrocketed since it started interrogating my site. China, with its known history of human rights abuses, appears to be scraping the internet, but what is being done with all this information? (Not that I’d say there’s anything of any importance on this site!)

I admit that blocking such traffic goes against the grain of how I feel about internet freedom, but when a spider is crawling your site without disclosing a reason (and, as I said, my spam hit rate has increased), I’m not going to allow such traffic, as it has the potential to affect other services.
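As a rough sketch of what an IP range ban boils down to, the snippet below uses Python's standard `ipaddress` module to turn a start-end range (the form whois reports in its inetnum field) into CIDR blocks and to test whether a visitor falls inside them. The addresses are placeholders from the reserved documentation ranges, not the range I actually banned, and the helper names are made up for illustration.

```python
import ipaddress

# Placeholder range taken from the documentation block 192.0.2.0/24;
# substitute the real start and end addresses from the whois inetnum field.
RANGE_START = "192.0.2.0"
RANGE_END = "192.0.2.255"


def networks_from_range(first: str, last: str):
    """Convert a start-end address range into a list of CIDR networks."""
    return list(
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)
        )
    )


def is_blocked(address: str, networks) -> bool:
    """Return True if address falls inside any of the given networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


if __name__ == "__main__":
    blocked = networks_from_range(RANGE_START, RANGE_END)
    print(blocked)
    print(is_blocked("192.0.2.57", blocked))
    print(is_blocked("198.51.100.7", blocked))
```

The CIDR form is handy because most firewalls and web server deny rules expect it rather than a start-end pair.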

[scott@archbang-netbook ~]$ whois
% [ node-5]
% Whois data copyright terms

inetnum: -
netname: Baidu
descr: Beijing Baidu Netcom Science and Technology Co., Ltd.
descr: Baidu Plaza, No.10, Shangdi 10th street,Haidian District Beijing,100080
country: CN
admin-c: WN141-AP
tech-c: JC2179-AP
mnt-lower: MAINT-CNNIC-AP
mnt-routes: MAINT-CNNIC-AP
changed: 20090715
source: APNIC

person: Nan Wang
address: Baidu Plaza, No.10, Shangdi 10th street,Haidian District Beijing,100080
country: CN
phone: +8610-59927164
fax-no: +8610-62684273
nic-hdl: WN141-AP
changed: 20100322
source: APNIC

person: Jacky Chang
nic-hdl: JC2179-AP
address: 10th Floor No.6 2nd North Street Haidian District Beijing,100080
country: CN
phone: +8610-82602288-7280
fax-no: +8610-62684273
changed: 20071227
source: APNIC

Now I’m no expert when it comes to APNIC requirements, but just take a look at some of the fields… I have my doubts that this is even a legit company. 😐

[scott@archbang-netbook ~]$ dig

; <<>> DiG 9.8.1 <<>>
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 49764
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
; IN A

;; AUTHORITY SECTION:
. 10756 IN SOA 2011111001 1800 900 604800 86400

;; Query time: 63 msec
;; SERVER:
;; WHEN: Fri Nov 11 13:48:49 2011
;; MSG SIZE rcvd: 105
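A more systematic way to judge whether a visitor calling itself Baiduspider is genuine is a forward-confirmed reverse DNS check: look up the hostname for the IP, check that it belongs to the expected domain, then resolve that hostname and confirm it maps back to the same IP. The sketch below is only an illustration of that idea. The trusted suffixes are my assumption about which hostnames Baidu uses for its crawlers and should be checked against Baidu's own documentation, and the function names are invented for this example.

```python
import socket

# Assumption: hostname suffixes that a genuine Baiduspider resolves to.
# Verify these against Baidu's published guidance before relying on them.
TRUSTED_SUFFIXES = (".baidu.com", ".baidu.jp")


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(host):
    """Return the list of IPv4 addresses that host resolves to."""
    try:
        return socket.gethostbyname_ex(host)[2]
    except OSError:
        return []


def is_verified_crawler(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    host = reverse(ip)
    if host is None:
        return False  # no PTR record at all
    if not host.lower().rstrip(".").endswith(TRUSTED_SUFFIXES):
        return False  # hostname is outside the trusted domains
    # The hostname must resolve back to the original address.
    return ip in forward(host)
```

The lookups are passed in as parameters so the logic can be tried out with stand-in functions, without touching the network. An IP with no reverse record, or one whose hostname does not map back to it, fails the check.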

So only time will tell if I got it right...