9 Comments

  1. Baidu is a Chinese search engine, much the same as Google, so I would expect frequent spidering from them to be normal. I doubt that it’s related to the spam, unless spammers are using Baidu results to find blogs to spam…

    • Hi Bruce, I only did a quick Google search on the name and there’s lots of info (that I didn’t bother to read!), but even with the changes I’ve made, the bot is ignoring my robots.txt and, unless I messed up the format in my .htaccess, it’s getting through that too 😐

  2. jpope

    I have been seeing the same spider crawling my site quite a bit lately as well. And looking through my Drupal log, I can see just what pages it’s trying to hit. *None* of the pages exist. :/ I’ve also noticed that when this ‘spider’ started hitting my site, my spam comments started increasing drastically as well. Search engine or not, I’m blocking these IP ranges as well.
    It wouldn’t bother me that much (Mollom is an **excellent** spam blocker), but all of the pages that it’s trying to find (along with the login and register pages, which don’t have active links anywhere on my site) make me somewhat nervous.

  3. jpope

    Also, as I’ve been digging into this a little more: when you run a whois on an official Baidu.com IP address (220.181.111.86, 123.125.114.144 and 220.181.111.85 are a few that the ‘host’ command pulls up), you get a completely different whois report than you do for the 180.76.5.* IPs (I’m also getting hits from 180.76.6.*). But seeing that it’s not really a search engine bot, I doubt anything you put in robots.txt will help here. 😉
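
    For anyone wanting to check an address beyond whois, here is a rough sketch of a reverse-then-forward DNS check. It assumes bash and the `host` tool, and it assumes that genuine Baiduspider addresses reverse-resolve to names under baidu.com or baidu.jp; that naming rule and both function names are assumptions for illustration, not something confirmed in this thread.

    ```shell
    #!/usr/bin/env bash

    # Return 0 if the hostname sits under baidu.com or baidu.jp.
    # Assumption: genuine Baiduspider hosts use one of these two domains.
    is_baidu_hostname() {
      local name="${1%.}"   # drop the trailing dot that `host` prints
      case "$name" in
        *.baidu.com|*.baidu.jp) return 0 ;;
        *) return 1 ;;
      esac
    }

    # Reverse-resolve an IP, check the name, then confirm that the name
    # resolves back to the same IP. Needs network access and `host`.
    verify_baiduspider() {
      local ip="$1" name
      name="$(host "$ip" 2>/dev/null | awk '/domain name pointer/ {print $NF; exit}')"
      [ -n "$name" ] || return 1
      is_baidu_hostname "$name" || return 1
      host "${name%.}" 2>/dev/null \
        | awk -v ip="$ip" '/has address/ && $NF == ip {found=1} END {exit !found}'
    }
    ```

    Something like `verify_baiduspider 180.76.5.1 && echo genuine || echo "not verified"` would then give a second opinion on any address from the access log.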

    • Yes, it’s clearly a non-compliant spider that exists purely to flood sites with spam. My spam count since this started is now well above 250. That doesn’t sound like a lot, but I did a fresh install of WordPress back in September, so my spam count was reset!

      I’ve noticed that it is still scouring my blog even after I placed IP blocks on the ranges it claims to be from, so my guess is that it’s most likely hiding behind Tor.

      • jpope

        After a few days’ worth of keeping an eye on this, it seems that I’ve mostly stopped the “spider” from accessing my site. At least in the Apache access log, all the spider gets are 403s. A quick and easy command to see all of the IPs that the spider is using, along with the status code each one gets:
        sudo grep "Baiduspider" /var/log/apache2/access.log | cut -d' ' -f1,9 | sort | uniq -c | sort -nr
        Also, the amount of spam that I get seems to have dropped somewhat since blocking the IP ranges. :/
        As you said, “only time will tell”. 😉
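
        The one-liner above lists individual addresses; when deciding which ranges to block, grouping the hits by /24 can be easier to read. This is only a rough sketch, assuming IPv4 clients and a log format with the client IP in the first field; `baidu_ranges` is just an illustrative name.

        ```shell
        #!/usr/bin/env bash

        # Read access-log lines on stdin and count Baiduspider requests
        # per /24 range, busiest range first.
        # Assumes the client IPv4 address is the first field of each line.
        baidu_ranges() {
          grep -F 'Baiduspider' \
            | awk '{ split($1, o, "."); print o[1] "." o[2] "." o[3] ".*" }' \
            | sort | uniq -c | sort -nr
        }
        ```

        It could be run as `sudo cat /var/log/apache2/access.log | baidu_ranges`, which prints one line per range with its hit count in front.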

        • Yes, likewise, so far so good. Around 250 spam comments blocked is a testament to both Akismet and WP-reCAPTCHA, which stopped them before they had a chance to flood this little blog of nothing! 😉

  4. kar

    Baidu seems to be an email harvester that also tries to steal other important data. On my site I always see it requesting the RSS file, where it can pick up email addresses. Fortunately, all the addresses in there are just dummies. I think it uses a huge IP range, like 220.* and 180.*. When I try to block these ranges, it goes wild and attacks the site from different IP ranges. I don’t understand how it bypasses the .htaccess file.

    • Well, just when I thought all was going well, I’m now being targeted by a new one based in North Carolina 🙁 Blocking this IP range would take out valid visitors to my site!
