Do You Know Your Web Crawler Bots?

Martin Starkie Online Security, Server Side & Web Technical

The Good, The Bad, And The Web Bots

If you think that human beings are the only ones controlling the web, think again. By some estimates, bots make up more than half of all web traffic, performing tasks and running scripts far faster than any human could. While many bots are beneficial, there are, unfortunately, plenty that do the opposite. This latter group is known, colloquially, as bad bots.

Both types of bots crawl webpages to gather content. Attracting the good bots, such as the search engine crawlers that index your pages, can help your business climb the search rankings and bring in more customer traffic, but their troublesome twins need ousting. Bad bots eat up CPU power and bandwidth, fire off floods of needless database queries and slow your website right down, sending frustrated users heading for the hills, or worse, to a rival site.

The solution? Learn to tell the good from the bad. Google and Bing, for example, run their own crawlers, and given how popular those two search engines are, you don’t want to push them away. What you really need to watch out for are the spam-shovelling bots.
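One practical way to tell a genuine search engine crawler from an impostor that simply copies its User-Agent string is the reverse-then-forward DNS check that Google and Bing both describe in their crawler documentation. The Python sketch below is a minimal illustration, not a production tool. The function name `is_verified_crawler` is hypothetical, and the domain suffix list is an assumption that you should check against each search engine's current documentation. Also note that `socket.gethostbyname_ex` only returns IPv4 addresses.

```python
import socket

# Hostname suffixes assumed to belong to genuine crawlers (Google's
# googlebot.com / google.com, Bing's search.msn.com). Check the search
# engines' own documentation for the current list.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_crawler(ip, reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Return True if `ip` reverse-resolves to a trusted crawler hostname
    that in turn forward-resolves back to the same IP (IPv4 only)."""
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False  # no PTR record: treat as unverified
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False  # claims to be a crawler but resolves elsewhere
    try:
        _name, _aliases, forward_addrs = forward_lookup(hostname)
    except OSError:
        return False
    # The forward lookup must point back at the original IP; otherwise
    # anyone who controls their own PTR record could fake the hostname.
    return ip in forward_addrs
```

DNS lookups add latency, so in practice you would cache the result per IP address rather than checking on every request.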

What are some of the most common bad bots out there?

Some of the bad bots worth knowing about include:

  • Baidu

    Baidu is essentially the Chinese equivalent of Google, so its crawler, Baiduspider, is often allowed to roam freely and without obstruction. Unless you particularly want your site found by searchers in China, this bot tends to do more harm than good.

  • sosospider

    Another bot powered by a Chinese search engine, sosospider needs kicking out in order to ensure your website operates at full capacity.

  • ahrefsBot

    Hosted in Ukraine, ahrefsBot creates a huge waste of bandwidth when it comes creeping onto your website.

  • linkdexbot

    Many website operators have reported linkdexbot as consuming far too much CPU, causing head-scratchingly low speeds for users.

  • megaindex

    The link-building site MegaIndex runs a bot that causes more trouble than it’s worth for most website owners. To get the best out of your site, you’ll need to block it.

So what do I do if there’s a bad bot sniffing about?

Now that you know your web crawler bots, you probably have a much better idea of whether your site is under threat. At CMS Live, we have many effective techniques for blocking bad bots and their source IP addresses, including hardware firewalls, iptables rules and .htaccess rules, all of which are included in our managed web hosting service, with specialised solutions available for both Magento and WordPress sites. Third-party services such as Cloudflare are also useful for filtering traffic before it reaches our data centres, helping us keep your site running at high speed.
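To make user-agent and IP blocking more concrete, here is a minimal Python sketch that expresses the same kind of rules an .htaccess or iptables setup would, written as WSGI middleware. Everything in it is illustrative. `BadBotBlocker` and `is_blocked` are hypothetical names. The user-agent tokens are assumed from the bot names above. The blocked networks are reserved documentation ranges standing in for real offenders.

```python
import ipaddress

# Assumed User-Agent substrings for the bots listed above, matched
# case-insensitively. Confirm the exact strings in your own access logs.
BLOCKED_AGENT_TOKENS = ("baiduspider", "sosospider", "ahrefsbot",
                        "linkdexbot", "megaindex")

# Hypothetical blocked source networks (documentation ranges used as
# placeholders). Replace these with addresses that are abusing your site.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24"),
                    ipaddress.ip_network("198.51.100.0/24")]


def is_blocked(user_agent, remote_addr):
    """Return True if the request matches a blocked agent or network."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in BLOCKED_AGENT_TOKENS):
        return True
    try:
        addr = ipaddress.ip_address(remote_addr or "")
    except ValueError:
        return False  # missing or malformed address: let it through
    # An IPv6 address is never "in" an IPv4 network, so mixing is safe.
    return any(addr in net for net in BLOCKED_NETWORKS)


class BadBotBlocker:
    """WSGI middleware that answers blocked requests with 403 Forbidden."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT"),
                      environ.get("REMOTE_ADDR")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)
```

Filtering at the application layer is easy to test, but every blocked request still reaches your application. Blocking at the firewall or web server, as described above, stops bad bots before they use any application CPU. Bad bots can also forge their User-Agent, so matching on it only catches bots that identify themselves honestly.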

Even if you just want to check that everything under the hood is A-OK, CMS Live can help.

The last thing you need is bad bots hampering your business, so let CMS Live take care of it for you. We have the technical know-how to keep these bad bots off your site and your website up to speed.

Get in touch today

Get in touch with our helpful team today on 01282 618210, and we’ll be delighted to take a look.

Share this Post