Search engines use technology known as spiders to search the web (nice, huh?). A spider is an agent (also called a bot, short for robot) that connects to your website and downloads a copy of all of your pages (or tries to) in order to populate the search engine it works for. However, spiders are supposed to obey certain rules, and they are definitely not supposed to thrash your website to the point that it causes a denial of service or uses up all of your bandwidth.
By adding special instructions to a file called .htaccess (the full stop in front of it is intentional) you can instruct your web server to deny requests from specific spiders.
Solution 1 - ban by IP address
If a file does not already exist at public_html/.htaccess you can create an empty one.
Add this to the top of the file, replacing x.x.x.x with the IP address of the bad spider bot.
# Allow,Deny order: a matching Deny overrides "allow from all"
order allow,deny
allow from all
deny from x.x.x.x
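Apache 2.2 decides access by combining Allow and Deny matches according to the Order directive. The default Order is Deny,Allow, and under that order a matching Allow overrides a matching Deny. The following Python sketch is a simplified model of those documented rules, not Apache's own code. It only understands `all`, single IP addresses and CIDR ranges. The address 203.0.113.5 is a hypothetical example taken from a reserved documentation range.

```python
from __future__ import annotations

import ipaddress


def matches(client_ip: str, spec: str) -> bool:
    """True if client_ip matches an Allow/Deny 'from' value: 'all', an IP or a CIDR range."""
    if spec == "all":
        return True
    # A bare IP becomes a /32 network; strict=False tolerates host bits in a CIDR value
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(spec, strict=False)


def is_allowed(client_ip: str, order: str, allow_from: list[str], deny_from: list[str]) -> bool:
    """Simplified model of Apache 2.2's Order/Allow/Deny evaluation."""
    allowed = any(matches(client_ip, s) for s in allow_from)
    denied = any(matches(client_ip, s) for s in deny_from)
    order = order.lower()
    if order == "allow,deny":
        # Must match an Allow and must not match any Deny; everything else is denied
        return allowed and not denied
    if order == "deny,allow":
        # Allowed by default; a matching Allow overrides a matching Deny
        return allowed or not denied
    raise ValueError(f"unknown Order: {order}")


bad_bot = "203.0.113.5"  # hypothetical spider address
print(is_allowed(bad_bot, "deny,allow", ["all"], [bad_bot]))  # True: not blocked
print(is_allowed(bad_bot, "allow,deny", ["all"], [bad_bot]))  # False: blocked
```

With the default Deny,Allow order, `allow from all` overrides the `deny` line, so the spider is not blocked. An explicit `order allow,deny` line gives the intended result.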
Very often bots use a range of IP addresses. For example, Baiduspider is a Chinese spider that causes many of our customers to experience problems. It appears to use a range of addresses from 18.104.22.168 to 22.214.171.124 to spider sites in the UK. To block this range completely, you can add:
order allow,deny
allow from all
deny from 126.96.36.199/24
deny from 188.8.131.52/24
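The /24 suffix is CIDR notation. The first 24 bits of the address are fixed and the last octet can be anything, so each such line covers 256 addresses. This Python sketch uses the standard `ipaddress` module to show which addresses a rule covers. Passing `strict=False` tells Python to mask off the host bits and report the enclosing /24 block. The helper names are illustrative only.

```python
from __future__ import annotations

import ipaddress


def covered_range(cidr: str) -> tuple[str, str, int]:
    """Return (first address, last address, address count) for a CIDR value."""
    # strict=False masks off host bits: "126.96.36.199/24" -> 126.96.36.0/24
    net = ipaddress.ip_network(cidr, strict=False)
    return str(net[0]), str(net[-1]), net.num_addresses


def is_blocked(ip: str, rules: list[str]) -> bool:
    """True if ip falls inside any of the CIDR rules."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(r, strict=False) for r in rules)


rules = ["126.96.36.199/24", "188.8.131.52/24"]
for rule in rules:
    print(rule, covered_range(rule))
# 126.96.36.199/24 ('126.96.36.0', '126.96.36.255', 256)
# 188.8.131.52/24 ('188.8.131.0', '188.8.131.255', 256)
```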
If you only want to apply these rules to a particular directory within your website, create a .htaccess file in that directory instead (for example public_html/documents/notforbots/.htaccess) and add:
order allow,deny
allow from all
deny from 184.108.40.206
This would block that address from accessing http://yourwebsite.com/documents/notforbots and anything beneath it.
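Apache applies a .htaccess file to the directory it sits in and to every directory beneath it. The sketch below models that scoping on URL paths. It assumes, as a simplification, that URL paths map directly onto folders under public_html with no rewrites or aliases involved. `htaccess_applies` is a hypothetical helper for illustration, not part of Apache.

```python
from pathlib import PurePosixPath


def htaccess_applies(url_path: str, htaccess_dir: str) -> bool:
    """True if a request for url_path falls under the directory holding the .htaccess file."""
    # Compare whole path segments so /documents/notforbots2 is not caught by /documents/notforbots
    request_parts = PurePosixPath(url_path).parts
    dir_parts = PurePosixPath(htaccess_dir).parts
    return request_parts[: len(dir_parts)] == dir_parts


rules_dir = "/documents/notforbots"
for path in ["/documents/notforbots", "/documents/notforbots/report.pdf",
             "/documents/notforbots2/", "/index.html"]:
    print(path, htaccess_applies(path, rules_dir))
```

The first two paths are covered by the rules and the last two are not. Query strings are not handled, so pass only the path part of the URL.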
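You can read more about the Apache 2.2 access control directives (Order, Allow and Deny, provided by mod_authz_host, which replaced mod_access) in the Apache documentation.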
Solution 2 - ban by User Agent
If you know how the spider identifies itself, you can block its requests on the basis of the User-Agent HTTP request header.
So, how do you find out which User-Agent is hitting your site so hard? Look in your raw Apache access logs, in the Logs section of cPanel.
Click on the domain in question to download the logs that have been collected so far today. Once you have downloaded and uncompressed the .gz file, you will have to load it into a text editor and do some detective work. Some people use Excel, OpenOffice or other spreadsheet software to parse the fields in the file. However, this is an advanced article, so we're going to assume you know how to do that!
18.104.22.168 - - [22/Jul/2013:20:07:48 +0100] "GET /special-events/action:month/cat_ids:9/tag_ids:37,26/ HTTP/1.0" 500 7309 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
Each line will look something like the one above. The last quote-delimited string is the User-Agent header:
"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
The bit we are interested in is Baiduspider/2.0.
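If you would rather script the detective work than use a spreadsheet, the following Python sketch may help. It parses lines in the combined log format shown above and counts requests per User-Agent, so the busiest bots rise to the top. The regular expression is a simplified assumption that covers only the combined format and does not handle escaped quotes inside fields. The sample line is the one quoted above.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable, Optional

# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def user_agent(line: str) -> Optional[str]:
    """Return the User-Agent (last quoted field) of a combined-format line, or None."""
    m = COMBINED.match(line)
    return m.group("agent") if m else None


def top_agents(lines: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    """Count requests per User-Agent and return the n most frequent."""
    counts = Counter(ua for ua in map(user_agent, lines) if ua is not None)
    return counts.most_common(n)


def top_agents_in_file(path: str, n: int = 10) -> list[tuple[str, int]]:
    """Same as top_agents, reading an uncompressed raw access log from disk."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return top_agents(log, n)


sample = ('18.104.22.168 - - [22/Jul/2013:20:07:48 +0100] '
          '"GET /special-events/action:month/cat_ids:9/tag_ids:37,26/ HTTP/1.0" 500 7309 "-" '
          '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"')
print(user_agent(sample))
```

For a real log you would call something like `top_agents_in_file("access.log")`. Here `access.log` is a placeholder for whatever your uncompressed download is called.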
We're not really interested in which version of Baiduspider is hitting us, so we're going to block everything that matches Baiduspider in the User-Agent header. To do this, add the following to the top of the .htaccess file:
BrowserMatchNoCase baiduspider banned
Deny from env=banned
This would block all requests from the Baiduspider bot, as long as it sends its tell-tale User-Agent header.
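BrowserMatchNoCase tests its pattern as a case-insensitive regular expression against the User-Agent header. On a match it sets the named environment variable (here `banned`), which `Deny from env=banned` then acts on. This Python sketch mimics that matching step, so you can check in advance which User-Agent strings a pattern would catch. It is an approximation: Python's `re` syntax is close to, but not identical with, the PCRE expressions Apache uses. The second and third User-Agent strings below are made up for illustration.

```python
import re


def would_be_banned(user_agent: str, pattern: str = "baiduspider") -> bool:
    """Approximate BrowserMatchNoCase: unanchored, case-insensitive regex search."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


agents = [
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",  # made-up ordinary browser
    "BAIDUSPIDER-image",  # made-up variant: different case, still matches
]
for ua in agents:
    print(would_be_banned(ua), ua)
```

Because the match is unanchored and ignores case, the pattern catches any version or variant containing "baiduspider". That suits the aim of ignoring the version number.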
You can read more about the Apache mod_setenvif module directives in the Apache documentation.