How To Block Bad Bots From Abusing Your Website
The internet and bots are like Romeo and Juliet: they cannot be torn apart. Millions of netizens surf the internet, yet they are still outnumbered by the bots that crawl all over it.
In fact, a large share of today's internet traffic is generated by bots. On top of that, a massive number of hits from bots will consume your bandwidth and inevitably slow down your website.
In this article, I will share information about bots, how to identify bad bots, and how to block them from abusing your website. You might also want to read our best tips for WordPress.
What Is A Bot
Bots, also known as web robots, spiders, or crawlers, are software programs that run automated tasks or scripts over the internet. Bots are typically used to perform simple, repetitive tasks at a much higher rate than a human possibly could. They often come in the form of malware, although there are also good bots. Here are some examples of what bots can do:
- Search bots are used by search engines such as Google to crawl websites for indexing purposes.
- A botnet can be used for distributed denial-of-service (DDoS) attacks or for spreading viruses.
- Social bots are used by social media companies to automate posts.
- Chatbots are artificial-intelligence-powered applications designed to mimic conversations with real people.
- There are also bots that crawl websites to scrape and steal content.
How To Identify Bots
Before you can start blocking bots, you need to know one of two things: the IP address these bots come from, or their User Agent String. This information can typically be found in the Raw Access Logs within your cPanel.
First, open your cPanel and navigate to the Raw Access Logs tool, usually located in the Visitor Stats section. Open the tool and download the log you want. Then uncompress it with a file archiver such as WinRAR or 7-Zip if needed, and open it in any plain text editor.
From this log, you can get a lot of information about your visitors, including bots, such as the IP address and User Agent String. An IPv4 address is a series of four groups of numbers separated by dots, while an IPv6 address is a series of eight groups of hexadecimal digits separated by colons.
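For illustration, a single request in Apache's common "combined" log format looks something like the line below; the address, timestamp, and bot shown are made-up examples:

```text
203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.php HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"
```

The first field is the visitor's IP address, and the quoted string at the very end is the User Agent.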
The User Agent String, on the other hand, is simply the name the program accessing your site goes by. You don't need to know the whole User Agent String; you only need the part that is unique to that particular bot and distinguishes it from other bots. For example, the Google search engine bot goes by "Googlebot/2.1 (+http://www.google.com/bot.html)".
3 Easy Ways To Block Bad Bots
There are three easy ways to block bad bots from abusing your website. The first uses robots.txt; the other two involve adding rules to your .htaccess file.
robots.txt is the conventional way to block bad bots from abusing your website. First, open your cPanel and navigate to the File Manager. Select your preferred domain, then locate robots.txt in the document root of your website. If there is no robots.txt, simply create a new file and name it robots.txt (the name is conventionally lowercase).
If you are using WordPress, code similar to the block below may already be written in your robots.txt. It means every bot is allowed to crawl every part of your website except the wp-admin directory, with the exception of admin-ajax.php inside wp-admin.
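The default WordPress rules generally look like this (WordPress serves an equivalent virtual robots.txt if no physical file exists):

```text
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```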
The asterisk (*) indicates that the rule applies to every bot. If you want it to apply only to a certain bot, simply replace the asterisk with that bot's User Agent token.
Say, for example, you want to block AhrefsBot from your website completely; here is what you would enter in your robots.txt. Keep in mind that some bots do not honor robots.txt.
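Using AhrefsBot's user-agent token, the rules would look like this:

```text
User-agent: AhrefsBot
Disallow: /
```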
Using the .htaccess File
The second way to block bad bots from abusing your website is the .htaccess file. I have been using this method for several years on multiple websites, and it has reduced bot traffic by up to 85%.
To access the .htaccess file, open your File Manager and select your preferred domain. Keep in mind that .htaccess is a hidden file, so you may need to check the "Show Hidden Files (dotfiles)" box when opening your File Manager.
Before you make any changes to your .htaccess file, make a copy so you can revert if anything goes wrong. Once you open the file, place the script below in it and save.
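A sketch of such a rule set, assuming Apache with mod_rewrite enabled; the exact user-agent patterns are illustrative, so adjust them to match the bots you see in your own logs:

```apache
RewriteEngine On
# Match any client whose user agent contains "bot", "crawl", or "spider" (case-insensitive)...
RewriteCond %{HTTP_USER_AGENT} (bot|crawl|spider) [NC]
# ...unless it is one of the six bots we want to allow
RewriteCond %{HTTP_USER_AGENT} !(bingbot|googlebot|msnbot|msrbot|twitterbot|yandex) [NC]
# Refuse the request with 403 Forbidden
RewriteRule .* - [F,L]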
The script above blocks all incoming bots except six: Bingbot, Googlebot, MSNBot, MSRBot, Twitterbot, and YandexBot. Feel free to allow more bots as you need.
Using IP Address
Besides the two methods above, there is one more way to block bots, which also uses .htaccess. While the second method relies on the User Agent String, this one relies on IP addresses. You can block not only a single IP address but an entire range of addresses. Take a look at the code below.
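A sketch using Apache 2.4's Require directives; the addresses here are placeholders from the RFC 5737 documentation ranges, so substitute the ones you actually want to block:

```apache
<RequireAll>
    # Allow everyone by default...
    Require all granted
    # ...then deny two individual example addresses
    Require not ip 192.0.2.10
    Require not ip 192.0.2.25
    # ...and an entire example range in CIDR notation
    Require not ip 198.51.100.0/24
</RequireAll>
```

On older Apache 2.2 servers, the equivalent would use `Order Allow,Deny` with `Deny from` lines instead.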
In the code above, we block two individual IP addresses from accessing our website. Apart from that, we also block an entire range of addresses using CIDR (Classless Inter-Domain Routing) notation; a /24 suffix, for instance, covers all 256 addresses that share the first three number groups.
Keep in mind that by blocking a range of IP addresses, you will block more than bots: anything coming from that range, including human visitors, will be blocked as well.
Generally speaking, there are many kinds of bots scattered across the internet, from good bots that help with various tasks to bad bots with malicious intent. I hope this article helps you block bad bots from abusing your website.
As a final reminder, don't forget to check your Raw Access Logs and AWStats from time to time. If you have ever dealt with bots, please share your experience in the comment section.
He is a web developer, programmer, and computer technician who is obsessed with coding and enjoys learning new things. In his spare time, he likes to play online games, play musical instruments, and watch anime and movies.