How AI Crawlers and Bad Bots Are Draining Your WordPress Site

The Bot Traffic Problem Nobody Talks About
Most WordPress site owners focus on human visitors. They track sessions, optimize conversion funnels, and obsess over bounce rates. Meanwhile, a significant portion of their server resources is being consumed by traffic that is not human at all.
Estimates from security researchers consistently put automated bot traffic at between 40 and 50 percent of all web traffic. For WordPress sites specifically, the number tends to be higher because WordPress is the most widely deployed CMS on the internet, which makes it the most targeted.
Not all of this traffic is harmful. Search engine crawlers from Google, Bing, and others are bots you want visiting your site. But mixed in with the legitimate crawlers is a long tail of traffic that ranges from wasteful to actively dangerous.
What Is Actually Hitting Your Site
Bot traffic broadly falls into six categories, and understanding the difference between them is the first step toward managing them effectively.
Search engine crawlers are the bots you want. Googlebot, Bingbot, and similar crawlers index your content so it appears in search results. They are generally well-behaved, respect your robots.txt file, and identify themselves accurately.
AI training scrapers are a newer and increasingly significant category. Bots operated by AI companies crawl the web to collect training data for large language models. Some, like OpenAI's GPTBot and Common Crawl's CCBot, are transparent about their identity. Others are not. The traffic volume from AI scrapers has grown substantially since 2023 and shows no sign of slowing.
Vulnerability scanners probe your site for known weaknesses. They look for outdated plugin versions, exposed admin paths, default credentials, and misconfigured files. Most of this traffic is automated reconnaissance, and while it does not directly harm your site, it is the precursor to targeted attacks.
Content scrapers copy your pages, posts, and product listings for republication elsewhere. This affects your SEO by creating duplicate content, and it can undermine your business if competitors are scraping your pricing or product data.
Spam bots target your comment forms, contact forms, and registration pages. Even with CAPTCHA in place, many spam bots are sophisticated enough to bypass basic protections.
Fake trusted bots are perhaps the most insidious category. These bots claim to be Googlebot or another legitimate crawler in their user agent string, but they are not. A real Googlebot connects from an IP whose reverse DNS record resolves to a hostname on googlebot.com or google.com, and that hostname resolves back to the same IP. A fake one fails this check.
Why WordPress Sites Are Particularly Exposed
WordPress's popularity is both its strength and its vulnerability. Because so many sites run WordPress, attackers and scrapers build tooling specifically for it. Common attack paths include the XML-RPC endpoint, the REST API, the wp-login.php file, and known plugin vulnerabilities.
The plugin ecosystem compounds the problem. A site running 20 plugins has 20 potential attack surfaces, and vulnerability scanners know exactly which versions of which plugins have known exploits. If you are running an outdated version of a popular plugin, bots will find it.
Server resource consumption is another concern that often goes unnoticed. Each bot request consumes CPU, memory, and bandwidth. On shared hosting, this can directly affect page load times for real visitors. On managed hosting, it can push you into higher usage tiers.
The Limits of robots.txt
Many site owners believe that adding rules to their robots.txt file is sufficient to stop unwanted crawlers. It is not.
robots.txt is a courtesy protocol. Legitimate, well-behaved bots like Googlebot respect it. Malicious bots, vulnerability scanners, and scrapers do not. They often specifically target paths that robots.txt marks as disallowed, because those paths are more likely to contain sensitive information.
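To make that concrete, here is what a typical WordPress robots.txt looks like (the wp-admin rules are the WordPress defaults, and the GPTBot rule follows OpenAI's published directive). Every line is advisory: GPTBot and Googlebot will honor it, while a scanner can read it and do the opposite.

```
# Honored only by bots that choose to honor it
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```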
robots.txt also does nothing to stop fake trusted bots. A bot claiming to be Googlebot can read your robots.txt and follow its rules while still being a malicious actor.
Effective bot management requires enforcement at the server level, not just a text file.
DNS Verification: The Key to Identifying Fake Bots
The most reliable way to verify whether a bot claiming to be Googlebot is actually Googlebot is through DNS reverse-lookup verification.
The process works like this: when a request arrives claiming to be Googlebot, you take the IP address of the request and perform a reverse DNS lookup. If the result resolves to a hostname ending in googlebot.com or google.com, and a forward lookup of that hostname resolves back to the original IP, the bot is legitimate. If either check fails, the bot is fake.
Google publishes this verification method in its own documentation. Bing, Apple, and other major search engines use similar approaches. A bot that fails DNS verification but claims to be a trusted crawler is, by definition, impersonating one.
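As an illustration of the technique (a minimal sketch using Python's standard socket module, not Sera Bot Blocker's internal code), the whole check fits in a short function. The accepted suffixes follow Google's published guidance:

```python
import socket

TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")

def verify_googlebot(ip: str) -> bool:
    """Return True only if the IP passes the reverse-then-forward DNS check."""
    try:
        # Step 1: reverse lookup - what hostname does this IP map to?
        hostname, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False  # no PTR record at all: fail closed

    # Step 2: the hostname must belong to Google's crawler domains.
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False

    try:
        # Step 3: forward lookup - does the hostname resolve back to the same IP?
        _, _, forward_ips = socket.gethostbyname_ex(hostname)
    except socket.gaierror:
        return False

    return ip in forward_ips
```

A genuine Googlebot passes all three steps. An impersonator running from a rented server fails either the suffix check or the forward confirmation.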
This is the kind of multi-layer detection that separates effective bot management from simple user agent filtering.
Monitoring: Knowing What Is Happening Before You Block It
Blocking without monitoring is operating blind. Before you configure any blocking rules, you need visibility into what is actually hitting your site.
A good bot monitoring setup should show you:
- Which bots are visiting, identified by user agent and category
- The geographic origin of each request
- Whether each bot passed or failed DNS verification
- Which pages are being targeted most frequently
- Trends over time so you can spot new patterns
This data serves two purposes. First, it helps you make informed decisions about what to block. Second, it gives you a record of what was stopped and why, which is useful if you ever need to investigate an incident or verify that a legitimate crawler is not being incorrectly blocked.
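To make those data points concrete, a single monitored event might look like the record below. The field names are illustrative, not the plugin's actual schema:

```python
# Hypothetical event record; the plugin's real schema may differ.
event = {
    "timestamp": "2025-01-15T09:42:07Z",
    "ip": "203.0.113.57",  # documentation-range address
    "country": "NL",
    "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "category": "fake_trusted_bot",
    "dns_verified": False,  # claimed Googlebot, failed reverse DNS
    "path": "/wp-login.php",
    "action": "blocked",
    "reason": "failed DNS verification",
}
```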
Introducing Sera Bot Blocker
Sera Bot Blocker is a WordPress plugin built around the monitoring and blocking workflow described above. It gives you complete visibility into automated traffic hitting your site and the tools to act on what you see.
The Live Traffic Monitor
The centerpiece of Sera Bot Blocker is a real-time traffic monitor with an interactive geo-map. As bot requests arrive, they appear on the map with their origin country, IP address, user agent, and detection result. Every event is logged with the reason it was blocked or allowed, so you always have a complete record.
The monitor is designed to be useful at a glance. Threat summary cards show you totals for blocked requests, fake bot detections, and country blocks in the current period. The log below provides the detail you need to investigate specific events.
The Pro Registry
Sera Bot Blocker ships with a registry of 647 known bots organized into six categories: search engine crawlers, AI scrapers, vulnerability scanners, content scrapers, spam bots, and fake trusted bots.
Detection is multi-layered. User agent matching identifies known bots by their declared identity. DNS reverse-lookup verification checks whether bots claiming to be trusted crawlers actually are. Behavioral heuristics catch bots that attempt to evade signature-based detection.
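In outline, the three layers compose roughly like this. This is a simplified sketch, not the plugin's source: the stand-in registry holds three entries instead of 647, the heuristic layer is a stub, and verify_googlebot is the function from the DNS section above.

```python
def looks_automated(ip: str) -> bool:
    """Stub for layer 3: real heuristics score request rate, path
    patterns, header anomalies, and similar behavioral signals."""
    return False

# Stand-in registry: user-agent signature -> (category, claims_trusted)
REGISTRY = {
    "Googlebot": ("search_engine", True),
    "GPTBot": ("ai_scraper", False),
    "sqlmap": ("vulnerability_scanner", False),
}

ALWAYS_BLOCK = {"vulnerability_scanner", "spam_bot"}

def classify(user_agent: str, ip: str) -> str:
    # Layer 1: match the declared identity against the registry.
    for signature, (category, claims_trusted) in REGISTRY.items():
        if signature in user_agent:
            # Layer 2: trusted identities must survive DNS verification.
            if claims_trusted and not verify_googlebot(ip):
                return "block_permanent"  # impersonation, escalate
            return "block" if category in ALWAYS_BLOCK else "allow"
    # Layer 3: unknown user agents fall through to behavioral heuristics.
    return "block" if looks_automated(ip) else "allow"
```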
The registry covers the major AI scrapers that have emerged since 2023, including bots operated by AI training organizations that have been identified through research and community reporting.
Country-Level Blocking
If you operate a site that serves a specific geographic market, country-level blocking lets you restrict traffic from regions you do not serve. This is particularly useful for sites that see high volumes of scanner traffic from specific countries.
Country blocks apply to all traffic from the selected region, not just known bots. Use this feature selectively and only for regions where you have no legitimate audience.
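For readers curious about the mechanics, country lookup is typically a local database query rather than a network call. A minimal sketch using MaxMind's geoip2 library (an assumption for illustration, not necessarily what the plugin uses internally):

```python
import geoip2.database  # pip install geoip2; needs a GeoLite2-Country database file
import geoip2.errors

BLOCKED_COUNTRIES = {"XX"}  # placeholder: fill in only after reviewing your own data

reader = geoip2.database.Reader("GeoLite2-Country.mmdb")

def is_country_blocked(ip: str) -> bool:
    try:
        iso_code = reader.country(ip).country.iso_code
    except geoip2.errors.AddressNotFoundError:
        return False  # fail open for addresses the database cannot place
    return iso_code in BLOCKED_COUNTRIES
```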
Fake Bot Auto-Escalation
When a bot fails DNS verification, Sera Bot Blocker automatically escalates it from a temporary block to a permanent one. A bot that claims to be Googlebot but fails verification is not merely suspicious; it is actively impersonating a trusted crawler. Permanent escalation is the appropriate response.
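In sketch form, the escalation logic is simple. The in-memory storage and the one-hour temporary duration below are assumptions for illustration, not the plugin's actual values:

```python
import time

# ip -> expiry timestamp; None means the block never expires
blocklist: dict[str, float | None] = {}

TEMP_BLOCK_SECONDS = 3600  # assumed temporary-block duration

def block(ip: str, permanent: bool = False) -> None:
    blocklist[ip] = None if permanent else time.time() + TEMP_BLOCK_SECONDS

def is_blocked(ip: str) -> bool:
    if ip not in blocklist:
        return False
    expiry = blocklist[ip]
    return expiry is None or time.time() < expiry

def on_failed_dns_verification(ip: str) -> None:
    # Impersonating a trusted crawler warrants a permanent block.
    block(ip, permanent=True)
```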
Weekly Email Digest
The weekly digest summarizes blocked traffic from the past seven days: total blocks, top blocked bots, country distribution, and any new fake bot detections. It keeps you informed without requiring you to check the dashboard daily.
Log Export
The full event log can be exported as CSV or JSON. This is useful for feeding bot traffic data into your own analytics, generating reports, or integrating with external security tooling.
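For example, a few lines of Python turn an exported CSV into a top-blocked-bots report. The filename and column names here are assumptions; check the header row of your actual export:

```python
import csv
from collections import Counter

with open("bot-blocker-export.csv", newline="") as f:
    events = list(csv.DictReader(f))

blocked = [e for e in events if e["action"] == "blocked"]
top_bots = Counter(e["user_agent"] for e in blocked).most_common(10)

for user_agent, count in top_bots:
    print(f"{count:6d}  {user_agent}")
```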
Custom Block Page
The 403 response served to blocked bots is fully customizable. You control the message, the layout, and the HTTP response code. For most use cases, a minimal response is preferable: the less information you give a blocked bot, the better.
Getting Started
Sera Bot Blocker is available at sera.guru/products/sera-bot-blocker. It is a standalone plugin and does not require Sera Core, though it integrates with the Sera ecosystem if you use it.
After installation, spend a few days in monitoring mode before configuring any blocks. Let the live monitor show you what is actually hitting your site. The data you collect in that period will inform much better blocking decisions than any default configuration.
Start with the most clearly harmful categories: vulnerability scanners, fake trusted bots, and spam bots. These categories have no legitimate use case on your site. AI scrapers and content scrapers require more consideration - some site owners want to allow AI crawlers that respect their terms of service, while others prefer to block all of them.
Country blocking should be the last tool you reach for, and only when the data clearly supports it.
Conclusion
Bot traffic is not going away. AI scrapers are growing in volume, vulnerability scanners are becoming more sophisticated, and fake trusted bots continue to be a persistent problem. The sites that manage this traffic effectively are the ones that have visibility into what is happening and the tools to act on it.
Sera Bot Blocker provides both. The live monitor gives you the data. The Pro registry, DNS verification, and blocking controls give you the response.
View Sera Bot Blocker
The team behind the Sera WordPress ecosystem, building AI-powered tools for performance, security, SEO, and content creation.


