Hey everyone, the Sales Engineering Team at Pantheon has been working with several of our contract customers to help them understand what’s going on with their traffic patterns and we wanted to share some of our findings with you.
First, some clarity
Before we dig into types of requests that count towards your metrics, let’s first explain how we define a request, how requests are calculated, and what types of requests we count.
- There are two types of measurements: visits and pages served.
- A visit is a unique combination of IP address and user agent within a 24-hour period. For example, if you are at home and visit your website from your laptop, and then again from your phone, those are counted as two unique visits. In the alternative scenario, 10 users in a campus computer lab all using the same browser and IP address could register as a single visit.
- A page served is a single request against your site, which could be a standard HTML response, an API endpoint, or an RSS feed.
- We don’t count the common bots (or any that identify as crawlers) that regularly hit your site, such as GoogleBot, Yahoo, Bing, and SEMRush. We also don’t count bots that identify as uptime monitors, like Pingdom or New Relic.
- We don’t count static assets that are not generated by PHP: for example, icons, documents, or images stored in a theme folder or in uploads (wp-content/uploads, sites/default/files, etc.). If the asset is dynamically generated, such as when Responsive Images in Drupal renders the various image styles (the first time), then we do count that call.
- We don’t count redirects or errors (301/302, 4xx, 5xx).
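Taken together, these rules can be sketched as a simple log filter. This is a minimal illustration only; the record shape, bot markers, and static-path prefixes below are our own assumptions, not the platform’s actual implementation:

```python
from datetime import datetime

# Hypothetical, simplified log records: (timestamp, ip, user_agent, path, status)
LOG = [
    ("2024-05-01T10:00:00", "203.0.113.5", "Mozilla/5.0 (X11; Linux)", "/", 200),
    ("2024-05-01T10:01:00", "203.0.113.5", "Mozilla/5.0 (iPhone)", "/", 200),
    ("2024-05-01T10:02:00", "203.0.113.5", "Googlebot/2.1", "/about", 200),
    ("2024-05-01T10:03:00", "203.0.113.5", "Mozilla/5.0 (X11; Linux)",
     "/wp-content/uploads/logo.png", 200),
    ("2024-05-01T10:04:00", "203.0.113.5", "Mozilla/5.0 (X11; Linux)", "/old-page", 301),
]

BOT_MARKERS = ("bot", "crawler", "pingdom", "newrelic")  # identifies as crawler/monitor
STATIC_PREFIXES = ("/wp-content/uploads/", "/sites/default/files/")

def counted(record):
    """Apply the exclusion rules: no self-identified bots, no static assets,
    no redirects or errors."""
    ts, ip, ua, path, status = record
    if any(marker in ua.lower() for marker in BOT_MARKERS):
        return False
    if path.startswith(STATIC_PREFIXES):
        return False
    if status >= 300:  # 301/302, 4xx, 5xx are not counted
        return False
    return True

pages_served = [r for r in LOG if counted(r)]

# A visit is a unique (IP, user agent) pair within a 24-hour period;
# here we approximate that window by calendar day for simplicity.
visits = {(r[1], r[2], datetime.fromisoformat(r[0]).date()) for r in pages_served}

print(len(pages_served), len(visits))  # 2 2: laptop + phone from the same IP
```

Note that the laptop and phone requests come from the same IP but different user agents, so they land as two visits, matching the home-user example above.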
Outside of these basic rules, the rest of the traffic is just part of your standard internet traffic; we’ll dive into some of the differences below.
This topic has been discussed previously but warrants a mention here.
If you’ve set up an API endpoint or custom scripts on your site that other pages are calling, that’s going to be counted in your Pages Served. A few examples include:
- RSS feeds
- JSON feeds
- API endpoints
- PHP scripts
- Modules or plugins with embedded endpoints / direct scripts
- AJAX calls
A great example of a module or plugin that counts against your pages served is the Statistics module in Drupal core. Every node visit additionally calls
/core/modules/statistics/statistics.php, which doubles the number of requests per view.
These pages will never show up in your Google Analytics reports unless you have done something special with the GA API and Virtual Pageviews.
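As a rough illustration of that doubling, here is a toy count over hypothetical log paths (the paths are made up; only the one-tracking-call-per-page pattern matters):

```python
from collections import Counter

# Each node view also triggers the Statistics module's tracking endpoint,
# so the request count is twice the page-view count.
PATHS = [
    "/node/1", "/core/modules/statistics/statistics.php",
    "/node/2", "/core/modules/statistics/statistics.php",
]

by_type = Counter(
    "tracking" if p.endswith("statistics.php") else "page" for p in PATHS
)
print(by_type["page"], by_type["tracking"])  # 2 2
```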
Bots that falsify their user agents are also going to count heavily toward your Pages Served, but not necessarily toward your visits, since they commonly come from a single IP and user agent. If you look into your logs, you may see some old versions of Chrome (as old as 31.x) requesting a single page from one or two IPs.
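One quick way to surface that pattern is to scan your logs for implausibly old browser versions. A minimal sketch, assuming a simplified log-line shape and an arbitrary version cutoff of our own choosing:

```python
import re
from collections import Counter

# Hypothetical access-log lines: 'IP "User-Agent" PATH'
LINES = [
    '198.51.100.7 "Mozilla/5.0 ... Chrome/31.0.1650.63 ..." /pricing',
    '198.51.100.7 "Mozilla/5.0 ... Chrome/31.0.1650.63 ..." /pricing',
    '198.51.100.8 "Mozilla/5.0 ... Chrome/124.0.0.0 ..." /',
]

CHROME_VERSION = re.compile(r"Chrome/(\d+)\.")
SUSPICIOUS_BEFORE = 60  # assumption: treat anything older than Chrome 60 as suspect

suspects = Counter()
for line in LINES:
    match = CHROME_VERSION.search(line)
    if match and int(match.group(1)) < SUSPICIOUS_BEFORE:
        ip = line.split()[0]  # flag the requesting IP
        suspects[ip] += 1

print(suspects.most_common())  # [('198.51.100.7', 2)]
```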
Common WordPress Patterns
WordPress sites have two common vectors that bad actors will always try out first.
The first is the XML-RPC endpoint (xmlrpc.php), which WordPress introduced a while back for offline content creation that would sync back to your site when you came online. Most of you aren’t using this, and shutting it down would be a very good thing for the safety of your site. There are plugins out there that will do that for you, but we also offer protected paths on the platform for you to lock it down.
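On the platform side, protected paths are configured in pantheon.yml. A minimal sketch; check the platform documentation for the exact syntax and behavior on your plan:

```yaml
# pantheon.yml: deny direct requests to xmlrpc.php
api_version: 1
protected_web_paths:
  - /xmlrpc.php
```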
The second is the login page. As a platform, we don’t have anything built in to stop your content owners and developers from logging in to your site, but the login page does garner a lot of attention that can be less than desirable. For our contract customers, we can help out with a customized CDN configuration that will whitelist (or blacklist) IPs or regions, or even implement a full WAF. Other techniques include changing the URL of the login page, whitelisting the pages in PHP, and limiting the number of login attempts. You also want to consider enforcing strong passwords and multi-factor authentication.
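The attempt-limiting idea can be sketched as a sliding-window counter per IP. The function name, window, and threshold here are illustrative; a real deployment would live in your application or CDN layer, not a script like this:

```python
import time
from collections import defaultdict, deque

MAX_ATTEMPTS = 5  # assumption: at most 5 attempts per IP...
WINDOW = 300      # ...within a 300-second window

_attempts = defaultdict(deque)

def allow_login_attempt(ip, now=None):
    """Return True if this IP may attempt a login, False if rate-limited."""
    now = time.monotonic() if now is None else now
    recent = _attempts[ip]
    while recent and now - recent[0] > WINDOW:  # drop attempts outside the window
        recent.popleft()
    if len(recent) >= MAX_ATTEMPTS:
        return False  # too many recent attempts from this IP
    recent.append(now)
    return True

# Six rapid attempts from one IP: the sixth is rejected.
results = [allow_login_attempt("203.0.113.9", now=t) for t in range(6)]
print(results)  # [True, True, True, True, True, False]
```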
Common Drupal Patterns
Drupal sites don’t have the same obvious pointy bits that WordPress sites do, but we commonly see huge spikes in traffic against search results. This was a common technique to cripple the database in the days before Solr and Redis, but it is rarely a concern for site stability in this day and age. It is, however, probably still malicious and should be addressed. Mitigation is a lot harder in these instances and would probably rely on blacklisting suspicious IPs.
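Finding candidates for that blacklist usually starts with counting search requests per IP. A minimal sketch, assuming made-up log tuples and a threshold we picked for illustration; the search path itself is site-specific:

```python
from collections import Counter

# Hypothetical log tuples: (ip, path). In practice you'd parse your CDN
# or web-server access logs instead.
REQUESTS = [
    ("192.0.2.10", "/search/node?keys=cheap+watches"),
    ("192.0.2.10", "/search/node?keys=cheap+bags"),
    ("192.0.2.10", "/search/node?keys=cheap+pills"),
    ("192.0.2.20", "/about"),
]

search_hits = Counter(ip for ip, path in REQUESTS if path.startswith("/search"))
THRESHOLD = 2  # assumption: flag IPs with more search hits than this

blocklist_candidates = [ip for ip, count in search_hits.items() if count > THRESHOLD]
print(blocklist_candidates)  # ['192.0.2.10']
```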
To a lesser degree, we do see some probing of the Drupal login pages, so it’s worth mentioning; mitigation techniques are the same as for WordPress.
Overall, there are many ways to identify, reduce, and deflect these requests to lessen their impact on your site, but they are still requests coming into your site - whether or not they’re being served a cached response from the Global CDN. The only way to reliably offset and reduce the number of malicious requests is through a site-specific layer of protection, such as an additional CDN layer or WAF.