Pantheon Community

Tips for making site traffic stats more accurate

I know it’s pretty common for Pantheon’s site traffic stats to differ from other analytics tools that measure actual visitors, but our Pantheon numbers are more than four times what Google Analytics reports. The page pantheon.io/docs/traffic-limits says:

Analytics suites (e.g. Google Analytics) are measuring fundamentally different things vs Pantheon’s request log. While analytics suites focus on measuring visits, our request log more comprehensively measures traffic.
We track every single request to the platform, whereas analytics tools will typically only track complete “pageviews” where an HTML page including a tracking snippet is completely loaded by a browser and can fire off a subsequent request to the analytics platform.

So I’m wondering if anyone has suggestions for where we can look to identify, and hopefully stop, any services that may be hitting the site and inflating the apparent traffic. This isn’t just an annoyance: the overcounting could cost us hundreds of dollars a month, and we’re a nonprofit, so we’re not making money off this traffic (whether it’s real or not).

Here are the monthly stats from the past 12 months from Pantheon and Google Analytics:


Definitely interested in this thread, as this is something we’ve been dealing with, too. As I understand it, much of the traffic that doesn’t show up in our analytics is bot-related. We put Cloudflare in front of our busiest site to try to bring the numbers down, but I haven’t seen our Pantheon numbers drop.

The frustrating bit is that those numbers are what Pantheon uses to determine the right tier of hosting plan, so we end up paying for traffic we can’t really control or don’t want to support.


My previous experience with Pantheon metrics being wildly different from GA turned out to be an uptime checker that someone had enabled and pointed at a site with a very short check interval. I eventually found it in the nginx access logs, where some of its requests were logged whenever the cache expired.
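For anyone wanting to try the same approach, here is a minimal Python sketch that tallies requests per client IP and user agent from an nginx access log. It assumes the standard nginx "combined" log format; Pantheon’s nginx-access.log may use a different or extended format, so the regular expression (`COMBINED_RE`) and the function name `top_clients` are illustrative only, not anything Pantheon provides. Behind a CDN the first field may be a proxy address rather than the visitor’s, so the user-agent column is often the more telling one.

```python
import re
from collections import Counter

# ASSUMPTION: lines follow the nginx "combined" format:
#   $remote_addr - $remote_user [$time_local] "$request" $status
#   $body_bytes_sent "$http_referer" "$http_user_agent"
# Any extra trailing fields are ignored because we only match a prefix.
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_clients(lines, n=10):
    """Count requests per (IP, user agent) pair; return the n most frequent.

    `lines` can be any iterable of log lines, e.g. an open file object.
    """
    counts = Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if match is None:
            continue  # skip lines that don't fit the assumed format
        counts[(match.group("ip"), match.group("agent"))] += 1
    return counts.most_common(n)
```

Something like `with open("nginx-access.log") as f: print(top_clients(f))` would list the heaviest clients; an uptime checker tends to stand out as a single user agent with a steady, high count.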


Thank you, that’s a great suggestion! We’ve been using Pingdom for years, since before we were on Pantheon. That could well be it. I’ll report back.

I use StatusCake for all of my Pantheon customers, but at 5-minute intervals. The rogue check was running every 2 minutes on top of my valid checks. That added up quickly!
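To see how fast that adds up, here is a rough Python calculation. It assumes a 30-day month, one request per check, and that every check counts toward Pantheon’s traffic metrics; the function name `monthly_check_requests` is made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def monthly_check_requests(interval_minutes, days=30, requests_per_check=1):
    """Estimate monthly requests generated by a periodic uptime check.

    ASSUMPTIONS: each check sends `requests_per_check` requests, and
    interval_minutes divides evenly into a day (integer division floors
    otherwise).
    """
    checks_per_day = MINUTES_PER_DAY // interval_minutes
    return checks_per_day * days * requests_per_check
```

A 2-minute check works out to 720 requests a day, or 21,600 a month; a 5-minute check is 288 a day, or 8,640 a month. Together that is 30,240 requests a month from monitoring alone, before a single real visitor arrives.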

Well, it turns out Pingdom isn’t active on our site anymore, so it’s not that.

Any other suggestions? Or tips for how to get to our log files to look at the history?

You can grab your nginx-access.log, but it may not be helpful. Requests that are answered by Fastly without reaching your site’s container won’t appear there, and any container rotation the site goes through will wipe those logs.

I’m also trying to gather data on what causes some client sites to have Pantheon metrics up to three times the metrics reported by other tools. The nginx-access.log was not helpful. Any other suggestions?

General question: if Pantheon doesn’t make raw data available, how do they reckon we’re supposed to conduct traffic audits?

Here’s another general question: The page on traffic counts says:

Pantheon excludes automated traffic from legitimate crawlers and bots that would otherwise count towards your website’s total traffic. We do this by examining the user-agent of traffic, as well as the source IP address.

Source: https://pantheon.io/docs/traffic-limits#what-about-bots
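As a rough illustration of what user-agent-based filtering looks like, here is a Python sketch. The token list is hypothetical: it names a few well-known crawlers that identify themselves, and it is not Pantheon’s exclusion list, which the linked page doesn’t spell out. User-agent strings are easy to spoof, which is presumably why Pantheon also checks the source IP address.

```python
# HYPOTHETICAL list of self-identifying crawler tokens, for illustration only.
# This is NOT Pantheon's actual exclusion list.
DECLARED_CRAWLER_TOKENS = (
    "googlebot",
    "bingbot",
    "duckduckbot",
    "yandexbot",
    "baiduspider",
)


def is_declared_crawler(user_agent):
    """Return True if the user agent contains a known crawler token.

    Matching is case-insensitive. A spoofed or browser-like user agent
    will not be caught; confirming a crawler would need an IP check too.
    """
    ua = user_agent.lower()
    return any(token in ua for token in DECLARED_CRAWLER_TOKENS)


def split_traffic(user_agents):
    """Return (declared_crawler_count, other_count) for a list of user agents."""
    crawlers = sum(1 for ua in user_agents if is_declared_crawler(ua))
    return crawlers, len(user_agents) - crawlers
```

Bots that present an ordinary browser user agent would slip straight through a check like this, which may be part of why the counted numbers stay high.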

Do we know what those legitimate crawlers and bots are? Knowing that would also suggest there’s a way to identify, and block, the illegitimate ones.