Pantheon Community

Tips for making site traffic stats more accurate

@anne: My apologies, this is the script I was referring to: It’s not the same as the one you found, but it does retrieve the existing logs from all application containers.

Re: your site only showing 5 pages served in 60 days…do you have folks logging in regularly to add/edit content? Those authenticated users would likely see uncached content (unless perhaps you have Redis set up, though even then I’d be surprised to see a number that low). If you’re serving entirely anonymous traffic, then yes, we do see the CDN serving huge numbers of cached pages that never hit Pantheon’s appservers. It’s part of what keeps a site up during heavy traffic spikes, even on our smallest plans.

@sparklingrobots, I’ve followed the instructions at the #automate-downloading-logs link, which describe essentially the same kind of script as the GitHub project I linked to earlier. Same results. (Although today, we’re up to 7 lines.)

So far today, I’ve spent more than an hour trying to get GoAccess running. < sigh > It still doesn’t work. I’m putting effort into that because most lines in the log don’t make sense to me. Here’s an example: - - [23/Nov/2019:15:31:18 +0000] "\x04\x01\x00\x19\xBC}I\x1D\x00" 400 166 "-" "-" 0.102 "-"
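For what it’s worth, a line like the one above is not a page request at all. nginx writes unprintable bytes in the request field as \xHH escapes, and these nine bytes match the layout of a SOCKS4 CONNECT request (version \x04, command \x01, destination port \x00\x19 = 25, i.e. SMTP), which is what scanners send when probing for an open proxy to relay spam through. The server answered it with a 400. Below is a minimal Python sketch for separating such probes from real HTTP requests. The field layout is an assumption inferred from the sample line quoted here, not Pantheon’s documented log format, and the function names are hypothetical.

```python
import re
from typing import Optional

# Assumed layout, inferred from the sample line in this thread (not an official spec):
#   ... [time] "request" status bytes "referer" "user_agent" request_time "x_forwarded_for"
LOG_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# A well-formed HTTP request line looks like "GET /path HTTP/1.1".
HTTP_REQUEST_RE = re.compile(r'^[A-Z]+ \S+ HTTP/\d(?:\.\d)?$')


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one access-log line, or None if it doesn't match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


def classify(fields: dict) -> str:
    """Label a parsed line as an HTTP request or a non-HTTP probe."""
    request = fields["request"]
    if HTTP_REQUEST_RE.match(request):
        return "http"
    # nginx logs raw bytes as literal \xHH text; \x04\x01 opens a SOCKS4 CONNECT.
    if request.startswith(r"\x04\x01"):
        return "socks4-probe"
    return "non-http"


sample = r'- - [23/Nov/2019:15:31:18 +0000] "\x04\x01\x00\x19\xBC}I\x1D\x00" 400 166 "-" "-" 0.102 "-"'
fields = parse_line(sample)
print(fields["status"], classify(fields))  # 400 socks4-probe
```

Lines tagged "socks4-probe" or "non-http" can usually be set aside when trying to understand real visitor traffic.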

(Aside: I’ve lost track of how much time I’ve spent following the instructions provided for downloading all the logs and then trying to understand the data in them, but I’d estimate 8-12 hours for three sites. That’s non-billable time. Frustrating doesn’t even begin to describe it.)

You ask: Re: your site only showing 5 pages served in 60 days…do you have folks logging in regularly to add/edit content?

The answer is no. I’m the only user account, so the site is serving virtually entirely anonymous traffic. If that’s so, and the CDN cache is serving virtually all pages, then how am I supposed to identify and block the bots (or whatever they might be) that I’d like to keep out?
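One partial answer, with the big caveat raised above: the downloadable nginx-access.log files only contain requests that reached the appservers, so CDN cache hits are invisible to any script run against them. Within that limit, tallying the most common user agents and request paths is a quick way to spot a crawler or scanner. The Python sketch below assumes the log layout seen in the sample line in this thread and a hypothetical local directory of downloaded logs; neither is taken from Pantheon’s documentation.

```python
import gzip
import re
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# Assumed line layout, based on the sample line quoted in this thread:
#   ... [time] "request" status bytes "referer" "user_agent" ...
LINE_RE = re.compile(
    r'\[[^\]]+\] "(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def read_logs(directory: str) -> Iterator[str]:
    """Yield every line of every nginx-access.log* file under `directory`.

    Rotated files ending in .gz are decompressed on the fly. The directory
    layout is an assumption: point it at wherever your downloaded logs live.
    """
    for path in sorted(Path(directory).rglob("nginx-access.log*")):
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
            yield from fh


def top_counts(lines: Iterable[str], n: int = 10) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Return the n most common user agents and the n most common request paths."""
    agents: Counter = Counter()
    paths: Counter = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that don't fit the assumed layout
        agents[match["agent"]] += 1
        parts = match["request"].split()
        # Only well-formed "METHOD /path PROTOCOL" request lines carry a path.
        if len(parts) == 3:
            paths[parts[1]] += 1
    return agents.most_common(n), paths.most_common(n)


if __name__ == "__main__":
    import sys

    top_agents, top_paths = top_counts(read_logs(sys.argv[1]))
    for agent, count in top_agents:
        print(f"{count:8d}  {agent}")
    for path, count in top_paths:
        print(f"{count:8d}  {path}")
```

Run it as `python tally.py ./logs` (both names hypothetical). A single user agent or path dominating the counts is usually the first lead worth chasing.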

Thanks so much for hearing us out. We do appreciate it!


Thank you so much for trying to find a solution to this, @sparklingrobots!

I feel like there’s a fundamental problem with Pantheon charging based on traffic but not giving us any way to see the traffic you measured. It can’t just be a black box where we’re supposed to trust you to decide how much to charge us, right? That just doesn’t seem like a good way to do business.

Without access to this we have no way of knowing if someone is hitting our site with malicious scans or something else that might not only be inflating our metrics but also possibly endangering the rest of your customers as well as us.


Your point about “no way of knowing” is excellent. One concern is that perhaps there’s something in the code that’s being exploited that’s hard to discover without access to complete logs.


In my previous go-around with this, one suggestion was that I temporarily turn off caching for the entire site so I could see more in my nginx-access.log. Take that with a grain of salt.


Just wanted to provide a short update: I’ve included all of the folks commenting here on an internal feature request to allow access to the Global CDN logs.

While I can’t say when or if this will come to pass, thanks go to each of you for your honest feedback. I hear your frustration. <3


I’d love to find a solution to this. My problem isn’t accessing the logs; I can do that via FTP. I need to figure out why Pantheon’s site traffic metrics are radically higher than Google Analytics. Pantheon has already increased our monthly charge, and the new plan is not tenable for us. We will have to move to another hosting platform if we can’t figure out what is going on. Any advice would be appreciated, because I’d hate to leave Pantheon, but we cannot afford what we are currently being charged. Help!


In lieu of some way to see what Pantheon’s metrics are based on (I agree with @johndubo that log files ain’t it, even if we could get them more easily), it seems like there should at least be a way to appeal when those stats are having a financial impact on us. And in the case of an appeal, I think it should be Pantheon’s responsibility to investigate for potential errors in their data and signs of malicious activity such as crawlers that we can’t do anything about.

I just don’t see how we can be charged for bandwidth that we aren’t actually using in any way that we are aware of or in control of.


Totally agreed that we customers need more information from the traffic metrics, especially since that’s what the pricing is based on. We should be able to get a detailed report of the traffic breakdown for any time period, including all URLs and traffic counts for those URL requests.


I concur 100% with this request.


Is there any movement on this? I installed Cloudflare in front of our site and I don’t think it is going to make much difference. Pantheon’s CDN metrics are about 10x what New Relic and Google Analytics are reporting.

There has to be a way for me to figure out why that is. Has to. I have already called Acquia to move over to them because I cannot afford to have to go to a Performance XL plan for my little site that, according to Google Analytics, had 60,000 pageviews in the month of November. Pantheon logged 1.49 million. What the hell? What. The. Hell?!?!? I am so frustrated.


I’m also very interested in getting better visibility into the CDN metrics. The Pantheon metrics report multiple times what Google Analytics reports.


Hey Ruby - we’re definitely open to appeals, and where we can identify bots, crawlers, and status checkers we do exclude those from the metrics. We do this proactively, but also can’t catch everything, so having customers surface issues is definitely helpful.

However, there’s also just a lot of traffic that won’t show up in Google (see the docs for examples) which we do need to account for in the measurements, so it’s never going to be an exact match. Depending on the site and its usage pattern it can unfortunately differ by quite a lot.

I know that’s difficult when it impacts a budget. Over time we’ll be able to provide more and more visibility in the metrics UI, but we do have to be consistent and fair with how our pricing works.
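To make the idea of excluding “bots, crawlers, and status checkers” concrete, here is a minimal Python sketch that splits a list of user-agent strings into probable humans and probable automation. The pattern list is entirely hypothetical and is not Pantheon’s exclusion logic, which hasn’t been published in this thread; it is only a starting point for sanity-checking your own logs before an appeal.

```python
import re
from typing import Iterable, Tuple

# Hypothetical keyword list, for illustration only.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp|uptime|pingdom|monitor", re.IGNORECASE)


def is_probable_bot(user_agent: str) -> bool:
    """Guess whether a user agent belongs to automation rather than a browser."""
    # Requests with no user agent at all are treated as automated here.
    if user_agent in ("", "-"):
        return True
    return BOT_PATTERN.search(user_agent) is not None


def split_traffic(user_agents: Iterable[str]) -> Tuple[int, int]:
    """Return (probable_human, probable_bot) request counts."""
    human = bot = 0
    for agent in user_agents:
        if is_probable_bot(agent):
            bot += 1
        else:
            human += 1
    return human, bot


print(split_traffic([
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/70.0",
    "Googlebot/2.1",
    "-",
    "Pingdom.com_bot_version_1.4",
]))  # (1, 3)
```

Substring matching is crude: it misses bots that impersonate browsers and can misfire on a legitimate agent that happens to contain "bot", so treat the split as a rough estimate.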


Cloudflare isn’t necessarily a good tool to reduce your usage of Pantheon. I’ve seen it actually increase the number of pages served from Pantheon due to pre-fetching.

I know it’s no fun to have a shocker traffic bill. It sounds pretty unusual to have 60k “pageviews” tracked in GA, but 1.5M “pages served” from Pantheon, but there could be a high volume of API calls, or clients that don’t report back to GA, or a number of other causes.

Unfortunately from our end we have to be consistent and fair with what we measure and how we charge. In the future we’ll be able to provide increasing detail and insight in the metrics area, but it will never be perfect, and it also cannot possibly match Google since we’re fundamentally measuring different things.


Thanks Josh. This is a Drupal platform we are using. Could things like having, in the sidebar on every page, a mini calendar and a quicktab (module) with the most popular pages and most recent Disqus comments be driving up our number of “pages served”? I’ve wondered if that has more to do with this issue than any sort of malicious traffic or bots.


If your site makes AJAX or other types of requests, those will run up the traffic count.

Disqus probably isn’t a factor, since those requests go to Disqus (not Pantheon), but I’ve definitely seen cases where something “programmatic” like this results in off-the-charts numbers.
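A bit of arithmetic shows how much extra traffic would be needed to explain the gap described earlier in this thread (60,000 Google Analytics pageviews against 1.49 million pages served). The Python sketch below assumes, purely for illustration, that each pageview is counted once by both systems and that every remaining request is an AJAX or API call triggered by those pageviews; real traffic also includes bots and clients that never load the GA tag.

```python
def implied_extra_requests(pages_served: int, pageviews: int) -> float:
    """Extra platform requests per GA pageview needed to explain the gap.

    Hypothetical model: each pageview is one request, and everything else
    the platform served was triggered by those pageviews.
    """
    if pageviews <= 0:
        raise ValueError("pageviews must be positive")
    return pages_served / pageviews - 1


def projected_pages_served(pageviews: int, requests_per_view: int) -> int:
    """Pages served if each pageview fires `requests_per_view` extra requests."""
    return pageviews * (1 + requests_per_view)


gap = implied_extra_requests(1_490_000, 60_000)
print(f"{gap:.1f} extra requests per pageview")  # 23.8 extra requests per pageview
print(projected_pages_served(60_000, 2))  # 180000
```

Under this model, two sidebar blocks that each fire a single AJAX request per pageview would only triple the count; closing the whole gap would take roughly 24 extra requests per pageview, or a lot of traffic the GA tag never sees.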


Both of those blocks I mention use AJAX. Would this run up the numbers on Cloudflare, too? Because the numbers there are sky high as well.

Thanks for the help Josh. I really appreciate it.


Yesterday we were notified that Pantheon is moving us from Performance Small to Large, increasing our monthly cost by 260% based on your secret metrics that we have no evidence to believe are accurate.

We will appeal and request an audit, but that kind of individual solution is not an effective way to address a problem that is clearly impacting many customers. Charging us based on mysterious black box statistics is an irresponsible business practice, and failing to offer transparency into what is behind these statistics is potentially a large security hole.

I’ve always been such a fan of Pantheon and have been referring people to you for years and years. I just can’t express how disappointed I am about this.


@ruby - the metrics are definitely accurate. Our pricing is based on the amount of traffic served by the platform. Every request to the platform is logged and this is the source of the numbers. This does not (and won’t ever) align with other data sources that attempt to measure views. It’s apples and oranges.

@johndubo - Yes, the AJAX requests are likely a big part of your numbers. As per the above, AJAX requests (as well as RSS, XML, JSON, etc) are all going to show up in our traffic stats because they are requests we serve. However, they’re not going to be in any “pageview” metrics like Google Analytics.
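To see how much of a site’s traffic is AJAX, feed, or data requests rather than ordinary pages, the request URLs from an access log can be bucketed. The Python sketch below is an illustration under stated assumptions: Drupal core serves AJAX at paths such as /views/ajax and /system/ajax and its default feed at /rss.xml, but modules like Quick Tabs or a calendar block may use other paths, so the prefix and suffix lists here are hypothetical and should be adjusted to match what actually shows up in your logs.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit

# Hypothetical classification rules; extend them with paths seen in your own logs.
AJAX_PREFIXES = ("/views/ajax", "/system/ajax")
DATA_SUFFIXES = (".xml", ".json")


def request_kind(url: str) -> str:
    """Bucket a request URL as 'ajax', 'feed/data', or 'page'."""
    path = urlsplit(url).path  # drop any query string before matching
    if path.startswith(AJAX_PREFIXES):
        return "ajax"
    if path.endswith(DATA_SUFFIXES):
        return "feed/data"
    return "page"


def breakdown(urls: Iterable[str]) -> Counter:
    """Count requests per bucket."""
    return Counter(request_kind(url) for url in urls)


print(breakdown(["/", "/views/ajax?page=1", "/rss.xml", "/node/5", "/api/items.json"]))
```

Only the "page" bucket is roughly comparable to a Google Analytics pageview; the other two are still counted as traffic served, as described above.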

Thanks for everyone’s feedback on this thread. I understand that this is difficult for some customers, and I especially empathize with folks who are operating under budget constraints, and with long-time customers for whom this is coming as a surprise. It’ll frankly be a lot easier for new customers who find out right away that they’re on the wrong plan.

Even though it’s difficult, our pricing structure has to be enforced or else it’s not really meaningful. Our goal is to do this consistently. Happy to take additional feedback on what you think would make the process more fair or equitable.


Is there an ETA on when we’ll be able to see these metrics on a request by request basis? Does Fastly provide raw access logs?