Pantheon Community

Tips for making site traffic stats more accurate

Thanks @sparklingrobots! I feel as though in the meantime the automatic plan “right-sizing” should be suspended if Pantheon hasn’t reached a consensus. We are getting closer to the end of this month and a lot of sites will be automatically bumped up without a thorough explanation.

6 Likes

Add me to the list of people who are frustrated at not being able to clearly explain to clients why Pantheon is reporting an order of magnitude more traffic than anywhere else, and why it wants to bump them to the next plan level, which can be a whopping 70%-100% increase in price. When I asked support for help in figuring out why Pantheon’s metrics show a huge spike in Dec and Jan (even though GA does not), it took 28 days and multiple requests for them to get back to me, and then only with an analysis of just one day in Feb.

If that one day is indicative, then maybe the problem is that bots are continually trying to access different URLs with “wp-login.php” in them – even though it’s a Drupal site. Isn’t there a simple rule that Pantheon can put in place to block these sorts of requests on Drupal sites?

It’s embarrassing because the client wants to know what’s going on and is concerned about a 70% increase in hosting costs they didn’t budget for, and all I can tell them is I’m going to have to spend billable hours playing whack-a-mole with bots using an incomplete set of logs.

One of the original benefits of Pantheon – which I touted to clients when convincing them to migrate over from other hosting options – was being able to focus on building a great Drupal site rather than spending hours on low-level sysadmin tasks like this.

I already had one client switch to a Pantheon competitor after the new traffic rules, and now this client is asking the same question. In both cases, I brought these clients to Pantheon years ago.

I really hope you can take a hard look at whether this is the right way to bill for hosting, whether the price jumps are justified, whether your support staff has the tools they need to help when questions about traffic come up, and whether you’re providing developers with tools to make their jobs easier, not harder.

10 Likes

Thanks all for the good contributions to this topic. I just started diving into this last night based on the automated “Dear Agency” email subject: Notification: Visit limit exceeded for a site supported by your agency … I’ve been carefully reading this thread and not coming out of it with a good feeling.

Something doesn’t feel right: the sites I was informed about have not seen increases in more common measurements of site traffic, yet the Pantheon metrics for site traffic show a disproportionate increase.

The pricing comparison between plans shows the Basic and Performance Small plans at the same levels for the Pantheon metrics. If I am an existing Pantheon customer or a potential customer evaluating the platform, it certainly isn’t clear that the Pantheon metrics will differ from the more common measurements in use. I am sure it’s problematic when customers have questions about this after the fact.

It is clear that the level of effort to monitor, properly evaluate or remediate negative factors on a technical level is not trivial.

It’s been stated earlier in this thread that Pantheon is working on providing more actionable reporting information and tools. In the meantime is anyone aware of any external tools which could provide this information without adverse impacts?

I really have appreciated many advantages of the Pantheon Platform, particularly that I generally don’t need to worry about the sysadmin aspects of managing servers. I also don’t think the cost benefit advantage of the platform is out of line.

However, a lot of the issues brought up in this thread make it clear there is a need to make some corrections to remove the ambiguity with these metrics.

3 Likes

I know I don’t have a vote, but I would second that.

Or at the least until we can be provided better tools to equate the Pantheon Metric results with some other common metrics.

1 Like

@amit One note about your comment with regard to bots continually trying to access different URLs with wp-login.php in them on your Drupal site…

According to @kyletaylored’s comment from Dec. 18, 2019, those requests should not be counting against your Pantheon metrics, since those would be 404 responses:

We don’t count redirects or errors (301/302, 4xx, 5xx).

@gravelpot They counted because they were hitting URLs for Views that won’t return a 404 because of the way Views works. For example, one of the site’s Views is for a blog at /blog and, as is typical, has category pages at /blog/{category}. So all the attempts by bots to hit /blog/wp-login.php weren’t returning 404s.

I’ve now added code to settings.php to look for wp-login.php in any URL and return a 404. But we haven’t seen a big drop in metrics, so we are shooting in the dark.
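For anyone wanting to reason about that check before writing it, here is a minimal sketch of the matching logic. It is written in Python purely for illustration; the real check would live in PHP in settings.php, and the names `BLOCKED_FRAGMENTS` and `status_for_path` are hypothetical, not taken from the actual code. It assumes matching on the URL path only (not the query string), case-insensitively.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical list of path fragments that should never exist on a Drupal site.
BLOCKED_FRAGMENTS = ("wp-login.php",)


def status_for_path(request_uri: str) -> Optional[int]:
    """Return 404 if the request path contains a blocked fragment.

    Returning None means "let the CMS handle the request as usual".
    Only the path is inspected, so a search query that merely mentions
    wp-login.php is not rejected.
    """
    path = urlsplit(request_uri).path.lower()
    if any(fragment in path for fragment in BLOCKED_FRAGMENTS):
        return 404
    return None
```

With this logic, `/blog/wp-login.php` gets a 404 even though a Views path like `/blog/{category}` would otherwise answer it, while `/blog/drupal-tips` passes through untouched.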

In the meantime, the client’s site was automatically bumped from Performance Small to Performance Large at the end of Feb, despite Pantheon support having told me it would be Performance Medium and having taken a full month to get back to us with a partial analysis. I asked for an escalation & audit on Fri, but haven’t received any acknowledgement.

3 Likes

Well… this is horrifying. What a great example of how hard it is to catch this moving target.

If this is the case, it makes me think that it might be good practice for anyone running Drupal on Pantheon to be sure they have enabled fast_404 and added wp-login.php to the list of patterns that function tries to match.

1 Like

Crazy, but can confirm I see that on my views too. A little Googling landed me on the views404 module. Going to take that one apart and see what they are doing to address it!

General note: If you have concerns about a specific plan right-sizing, reply to the email you received and let us know what your concerns and questions are and we’ll take it from there.

^ I am doing this, and also opened a ticket.
My plan was bumped from $125 to $450, which there is no way I can afford. I do not have the time to be scrambling to find a new host.
I use Pantheon all day at work and love it – we are a partner – but this is no good.
(This is a personal site of mine and apparently is getting hit by Chinese and other bots.)

From that ticket below:

Our colleague worked on getting the data you requested. He ran two reports, January 23, 2020 and February 6, 2020. On both days, there was a steady amount of foreign bot traffic with identifiable user agents (they identify as older Chinese consumer smartphones with outdated versions of Android). These bots generate a lot of unique visitors because they come from a very wide range of IP addresses, bundled with 4 or more user agents. For January 23, these bots contributed to around 98% of the total page visits (unique visitors) recorded.

Unfortunately, we are limited in blocking bots at the platform or Global CDN level, as these can be legitimate user agents that are spoofed; what makes them identifiable is the volume of requests and the IP range in a short amount of time.

The best course of action long-term is to implement a WAF. If you have any troubles finding the right WAF for you, you can always reach out to your account manager or the support team about our Advanced CDN product that can be used to filter these types of requests (GeoIP blocking, whitelist / blacklist IP ranges, etc) prior to reaching the Global CDN.

Alternatively, you can always block these requests at the application level, at the cost of a performance trade-off until a WAF is put in place. If you block IP addresses at the application level, it will break the CDN level caching, requiring Drupal to respond to all requests, which could cause performance problems during a flood of requests. But any error codes returned by Drupal (400/500) would not be counted in your metrics tab.
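To make the ticket's explanation concrete, here is a rough Python sketch of how a small number of user agents spread over many IP addresses can inflate a unique-visitor count. It is an assumption-laden illustration, not Pantheon's actual pipeline: it assumes a visitor is a distinct (IP, user agent) pair, and that redirects (301/302) and 4xx/5xx responses are excluded, as stated earlier in this thread. The `Hit` record and both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Hit(NamedTuple):
    """One parsed log line (hypothetical shape)."""
    ip: str
    user_agent: str
    status: int


def is_counted(status: int) -> bool:
    # Assumed rule from the thread: 301/302 redirects and 4xx/5xx errors do not count.
    return status not in (301, 302) and status < 400


def summarize_visits(hits: Iterable[Hit]) -> Tuple[int, Counter]:
    """Return (unique visitor count, unique visitors per user agent)."""
    # A set collapses repeat requests from the same IP + user agent pair.
    visitors = {(h.ip, h.user_agent) for h in hits if is_counted(h.status)}
    by_agent = Counter(user_agent for _, user_agent in visitors)
    return len(visitors), by_agent
```

Running something like this over a day of logs and sorting `by_agent` is one way to spot a single outdated user agent that accounts for most of the counted visits.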

2 Likes

I received similar information as to why my site was receiving too much traffic. It’s reassuring that someone else is having this same issue, but bad at the same time. I looked into using the Advanced CDN and that was far too much money for a personal site. I can’t remember exactly what they priced it out as, but I know it was in the thousands. Sooo that solution was out for me and I’m guessing for you as well. Another option is Cloudflare, but you still have to pay money on top of your plan to get that working. My question for Pantheon is: if the IPs are coming from malicious sources, then why are they not stopped right then and there? We can’t control or monitor these issues until it’s too late.

1 Like

I would not mind going the Cloudflare route, if the $20 plan will do what is needed, and as long as there are easy-to-follow instructions on how to do it (these are not that easy): https://pantheon.io/docs/cloudflare

But yes, I am not sure why:

  1. Pantheon includes bot traffic in the pricing model.

  2. Pantheon can’t block the bots.

Probably there is a good reason for #2, not sure there is one for #1. IMHO they should eat the bot traffic and include it in the overall pricing. Raise everything 10% to do that if necessary.

HUGE fan of Pantheon and its corporate culture that has been so great for devs like me and customers in general.

Companies get big, then sometimes the “suits” take over and wreck everything. I hope that is not happening here.

2 Likes

It’s also worth considering that on a more traditional platform, if your site is targeted by a high volume of traffic, you might have some options for scaling the infrastructure up to cope with the bump, or you might have a way of blocking that traffic to reduce the spike, or worst case, the site is inaccessible for a day or three. Other hosting companies don’t just come back and say “sorry, your site’s traffic has dramatically increased without you being aware of it, you now have to pay us 3x - 10x more, there’s no way for you to be aware that such a traffic spike is happening, there’s no way to know when the traffic drops back so that you could go back to previous rates, and there’s no option around it.” It’s just a really unfortunate way of doing business.

3 Likes

Interesting. Those outdated-Android-using Chinese bots were the same ones driving up visits on one of my sites that was flagged by Pantheon for overages.

1 Like

Has anyone tried using http:bl to block traffic using Project Honey Pot? https://www.drupal.org/project/httpbl

@DamienMcKenna To the point raised by one of the Pantheon support folks that was quoted earlier, wouldn’t this require disabling caching on your site?

Without disabling caching, the Global CDN (Fastly) layer wouldn’t be aware of the blacklist, and therefore wouldn’t block that traffic, but pages served from Global CDN still count towards Pantheon’s metrics.

Following up to my March 5, 8:21 AM comment:

I have not yet heard back from Pantheon. I will ping them again today.

I am thinking of trying Cloudflare’s Pro Plan ($20/month). I chatted with them this morning, explained the situation, and pointed them to this discussion.

The person I chatted with was quite knowledgeable. They said:

there is two option

  • you can get the pro plan ($20) and see if the WAF can help eliminate this.
  • in the case this does not work you can add Rate limiting, if you can create a rule that stops the exact behaviour of those specific Bots, you are sorted

and also:

With the WAF you will be able to do this then
But you will have to create your own rule to block those user agents for instance
You can do this on the pro plan
nonetheless, this is for you to do it, if you are looking for the help of a Solution engineer we have the Enterprise plan for that

If the $20 plan can do this, it’s more than worth it. Not 100% clear how difficult it will be to create the blocking rules, but will see how it goes I guess. Seems like any rule I would do on my Drupal site on Pantheon would disable caching so that’s no good.

This doc confuses me a bit: https://pantheon.io/docs/cloudflare …I guess I would do Option 1?

But anyhow, in a December 19 comment above, Josh said this:

CloudFlare isn’t necessarily a good tool to reduce your usage of Pantheon. I’ve seen them actually increase the amount of pages served from Pantheon due to pre-fetching.

So not sure what the answer is, if that ^ is correct.

1 Like

@Rick, I could be wrong, but I don’t think that option 1 on that doc would buy you anything in terms of the WAF protection you’re looking for. That is literally only using the DNS functionality of Cloudflare.

This configuration routes traffic to Pantheon’s Global CDN exclusively. Unless you’re paying for advanced Cloudflare features or if you have custom configurations (e.g. many page rules) you’d like to keep, turn off Cloudflare’s CDN so that only DNS hosting services are used

1 Like

Ah OK, thanks. I will go back and ask Cloudflare about that.

We run Cloudflare in front of one of our Pantheon-hosted sites and use option #2 with no issues, other than the fact that you can’t clear the Cloudflare cache from your Pantheon site dashboard, the way you can with Global CDN. You may want to reduce your cache TTL if content timeliness is an issue for your site.

1 Like

@gravelpot: Yes, that is a consideration.

Ultimately this is something Pantheon needs to fix at the platform level, or the community needs to come to terms with the fact that Pantheon is no longer a good hosting solution for sites with small budgets.

4 Likes