Pantheon Community

Tips for making site traffic stats more accurate

I would not mind going the Cloudflare route, if the $20 plan will do what is needed, and as long as there are easy-to-follow instructions on how to do it (these are not that easy): https://pantheon.io/docs/cloudflare

But yes, I am not sure why:

  1. Pantheon includes bot traffic in the pricing model.

  2. Pantheon can’t block the bots.

Probably there is a good reason for #2, not sure there is one for #1. IMHO they should eat the bot traffic and include it in the overall pricing. Raise everything 10% to do that if necessary.

HUGE fan of Pantheon and its corporate culture that has been so great for devs like me and customers in general.

Companies get big, then sometimes the “suits” take over and wreck everything. I hope that is not happening here.

2 Likes

It’s also worth considering that on a more traditional platform, if your site is targeted with a high volume of traffic, you might have some options for scaling the infrastructure up to cope with the bump, or you might have a way of blocking that traffic to reduce the spike, or, worst case, the site is inaccessible for a day or three. Other hosting companies don’t just come back and say “sorry, your site’s traffic has dramatically increased without you being aware of it, you now have to pay us 3x - 10x more, there’s no way for you to be aware that such a traffic spike is happening, there’s no way to know when the traffic drops back so that you could go back to previous rates, and there’s no option around it.” It’s just a really unfortunate way of doing business.

3 Likes

Interesting. Those outdated-Android-using Chinese bots were the same ones driving up visits on one of my sites that was flagged by Pantheon for overages.

1 Like

Has anyone tried using http:bl to block traffic using Project Honey Pot? https://www.drupal.org/project/httpbl

@DamienMcKenna To the point raised by one of the Pantheon support folks that was quoted earlier, wouldn’t this require disabling caching on your site?

Without disabling caching, the GlobalCDN (Fastly) layer wouldn’t be aware of the blacklist, and therefore wouldn’t block that traffic, but pages served from GlobalCDN still count towards Pantheon’s metrics.
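To make that concrete, here is a toy Python model of a caching layer sitting in front of an origin that has a blocklist. It is only a sketch of the reasoning above, not how Global CDN or Fastly actually work: the `ToyCDN` class, its counters, and the IP addresses are all made up for illustration, and whether a blocked request would itself be counted is something I am simply assuming away here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCDN:
    """Toy stand-in for a cache in front of an origin with a blocklist (hypothetical)."""
    origin_blocklist: set
    cache: dict = field(default_factory=dict)
    pages_served: int = 0  # what a traffic metric measured at the edge would count
    origin_hits: int = 0   # requests that actually reached the origin

    def get(self, ip: str, path: str) -> str:
        if path in self.cache:
            # Cache hit: the origin, and therefore its blocklist, is never consulted.
            self.pages_served += 1
            return self.cache[path]
        self.origin_hits += 1
        if ip in self.origin_blocklist:
            return "403 Forbidden"
        body = f"page for {path}"
        self.cache[path] = body
        self.pages_served += 1
        return body


cdn = ToyCDN(origin_blocklist={"203.0.113.9"})
cdn.get("198.51.100.1", "/")           # a normal visitor warms the cache
blocked_result = cdn.get("203.0.113.9", "/")  # the "blocked" IP asks for the same page
print(blocked_result)                  # page for /
print(cdn.pages_served, cdn.origin_hits)  # 2 1
```

In this toy version the blocked IP still receives the cached page and still bumps the served count, because the blocklist only lives at the origin.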

Following up to my March 5, 8:21 AM comment:

I have not yet heard back from Pantheon. I will ping them again today.

I am thinking of trying Cloudflare’s Pro Plan ($20/month). I chatted with them this morning, explained the situation, and pointed them to this discussion.

The person I chatted with was quite knowledgeable. He/she said:

there is two option

  • you can get the pro plan ($20) and see if the WAF can help eliminate this.
  • in the case this does not work you can add Rate limiting, if you can create a rule that stops the exact behaviour of those specific Bots, you are sorted

and also:

With the WAF you will be able to do this then
But you will have to create your own rule to block those user agents for instance
You can do this on the pro plan
nonetheless, this is for you to do it, if you are looking for the help of a Solution engineer we have the Enterprise plan for that

If the $20 plan can do this, it’s more than worth it. Not 100% clear how difficult it will be to create the blocking rules, but will see how it goes I guess. Seems like any rule I would do on my Drupal site on Pantheon would disable caching so that’s no good.
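For what it is worth, a user-agent blocking rule of the kind they describe might be built up something like this. This is a rough Python sketch, assuming the rule is written in Cloudflare's expression syntax using the `http.user_agent` field with the `contains` operator. The helper name `build_ua_block_expression` and the user-agent strings are placeholders I made up, not the actual bots from this thread, and the resulting expression would still need to be pasted into a rule with a Block action.

```python
def build_ua_block_expression(substrings):
    """Join user-agent substrings into one Cloudflare-style rule expression.

    Hypothetical helper: it only builds the expression text, it does not
    talk to Cloudflare.
    """
    clauses = []
    for s in substrings:
        # Escape backslashes and double quotes so the quoted string stays valid.
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'(http.user_agent contains "{escaped}")')
    return " or ".join(clauses)


# Placeholder user agents, for illustration only.
expression = build_ua_block_expression(["ExampleBot/1.0", "BadCrawler"])
print(expression)
# (http.user_agent contains "ExampleBot/1.0") or (http.user_agent contains "BadCrawler")
```

Note that an empty list produces an empty expression, so you would want to check for that before creating a rule from it.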

This doc confuses me a bit: https://pantheon.io/docs/cloudflare …I guess I would do Option 1?

But anyhow, in a December 19 comment above, Josh said this:

CloudFlare isn’t necessarily a good tool to reduce your usage of Pantheon. I’ve seen them actually increase the amount of pages served from Pantheon due to pre-fetching.

So not sure what the answer is, if that ^ is correct.

1 Like

@Rick, I could be wrong, but I don’t think that option 1 on that doc would buy you anything in terms of the WAF protection you’re looking for. That is literally only using the DNS functionality of Cloudflare.

This configuration routes traffic to Pantheon’s Global CDN exclusively. Unless you’re paying for advanced Cloudflare features or if you have custom configurations (e.g. many page rules) you’d like to keep, turn off Cloudflare’s CDN so that only DNS hosting services are used

1 Like

Ah OK, thanks. I will go back and ask Cloudflare about that.

We run Cloudflare in front of one of our Pantheon-hosted sites and use option #2 with no issues, other than the fact that you can’t clear the Cloudflare cache from your Pantheon site dashboard, the way you can with Global CDN. You may want to reduce your cache TTL if content timeliness is an issue for your site.

1 Like

@gravelpot: Yes, that is a consideration.

Ultimately this is something Pantheon needs to fix at the platform level, or the community needs to come to terms with the fact that Pantheon is no longer a good hosting solution for sites with small budgets.

4 Likes

I keep thinking about this quote from a Pantheon support ticket:

Here’s what bugs me about this… there is a direct implication here that with the right tools (Advanced CDN or some other WAF), a customer can block this traffic, but that Pantheon won’t block it because its character as “bot” traffic is dynamically determined by looking at the given request volume in a specific timeframe, not just specific user agents or specific IP addresses.

I’m not very familiar with WAF configuration options, but if Advanced CDN or some other WAF would allow a customer to block that traffic (and I’m assuming here that they don’t mean playing whack-a-mole by constantly having to manually adjust blacklist rules as these traffic patterns mutate), then why is it not possible (and desirable) for Pantheon to do the same thing at the Global CDN/platform level?

If these patterns can be identified with such high confidence, what legitimate customer on the platform actually wants this traffic??

One thing that would be very interesting for Pantheon to provide in the name of transparency would be an analysis of what percentage of total daily traffic to the platform is comprised of this exact same kind of traffic that was confidently identified as bots in @rick’s ticket.
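For anyone wondering what “request volume in a specific timeframe” detection could look like, here is a minimal Python sketch of a sliding-window counter per client. To be clear, this is my own illustration and not Pantheon's or any WAF's actual logic: the function name, the 60-second window, the thresholds, and the sample IPs are all assumptions.

```python
from collections import defaultdict, deque


def flag_high_volume_clients(requests, window_seconds=60, max_requests=100):
    """Return client IDs that exceed max_requests within any sliding window.

    requests: iterable of (timestamp_seconds, client_id), sorted by timestamp.
    """
    recent = defaultdict(deque)  # client_id -> timestamps still inside the window
    flagged = set()
    for ts, client in requests:
        window = recent[client]
        window.append(ts)
        # Drop timestamps that have fallen out of the window.
        while window and ts - window[0] >= window_seconds:
            window.popleft()
        if len(window) > max_requests:
            flagged.add(client)
    return flagged


# Made-up sample: one client sends a request every second, another every 30 seconds.
sample = [(t, "198.51.100.7") for t in range(50)]
sample += [(t * 30, "203.0.113.5") for t in range(5)]
sample.sort()

flagged = flag_high_volume_clients(sample, window_seconds=60, max_requests=20)
print(flagged)  # {'198.51.100.7'}
```

The point is only that a rule like this keys on behaviour rather than on a fixed list of IPs or user agents, which is why it would not require manual whack-a-mole.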

4 Likes

There’s likely a plug-in that you could use though. We’re thinking of going this route… Might also modify the Pantheon advanced cache plug-in so that it ties into cf

You’re saying “plugin,” so I assume you’re on WordPress, and I can’t speak to that. There is a Drupal module, but unfortunately the Drupal 7 version doesn’t work with Cloudflare’s current API.

Hacking the Pantheon advanced page cache module/plugin would certainly also be an option, but also seems unnecessary given why this is even an issue in the first place.

@gravelpot agree, and from my previous comment I am not sure why they cannot do option #1 below. If they really need more income to pay for the bots, raise the price on everyone by X%. The question, I guess, is what is X? If it is 10% or less, that is a no brainer to me.

I am not sure why:

  1. Pantheon includes bot traffic in the pricing model.
  2. Pantheon can’t block the bots.

Probably there is a good reason for #2, not sure there is one for #1. IMHO they should eat the bot traffic and include it in the overall pricing. Raise everything 10% to do that if necessary.

@Rick From Pantheon’s business perspective, eating the cost of bot traffic seems like a pretty high risk if you can’t control/predict it. But by that same logic, shifting that cost to the customer should be seen as similarly risky and unsustainable.

As a customer who passes on hosting costs to their clients, it is starting to feel irresponsible to even agree to pay for such a model.

4 Likes

UPDATE:

Kyle Taylor reached out to me on Slack and worked with me to get set up on Cloudflare.

As of a moment ago I am set up there ($20 plan) and he wrote some blocking rules. Will see how it looks over the next week.

He did this with me as a sort of guinea pig to help update docs so that others can more easily do it on their own.

I will post back here to let you know how it works. I will create my own Google doc to show you my settings.

The only glitch in moving over was that it took 15 minutes or so for the SSL to come up on Cloudflare, so the site was down during that time. Probably there is a way to pre-provision that.

2 Likes

Hi, all—thanks again, as always, for sharing your concerns here. We continue to listen closely and are working to get answers for you, as well as developing some resources to help out.

At this time, we’re reviewing on a case-by-case basis. If you’d like us to take another look at a specific plan, please hit reply on the email you received.

If you have done that and weren’t able to get the answers you wanted, please drop me a private message on the forums with your site details & I’ll coordinate another look for you.

How to send a private message: click my user image & then click the “Message” box.

I also wanted to mention that I unset the selected solution–I’m trying to make sure that selected answers lead folks to actionable results.

1 Like

UPDATE:

Sorry to not have updated here sooner. Putting Cloudflare in front is working great. There are two parts to this:

  1. Hook up Cloudflare to your Pantheon site. That is relatively simple and inexpensive ($20/month). It does require that you run your DNS through Cloudflare, which may not be an option if you only need to do this on a subdomain and do not want to move DNS for the main domain.

  2. Write Firewall Rules to block what needs to be blocked. That looks easy to do in Cloudflare, but exactly how to determine which IPs and/or user agents should be blocked I am not sure (I got help from Pantheon with it).

Don’t know enough yet about how to do #2 to explain it. It involves getting the Pantheon log files and looking to see what is attacking and then blocking the right things in the Firewall Rules. I will try to understand this better and come back with more.
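In the meantime, here is a rough Python sketch of the “look at the logs and see what is attacking” step: count requests per IP and per user agent and look at the top offenders. It assumes the log lines start with a standard combined-style layout (client address, timestamp, request, status, bytes, referer, user agent); I have not confirmed that this matches Pantheon's exact format, so treat the regex as an assumption. The function name and the sample lines are made up for illustration. Also keep in mind that on a CDN-fronted site the first field may not be the real visitor's address.

```python
import re
from collections import Counter

# Assumed combined-style prefix: ip - user [time] "request" status bytes "referer" "user agent"
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def top_clients(lines, n=5):
    """Return the n most frequent client addresses and user agents in the log lines."""
    ips, agents = Counter(), Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        ips[match.group(1)] += 1
        agents[match.group(2)] += 1
    return ips.most_common(n), agents.most_common(n)


# Made-up sample lines, not real log data.
sample_lines = [
    '203.0.113.5 - - [01/Jan/2020:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [01/Jan/2020:00:00:02 +0000] "GET /about HTTP/1.1" 200 734 "-" "ExampleBot/1.0"',
    '198.51.100.1 - - [01/Jan/2020:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

top_ips, top_agents = top_clients(sample_lines)
print(top_ips)     # [('203.0.113.5', 2), ('198.51.100.1', 1)]
print(top_agents)  # [('ExampleBot/1.0', 2), ('Mozilla/5.0', 1)]
```

Whatever shows up at the top with an implausible request count is a candidate for a Firewall Rule, after checking that it is not a legitimate crawler you want to keep.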

1 Like

Just dropping back in here to note that it’s been many months, and as far as I can tell there has been no substantial movement or communication from Pantheon regarding any further tools, workarounds, or support on this issue. Anything that would show that Pantheon “values its non-profit clients”, the notional CDN traffic dashboard… no-shows all around.

It’s a real bummer, because before this started we recommended Pantheon to every Drupal and WordPress client as a reliable service with great dev tools and customer support. Now as a developer I need to go work up hosting estimates for every new client because my project managers are tired of writing “bad news, Pantheon is reaching into your pocket” emails every month, and I don’t blame them.

Anyway, I’m beyond hope that something will change at this point, just registering the fact that Pantheon is losing business because of the way all this was handled, and furthermore left a very bitter taste in several mouths.

1 Like

Thank you so much for saying this. I agree 100%. The only reason I stopped posting on this thread is that I have given up on Pantheon. I’ve tried personal back channels, etc. and it’s just a lot of runaround that puts the blame on us for the weird, unaccountable way they account for site traffic.

I also used to refer people to them frequently for years. It’s really kind of heartbreaking. This used to be such a great tool for us, and I know there are a lot of good people behind it. But we just can’t live with the ridiculous pricing and the lack of consideration for smaller users.

I’m hoping that something like DDEV or DevShop will grow into a viable alternative. The only reason we haven’t left Pantheon already is that we have a big migration from D7 coming up and I can just start that project on a new host.