How to Stop non-caching UAs from endless Server Push?

Is there code to limit an IP to one set of pushed files per visit, or to pushing only once per HTML or PHP page?

Browsers will cache the files and stop further server pushes. However, some robots pretending to be browsers don't cache and keep requesting the same pushed files, sometimes hundreds of times.

I gather that Apache’s mod_http2 tries to avoid re-sending a pushed resource, but only per connection, so typical one-connection-per-request robots won’t benefit. Description from the HTTP/2 guide:

The module will keep a diary of what has been PUSHed for each connection (hashes of URLs, basically) and will not PUSH the same resource twice. When the connection closes, this information is discarded.

Thanks, I’ve read similar. Again, none of that seems to be accurate, at least the way it’s set up at Dreamhost.

Example: a bot coming from AWS and posing as a browser might request 90 pushed files in a second or two on one connection. I push 9 files, so the bot keeps requesting the same 9 over and over, presumably until it hits some instance limit on its AWS account.

Other bots are config’d better and don’t abuse this, but the several that do are filling my server logs. One of them is the bot from Google looking for Weblight (which I block).

So if the documentation is correct, Dreamhost may have something mis-config’d.

They’ve historically customised all the heavy gear so I’d put money on this being the issue.

A manual fix in the interim would be to log the requests within your script(s) and refuse to honour any repeats, dropping them on a per-day basis or whatever.

Edit: Oh, wait… it’s the logs that are pissing you off. That’s going to require you to track the bots and block them by IP (or other signature) until the pool is closed. I’d set a script to auto-rewrite the .htaccess file as each particularly bad bot is encountered.
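For illustration, the block list such a script would maintain could look something like this in .htaccess (the addresses are documentation examples, and it assumes the host honours mod_authz_host rules in override files):

    # Illustrative bad-bot block list; a script appends a line as each offender is caught
    <RequireAll>
        Require all granted
        # example/documentation addresses only
        Require not ip 203.0.113.45
        Require not ip 198.51.100.0/24
    </RequireAll>

The older "deny from" syntax does the same job if the host still loads mod_access_compat.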

Thanks, I considered that, but that’s not what I want to do. The request limit shouldn’t need the IP, just a rule to limit the Push files to once per connection. The Push files are already in the (above) code, so I’m looking for a wrapper to accomplish this.

I’ve seen variations of this before, just forgot how it’s done. Posted on Stack Overflow but got no responses.

I have no practical experience with HTTP/2, so this was a good thread to kick off some research. In addition to the HTTP/2 link above, you may have seen this:
https://httpd.apache.org/docs/2.4/mod/mod_http2.html#h2pushdiarysize
And here is code for managing how the header is modified with push detail.
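In broad strokes, that sort of header management is just conditional Header rules, something like this (paths and scoping purely illustrative):

    # Attach push/preload hints only to the HTML/PHP responses themselves,
    # not to every asset request (paths are placeholders)
    <FilesMatch "\.(html|php)$">
        Header add Link "</css/site.css>; rel=preload; as=style"
        Header add Link "</js/site.js>; rel=preload; as=script"
    </FilesMatch>

mod_headers and mod_http2 do the rest, assuming H2Push is left at its default of on.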

Abusive bots probably aren’t using cookies, so the only way to identify a repeat offender might be the client IP address. Sure, that isn’t a perfect solution either, but what are the chances that you’d lose truly legitimate traffic NAT’d behind the same IP as a bot? Casualties of war, I say.

Regarding:

The request limit shouldn’t need the IP, just a rule to limit the Push files to once per connection.

That relates to the H2PushDiarySize directive:

This directive toggles the maximum number of HTTP/2 server pushes that are remembered per HTTP/2 connection. This can be used inside the <VirtualHost> section to influence the number for all connections to that virtual host.
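For what it’s worth, on a server you control it would sit in the vhost config roughly like this (the value shown is just the documented default of 256):

    <VirtualHost *:443>
        # ... existing SSL / DocumentRoot configuration ...
        H2Push          on
        H2PushDiarySize 256
    </VirtualHost>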

That might help to get around DH configurations?

I hope something here was helpful.

Thanks. If I used a dedicated server I could do that. On shared hosting I can’t.

I’m looking for a bit of regex to accomplish this via existing rules in .htaccess.
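Something along the lines of this sketch is what I remember seeing, a cookie-gated wrapper around the Link lines (untested, and it assumes mod_headers and <If> blocks are allowed in .htaccess here; paths and cookie name are placeholders):

    # Send the push headers (and a marker cookie) only when this client
    # hasn't already been given them
    <If "%{HTTP_COOKIE} !~ /h2pushed=1/">
        Header add Link "</css/site.css>; rel=preload; as=style"
        Header add Link "</js/site.js>; rel=preload; as=script"
        Header add Set-Cookie "h2pushed=1; Path=/; Max-Age=86400"
    </If>

Of course, a bot that ignores Set-Cookie never sends the cookie back, so it would still get the pushes every time, which is the cookie caveat mentioned above.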

Earlier today a bot requested the same dozen pushed files 7k times. The bot’s IP and its UA are blocked so it’s getting all 403s, yet it keeps coming. I blame GitHub for making these stupid file scrapers available.