1) blocks all requests for a specific asset if a session has requested it too many times in a row, and 2) doesn't have a cooldown system in place to allow said session to access the asset again after some time has passed.
your guesses were all very reasonable, and that's what I checked for the people on discord, to no avail :/
also our rate-limit system does have both a cooldown and a grace period, to handle bursts as well as clients that stop misbehaving... :/ also if you get caught, refreshing the page would show a very clear "you are a banned" for a couple of minutes
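to make "cooldown plus grace period" concrete, here's a rough sketch of how a limiter like that can behave. This is not our actual code: the class name, the thresholds and the sliding-window approach are all assumptions picked for illustration.

```python
from collections import deque


class RateLimiter:
    """Hypothetical per-session limiter with a grace period and a cooldown."""

    def __init__(self, limit=20, window=10.0, grace=5.0, cooldown=120.0):
        self.limit = limit          # max requests per window (assumed value)
        self.window = window        # sliding window length, in seconds (assumed)
        self.grace = grace          # how long a client may stay over the limit (assumed)
        self.cooldown = cooldown    # ban length, in seconds ("a couple of minutes")
        self.hits = deque()         # timestamps of recent requests
        self.over_since = None      # when the client first went over the limit
        self.banned_until = None    # end of the current ban, if any

    def allow(self, now):
        """Return True if a request arriving at time `now` should be served."""
        if self.banned_until is not None:
            if now < self.banned_until:
                return False
            # Cooldown elapsed: forget the past and start fresh.
            self.banned_until = None
            self.over_since = None
            self.hits.clear()

        self.hits.append(now)
        # Drop timestamps that fell out of the sliding window.
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()

        if len(self.hits) <= self.limit:
            # Client is behaving (or stopped misbehaving): reset the grace timer.
            self.over_since = None
            return True

        if self.over_since is None:
            self.over_since = now
        if now - self.over_since < self.grace:
            # Short burst: over the limit, but still inside the grace period.
            return True

        # Over the limit for the whole grace period: ban for `cooldown` seconds.
        self.banned_until = now + self.cooldown
        return False
```

the point being: a short burst gets through, a client that calms down is never banned, and a banned session gets access back on its own once the cooldown runs out.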
my best guess is one of:
a. ISP fuckup of some kind (we use BGP, and route changes should happen every few hours at most; if their ISP is flapping routes a lot, that will break established TCP connections)
b. Some kind of MITM system bug causing in-transit TLS checks to fail every now and then (you'd be surprised at how many ISPs run those, on top of people's local AVs also running bootleg ones)
c. Some reverse proxy bug on our end, though we use HAProxy, which isn't exactly uncommon, as Wikipedia and other similarly large websites also do (because it is reliable)
So at this point we're kind of stuck investigating why, because the next step is looking at packet captures, which is obviously not feasible in our context