When rate-limiting a web service, I see two approaches:
- Option 1: Reject the request with HTTP 429 (Too Many Requests) when QPS exceeds the rate limit
- Option 2: Throttle the request, i.e., put it in a queue and deliberately delay it
With Option 1, my worry is that the threads never get a break: a malicious client may ignore the HTTP 429 responses and keep bombarding the application with requests, each of which still has to be received and rejected. Does that matter in practice, or am I overthinking it? What are the consequences?
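To make Option 1 concrete, here is a minimal sketch of what I have in mind, assuming a token-bucket limiter. The names `TokenBucket`, `allow`, and `handle` are hypothetical and not taken from any framework; the point is only that the rejection path is a constant-time check that does no real work, although a thread still has to run it for every incoming request.

```python
import time


class TokenBucket:
    """Hypothetical limiter: about `rate` requests per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so the limiter can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self):
        # Refill according to the elapsed time, capped at the bucket capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def handle(bucket, do_work):
    """Return a (status, body) pair; over-limit requests get 429 without calling do_work."""
    if not bucket.allow():
        return 429, "Too Many Requests"
    return 200, do_work()
```

This sketch is not thread-safe; a real multi-threaded server would need a lock around `allow` or a limiter provided by the framework or a proxy in front of it.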
With Option 2, the queue can grow without bound if we are not careful, so we would have to start dropping requests once it reaches a certain size.
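For Option 2, a rough sketch of the bounded variant, assuming Python's standard `queue.Queue` with a `maxsize`. `BoundedThrottle`, `submit`, and `drain_one` are made-up names for illustration; a worker would call `drain_one` at whatever pace the service can sustain.

```python
import queue


class BoundedThrottle:
    """Hypothetical throttle: holds at most `max_pending` delayed requests."""

    def __init__(self, max_pending):
        # maxsize must be > 0; queue.Queue treats maxsize <= 0 as unbounded.
        self.pending = queue.Queue(maxsize=max_pending)

    def submit(self, request):
        """Enqueue the request, or return False (drop it) if the queue is full."""
        try:
            self.pending.put_nowait(request)
            return True
        except queue.Full:
            return False

    def drain_one(self):
        """Return the oldest pending request, or None if nothing is waiting."""
        try:
            return self.pending.get_nowait()
        except queue.Empty:
            return None
```

Once `submit` returns `False`, the caller still has to tell the client something, so at that point Option 2 ends up rejecting requests much like Option 1 does.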
My question is: which of these approaches is better, and why?