I have a React frontend and a Node.js backend that authenticates via an OIDC service provider. After a user completes the SSO authentication flow, I store the OIDC token set (including a refresh token with a 5 hr expiration) in an HttpOnly cookie with a matching expiration. I also store a JWT (1 hr expiration) generated by my backend in a cookie, so that I don't have to send the OIDC token to the service provider on every request. On every request I verify the JWT. If it has expired, I use the OIDC refresh token to obtain a new token set (new access token, new ID token, new refresh token) and generate a new JWT. My web server is configured so that if the cookie containing the OIDC token is missing, it redirects to /api/login, which starts a new SSO authentication flow.
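For concreteness, here's a rough sketch of that per-request flow. This is not my production code; `verifyJwt`, `issueJwt`, and `refreshWithProvider` are simplified stand-ins for the real JWT library and OIDC client calls:

```javascript
// Placeholder helpers standing in for the real JWT library and OIDC client.
const verifyJwt = (jwt) => jwt === 'valid-jwt';       // stand-in for e.g. jwt.verify
const issueJwt = () => 'fresh-jwt';                   // backend-signed 1 hr JWT
const refreshWithProvider = async (refreshToken) => { // OIDC token endpoint call
  if (refreshToken !== 'current-refresh') {
    throw new Error('invalid_grant');                 // provider rejects a spent token
  }
  return { accessToken: 'at', idToken: 'it', refreshToken: 'next-refresh' };
};

async function authenticate(cookies, setCookie) {
  if (!cookies.oidc) return 'redirect:/api/login';    // no OIDC cookie: restart SSO
  if (verifyJwt(cookies.jwt)) return 'ok';            // backend JWT still valid

  // JWT expired: spend the refresh token for a new OIDC token set.
  const tokens = await refreshWithProvider(cookies.oidc.refreshToken);
  setCookie('oidc', tokens);                          // HttpOnly, ~5 hr expiry
  setCookie('jwt', issueJwt());                       // new 1 hr backend JWT
  return 'ok';
}
```

The key point is the middle branch: an expired JWT triggers a provider round trip that rotates the refresh token, which is where the trouble below comes from.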
Now for the edge case. Because of the nature of React rendering, multiple API requests from a client can arrive at my backend in a cluster. When the JWT has expired, the first request of the bunch refreshes the token and gets a new token set. But the other requests already queued in the event loop are still carrying the old refresh token, and the OIDC service provider returns a 500 error when they attempt to refresh with it.
In addition, testing this scenario in Chrome revealed that the browser aggressively cancels requests it considers no longer needed (e.g. on a full page refresh). For example, I use SWR in my React app, which performs revalidations (refetches) in the background. If a request is in flight when the page is refreshed, Chrome cancels it. And if that happens to be the request in which the OIDC token is refreshed, the new refresh token never gets set in a cookie because the request was canceled. All subsequent requests then carry an invalid, already-spent refresh token until the cookie expires. There are other ways in which Chrome will cancel requests, and through testing I have managed to reproduce this scenario fairly consistently.
I have come up with a solution: a Map cache in my backend, keyed by the old refresh token, whose value is an object containing the new refresh token and an expiration date (5 mins). When a token is refreshed, an entry is added to the Map. Requests still carrying the old refresh token can check the cache, and on a hit they are treated as valid (and get cookies set with the new refresh token and new JWT). Once I receive a request carrying the new refresh token, the entry is removed from the cache. A setInterval function also runs periodic maintenance to remove expired entries. I am hoping the short expiration window minimizes the risk of a replay or timing attack.
Other ideas that I have rejected:
- keepalive: Using keepalive on the fetch prevents Chrome from cancelling the request, but it doesn't stop the other in-flight requests from being invalidated, because they are still using the old refresh token.
- exponential backoff: Combining keepalive with an exponential backoff algorithm might work, but it would result in poor UX, as the user would have to spend time staring at an error message while requests that came back with a 503 are retried. Also, when an attempt to refresh the token with the OIDC service provider fails due to an invalid refresh token, I would like to delete the cookie containing the refresh token (forcing the user back through the SSO auth flow), which I wouldn't be able to do with this approach.
So my question is: is there a better approach?