I have a public S3 bucket with a CloudFront distribution in front of it. Some clients access the bucket directly via the S3 API, and some access it via CloudFront.
In the S3 server access logs, how can I tell whether a particular request came via CloudFront or not?
(For example, I want to measure how much of the traffic came via CloudFront versus directly to S3.)
h/t Anon Coward
CloudFront usually sets the user agent string “Amazon CloudFront” on the requests it sends to the origin, even when the origin is an S3 bucket, and this string appears in the User-Agent field of the S3 server access logs. By default CloudFront doesn't forward the viewer's own User-Agent header, because it can cache responses less effectively if they might be tailored to slightly different user agents. If you do configure the distribution to forward that header, this marker is lost.
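A minimal Python sketch of counting requests by that user agent string. It assumes the field order documented for S3 server access logs (User-Agent is the 17th field) and uses a simplified tokenizer that does not handle escaped quotes inside quoted fields. The names `parse_log_line` and `count_by_channel` are hypothetical.

```python
import re
from collections import Counter

# Each token in an S3 server access log line is a bracketed timestamp,
# a double-quoted string, or a run of non-space characters.
# Simplifying assumption: quoted fields contain no embedded double quotes.
_TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

# 0-based position of the User-Agent field in the documented log format.
USER_AGENT_FIELD = 16


def parse_log_line(line: str) -> list:
    """Split one access log line into fields, stripping surrounding quotes."""
    fields = []
    for token in _TOKEN.findall(line):
        if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
            token = token[1:-1]
        fields.append(token)
    return fields


def count_by_channel(lines) -> Counter:
    """Count lines whose User-Agent is exactly 'Amazon CloudFront' versus all others."""
    counts = Counter()
    for line in lines:
        fields = parse_log_line(line)
        if len(fields) <= USER_AGENT_FIELD:
            counts["unparsed"] += 1  # too few fields to contain a User-Agent
        elif fields[USER_AGENT_FIELD] == "Amazon CloudFront":
            counts["cloudfront"] += 1
        else:
            counts["direct"] += 1
    return counts
```

Because any client can send `User-Agent: Amazon CloudFront`, this is a heuristic for estimating the split, not proof of where a request came from.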
AWS also publishes its IP address ranges (including the specific prefixes used by CloudFront), so you could match the Remote IP field of each log entry against them. However, these ranges change over time, which makes this approach more complicated to work with (for example, there is a change notification service to handle).
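A minimal sketch of the IP-range check using Python's standard `ipaddress` module. It assumes you have already downloaded and parsed AWS's published `ip-ranges.json` (from https://ip-ranges.amazonaws.com/ip-ranges.json), whose `prefixes` and `ipv6_prefixes` entries carry a `service` name. The default service set `{"CLOUDFRONT"}` is an assumption: if the current file lists a separate origin-facing CloudFront service, pass that name via `services` instead. The function names and the sample data are hypothetical.

```python
import ipaddress
import json


def cloudfront_networks(ip_ranges: dict, services=frozenset({"CLOUDFRONT"})) -> list:
    """Collect IPv4 and IPv6 networks for the given service names from ip-ranges.json data."""
    networks = []
    for entry in ip_ranges.get("prefixes", []):
        if entry.get("service") in services:
            networks.append(ipaddress.ip_network(entry["ip_prefix"]))
    for entry in ip_ranges.get("ipv6_prefixes", []):
        if entry.get("service") in services:
            networks.append(ipaddress.ip_network(entry["ipv6_prefix"]))
    return networks


def is_from_networks(remote_ip: str, networks) -> bool:
    """True if remote_ip (the Remote IP field of a log line) falls in any network."""
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # e.g. "-" or a malformed field
    # An IPv4 address is never "in" an IPv6 network, so mixed lists are safe.
    return any(addr in net for net in networks)


if __name__ == "__main__":
    # Hypothetical excerpt in the shape of ip-ranges.json; real prefixes differ.
    sample = json.loads(
        '{"prefixes": ['
        '{"ip_prefix": "198.51.100.0/24", "region": "GLOBAL", "service": "CLOUDFRONT"},'
        '{"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "EC2"}],'
        ' "ipv6_prefixes": []}'
    )
    nets = cloudfront_networks(sample)
    print(is_from_networks("198.51.100.7", nets))  # True
    print(is_from_networks("203.0.113.9", nets))   # False
```

Since the published ranges change, a snapshot like this is only accurate for log entries written while that snapshot was current.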
As of late 2022, AWS recommends configuring CloudFront to sign its requests to S3. This may distinguish more reliably between CloudFront and direct requesters, especially direct requesters that don't sign their requests.
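A minimal sketch of combining the signing information with the user agent, assuming the documented S3 server access log layout in which the Signature Version field (the 20th field) is `SigV2` or `SigV4` for authenticated requests and `-` for unauthenticated ones. The four category labels and the `classify` function are hypothetical; whether CloudFront's signed requests still carry the “Amazon CloudFront” user agent is not assumed, so those cases are reported separately.

```python
import re

# Same simplified tokenizer as a plain log parser: bracketed timestamp,
# quoted string, or run of non-space characters (no escaped quotes handled).
_TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

USER_AGENT_FIELD = 16         # 0-based index of User-Agent
SIGNATURE_VERSION_FIELD = 19  # 0-based index of Signature Version ("-" if unsigned)


def classify(line: str) -> str:
    """Label a log line by whether it was signed and whether it has CloudFront's user agent."""
    fields = [
        t[1:-1] if len(t) >= 2 and t.startswith('"') and t.endswith('"') else t
        for t in _TOKEN.findall(line)
    ]
    if len(fields) <= SIGNATURE_VERSION_FIELD:
        return "unparsed"
    signed = fields[SIGNATURE_VERSION_FIELD] != "-"
    cloudfront_agent = fields[USER_AGENT_FIELD] == "Amazon CloudFront"
    if cloudfront_agent:
        return "cloudfront-signed" if signed else "cloudfront-unsigned"
    return "direct-signed" if signed else "direct-unsigned"
```

With signing enabled on the distribution, a large share of `direct-unsigned` entries would point to clients bypassing CloudFront, though the exact split depends on how the bucket and distribution are configured.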