Parquet files are stored in AWS S3 with prefixes like /fruit=.../year=.../month=.../day=.../.
Their data is queried through AWS Athena, using a table with partition projection in which fruit is typed as an enum:
'projection.fruit.type'='enum',
'projection.fruit.values'='apple,banana,cherry',
Later, we may have Parquet files under the prefix /fruit=date/year=..., but their data won’t be visible from Athena until the projection values are updated to apple,banana,cherry,date.
How can I keep the Athena partition projection up to date with the current S3 prefixes?
Ideally in real time, but refreshing it once or twice a day might be an acceptable compromise for now.
The cardinality is not too high, hundreds of values at most.
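For the once-or-twice-a-day option, a scheduled job could list the fruit=... prefixes and rewrite projection.fruit.values with an ALTER TABLE ... SET TBLPROPERTIES statement. A minimal sketch in Python with boto3, assuming the bucket, base prefix, table name and query output location are placeholders (none of them come from the setup above) and that fruit values never contain commas or quotes:

```python
"""Sketch: rebuild the enum projection values from the fruit= prefixes in S3."""

# Hypothetical: bucket, base_prefix, table and output_location are placeholders.


def fruits_from_prefixes(prefixes, key="fruit"):
    """Extract partition values from prefixes like 'data/fruit=apple/'."""
    marker = key + "="
    values = set()
    for prefix in prefixes:
        for part in prefix.strip("/").split("/"):
            if part.startswith(marker) and len(part) > len(marker):
                values.add(part[len(marker):])
    return sorted(values)


def build_alter_statement(table, values, key="fruit"):
    """Build the Athena DDL that rewrites projection.<key>.values."""
    joined = ",".join(values)  # assumes no value contains ',' or "'"
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('projection.{key}.values'='{joined}')"
    )


def list_fruit_prefixes(bucket, base_prefix=""):
    """List the first-level 'fruit=...' prefixes under base_prefix (needs boto3)."""
    import boto3  # imported lazily so the pure helpers above work without it

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    prefixes = []
    # Delimiter="/" makes S3 return one CommonPrefix per fruit=... "folder".
    for page in paginator.paginate(Bucket=bucket, Prefix=base_prefix, Delimiter="/"):
        prefixes.extend(cp["Prefix"] for cp in page.get("CommonPrefixes", []))
    return prefixes


def refresh_projection(bucket, table, output_location, base_prefix=""):
    """Scheduled entry point, e.g. run once or twice a day."""
    import boto3

    values = fruits_from_prefixes(list_fruit_prefixes(bucket, base_prefix))
    if not values:
        return None  # don't wipe the projection if the listing came back empty
    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=build_alter_statement(table, values),
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

With hundreds of values at most, one paginated listing per run should stay cheap; refresh_projection could be run from cron or a scheduled Lambda.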
I’ve seen that there is an injected projection type, which may be a good fit, but I’m unsure and wonder whether it has any drawbacks.
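As far as I understand the Athena documentation, with 'projection.fruit.type'='injected' the table keeps no list of values, but every query must constrain fruit with static equality conditions, and a query without such a filter fails. A query spanning several fruits then has to enumerate them itself. A client-side sketch of that, assuming a hypothetical table my_table; combining one equality-filtered SELECT per fruit with UNION ALL is just one way to stay within the constraint, not something the docs prescribe:

```python
def quote_literal(value):
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_multi_fruit_query(table, fruits, columns="*"):
    """One equality-filtered SELECT per fruit, combined with UNION ALL.

    Each branch filters fruit with a static equality condition, which is
    what an injected projection column requires (as I read the docs).
    """
    if not fruits:
        raise ValueError("an injected column needs at least one value to filter on")
    branches = [
        f"SELECT {columns} FROM {table} WHERE fruit = {quote_literal(f)}"
        for f in fruits
    ]
    return "\nUNION ALL\n".join(branches)
```

In other words, injected seems to trade maintaining the value list in the table definition for supplying the values in each query.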