Can you shard based on a key that has duplicate values, or does it need to be unique like a primary key? I have a table where data, including a Date, is added daily. The table’s structure is:
id | Information | Date
Every day, new entries with information and a date are added, like:
1 | XXXXX | Aug 21 2023
2 | XXXXY | Aug 21 2023
3 | XXXXZ | Aug 21 2023
4 | AXXXY | Aug 22 2023
5 | BXXXY | Aug 22 2023
6 | CXXXY | Aug 22 2023
I aim to fetch the most recent data based on the Date (for instance, Aug 22). Can I use the Date to range shard the data?
Additionally, I believe updates might be quicker if sharded by the id column. For instance, if id 1 gets new data today, its Information and Date will be refreshed. I get a daily list of ids with updated data. My primary need is to access the latest updated data for the day. What’s the optimal sharding method for this?
Yes, you can shard on a key that has duplicate values. This is the case with a composite primary key whose first column is HASH sharded: that first column alone determines the shard, so rows that share its value land in the same shard. It is also the case with non-unique secondary indexes.
For example, for your use case, you can define the table with primary key(id hash) and it will be sharded on a hash of the primary key, which makes updates by primary key easy. For the query that fetches the newest data, you can add a secondary index on (date desc). You can even create this index as (date desc) include (information, id) so that the query can use an Index Only Scan without going to the table. The index will be a bit larger, and updates will take a bit longer, but the queries will be faster.