I am creating a DynamoDB table and stumbled upon a basic concept that I want to clarify. In DynamoDB, I have a primary key, represented by id (see the example below). Based on the docs, DynamoDB uses this primary key to generate a hash internally, which determines the partition the record will be stored in. So my question is: if I'm generating a unique value for id, I assume its hash will be unique as well. Will each record then be in a separate partition? In that case a sort key wouldn't make sense, right?
My understanding is this: suppose id in my case were some user_id (see Table 2 below), and that user has several addresses where he/she lived at different dates. That user_id would have its own partition, and I could use a sort key, maybe address+date, to keep the records in some order within that user_id partition. But in the previous case (Table 1), if my id is unique and each record is in its own partition, then sorting doesn't make sense. Is my understanding correct?
Table 1
id | date | category | quantity
----------------------------------------------------------
8h7ggfs5 | 2024-08-28 | electronics | 20
3gre423g | 2024-07-27 | clothing | 10
2liu51gw | 2024-06-26 | furniture | 15
...
Table 2
user_id | date | address | dependents
----------------------------------------------------------
1 | 2024-08-28 | 123 simple lane | 2
1 | 2023-05-02 | 56 Woodward blvd | 2
2 | 2024-07-27 | 32 green st | 1
3 | 2024-06-26 | 77 lucky st. | 5
...
Table 3
user_id (partition key) | order_id (sort key) | items
----------------------------------------------------------
1 | 32 | 2
1 | 56 | 2
2 | 32 | 1
3 | 77 | 5
...
There isn’t actually a 1 to 1 relationship between physical partitions and partition key values.
Think of the internal hash function as returning a value in (1..n), where n is the number of partitions currently used by your table.
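To illustrate why unique key values do not mean one partition per item, here is a toy model in Python. It assumes MD5 purely as a stand-in hash (DynamoDB's real internal hash function and partition mapping are not exposed) and a made-up partition count of 4; `partition_for` is a hypothetical helper, not part of any SDK.

```python
import hashlib
from collections import Counter

def partition_for(key: str, num_partitions: int) -> int:
    """Toy mapping of a partition key value to a partition in 1..num_partitions.

    MD5 is only a stand-in; DynamoDB's real hash function is internal.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions + 1

# 1000 unique ids, like the id column in Table 1.
ids = [f"item-{i}" for i in range(1000)]
NUM_PARTITIONS = 4  # assumption: DynamoDB decides this number, not you

placement = Counter(partition_for(k, NUM_PARTITIONS) for k in ids)
print(sorted(placement.items()))
```

Every id is unique, yet all of them land in only 4 buckets, roughly 250 each: many distinct keys share one physical partition.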
You don't get to specify how many partitions are in use; that's controlled by DDB. Roughly, it depends on how much data the table contains and how many read/write capacity units (RCU/WCU) are being used.
A single partition can hold up to 10GB of data and support up to 3000 RCU / 1000 WCU.
A 100GB table is going to be split into at least 10 physical partitions, even if every item has the same partition key (which is only possible if the table has no local secondary indexes [LSI]). A 1GB table that's read using 30000 RCU will also be split into at least 10 physical partitions.
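The arithmetic behind those two examples can be sketched as a lower bound: each per-partition limit on its own forces a minimum partition count, and the table needs at least the largest of them. This is only a back-of-the-envelope model using the limits quoted above; `min_partitions` is a hypothetical helper, and DynamoDB's actual splitting logic is internal and may use more partitions.

```python
import math

# Per-partition limits quoted above.
MAX_GB_PER_PARTITION = 10
MAX_RCU_PER_PARTITION = 3000
MAX_WCU_PER_PARTITION = 1000

def min_partitions(size_gb: float, rcu: int = 0, wcu: int = 0) -> int:
    """Lower bound on the physical partition count implied by each limit.

    Back-of-the-envelope only; the real number is chosen by DynamoDB.
    """
    by_size = math.ceil(size_gb / MAX_GB_PER_PARTITION)
    by_reads = math.ceil(rcu / MAX_RCU_PER_PARTITION)
    by_writes = math.ceil(wcu / MAX_WCU_PER_PARTITION)
    return max(1, by_size, by_reads, by_writes)

print(min_partitions(100))           # 10, the 100GB example
print(min_partitions(1, rcu=30000))  # 10, the 1GB / 30000 RCU example
```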
However, given the design of DDB and its available access methods, GetItem(), Query() and Scan(), you can consider the data divided into logical partitions by partition key. The only operation that crosses logical partitions is Scan(); everything else is restricted to a single logical partition.
This holds even if your table is small enough to be backed by a single physical partition.
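A minimal in-memory sketch of that logical-partition view, using the user/address data from Table 2. `ToyTable` and its methods are hypothetical: the names mirror the DynamoDB operations, but this is NOT the DynamoDB API, and the `date#address` sort key format is just one illustrative choice (date first so results come back chronologically).

```python
from collections import defaultdict

class ToyTable:
    """Items grouped by partition key (a "logical partition"),
    returned in sort-key order within each group."""

    def __init__(self):
        # partition key value -> {sort key value: item}
        self._partitions = defaultdict(dict)

    def put_item(self, pk, sk, item):
        # Same (pk, sk) overwrites, like a put on an existing primary key.
        self._partitions[pk][sk] = item

    def get_item(self, pk, sk):
        # Needs the full primary key: one logical partition, one item.
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk, sk_prefix=""):
        # Touches exactly one logical partition; results are ordered by
        # sort key. The prefix filter stands in for a begins_with condition.
        items = self._partitions.get(pk, {})
        return [items[sk] for sk in sorted(items) if sk.startswith(sk_prefix)]

    def scan(self):
        # The only operation that walks every logical partition.
        return [item for part in self._partitions.values() for item in part.values()]

t = ToyTable()
t.put_item("1", "2024-08-28#123 simple lane", {"address": "123 simple lane", "dependents": 2})
t.put_item("1", "2023-05-02#56 Woodward blvd", {"address": "56 Woodward blvd", "dependents": 2})
t.put_item("2", "2024-07-27#32 green st", {"address": "32 green st", "dependents": 1})

print(t.query("1"))          # user 1 only, oldest address first
print(t.query("1", "2024"))  # user 1 only, addresses from 2024
print(len(t.scan()))         # every item in the table
```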
If your partition key is unique, then having a sort key doesn't add any value at all. Worse, by including a “random” date as the sort key, you won't be able to use GetItem() with just the id: you would also need to know the exact date, or fall back to Query().
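A small sketch of that trade-off for Table 1, assuming (hypothetically) that id were the partition key and date the sort key. A plain dict keyed by the `(id, date)` pair stands in for the table; `get_item` and `query` are illustrative helpers, not the DynamoDB API.

```python
# Toy model: the primary key is the pair (id, date).
table = {
    ("8h7ggfs5", "2024-08-28"): {"category": "electronics", "quantity": 20},
    ("3gre423g", "2024-07-27"): {"category": "clothing", "quantity": 10},
}

def get_item(pk, sk):
    # Like GetItem: an exact lookup that needs BOTH key attributes.
    return table.get((pk, sk))

def query(pk):
    # Like Query on the partition key alone: every item under that id.
    return [item for (p, _), item in sorted(table.items()) if p == pk]

print(get_item("8h7ggfs5", "2024-08-28"))  # found, but only because we knew the date
print(get_item("8h7ggfs5", "2024-08-27"))  # None: wrong date, no item
print(query("8h7ggfs5"))  # works without the date
```

Because id is unique, each "partition" here holds at most one item, so the sort key never orders anything; it only adds a value you must supply on every lookup. With id alone as the primary key, a GetItem on the id would be enough.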