Let us say that we have an AWS DynamoDB table called Table. Below is the JSON representation of the data we store in Table:
{
    "attr1": <attribute 1>,
    "attr2": <attribute 2>,
    "attr3": {
        "sub_attr1": <sub-attribute 1>,
        "sub_attr2": <sub-attribute 2>,
        "sub_attr3": <sub-attribute 3>
    }
}
Now I want to be able to use these sub-attributes as filters. In other words, I want to be able to select one or more values for these sub-attributes, and retrieve all items from this table whose respective sub-attributes hold those values.
For example, let us say that we wanted to filter based on the value of sub_attr2 being value2. In Python, one way to do this is the following:
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource('dynamodb').Table('Table')

# Keep only items where the nested attribute attr3.sub_attr2 equals 'value2'
data = table.scan(
    FilterExpression=Attr('attr3.sub_attr2').eq('value2')
)
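Filtering on more than one sub-attribute at once extends the same pattern by combining conditions with &. A minimal sketch, assuming hypothetical values value1 and value2:

# Hypothetical: require sub_attr1 == 'value1' AND sub_attr2 == 'value2'
data = table.scan(
    FilterExpression=(
        Attr('attr3.sub_attr1').eq('value1')
        & Attr('attr3.sub_attr2').eq('value2')
    )
)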
However, the problem with this approach is that using scan can be expensive in terms of read capacity consumption, because it reads every item in the entire table and then discards the unwanted data.
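To make the cost concrete: each scan call returns at most 1 MB of data, so retrieving all matches means paginating with ExclusiveStartKey, and read capacity is charged for every item examined, not just the items that pass the filter. A minimal pagination sketch:

# Scan the whole table page by page; capacity is consumed for every
# item read, whether or not it passes the FilterExpression.
items = []
response = table.scan(FilterExpression=Attr('attr3.sub_attr2').eq('value2'))
items.extend(response['Items'])
while 'LastEvaluatedKey' in response:
    response = table.scan(
        FilterExpression=Attr('attr3.sub_attr2').eq('value2'),
        ExclusiveStartKey=response['LastEvaluatedKey'],
    )
    items.extend(response['Items'])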
A better alternative to scan is query; however, the query operation in DynamoDB is designed to work with primary key attribute values and cannot be used to filter directly on nested attributes. To be able to use query, we would first need to flatten the data by promoting all of these nested sub-attributes to top-level attributes, and create a secondary index on each promoted attribute.
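For concreteness, here is a minimal sketch of what the read side of that flattened design might look like, assuming a hypothetical global secondary index named sub_attr2-index whose partition key is the promoted attribute sub_attr2:

from boto3.dynamodb.conditions import Key

# Hypothetical: sub_attr2 has been promoted to a top-level attribute and
# serves as the partition key of the GSI 'sub_attr2-index'.
data = table.query(
    IndexName='sub_attr2-index',
    KeyConditionExpression=Key('sub_attr2').eq('value2')
)

Unlike scan, a query against the index reads only the matching items, so capacity consumption scales with the size of the result rather than the size of the table.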
My question is: are there other ways we can go about this? In other words, is there any way we can keep the table design as proposed, without flattening it, and still filter the contents of Table in a quick, efficient, and cost-effective manner?
Thanks in advance for all your responses.