Background
I have a complex analytics application using conventional relational setup where different entities are updated using a CRUD model.
However, our system is quite event driven, and it’s becoming a real misery to manage history, and apply data migrations respecting that history.
So we’re moving toward a quasi-noSQL approach in which I’m using JSON to represent data with a linked list of patches representing the history.
As far as data structure goes it’s a very elegant solution and will work well with other services. As far as django goes, I’m now struggling to use the ORM.
My database is PostgreSQL.
Situation
Previously I had something like (pseudocode):
class Environment(models.Model):
id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
class Analysis(models.Model):
created = models.DateTime(auto_now_add=True)
environment = models.ForeignKey(
"myapp.Environment",
blank=True,
null=True,
related_name="analyses",
)
everything_else = models.JSONField()
Whereas I now will have:
class Environment(models.Model):
id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
class Analysis(models.Model):
created = models.DateTime(auto_now_add=True)
everything = models.JSONField()
Where everything
might be a JSON object looking like…
{
"environment": "2bf94e55-7b47-41ad-b81a-6cce59762160",
"other_stuff": "lots of stuff"
}
Now, I can still totally get the Environment
for a given Analysis
because I have its ID, but if I do something like this (again pseudocode):
for analysis in Analysis.objects.filter(created=today).all():
print(Environment.objects.get(id=analysis.everything['environment'])
I of course get an N+1
error because I’m querying the database for the environment on every iteration.
In a simple script I could put all the ids in a list then do a single query for them all… but that doesn’t work well with all of djangos builtins like API serialiser methods and admin methods. I really want to be able to modify the queryset to get what I want.
Question
Given a queryset like Analysis.objects.filter(created=today).all()
, how can I adjust this queryset to pre-fetch the related environment objects, now that their IDs are in a JSONField?