Quick disclaimer that I am new to MongoDB.
So, I have a collection called ‘Films’ where documents have the following structure:
_id: ObjectID('6651be669853a31636ac8f5e')
adult: "False"
original_language: "en"
genres: "[{'id': 68, 'name': 'Action'}, {'id': 21, 'name': 'War'}]"
...
I am interested in the genres field, which is an array of embedded documents, where each has an ‘id’ and a ‘name’ field. Note, that there can be more genres than 2. I want to get a list of all the distinct genre names in my Films collection. Based on similar questions I found online, I figured I have to use an aggregation pipeline, so I tried the following: (I am using Python in Sublime Text):
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi
uri = "..."
client = MongoClient(uri, server_api=ServerApi('1'))
db = client['FilmDB']
pipeline = [
{"$unwind": "$genres"},
{"$group": {"_id": "$genres.name"}},
{"$project": {"_id": 0, "name": "$_id"}}
]
distinct_genres_cursor = db["Films"].aggregate(pipeline)
distinct_genres = [genre['name'] for genre in distinct_genres_cursor]
print(distinct_genres)
However, I get [None]
as output. I have tried a couple of different but similar methods, and I keep getting an empty list for distinct_genres.
What am I doing wrong? Any help would be greatly appreciated!