My collection with fields:
[
{
"_id": xxx,
"field": {
"subfield": [
{
"k1": "a",
"k2": "b",
...
},
{
"k1": "a",
"k2": "b",
...
},
{
"k1": "c":
"k2": "d",
},
...
]
},
"kn": "z",
...
}
]
If subfield.0.k1
is a
then subfield.0.k2
will always be b
, etc.
I am trying to get the distinct values of k1
along with their value counts, and then the unique (singular) value of k2
that is associated with k1
. But due to the amount of documents and variable-length arrays, trying to $unwind
the matched documents in an aggregation pipeline kills the process or maxes out object size.
I managed to work within limits to retrieve value counts in PyMongo:
from collections import Counter
from pymongo import MongoClient
db = MongoClient(...).get_database(...).get_collection(...)
query = [
{"$match": {"kn": "z"}},
{"$group": {"_id": "$field.subfield.k1", "count": {"$sum": 1}}}
]
results = list(db.aggregate(query)) # this returns a list of dicts where grouped unique _id is an array
results = [d["_id"] * d["count"] for d in results] # expanding these list _ids enables us to get value counts
results = [element for sublist in results for element in sublist]
results = Counter(results)
I’m looking for a way to find the unique value of k2
associated with each value of k1
. I could find_one()
each returned counter value, but this is too time-consuming.
3
You can use an aggregation pipeline with $reduce
operator.
- Filter documents by condition kn: “z” using the
$match
operator. - Project the subfield array from object using the
$project
operator. - Map the subfield array to get only k1 and k2 again using the
$project
operator. - Group documents into one array using the
$group
operator. - Accumulate counts of k1 while ensuring the unique association with k2 using the
$reduce
operator. - Convert accumulated object back to an array using the
$objectToArray and $unwind
operator. - Project fields k1, count, and k2 using the
$project
operator.
db.collection.aggregate([
{
$match: {
kn: "z"
}
},
{
$project: {
subfield: "$field.subfield"
}
},
{
$project: {
subfield: {
$map: {
input: "$subfield",
as: "sf",
in: {
k1: "$$sf.k1",
k2: "$$sf.k2"
}
}
}
}
},
{
$group: {
_id: null,
subfield: { $push: "$subfield" }
}
},
{
$project: {
subfield: {
$reduce: {
input: "$subfield",
initialValue: {},
in: {
$mergeObjects: [
"$$value",
{
$arrayToObject: {
$map: {
input: "$$this",
as: "sf",
in: [
"$$sf.k1",
{
count: {
$add: [
{ $ifNull: ["$$value.$$sf.k1.count", 0] },
1
]
},
k2: "$$sf.k2"
}
]
}
}
}
]
}
}
}
}
},
{
$project: {
subfield: {
$objectToArray: "$subfield"
}
}
},
{
$unwind: "$subfield"
},
{
$project: {
k1: "$subfield.k",
count: "$subfield.v.count",
k2: "$subfield.v.k2"
}
}
]);
1