I have a list of dicts.
Each dict has the same keys with different values: a question code mapped to its answer.
Question “a”, for example, can be answered on a scale of 1-7. If you do not want to answer a question, it is recorded as “X” in the answer sheet.
Example:
[
{"a":1,"b":2,"c":"X"},
{"a":1,"b":"X","c":3},
{"a":1,"b":2,"c":"X"}
]
My goal is to find out whether any question has fewer than 5 or more than 50 valid answers. A valid answer is anything except “X”.
So I need to count how often each item is not equal to “X”.
In this case, “a” would be 3, “b” would be 2 and “c” would be 1.
I could loop through it.
Make a counter for each item in the dicts. Then loop through the list, inside that loop through each dict and add 1 to the matching counter. Afterwards, check whether any counter is < 5 or > 50.
But there is probably some sweet Python code (as always) that does this in about 3 lines, while my loop would probably be 5 times longer.
And it would be faster. I have to repeat that check a few thousand times, so speed is kind of important. Not super duper important, but nice to have.
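For reference, the plain loop described above could look roughly like this. It is only a sketch: the names count_valid, answers and out_of_range are made up for illustration, and it assumes “X” is the only marker for a skipped question.

```python
def count_valid(answers, skip="X"):
    """Count, per question code, how many answers are not the skip marker."""
    counts = {}
    for sheet in answers:
        for code, value in sheet.items():
            # Start every code at 0 so codes with no valid answers still appear.
            counts.setdefault(code, 0)
            if value != skip:
                counts[code] += 1
    return counts


answers = [
    {"a": 1, "b": 2, "c": "X"},
    {"a": 1, "b": "X", "c": 3},
    {"a": 1, "b": 2, "c": "X"},
]
counts = count_valid(answers)
print(counts)  # {'a': 3, 'b': 2, 'c': 1}

# Question codes with fewer than 5 or more than 50 valid answers.
out_of_range = [code for code, n in counts.items() if n < 5 or n > 50]
print(out_of_range)  # ['a', 'b', 'c']
```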
4
As everything is loaded in memory, I would just iterate over the keys, counting the number of 'X' values per key. Assuming your list of dictionaries is ld, I would try:
keys = ld[0].keys() # ignore this if you already have a list of the keys
missing = {k: sum(d[k] == 'X' for d in ld) for k in keys}
With your sample data, it gives as expected:
{'a': 0, 'b': 1, 'c': 2}
Of course this only makes sense if no dictionary after the first one contains a new key…
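The dictionary above counts the missing answers, while the question asks about valid ones. A minimal sketch of the flipped comparison plus the range check, assuming the thresholds 5 and 50 from the question. The names valid and flagged are illustrative, and using d.get(k, "X") to treat a key that is absent from a sheet as unanswered is an assumption, not something the question specifies.

```python
ld = [
    {"a": 1, "b": 2, "c": "X"},
    {"a": 1, "b": "X", "c": 3},
    {"a": 1, "b": 2, "c": "X"},
]
keys = ld[0].keys()

# Assumption: a key missing from a sheet counts as unanswered ("X").
valid = {k: sum(d.get(k, "X") != "X" for d in ld) for k in keys}

# Keep only the questions whose valid-answer count is below 5 or above 50.
flagged = {k: n for k, n in valid.items() if n < 5 or n > 50}

print(valid)    # {'a': 3, 'b': 2, 'c': 1}
print(flagged)  # {'a': 3, 'b': 2, 'c': 1}
```

An empty flagged dict then means every question is within range.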
1
There are several solutions. You can use collections.Counter and iterate through the items in your list, checking each value against 'X'.
from collections import Counter
data = [
{"a":1,"b":2,"c":"X"},
{"a":1,"b":"X","c":3},
{"a":1,"b":2,"c":"X"}
]
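# Count valid answers (anything except 'X') per question code.
# Note: a code with no valid answers at all will not appear in the Counter.
counts = Counter(k for d in data for k, v in d.items() if v != 'X')
for k, v in sorted(counts.items(), key=lambda tup: tup[1]):
    if v < 5:
        print(f'{k} has fewer than 5 valid responses ({v})')
    if v > 50:
        print(f'{k} has more than 50 valid responses ({v})')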
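One thing to watch with a Counter fed from a filtered generator: a key that never passes the filter is simply absent from the result, so it cannot show up in a "less than 5" check. A possible workaround, sketched here with made-up sample data, is to seed the Counter with zeros for every key first and then update it.

```python
from collections import Counter

# Hypothetical sample in which question "b" was never answered.
data = [
    {"a": 1, "b": "X", "c": "X"},
    {"a": 2, "b": "X", "c": 3},
]

# Seed every question code with 0 so codes that never match are kept.
counts = Counter({k: 0 for d in data for k in d})
# Add one per valid (non-"X") answer.
counts.update(k for d in data for k, v in d.items() if v != "X")

print(dict(counts))  # {'a': 2, 'b': 0, 'c': 1}
```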
If you want a sledgehammer, you could use pandas: load the data into a data frame and then sum where the values are not equal to 'X'.
import pandas as pd
df = pd.DataFrame(data)
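# Count valid answers (values not equal to 'X') per column.
counts = df.ne('X').sum(axis=0).sort_values()
print('Keys with fewer than 5 valid responses:')
# repr() of a Series ends with a dtype line; drop it before printing.
print('\n'.join(repr(counts[counts.lt(5)]).split('\n')[:-1]))
print('Keys with more than 50 valid responses:')
print('\n'.join(repr(counts[counts.gt(50)]).split('\n')[:-1]))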
1