I have a DataFrame loaded from a large Excel sheet of patient outcome data, with 20+ columns and thousands of rows. I would like to compute statistics based on multiple criteria. Specifically, I would like to count all the samples that fulfil many criteria at once (10+, not only one or two), including criteria that must not match.
The data looks something like this, but with many more rows and columns.
Genepanel | Causative | Fenotype | Gender |
---|---|---|---|
PID | Negative | SCID | Male |
IBMFS | ETV6 | Trombocytopenia | Female |
IBMFS | Negative | Neutropenia | Female |
PID | Negative | Hypogamma | Male |
For example I would like to know how many patients fulfil the criteria:
Genepanel == PID, Causative != Negative, Gender == Male and Fenotype == Hypogamma
So for this example, this equals 1
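A minimal sketch of how the example criteria could be written as a boolean mask in pandas. The DataFrame name `df` and the variable names are assumptions for illustration, and the four rows are copied from the table above. On those four rows this exact combination matches no row, because both PID rows have Causative equal to Negative; with Causative == Negative instead, the last row matches and the count is 1.

```python
import pandas as pd

# Sample rows copied from the table above; the real data would come from the Excel sheet.
df = pd.DataFrame({
    "Genepanel": ["PID", "IBMFS", "IBMFS", "PID"],
    "Causative": ["Negative", "ETV6", "Negative", "Negative"],
    "Fenotype": ["SCID", "Trombocytopenia", "Neutropenia", "Hypogamma"],
    "Gender": ["Male", "Female", "Female", "Male"],
})

# One comparison per criterion; & keeps only the rows where every comparison is True.
# Each comparison needs its own parentheses because & binds tighter than == and !=.
mask = (
    (df["Genepanel"] == "PID")
    & (df["Causative"] != "Negative")
    & (df["Gender"] == "Male")
    & (df["Fenotype"] == "Hypogamma")
)
count = int(mask.sum())  # True counts as 1, so the sum is the number of matching rows

# Same criteria, but with Causative required to be Negative.
mask_negative = (
    (df["Genepanel"] == "PID")
    & (df["Causative"] == "Negative")
    & (df["Gender"] == "Male")
    & (df["Fenotype"] == "Hypogamma")
)
count_negative = int(mask_negative.sum())

print(count, count_negative)
```

`df[mask]` would return the matching rows themselves rather than just the count.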
I have read so many different suggestions using groupby() and value_counts(), but can’t figure out how to combine multiple criteria in an easy way.
Since the criteria change often, I would like to write code that is easy to modify depending on the criteria.
I am a newbie to all programming, so the simpler the better.
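One possible way to keep the criteria easy to change is to store them in two dictionaries (values that must match, values that must not match) and loop over them. This is only a sketch under assumptions: the helper name `count_matching` and the dictionary names `must_equal` and `must_not_equal` are hypothetical, and the four sample rows stand in for the real sheet.

```python
import pandas as pd

def count_matching(df, must_equal=None, must_not_equal=None):
    """Count rows that match every must_equal entry and none of the must_not_equal entries."""
    # Start with every row selected, then narrow the selection one criterion at a time.
    mask = pd.Series(True, index=df.index)
    for column, value in (must_equal or {}).items():
        mask &= df[column] == value
    for column, value in (must_not_equal or {}).items():
        mask &= df[column] != value
    return int(mask.sum())

# Sample rows copied from the question's table (stand-in for the real Excel data).
df = pd.DataFrame({
    "Genepanel": ["PID", "IBMFS", "IBMFS", "PID"],
    "Causative": ["Negative", "ETV6", "Negative", "Negative"],
    "Fenotype": ["SCID", "Trombocytopenia", "Neutropenia", "Hypogamma"],
    "Gender": ["Male", "Female", "Female", "Male"],
})

# To change the criteria, only these two dictionaries need editing.
must_equal = {"Genepanel": "PID", "Gender": "Male", "Fenotype": "Hypogamma"}
must_not_equal = {"Causative": "Negative"}
n_example = count_matching(df, must_equal, must_not_equal)

# A second set of criteria: IBMFS patients with a causative finding.
n_ibmfs_solved = count_matching(df, {"Genepanel": "IBMFS"}, {"Causative": "Negative"})

print(n_example, n_ibmfs_solved)
```

Adding an eleventh or twelfth criterion then means adding one more `"Column": "value"` entry to the relevant dictionary, with no change to the function itself.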