I have a dataframe with thousands of rows and around 20 columns. A simplified version looks like this:
import pandas as pd

d = {'PIN': [445588, 445588, 445588, 668989, 668989, 212121], 'LID': [124, 124, 125, 625, 625, 325], "Results": [8, 5, 3, 23, 11, 45]}
df = pd.DataFrame(data=d)
Each PIN occurs many times, and for each PIN the same LID can also occur many times, but with different values in the other columns (in this simplified version, just Results).
I want to:
- Count the number of unique LID values for each PIN.
- Extract the rows for each PIN into a new dataframe that can be exported to Excel. In the end I want a separate file for each PIN, with all of its LID values on separate rows and all of the original columns included; basically, split the dataframe into one piece per unique PIN.
I managed (with some help from some of you) to count the number of unique LID values for each PIN:
counts = df.groupby("PIN")
counts = counts.agg({"LID": "nunique"}).reset_index()
print(counts)
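For the simplified example above, that prints something like this (groupby sorts on the key by default):

      PIN  LID
0  212121    1
1  445588    2
2  668989    1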
But how can I extract all the data into separate files? This is what I tried:
df2 = pd.DataFrame(df.groupby(['PIN', 'LID']), columns=["Results"])
df2.to_excel("alla_LID_data.xlsx")
The output file makes no sense…
Is it better to use some sort of split, or a loop that creates the separate files, instead of grouping on the unique PIN values?
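Something like the loop below is roughly what I had in mind (just a sketch against the simplified df above; the PIN_<value>.xlsx file names are placeholders), but I am not sure it is the right approach:

# split on PIN and write each group to its own Excel file, keeping all columns
for pin, group in df.groupby("PIN"):
    group.to_excel(f"PIN_{pin}.xlsx", index=False)

Or is there a neater way to do this directly from the groupby?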