I am working with a dataset in which each individual's education level is encoded as a number (e.g. 3 for an associate degree). To make the categories easier to read, I decided to replace each number with the label it represents, using the replace function in pandas. However, the replacement is not consistent: it replaced the 0 and 1 used for gender but failed to replace all of the education levels. I have attached my code below and would appreciate any help.
`# Rename columns for clarity
dataset_clean2 = dataset_clean.rename({"D4": "EDUCATION", "D5RANGE": "AGE"}, axis='columns')
# Replace Numbers with Categories For Simplicity
dataset_clean2['BORNUSA'].replace({1: 'YES', 0: 'NO'}, inplace=True)
dataset_clean2["EDUCATION"].replace({
    1: 'NOT HS GRAD',
    2: 'HS GRAD',
    3: 'ASSOCIATES',
    4: 'UNDERGRAD',
    5: 'GRAD'
}, inplace=True)
dataset_clean2['GENDER'].replace({
    0: 'F',
    1: 'M'
}, inplace=True)
dataset_clean2['TOTGRNG'].replace({
    0: "NONE",
    1: "1-49 USD",
    2: "50-99 USD",
    3: "100-199 USD",
    4: "200-299 USD",
    5: "300-399 USD",
    6: "400-499 USD",
    7: ">500 USD"
}, inplace=True)`
I have also attached a screenshot of the output I am receiving.
I also tried str.replace, which did replace the terms. However, for some entries that did have a value, it returned NaN.
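For reference, here is a minimal sketch that reproduces similar symptoms. It rests on an assumption that is not confirmed by the real data: that the EDUCATION column holds a mix of integers and numeric strings. The DataFrame `df` and the names `labels`, `partial`, `via_str`, and `fixed` are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical data: the real column contents are not shown in the question.
# Assumption: EDUCATION mixes integers with numeric strings.
df = pd.DataFrame({"EDUCATION": [1, "2", 3, "4", 5]})

labels = {
    1: "NOT HS GRAD",
    2: "HS GRAD",
    3: "ASSOCIATES",
    4: "UNDERGRAD",
    5: "GRAD",
}

# Integer keys only match integer values, so the strings "2" and "4"
# are left untouched and the replacement looks inconsistent.
partial = df["EDUCATION"].replace(labels)

# The .str accessor yields NaN for elements that are not strings,
# which would explain NaN appearing where a value did exist.
via_str = df["EDUCATION"].str.replace("2", "HS GRAD", regex=False)

# Converting everything to a numeric dtype first, then mapping,
# labels every row in this toy example.
fixed = pd.to_numeric(df["EDUCATION"], errors="coerce").map(labels)

print(df["EDUCATION"].map(type).value_counts())  # shows the mix of types
print(partial.tolist())
print(via_str.tolist())
print(fixed.tolist())
```

If the mixed-type assumption holds for the real dataset, checking `dataset_clean2["EDUCATION"].map(type).value_counts()` should show more than one type in the column.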