I have a CSV file containing measurements of sample objects. I’ve been able to filter the list using Pandas and the Python statistics module. I then want to exclude objects that are more than 2 standard deviations below or above the mean and save the final list to a new CSV file. Printouts to the shell confirm the filtering and statistical calculations are working as expected. However, the new CSV file still contains all of the original objects.
I’ve tried researching the issue but haven’t identified what error I’m making or how to fix it. I would appreciate any assistance.
The following is my code:
import statistics
import pandas as pd
import csv
data = pd.read_csv('/Users/myname/Documents/Data/MyData.csv')
count_row = data.shape[0] # Gives number of rows (items)
print ("N =", count_row)
mean = data["Area"].mean().round(2)
print ("Average = ", mean)
sd = data["Area"].std().round(2)
print("SD =", sd)
Lower2SD = mean - (2*sd)
print ("Mean - 2SD =","%.2f" % Lower2SD)
Upper2SD = mean + (2*sd)
print ("Mean + 2SD =","%.2f" % Upper2SD)
print ("Mean +- 2SD = ","%.2f" %Lower2SD, "to", "%.2f" % Upper2SD)
new =list(filter(lambda x: x<Lower2SD, data["Area"]))
print()
print("Value(s) of specimens below the Mean -2SD: ")
print(new)
new =list(filter(lambda x: x>Upper2SD, data["Area"]))
print()
print("Value(s) of specimens exceeding the Mean +2SD: ")
print(new)
adjusted=list(filter(lambda x: x>Lower2SD or x<Upper2SD, data["Area"]))
print()
print("Value(s) of specimens within 2SD: ")
print(adjusted)
global header
header = ["Area"]
with open('/Users/myname/Documents/Data/AdjustedData.csv', 'a', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    for i in adjusted:
        writer.writerow([i])
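For reference, here is a minimal sketch of the behaviour I am aiming for, not a confirmed fix. It rests on two guesses about the code above: that the "within 2SD" test needs `and` rather than `or` (any number is either above the lower bound or below the upper bound, so `or` would keep everything), and that opening the output file in `'a'` mode appends to earlier runs instead of replacing them. The sample values, the helper name `within_two_sd`, and the in-memory buffer are hypothetical stand-ins for my data and file, and the sketch uses the standard `statistics` module in place of Pandas so it runs on its own.

```python
import csv
import io
import statistics


def within_two_sd(values):
    """Return the values lying strictly within mean +/- 2 sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1), the same default as pandas .std()
    lower, upper = mean - 2 * sd, mean + 2 * sd
    # "and" keeps only values between the bounds; "or" would be true for every value
    return [x for x in values if x > lower and x < upper]


# Hypothetical measurements standing in for the "Area" column
areas = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 50.0]
adjusted = within_two_sd(areas)

# Written to an in-memory buffer here; with a real file, mode 'w' (not 'a')
# would replace earlier output instead of appending to it.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Area"])
for value in adjusted:
    writer.writerow([value])

print(buffer.getvalue())
```

With these sample values the mean is 14.0 and the sample SD is roughly 12.66, so only the 50.0 reading falls outside the range and is left out of the output.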