A way around the “None of [‘index’] are in the columns” error without changing the read_csv line, since changing it results in “C error: Expected 15 fields in line 6, saw 16”

I keep encountering the same error: KeyError: "None of ['site'] are in the columns"

I’m going to share my entire code, because I don’t know whether my read_csv line has been causing problems all along; perhaps I have been unknowingly troubleshooting them.

If any trained eyes out there have time to go through it, I would be eternally grateful.

For context: C, H, E, G, and D are species types, and all the sites are in Scotland. I use ‘site’ and ‘county’ pretty interchangeably, which will need amending, and the dataset was imported from https://opendata.nature.scot/datasets/snh::waxcap-sites/explore?location=55.056158%2C1.905751%2C5.00&showTable=true. I suck at coding, but I want to be an employable environmental scientist, so I’m trying my best 😀

import pandas as pd
import numpy as np
from collections import Counter
import spacy
import matplotlib.pyplot as plt
import collections, functools, operator 
import itertools
from functools import reduce
import pyproj

nlp = spacy.load('en_core_web_sm')
dataframe = pd.read_csv('Grassland_Fungi.csv', low_memory=False)

countycolumn = dataframe.iloc[:,6]
indicatorscolumn = dataframe.iloc[:,33]
C = dataframe.iloc[:,11]
H = dataframe.iloc[:,12]
E = dataframe.iloc[:,13]
G = dataframe.iloc[:,14]
D = dataframe.iloc[:,15]

# Function to remove decimal error from 'East Ross.'
def EastRoss(countycolumn):
    for error in countycolumn:
        county = error.replace(".","")
        yield county

# Function to count site occurrences
def county_occurrence(countycolumn):
    print('County Occurrence')
    countylist = []
    for county in countycolumn:
        countylist.append(county)
    a = Counter(countylist).keys()
    b = Counter(countylist).values()
    alist = []
    blist = []
    for a, b in zip(a, b):
        alist.append(a)
        blist.append(b)
    df_SO = pd.DataFrame(list(zip(alist, blist)), columns = ['site', 'number'])
    sorted = df_SO.sort_values('site')
    return sorted


# Function to create dataframe presenting indicators by county
def indicatorcount(countylist, indicatorlist):
    print('Sum of indicators per county')
    df = pd.DataFrame(list(zip(countylist, indicatorlist)),
                      columns=['site', 'indicators'])
    df2 = df.groupby('site').sum()
    return df2

# Function to sum CHEGD instances per county
def CHEGD_funct(countylist, Clist, Hlist, Elist, Glist, Dlist):
    print('Sum of C H E G D frequency per county')
    Cdict = pd.DataFrame(list(zip(countylist, Clist)), columns = ['site', 'C'])
    Hdict = pd.DataFrame(list(zip(countylist, Hlist)), columns = ['site', 'H'])
    Edict = pd.DataFrame(list(zip(countylist, Elist)), columns = ['site', 'E'])
    Gdict = pd.DataFrame(list(zip(countylist, Glist)), columns = ['site', 'G'])
    Ddict = pd.DataFrame(list(zip(countylist, Dlist)), columns = ['site', 'D'])
    Cdf = Cdict.groupby('site').sum().reset_index()
    Hdf = Hdict.groupby('site').sum().reset_index()
    Edf = Edict.groupby('site').sum().reset_index()
    Gdf = Gdict.groupby('site').sum().reset_index()
    Ddf = Ddict.groupby('site').sum().reset_index()
    dataframes = [Cdf, Hdf, Edf, Gdf, Ddf]
    mergedf = reduce(lambda  left,right: pd.merge(left,right,on=['site'],
                 how='outer'), dataframes).fillna('void')
    return mergedf

# Function to calculate the mean CHEGD frequency per county
def mean_chegd(siteoccurence, CHEGD):
    print("mean CHEGD frequency per county")
    merge = siteoccurence.reset_index(drop=True).merge(
        CHEGD.reset_index(drop=True), how="right")
    cols = ['C', 'H', 'E', 'G', 'D']
    out = (merge[cols].div(merge['number'], 
        axis=0).combine_first(merge).reindex_like(merge)).set_index('site')
    return out




# EastRoss error correct 
countylist = []

for item in EastRoss(countycolumn):
    countylist.append(item)

# Site occurrence 
siteoccurence = county_occurrence(countylist)
print(siteoccurence)


# Indicators per county
indicatorslist = []
for u in indicatorscolumn:
    indicatorslist.append(u)
indicators = indicatorcount(countylist, indicatorslist)
print(indicators)

for item in indicatorcount(countylist, indicatorslist):
    print(item)

# CHEGD per county
Clist = []
Hlist = []
Elist = []
Glist = []
Dlist = []
for c in C:
    Clist.append(c)
for h in H:
    Hlist.append(h)
for e in E:
    Elist.append(e)
for g in G:
    Glist.append(g)
for d in D:
    Dlist.append(d)
CHEGD = CHEGD_funct(countylist, Clist, Hlist, Elist, Glist, Dlist)
print(CHEGD)

# Mean CHEGD per county
mean = mean_chegd(siteoccurence, CHEGD)
print(mean)

# Prepare average for visualisation
average = mean.drop('number', axis=1).T
print(average)

average.columns = average.columns.str.strip()
average.columns = [col.replace("-", "") for col in average.columns]
average.set_index('site',inplace=True)

This last line, average.set_index('site',inplace=True), is where I get KeyError: "None of ['site'] are in the columns".
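A minimal sketch of what seems to be happening on that last line, using made-up numbers for two sites (the values and the two-column subset are assumptions, not the real data). Because `mean_chegd` ends with `.set_index('site')`, 'site' is the index of `mean`; after `.T` the site names become the column labels, and 'site' survives only as the name of the column axis, which `set_index` cannot select.

```python
import pandas as pd

# Hypothetical stand-in for `mean` from the script: one row per site,
# with 'site' as the index (mean_chegd ends with .set_index('site')).
mean = pd.DataFrame(
    {
        "site": ["Angus", "Argyll"],
        "number": [33, 121],
        "C": [0.606061, 0.582645],
        "H": [2.787879, 4.111570],
    }
).set_index("site")

# Same step as the script: drop 'number', then transpose.
average = mean.drop("number", axis=1).T

# After .T the site names are the column labels; 'site' is only the
# *name* of the column axis, not a column that can be selected.
print(average.columns.name)       # site
print("site" in average.columns)  # False

try:
    average.set_index("site")
except KeyError as err:
    print("KeyError:", err)
```

If that is the cause, one option is to drop the `set_index` call, since the sites are already the column labels; another, if a regular 'site' column is wanted, is `mean.drop('number', axis=1).reset_index()` before any transposing. Which fits better depends on the layout the plot needs.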

For a reproducible example, this is what the dataframe looks like at this point:

Site A B C D
Blue 4 13 9 11
Green 1 12 30 20
Yellow 12 2 3 3
Red 20 14 4 0
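The table above can be rebuilt directly for a self-contained reproduction. One detail worth checking, assuming the header really is capitalised as shown: column lookups in pandas are case-sensitive, so `set_index('Site')` works on this frame while `set_index('site')` raises the same KeyError.

```python
import pandas as pd

# The example table from the question, with 'Site' as an ordinary column.
df = pd.DataFrame(
    {
        "Site": ["Blue", "Green", "Yellow", "Red"],
        "A": [4, 1, 12, 20],
        "B": [13, 12, 2, 14],
        "C": [9, 30, 3, 4],
        "D": [11, 20, 3, 0],
    }
)

print(df.columns.tolist())      # ['Site', 'A', 'B', 'C', 'D']
indexed = df.set_index("Site")  # works: 'Site' is a real column
print(indexed.index.name)       # Site

# Lookups are case-sensitive, so lowercase 'site' is not found.
try:
    df.set_index("site")
except KeyError as err:
    print("KeyError:", err)
```

Printing `df.columns.tolist()` and `df.index.name` just before a failing `set_index` is a quick way to see whether the label is a column, the index, or neither.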

I have tried converting it into a dictionary to see whether ‘site’ is recognised as the index, which gives me this output (sorry, it’s not from the same example):

(Imagine C is consistently on the same line as the name, and H, E, G and D are each on their own new line.)

{'Angus': C    0.606061
H    2.787879
E    0.757576
G    0.000000
D    0.000000
Name: Angus, dtype: float64, 'Angus / East Perthshire': C    1.0
H    8.0
E    3.0
G    0.0
D    0.0
Name: Angus / East Perthshire, dtype: float64, 'Argyll': C    0.582645
H    4.111570
E    0.367769
G    0.280992
D    0.028926
Name: Argyll, dtype: float64, 'Ayrshire': C    0.702970
H    3.326733
E    0.673267
G    0.168317
D    0.089109
Name: Ayrshire, dtype: float64, 'Banffshire': C    0.241379
H    2.965517
E    0.655172
G    0.000000
D    0.000000
Name: Banffshire, dtype: float64}

which looks very, very wrong to me, because the ‘Name:’ for one entry (Angus / East Perthshire) seems to contain two column names.
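A small sketch, assuming the dictionary was produced with something like `average.to_dict(orient='series')` (the exact call isn't shown above). Each value is a Series holding one column, and the `Name:` line pandas prints under a Series is just its `.name`, i.e. the column label it came from. So `Name: Angus / East Perthshire` looks like a single site label that happens to contain a slash, rather than two column names.

```python
import pandas as pd

# Two columns from the output above: sites as column labels and
# C/H as the row index (the layout `average` has after the transpose).
average = pd.DataFrame(
    {
        "Angus": [0.606061, 2.787879],
        "Angus / East Perthshire": [1.0, 8.0],
    },
    index=["C", "H"],
)

# One Series per column; each Series is named after its column label.
as_dict = average.to_dict(orient="series")

for key, series in as_dict.items():
    # The 'Name:' shown when a Series is printed is this .name attribute.
    print(repr(key), "->", repr(series.name))
```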

I have tried:

dataframe = pd.read_csv('Grassland_Fungi.csv', header=0, delim_whitespace=True, low_memory=False)

and a few other variations, but I get this same ugly error:

Traceback (most recent call last):
  File "/Users/macbook/Desktop/mushrooms/mushrooms.py", line 16, in <module>
    dataframe = pd.read_csv('Grassland_Fungi.csv', header=0, delim_whitespace=True, low_memory=False)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 617, in _read
    return parser.read(nrows)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1748, in read
    ) = self._engine.read( # type: ignore[attr-defined]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 239, in read
    data = self._reader.read(nrows)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "parsers.pyx", line 825, in pandas._libs.parsers.TextReader.read
  File "parsers.pyx", line 913, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 890, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2058, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 15 fields in line 6, saw 16

Hmm, line 6 of my script is an import, and I’ve tried commenting (hashing) it out. I understand that removing the header removes a field, but I’m not sure where to adjust for that in my code.
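The "line 6" in a pandas ParserError is counted in the CSV file being parsed, not in the Python script. A rough way to see what the parser sees is to count the fields on each line of the file under both splitting rules. The rows below are hypothetical; with the real data one would open 'Grassland_Fungi.csv' instead and look at line 6.

```python
import csv
import io

# Hypothetical stand-in for open('Grassland_Fungi.csv', newline='');
# the real file has more columns, but the idea is the same.
sample = io.StringIO(
    "id,county,C\n"
    "1,Angus,0\n"
    "2,East Ross,1\n"
)

counts = []
for line_number, line in enumerate(sample, start=1):
    text = line.rstrip("\n")
    comma_fields = next(csv.reader([text]))  # how the default sep=',' splits it
    whitespace_fields = text.split()         # roughly how delim_whitespace=True splits it
    counts.append((line_number, len(comma_fields), len(whitespace_fields)))
    print(line_number, len(comma_fields), len(whitespace_fields))
```

With commas every line has the same number of fields, but splitting on whitespace turns "East Ross" into two fields, which is the kind of mismatch "Expected N fields in line 6, saw N+1" describes.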

Will lots of my code need amending? And if so, is there a way around it without changing the read_csv (import) line?

Is ‘C error’ referencing my C index?
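On the "C error" question: the prefix refers to pandas' default C parsing engine (`engine='c'`), not to the C column in the data. The sketch below reproduces the same kind of error on a tiny made-up CSV (the rows are assumptions). Read with the default comma separator it parses cleanly; read with `sep=r'\s+'` (equivalent to `delim_whitespace=True`, which pandas 2.2 deprecates) the space in "East Ross" creates an extra field.

```python
import io

import pandas as pd

# Hypothetical three-line CSV: a header plus two rows. "East Ross"
# contains a space, like some county names in the real file.
csv_text = "id,county\n1,Angus\n2,East Ross\n"

# Default comma separator: parses fine into 2 rows x 2 columns.
ok = pd.read_csv(io.StringIO(csv_text))
print(ok.shape)  # (2, 2)

# Whitespace separator: "id,county" and "1,Angus" are one field each,
# but "2,East Ross" splits into two, so the C tokenizer stops on line 3.
message = ""
try:
    pd.read_csv(io.StringIO(csv_text), sep=r"\s+")
except pd.errors.ParserError as err:
    message = str(err)
print(message)
```

If the same thing is happening in the real file, keeping the original comma-separated `read_csv` call (without `delim_whitespace=True`) would avoid the ParserError, and the KeyError would then need fixing where it occurs, at the `set_index` line.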

Thanks again, and sorry if this is structured terribly.
