I’m getting a TypeError in my Python code when processing DST data downloaded from a URL with pandas. Here’s the relevant snippet:
import pandas as pd

url_jan = "https://wdc.kugi.kyoto-u.ac.jp/dst_realtime/202401/dst2401.for.request"
data_jan = pd.read_csv(url_jan, sep=r'\s+', header=None)  # delim_whitespace=True is deprecated since pandas 2.2
data_jan.drop(columns=[1, 26], inplace=True)
columns = ['Date'] + list(range(1, 25))
data_jan.columns = columns
data_jan['Mean'] = data_jan.iloc[:,1:25].mean(axis=1)
df_mean = data_jan[['Date', 'Mean']]
What I Tried: I attempted to read DST data from a URL using Pandas, dropped unnecessary columns, and assigned column names where the first column should be ‘Date’ and subsequent columns should be integers from 1 to 24.
Expected Outcome: I expected the column names to be assigned correctly without any errors, allowing me to compute the mean across columns 1 to 24 and create a new DataFrame (df_mean) containing ‘Date’ and ‘Mean’ columns.
Actual Result: However, during the assignment of column names (data_jan.columns = columns), an error occurred: TypeError: can only concatenate str (not "int") to str. This error prevented the column names from being assigned as expected.
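Your table contains a non-numeric trailing row (a timestamp line at the end of the file). The hourly columns it overlaps are therefore read as strings, so the TypeError actually comes from the mean(axis=1) line, not from the column assignment: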
data_jan.iloc[:,1:25].tail()
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
27 13 14 13 17 15 12 13.0 14.0 10.0 16.0 19.0 13.0 17.0 12.0 8.0 7.0 5.0 2.0 -3.0 -7.0 -10.0 -7.0 -1.0 -2.0
28 0 -5 -6 -4 -6 -5 -7.0 -9.0 -10.0 -6.0 -7.0 -10.0 -6.0 2.0 -1.0 -4.0 -5.0 -10.0 -10.0 -10.0 -8.0 1.0 7.0 10.0
29 9 8 2 -2 -5 -9 -11.0 -14.0 -14.0 -12.0 -8.0 -8.0 -8.0 -6.0 -6.0 -8.0 -7.0 -8.0 -9.0 -9.0 -4.0 3.0 1.0 -2.0
30 -1 1 -4 -9 -7 -6 -7.0 -6.0 -9.0 -10.0 -8.0 -4.0 -2.0 -2.0 -3.0 -6.0 -6.0 -7.0 -7.0 -11.0 -10.0 -3.0 -2.0 -1.0
31 Tue Apr 30 15:05:33 UTC 2024] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
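A minimal sketch of the effect, using a small made-up sample (hypothetical: 6 hourly values per day instead of 24, and an invented footer line shaped like the one above). Any hourly column in which the footer places a non-numeric token is parsed as text, while assigning a mix of str and int column labels works without complaint.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: date token, an extra value,
# 6 hourly values (24 in the real file), another extra value, then a text footer.
SAMPLE = """\
DST2401*01RRX020 000 -5 -3 -4 -4 -6 -2 -4
DST2401*02RRX020 000 -9 -11 -11 -10 -8 -11 -10
[generated: Tue Apr 30 15:05:33 UTC 2024]
"""

data = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", header=None)
data.drop(columns=[1, 8], inplace=True)       # analogue of drop(columns=[1, 26])
data.columns = ["Date"] + list(range(1, 7))   # mixed str/int labels are accepted

# Hourly columns where the footer put a non-numeric token are parsed as text,
# so they must be converted (or the footer removed) before mean(axis=1).
non_numeric = [c for c in data.columns[1:]
               if not pd.api.types.is_numeric_dtype(data[c])]
print(data.dtypes)
print("non-numeric hourly columns:", non_numeric)
```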
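Get rid of it, then convert the hourly columns to float, because some of them are still stored as strings:
data_jan = data_jan.head(-1).copy()  # drop the trailing timestamp row; .copy() avoids SettingWithCopyWarning
data_jan['Mean'] = data_jan.iloc[:, 1:25].astype(float).mean(axis=1)
df_mean = data_jan[['Date', 'Mean']]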
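A different approach (not the one above), if you would rather not depend on the footer being exactly one row: keep only lines whose Date starts with "DST" (an assumption based on the output shown here) and coerce the hourly columns with pd.to_numeric(errors="coerce"). This is sketched on a hypothetical 6-hour sample; N_HOURS is an illustrative name and would be 24 for the real file.

```python
import io

import pandas as pd

# Hypothetical sample: 6 hourly values per day instead of 24, plus a text footer.
SAMPLE = """\
DST2401*01RRX020 000 -5 -3 -4 -4 -6 -2 -4
DST2401*02RRX020 000 -9 -11 -11 -10 -8 -11 -10
[generated: Tue Apr 30 15:05:33 UTC 2024]
"""
N_HOURS = 6  # 24 for the real file

data = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", header=None)
data = data.drop(columns=[1, N_HOURS + 2])    # the two columns the question drops
data.columns = ["Date"] + list(range(1, N_HOURS + 1))

# Keep only records whose first token looks like a DST line (assumes every data
# line starts with "DST"); this removes footer lines wherever they appear.
data = data[data["Date"].astype(str).str.startswith("DST")].copy()

# Convert hourly columns to numbers; leftover non-numeric values become NaN
# instead of raising, and mean() skips NaN by default.
hours = list(range(1, N_HOURS + 1))
data[hours] = data[hours].apply(pd.to_numeric, errors="coerce")
data["Mean"] = data[hours].mean(axis=1)
df_mean = data[["Date", "Mean"]]
print(df_mean)
```

The trade-off: a stray non-numeric cell is silently averaged out rather than raising, so check for NaN if that matters for your analysis.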
You could also skip this trailing row while importing from your URL, using the skipfooter option of read_csv:
data_jan = pd.read_csv(url_jan, sep=r'\s+', header=None,
                       skipfooter=1, engine='python')  # skipfooter requires the python engine
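If you need the same processing for several months, the skipfooter variant can be wrapped in a small function. load_dst_month and its n_hours parameter are hypothetical names, and the sketch assumes every monthly file has the same layout with exactly one footer line. It is demonstrated on an in-memory sample rather than the real URL.

```python
import io

import pandas as pd


def load_dst_month(source, n_hours=24):
    """Hypothetical helper: read one monthly file and return Date + daily Mean.

    Assumes each data line is: date token, an extra value, n_hours hourly
    values, another extra value, and that the file ends with exactly one
    non-data footer line. `source` may be a URL or a file-like object.
    """
    data = pd.read_csv(source, sep=r"\s+", header=None,
                       skipfooter=1, engine="python")  # skipfooter needs the python engine
    data = data.drop(columns=[1, n_hours + 2])           # the two columns the question drops
    hours = list(range(1, n_hours + 1))
    data.columns = ["Date"] + hours
    data["Mean"] = data[hours].astype(float).mean(axis=1)
    return data[["Date", "Mean"]]


# Hypothetical sample with 6 hourly values per day instead of 24.
SAMPLE = """\
DST2401*01RRX020 000 -5 -3 -4 -4 -6 -2 -4
DST2401*02RRX020 000 -9 -11 -11 -10 -8 -11 -10
[generated: Tue Apr 30 15:05:33 UTC 2024]
"""
df_mean = load_dst_month(io.StringIO(SAMPLE), n_hours=6)
print(df_mean)
```

With the real file you would call load_dst_month(url_jan), which downloads and parses it in one step.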
Output (df_mean):
Date Mean
0 DST2401*01RRX020 -4.125000
1 DST2401*02RRX020 -10.458333
2 DST2401*03RRX020 -4.083333
3 DST2401*04RRX020 -9.958333
4 DST2401*05RRX020 -11.833333
5 DST2401*06RRX020 -5.416667
6 DST2401*07RRX020 0.541667
7 DST2401*08RRX020 7.583333
8 DST2401*09RRX020 4.208333
9 DST2401*10RRX020 -0.166667
10 DST2401*11RRX020 -8.625000
11 DST2401*12RRX020 -3.083333
12 DST2401*13RRX020 -2.625000
13 DST2401*14RRX020 -9.083333
14 DST2401*15RRX020 -6.208333
15 DST2401*16RRX020 -3.125000
16 DST2401*17RRX020 0.666667
17 DST2401*18RRX020 -3.208333
18 DST2401*19RRX020 -10.041667
19 DST2401*20RRX020 -6.541667
20 DST2401*21RRX020 -1.583333
21 DST2401*22RRX020 3.208333
22 DST2401*23RRX020 -1.875000
23 DST2401*24RRX020 -0.791667
24 DST2401*25RRX020 4.333333
25 DST2401*26RRX020 -0.291667
26 DST2401*27RRX020 -1.875000
27 DST2401*28RRX020 7.916667
28 DST2401*29RRX020 -4.541667
29 DST2401*30RRX020 -5.291667
30 DST2401*31RRX020 -5.416667