Thiết kế website giá rẻ

Question

When reading a CSV file as a pandas dataframe, an error is raised when trying to select a subset of columns based on original column names (usecols=) and renaming the selected columns (names=). Passing renamed column names to usecols works, but all columns must be passed to names to correctly select columns.

<code># read the entire CSV

df1a = pd.read_csv(folder_csv+'test_read_csv.csv')

# select a subset of columns while reading the CSV

df1b = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['Col1','Col3'])

# rename columns while reading the CSV

df1c = pd.read_csv(folder_csv+'test_read_csv.csv', names=['first', 'second', 'third'], header=0)

# select a subset of columns and rename them while reading the CSV;

# throws error "ValueError: Usecols do not match columns, columns expected but not found: ['Col3', 'Col1']"

df1d = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['Col1','Col3'], names=['first','third'])

# selects columns 1 and 2, calling them 1 and 3

df1e = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['first','third'], names=['first','third'])

# selects columns 1 and 3 correctly

df1f = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['first','third'], names=['first','second','third'])

</code>

<code># read the entire CSV df1a = pd.read_csv(folder_csv+'test_read_csv.csv') # select a subset of columns while reading the CSV df1b = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['Col1','Col3']) # rename columns while reading the CSV df1c = pd.read_csv(folder_csv+'test_read_csv.csv', names=['first', 'second', 'third'], header=0) # select a subset of columns and rename them while reading the CSV; # throws error "ValueError: Usecols do not match columns, columns expected but not found: ['Col3', 'Col1']" df1d = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['Col1','Col3'], names=['first','third']) # selects columns 1 and 2, calling them 1 and 3 df1e = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['first','third'], names=['first','third']) # selects columns 1 and 3 correctly df1f = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['first','third'], names=['first','second','third']) </code>

# read the entire CSV
df1a = pd.read_csv(folder_csv+'test_read_csv.csv')
# select a subset of columns while reading the CSV
df1b = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['Col1','Col3'])
# rename columns while reading the CSV
df1c = pd.read_csv(folder_csv+'test_read_csv.csv', names=['first', 'second', 'third'], header=0)

# select a subset of columns and rename them while reading the CSV;
# throws error "ValueError: Usecols do not match columns, columns expected but not found: ['Col3', 'Col1']"
df1d = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['Col1','Col3'], names=['first','third'])

# selects columns 1 and 2, calling them 1 and 3
df1e = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['first','third'], names=['first','third'])
# selects columns 1 and 3 correctly
df1f = pd.read_csv(folder_csv+'test_read_csv.csv', usecols=['first','third'], names=['first','second','third'])

The CSV file test_read_csv.csv is:

<code>Col1,Col2,Col3

val1a,val2a,val3a

val1b,val2b,val3b

val1c,val2c,val3c

val1d,val2d,val3d

val1e,val2e,val3e

</code>

Col1,Col2,Col3
val1a,val2a,val3a
val1b,val2b,val3b
val1c,val2c,val3c
val1d,val2d,val3d
val1e,val2e,val3e

Wouldn’t it be a fairly common use case to select certain columns based on the original column names and then renaming only those columns?

Of course, it is possible to select the columns and rename them after loading the entire CSV file:

<code>df1 = df1[['Col1','Col3']]

df1.columns = ['first', 'third']

</code>

<code>df1 = df1[['Col1','Col3']] df1.columns = ['first', 'third'] </code>

df1 = df1[['Col1','Col3']]
df1.columns = ['first', 'third']

But I don’t know how and whether this can be integrated directly when reading the data. The same holds also for pd.read_excel().

Thiết kế website giá rẻ

Danh mục

Pandas read csv simultaneously passing usecols and names args