I keep experecing a KeyError: ‘Column not found: AVG’ within a CSV dataset.
This column does exist, and I have even changed the name to AVG, and I get KeyError: ‘Column not found: USSTHPI’ and again KeyError: ‘Column not found: AVG’ which is the old name of the column ‘USSTHPI’ and now the new name of the column ‘AVG’.
I am using Jupyter Notebook and I even tried in VSCode – same error presists.
Here is my code below:
import pandas as pd
import matplotlib.pyplot as plt
# Load the data
income_data = pd.read_csv('/Users/adil/Desktop/SQL/house_poor_CSV/USSTHPI.csv')
mortgage_rate_data = pd.read_csv('/Users/adil/Desktop/SQL/house_poor_CSV/MORTGAGE30US.csv')
house_price_data = pd.read_csv('/Users/adil/Desktop/SQL/house_poor_CSV/MEHOINUSA672N.csv')
# Display the first few rows of each dataframe
income_data.head(), mortgage_rate_data.head(), house_price_data.head()
# Convert DATE columns to datetime and filter the relevant years
income_data['DATE'] = pd.to_datetime(income_data['DATE'])
mortgage_rate_data['DATE'] = pd.to_datetime(mortgage_rate_data['DATE'])
house_price_data['DATE'] = pd.to_datetime(house_price_data['DATE'])
income_data = income_data[(income_data['DATE'].dt.year >= 1985) & (income_data['DATE'].dt.year <= 2024)]
mortgage_rate_data = mortgage_rate_data[(mortgage_rate_data['DATE'].dt.year >= 1985) & (mortgage_rate_data['DATE'].dt.year <= 2024)]
house_price_data = house_price_data[(house_price_data['DATE'].dt.year >= 1985) & (house_price_data['DATE'].dt.year <= 2024)]
# Calculate the annual mean for mortgage rates and house prices
mortgage_rate_data['YEAR'] = mortgage_rate_data['DATE'].dt.year
annual_mortgage_rates = mortgage_rate_data.groupby('YEAR')['MORTGAGE30US'].mean().reset_index()
house_price_data['YEAR'] = house_price_data['DATE'].dt.year
annual_house_prices = house_price_data.groupby('YEAR')['AVG'].mean().reset_index()
# Merge the data
combined_data = pd.merge(income_data, annual_mortgage_rates, left_on=income_data['DATE'].dt.year, right_on='YEAR')
full_data = pd.merge(combined_data, annual_house_prices, on='YEAR')
# Calculate mortgage payments
full_data['loan_amount'] = full_data['USSTHPI'] * 1000 - 20000 # Adjusting house prices scale and subtracting down payment
loan_term = 30
n_payments = loan_term * 12
full_data['monthly_interest_rate'] = full_data['MORTGAGE30US'] / 100 / 12
full_data['monthly_payment'] = (full_data['loan_amount'] * full_data['monthly_interest_rate'] *
(1 + full_data['monthly_interest_rate'])**n_payments) / (
((1 + full_data['monthly_interest_rate'])**n_payments) - 1)
full_data['annual_payment'] = full_data['monthly_payment'] * 12
full_data['income_percentage'] = (full_data['annual_payment'] / full_data['MEHOINUSA672N']) * 100
full_data['income_percentage'] = full_data['income_percentage'].clip(lower=0)
# Visualization
plt.figure(figsize=(12, 6))
plt.plot(full_data['DATE'], full_data['income_percentage'], marker='o', linestyle='-', color='b')
plt.title('Percentage of Median Household Income Spent on Mortgage Payments (1985-2024)')
plt.xlabel('Year')
plt.ylabel('Percentage of Income Spent on Mortgage')
plt.grid(True)
plt.show()
Tried changing column name, splitting up cells, etc.
Shady is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.