I know this is a common question, and I have read through many of the existing answers. However, they don’t address my specific situation, so this is a variation of that question, as I’ll explain.
I’m new to Python, having only taken a single online class with MIT. But I’m a learn-as-you-go person, so I begin with other people’s code, modify for my specific needs, research and modify as necessary, and remember that for future use.
I’m stuck on an error in one section of a Python script. I’ll post a screenshot of the error here, then follow it with the relevant code, what I’ve tried, etc.
[Screenshot of the error message]
This section of the code is building a list of the five most recent games (“matches”) from each of the 32 NFL Teams. It’s drawing from a table in the format of 26 columns by 7,678 rows. Column headers are as follows:
[‘Date’, ‘Name’, ‘PointsScored’, ‘Pass_Attempts’, ‘Passes_Comp’, ‘Pass_Yards’, ‘Pass_Int’, ‘Sacks’, ‘Sack_Yards’, ‘Rush_Attempts’, ‘Rush_Yards’, ‘Penalties’, ‘Penalty_Yards’, ‘Result’, ‘PointsScored’, ‘Pass_Attempts’, ‘Passes_Comp’, ‘Pass_Yards’, ‘Pass_Int’, ‘Sacks’, ‘Sack_Yards’, ‘Rush_Attempts’, ‘Rush_Yards’, ‘Penalties’, ‘Penalty_Yards’, ‘home_or_away’]
Each row is a listing of performance metrics along with comparable statistics for each of the two teams in the game. I can provide more detail on the fields if necessary.
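To make the goal concrete, here is a minimal sketch of "average of the previous five games per team" on a tiny made-up table. The sample rows and the points_per_game values are invented for illustration, and the groupby/shift/rolling approach is an alternative way to express the idea, not the loop used in my script.

```python
import pandas as pd

# Made-up sample data: one row per team per game (not from the real CSV).
games = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2024-09-08", "2024-09-15", "2024-09-22", "2024-09-08", "2024-09-15"]
    ),
    "Name": ["CIN", "CIN", "CIN", "WAS", "WAS"],
    "Points_Scored": [10, 25, 33, 20, 21],
})

# Sort so that each team's games are in chronological order.
games = games.sort_values(["Name", "Date"]).reset_index(drop=True)

# For each team: shift by one game so the current game is excluded,
# then average over a window of up to five earlier games.
games["points_per_game"] = (
    games.groupby("Name")["Points_Scored"]
    .transform(lambda s: s.shift(1).rolling(window=5, min_periods=1).mean())
)

print(games)
```

The first game of each team has no earlier games, so its average is NaN. The 'Name' column stays a string throughout; only the numeric column is averaged.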
I searched numerous forums for a solution, but couldn’t find an answer specific enough to my situation to be usable. What I believe makes my question unique is that I WANT to use ‘CIN’ as a string (it’s stored in the ‘Name’ field in the table), and not as a numeric value. It’s one of the team abbreviations, so it’s necessary for what I want to accomplish.
As I was working through the code, I ran into a similar error (in the same line of code); however, that instance was related to the “Date” field. After some research, I realized that the dates in my imported CSV were in the format dd/mm/yy, which the code wouldn’t accept. I suspect the “/” characters were the problem (but I’m not sure). So I converted the dates to plain numbers and that solved that one (but again, I don’t know why). But now I’m seeing the error above. I reviewed the CSV for any anomalies and found none. I also tried loc in place of iloc, and got the same error.
However, I DO know that the WASWASWAS… in the error message is a concatenation of team abbreviations for “WAShington”. The import source uses a three-character code rather than the entire team name throughout. In the code below, I highlighted the code in question between two double lines of hash marks (#).
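For reference, a minimal sketch of what happens with dd/mm/yy dates that are left as text, and how they could be parsed instead of converted by hand. The three sample dates are made up, and the assumption is that the CSV column holds strings in exactly that format.

```python
import pandas as pd

# Made-up dates in dd/mm/yy form, stored as plain text.
raw = pd.Series(["08/09/24", "15/09/24", "22/12/23"])

# Sorting the text compares character by character, so the
# December 2023 date ends up last instead of first.
as_text = raw.sort_values().tolist()

# Parsing with an explicit format gives real timestamps that
# sort and compare chronologically.
parsed = pd.to_datetime(raw, format="%d/%m/%y")
as_dates = parsed.sort_values().dt.strftime("%Y-%m-%d").tolist()

print(as_text)
print(as_dates)
```

A text column like this is also non-numeric, so it would be one more column that an average over the whole row cannot handle.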
I want to add the CSV for testing, but am uncertain how to attach a file. In a previous question, I was told NOT to include links, so I won’t use that here. So if someone can tell me how to add a file, please do so.
I’ll add as much code as I think is necessary, but it will be limited to what’s currently working. I don’t believe there’s anything later in the code that’s relevant, and when I tried to add the entire script before this one, I couldn’t post it because the editor said it “looked like spam.” But I can provide anything you want if someone lets me know how.
As mentioned, I resolved a similar error by finding a workaround for the date format issue. I went through the source data thoroughly to ensure there were no errors, typos, incorrect character types, etc. I also read quite a few posts on this error but, as I said, they all pertained to converting TO numeric, rather than using a string value as-is.
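Here is a minimal reproduction of the kind of failure I’m describing, assuming the error is pandas’ TypeError about converting a value to numeric. The three rows are made up. In recent pandas versions (2.0 and later, as far as I can tell), calling mean() on a frame that still contains a text column fails, while older versions may silently skip that column; the numeric_only=True argument is a documented option of DataFrame.mean that restricts the average to numeric columns.

```python
import pandas as pd

# Made-up stand-in for the last few matches of one team.
recent = pd.DataFrame({
    "Name": ["WAS", "WAS", "WAS"],
    "Points_Scored": [20, 21, 38],
    "Pass_Yards": [226, 226, 233],
})

# Averaging every column includes the text column 'Name'.
# Newer pandas raises TypeError here; older versions may not.
try:
    recent.mean(axis=0)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Restricting the average to numeric columns leaves 'Name' out
# of the calculation without changing it in the table.
numeric_means = recent.mean(axis=0, numeric_only=True)

print(error_message)
print(numeric_means)
```

If this is what is happening in my script, the team abbreviation would not need to be converted at all; it would only need to be left out of the averaging step.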
Following is the code up to the line that’s failing. The very last line is where it fails.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
# TEAM FEATURE EXTRACTION
# Load the datasets for the 2023 and 2024 seasons that include a "Home_Won" column.
# This column indicates if the home team won the game (1 for win, 0 for loss).
# (Right click source files to get the path and paste them within the quotes below)
data_2023_updated = pd.read_csv(r"C:UsersrichaPythonFootballMy_NFL_Forecaster2023.csv")
data_2024_updated = pd.read_csv(r"C:UsersrichaPythonFootballMy_NFL_Forecaster2024.csv")
#all_data = pd.read_csv(r"/Users/richardcartier/Documents/PythonProjects/My_NFL_Forecaster/MYSource.csv")
# Load the dataset containing the upcoming games schedule.
upcoming_games = pd.read_csv(r"C:UsersrichaPythonFootballMy_NFL_ForecasterThis_Weeks_Schedule.csv")
# Combine the data from the 2023 and 2024 seasons into a single DataFrame.
all_data = pd.concat([data_2023_updated, data_2024_updated])
#print(all_data)
# Now extract columns from the source csv file
raw_match_stats = all_data[[
'Date',
'Season',
'Week',
'Visitor',
'Home',
'Visitor_Score',
'Visitor_Pass_Att',
'Visitor_Pass_Comp',
'Visitor_Pass_Yds',
'Visitor_Pass_TD',
'Visitor_Pass_Int',
'Visitor_Sacks',
'Visitor_Sack_Yds',
'Visitor_Rush_Att',
'Visitor_Rush_Yds',
'Visitor_Rush_TD',
'Visitor_Pen',
'Visitor_Pen_Yds',
'Visitor_Won',
'Visitor_Line',
'Visitor_Covered',
'Home_Score',
'Home_Pass_Att',
'Home_Pass_Comp',
'Home_Pass_Yds',
'Home_Pass_TD',
'Home_Pass_Int',
'Home_Sacks',
'Home_Sack_Yds',
'Home_Rush_Att',
'Home_Rush_Yds',
'Home_Rush_TD',
'Home_Pen',
'Home_Pen_Yds',
'Home_Won',
'Home_Line',
'Home_Covered',
'OU_Line',
'OU_Result'
]].copy()  # copy so the .loc assignments below modify this frame, not a view of all_data
# Determine number of Wins, Losses and Ties
raw_match_stats.loc[raw_match_stats['Home_Score'] == raw_match_stats['Visitor_Score'], 'home_team_result'] = 0
raw_match_stats.loc[raw_match_stats['Home_Score'] > raw_match_stats['Visitor_Score'], 'home_team_result'] = 1
raw_match_stats.loc[raw_match_stats['Home_Score'] < raw_match_stats['Visitor_Score'], 'home_team_result'] = -1
raw_match_stats.loc[raw_match_stats['Home_Score'] == raw_match_stats['Visitor_Score'], 'away_team_result'] = 0
# The away team's result is the mirror image of the home team's result
raw_match_stats.loc[raw_match_stats['Home_Score'] > raw_match_stats['Visitor_Score'], 'away_team_result'] = -1
raw_match_stats.loc[raw_match_stats['Home_Score'] < raw_match_stats['Visitor_Score'], 'away_team_result'] = 1
# Split the raw_match_stats to two datasets (home_team_stats and away_team_stats)
home_team_stats = raw_match_stats[[
'Date',
'Home',
'Home_Score',
'Home_Pass_Att',
'Home_Pass_Comp',
'Home_Pass_Yds',
'Home_Pass_TD',
'Home_Pass_Int',
'Home_Sacks',
'Home_Sack_Yds',
'Home_Rush_Att',
'Home_Rush_Yds',
'Home_Rush_TD',
'Home_Pen',
'Home_Pen_Yds',
'Home_Won',
'Visitor_Score',
'Visitor_Pass_Att',
'Visitor_Pass_Comp',
'Visitor_Pass_Yds',
'Visitor_Pass_TD',
'Visitor_Pass_Int',
'Visitor_Sacks',
'Visitor_Sack_Yds',
'Visitor_Rush_Att',
'Visitor_Rush_Yds',
'Visitor_Rush_TD',
'Visitor_Pen',
'Visitor_Pen_Yds',
'Visitor_Won'
]]
home_team_stats = home_team_stats.rename(columns={'Home':'Name',
'Home_Score':'Points_Scored',
'Home_Pass_Att':'Pass_Attempts',
'Home_Pass_Comp':'Passes_Comp',
'Home_Pass_Yds':'Pass_Yards',
'Home_Pass_TD':'Pass_TD',
'Home_Pass_Int':'Pass_Int',
'Home_Sacks':'Sacks',
'Home_Sack_Yds':'Sack_Yards',
'Home_Rush_Att':'Rush_Attempts',
'Home_Rush_Yds':'Rush_Yards',
'Home_Rush_TD':'Rush_TD',
'Home_Pen':'Penalties',
'Home_Pen_Yds':'Penalty_Yards',
'Home_Won':'Result',
'Visitor_Score':'Points_Scored',
'Visitor_Pass_Att':'Pass_Attempts',
'Visitor_Pass_Comp':'Passes_Comp',
'Visitor_Pass_Yds':'Pass_Yards',
'Visitor_Pass_TD':'Pass_TD',
'Visitor_Pass_Int':'Pass_Int',
'Visitor_Sacks':'Sacks',
'Visitor_Sack_Yds':'Sack_Yards',
'Visitor_Rush_Att':'Rush_Attempts',
'Visitor_Rush_Yds':'Rush_Yards',
'Visitor_Rush_TD':'Rush_TD',
'Visitor_Pen':'Penalties',
'Visitor_Pen_Yds':'Penalty_Yards',
'Visitor_Won':'Result'
})
away_team_stats = raw_match_stats[[
'Date',
'Visitor',
'Visitor_Score',
'Visitor_Pass_Att',
'Visitor_Pass_Comp',
'Visitor_Pass_Yds',
'Visitor_Pass_TD',
'Visitor_Pass_Int',
'Visitor_Sacks',
'Visitor_Sack_Yds',
'Visitor_Rush_Att',
'Visitor_Rush_Yds',
'Visitor_Rush_TD',
'Visitor_Pen',
'Visitor_Pen_Yds',
'Visitor_Won',
'Home_Score',
'Home_Pass_Att',
'Home_Pass_Comp',
'Home_Pass_Yds',
'Home_Pass_TD',
'Home_Pass_Int',
'Home_Sacks',
'Home_Sack_Yds',
'Home_Rush_Att',
'Home_Rush_Yds',
'Home_Rush_TD',
'Home_Pen',
'Home_Pen_Yds',
'Home_Won'
]]
away_team_stats = away_team_stats.rename(columns={'Visitor':'Name',
'Visitor_Score':'Points_Scored',
'Visitor_Pass_Att':'Pass_Attempts',
'Visitor_Pass_Comp':'Passes_Comp',
'Visitor_Pass_Yds':'Pass_Yards',
'Visitor_Pass_TD':'Pass_TD',
'Visitor_Pass_Int':'Pass_Int',
'Visitor_Sacks':'Sacks',
'Visitor_Sack_Yds':'Sack_Yards',
'Visitor_Rush_Att':'Rush_Attempts',
'Visitor_Rush_Yds':'Rush_Yards',
'Visitor_Rush_TD':'Rush_TD',
'Visitor_Pen':'Penalties',
'Visitor_Pen_Yds':'Penalty_Yards',
'Visitor_Won':'Result',
'Home_Score':'Points_Scored',
'Home_Pass_Att':'Pass_Attempts',
'Home_Pass_Comp':'Passes_Comp',
'Home_Pass_Yds':'Pass_Yards',
'Home_Pass_TD':'Pass_TD',
'Home_Pass_Int':'Pass_Int',
'Home_Sacks':'Sacks',
'Home_Sack_Yds':'Sack_Yards',
'Home_Rush_Att':'Rush_Attempts',
'Home_Rush_Yds':'Rush_Yards',
'Home_Rush_TD':'Rush_TD',
'Home_Pen':'Penalties',
'Home_Pen_Yds':'Penalty_Yards',
'Home_Won':'Result'
})
# add an additional column to denote whether the team is playing at home or away - this will help us later
home_team_stats['home_or_away']= 1
away_team_stats['home_or_away']= 0
# stack these two datasets so that each row is the stats for a team for one match (team_stats_per_match)
team_stats_per_match = pd.concat([home_team_stats,away_team_stats])
print(team_stats_per_match)
print(team_stats_per_match.columns.tolist())
# Export the stacked per-match stats to a CSV file for inspection
team_stats_per_match.to_csv('predictions.csv', index=False)
# At each row of this dataset, get the team name, find the stats for that team during the last five games, and average these stats (avg_stats_per_team).
avg_stat_columns = ['points_per_game','pass_att_per_game','pass_comp_per_game','pass_yds_per_game','rush_att_per_game', 'rush_yds_per_game']
stats_list = []
for index, row in team_stats_per_match.iterrows():
team_stats_last_five_matches = team_stats_per_match.loc[(team_stats_per_match['Name']==row['Name']) & (team_stats_per_match['Date']<row['Date'])].sort_values(by=['Date'], ascending=False)
# axis=0 averages down the rows, producing one mean per column (axis=1 would average across each row)
stats_list.append(team_stats_last_five_matches.iloc[0:5,:].mean(axis=0).values[0:6])