I have been playing around with the scikit-learn library in Python and have made a program that predicts crypto prices. It's fairly accurate and gets the predictions right most of the time, but every so often it predicts wrong, there is a massive fall of sometimes 10%, and I lose all of my progress. I would appreciate any help or feedback on my program, as I'm fairly new to Python and this is my first real project.
scikit-learn Python pipeline:
import pandas as pd
import requests
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

url = variable[f"{symbol}_url"]  # URL for the current OHLC data
response = requests.get(url)  # using the requests library
if response.status_code == 200:
    data = response.json()
    print(data)
    if data:
        ohlc_data = data[0]
        # formatting data
        variable[f"{symbol}_open_usd"] = ohlc_data.get('open')
        variable[f"{symbol}_high_usd"] = ohlc_data.get('high')
        variable[f"{symbol}_low_usd"] = ohlc_data.get('low')
        variable[f"{symbol}_close_usd"] = ohlc_data.get('close')
        variable[f"{symbol}_volume_usd"] = ohlc_data.get('volume')

# closing price pipeline
data = pd.read_csv(variable[f"{symbol}_fileC"])
df = data[['Open', 'High', 'Low', 'Close', 'Volume', 'PredictiveClose']]  # columns in the .csv file
# retrain on a new random split until the score is good enough;
# a timeout stops the loop if it runs too many times (e.g. over 10000) without a good score
while variable[f"{symbol}_close_score"] < minimum_threshold:
    timeout_runs_count = timeout_runs_count + 1
    X_train, X_test, y_train, y_test = train_test_split(df[['Open', 'High', 'Low', 'Close', 'Volume']], df['PredictiveClose'], test_size=0.25)
    model = LinearRegression()  # linear regression model from sklearn
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)  # predictions on the test split
    today_data = {
        'Open': variable[f"{symbol}_open_usd"],  # today's data
        'High': variable[f"{symbol}_high_usd"],
        'Low': variable[f"{symbol}_low_usd"],
        'Close': variable[f"{symbol}_close_usd"],
        'Volume': variable[f"{symbol}_volume_usd"]
    }
    today_df = pd.DataFrame([today_data])
    variable[f"{symbol}_close_prediction"] = model.predict(today_df[['Open', 'High', 'Low', 'Close', 'Volume']])  # making the prediction
    variable[f"{symbol}_close_score"] = model.score(X_test, y_test)  # R^2 score, used to decide whether to run again
    if timeout_runs_count > timeout_runs:
        variable[f"{symbol}_close_score"] = 2  # timeout: R^2 never exceeds 1, so 2 is intended to end the loop
    print(f'Close Counter: {timeout_runs_count} / {timeout_runs}', end='\r')
I have tried experimenting with other models. My program also prints out the high price prediction, which is just a duplicate of the program above. Any feedback or help would be appreciated.
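
Since the high price code is a copy of the close price code, the shared steps could probably live in one helper that takes the target column as a parameter. This is only a rough sketch under assumptions: the name `fit_and_predict`, its parameters, the `PredictiveHigh` column, and the synthetic data standing in for my .csv file are all made up for illustration, and treating `PredictiveClose` as the next row's close is a guess used only to build the fake data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

FEATURES = ['Open', 'High', 'Low', 'Close', 'Volume']


def fit_and_predict(df, target, today_row, test_size=0.25, random_state=None):
    """Hypothetical helper: fit LinearRegression on FEATURES -> target.

    Returns (prediction for today_row, R^2 score on the held-out split).
    """
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[target], test_size=test_size, random_state=random_state
    )
    model = LinearRegression()
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)  # R^2 on the test rows
    today_df = pd.DataFrame([today_row])[FEATURES]
    prediction = float(model.predict(today_df)[0])
    return prediction, score


# Synthetic stand-in for the .csv data (assumption, not real prices)
rng = np.random.default_rng(0)
n = 200
close = 100 + rng.normal(0, 1, n).cumsum()
history = pd.DataFrame({
    'Open': close + rng.normal(0, 0.5, n),
    'High': close + rng.uniform(0.5, 1.5, n),
    'Low': close - rng.uniform(0.5, 1.5, n),
    'Close': close,
    'Volume': rng.uniform(1000, 2000, n),
})
# Assumed meaning of the target columns: the next row's value
history['PredictiveClose'] = history['Close'].shift(-1)
history['PredictiveHigh'] = history['High'].shift(-1)
history = history.dropna()

today = history.iloc[-1][FEATURES].to_dict()

# The same helper serves both targets, so nothing is duplicated
close_prediction, close_score = fit_and_predict(history, 'PredictiveClose', today, random_state=0)
high_prediction, high_score = fit_and_predict(history, 'PredictiveHigh', today, random_state=0)
print(f'Close: {close_prediction:.2f} (score {close_score:.3f})')
print(f'High:  {high_prediction:.2f} (score {high_score:.3f})')
```

With something like this, the retry loop and the `variable[...]` bookkeeping would only need to call the helper with a different target name for each price.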