I’m inserting data from a pandas DataFrame into a PostgreSQL table using Python and psycopg2. The table is defined as follows:
CREATE TABLE sales (
    id BIGINT NOT NULL, -- Primary Key
    tahun INTEGER NOT NULL,
    bulan NUMERIC NOT NULL,
    v_kod VARCHAR(10) NOT NULL,
    o_kod VARCHAR(10) NOT NULL,
    amaun_rm NUMERIC NOT NULL,
    dt_updated TIMESTAMP WITHOUT TIME ZONE NOT NULL
);
Here is my Python code:
import pandas as pd
import psycopg2
# Load the CSV data
forecast_results = pd.read_csv("sales.csv")
# Filter the DataFrame to include only rows from September 2024 onward
filtered_forecast_results = forecast_results[
    (forecast_results['tahun'] > 2024)
    | ((forecast_results['tahun'] == 2024) & (forecast_results['bulan'] >= 9))
]
# Define the v_kod value to be inserted
v_kod_value = 'AAA'
# Connect to PostgreSQL
conn = psycopg2.connect(
    dbname="my_database",
    user="my_user",
    password="my_password",
    host="localhost",
    port="5432"
)
cur = conn.cursor()
for index, row in filtered_forecast_results.iterrows():
    # Convert the year and month to integers, but keep o_kod as a string
    tahun = int(row['tahun'])
    bulan = int(row['bulan'])
    o_kod = str(row['o_kod'])
    # Check if the row already exists
    cur.execute("""
        SELECT 1 FROM sales
        WHERE tahun = %s AND bulan = %s AND v_kod = %s AND o_kod = %s
    """, (tahun, bulan, v_kod_value, o_kod))
    exists = cur.fetchone()
    if not exists:
        # If the row does not exist, insert it (id is not supplied here)
        sql_query = """
            INSERT INTO sales (tahun, bulan, v_kod, o_kod, amaun_rm, dt_updated)
            VALUES (%s, %s, %s, %s, %s, NOW())
        """
        values = (
            tahun,
            bulan,
            v_kod_value,
            o_kod,
            round(row['predicted_amaun_rm'], 2)
        )
        cur.execute(sql_query, values)
# Commit the transaction
conn.commit()
# Close the cursor and connection
cur.close()
conn.close()
When I run this code, I encounter the following error:
NotNullViolation: null value in column "id" of relation "sales" violates not-null constraint
DETAIL: Failing row contains (null, 2024, 9, AAA, 123, 2931.48, 2024-08-16 08:39:52.462847).
What I’ve Tried:
- Excluding the id Column: I tried to exclude the id column from the INSERT statement, assuming PostgreSQL would auto-generate it. However, this led to the NotNullViolation error.
- Manual ID Generation: I haven’t manually specified an id because I thought it should be auto-incremented.
Questions:
- How can I properly insert the data while ensuring that the id column is populated correctly?
- Should the id column be set up as an auto-incrementing column in PostgreSQL? If so, how can I modify the table to achieve this?
- Is there a way to fetch and use the next available ID value within my Python code before insertion?
Any advice or solutions to handle the id column properly in this scenario would be greatly appreciated!
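To make the second and third questions concrete, here is a minimal sketch of one possible direction, not a confirmed fix. It assumes PostgreSQL 10 or later (for identity columns) and that altering the table is acceptable. The helper names `enable_generated_id` and `insert_sale` are hypothetical, and `cur` stands for any DB-API cursor such as the psycopg2 one above.

```python
# Sketch only: assumes PostgreSQL 10+ and permission to alter the sales table.

# One-time migration: let PostgreSQL generate id values when none is supplied.
ADD_IDENTITY_SQL = """
    ALTER TABLE sales
    ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY
"""

# If the table already holds rows, move the new sequence past the highest id
# so generated values do not collide with existing ones.
RESYNC_SEQUENCE_SQL = """
    SELECT setval(
        pg_get_serial_sequence('sales', 'id'),
        COALESCE((SELECT MAX(id) FROM sales), 0) + 1,
        false
    )
"""

# Same INSERT as in the question, plus RETURNING to read back the new id.
INSERT_SQL = """
    INSERT INTO sales (tahun, bulan, v_kod, o_kod, amaun_rm, dt_updated)
    VALUES (%s, %s, %s, %s, %s, NOW())
    RETURNING id
"""


def enable_generated_id(cur):
    """Run the one-time migration (hypothetical helper)."""
    cur.execute(ADD_IDENTITY_SQL)
    cur.execute(RESYNC_SEQUENCE_SQL)


def insert_sale(cur, tahun, bulan, v_kod, o_kod, amaun_rm):
    """Insert one row without an id and return the id PostgreSQL assigned
    (hypothetical helper)."""
    cur.execute(INSERT_SQL, (tahun, bulan, v_kod, o_kod, amaun_rm))
    return cur.fetchone()[0]
```

With something like this in place, the loop could call `insert_sale(cur, tahun, bulan, v_kod_value, o_kod, amount)` instead of building the INSERT itself. Reading the id back with `RETURNING` avoids fetching a "next" value in Python before the insert, which could race with other sessions.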