I have an existing PostgreSQL table with some feature columns.
I want to use Spark to:
- Read that table
- Create some additional columns
- Add those columns back to the same table (a rough sketch of the first two steps is below).
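
What I have in mind for the read/derive part is roughly the following. The connection URL, table name, and column names are placeholders, and it assumes the Postgres JDBC driver is on Spark's classpath:

```python
from pyspark.sql import SparkSession, functions as F

# Placeholder connection details -- adjust to your setup
jdbc_url = "jdbc:postgresql://localhost:5432/mydb"
props = {"user": "user", "password": "pass", "driver": "org.postgresql.Driver"}

spark = SparkSession.builder.getOrCreate()

# Read the existing table
df = spark.read.jdbc(url=jdbc_url, table="features", properties=props)

# Derive the new column(s); this is just an example transformation
df = df.withColumn("feature_sum", F.col("feature_a") + F.col("feature_b"))
```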
Is there any way to make Spark add the new columns to the existing table, without overwriting the entire table and without using additional libraries?
So far, the only ways I could think of to add the new columns to the database using Spark are:
- To make Spark add the columns by replacing the whole table with df.write.mode('overwrite') (sketched below), but that's dangerous and inefficient.
- To write a transformer that uses psycopg to insert the new columns into the database, and include that transformer at the end of the pipeline (also sketched below). It seems like a better alternative, but it feels a bit convoluted.
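
For reference, the first option would be something like this (same placeholder names as above); the problem is that it rewrites rows that never changed, and a failure mid-write can leave the table in a bad state:

```python
# Option 1: overwrite the whole table -- rewrites every row, and a failed
# job can leave the existing table dropped or half-written
df.write.jdbc(url=jdbc_url, table="features", mode="overwrite", properties=props)
```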
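
And the second option is roughly the following, run after the Spark job finishes. The key column "id", the new column name, and the connection string are made up for illustration, and collect() assumes the result fits in driver memory:

```python
import psycopg2

# Option 2: pull the new values back to the driver and apply them with plain SQL
rows = df.select("id", "feature_sum").collect()  # assumes the result is small enough to collect

conn = psycopg2.connect("dbname=mydb user=user password=pass host=localhost")
with conn, conn.cursor() as cur:
    # Add the column if it is not there yet, then fill it row by row
    cur.execute("ALTER TABLE features ADD COLUMN IF NOT EXISTS feature_sum double precision")
    cur.executemany(
        "UPDATE features SET feature_sum = %s WHERE id = %s",
        [(r["feature_sum"], r["id"]) for r in rows],
    )
conn.close()
```

It works, but it funnels the data through the driver and keeps write logic outside Spark, which is why it feels convoluted to me.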