I’m reading data from a MySQL table in Spark. The table structure looks like this:
    CREATE TABLE my_table (
        id varchar(64),
        content varchar(64),
        PRIMARY KEY (id)
    );
And my Spark code looks like this:
    df = spark.read.format('jdbc').options(
        url="jdbc:mysql://...",
        driver='com.mysql.cj.jdbc.Driver',
        dbtable='my_table',
        user='',
        password='',
        partitionColumn='id',
        numPartitions=100,
        isolationLevel='NONE',
    ).load()
But it seems Spark doesn’t support using a varchar column as the partitionColumn (it has to be numeric, date, or timestamp). So how can I partition the data in this table when reading it with Spark? Assume the table is large.
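For context, the only workaround I’ve found so far is to build a list of non-overlapping WHERE clauses myself and pass them to `spark.read.jdbc` via its `predicates` parameter, so each clause becomes one partition. This is only a sketch under the assumption that the ids are lowercase hex strings (e.g. UUIDs without dashes); the character set would have to match the real key distribution, and I’m not sure this is the idiomatic approach:

```python
# Build one non-overlapping WHERE clause per possible leading hex character
# of `id`; Spark turns each predicate into its own read partition.
# ASSUMPTION: ids are lowercase hex strings -- adjust the alphabet otherwise.
HEX_CHARS = "0123456789abcdef"

def build_predicates(chars=HEX_CHARS):
    # e.g. "id LIKE '0%'", "id LIKE '1%'", ... one per character
    return [f"id LIKE '{c}%'" for c in chars]

predicates = build_predicates()

# Then read with explicit predicates instead of partitionColumn:
# df = spark.read.jdbc(
#     url="jdbc:mysql://...",
#     table="my_table",
#     predicates=predicates,
#     properties={
#         "driver": "com.mysql.cj.jdbc.Driver",
#         "user": "",
#         "password": "",
#     },
# )

print(len(predicates))  # 16 partitions with this alphabet
print(predicates[0])    # id LIKE '0%'
```

This only gives me 16 partitions, though (or 256 with two-character prefixes), and it relies on knowing the key format up front, which is why I’m asking whether there is a better way.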