I’m using DBeaver to connect to a Hadoop environment with HiveServer2 (port 10000) and the Spark Thrift Server (port 10001). I’m encountering two issues:
- Spark SQL LIMIT syntax error:
When connecting to Spark and checking the “Use SQL to limit fetch size” option in DBeaver, it generates the following SQL:
```sql
SELECT x.* FROM junglescout.sales_estimates_v2_latest x
LIMIT 0, 200
```
This results in a syntax error:
```text
SQL Error [42601]: org.apache.hive.service.cli.HiveSQLException: Error running query: [PARSE_SYNTAX_ERROR] org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near ','.(line 2, pos 7)
```
*(screenshot: DBeaver Spark SQL error)*
How can I change DBeaver’s behavior so that it generates LIMIT syntax Spark SQL accepts when this option is checked? (See the example after this list for the syntax I’d expect.)
- Memory issues with large result sets:
When the “Use SQL to limit fetch size” option is unchecked, queries with large result sets fail and drive memory usage on the server above 10 GB. The memory stays high even after the query errors out.
Is this memory usage caused by the Thrift server?
Is there a way to clear this memory without restarting the Thrift server?
How can I handle large result sets more efficiently in this setup? (A JDBC fetch-size sketch of what I mean follows this list.)
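For reference, this is the statement I would expect DBeaver to generate for problem 1: Spark SQL parses a plain `LIMIT n`, but not the MySQL-style `LIMIT offset, count` form shown above.

```sql
-- Same query with the LIMIT rewritten into the form Spark SQL parses:
SELECT x.* FROM junglescout.sales_estimates_v2_latest x
LIMIT 200
```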
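And to make the last question concrete, here is a minimal sketch of the kind of client-side streaming I have in mind, written against the plain Hive JDBC driver (the Spark Thrift Server speaks the HiveServer2 protocol). The host, credentials, and fetch size are placeholder assumptions, and I don’t know whether `setFetchSize` changes anything on the server side, which is part of what I’m asking.

```scala
import java.sql.DriverManager

object FetchSizeSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection details; the Spark Thrift Server uses the
    // HiveServer2 wire protocol, so the Hive JDBC driver and URL format apply.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val url = "jdbc:hive2://my-host:10001/junglescout"
    val conn = DriverManager.getConnection(url, "user", "")
    try {
      val stmt = conn.createStatement()
      // Ask the driver to pull rows from the server in batches
      // instead of materializing the whole result set at once.
      stmt.setFetchSize(1000)
      val rs = stmt.executeQuery(
        "SELECT x.* FROM junglescout.sales_estimates_v2_latest x")
      var count = 0L
      while (rs.next()) {
        // Process each row as it streams in; nothing is accumulated here.
        count += 1
      }
      println(s"rows: $count")
    } finally {
      conn.close()
    }
  }
}
```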
Environment:
DBeaver version: 24.0.4
Spark version: 3.5.1
Hive version: 3.1.3
Hadoop version: 3.3.6
What I’ve tried:
For problem 1, I found a related issue on GitHub (https://github.com/dbeaver/dbeaver/issues/9725), but I’m still experiencing the problem despite it being marked as fixed.
For problem 2, I tried clearing the Spark RDD cache as suggested by ChatGPT, but it didn’t help:

```scala
// Drop all blocks held by the block-manager cache...
spark.sparkContext.clearCache()
// ...and explicitly unpersist any RDDs still registered as persistent.
spark.sparkContext.getPersistentRDDs.foreach { case (_, rdd) => rdd.unpersist() }
```
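One diagnostic I could run from the same shell (a sketch, not something I have conclusive output from) is to ask each block manager how much cache memory it reports as in use; if this shows the cache as mostly free while the JVM is still at 10 GB+, the usage presumably lives outside the RDD cache, e.g. in the Thrift server’s result buffering:

```scala
// Per block manager: (max memory available for caching, memory still free).
spark.sparkContext.getExecutorMemoryStatus.foreach {
  case (blockManager, (maxMem, remaining)) =>
    println(f"$blockManager%-30s used=${(maxMem - remaining) / 1e6}%.1f MB max=${maxMem / 1e6}%.1f MB")
}
```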
Any help or suggestions would be greatly appreciated.