I ran into a problem with Hive 4.0.0, the latest version, and I am not sure whether it is a bug. My setup: Hadoop 3.3.6 running as a fully distributed cluster, and Hive 4.0.0 with one master server and one client. I connect to Hive through the Beeline client. Executing a join fails with the following error:
0: jdbc:hive2://node1:10000> SELECT emp.emp_id, emp.emp_name, dept.dept_name
. . . . . . . . . . . . . .> FROM emp
. . . . . . . . . . . . . .> JOIN dept ON emp.dept_id = dept.dept_id;
INFO : Compiling command(queryId=root_20240619122107_48e50dbd-9e12-4743-9652-f266165811bc): SELECT emp.emp_id, emp.emp_name, dept.dept_name
FROM emp
JOIN dept ON emp.dept_id = dept.dept_id
INFO : No Stats for default@emp, Columns: emp_name, dept_id, emp_id
INFO : No Stats for default@dept, Columns: dept_name, dept_id
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:emp.emp_id, type:int, comment:null), FieldSchema(name:emp.emp_name, type:string, comment:null), FieldSchema(name:dept.dept_name, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=root_20240619122107_48e50dbd-9e12-4743-9652-f266165811bc); Time taken: 0.894 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=root_20240619122107_48e50dbd-9e12-4743-9652-f266165811bc): SELECT emp.emp_id, emp.emp_name, dept.dept_name
FROM emp
JOIN dept ON emp.dept_id = dept.dept_id
WARN : Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez) or using Hive 1.X releases.
INFO : Query ID = root_20240619122107_48e50dbd-9e12-4743-9652-f266165811bc
INFO : Total jobs = 1
INFO : Starting task [Stage-4:MAPREDLOCAL] in serial mode
ERROR : Execution failed with exit status: 1
ERROR : Obtaining error information
ERROR :
Task failed!
Task ID:
  Stage-4
Logs:
ERROR : /tmp/root/hive.log
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
INFO : Completed executing command(queryId=root_20240619122107_48e50dbd-9e12-4743-9652-f266165811bc); Time taken: 4.461 seconds
Error: Error while compiling statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask (state=08S01,code=1)
I checked the log on the Hive server at /tmp/root/hive.log and found the following error information:
2024-06-19T11:27:54,947 INFO [HiveServer2-Background-Pool: Thread-110] common.LogUtils: Unregistered logging context.
2024-06-19T11:27:54,947 ERROR [HiveServer2-Background-Pool: Thread-110] operation.SQLOperation: Error running hive query
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:376) ~[hive-service-4.0.0.jar:4.0.0]
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:249) ~[hive-service-4.0.0.jar:4.0.0]
    at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90) ~[hive-service-4.0.0.jar:4.0.0]
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336) ~[hive-service-4.0.0.jar:4.0.0]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
    at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_181]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) ~[hadoop-common-3.3.6.jar:?]
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356) ~[hive-service-4.0.0.jar:4.0.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_181]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_181]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]
2024-06-19T11:27:54,960 INFO [00b77a5e-7cec-4e10-a0ae-f94450a56ab0 HiveServer2-Handler-Pool: Thread-54] operation.OperationManager: Closing operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=9c5817e6-131a-4e60-bfe3-faa066a61119]
2024-06-19T11:27:54,960 INFO [00b77a5e-7cec-4e10-a0ae-f94450a56ab0 HiveServer2-Handler-Pool: Thread-54] operation.OperationManager: Removed queryId: root_20240619112748_9448d222-0018-4dd7-ba8a-4c30eba4ddd8 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=9c5817e6-131a-4e60-bfe3-faa066a61119] with tag: null
2024-06-19T11:27:54,961 INFO [00b77a5e-7cec-4e10-a0ae-f94450a56ab0 HiveServer2-Handler-Pool: Thread-54] ql.Context: Deleting scratch dir: hdfs://mycluster/tmp/hive/root/00b77a5e-7cec-4e10-a0ae-f94450a56ab0/hive_2024-06-19_11-27-48_655_7337980361533661344-1
2024-06-19T11:27:54,961 INFO [00b77a5e-7cec-4e10-a0ae-f94450a56ab0 HiveServer2-Handler-Pool: Thread-54] cleanup.EventualCleanupService: Delete hdfs://mycluster/tmp/hive/root/00b77a5e-7cec-4e10-a0ae-f94450a56ab0/hive_2024-06-19_11-27-48_655_7337980361533661344-1 operation was queued
2024-06-19T11:27:54,961 INFO [00b77a5e-7cec-4e10-a0ae-f94450a56ab0 HiveServer2-Handler-Pool: Thread-54] ql.Context: Deleting scratch dir: file:/tmp/root/00b77a5e-7cec-4e10-a0ae-f94450a56ab0/hive_2024-06-19_11-27-48_655_7337980361533661344-6
2024-06-19T11:27:54,961 INFO [00b77a5e-7cec-4e10-a0ae-f94450a56ab0 HiveServer2-Handler-Pool: Thread-54] cleanup.EventualCleanupService: Delete file:/tmp/root/00b77a5e-7cec-4e10-a0ae-f94450a56ab0/hive_2024-06-19_11-27-48_655_7337980361533661344-6 operation was queued
2024-06-19T11:27:54,961 INFO [00b77a5e-7cec-4e10-a0ae-f94450a56ab0 HiveServer2-Handler-Pool: Thread-54] ql.Context: Deleting scratch dir: file:/tmp/root/00b77a5e-7cec-4e10-a0ae-f94450a56ab0/hive_2024-06-19_11-27-48_655_7337980361533661344-1
2024-06-19T11:27:54,961 INFO [00b77a5e-7cec-4e10-a0ae-f94450a56ab0 HiveServer2-Handler-Pool: Thread-54] cleanup.EventualCleanupService: Delete file:/tmp/root/00b77a5e-7cec-4e10-a0ae-f94450a56ab0/hive_2024-06-19_11-27-48_655_7337980361533661344-1 operation was queued
2024-06-19T11:27:54,961 INFO [00b77a5e-7cec-4e10-a0ae-f94450a56ab0 HiveServer2-Handler-Pool: Thread-54] operation.SQLOperation: Closing operation log /tmp/root/operation_logs/00b77a5e-7cec-4e10-a0ae-f94450a56ab0/root_20240619112748_9448d222-0018-4dd7-ba8a-4c30eba4ddd8 without delay
2024-06-19T11:27:54,970 INFO [EventualCleanupService thread 0] cleanup.EventualCleanupService: Deleted file:/tmp/root/00b77a5e-7cec-4e10-a0ae-f94450a56ab0/hive_2024-06-19_11-27-48_655_7337980361533661344-1
2024-06-19T11:27:54,971 INFO [EventualCleanupService thread 3] cleanup.EventualCleanupService: Deleted file:/tmp/root/00b77a5e-7cec-4e10-a0ae-f94450a56ab0/hive_2024-06-19_11-27-48_655_7337980361533661344-6
2024-06-19T11:27:54,989 INFO [EventualCleanupService thread 9] cleanup.EventualCleanupService: Deleted hdfs://mycluster/tmp/hive/root/00b77a5e-7cec-4e10-a0ae-f94450a56ab0/hive_2024-06-19_11-27-48_655_7337980361533661344-1
2024-06-19T11:27:56,432 INFO [NotificationEventPoll 0] HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_config_value: name=metastore.batch.retrieve.max defaultValue=50
2024-06-19T11:28:02,946 INFO [Scheduled Query Poller] HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=scheduled_query_poll
2024-06-19T11:28:56,373 INFO [NotificationEventPoll 0] HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_config_value: name=metastore.batch.retrieve.max defaultValue=50
From my research, several answers suggest that running set hive.auto.convert.join=false resolves the issue. I tried it, and the query then succeeded. My question is: why does the default setting hive.auto.convert.join=true not work in Hive 4.0.0? I ran the same test on Hive 3.x versions and did not encounter any issues. Which parameters do I need to set to use map join in Hive 4.0.0?
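For reference, the workaround described above can be sketched as a Beeline session. This is a minimal sketch, assuming the same emp and dept tables as in the failing query. The EXPLAIN step is an illustrative addition that was not part of the original test. It only prints the plan, so it should show whether the planner converts the join into a map join with a local work stage, without launching the local task that fails.

```sql
-- Workaround from the post: turn off automatic map-join conversion for this
-- session only. The join is then expected to run as a regular shuffle join,
-- so no MAPREDLOCAL stage (Stage-4 in the output above) should be started.
SET hive.auto.convert.join=false;

SELECT emp.emp_id, emp.emp_name, dept.dept_name
FROM emp
JOIN dept ON emp.dept_id = dept.dept_id;

-- Illustrative addition (an assumption, not from the original test):
-- restore the default and inspect the plan instead of executing the query.
SET hive.auto.convert.join=true;

EXPLAIN
SELECT emp.emp_id, emp.emp_name, dept.dept_name
FROM emp
JOIN dept ON emp.dept_id = dept.dept_id;
```

Comparing the two plans (EXPLAIN with the setting off and on) would narrow the failure down to the local hash-table build step rather than to the join logic itself.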