I am doing some work with Hadoop, Hive, and Spark, in which I need to query tables that I created in Hive. I am trying to connect to Hive from a Jupyter notebook with the pyspark library as follows:
import findspark
findspark.init()
from pyspark.sql import SparkSession
from pyspark.conf import SparkConf
spark = SparkSession.builder.appName("laboratorio2").config("hive.metastore.uris", "thrift://localhost:9083", conf=SparkConf()).enableHiveSupport().getOrCreate()
spark.sql("show databases").show()
This is what the show databases query returns:
+------------+
|databaseName|
+------------+
| default|
+------------+
but I actually have one more database; this is the result of the direct query in Hive:
I’m not sure if I’m configuring the SparkSession correctly. I also tried creating the SparkSession without any configuration, but it behaves the same way. Does anyone have any idea how I can connect to the databases I have already created? The version of Spark I’m using is 2.4.7, and I’m working on a virtual machine with Ubuntu Server installed.
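In case it is relevant: reading the pyspark source for 2.4, `SparkSession.builder.config()` appears to ignore the key/value pair whenever a `conf=` object is also passed, so the `hive.metastore.uris` setting in my call above may never reach the session. Here is a sketch of the builder call without the extra `SparkConf()` argument (the metastore URI is the one from my setup above; whether this is the actual cause is an assumption on my part):

```python
import findspark
findspark.init()

from pyspark.sql import SparkSession

# Build the session with Hive support, passing the metastore URI directly.
# Note: in Spark 2.4, config(key, value, conf=...) applies only the conf
# object when it is given and silently drops the key/value pair, so the
# two styles should not be mixed in one call.
spark = (
    SparkSession.builder
    .appName("laboratorio2")
    .config("hive.metastore.uris", "thrift://localhost:9083")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("show databases").show()
```

This is only a sketch against my local setup; it still assumes the metastore service is running on port 9083 and reachable from the notebook.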