I’m trying to retrieve the data from Azure Event Hub using pysprak. the code just keeps running but doesn’t display any data
EH_CONN_STR = 'Endpoint=sb://event-hub-18-jul.servicebus.windows.net/;SharedAccessKeyName=eh_policy_18_july;SharedAccessKey=TOc+O/+U+QuuZ5R33HsiwUjsc1C8qRhCy+AEhFxkLRE=;EntityPath=ehub-18'
EH_NAMESPACE = 'event-hub-18-jul'
EH_NAME ='ehub-18'
KAFKA_OPTIONS = {
"kafka.bootstrap.servers" : f"{EH_NAMESPACE}.servicebus.windows.net:9093",
"subscribe" : EH_NAME,
"kafka.sasl.mechanism" : "PLAIN",
"kafka.security.protocol" : "SASL_SSL",
"kafka.sasl.jaas.config" : f"kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{EH_CONN_STR}";",
"kafka.request.timeout.ms": "60000",
"kafka.session.timeout.ms": "30000",
"kafka.metadata.max.age.ms": "10000"
}
df = spark.read.format("kafka").options(**KAFKA_OPTIONS).load()
display(df)
Below is the snapshot of event hub data that I generated with yellow taxi and connection sting
“Endpoint=sb://event-hub-18-jul.servicebus.windows.net/;SharedAccessKeyName=eh_policy_18_july;SharedAccessKey=TOc+O/+U+QuuZ5R33HsiwUjsc1C8qRhCy+AEhFxkLRE=;EntityPath=ehub-18”
Could you please let me know if there is anything wrong in the way I’m reading the data?
First you need to check whether you enabled the kafka source in your event hub
If not enabled please enable it.
You enable only while deploying the event hub resource by selecting the
pricing tier as premium.
Next, after enabling use your code.
Also, you can use the event hub library to read data from event hub.
azure-eventhubs-spark_2.12
Import it from maven coordinate and use below code.
connectionString = "YOUR.CONNECTION.STRING"
ehConf = {}
ehConf['eventhubs.connectionString'] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString)
df = spark
.readStream
.format("eventhubs")
.options(**ehConf)
.load()
For more information follow this documentation for more information.