I am trying to read data into JupyterLab. The challenge is that the data has double quotes around all the fields, and the values are also in double quotes.
I tried removing them in Excel, but it is a tedious task.
# Setup in JupyterLab: PySpark, with pandas also imported
from pyspark.sql import SparkSession
import pandas as pd
from pyspark.sql import functions as f

# Start (or reuse) a Spark session
spark = SparkSession.builder.appName("AppNameEnd").getOrCreate()

# Path to the CSV file whose fields and values are wrapped in double quotes
data = 'C:/Users/username/Downloads/Big-data-hadoop-and-spark-developer-Project-1–master/Big-data-hadoop-and-spark-developer-Project-1–master/BDH Project 1/BDH Project 1/MarketAnalysisData.csv'
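This is a minimal sketch. It assumes the file follows the usual CSV convention: commas between fields, each field enclosed in double quotes, and any literal quote inside a value written as two quotes (""). Under that assumption, a CSV parser strips the enclosing quotes itself, so the manual clean-up in Excel should not be needed. The sample data and the `read_quoted_csv` helper are hypothetical. The sketch uses only Python's standard `csv` module so that it runs without Spark.

```python
import csv
import io

# Hypothetical sample: every header name and value is wrapped in double quotes.
# A value containing a literal quote uses the CSV convention of doubling it ("").
sample = (
    '"Region","Product","Sales"\n'
    '"North","Widget","100"\n'
    '"South","Gadget ""Pro""","250"\n'
)

def read_quoted_csv(text):
    """Parse CSV text and return (header, rows) with the enclosing quotes removed."""
    # quotechar='"' and doublequote=True are already the defaults;
    # they are spelled out here to show which settings do the work.
    reader = csv.reader(io.StringIO(text), quotechar='"', doublequote=True)
    header = next(reader)
    rows = [row for row in reader]
    return header, rows

header, rows = read_quoted_csv(sample)
print(header)  # ['Region', 'Product', 'Sales']
print(rows)    # [['North', 'Widget', '100'], ['South', 'Gadget "Pro"', '250']]
```

To read the real file this way, open it with `open(data, newline='', encoding='utf-8')` and pass the file object to `csv.reader`. The readers already imported in the question use the same quote convention by default. `pd.read_csv` has `quotechar='"'`, and Spark's CSV reader uses `"` as its default `quote` character. Something like `spark.read.csv(data, header=True, escape='"')` should therefore return the values without the surrounding quotes. The `escape='"'` argument only matters if some values contain doubled quotes, as in the sample.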