Usually you want Spark to read data from a file, or even from a set of files, so that it can be processed in parallel. As already suggested in the comments, `spark.read.csv` is what you should use to read a CSV file.
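Reading a set of files works the same way. `spark.read.csv` also accepts a directory (or a list of paths), which lets Spark spread the files across tasks. The sketch below is illustrative only: `write_csv_parts`, `read_csv_dir`, and the `part-NNNNN.csv` file naming are hypothetical helpers and assumptions, not part of Spark. It writes rows into several CSV files with Python's `csv` module, which quotes fields that contain commas such as `Karnataka, India`, and then reads the whole directory back with `header=True`.

```python
import csv
import os
from typing import TYPE_CHECKING, List, Sequence

if TYPE_CHECKING:  # only for type checkers; PySpark is not imported at runtime here
    from pyspark.sql import DataFrame, SparkSession


def write_csv_parts(directory: str, header: Sequence[str],
                    rows: Sequence[Sequence[str]], n_files: int) -> List[str]:
    """Hypothetical helper: split rows round-robin across n_files CSV files.

    Every file gets its own header line, so each one is a valid CSV on its own.
    """
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i in range(n_files):
        path = os.path.join(directory, f"part-{i:05d}.csv")
        with open(path, "w", newline="") as f:
            # csv.writer quotes fields containing the delimiter (e.g. "Karnataka, India")
            writer = csv.writer(f, lineterminator="\n")
            writer.writerow(header)
            writer.writerows(rows[i::n_files])
        paths.append(path)
    return paths


def read_csv_dir(spark: "SparkSession", directory: str) -> "DataFrame":
    """Hypothetical helper: read every CSV file in `directory` into one DataFrame."""
    return spark.read.csv(directory, header=True)
```

For example, `df = read_csv_dir(spark, "/data/people")` (a hypothetical path) returns a single DataFrame containing the rows of every part file; with `header=True`, Spark treats the first line of each file as that file's header.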
I used temporary files in the examples below just to give you a self-contained working example. In real cases, I recommend writing your data to a real file.
You can either pass a schema to the `csv` function or include a header line in your file. If you do neither, Spark names the columns `_c0`, `_c1`, `_c2`, and so on.
```python
import tempfile

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1. No schema and no header: Spark names the columns _c0, _c1, ...
with tempfile.NamedTemporaryFile(delete=False) as fp:
    fp.write(b"""Gourav , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,[email protected]" \n""")
# Leaving the with block closes (and flushes) the file; delete=False keeps it on disk.
spark.read.csv(fp.name).show()

# 2. Explicit schema: column names and types come from the DDL string.
with tempfile.NamedTemporaryFile(delete=False) as fp:
    fp.write(b"""Gourav , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,[email protected]" \n""")
spark.read.csv(fp.name, schema="Name string, Surname string, Address string, Phone string, Email string").show()

# 3. Header line: column names are taken from the first line of the file.
with tempfile.NamedTemporaryFile(delete=False) as fp:
    fp.write(b"""Name,Surname,Address,Phone,Email\n""")
    fp.write(b"""Gourav , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,[email protected]" \n""")
spark.read.csv(fp.name, header=True).show()
```
Output:

```
+-------+-------+----------------+----+--------------------+
|    _c0|    _c1|             _c2| _c3|                 _c4|
+-------+-------+----------------+----+--------------------+
|Gourav | Joshi |Karnataka, India|NULL|gouravj09@hotmail...|
+-------+-------+----------------+----+--------------------+

+-------+-------+----------------+-----+--------------------+
|   Name|Surname|         Address|Phone|               Email|
+-------+-------+----------------+-----+--------------------+
|Gourav | Joshi |Karnataka, India| NULL|gouravj09@hotmail...|
+-------+-------+----------------+-----+--------------------+

+-------+-------+----------------+-----+--------------------+
|   Name|Surname|         Address|Phone|               Email|
+-------+-------+----------------+-----+--------------------+
|Gourav | Joshi |Karnataka, India| NULL|gouravj09@hotmail...|
+-------+-------+----------------+-----+--------------------+
```
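The `schema` argument in the second example is a DDL-formatted string: comma-separated `name type` pairs. If you have many columns, a small helper can build that string from a list. `ddl_schema` and `COLUMNS` below are hypothetical names used only for illustration, and the helper assumes plain column names without spaces or commas.

```python
from typing import Sequence, Tuple


def ddl_schema(columns: Sequence[Tuple[str, str]]) -> str:
    """Hypothetical helper: join (name, type) pairs into a DDL schema string."""
    # Assumes simple identifiers; names with spaces or commas would need quoting.
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)


# Same columns as the second example; everything is read as a string.
COLUMNS = [
    ("Name", "string"),
    ("Surname", "string"),
    ("Address", "string"),
    ("Phone", "string"),  # a numeric type would drop leading zeros in phone numbers
    ("Email", "string"),
]
```

Passing `schema=ddl_schema(COLUMNS)` to `spark.read.csv` gives the same column names and types as the hard-coded string in the second example.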