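I have a multiline flat file that I want to convert with PySpark into a 4-column DataFrame (or an RDD of 4-element arrays), where every four consecutive lines form one row. My current attempt, which borrows `sliding` from Scala's `RDDFunctions`, is: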
from pyspark import SparkContext
import org.apache.spark.mllib.rdd.RDDFunctions._
path = '/mypath/file'
rdd = spark.sparkContext.textFile(path).sliding(4, 4).toDF("x", "y", "z", "a")
PySpark has no `sliding()` function. What is the equivalent? The input is:
A
B
C
D
A2
B2
C2
D2
The desired output is
x | y | z | a |
---|---|---|---|
A | B | C | D |
A2 | B2 | C2 | D2 |
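
For reference, one possible way to get this grouping in PySpark is sketched below. It is not a built-in equivalent of `sliding(4, 4)`. It numbers each line with `zipWithIndex()`, then groups lines whose index falls in the same block of four. The helper names `group_key`, `to_row`, and `lines_to_df` are hypothetical, and the sketch assumes the number of lines is an exact multiple of the column count (a short trailing group would produce a row that does not fit the schema).

```python
def group_key(indexed_line, width=4):
    """Turn a (line, index) pair from zipWithIndex() into
    (group number, (position within group, line))."""
    line, idx = indexed_line
    return idx // width, (idx % width, line)


def to_row(positioned_lines):
    """Sort one group's (position, line) pairs by position
    and return just the lines as a tuple."""
    return tuple(line for _, line in sorted(positioned_lines))


def lines_to_df(spark, path, columns=("x", "y", "z", "a")):
    """Hypothetical helper: read `path` and pack every len(columns)
    consecutive lines into one DataFrame row.

    `spark` is assumed to be an existing SparkSession.
    """
    width = len(columns)
    return (
        spark.sparkContext.textFile(path)
        .zipWithIndex()                            # (line, 0), (line, 1), ...
        .map(lambda pair: group_key(pair, width))  # (group, (position, line))
        .groupByKey()                              # group -> its (position, line) pairs
        .sortByKey()                               # keep groups in file order
        .map(lambda kv: to_row(kv[1]))             # one tuple per output row
        .toDF(list(columns))
    )
```

With an active session, `lines_to_df(spark, '/mypath/file').show()` would be the intended call. Note that `groupByKey()` shuffles the data, so for very large files a different approach may be preferable.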