I am trying to figure out how to replace the ‘skills’ rows that come directly after ‘skills-specialization’ with ‘skills-specialization’. Here is what the data looks like, or at least the column I am interested in.
Data:
group |
---|
skills |
skills |
skills |
skills |
skills |
skills |
skills |
skills |
skills-specialization |
skills |
skills |
skills |
job profile |
job description |
Here is what I’ve done so far. As you can see from the result, it only replaced the first ‘skills’ row after ‘skills-specialization’.
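# Imports used below (assumes an existing SparkSession named 'spark', e.g. in a notebook)
from pyspark.sql import functions as F
from pyspark.sql.window import Window
# Define the data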
data = [("skills",), ("skills",), ("skills",), ("skills",), ("skills",),
("skills",), ("skills",), ("skills",), ("skills-specialization",),
("skills",), ("skills",), ("skills",), ("job profile",), ("job description",)]
# Create a DataFrame named 'df'
df = spark.createDataFrame(data, ["group"])
# Add a new column 'partition' indicating presence of "skills-specialization"
df = df.withColumn("partition", F.when(F.col("group") == "skills-specialization", 1).otherwise(0))
# Assign global row numbers in the original row order (the window has no partitionBy)
windowSpec = Window.orderBy(F.monotonically_increasing_id())
df = df.withColumn("row_num", F.row_number().over(windowSpec))
# Create a new column 'replace' to mark rows where "skills" follows "skills-specialization"
df = df.withColumn("replace", F.when((F.col("group") == "skills") & (F.lag(F.col("group")).over(windowSpec) == "skills-specialization"), "skills-specialization").otherwise(F.col("group")))
# Intended to extend the replacement to the remaining "skills" rows (as written, this leaves 'replace' unchanged)
df = df.withColumn("replace", F.when(F.col("replace") == "skills-specialization", "skills-specialization").otherwise(F.col("replace")))
# Keep only the updated column, renamed back to 'group'
df = df.select(F.col("replace").alias("group"))
# Display the DataFrame
df.display()
My result:
group |
---|
skills |
skills |
skills |
skills |
skills |
skills |
skills |
skills |
skills-specialization |
skills-specialization |
skills |
skills |
job profile |
job description |
My goal is to get something like this:
End goal:
group |
---|
skills |
skills |
skills |
skills |
skills |
skills |
skills |
skills |
skills-specialization |
skills-specialization |
skills-specialization |
skills-specialization |
job profile |
job description |
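To make the intended rule concrete, here is a minimal plain-Python sketch (not PySpark) of one reading of the end goal: once "skills-specialization" appears, every consecutive "skills" row after it is relabeled until a different group shows up. The function name `propagate_specialization` and its `marker`/`target` parameters are hypothetical, introduced only for illustration.

```python
def propagate_specialization(groups, marker="skills-specialization", target="skills"):
    """Relabel each run of `target` rows that directly follows a `marker` row."""
    result = []
    in_run = False  # True while we are inside a run started by `marker`
    for g in groups:
        if g == marker:
            in_run = True
            result.append(g)
        elif g == target and in_run:
            # Still inside the run: carry the marker forward
            result.append(marker)
        else:
            # Any other value (or a target outside a run) ends the run
            in_run = False
            result.append(g)
    return result


# Same values as the 'group' column in the question
groups = (
    ["skills"] * 8
    + ["skills-specialization"]
    + ["skills"] * 3
    + ["job profile", "job description"]
)
fixed = propagate_specialization(groups)
print(fixed[8:])
```

The key difference from a one-row `lag` is that the state carries forward across the whole run rather than looking back a single row. In PySpark, a similar effect could probably be had with a running window over `row_num`, but that is an assumption and not tested here.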