Relative Content

Tag Archive for pythonpyspark

Multiply a pyspark column with array for each row

I have a PySpark DataFrame with two columns: one is a float and the other is an array.
I know that the length of the array in each row equals the number of rows in the DataFrame.
I want to add a new column to the DataFrame whose value, for each row, is the dot product of that row's array and the float column.
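
One possible approach, sketched under assumptions rather than offered as a definitive answer: collect the float column to the driver in a fixed order, then combine it with each row's array using the Spark SQL higher-order functions `zip_with` and `aggregate` (available since Spark 2.4). The column names `value`, `arr`, and `id`, and both helper functions, are hypothetical. The sketch also assumes some column defines the row order, because Spark rows have no inherent order.

```python
def dot_product_expr(array_col, values):
    """Build a Spark SQL expression for sum(array_col[i] * values[i])."""
    # Render the collected column values as a Spark SQL array of doubles.
    literals = ", ".join(f"CAST({float(v)!r} AS DOUBLE)" for v in values)
    return (
        f"aggregate(zip_with({array_col}, array({literals}), (x, y) -> x * y), "
        "CAST(0 AS DOUBLE), (acc, v) -> acc + v)"
    )


def with_dot_product(df, value_col="value", array_col="arr", order_col="id"):
    """Return `df` (a pyspark.sql.DataFrame) with an extra `dot` column.

    Assumption: `order_col` defines which row matches which array position.
    """
    # Collect the float column to the driver in a fixed row order.
    rows = df.orderBy(order_col).select(value_col).collect()
    values = [row[value_col] for row in rows]
    # Append the dot product to the existing columns.
    return df.selectExpr("*", f"{dot_product_expr(array_col, values)} AS dot")
```

Collecting the column is only reasonable for modest row counts, but since every array has one element per row, the data is already N×N in size. Finite, non-null float values are assumed.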

Derive quarters column using sales per quarter column in python

How can I use PySpark to convert a table with a products column and one sales column per quarter into a table with three columns: Products, Sales, and Quarters? No date field is defined in the table. The new Quarter column must be derived from the column names in the source table.
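
A minimal sketch of one way to do this, assuming the source table has a `Product` column and that every other column is a quarter (for example `Q1` to `Q4`); these names and both helpers are hypothetical. It uses the Spark SQL `stack` generator to turn each quarter column into its own row, taking the quarter label from the column name.

```python
def stack_expr(quarter_cols):
    """Build a Spark SQL `stack` expression that unpivots the given columns."""
    # Each pair is (label taken from the column name, value of that column).
    pairs = ", ".join(f"'{c}', `{c}`" for c in quarter_cols)
    return f"stack({len(quarter_cols)}, {pairs}) AS (Quarter, Sales)"


def unpivot_quarters(df, id_col="Product"):
    """Return a long table with columns id_col, Quarter, Sales.

    `df` is a pyspark.sql.DataFrame. Assumption: every column other than
    `id_col` holds sales for one quarter, all with the same data type.
    """
    quarter_cols = [c for c in df.columns if c != id_col]
    return df.selectExpr(f"`{id_col}`", stack_expr(quarter_cols))
```

For two quarter columns `Q1` and `Q2`, `stack_expr` produces ``stack(2, 'Q1', `Q1`, 'Q2', `Q2`) AS (Quarter, Sales)``, so each product row becomes two rows.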

Add a list of dictionaries to an existing pyspark dataframe as a new column in python

I have a list of dictionaries in Python that I want to add as a new column to a PySpark DataFrame, so that each item in the list becomes one cell in the new column.
When I try, for example, to convert the DataFrame to pandas, add the column, and convert it back to PySpark, the data becomes corrupted, and I can't figure out the right combination of functions to add the data without using pandas. I tried using lit and adding a StructType, with no success.
Thanks for the help.
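
One way this could be approached without pandas, as a sketch under stated assumptions: give each existing row a position with `zipWithIndex`, build a second DataFrame from the list with the same positions, and join the two. The dictionaries are stored as JSON strings so that differing keys or value types do not break schema inference. The helper names, the `_row_idx` and `extra` column names, and the reliance on the DataFrame's current row order are all assumptions. The sketch also needs a session that exposes `df.rdd`.

```python
import json


def indexed_json_rows(dicts):
    """Pair each dictionary with its list position, serialized as JSON."""
    return [(i, json.dumps(d, sort_keys=True)) for i, d in enumerate(dicts)]


def add_dict_column(spark, df, dicts, new_col="extra"):
    """Return `df` with one JSON-encoded dictionary per row in `new_col`."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql.types import LongType, StructField, StructType

    # Number the existing rows; this follows the DataFrame's current order,
    # which is only meaningful if that order is stable (an assumption).
    schema = StructType(
        df.schema.fields + [StructField("_row_idx", LongType(), False)]
    )
    indexed = spark.createDataFrame(
        df.rdd.zipWithIndex().map(lambda pair: tuple(pair[0]) + (pair[1],)),
        schema,
    )
    # Second DataFrame: list position plus the dictionary as a JSON string.
    extra = spark.createDataFrame(
        indexed_json_rows(dicts), f"_row_idx long, {new_col} string"
    )
    # Rows without a matching list item get null in the new column.
    return indexed.join(extra, on="_row_idx", how="left").drop("_row_idx")
```

If a structured column is needed instead of a string, the JSON column could afterwards be parsed with `pyspark.sql.functions.from_json` and an explicit schema.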

Merge PySpark table if condition is true

We want to update our table when a specific condition is met by the external data load. The code runs into several errors when we try this with our solution.