I’m struggling a bit trying to create a lookup table of items that are bought together.
I have a solution in “brute force” Python iterating over a DataFrame, but I’d prefer one in pure Pandas using some kind of aggregate trick if possible.
Here is what I have. I’ve made a small example with a list of invoice numbers and what item that was bought.
import pandas as pd
df = pd.DataFrame({
"Invoice": [1, 1, 2, 3, 4, 4, 4, 5, 5],
"Item": ["Apple", "Pear", "Banana", "Apple", "Apple", "Orange", "Pear", "Apple", "Orange"]
})
df
We can see that invoice number 4 has three items and invoice number 2 has just one.
And here is my solution that I’m not really happy with:
item_keys = []
item_values = []
for i, row in df.iterrows():
invoice = row["Invoice"]
item = row["Item"]
for item_2 in df[df["Invoice"] == invoice]["Item"]:
if item_2 != item:
item_keys.append(item)
item_values.append(item_2)
lookup_table = pd.DataFrame({"key": item_keys, "value": item_values})
lookup_table
In lookup_table I can then search for a key item and gett all items that were sold together with it.
Is there a faster and more elegant way to do this?)
DanneP is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.