Suppose I have a pandas DataFrame A with columns user_id and history, where history is a list of ints and the possible values are bounded above by 2000. Each history b = [b1, b2, b3, …, bn] contains only unique values (each bi appears at most once in the list). I need to iterate through all rows of A and, for each history, find all triples (bi, bj, bk) with i < j < k, then count the occurrences of each such triple across all rows.
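For a single history, itertools.combinations enumerates exactly these position-ordered triples, for example:

from itertools import combinations

b = [5, 2, 9, 7]
# Every (bi, bj, bk) with i < j < k, in positional order
print(list(combinations(b, 3)))
# [(5, 2, 9), (5, 2, 7), (5, 9, 7), (2, 9, 7)]

My current attempt uses explicit nested loops instead: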
import pandas as pd
from collections import defaultdict

# Example DataFrame A (replace this with your actual DataFrame)
A = pd.DataFrame({
    'user_id': [1, 2, 3],
    'history': [[1, 2, 3], [2, 3, 4], [1, 3, 5]]
})

# Dictionary to store counts of triples
triple_counts = defaultdict(int)

# Iterate over each row of A
for index, row in A.iterrows():
    history = row['history']
    n = len(history)
    # Iterate over all triples (bi, bj, bk) where i < j < k
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                bi = history[i]
                bj = history[j]
                bk = history[k]
                # Only count triples whose values are also increasing
                if bi < bj < bk:
                    triple_counts[(bi, bj, bk)] += 1

# Output the counts of all triples
for triple, count in triple_counts.items():
    print(f"Triple {triple}: Count = {count}")
The issue with this approach is that A is an extremely large DataFrame and each row costs O(n^3), so the computation takes forever to complete. Is there a faster way to do this, possibly leveraging PyTorch or tensor operations?
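One direction I have been considering is sketched below. It assumes the item ids lie in range(2000) and that each value triple only needs to be counted once per row, which is what my loop does when every history is stored in sorted order. The idea is to build a binary membership matrix of shape (num_rows, 2000) and turn the triple counting into matrix multiplications, chunking over the smallest element of each triple so the full 2000^3 count tensor never has to be materialized.

import numpy as np
import pandas as pd

A = pd.DataFrame({
    'user_id': [1, 2, 3],
    'history': [[1, 2, 3], [2, 3, 4], [1, 3, 5]]
})

V = 2000  # assumed upper bound on the item ids

# Binary membership matrix: M[r, v] = 1.0 iff value v appears in row r's history
M = np.zeros((len(A), V), dtype=np.float32)
for r, hist in enumerate(A['history']):
    M[r, hist] = 1.0

triple_counts = {}
for a in range(V):
    rows_with_a = M[:, a] == 1.0              # rows whose history contains a
    if not rows_with_a.any():
        continue
    Ma = M[rows_with_a]
    # pair[b, c] = number of rows containing all of a, b and c
    pair = Ma.T @ Ma
    bs, cs = np.nonzero(np.triu(pair, k=1))   # keep only b < c
    for b, c in zip(bs, cs):
        if a < b:                             # keep a < b < c so each triple is stored once
            triple_counts[(a, int(b), int(c))] = int(pair[b, c])

Each chunk is a single matrix multiplication that produces a 2000 x 2000 co-occurrence matrix (about 16 MB in float32), and the same matmuls map directly onto torch.matmul on a GPU if NumPy is still too slow. Does this look reasonable, or is there a better way?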