I have a log table which contains changes. The sign + means addition and the sign - means deletion.
<code>import pandas as pd
history = pd.DataFrame({
    "First": ["X", "Y", "Y", "X", "X", "Y", "Z"],
    "Last": ["X", "Y", "Y", "X", "X", "Y", "A"],
    "Change": ["+", "-", "+", "-", "+", "+", "-"],
    "Date": ["2024-05-01", "2024-06-01", "2024-06-01", "2024-05-03",
             "2024-05-02", "2024-06-02", "2024-06-01"],
})
history = history.sort_values(by=["Date", "Change"])
# sort needed to process the entries chronologically
</code>
This produces
<code> First Last Change Date
0 X X + 2024-05-01
4 X X + 2024-05-02
3 X X - 2024-05-03
2 Y Y + 2024-06-01
1 Y Y - 2024-06-01
6 Z A - 2024-06-01
5 Y Y + 2024-06-02
</code>
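One detail about the sort worth keeping in mind: Change is compared lexicographically, and "+" sorts before "-", so a same-day addition is processed before a same-day deletion (see the Y/Y rows on 2024-06-01 above). A minimal check:

```python
import pandas as pd

# on equal dates, lexicographic order puts "+" (0x2B) before "-" (0x2D)
df = pd.DataFrame({"Date": ["2024-06-01", "2024-06-01"], "Change": ["-", "+"]})
df = df.sort_values(by=["Date", "Change"])
print(df["Change"].tolist())  # ['+', '-']
```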
In the next step, I want to display only what is currently available.
- The last sign must be + for an item to be available.
- Available: +, ++, +-+, -+, --+, etc.
- Not available: -, ++-, +--, etc.
- An item is a combination of the columns First and Last.
- Sorting is done by Date and Change.
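The rule can be restated with a tiny hypothetical helper that takes an item's signs as a chronological string:

```python
def is_available(signs: str) -> bool:
    # an item is available iff its last recorded sign is "+"
    return signs.endswith("+")

print(is_available("+-+"))  # True
print(is_available("++-"))  # False
```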
I built this logic using iteration, which is very slow. Basically:
<code>latest = {}
item_columns = ["First", "Last"]
for _, row in history.iterrows():
    key = tuple(row[column] for column in item_columns)
    if row["Change"] == "+":
        latest[key] = row
    elif row["Change"] == "-" and key in latest:
        del latest[key]
today = pd.DataFrame(latest.keys(), columns=item_columns)
</code>
This produces the available items:
<code> First Last
0 X X
1 Y Y
</code>
The issue is that the loop is slow with big tables, e.g. 20 seconds for the example below:
<code>duplicated = pd.concat([history.iloc[[0]]] * 50000, ignore_index=True)
history = pd.concat([history, duplicated], ignore_index=True)
for _, row in history.iterrows():
    key = tuple(row[column] for column in item_columns)
    if row["Change"] == "+":
        latest[key] = row
    elif row["Change"] == "-" and key in latest:
        del latest[key]
today = pd.DataFrame(latest.keys(), columns=item_columns)
</code>
Any way to speed up?
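For reference, here is a vectorized sketch of the same rule, under the assumption that availability depends only on the final Change per (First, Last) pair in sorted order:

```python
import pandas as pd

history = pd.DataFrame({
    "First": ["X", "Y", "Y", "X", "X", "Y", "Z"],
    "Last": ["X", "Y", "Y", "X", "X", "Y", "A"],
    "Change": ["+", "-", "+", "-", "+", "+", "-"],
    "Date": ["2024-05-01", "2024-06-01", "2024-06-01", "2024-05-03",
             "2024-05-02", "2024-06-02", "2024-06-01"],
})
history = history.sort_values(by=["Date", "Change"])

# the last change per item decides availability: "+" means present
last_change = history.groupby(["First", "Last"], sort=False)["Change"].last()
today = last_change[last_change == "+"].index.to_frame(index=False)
print(today)
```

On the sample above this keeps only the Y/Y pair, since X/X ends with "-" (the ++- case) after sorting. If a Python loop must be kept, `itertuples` is also usually much faster than `iterrows`.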