I have code that makes heavy use of DataFrames and does a lot of computation. I want to speed up the processing time, so I bought an NVIDIA GPU and tried to switch to the cuDF library. But the performance is awful: the benchmark code below runs very slowly on the GPU.
import cudf as cd
import pandas as pd
import numpy as np
import time
import cupy as cp
# Generating random data
np.random.seed(42)
size = 1000
iteration = 1000
data = {
    'type': np.random.choice(['X', 'Y'], size=size),
    'num1': np.random.randint(10, 1001, size=size),
    'num2': np.random.randint(1000, 100001, size=size)
}
# Creating DataFrame
df_cpu = pd.DataFrame(data)
df_gpu = cd.DataFrame(data)
# Run the whole benchmark twice so the second pass excludes one-off startup cost
for run in range(2):
    t = time.time()
    for i in range(iteration):
        temp = sum(df_cpu[df_cpu.type == 'X']['num1'].values)
    print(f'filtering in cpu in {time.time() - t}')
    t = time.time()
    for i in range(iteration):
        temp = sum(df_cpu.loc[df_cpu.type == 'X', 'num1'].values)
    print(f'filtering with loc cpu in {time.time() - t}')
    t = time.time()
    for i in range(iteration):
        temp = sum(df_gpu[df_gpu.type == 'X']['num1'].values)
    print(f'filtering in gpu in {time.time() - t}')
    t = time.time()
    for i in range(iteration):
        temp = sum(df_gpu.loc[df_gpu.type == 'X', 'num1'].values)
    print(f'filtering with loc in gpu in {time.time() - t}')
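For reference, here is the same filter-and-sum expressed entirely with cuDF operations, so the reduction also stays on the device instead of pulling .values back to the host and reducing with Python's built-in sum(). This is only a sketch for comparison; I have not timed it:

t = time.time()
for i in range(iteration):
    # Series.sum() performs the reduction on the GPU, avoiding the .values host transfer
    temp = df_gpu.loc[df_gpu.type == 'X', 'num1'].sum()
print(f'filtering with device-side sum in gpu in {time.time() - t}')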
(Screenshots attached to the post: benchmark timings, nvidia-smi output, and installed module versions.)
What am I missing? I need to fully utilize the GPU and increase performance.