I have below SQL
query
SELECT DISTINCT actors, COUNT(*)
FROM (VALUES('a', 1), ('a', 1), ('b', 1), ('b', 2)) t(actors, id)
GROUP BY actors, id
Now I want to get a similar implementation in Python
import pandas as pd
dat = pd.DataFrame({'col1' : ['a', 'a', 'b', 'b'], 'col2' : [1,1,1,2]})
dat2 = dat.groupby(['col1', 'col2']).size().reset_index(drop = False).drop_duplicates('col1')
print(dat2)
However, I wonder if above code is exactly same to SQL. In Python
, when I am removing duplicate I retain the first instance. That also depends on the sorting of rows of the dataframe before applying drop_duplicates
. Is it the case for SQL
as well?
Is there any better was to create exactly same Python
code of above SQL
?