I’m new to pandas and am struggling with some data wrangling issues.
I have a data source that looks approximately like this:
location available sold name local_id more_data
0 1001 - BBB 1 0 Alpha 24 'DJQ3DD3y'
1 1001 - BBB 1 0 Alpha 24 'aB3joXQy'
2 1001 - BBB 1 0 Alpha 24 'ZJve572B'
3 1001 - BBB 1 0 Alpha 24 'DJEkx8Dy'
4 1001 - BBB 1 0 Alpha 24 'VyVaWLYp'
5 1001 - BBB 1 0 Bravo 19 'Rpr7AvVy'
6 1001 - BBB 1 0 Bravo 19 'ZJlO0VmB'
7 1001 - BBB 1 0 Bravo 19 'OBb6NrrB'
8 1001 - BBB 1 0 Bravo 19 'ZJvPPEXy'
9 1001 - BBB 1 0 Charlie 6 'Vy9MMOEy'
10 1001 - BBB 1 0 Charlie 6 'MJ8AALKp'
11 1001 - BBB 1 0 Delta 17 'vpWmN1kB'
12 1001 - BBB 1 0 Delta 17 'DJEb9qQp'
13 1001 - BBB 1 0 Echo 7 'qyZ1zn1p'
14 1001 - BBB 1 0 Echo 7 'bBqaYoMB'
15 1001 - BBB 1 0 Golf 22 'AJgLr9qp'
16 1001 - BBB 1 0 Golf 22 'vBdV57Ap'
17 1001 - BBB 1 0 Golf 22 'VJYxnLZB'
18 1001 - GG 1029 237 Charlie 6 'VJYxnGXB'
19 1001 - GG 1029 237 Charlie 6 'Vy9Mo52y'
20 1001 - GG 1029 237 Delta 17 'aB3zxYWy'
21 1001 - GG 1029 237 Delta 17 'MJ8A3z0p'
22 1001 - GG 1029 237 Echo 7 'YpLMPwNy'
23 1001 - GG 1029 237 Echo 7 '8Bwev1ep'
24 1001 - GG 1029 237 Golf 22 'MJXm6bLp'
25 1001 - GG 1029 237 Golf 22 'oye7XR0J'
26 1001 - GG 1029 237 Golf 22 'vpWmDDYB'
27 1001 - P 873 375 Charlie 6 'DJEbjjkp'
28 1001 - P 873 375 Charlie 6 'aB3z66zy'
29 1001 - P 873 375 Delta 17 'Kp4zrrKB'
30 1001 - P 873 375 Delta 17 'oyxqMMAB'
31 1001 - P 873 375 Echo 7 'zJ1KMMZy'
32 1001 - P 873 375 Echo 7 'ZJlOzz6B'
33 1001 - P 873 375 Foxtrot 20 'YpLMbbay'
34 1001 - P 873 375 Foxtrot 20 'ZJnmzzYB'
35 1001 - P 873 375 Golf 22 'Kp4zr5LB'
36 1001 - P 873 375 Golf 22 'oye7jg8J'
37 1001 - P 873 375 Golf 22 'OBb6jE3B'
38 1002 - GG 37 11 Charlie 6 'EyGMWPbJ'
39 1002 - GG 37 11 Charlie 6 'aB3zOoDy'
40 1002 - GG 37 11 Delta 17 'DJQ4laLB'
41 1002 - GG 37 11 Delta 17 'ZJlOvNXB'
42 1002 - GG 37 11 Echo 7 'Rpr7a8Dy'
43 1002 - GG 37 11 Echo 7 'zJjYNR4B'
44 1002 - GG 37 11 Golf 22 'Vy9MqkRy'
45 1002 - GG 37 11 Golf 22 'oye7Y0kJ'
46 1002 - GG 37 11 Golf 22 '8BweZbnp'
47 1002 - P 1854 826 Charlie 6 'Rpr7Z5by'
48 1002 - P 1854 826 Charlie 6 'vBdVK1Ap'
49 1002 - P 1854 826 Delta 17 '4Jkae8Dy'
50 1002 - P 1854 826 Delta 17 'YpLM3nxy'
51 1002 - P 1854 826 Echo 7 'VB7vD6Py'
52 1002 - P 1854 826 Echo 7 'ZJlOXbzB'
53 1002 - P 1854 826 Foxtrot 20 'MpNqezKJ'
54 1002 - P 1854 826 Foxtrot 20 '9pOWo39p'
55 1002 - P 1854 826 Golf 22 'MJXm5qnp'
56 1002 - P 1854 826 Golf 22 'oy5vxd4B'
57 1002 - P 1854 826 Golf 22 'DJQ4qz3B'
As you can see, the available
and sold
are presented in relation to the location
column. What I am trying to do is group these by the first portion of the location
column (ie, the “1001” portion of the “1001 – XXX”) and summing the unique values for available
and sold
, while preserving the other data which is unique per row and includes numeric types that are keys and I do not want altered.
Therefore, the output would look like the following:
location available sold name local_id more_data
0 1001 1903 612 Alpha 24 'DJQ3DD3y'
1 1001 1903 612 Alpha 24 'aB3joXQy'
2 1001 1903 612 Alpha 24 'ZJve572B'
3 1001 1903 612 Alpha 24 'DJEkx8Dy'
4 1001 1903 612 Alpha 24 'VyVaWLYp'
5 1001 1903 612 Bravo 19 'Rpr7AvVy'
6 1001 1903 612 Bravo 19 'ZJlO0VmB'
7 1001 1903 612 Bravo 19 'OBb6NrrB'
8 1001 1903 612 Bravo 19 'ZJvPPEXy'
9 1001 1903 612 Charlie 6 'Vy9MMOEy'
10 1001 1903 612 Charlie 6 'MJ8AALKp'
11 1001 1903 612 Delta 17 'vpWmN1kB'
12 1001 1903 612 Delta 17 'DJEb9qQp'
13 1001 1903 612 Echo 7 'qyZ1zn1p'
14 1001 1903 612 Echo 7 'bBqaYoMB'
15 1001 1903 612 Golf 22 'AJgLr9qp'
16 1001 1903 612 Golf 22 'vBdV57Ap'
17 1001 1903 612 Golf 22 'VJYxnLZB'
18 1001 1903 612 Charlie 6 'VJYxnGXB'
19 1001 1903 612 Charlie 6 'Vy9Mo52y'
20 1001 1903 612 Delta 17 'aB3zxYWy'
21 1001 1903 612 Delta 17 'MJ8A3z0p'
22 1001 1903 612 Echo 7 'YpLMPwNy'
23 1001 1903 612 Echo 7 '8Bwev1ep'
24 1001 1903 612 Golf 22 'MJXm6bLp'
25 1001 1903 612 Golf 22 'oye7XR0J'
26 1001 1903 612 Golf 22 'vpWmDDYB'
27 1001 1903 612 Charlie 6 'DJEbjjkp'
28 1001 1903 612 Charlie 6 'aB3z66zy'
29 1001 1903 612 Delta 17 'Kp4zrrKB'
30 1001 1903 612 Delta 17 'oyxqMMAB'
31 1001 1903 612 Echo 7 'zJ1KMMZy'
32 1001 1903 612 Echo 7 'ZJlOzz6B'
33 1001 1903 612 Foxtrot 20 'YpLMbbay'
34 1001 1903 612 Foxtrot 20 'ZJnmzzYB'
35 1001 1903 612 Golf 22 'Kp4zr5LB'
36 1001 1903 612 Golf 22 'oye7jg8J'
37 1001 1903 612 Golf 22 'OBb6jE3B'
38 1002 1891 837 Charlie 6 'EyGMWPbJ'
39 1002 1891 837 Charlie 6 'aB3zOoDy'
40 1002 1891 837 Delta 17 'DJQ4laLB'
41 1002 1891 837 Delta 17 'ZJlOvNXB'
42 1002 1891 837 Echo 7 'Rpr7a8Dy'
43 1002 1891 837 Echo 7 'zJjYNR4B'
44 1002 1891 837 Golf 22 'Vy9MqkRy'
45 1002 1891 837 Golf 22 'oye7Y0kJ'
46 1002 1891 837 Golf 22 '8BweZbnp'
47 1002 1891 837 Charlie 6 'Rpr7Z5by'
48 1002 1891 837 Charlie 6 'vBdVK1Ap'
49 1002 1891 837 Delta 17 '4Jkae8Dy'
50 1002 1891 837 Delta 17 'YpLM3nxy'
51 1002 1891 837 Echo 7 'VB7vD6Py'
52 1002 1891 837 Echo 7 'ZJlOXbzB'
53 1002 1891 837 Foxtrot 20 'MpNqezKJ'
54 1002 1891 837 Foxtrot 20 '9pOWo39p'
55 1002 1891 837 Golf 22 'MJXm5qnp'
56 1002 1891 837 Golf 22 'oy5vxd4B'
57 1002 1891 837 Golf 22 'DJQ4qz3B'
I know that I’m missing something super-easy because this is kind of problem is in panda’s wheelhouse. But sadly it’s where I am on the learning curve; I hope someone can steer me in the right direction.