I have a very specific problem. I have a Pandas DataFrame that I am continuously adding “posts” to.
When I add new posts to the DataFrame, I currently just drop duplicates based on the post ID. I now want to add a “Comments” column, where each entry is a list of IDs for comments stored in another DataFrame.
The problem is this: if a post has new comments, I want to either add the new comments to its Comments list or replace the entire row with the version of the post that has the new comments.
Here is some code to give an idea of my issue:
import pandas as pd

posts = pd.DataFrame(data, columns=['ID', 'Date/Time', 'Title', 'Body', 'Comments'])
new_posts = get_new_posts()  # Returns DataFrame containing new posts
# Default keep='first' keeps the existing (old) row for each duplicated ID
posts = pd.concat([posts, new_posts], ignore_index=True).drop_duplicates('ID')
What I’d like to do is something like this (I know this wouldn’t work):
posts = pd.DataFrame(data, columns=['ID', 'Date/Time', 'Title', 'Body', 'Comments'])
new_posts = get_new_posts() # Returns DataFrame containing new posts
pd.concat([posts, new_posts], ignore_index=True).drop_duplicates(if 'ID' and 'Comments' else replace row)
I can’t just do a plain drop_duplicates, because as far as I know that would leave the old post in place while also adding a new row with the updated comments but the same ID.
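To make the two desired outcomes concrete, here is a minimal sketch with made-up sample data standing in for `data` and `get_new_posts()`. The column subset, the sample values, and the names `replaced`, `merged`, and `merged_comments` are assumptions for illustration only. It relies on `drop_duplicates(subset=..., keep='last')`, which only compares the `ID` column, so the list-valued `Comments` column should not get in the way.

```python
import pandas as pd

# Hypothetical sample data; the real frames would come from `data` and get_new_posts().
posts = pd.DataFrame({
    "ID": [1, 2],
    "Title": ["first", "second"],
    "Comments": [["c1"], ["c2"]],
})
new_posts = pd.DataFrame({
    "ID": [2, 3],
    "Title": ["second", "third"],
    "Comments": [["c4"], ["c3"]],
})

combined = pd.concat([posts, new_posts], ignore_index=True)

# Outcome 1: replace the entire row, keeping the most recently added copy of each ID.
replaced = combined.drop_duplicates(subset="ID", keep="last").reset_index(drop=True)

# Outcome 2: keep the latest row per ID, but merge old and new comment IDs.
merged_comments = {}
for post_id, comments in zip(combined["ID"], combined["Comments"]):
    bucket = merged_comments.setdefault(post_id, [])
    for comment_id in comments:
        if comment_id not in bucket:  # skip comment IDs already recorded
            bucket.append(comment_id)

merged = replaced.copy()
merged["Comments"] = merged["ID"].map(merged_comments)

print(replaced)
print(merged)
```

In this sketch, post 2 ends up with only the new comment list in `replaced`, and with the old and new comment IDs combined in `merged`.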