I am new to Python and I am trying to run multiple PubMed searches and save the number of results for each search to a CSV file. The code I have right now runs as-is, but it only handles one set of search terms. I would like it to run through the "Term" column of a CSV file instead, but I don't know where to put the for loop or how to set the variable the loop should run over. Here is what I have that runs:
```python
import requests
import time
import pandas as pd


def get_pubmed_results_count(search_terms, delay=1):
    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    results = {}
    for term in search_terms:
        # Define parameters for the API request
        params = {
            "db": "pubmed",
            "term": term,
            "retmode": "json"
        }
        try:
            # Make the request to the PubMed API
            response = requests.get(base_url, params=params)
            response.raise_for_status()
            # Parse the response
            data = response.json()
            count = data['esearchresult']['count']
            results[term] = count
        except requests.exceptions.RequestException as e:
            print(f"Error retrieving data for term '{term}': {e}")
            results[term] = None
        # Respectful delay between requests
        time.sleep(delay)
    return results


# Example usage
df_searchterms = pd.read_csv('search1.csv')
print(df_searchterms)

if __name__ == "__main__":
    #for index, row in df_searchterms.iterrows():
        #search_terms = (row['Term'])
    search_terms = ["APOE AND Alzheimer's"]
    result_counts = get_pubmed_results_count(search_terms)
    for term, count in result_counts.items():
        df_results = pd.DataFrame(result_counts.items(), columns=['term', 'count'])
    print(df_results)
    df_results.to_csv('TestRestults1.csv', index=False)
```
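The line `count = data['esearchresult']['count']` raises `KeyError` whenever the parsed JSON has no `count` field, and that error is not caught by the `except requests.exceptions.RequestException` clause. Below is a minimal sketch of a more defensive parse, assuming (as the code above does) that a successful esearch JSON response nests the count under `esearchresult`. The helper name `extract_count` and the sample payloads are hypothetical, made up only for illustration; they are not real API output.

```python
from typing import Optional


def extract_count(data: dict) -> Optional[int]:
    """Return the hit count from a parsed esearch JSON dict, or None if absent.

    Hypothetical helper: assumes the layout data['esearchresult']['count']
    used in the question's code; any other shape is treated as "no count".
    """
    result = data.get("esearchresult", {})
    count = result.get("count")
    # The count appears to arrive as a string in JSON mode, so convert it
    return int(count) if count is not None else None


# Hypothetical payloads for illustration only (not real API output)
ok = {"esearchresult": {"count": "123"}}
missing = {"esearchresult": {"ERROR": "example error text"}}
print(extract_count(ok))       # 123
print(extract_count(missing))  # None
```

Inside the `try` block, `count = extract_count(data)` could stand in for the direct indexing, so a term without a count is stored as `None` instead of stopping the whole run.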
Here is what my search terms DataFrame (`df_searchterms`) looks like when printed:
```
                    Term
0   APOE AND Alzheimer's
1  PSEN1 AND Alzheimer's
2  PSEN2 AND Alzheimer's
3    APP AND Alzheimer's
4    CLU AND Alzheimer's
```
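For reference, a pandas column can be turned into a plain Python list with `Series.tolist()`, which is the shape `get_pubmed_results_count` already loops over (one whole search string per item). A minimal sketch, with `io.StringIO` standing in for `search1.csv`; the CSV text below simply mirrors the frame printed above and is an assumption about the file's contents.

```python
import io

import pandas as pd

# Stand-in for search1.csv: a single "Term" column, one search string per row
csv_text = """Term
APOE AND Alzheimer's
PSEN1 AND Alzheimer's
PSEN2 AND Alzheimer's
APP AND Alzheimer's
CLU AND Alzheimer's
"""

df_searchterms = pd.read_csv(io.StringIO(csv_text))

# Convert the column into a list of whole search strings
search_terms = df_searchterms["Term"].tolist()
print(search_terms)
```

With the real file, `pd.read_csv('search1.csv')["Term"].tolist()` would give the same kind of list, which could be passed to the function in a single call with no extra outer loop.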
So my question is: where should the for loop go, and what should replace the line

```python
search_terms = ["APOE AND Alzheimer's"]
```

so that the code runs through the terms in the CSV file instead? Note that if I comment out the current `search_terms` line and uncomment and correctly indent the replacement line (currently commented out), I get a `KeyError: 'count'`. Any help is appreciated, thank you in advance!
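One plausible cause of that `KeyError`: inside the commented-out loop, `row['Term']` is a single string (the parentheses around it do not create a tuple or list), and `for term in search_terms` then iterates over that string one character at a time. Each request would search a single character such as `"A"` or `" "`, and for some of those the esearch response likely contains no `count` field. This is an inference, not confirmed against the live API; the snippet below only shows the iteration difference and makes no network calls.

```python
# One row's value, as row['Term'] would return it: a plain str
term_from_row = "APOE AND Alzheimer's"

# Looping over a str yields its characters one by one
chars = [t for t in term_from_row]
print(chars[:5])  # ['A', 'P', 'O', 'E', ' ']

# Wrapping the str in a list yields the whole phrase exactly once
phrases = [t for t in [term_from_row]]
print(phrases)  # ["APOE AND Alzheimer's"]
```

So if the row loop is kept, passing `[row['Term']]` (a one-item list) to `get_pubmed_results_count` would keep each phrase intact; the results from every call would then need to be collected before building the final DataFrame.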