I'm using the GitHub API to find and clone repositories that are all run with the same command, for example 'python3 main.py' or 'poetry run'.
How can I modify the search query to retrieve only repositories that match this criterion?
This is the code I wrote (for the 'poetry run' command), but it doesn't do what I intended:
import os
import requests
import subprocess
import time

# GitHub API endpoint for searching repositories
api_url = 'https://api.github.com/search/repositories'

# Search query parameters
query = 'poetry run in:readme language:python'
params = {
    'q': query,
    'sort': 'stars',
    'order': 'desc'
}

# OAuth token for authentication
oauth_token = 'my token'

# Send GET request to GitHub API with authentication
headers = {
    'Authorization': f'Token {oauth_token}'
}
response = requests.get(api_url, params=params, headers=headers)
data = response.json()

# Process each repository
for item in data['items']:
    repo_name = item['full_name']
    clone_url = item['clone_url']
    stars = item['stargazers_count']
    views = item['watchers_count']

    # Filter repositories based on stars or views
    if stars > 100 or views > 10000:
        # Clone the repository locally
        clone_process = subprocess.run(['git', 'clone', '--depth', '1', clone_url], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if clone_process.returncode == 0:
            print(f'Cloned {repo_name}')
            # Traverse through repository directory to find Python files
            for root, dirs, files in os.walk(repo_name.split('/')[1]):
                for file in files:
                    path = os.path.join(root, file)
                    # Match main.py itself, or any .py file that mentions 'python main'
                    if file == 'main.py' or (file.endswith('.py') and 'python main' in open(path).read()):
                        print(path)
        else:
            print(f'Failed to clone {repo_name}')

    # Add a delay between requests
    time.sleep(1)  # Adjust the delay time as per your requirements
The first problem is that each repository has its own layout. For instance, you may first have to change into the right directory (cd "directory where the main file is") and only then run the file.
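To make the first problem concrete, here is a minimal sketch of the kind of thing I have in mind: locate the directory that contains the main file and run the command from there. The helper names find_entry_dirs and run_entry, the default command, and the 60-second timeout are all hypothetical choices of mine, and the sketch assumes the entry file is literally called main.py.

```python
import os
import subprocess


def find_entry_dirs(repo_dir, entry_name="main.py"):
    """Return every directory under repo_dir that directly contains entry_name."""
    matches = []
    for root, dirs, files in os.walk(repo_dir):
        # Prune hidden subdirectories such as .git so they are not searched
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        if entry_name in files:
            matches.append(root)
    return sorted(matches)


def run_entry(repo_dir, command=("python3", "main.py")):
    """Run command from the first directory that contains main.py.

    Returns the CompletedProcess, or None if no main.py was found.
    """
    entry_dirs = find_entry_dirs(repo_dir)
    if not entry_dirs:
        return None
    # cwd= plays the role of the manual "cd <directory>" step
    return subprocess.run(
        list(command), cwd=entry_dirs[0],
        capture_output=True, text=True, timeout=60,
    )
```

This only covers the "cd first, then run" case. It does not handle repositories that need arguments, environment variables, or installed dependencies before they will run.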
The second problem is that whenever I use this query, query = 'poetry run in:readme language:python', it matches the text regardless of context. For instance, if the README said "DON'T USE POETRY RUN FOR THIS CODE", the script would still clone that repository even though 'poetry run' isn't part of the command used to run it.
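One workaround I have considered for the second problem is to filter after the search: take each README's text and keep the repository only if the command appears inside a code block or inline code span, not in ordinary prose. Below is a rough sketch under that assumption. The function name command_in_code is hypothetical, it only understands Markdown-style backtick code, and it would miss READMEs written in reStructuredText or with indented code blocks.

```python
import re

# Three backticks, built this way so the pattern stays readable
FENCE = "`" * 3


def command_in_code(readme_text, command="poetry run"):
    """Return True only if command appears inside fenced or inline code."""
    fenced_pattern = rf"{FENCE}.*?{FENCE}"
    # Collect fenced code blocks (including their fence lines)
    fenced = re.findall(fenced_pattern, readme_text, flags=re.DOTALL)
    # Remove fenced blocks first so their backticks are not misread as inline spans
    remainder = re.sub(fenced_pattern, "", readme_text, flags=re.DOTALL)
    # Collect single-backtick inline code spans from what is left
    inline = re.findall(r"`([^`\n]+)`", remainder)
    return any(command in snippet for snippet in fenced + inline)
```

With this check, a README that only says "Don't use poetry run for this code" in plain prose would be rejected, while one that shows poetry run inside a code block would be kept. The README text itself would still have to be fetched per repository, for example through the GET /repos/{owner}/{repo}/readme endpoint, which costs one extra request per search result.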
Any ideas on how to work around this? My goal is to automate the execution with a single command.