I am planning a project to automate the following tasks:
Scrape data from a website.
Check if the data is new or updated.
Push the new data to a GitHub repository.
Send an email or another kind of notification (e.g., Slack, FCM) when new data is detected.
Deploy the updated data to a web app or another application.
Given the wide range of tools and libraries available, I am looking for recommendations on the best options for each of these tasks. Specifically:
Data Scraping: What are the best libraries or tools for efficient and reliable web scraping?
Data Comparison: What methods or libraries are best for comparing scraped data to existing data to determine if it is new or updated?
GitHub Integration: What tools or scripts are best for automating the process of pushing updates to a GitHub repository?
Notifications: What are the most reliable ways to send email or other notifications (e.g., Slack, FCM) when new data is detected?
Deployment: What are the best practices and tools for automating the deployment of updated data to a web app or other applications?
I am familiar with Python for scripting, but I am open to using other languages or tools if they offer significant advantages.
Any recommendations, best practices, or experiences with specific tools would be greatly appreciated. Thank you in advance for your help!
I have tried Python with BeautifulSoup and Scrapy for scraping, but I am unsure whether these are the most efficient and reliable options. For comparing data, I considered diffing the old and new files directly, but that seems inefficient for larger datasets. I also looked into GitHub Actions for automation but found it difficult to set up the entire workflow.
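To make the comparison step concrete, this is roughly the alternative to whole-file diffing that I have in mind: store one hash per record and compare hashes on the next run. It is only a minimal sketch, assuming every scraped record is a JSON-serializable dict with a unique `id` field; the function names `record_hash` and `diff_records` are my own placeholders, not from any library.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Return a stable SHA-256 digest of a record, independent of key order."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_records(previous: dict, scraped: list, key: str = "id"):
    """Compare scraped records against the hashes stored from the last run.

    previous maps record id (as a string) to the digest saved last time.
    Returns (new_records, updated_records, current_hashes).
    Assumption: each record has a unique value under `key`.
    """
    new, updated, current = [], [], {}
    for record in scraped:
        record_id = str(record[key])
        digest = record_hash(record)
        current[record_id] = digest
        if record_id not in previous:
            new.append(record)
        elif previous[record_id] != digest:
            updated.append(record)
    return new, updated, current
```

The `current` mapping would be written to disk (for example as a small JSON file in the repository) and passed back in as `previous` on the next run, so only the hashes need to be compared rather than the full dataset.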
I expected to find a streamlined solution that integrates all these steps efficiently, but I am overwhelmed by the variety of tools and methods available. I am hoping to get recommendations on the best tools and practices from those who have successfully implemented similar workflows.
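The overall shape I am aiming for is something like the following. This is a sketch only: `run_pipeline` and its five parameters are hypothetical placeholders for whatever scraping, comparison, GitHub, notification, and deployment tools end up being recommended, and each one is passed in as a plain function.

```python
from typing import Any, Callable, List


def run_pipeline(
    scrape: Callable[[], List[Any]],
    detect_changes: Callable[[List[Any]], List[Any]],
    publish: Callable[[List[Any]], None],
    notify: Callable[[List[Any]], None],
    deploy: Callable[[], None],
) -> bool:
    """Run the five steps in order; stop early when nothing has changed.

    Returns True if changes were found and the later steps ran.
    """
    records = scrape()
    changes = detect_changes(records)
    if not changes:
        # Nothing new or updated: skip the push, the notification, and the deploy.
        return False
    publish(changes)  # e.g. commit and push to the GitHub repository
    notify(changes)   # e.g. email, Slack, or FCM
    deploy()          # e.g. trigger a redeploy of the web app
    return True
```

Keeping each step behind its own function would let me swap one tool for another (for example, Slack instead of email) without touching the rest of the workflow.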