I’ve been doing a little web scraping with Selenium and Scrapy lately. I’m not having any issues getting and extracting the information (so far), but I was wondering what would be a good way to handle a situation like the following:
I’m extracting information about bus tickets (date of departure, date of arrival, cost, company, etc.). The question is: is it better to first extract all of the information, then “clean” the data, and lastly turn it into a dataframe (that’s what I want to do)? Or is it better, in terms of speed and management, to clean each record right after it is extracted, before I do anything else?
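To make the two options concrete, here is a minimal sketch. The field names, the raw values, and the `clean_ticket` and `scrape` helpers are hypothetical, not taken from the actual scraper, and plain lists of dicts stand in for the dataframe step.

```python
from datetime import datetime

# Hypothetical raw rows, shaped like what a scraper might yield.
RAW_ROWS = [
    {"departure": " 2024-05-01 08:30 ", "cost": "$ 1,250.00", "company": " ACME Bus "},
    {"departure": "2024-05-02 14:00", "cost": "$ 980.50", "company": "RoadLine"},
]


def clean_ticket(row):
    """Normalize one raw row: strip whitespace, parse the date and the price."""
    return {
        "departure": datetime.strptime(row["departure"].strip(), "%Y-%m-%d %H:%M"),
        "cost": float(row["cost"].replace("$", "").replace(",", "").strip()),
        "company": row["company"].strip(),
    }


def scrape():
    """Stand-in for the real extraction step; yields raw rows one at a time."""
    yield from RAW_ROWS


# Option A: extract everything first, then clean in one batch.
raw = list(scrape())
cleaned_after = [clean_ticket(r) for r in raw]

# Option B: clean each row right after it is extracted.
cleaned_inline = [clean_ticket(r) for r in scrape()]

# Both routes produce the same records; either list could then be
# passed to pandas.DataFrame(...) to build the dataframe.
assert cleaned_after == cleaned_inline
```

In this sketch the cleaning logic is identical in both options; only the point at which it runs differs.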
Thanks in advance. Any information, books, or anything else that helps on this subject is appreciated!
I’ve done it both ways, but I would like to know which is considered better and more efficient.