I am trying to crawl the webpage “https://en.wikipedia.org/wiki/Glossary_of_artificial_intelligence”. My objective is to dump the data into an Excel file where column 1 contains the heading (for example, “abductive logic programming”) and column 2 contains the sentences under it.
I need to keep running the code until I have collected 10k sentences.
I am new to web crawling and have no prior knowledge of it. Can anyone guide me on how to proceed with this task?
I have searched the internet and found some libraries like Beautiful Soup, but I am not sure how to apply them to this exact requirement.
P.S. I am trying to gather the data to train an NLP model.
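To make the requirement concrete, here is a rough sketch of the output I have in mind. It is not a working crawler. It assumes the glossary entries are marked up as `<dt>` (heading) and `<dd>` (definition) pairs, which I have not verified against the live page, and it uses Python's standard-library `html.parser` instead of Beautiful Soup so that it runs without installing anything. The names `GlossaryParser`, `split_sentences`, `glossary_rows`, `write_csv`, and the `SAMPLE` HTML are all made up for illustration, and the sentence splitter is deliberately naive. It writes CSV, which Excel can open, rather than a native `.xlsx` file.

```python
import csv
import io
import re
from html.parser import HTMLParser


class GlossaryParser(HTMLParser):
    """Collect (term, definition) pairs from <dt>/<dd> elements (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.pairs = []    # finished (term, definition text) pairs
        self._term = None  # text of the most recent <dt>
        self._tag = None   # "dt" or "dd" while inside one, else None
        self._buf = []     # text fragments of the element being read

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # closing tag of a nested element such as <a>
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if tag == "dt":
            self._term = text
        elif self._term:
            self.pairs.append((self._term, text))
        self._tag = None


def split_sentences(text):
    # Naive splitter: breaks after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def glossary_rows(html, limit=10_000):
    """Return up to `limit` (heading, sentence) rows from the given HTML."""
    parser = GlossaryParser()
    parser.feed(html)
    rows = []
    for term, definition in parser.pairs:
        for sentence in split_sentences(definition):
            rows.append((term, sentence))
            if len(rows) >= limit:
                return rows
    return rows


def write_csv(rows, fileobj):
    writer = csv.writer(fileobj)
    writer.writerow(["heading", "sentence"])
    writer.writerows(rows)


# Hypothetical sample standing in for the downloaded page.
SAMPLE = """
<dl>
<dt><dfn>abductive logic programming</dfn> (ALP)</dt>
<dd>A <a href="#">knowledge-representation</a> framework. It extends logic programming.</dd>
<dt>ablation</dt>
<dd>The removal of a component of an AI system.</dd>
</dl>
"""

rows = glossary_rows(SAMPLE)
out = io.StringIO()  # swap for open("glossary.csv", "w", newline="") to write a file
write_csv(rows, out)
print(out.getvalue())
```

For the real page, I assume the HTML could be downloaded with `urllib.request.urlopen` (possibly with a `User-Agent` header set) and passed to `glossary_rows`. A single glossary page may not contain 10k sentences, so the `limit` check might never trigger and more pages may be needed.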