I have written a Python program that automatically parses news detail pages, but I cannot reliably distinguish news list pages from detail pages. What are some good methods for solving this?
I tried to find some specific tags in the HTML, but it didn’t work.
I also tried the open-source project https://github.com/Gerapy/GerapyAutoExtractor, but the results were not good.
Could you please provide some suggestions on how to effectively distinguish between news list pages and detail pages?
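
One heuristic that might be worth trying as a baseline is link density: list pages tend to consist mostly of anchor text (headlines linking out), while detail pages are mostly plain body text. The sketch below uses only the standard library's `html.parser`. The names `LinkDensityParser`, `link_density`, and `looks_like_list_page`, as well as the `0.5` threshold, are hypothetical choices for illustration and would need tuning on real pages; navigation-heavy detail pages could easily fool it.

```python
from html.parser import HTMLParser


class LinkDensityParser(HTMLParser):
    """Counts visible text characters inside and outside <a> tags."""

    SKIPPED = {"script", "style"}  # contents of these tags are not visible text

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0
        self.skip_depth = 0
        self.link_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1
        elif tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return
        n = len(data.strip())
        self.total_chars += n
        if self.anchor_depth:
            self.link_chars += n


def link_density(html):
    """Fraction of visible text characters that sit inside <a> tags."""
    parser = LinkDensityParser()
    parser.feed(html)
    parser.close()
    if parser.total_chars == 0:
        return 0.0
    return parser.link_chars / parser.total_chars


def looks_like_list_page(html, threshold=0.5):
    # threshold is an assumed starting point, not a validated value
    return link_density(html) > threshold
```

In practice this signal would probably be combined with others (for example, the number of repeated sibling elements, or whether the URL path ends in an article-like slug) before being trusted.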