I want to fetch raw wikitext from a selection of web pages, which may or may not be generated by MediaWiki.
I am trying to programmatically inspect the HTML of each page, determine whether it uses MediaWiki, and fetch the raw wikitext if it does, or skip the page otherwise. So far it seems MediaWiki pages tend to have:
- a <meta> tag with name=generator and content=MediaWiki...
- a "Powered by MediaWiki" image in the footer, so look for an <img> tag with alt=Powered by MediaWiki
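To make the check concrete, here is a minimal sketch of the two heuristics above using only Python's standard-library html.parser. The names MediaWikiDetector and looks_like_mediawiki are made up for illustration, and the matching rules (case-insensitive, generator content starting with "MediaWiki") are my assumptions rather than anything MediaWiki guarantees.

```python
from html.parser import HTMLParser


class MediaWikiDetector(HTMLParser):
    """Hypothetical helper: flags a page if either heuristic matches."""

    def __init__(self):
        super().__init__()
        self.is_mediawiki = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names;
        # attribute values may be None for bare attributes.
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() == "generator":
            # Assumption: the generator string begins with "MediaWiki".
            if a.get("content", "").lower().startswith("mediawiki"):
                self.is_mediawiki = True
        elif tag == "img" and a.get("alt", "").lower() == "powered by mediawiki":
            self.is_mediawiki = True


def looks_like_mediawiki(html):
    """Return True if the HTML contains one of the two markers."""
    detector = MediaWikiDetector()
    detector.feed(html)
    return detector.is_mediawiki
```

Either marker alone counts as a match here; a site could hide the footer image or strip the generator tag, so this can give false negatives.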
Is it a good approach to look for one of these markers and then try fetching the raw wikitext with the query parameter action=raw, or is there a better way to do this?
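For the fetching step, this is roughly how I would build the raw URL, sketched with urllib.parse from the standard library. The function name raw_wikitext_url is hypothetical, and I am assuming that adding action=raw to the page's own URL is accepted by the wiki, which may not hold for every installation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def raw_wikitext_url(page_url):
    """Return page_url with its query adjusted to request action=raw."""
    parts = urlsplit(page_url)
    # Keep existing parameters (e.g. title=...) but drop any other action.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "action"
    ]
    query.append(("action", "raw"))
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), "")
    )
```

The resulting URL could then be requested with urllib.request.urlopen, treating a non-200 response as "skip this page".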
Thanks