I am looking for a way to extract the main article content from HTML without using a paid API. I mostly work in .NET. There are many libraries for this in Python, but few in .NET. Of course I can download the full HTML and try to extract all the text myself. Are there any libraries or good algorithms that can be used?
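Html Agility Pack (http://html-agility-pack.net/) is the best solution there. It parses HTML into a DOM that you can query with XPath, and it can load a document from a file, a string, or the web: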
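using HtmlAgilityPack;

// From a file (filePath is the path to a local .html file)
var fileDoc = new HtmlDocument();
fileDoc.Load(filePath);

// From a string (html holds the markup)
var stringDoc = new HtmlDocument();
stringDoc.LoadHtml(html);

// From the web (HtmlWeb downloads and parses the page)
var url = "http://html-agility-pack.net/";
var web = new HtmlWeb();
var webDoc = web.Load(url);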
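Loading the document is only the first step. Html Agility Pack does not pick out the article body for you, so you still need a heuristic on top of it. Below is a rough sketch of one simple approach: group the <p> elements by their parent and keep the group that holds the most text. The names ArticleExtractor and ExtractMainText are hypothetical, and the heuristic is an assumption for illustration, not a Readability-grade algorithm. It will likely need tuning for real pages (for example, skipping comment sections or footers).

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ArticleExtractor
{
    // Hypothetical helper: returns the text of the paragraphs that share
    // the parent element holding the largest amount of paragraph text.
    public static string ExtractMainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
        {
            return string.Empty;
        }

        // Group paragraphs by parent and take the group with the most characters.
        var best = paragraphs
            .GroupBy(p => p.ParentNode)
            .OrderByDescending(g => g.Sum(p => p.InnerText.Trim().Length))
            .First();

        // Decode entities such as &amp; and drop empty paragraphs.
        var texts = best
            .Select(p => HtmlEntity.DeEntitize(p.InnerText).Trim())
            .Where(t => t.Length > 0);

        return string.Join(Environment.NewLine + Environment.NewLine, texts);
    }
}
```

You would call it as ArticleExtractor.ExtractMainText(html), or pass doc.DocumentNode.OuterHtml if you already loaded the page with HtmlWeb. Grouping by direct parent keeps the sketch short; a more robust version would also score ancestors and penalize blocks that consist mostly of links.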