I’m currently developing a C# scraping tool with Visual Studio 2019, without third-party libraries.
I’ve decided to implement a feature for logging into websites like the National Tax Service and scraping electronic tax invoice data.
So far, through analysis of various scraping libraries, I’ve identified three methods for the scraping engine:
1. Direct server communication: This involves sending requests directly to the server using techniques similar to XMLHttpRequest or fetch, often utilizing Node.js and JavaScript.
It uses few resources, but building it without external libraries is difficult: it requires a thorough understanding of the language and of how to handle dynamically generated HTML,
which is unfamiliar territory for me.
2. CEF browser-based approach: This involves embedding a browser built on the Chromium Embedded Framework (CEF), for example through CefSharp,
so that dynamic HTML is rendered inside the tool.
However, on some websites that load their JavaScript from external script files, the event handlers seem to be separated from the page elements, so the JavaScript commands I run may not work. Because of these development roadblocks, I'm currently reconsidering this method.
3. Web driver-based approach: Popular tools such as Selenium use this method, controlling a browser installed on the user's computer through a web driver. Puppeteer works in a similar way, although it drives the browser over the Chrome DevTools Protocol rather than WebDriver.
This approach is considered stable and straightforward, but it can be slow with large data volumes and it depends on the installed browser.
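For reference, this is roughly how I understand the first method would look in C# using only `System.Net.Http`. It is a minimal sketch, not a working solution: the class and method names, the URLs passed in, and the form field names (`userId`, `password`) are all hypothetical placeholders, and a real site may require extra steps (tokens, certificates, encrypted fields) that are not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class DirectScraper
{
    // Creates an HttpClient that keeps session cookies in the given container,
    // so a later request can reuse the session established at login.
    public static HttpClient CreateClient(CookieContainer cookies)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true,
            AllowAutoRedirect = true
        };
        return new HttpClient(handler);
    }

    // Builds a URL-encoded login form.
    // ASSUMPTION: the field names "userId" and "password" are placeholders;
    // the real names must be read from the target site's login request.
    public static FormUrlEncodedContent BuildLoginForm(string userId, string password)
    {
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("userId", userId),
            new KeyValuePair<string, string>("password", password)
        });
    }

    // Posts the login form, then fetches a page that requires the session.
    // ASSUMPTION: loginUrl and dataUrl are supplied by the caller; no real
    // endpoints are implied.
    public static async Task<string> LoginAndFetchAsync(
        HttpClient client, Uri loginUrl, Uri dataUrl, string userId, string password)
    {
        using (var form = BuildLoginForm(userId, password))
        using (var loginResponse = await client.PostAsync(loginUrl, form))
        {
            loginResponse.EnsureSuccessStatusCode();
        }

        // Cookies set during login are sent automatically by the handler.
        return await client.GetStringAsync(dataUrl);
    }
}
```

The returned string is the raw response body (HTML, JSON, or XML), so any content that the site builds with JavaScript in the browser would not be present and would have to be requested from the underlying endpoints directly.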
I've heard that some companies choose the first method, but since it's unfamiliar to me, I'm unsure how to proceed.
I'm currently leaning towards the second method, but I'm questioning whether it's appropriate.
What are your thoughts?
The first method seems appealing because it would let us keep full control over the scraping tool as an in-house solution,
but I wonder whether the alternatives are feasible.
Is Node.js suitable for the first method? Which languages or tools are a good fit for this approach?
Sorry, my English isn't very fluent, so this post is long and my sentences might be a bit awkward.
I have studied CefSharp and tried to build a C# scraping solution with it.