How can I create a web scraper for static sites with Node.js? I have used cheerio, but it only scrapes the homepage. How can I scrape all the pages of a website, or just one specific page?
import axios from "axios";
import * as cheerio from "cheerio";

// Fetches a single page and collects the email addresses found in its body text.
async function dataAfterScrapingWebs(url) {
  try {
    const response = await axios.get(url);
    const $ = cheerio.load(response.data);
    const bodyText = $('body').text();
    // The dot before the top-level domain is escaped so it matches a literal "."
    const emailRegex = /([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi;
    const emails = new Set(); // Using a set to avoid duplicates
    let match;
    while ((match = emailRegex.exec(bodyText)) !== null) {
      emails.add(match[0]);
    }
    console.log(Array.from(emails), url);
    console.log(`Data scraped successfully from ${url}`);
    return { emails: Array.from(emails), websiteName: url };
  } catch (error) {
    if (error.response) {
      // The server responded with a non-2xx status code
      console.error(`Error fetching data from ${url}. Status code: ${error.response.status}`);
    } else if (error.request) {
      // The request was sent but no response came back
      console.error(`Error fetching data from ${url}. No response received.`);
    } else {
      console.error(`Error fetching data from ${url}:`, error.message);
    }
  }
}

export default dataAfterScrapingWebs;
This works, but it only scrapes the homepage. I can't work out the logic for reaching all the pages of a site.
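To make the goal concrete, here is a rough sketch of the kind of logic I think is needed: start at one URL, collect the links on each page, and keep following the ones that stay on the same site. The names `crawlSite`, `toSameOriginUrl`, `getLinks` and `maxPages` are hypothetical and not part of my code above. `getLinks` is assumed to be a callback that fetches one page and returns the raw `href` values found on it, and the page limit of 50 is an arbitrary safety cap.

```javascript
// Resolve an href against the page it was found on. Returns null for
// malformed links and for links that leave the starting site (other
// domains, mailto:, tel:, and so on).
function toSameOriginUrl(href, pageUrl, origin) {
  let resolved;
  try {
    resolved = new URL(href, pageUrl);
  } catch {
    return null; // malformed href
  }
  if (resolved.origin !== origin) return null;
  resolved.hash = ''; // "#section" anchors point at the same page
  return resolved.href;
}

// Breadth-first crawl starting at startUrl.
// getLinks(url) is a hypothetical callback that resolves to an array of
// href strings. With axios and cheerio it might look like:
//   async (url) => {
//     const { data } = await axios.get(url);
//     const $ = cheerio.load(data);
//     return $('a[href]').map((_, a) => $(a).attr('href')).get();
//   }
async function crawlSite(startUrl, getLinks, maxPages = 50) {
  const origin = new URL(startUrl).origin;
  const visited = new Set();
  const queue = [toSameOriginUrl(startUrl, startUrl, origin)];

  while (queue.length > 0 && visited.size < maxPages) {
    const url = queue.shift();
    if (visited.has(url)) continue; // already handled via another link
    visited.add(url);

    let hrefs = [];
    try {
      hrefs = await getLinks(url);
    } catch (error) {
      // One failing page should not stop the whole crawl
      console.error(`Skipping ${url}: ${error.message}`);
    }

    for (const href of hrefs) {
      const next = toSameOriginUrl(href, url, origin);
      if (next && !visited.has(next)) queue.push(next);
    }
  }
  return Array.from(visited); // every page URL that was visited
}
```

Each URL returned by `crawlSite` could then be passed to `dataAfterScrapingWebs`. For a single specific page, the crawl is not needed at all: calling `dataAfterScrapingWebs` with that page's full URL should be enough.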