I am learning web scraping and have reached the point where I realize I need a rotating proxy to avoid being rate limited (at least for the website I am trying to scrape). I am using the ScrapeOps free tier for this. Right now I get a 400 Bad Request every time the service tries to make a request to the page.
Here is a screenshot of the failed request:
Please let me know if you need info about the specific page I am trying to scrape.
I have verified that the URL is correctly formatted and that the request goes through fine without the proxy. I have also tried things like not intercepting requests, and other rearrangements, none of which made a difference.
The URL that makes the request looks something like this:
https://proxy.scrapeops.io/v1/?api_key=blahblahblah&url=https%3A%2F%2Fblah.com
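For reference, that URL shape is what `encodeURIComponent` produces when the target URL is embedded as a query parameter. A minimal sketch (the API key and target site are placeholders, not my real values):

```javascript
// Hypothetical values standing in for my real API key and target site
const apiKey = "blahblahblah";
const target = "https://blah.com";

// The target URL must be percent-encoded so its "://" doesn't
// break the query string of the proxy endpoint
const proxyUrl = `https://proxy.scrapeops.io/v1/?api_key=${apiKey}&url=${encodeURIComponent(target)}`;

console.log(proxyUrl);
// → https://proxy.scrapeops.io/v1/?api_key=blahblahblah&url=https%3A%2F%2Fblah.com
```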
Here is the code that makes the request:
const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
const useProxy = require("@lem0-packages/puppeteer-page-proxy");
require("dotenv").config();

puppeteer.use(StealthPlugin());

async function getItems(url, query, retries = 3) {
  for (let i = 0; i < retries; i++) {
    const proxyUrl = `https://proxy.scrapeops.io/v1/?api_key=${process.env.scrapeOPKey}&url=${encodeURIComponent(url)}`;
    let browser;
    try {
      browser = await puppeteer.launch({
        headless: false,
      });
      const page = await browser.newPage();
      await page.setRequestInterception(true);
      page.on("request", async (request) => {
        await useProxy(request, proxyUrl);
      });
      await page.goto(url, { waitUntil: "networkidle2" });
      await page.waitForSelector(".home_search__JQjkQ");
      await page.type(".home_search__JQjkQ", query, { delay: 500 });
      const html = await page.content();
      return html;
    } catch (e) {
      console.log(e);
      console.log(`Failed to fetch ${url}, retrying...`);
      if (browser) {
        await browser.close();
      }
    }
  }
  return null;
}

module.exports = getItems;