I’m new to coding and trying to build a script that automatically extracts and submits CAPTCHA values on a website (I’ve hidden the actual URLs for privacy reasons, but the structure is similar to http://www.demo.com/result.php and http://demo.com/api.php). I’m using Node.js with Axios and Cheerio libraries.
Here’s what I’ve achieved so far:
- I can successfully fetch the HTML content of the CAPTCHA page.
- My script extracts the arithmetic expression presented as text within a specific HTML element (e.g., a table cell containing a “+” symbol).
- It parses the expression to extract the numbers and calculates the expected CAPTCHA value.
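As a minimal sketch of the parsing step in isolation (no Cheerio involved): the helper name `evaluateExpression` is hypothetical, and handling operators other than “+” is an assumption, since only “+” is mentioned above.

```javascript
// Hypothetical helper: evaluates a simple "a <op> b" expression given as text.
// Returns null when the text is not a two-operand expression.
function evaluateExpression(text) {
  // Two non-negative integers separated by one operator, optional whitespace.
  const match = text.match(/^\s*(\d+)\s*([+\-*\/])\s*(\d+)\s*$/);
  if (!match) return null;

  const a = Number(match[1]);
  const op = match[2];
  const b = Number(match[3]);

  switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    case '/': return b === 0 ? null : a / b; // avoid dividing by zero
    default: return null;
  }
}

console.log(evaluateExpression(' 7 + 5 '));
console.log(evaluateExpression('not a captcha'));
```

Returning `null` on a failed match keeps a legitimate result of `0` distinguishable from "nothing found".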
The problem:
The issue arises when I attempt to submit the calculated value (value_captcha) along with other form data to the submission URL. The script doesn’t seem to work consistently, suggesting the actual CAPTCHA might be different each time. Here are three observations that support this:
- Changing CAPTCHA Value: I retrieve the CAPTCHA value using my script, but when I submit it, it doesn’t work, implying the actual CAPTCHA might have changed between retrieval and submission.
- Dynamic CAPTCHA: My assumption is that the CAPTCHA is dynamic and generates a new value each time the page is loaded.
- Time Discrepancy: There might be a slight delay between fetching the CAPTCHA and submitting the form, allowing the CAPTCHA to refresh.
I want to achieve a solution where my script can:
- Retrieve the CAPTCHA value.
- Automatically submit the form with the retrieved CAPTCHA value in a single, streamlined process.
Code Snippet:
JavaScript
const axios = require('axios');
const cheerio = require('cheerio');
// Function to extract and evaluate the captcha expression
function extractAndEvaluateCaptcha(html) {
  const $ = cheerio.load(html);
  // Search for the specific HTML element containing the captcha expression
  const captchaElement = $('td:contains("+")').filter(function() {
    // Match "<number> <operator> <number>", allowing surrounding whitespace
    return /^\s*\d+\s*[+\-*\/]\s*\d+\s*$/.test($(this).text());
  }).first();
  if (captchaElement.length > 0) {
    const arithmeticExpression = captchaElement.text().trim();
    console.log("Captcha arithmetic expression:", arithmeticExpression);
    // Only cells containing "+" are selected above, so the operands are summed
    const numbers = arithmeticExpression.match(/\d+/g).map(Number);
    const captchaValue = numbers.reduce((a, b) => a + b, 0);
    console.log(captchaValue);
    return captchaValue; // Return the calculated value
  } else {
    console.log("Captcha element not found in the HTML response.");
    return null; // Indicate failure to find the element
  }
}
// Example usage (**This section needs modification!**)
const url = 'http://www.demo.com/result.php'; // Replace with actual URL
// **HERE's where we need to combine fetching and submitting:**
axios.get(url)
  .then(response => {
    if (response && response.data) {
      const extractedValue = extractAndEvaluateCaptcha(response.data);
      // Compare against null so that a legitimate result of 0 is not treated as a failure
      if (extractedValue !== null) {
        // **Submit form data with extracted value (implementation details needed)**
        const formData = {
          user_type: 2,
          name: 'robert',
          year: 2022,
          value_captcha: extractedValue,
          button2: 'Submit'
        };
        // **Make a POST request with formData (consider using Axios.post)**
        // ... submit form data with extractedValue
      } else {
        console.error("Failed to extract CAPTCHA value.");
      }
    } else {
      console.error("No response data received.");
    }
  })
  .catch(error => {
    console.error("Failed to fetch HTML:", error);
  });
cURL Example:
Bash
curl 'http://demo.com/api.php' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=
Question:
How can I modify my script to achieve automatic retrieval and submission of the CAPTCHA value in a single process? Are there any ethical considerations or limitations to keep in mind when dealing with dynamic CAPTCHAs?
Thank you!
I look forward to any insights and suggestions from the Stack Overflow community!