How can robots beat CAPTCHAs?

I have a website e-mail form. I use a custom CAPTCHA to prevent spam from robots. Despite this, I still get spam.

Why? How do robots beat the CAPTCHA? Do they use some kind of advanced OCR or just get the solution from where it is stored?

How can I prevent this? Should I change to another type of CAPTCHA?


I am sure the e-mails are coming from the form, because they are sent by my e-mail sender, which handles the form messages. The message style is also the same.

For the record, I am using PHP + MySQL, but I’m not looking for a solution to this specific problem. I am interested in how robots beat these technologies in general. I described my situation only as an example, so you can better understand what I’m asking about.

24

The two easiest ways to get through a CAPTCHA:

  • Use human farms, i.e. pay people to solve CAPTCHAs, as ProTypers does.

  • Use OCR.

There may also be a bug either in the CAPTCHA mechanism itself or the surrounding application, allowing someone to bypass the CAPTCHA.

By the way, the W3C article Inaccessibility of CAPTCHA: Alternatives to Visual Turing Tests on the Web also explains how CAPTCHAs can be compromised:

[…] One of the first documented attacks on the system was by a Carnegie Mellon student, who associated CAPTCHA images with access to an adult Web site, thus gaining free human labor to crack the authentication. […]

External projects […] have shown methodologies and results indicating that many of the systems can be defeated by computers with between 88% and 100% accuracy, using optical character recognition.

So how can you prevent those attacks?

  • If you have a custom-implemented CAPTCHA, you may try moving to a popular one, like reCAPTCHA.

    This will help if either your own CAPTCHA was too easy to OCR, or if there was a bug which was successfully exploited.

  • If you use a popular CAPTCHA mechanism, moving to a custom-made one or to another popular one might prevent OCR.

Technically, nothing will stop human farms. You may create animated GIFs in which several frames display different text very quickly and only one frame is actually visible to the user, distort or bend the text in all directions, or find new ways to keep OCR from recognizing it; humans paid to solve CAPTCHAs will still solve them.

You may want to move from visual CAPTCHA to sound (if you’re not using both already, and you should), but this means that users with hearing impairment would be unable to use your application.


FrustratedWithFormsDesigner and GalacticCowboy mentioned domain-specific CAPTCHAs in the comments. I tried to find some material about how effective those are, but without success, so here is just my personal opinion:

  1. Domain-specific CAPTCHAs can be hugely annoying when actual users have no idea about the answer.

    Example: I’m visiting a page on a movie-oriented website. I notice a mistake in an article and want to comment on it to notify the author. The comment form asks me, as a CAPTCHA mechanism, to provide the name of the actress displayed in a photo. I have no idea who this actress is, so the only thing I can do is leave the website (or spend the next two minutes using Google Images).

    Another example: a website asks for a synonym of “mysterious”. Easy as it sounds for a non-impaired person who speaks English fluently, it would be impossible to solve without external help for people who don’t speak English well or who have certain developmental disabilities, not to mention that finding synonyms or antonyms is always tricky.

  2. Most of those domain-specific problems can be solved programmatically. Both examples I gave are easily solved using external resources (Google Images and a synonym dictionary). The one about transistors, given as an example by FrustratedWithFormsDesigner, is better, but could probably still be solved by a custom-made bot.

  3. None resist human farms.

  4. They either generate data, just as ordinary text CAPTCHAs draw distorted characters, in which case the generation algorithm itself can be exploited to tune the bots, or they take data from somewhere, just as reCAPTCHA takes text from scanned books, in which case the bot can use this data against them (for example, if you take words from a dictionary and ask the user to provide synonyms, the bot can use the very same dictionary to achieve a 100% success rate).
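
To make the last point concrete, here is a minimal Python sketch of a synonym CAPTCHA and a bot that reads the same dictionary. The dictionary contents and the names SYNONYMS, check_answer and bot_answer are made up for illustration; no real site's data or code is implied.

```python
from typing import Optional

# Hypothetical synonym dictionary. The point is that the site and the bot
# can both obtain the same data.
SYNONYMS = {
    "mysterious": {"enigmatic", "cryptic", "puzzling"},
    "quick": {"fast", "rapid", "swift"},
}


def check_answer(word: str, answer: str) -> bool:
    """Site side: accept any listed synonym of the challenge word."""
    return answer.strip().lower() in SYNONYMS.get(word, set())


def bot_answer(word: str) -> Optional[str]:
    """Bot side: look the challenge word up in the very same dictionary."""
    candidates = SYNONYMS.get(word)
    return min(candidates) if candidates else None


# The bot passes every challenge the site can generate from this dictionary.
solved = all(check_answer(word, bot_answer(word)) for word in SYNONYMS)
```

Under these assumptions the bot never fails, because the challenge and the solution come from the same source.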

12

Adding to MainMa’s answer…

Spammers trick others into doing the CAPTCHA for them

Basically, spammers set up a warez site or a porn site that appears to have a CAPTCHA on it, but it’s not a real CAPTCHA. A bot pulls the CAPTCHA from the site they want to spam (or otherwise exploit), and then displays it on the warez site or a porn site where someone completes it for them. Then the CAPTCHA value is passed back to their bot…

A bit more on Spammers

I use reCAPTCHA, and I’ve found that it’s basically worthless. I also use a custom spam filter that catches the spam that got past reCAPTCHA, and I need to review it every few days for false positives.

My forum is also all custom-written and it gets very little traffic. I don’t believe anyone coded a specific attack against my site. Still, my spam filter catches 2k spam messages a day! None are ever displayed on the site. Spammers get no benefit from spamming me, yet they still do.

I can see patterns in the spamming attempts because I log it all. I can tell you this: putting aside how they get past the CAPTCHA, spammers are clearly using a brute-force technique, varying the fields that are filled out and the kind of data and word mixes that populate those fields. Apparently they do this so cheaply (including bypassing the CAPTCHA) that it doesn’t even pay to analyze individual sites to see whether what they are doing is working.

Year after year, they continue targeting my site with thousands of spam messages a day only to get one through every month, and that one gets manually deleted a day later. It’s that cheap to spam!
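
As a rough back-of-the-envelope check, the figures above (about 2k spam messages a day, roughly one getting through per month) work out as follows. The 30-day month is my own assumption.

```python
spam_per_day = 2000      # "2k spam messages a day" from the post
days_per_month = 30      # assumption: a 30-day month
successes_per_month = 1  # "one through every month"

attempts_per_month = spam_per_day * days_per_month
success_rate = successes_per_month / attempts_per_month

print(attempts_per_month)      # 60000
print(f"{success_rate:.4%}")   # 0.0017%
```

So roughly one attempt in 60,000 lands, and it is still worth the spammers' while.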

This is going to be a battle for years to come. Particularly for small one-man moderator sites like mine.


EDIT 6/22/2017:
I want to add that since this post, Google has completely revamped reCAPTCHA, and as of this writing it has been working flawlessly. I suspect, though, that there are some false positives or that it is a pain for users, as posts have dropped a bit since I implemented it. The two big changes are:

1) They are using images instead of text (so no more OCR).

2) They are combining it with the user’s activity across all sites that use reCAPTCHA. So if you get past the reCAPTCHA on site A and then go to site B, it may not even prompt you to prove you are human! Also (I think) if you are hitting too many reCAPTCHAs across too many sites, it will flag you as well. I am sure it is using other sorts of AI based on the user’s activity as well.

I’m sure it’s just a matter of time until spammers beat this as well…

1

Have you ever tried using a cat-dog CAPTCHA? I have a forum that had a standard CAPTCHA; I changed it and have had no guest spam since.

It is possible that your site is being targeted by an exploited ultra-cheap labour force and that a human being is manually entering your CAPTCHA phrases.

If the solution you are using is not overly sophisticated, it is possible that your attacker is doing image recognition.

It is also a possibility that you have a bug somewhere in your code that is allowing the CAPTCHA to be bypassed.

Don’t make the assumption that a robot is beating your CAPTCHA. Think of your system holistically and see if it has been compromised.

2

Others have discussed how spammers circumvent CAPTCHAs. Here are some tips on how to prevent this.

Note that there is no silver bullet, and spammers seem to stay one step ahead of the game, so you will have to combine multiple techniques:

  1. Use a honeypot form.
  2. Use a CAPTCHA or a logic question. Basic questions work, like “apple, fish, hand, six: which of these is a body part?”
  3. Have a delay. If the form is posted within 5 seconds of the page loading, ignore the request; most robots will post in less than a second.
  4. Have some IP address monitoring: if you notice a spider crawling your website that is not on a whitelist (Google, Bing), then blacklist and ban its IP address. Preferably this would be dynamic and automated in code.
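
Tips 1 and 3 can be combined in a single server-side check. Below is a minimal Python sketch; the field name "website", the helper looks_like_bot and the way the page-load time is passed in are all hypothetical. In a real application the load timestamp would have to be kept server-side (for example in the session) or signed, since a bot can forge anything it submits.

```python
import time
from typing import Optional

MIN_FILL_SECONDS = 5        # threshold suggested in tip 3
HONEYPOT_FIELD = "website"  # hypothetical field, hidden from humans with CSS


def looks_like_bot(form: dict, page_loaded_at: float,
                   now: Optional[float] = None) -> bool:
    """Return True if the submission trips the honeypot or arrives too fast."""
    if now is None:
        now = time.time()
    # Humans never see the honeypot field, so any value in it suggests a bot.
    if str(form.get(HONEYPOT_FIELD, "")).strip():
        return True
    # Submissions that arrive sooner than a human could type are rejected.
    if now - page_loaded_at < MIN_FILL_SECONDS:
        return True
    return False
```

For example, looks_like_bot({"message": "hi"}, page_loaded_at=100.0, now=100.5) flags the request, while the same form submitted ten seconds after loading passes.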

To echo the other answers, you’re likely encountering bots that use human farms to enter the captchas for them.

I’ve recently discussed a technique (and released an accompanying Drupal module) that blocks spam bots by requiring client-side JavaScript. As far as I’m aware, this has worked with 100% efficiency on all sites that have used this code. The idea is to use AJAX to generate a unique hash and submit it along with the other form data, and then compute that same hash on the backend once the form is submitted, and compare the two values.
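
The server side of that idea could look roughly like the Python sketch below. This is not the Badbot module's actual code: the secret, the inputs to the hash and the function names are my own assumptions, chosen only to illustrate "issue a hash via AJAX, recompute it on submit, compare".

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical server-side secret, never sent to the client


def make_token(form_id: str, session_id: str) -> str:
    """Token the AJAX endpoint would hand to the browser's JavaScript."""
    message = f"{form_id}:{session_id}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def token_is_valid(form_id: str, session_id: str, submitted: str) -> bool:
    """Recompute the token on form submission and compare in constant time."""
    expected = make_token(form_id, session_id)
    return hmac.compare_digest(expected.encode("utf-8"),
                               submitted.encode("utf-8"))
```

A bot that posts the form without running the page's JavaScript never requests the token, so its submission fails the comparison. A bot that does execute JavaScript would still get through.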

Full details in my blog post (coincidentally, since you mentioned using PHP + MySQL, these are the same technologies described there) – Module release: Badbot; eliminating spam…

2

If your site is Twitter, and someone has targeted it specifically (rather than a bot finding it), then you can stop reading…

Otherwise, it might be worth making your form NOT look like a form.
1. Don’t have fields with ‘e-mail’ in the type, name or placeholder; use short or misleading names for all fields.
2. Don’t use an actual HTML form element and submit button. Instead, use AJAX to post it on the click of a normal div (styled to look like a button).
3. Don’t put the onclick event in the HTML; add a listener in JavaScript.
4. Use JavaScript to populate any tips such as ‘enter your email address here’, as it’s possible that bots won’t actually be executing JS when trawling pages (not sure on this one, but I do it anyway).
