Published on March 2nd, 2023
Automating Captcha Attacks | Optiv
Captchas: the go-to solution for keeping bots away from sensitive forms. The problem is that they often don't work.

It's not uncommon for applications to protect sensitive forms exposed to unauthenticated users by showing an image of text, usually with extra lines drawn through the writing, some letters blown up large and others shrunk quite small, and any number of other distortions applied. Ideally, these should be easy for humans to solve but hard for computers. Unfortunately, this has proven very difficult to achieve, and many of these captchas are now difficult for humans to read. Other solutions, such as reCaptcha, work by requiring people to select from various images using contextual knowledge.

Captcha bypasses are not new, but some applications still rely on them as a primary defense against automated attacks. When used as a defense-in-depth control to supplement other security measures, they can provide significant protection. When used alone, they can frequently be defeated, which allows attackers to target sensitive application functionality or data.

During a recent mobile application assessment, I found a login form that protected sensitive user data (debit card numbers) from enumeration with a captcha. Unfortunately for the app, this captcha turned out not to be as strong as the developers hoped, and I was able to defeat it with a few Python modules and a free Optical Character Recognition (OCR) program. The scripts below require only the following:

- Python, with the Pillow, NumPy and pytesseract modules
- Tesseract OCR

This blog post demonstrates how I developed a script to solve the captchas I encountered. The final script is not the most robust captcha solver, but it was sufficient to attack this application. More importantly, it shows the process I used to attack this captcha, highlights some of the difficulties image captchas face and shows why they should not be considered a primary security control.

What's unique about my text?
The first thing we should look at is what makes the text different from the rest of the image. The captchas below were taken from a mobile app during a real assessment. Defeating them was all that stood between me and automated harvesting of debit card numbers.

The text we want to recover is red, which doesn't match the two lines meant to obscure it. One of the first things we might try is to split these images up by their color channels: red, green and blue.

We can do this quite easily with a Python script using the Pillow library. Here's the script I put together to achieve this:

from PIL import Image

im = Image.open('captcha.png')
# Image.split returns the bands in (red, green, blue) order
(red, green, blue) = im.split()
red.save('red.png')
green.save('green.png')
blue.save('blue.png')

This produces the following images:
At first glance, it might seem surprising that the text is completely absent in the red channel. However, in the RGB colorspace, the color white is represented by setting the red, green and blue channels to their maximum value. As a result, both white and red have the same value for the red channel.
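To make this concrete, here is a tiny sketch (the pixel values are illustrative, not taken from the real captcha) showing that white and red pixels are indistinguishable in the red channel but clearly separated in the green channel:

```python
from PIL import Image

# A 2x1 image: one white "background" pixel, one red "text" pixel.
im = Image.new('RGB', (2, 1))
im.putpixel((0, 0), (255, 255, 255))  # white
im.putpixel((1, 0), (255, 0, 0))      # red

red, green, blue = im.split()
print(list(red.getdata()))    # [255, 255] -- the red pixel vanishes
print(list(green.getdata()))  # [255, 0]   -- the red pixel stands out
```

The same reasoning applies to the blue channel, which is why the text below is recoverable from the green and blue channel images but not the red one.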

Perhaps there are other colorspaces in which this red text is more recoverable? Pillow allows us to do a lot of conversions between colorspaces. The Image.convert method can convert between some of the most common colorspaces, and we can perform some less common ones with only a little more effort:

from PIL import Image, ImageCms

im = Image.open('captcha.png')
(red, green, blue) = im.split()
red.save('red.png')
green.save('green.png')
blue.save('blue.png')

hsv = im.convert('HSV')
(hue, sat, val) = hsv.split()
hue.save('hue.png')
sat.save('sat.png')
val.save('val.png')

cmyk = im.convert('CMYK')
(cyan, magenta, yellow, key) = cmyk.split()
cyan.save('cyan.png')
magenta.save('magenta.png')
yellow.save('yellow.png')

# Converting to the L*a*b colorspace is more complex:
rgb = ImageCms.createProfile(colorSpace="sRGB")
lab = ImageCms.createProfile(colorSpace="LAB")
transform = ImageCms.buildTransform(inputProfile=rgb, outputProfile=lab, inMode="RGB", outMode="LAB")
lab_im = ImageCms.applyTransform(im=im, transform=transform)
l, a, b = lab_im.split()
l.save("l.png")
a.save("a.png")
b.save("b.png")

(Channel images for Hue, Saturation, Value, Cyan, Magenta, Yellow, L, a and b appeared here.)

Two of the most promising images are the "Yellow" channel and the "a" channel. During the assessment, I chose to use the "a" channel. The strategy below could likely be applied to the yellow channel with similar results.

Removing Distractions
To "fix" this obscured image, we first need to identify a strategy that retains the text and discards the lines. But before we dive into this problem, we need to start thinking about our OCR library. In particular, Tesseract likes images with at least 300 dots per inch (DPI). These images were meant to be displayed immediately in a mobile application, so their DPI is a little dubious. If the DPI isn't encoded in the image metadata, we'll make a conservative assumption that it's 72 DPI and scale up. The function below implements this:

from PIL import ImageOps

def rescale(im):
    # Assume 72 DPI if we don't know, as this is
    # one of the lowest common DPI values.
    try:
        dpi = im.info['dpi'][0]
    except KeyError:
        dpi = 72
    target_dpi = 300
    factor = target_dpi / dpi
    return ImageOps.scale(im, factor)

Now we're ready to return to the problem of removing the non-text lines. One of the simplest approaches is thresholding: if a pixel is "bright" enough (in the "a" channel), we'll set it to full white; if it's below our threshold, we'll set it to black. Pixel values range from 0 to 255. I initially tried a few threshold values starting with 128, and found that a value of 180 worked best:

from PIL import Image, ImageCms, ImageOps
import numpy as np

def rescale(im):
    # Assume 72 DPI if we don't know, as this is
    # one of the lowest common DPI values.
    try:
        dpi = im.info['dpi'][0]
    except KeyError:
        dpi = 72
    target_dpi = 300
    factor = target_dpi / dpi
    return ImageOps.scale(im, factor)

im = Image.open('captcha.png')

# Converting to the L*a*b colorspace is more complex:
rgb = ImageCms.createProfile(colorSpace="sRGB")
lab = ImageCms.createProfile(colorSpace="LAB")
transform = ImageCms.buildTransform(inputProfile=rgb, outputProfile=lab, inMode="RGB", outMode="LAB")
lab_im = ImageCms.applyTransform(im=im, transform=transform)
lab_im = rescale(lab_im)
l, a, b = lab_im.split()

# Convert to a numpy array and apply the threshold to remove lines
np_a = np.array(a)
threshold = 180
np_a[np_a < threshold] = 0
np_a[np_a > threshold] = 255

# Invert the image: we want black text on a white background
np_a = 255 - np_a
a = Image.fromarray(np_a)
a.save('a.png')

The code above produces the following image:

We've successfully removed the obscuring lines and retained the text, but we're missing some pieces of the letters. One way we might recover this missing data is by "expanding" the dark pixels. With Pillow, we can do this with a MinFilter. (Black has a value of 0, so expanding the black area means causing more pixels to become 0.) How much to expand the text was difficult to figure out in advance; by trial and error, a filter size of 11 pixels did a decent job of closing up those gaps:

Next, we're going to apply a MaxFilter to "contract" the text. Because the text is so thick in some images, letters ran together; the MaxFilter helps fix this. Here's the result:

Here's our new code after adding the filters:

from PIL import Image, ImageCms, ImageOps, ImageFilter
import numpy as np

def rescale(im):
    # Assume 72 DPI if we don't know, as this is
    # one of the lowest common DPI values.
    try:
        dpi = im.info['dpi'][0]
    except KeyError:
        dpi = 72
    target_dpi = 300
    factor = target_dpi / dpi
    return ImageOps.scale(im, factor)

im = Image.open('captcha.png')

# Converting to the L*a*b colorspace is more complex:
rgb = ImageCms.createProfile(colorSpace="sRGB")
lab = ImageCms.createProfile(colorSpace="LAB")
transform = ImageCms.buildTransform(inputProfile=rgb, outputProfile=lab, inMode="RGB", outMode="LAB")
lab_im = ImageCms.applyTransform(im=im, transform=transform)
lab_im = rescale(lab_im)
l, a, b = lab_im.split()

# Convert to a numpy array and apply the threshold to remove lines
np_a = np.array(a)
threshold = 180
np_a[np_a < threshold] = 0
np_a[np_a > threshold] = 255

# Invert the image: we want black text on a white background
np_a = 255 - np_a
a = Image.fromarray(np_a)

# Expand image to close up "gaps" in letters, shrink to
# stop letters running together
a_filtered = a.filter(ImageFilter.MinFilter(11))
a_filtered = a_filtered.filter(ImageFilter.MaxFilter(5))
a_filtered.save('a-filtered.png')

OCR: solving the captcha
Finally, we need to run the image through Tesseract to get the text back. The Tesseract documentation indicates that it works best when images have a border, so we'll add one to our image. This, too, is straightforward with Pillow. We'll add the function below; note that it assumes an even border_size, since odd values are rounded down by the integer conversion:

def border(im, border_size=4):
    im = ImageOps.expand(im, border=int(border_size/2), fill="white")
    im = ImageOps.expand(im, border=int(border_size/2), fill="black")
    return im

With the pytesseract module, running the OCR is quite straightforward. The module works by starting the tesseract program and passing it our image data. Here's the new code:

from PIL import Image, ImageCms, ImageOps, ImageFilter
import pytesseract
import numpy as np

def rescale(im):
    # Assume 72 DPI if we don't know, as this is
    # one of the lowest common DPI values.
    try:
        dpi = im.info['dpi'][0]
    except KeyError:
        dpi = 72
    target_dpi = 300
    factor = target_dpi / dpi
    return ImageOps.scale(im, factor)

def border(im, border_size=4):
    im = ImageOps.expand(im, border=int(border_size/2), fill="white")
    im = ImageOps.expand(im, border=int(border_size/2), fill="black")
    return im

im = Image.open('captcha.png')

# Converting to the L*a*b colorspace is more complex:
rgb = ImageCms.createProfile(colorSpace="sRGB")
lab = ImageCms.createProfile(colorSpace="LAB")
transform = ImageCms.buildTransform(inputProfile=rgb, outputProfile=lab, inMode="RGB", outMode="LAB")
lab_im = ImageCms.applyTransform(im=im, transform=transform)
lab_im = rescale(lab_im)
l, a, b = lab_im.split()

# Convert to a numpy array and apply the threshold to remove lines
np_a = np.array(a)
threshold = 180
np_a[np_a < threshold] = 0
np_a[np_a > threshold] = 255

# Invert the image: we want black text on a white background
np_a = 255 - np_a
a = Image.fromarray(np_a)

# Expand image to close up "gaps" in letters, shrink to
# stop letters running together
a_filtered = a.filter(ImageFilter.MinFilter(11))
a_filtered = a_filtered.filter(ImageFilter.MaxFilter(5))

# Add a border, which helps Tesseract
a_filtered = border(a_filtered)
a_filtered.save('a-filtered.png')

# Run OCR and get the result
result = pytesseract.image_to_string(a_filtered)
# strip() removes surrounding whitespace (like \n) that the OCR returns
print(result.strip())

And the result:

$ python solve-captcha.py
kwbkc

We can refactor the code above into a function and test it against three of the images above. This gives us some confidence that we will solve the other captchas the server sends and that our code doesn't just solve this one image. Here are the results for the three images above:

The script correctly solves two of the three, and misses the middle one by a single letter.

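One way to factor the script into a reusable function might look like the sketch below. The function names and structure are my own (the original refactor isn't shown in this post); the processing steps mirror the script above, and the pytesseract call is isolated in its own wrapper so the image pipeline can be exercised without the tesseract binary installed:

```python
import numpy as np
from PIL import Image, ImageCms, ImageFilter, ImageOps

def preprocess(im, threshold=180):
    """Run the pipeline above: Lab conversion, rescaling,
    thresholding on the a* channel, then Min/Max filtering."""
    rgb = ImageCms.createProfile(colorSpace="sRGB")
    lab = ImageCms.createProfile(colorSpace="LAB")
    transform = ImageCms.buildTransform(inputProfile=rgb, outputProfile=lab,
                                        inMode="RGB", outMode="LAB")
    lab_im = ImageCms.applyTransform(im=im, transform=transform)
    dpi = lab_im.info.get('dpi', (72,))[0]   # assume 72 DPI if unknown
    lab_im = ImageOps.scale(lab_im, 300 / dpi)
    l, a, b = lab_im.split()
    np_a = np.array(a)
    np_a[np_a < threshold] = 0
    np_a[np_a > threshold] = 255
    np_a = 255 - np_a                         # black text on white
    a = Image.fromarray(np_a)
    a = a.filter(ImageFilter.MinFilter(11))   # close gaps in letters
    return a.filter(ImageFilter.MaxFilter(5)) # un-merge thick letters

def solve(path):
    # Requires the tesseract binary on the PATH.
    import pytesseract
    return pytesseract.image_to_string(preprocess(Image.open(path))).strip()

# Usage (filenames are hypothetical):
# for path in ('captcha1.png', 'captcha2.png', 'captcha3.png'):
#     print(path, solve(path))
```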
Making an educated guess
I then pulled 100 captcha images from the server and ran the script against them. The script correctly identified 29 of the 100 images. This is good enough for an attack, although not as fast as it could be. If this is the only rate-limiting control in the application, we've effectively circumvented it. Roughly three in ten requests will pass the captcha, and I can attack the application endpoint effectively. At this point, the captcha has been defeated. Since it's the only security control preventing automated attacks, the debit card numbers are now at risk.

Those odds mean we're frequently resending the same request with different captcha solutions, however. With only a few more changes, we can do significantly better and create a more effective attack. Let's take a look at some of the cases where we're wrong, but close.

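The cost of those retries is easy to quantify: if each attempt passes the captcha independently with probability p, the number of attempts per successful request follows a geometric distribution with mean 1/p.

```python
def expected_attempts(p):
    """Mean number of attempts per successful request when each
    attempt succeeds independently with probability p."""
    return 1 / p

print(round(expected_attempts(0.29), 1))  # 3.4 attempts per solve at 29% accuracy
print(round(expected_attempts(0.52), 1))  # 1.9 attempts per solve at 52% accuracy
```

Cutting the average from about 3.4 to about 1.9 requests per solved captcha nearly halves the traffic the attack generates.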
| Script Result | Correct Answer |
|---|---|
| f8kh/ | f8kh7 |
| f/6d7 | f76d7 |
| 6/5eX | 675ex |
| b8n&3s | b8n83 |
| hmn/2 | hmn72 |
| 6k&g6 | 6k8g6 |
| X4mxp | x4mxp |
| /XCWY | 7xcwg |
| &dhx2 | 8dhx2 |

We can see three common patterns in the mistakes. By far the most common was confusing a "7" in the image with a forward slash ("/"). With 100 captcha images to sort through, I was confident that there would never be a forward slash in a solution. The next two most common errors were confusing an "8" with an ampersand ("&") and reading capital versions of letters whose upper- and lowercase forms differ little. Replacing any forward slash with "7", any ampersand with "8" and converting the string to lowercase meant that the script recognized 44 of the 100 images.

We can make a few other corrections because we know some facts about all captcha solutions: every captcha was five characters long, consisting only of lowercase letters and numbers. Applying all of the corrections below to the OCR result improved the success rate to 52 out of 100 images.

def apply_corrections(result):
    result = result.strip()
    result = result.replace('/', '7')
    result = result.replace('&', '8')
    result = result.replace('S', '5')
    result = result.replace(' ', '')
    result = result[:5]
    result = result.lower()
    return result

At this point, I launched the attack and was able to recover many debit card numbers. Since Optiv's Application Security testing tries to avoid denial-of-service conditions, I made no effort to attack the application with a multithreaded, high-volume attack. Still, I was able to recover many valid card numbers in a few minutes. This demonstrates that a captcha is not a robust single line of defense against automated attacks.

When the captcha solution was correct, the remote API would either indicate that the card number wasn't valid or would prompt the user for a passcode to finish logging in. After five login failures, the account would lock, and it would only unlock after the customer called the bank. This is an effective rate-limiting control that prevents access to the accounts. The application could have implemented a similar protection for debit card numbers, or rejected repeated debit card number requests from the same IP address in a short span. The captcha would have enhanced that protection: failing the captcha about half of the time means I would have made twice as many requests, on average, to identify a single debit card number and would have been locked out sooner.

Captchas, Huh! What are they good for?
Captchas provide defense-in-depth protection at the cost of an easy user experience (UX). My final captcha-solving script is under 70 lines of Python and uses only free software. This wasn't the strongest set of captcha images, but the attack took about six hours of hands-on-keyboard work to put together and cost nothing. If the reward is worth enough, there are more advanced paid captcha-solving tools attackers might use, and simply paying humans to solve captchas may be a viable approach. Applications (mobile and web) should not rely on captchas as a primary form of defense. Even solutions like reCaptcha aren't a panacea, although they're quite difficult to solve with computers.

Instead, consider more robust approaches. Locking user accounts after a series of failed logins dramatically limits the effectiveness of password-guessing attacks. These locks should be logged and monitored: sudden spikes in account lockouts may indicate that an attacker is targeting the application, and security personnel can take action against the attack. Password reset endpoints, which take a user identifier and send an email with a reset link, can perform rate limiting based on IP address. If a client makes more requests in a short period than is reasonable for a single user trying to reset their password, future requests from the same IP can be rejected without processing them. A captcha can be added as a defense-in-depth measure to make these controls even stronger, but it should not be relied upon as a primary security control.

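As a sketch of the IP-based rate limiting described above (the class name and limits are illustrative, not taken from any particular framework), a sliding-window limiter only needs a queue of recent request timestamps per client IP:

```python
import time
from collections import defaultdict, deque

class IpRateLimiter:
    """Allow at most max_requests per client IP within a sliding
    window of window_seconds; reject everything beyond that."""

    def __init__(self, max_requests=10, window_seconds=60):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Discard timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # reject without processing the request
        q.append(now)
        return True
```

A server would call allow() before doing any expensive work (sending an email, querying card numbers), so rejected requests cost almost nothing, and each IP is tracked independently.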