In this section, we describe the steps and phases required to break CaptchaStar. Note that only two of the three steps are described, as the last one was not needed: the CAPTCHA was already broken in step 2.
4.1 Step 1: Black-Box Analysis
BASECASS recommends creating a way to interact automatically with the CAPTCHA under analysis. In this case, this was done through a Python program that was able to download a challenge, send an answer to the CaptchaStar server, grab the response, and record a log of it. The answers were initially provided by humans through a replica of the CaptchaStar interface. Later, we found that CaptchaStar allows requesting the validity of different answers for the same challenge, which allowed us to use it as an oracle to also find the corresponding solution.
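A minimal sketch of such an interaction client is shown below. It assumes a simple HTTP interface; the base URL, endpoint paths, and parameter names are hypothetical placeholders, as they depend on the concrete CaptchaStar demo deployment.

```python
# Minimal sketch of an automated CaptchaStar client. Endpoint URLs and
# parameter names are hypothetical placeholders, not the real API.
import json
import requests

BASE_URL = "https://captchastar.example.org"  # placeholder for the demo server

def download_challenge(session):
    """Fetch a new challenge and return its identifier and star data."""
    resp = session.get(f"{BASE_URL}/challenge")
    resp.raise_for_status()
    return resp.json()  # e.g., {"id": ..., "stars": [...]}

def submit_answer(session, challenge_id, x, y):
    """Send the chosen cursor coordinates and return the server's verdict."""
    resp = session.post(f"{BASE_URL}/verify",
                        data={"id": challenge_id, "x": x, "y": y})
    resp.raise_for_status()
    return resp.json().get("passed", False)

if __name__ == "__main__":
    with requests.Session() as s, open("captchastar_log.jsonl", "a") as log:
        challenge = download_challenge(s)
        passed = submit_answer(s, challenge["id"], 150, 150)
        log.write(json.dumps({"id": challenge["id"], "answer": (150, 150),
                              "passed": passed}) + "\n")
```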
BASECASS recommends comparing the theoretical size of the base problem with the actual size of the challenges proposed by the CAPTCHA, as a way to compare its strength to that of the base AI problem. In this case, the size of P can be very roughly estimated by counting how many different black-and-white images of \(300 \times 300\) pixels there can be, if the pixel size for the image is indeed 4 pixels and if we restrict ourselves to no more than \(80\%\) of the pixels in white. This leads to \(2^{0.8 \times \frac{300 \times 300}{4 \times 4}} = 2^{4500} \approx 4 \times 10^{1354}\). This is just an estimation, as many of these possible images would be nonsense and thus could not be used as solutions. To estimate the size of H, we downloaded \(2,\!000\) images and checked that they used \(1,\!631\) different base images. Note that we have counted the number of images, but not the transformations performed on them. When CaptchaStar presented the same image to the user, the transformation on its pixels was different, so there is theoretically no way for an attacker to reuse a previously solved challenge to pass a new one. Even though the challenge space H in CaptchaStar is smaller than P, thanks to the number of possible transformations it is big enough to prevent a brute-force attack based on repeated challenges.
During our interactions with CaptchaStar, we were able to verify that solutions that were not optimal were still accepted if they were not too far from the optimal one. We determined that any solution in a \(12 \times 12\) pixel square around the optimal solution would be accepted too, reducing the answer space that needs to be explored to \(\frac{300 \times 300}{12 \times 12} = 625\) regions, with a single brute-force guess success rate of roughly \(0.16\%\).
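The back-of-the-envelope figures used above can be reproduced with a few lines of Python; the constants (canvas size, pixel size, 80% white-pixel cap, and 12-pixel tolerance) are the ones measured or assumed in this step.

```python
from math import log10

# Rough upper bound on the size of the problem space P: black-and-white images
# of 300x300 pixels with an effective pixel size of 4x4 and at most 80% white.
effective_pixels = (300 * 300) / (4 * 4)        # 5,625 effective pixels
log10_P = 0.8 * effective_pixels * log10(2)     # approx. 1354.6
print(f"|P| ~ 10^{log10_P:.1f}")                # about 4 x 10^1354

# Answer space after accounting for the 12x12-pixel acceptance square, and the
# success rate of a single brute-force (random) guess.
regions = (300 * 300) / (12 * 12)               # 625 distinguishable regions
print(f"{regions:.0f} regions -> single-guess success of {100 / regions:.2f}%")
```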
We found that the demo implementation of CaptchaStar allows testing several solutions for a single challenge. This allowed us to estimate the distribution of correct answers and compare it to a uniform distribution. After solving \(5,\!451\) challenges, we produced a heat map (here plotted using Gaussian smoothing) of the centers of correct answers, which can be seen in Figure 2. As can be seen, CaptchaStar seldom produces challenges whose answers lie close to the borders of the \(300 \times 300\) pixel canvas. Interestingly, the peaks of the distribution are more accentuated in the case of a pseudo-random uniform distribution than in CaptchaStar. Pearson's \(\chi ^2\) statistic is \(92,\!474.15\), indicating a p-value \(\lt 0.00001\), which is significant for a distribution with \(89,\!999\) degrees of freedom (at a significance level of 0.05).
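As an illustration, a goodness-of-fit comparison of the observed answer centers against a uniform distribution could be sketched as follows; the variable centers is a placeholder for the list of (x, y) centers collected from the solved challenges.

```python
import numpy as np
from scipy.stats import chi2

def uniformity_test(centers, canvas=300):
    """Pearson chi-squared goodness-of-fit test of the answer centers
    against a uniform distribution over the canvas."""
    # Observed counts over every pixel of the 300x300 canvas.
    observed, _, _ = np.histogram2d(
        [c[0] for c in centers], [c[1] for c in centers],
        bins=canvas, range=[[0, canvas], [0, canvas]])
    observed = observed.ravel()
    expected = np.full_like(observed, len(centers) / observed.size)
    stat = ((observed - expected) ** 2 / expected).sum()
    dof = observed.size - 1          # 90,000 cells -> 89,999 degrees of freedom
    p_value = chi2.sf(stat, dof)
    return stat, dof, p_value
```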
Even though CaptchaStar allows a margin of error of 12 pixels in the answer, and even though the number of images used is limited, the transformations performed on them and the semi-random choice of the center of correct answers make it resilient enough against a brute-force attack. The fact that several answers can be tested using the CaptchaStar demo implementation allows for an oracle attack, but this can easily be fixed by the designers of CaptchaStar.
4.2 Step 2: Statistical/ML Analysis
Having completed the first step of BASECASS, we proceed to the second: the S/ML analysis. To do so, we have to determine which metrics to use and whether any de-noising, pre-processing, or transformation would be beneficial. The images were not altered in a way that is easy to undo; this alteration is precisely what embeds the problem on which the CAPTCHA is based. Thus, we have to process the images as they are.
To define which metrics could be of interest, we picked some well-known metrics that capture basic information from each image:
• General purpose metrics:
– Results of the ENT test of randomness [16]. A brief summary of this test battery follows:
* Entropy. This test computes the entropy [5] of the sequence under examination, as defined in classical information theory. A random sequence is rich in entropy.
* Chi-squared (\(\chi ^{2}\)) test. This is a widely used test described by Knuth in the classical book The Art of Computer Programming [9]. The chi-squared test computes the frequency of the symbols and compares it with the theoretical frequency in a random sequence. To perform this comparison, the \(\chi ^{2}\) statistic is computed.
* Arithmetic mean. As the name suggests, this is just the arithmetic mean of the symbols in the sequence. The expected value for a truly random sequence is 0.5 in binary mode and 127.5 in byte mode.
* Monte Carlo value for \(\pi\). This test interprets the sequence as coordinates in a square and counts the number of points that fall into a circle inscribed within the square. This number is used to estimate the value of \(\pi\). A truly random sequence will approximate \(\pi\), whereas a non-random sequence is not expected to approximate \(\pi\) well. The test reports the statistic \(\pi _{error}\), the error between \(\pi\) and its estimation \(\hat{\pi }\).
* Serial correlation [9]. Serial correlation computes the correlation of each symbol in the sequence with its previous symbol. A good random sequence gives a correlation close to 0, whereas a bad random sequence gives a correlation close to 1.
– Size after compression: This gives an estimate of the amount of information contained in the image. We used the JPEG compression algorithm as implemented by the PILLOW Python library, with qualities of 1 and 95 (the lowest and highest recommended). A sketch of how these per-image metrics can be computed is shown after this list.
• Ad hoc metrics: We did not use any ad hoc metric, as we did not find one that we thought could be relevant to CaptchaStar.
• Comparative metrics: The answer space is quite large, so we decided to follow the BASECASS recommendation and create comparative metrics within the same challenge for all numerical results of the ENT test and for the size results. We did so by normalizing all numeric answer ranges within the same challenge.
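As an illustration of the per-image metrics described above, the following sketch computes a rough Python approximation of some of the ENT statistics together with the compression-size metric for a single rendered image. The actual analysis relied on the ENT tool and the PILLOW library; this code only mirrors their definitions and is not the exact implementation used.

```python
import io
from collections import Counter
from math import log2, pi

import numpy as np
from PIL import Image

def image_metrics(img: Image.Image) -> dict:
    """Approximate ENT-style statistics plus JPEG compression sizes
    for one rendered challenge image."""
    data = np.asarray(img.convert("L")).ravel()      # bytes of the grayscale image

    # Shannon entropy of the byte sequence (bits per byte).
    counts = Counter(data.tolist())
    probs = [c / len(data) for c in counts.values()]
    entropy = -sum(p * log2(p) for p in probs)

    # Arithmetic mean of the bytes (127.5 expected for a random byte sequence).
    mean = float(data.mean())

    # Serial correlation between consecutive bytes.
    serial_corr = float(np.corrcoef(data[:-1], data[1:])[0, 1])

    # Monte Carlo estimate of pi: byte pairs as (x, y) points in the unit square.
    xs, ys = data[0::2][: len(data) // 2], data[1::2][: len(data) // 2]
    inside = ((xs / 255.0) ** 2 + (ys / 255.0) ** 2 <= 1.0).sum()
    pi_error = abs(pi - 4.0 * inside / len(xs))

    # Size after JPEG compression at the lowest and highest recommended quality.
    sizes = {}
    for q in (1, 95):
        buf = io.BytesIO()
        img.convert("L").save(buf, format="JPEG", quality=q)
        sizes[q] = buf.tell()

    return {"entropy": entropy, "mean": mean, "serial_corr": serial_corr,
            "pi_error": pi_error, "jpeg_q1": sizes[1], "jpeg_q95": sizes[95]}
```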
We decided to run the tests with these simple metrics and see whether they would allow for proper classification. A summary with a brief description of the metrics used in our BASECASS attack on CaptchaStar is presented in Table 1.
We then had to determine the training and test sets to use for ML. We used the 5,451 challenges downloaded and answered in the previous step. To create a training/test set, we applied these metrics to the images that result from placing the cursor at different positions. In particular, we created two sets:
(1) The first one contained the images resulting from dividing each axis of the answer space into three parts, that is, the candidate coordinates \((0,0), (150,0), (300,0), (0,150), (150,150), \ldots, (300,300)\). We included another \(3 \times 3\) coordinates derived from offsets of \((-10, 0, +10)\) pixels around the center of the correct answer (a sketch of how these coordinates can be generated is shown after this list). This produced a maximum total of 18 images per challenge. To create the training/test files, we applied the aforementioned metrics to these images. We will call this the simple dataset.
(2) The second one contains images resulting from dividing each axis of the answer space into five parts. Similarly, it contains another \(5 \times 5\) coordinates derived from dividing the \([-10 \ldots 10]\) offset range into five parts. In addition, we added coordinates at offsets of \([-1,1]\) from the center of the correct answer. Note that even though these coordinates are marked as correct by CaptchaStar, we labeled them as wrong to see whether some ML algorithms are able to differentiate between almost-perfect and perfect solutions. In total, this produced a maximum of 59 images per challenge, to which we applied the aforementioned metrics to produce the corresponding file. We will call this the detailed dataset.
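The following sketch illustrates how the candidate cursor positions for the simple dataset could be generated for one challenge; cx and cy denote the center of the correct answer obtained through the oracle in step 1, and the function name is only illustrative.

```python
from itertools import product

def simple_dataset_coordinates(cx, cy, canvas=300):
    """Candidate cursor positions for one challenge of the 'simple' dataset:
    a 3x3 grid over the whole canvas plus a 3x3 grid of (-10, 0, +10) pixel
    offsets around the correct center (cx, cy) -- up to 18 positions."""
    coarse = list(product((0, canvas // 2, canvas), repeat=2))
    around = [(cx + dx, cy + dy) for dx, dy in product((-10, 0, 10), repeat=2)]
    # Discard positions falling outside the canvas (can happen near its borders).
    return [(x, y) for x, y in coarse + around
            if 0 <= x <= canvas and 0 <= y <= canvas]
```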
We used the ML framework Weka, as it includes several classifiers that can be run with default parameters out of the box. This allowed us to create a single dataset (in ARFF format) and use it with all of the classifiers, comparing results. To test the performance of each classifier, we have to settle on a metric. When dealing with balanced datasets, a simple metric like the accuracy or the area under the ROC curve can be adequate. In contrast, for heavily imbalanced datasets, that is, those in which one class is much less represented than another, other metrics like the \(f_1\) score, the area under the precision/recall curve, or the \(\kappa\) metric are better.
In the case of CaptchaStar, most examples are negative (do not pass the CAPTCHA) and only a few are positive, so the dataset is heavily imbalanced. Therefore, to determine which classifier performed best, we decided to use the \(\kappa\) metric, as it is more meaningful than the accuracy and similar metrics when dealing with imbalanced training and test sets like ours, with only \(1/18\) and \(1/59\) correct answers, respectively.
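For reference, the \(\kappa\) (Cohen's kappa) statistic can be computed from a binary confusion matrix as follows; the counts in the usage example are illustrative and not taken from our experiments.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: agreement between predictions and labels,
    corrected for the agreement expected by chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal frequencies of each class.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)

# Illustrative example on an imbalanced dataset (1 positive in 18): a classifier
# that labels everything negative gets a high accuracy but a kappa of 0.
print(cohen_kappa(tp=0, fp=0, fn=100, tn=1700))   # 0.0
```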
There are different ways to create the training and test sets from the data. A typical approach, applicable in most cases, is cross-validation, in which testing is repeated with a different test set each time and the results are averaged. In this case, we decided to use threefold cross-validation.
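As an example, a single Weka classifier can be evaluated with threefold cross-validation from the command line (invoked here through Python). The ARFF file name and the path to weka.jar are placeholders; -t and -x are Weka's standard options for the training file and the number of cross-validation folds.

```python
import subprocess

# Evaluate one Weka classifier with threefold cross-validation on the simple
# dataset. File names and the weka.jar path are placeholders.
subprocess.run(
    ["java", "-cp", "weka.jar",
     "weka.classifiers.trees.J48",
     "-t", "captchastar_simple.arff",   # training file
     "-x", "3"],                        # number of cross-validation folds
    check=True,
)
```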
We tested both datasets with different ML algorithms to determine which were more successful. Tables 2 and 3 show the top classifiers by their \(\kappa\) metric for the simple and detailed datasets, respectively. Of a total of 163 classifiers in Weka, only 37 and 34, respectively, were able to load the data and present a solution within the time-out (5 minutes).
Many ML algorithms are able to classify the simple dataset with a high \(\kappa\) value. In particular, the \(meta.RandomCommittee\) (an ensemble of randomizable base classifiers), the \(functions.Logistic\) (a multinomial logistic regression model with a ridge estimator), and two tree-based classifiers (\(trees.RandomTree\) and \(trees.J48\)) obtained the best results. They all obtain a \(\kappa\) of 0.99 and a perfect accuracy. This implies that an attack using any of them might be feasible.
For the second training/test set, the detailed dataset, we got worse results, as expected. At the top of the scale there is again a meta classifier (an ensemble). The first pure classifier that reaches a decent solution is J48, with a \(\kappa\) of 0.36. Even though this is not very high, it should be enough to perform an attack.