Having read various resources about password strength I’m trying to create an algorithm that will provide a rough estimation of how much entropy a password has.
I’m trying to create an algorithm that’s as comprehensive as possible. At this point I only have pseudocode, but the algorithm covers the following:
- password length
- repeated characters
- patterns (logical)
- different character spaces (LC, UC, Numeric, Special, Extended)
It does NOT cover the following, though it SHOULD cover them well (if not perfectly):
- dictionary attacks
- ordering (passwords can be strictly ordered by output of this algorithm)
- patterns (spatial)
Can anyone provide some insight on what this algorithm might be weak to? Specifically, can anyone think of situations where feeding a password to the algorithm would OVERESTIMATE its strength? Underestimations are less of an issue.
The algorithm:
// the password to test
password = ?
length = length(password)
// unique character counts from password (duplicates discarded)
uqlca = number of unique lowercase alphabetic characters in password
uquca = number of unique uppercase alphabetic characters
uqd = number of unique digits
uqsp = number of unique special characters (anything with a key on the keyboard)
uqxc = number of unique extended characters (alt codes, extended-ASCII stuff)
// algorithm parameters, total sizes of alphabet spaces
Nlca = total possible number of lowercase letters (26)
Nuca = total uppercase letters (26)
Nd = total digits (10)
Nsp = total special characters (32 or something)
Nxc = total extended ASCII characters that don't fit into the other categories (idk, 50?)
// algorithm parameters, pw strength growth rates as percentages (per character)
flca = entropy growth factor for lowercase letters (.25 is probably a good value)
fuca = EGF for uppercase letters (.4 is probably good)
fd = EGF for digits (.4 is probably good)
fsp = EGF for special chars (.5 is probably good)
fxc = EGF for extended ascii chars (.75 is probably good)
// repetition factors. few unique letters == low factor, many unique == high
rflca = (1 - (1 - flca) ^ uqlca)
rfuca = (1 - (1 - fuca) ^ uquca)
rfd = (1 - (1 - fd ) ^ uqd )
rfsp = (1 - (1 - fsp ) ^ uqsp )
rfxc = (1 - (1 - fxc ) ^ uqxc )
// combined strength across all character classes
strength =
( rflca * Nlca +
rfuca * Nuca +
rfd * Nd +
rfsp * Nsp +
rfxc * Nxc ) ^ length
entropybits = log_base_2(strength)
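For concreteness, here is a rough Python sketch of the pseudocode above. The alphabet sizes, growth factors, and the split between "special" and "extended" characters are the same illustrative guesses used above, so it may not reproduce the table below exactly.

```python
import math
import string

# (alphabet size N, entropy growth factor f) per character class,
# using the illustrative values from the pseudocode above
CLASSES = {
    "lower":    (26, 0.25),
    "upper":    (26, 0.40),
    "digit":    (10, 0.40),
    "special":  (32, 0.50),   # keyboard punctuation and space
    "extended": (50, 0.75),   # rough stand-in for everything else
}

def classify(c):
    if c in string.ascii_lowercase:
        return "lower"
    if c in string.ascii_uppercase:
        return "upper"
    if c in string.digits:
        return "digit"
    if c in string.punctuation or c == " ":
        return "special"
    return "extended"

def entropy_bits(password):
    if not password:
        return 0.0
    # unique characters per class (duplicates discarded)
    unique = {name: set() for name in CLASSES}
    for c in password:
        unique[classify(c)].add(c)
    # effective alphabet size: sum of repetition factor * N over the classes
    effective = sum((1 - (1 - f) ** len(unique[name])) * n
                    for name, (n, f) in CLASSES.items())
    # strength = effective ^ length, reported as log2(strength)
    return len(password) * math.log2(effective)

if __name__ == "__main__":
    for pw in ["aaa", "H0ley$Mol3y_", "Tr0ub4dor&3",
               "correct horse battery staple"]:
        print(f"{pw}: {entropy_bits(pw):.1f} bits")
```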
A few inputs and their desired and actual entropy_bits outputs:
| Input | Desired | Actual (bits) |
|---|---|---|
| aaa | very pathetic | 8.1 |
| aaaaaaaaa | pathetic | 24.7 |
| abcdefghi | weak | 31.2 |
| H0ley$Mol3y_ | strong | 72.2 |
| s^fU¬5ü;y34G< | wtf | 88.9 |
| [a^36]* | pathetic | 97.2 |
| [a^20]A[a^15]* | strong | 146.8 |
| xkcd1** | medium | 79.3 |
| xkcd2** | wtf | 160.5 |
* these 2 passwords use shortened notation, where [a^N] expands to N a's.
** xkcd1 = "Tr0ub4dor&3", xkcd2 = "correct horse battery staple"
The algorithm does realize (correctly) that increasing the alphabet size (even by one character) vastly strengthens long passwords, as shown by the difference in entropy_bits for the 6th and 7th passwords, which both consist of 36 a's except that the 21st character of the second is capitalized. However, it does not account for the fact that a password of 36 a's is a bad idea: it is easily broken with a weak password cracker (and anyone who watches you type it will see it), and the algorithm doesn't reflect that.
It does, however, reflect the fact that xkcd1 is a weak password compared to xkcd2, despite having greater complexity density (is this even a thing?).
How can I improve this algorithm?
Addendum 1
Dictionary attacks and pattern based attacks seem to be the big thing, so I’ll take a stab at addressing those.
I could perform a comprehensive search through the password for words from a word list and replace words with tokens unique to the words they represent. Word-tokens would then be treated as characters and have their own weight system, and would add their own weights to the password. I’d need a few new algorithm parameters (I’ll call them lw, Nw ~= 2^11, fw ~= .5, and rfw) and I’d factor the weight into the password as I would any of the other weights.
This word search could be specially modified to match both lowercase and uppercase letters as well as common character substitutions, like that of E with 3. If I didn’t add extra weight to such matched words, the algorithm would underestimate their strength by a bit or two per word, which is OK. Otherwise, a general rule would be, for each non-perfect character match, give the word a bonus bit.
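A minimal sketch of that word search, assuming a hypothetical WORDLIST and SUBSTITUTIONS table (both for illustration only); it does a greedy longest-match, and the imperfect-match count is what would earn the bonus bits:

```python
# Hypothetical word list and substitution table for illustration only;
# a real implementation would load a list of roughly 2^11 common words.
WORDLIST = {"kitty", "herp", "derp", "correct", "horse", "battery", "staple"}
SUBSTITUTIONS = {"3": "e", "0": "o", "1": "l", "4": "a", "$": "s", "@": "a"}

def find_word_tokens(password):
    """Greedily match the longest dictionary word at each position.

    Returns (start, end, word, imperfect) tuples, where `imperfect` counts
    characters that only matched via case-folding or a substitution and
    would therefore earn the word a bonus bit each.
    """
    norm = "".join(SUBSTITUTIONS.get(c, c).lower() for c in password)
    tokens, i = [], 0
    while i < len(norm):
        for j in range(len(norm), i, -1):          # longest match first
            if norm[i:j] in WORDLIST:
                imperfect = sum(1 for a, b in zip(password[i:j], norm[i:j]) if a != b)
                tokens.append((i, j, norm[i:j], imperfect))
                i = j
                break
        else:
            i += 1
    return tokens

# e.g. find_word_tokens("1234kitty$$$$$herpderp") finds kitty, herp and derp
```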
I could then perform simple pattern checks, such as searches for runs of repeated characters and derivative tests (taking the difference between consecutive characters), which would identify patterns such as 'aaaaa' and '12345', and replace each detected pattern with a pattern token unique to the pattern and its length. The algorithmic parameters (specifically, entropy per pattern) could be generated on the fly based on the pattern.
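A sketch of that run/derivative check (the minimum run length of 3 is an arbitrary choice): a constant step of 0 is an order-0 run like "$$$$$", and any other constant step is an order-1 run like "12345", "abcde", or a descending "9876".

```python
def find_pattern_tokens(password, min_len=3):
    """Find maximal constant-difference runs of at least `min_len` characters.

    Returns (start, length, order, base) tuples: order 0 for repeated
    characters, order 1 for runs with any other constant step.
    """
    tokens, i, n = [], 0, len(password)
    while i < n - 1:
        step = ord(password[i + 1]) - ord(password[i])
        j = i + 1
        while j + 1 < n and ord(password[j + 1]) - ord(password[j]) == step:
            j += 1
        length = j - i + 1
        if length >= min_len:
            tokens.append((i, length, 0 if step == 0 else 1, password[i]))
            i = j + 1                      # resume after the run
        else:
            i += 1
    return tokens

# find_pattern_tokens("1234kitty$$$$$herpderp")
#   -> [(0, 4, 1, '1'), (9, 5, 0, '$')]
```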
At this point, I’d take the length of the password. Each word token and pattern token would count as one character; each token replaces the characters it symbolically represents.
I made up some sort of pattern notation, but it includes the pattern length l, the pattern order o, and the base element b. This information could be used to compute some arbitrary weight for each pattern. I’d do something better in actual code.
Modified Example:
Password: 1234kitty$$$$$herpderp
Tokenized: 1 2 3 4 k i t t y $ $ $ $ $ h e r p d e r p
Words Filtered: 1 2 3 4 @W5783 $ $ $ $ $ @W9001 @W9002
Patterns Filtered: @P[l=4,o=1,b='1'] @W5783 @P[l=5,o=0,b='$'] @W9001 @W9002
Breakdown: 3 small, unique words and 2 patterns
Entropy: about 45 bits, as per modified algorithm
Password: correcthorsebatterystaple
Tokenized: c o r r e c t h o r s e b a t t e r y s t a p l e
Words Filtered: @W6783 @W7923 @W1535 @W2285
Breakdown: 4 small, unique words and no patterns
Entropy: 43 bits, as per modified algorithm
The exact semantics of how entropy is calculated from patterns is up for discussion. I was thinking something like:
entropy(b) * l * (o + 1) // o will be either zero or one
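For example, under this formula the '1234' run from the example above (b = '1', l = 4, o = 1) would contribute entropy('1') × 4 × 2, and the '$$$$$' run (b = '$', l = 5, o = 0) would contribute entropy('$') × 5 × 1, with entropy(b) being whatever per-character entropy is assigned to b's character class.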
The modified algorithm would find flaws with, and reduce the strength of, each password in the original table, with the exception of s^fU¬5ü;y34G<, which contains no words or patterns.

---
Appendix A on p. 46 of NIST SP 800-63 draws on the work of Claude Shannon to estimate password entropy in bits. Indeed, this is the document that the XKCD cartoon uses to calculate its entropy bits. Specifically:
- The entropy of the first character is taken to be 4 bits;
- The entropy of the next 7 characters is 2 bits per character; this is roughly consistent with Shannon’s estimate that “when statistical effects extending over not more than 8 letters are considered the entropy is roughly 2.3 bits per character”;
- For the 9th through the 20th character the entropy is taken to be 1.5 bits per character;
- For characters 21 and above the entropy is taken to be 1 bit per character;
- A “bonus” of 6 bits of entropy is assigned for a composition rule that requires both upper case and non-alphabetic characters. This forces the use of these characters, but in many cases these characters will occur only at the beginning or the end of the password, and it reduces the total search space somewhat, so the benefit is probably modest and nearly independent of the length of the password;
- A bonus of up to 6 bits of entropy is added for an extensive dictionary check. If the attacker knows the dictionary, he can avoid testing those passwords, and will, in any event, be able to guess much of the dictionary, which will, however, be the most likely selected passwords in the absence of a dictionary rule. The assumption is that most of the guessing entropy benefits for a dictionary test accrue to relatively short passwords, because any long password that can be remembered must necessarily be a “pass-phrase” composed of dictionary words, so the bonus declines to zero at 20 characters.
The idea is that an authentication system would pick certain entropy levels as thresholds. For example, 10 bits may be weak, 20 medium and 30 strong (numbers picked arbitrarily as an example, not a recommendation). Unfortunately, the document does not recommend such thresholds, probably because the computational power available to brute force or guess passwords increases over time:
> As an alternative to imposing some arbitrary specific set of rules, an authentication system might grade user passwords, using the rules stated above, and accept any that meet some minimum entropy standard. For example, suppose passwords with at least 24-bits of entropy were required. We can calculate the entropy estimate of “IamtheCapitanofthePina4” by observing that the string has 23 characters and would satisfy a composition rule requiring upper case and non-alphabetic characters.
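For reference, a short sketch of that estimate under the rules quoted above. The linear fade-out of the dictionary bonus is my own assumption (the quoted text only says it declines to zero at 20 characters), and the parameter name is just illustrative.

```python
def nist_entropy_bits(password, survives_dictionary_check=False):
    """Rough NIST SP 800-63 Appendix A estimate, per the rules quoted above."""
    bits = 0.0
    for i in range(len(password)):
        if i == 0:
            bits += 4.0      # first character
        elif i < 8:
            bits += 2.0      # characters 2 through 8
        elif i < 20:
            bits += 1.5      # characters 9 through 20
        else:
            bits += 1.0      # characters 21 and above
    # 6-bit bonus for a composition rule requiring upper case and non-alphabetic characters
    if any(c.isupper() for c in password) and any(not c.isalpha() for c in password):
        bits += 6.0
    # up to 6 more bits for an extensive dictionary check, declining to zero at 20 characters
    # (linear decline assumed here)
    if survives_dictionary_check:
        bits += max(0.0, 6.0 * (20 - len(password)) / 20)
    return bits

# "IamtheCapitanofthePina4": 4 + 7*2 + 12*1.5 + 3*1 + 6 = 45 bits under these rules
print(nist_entropy_bits("IamtheCapitanofthePina4"))   # 45.0
```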
This may or may not be what you are looking for but is not a bad reference point, if nothing else.
[Edit: Added the following.]
The paper Testing Metrics for Password Creation Policies by Attacking Large Sets of Revealed Passwords (by Matt Weir, Sudhir Aggarwal, Michael Collins and Henry Stern) demonstrated that the Shannon model described above is not an accurate model of the entropy of human-generated passwords. I would recommend looking at “Section 5 Generating New Password Creation Policies” for more accurate proposals.

---
Check out the source code for KeePass at the bottom of this page. The QualityEstimation class implements a rather nice algorithm that seems to be in line with what you’re looking to have in place. My results are as follows:
| Input | Entropy bits |
|---|---|
| aaa | 8 |
| aaaaaaaaa | 9 |
| abcdefghi | 18 |
| H0ley$Mol3y_ | 73 |
| s^fU¬5ü;y34G< | 99 |
| [a^36]* | 10 |
| [a^20]A[a^15]* | 18 |
| Tr0ub4dor&3 | 66 |
| correct horse battery staple | 98 |

---
You ask
> Specifically, can anyone think of situations where feeding a password to the algorithm would OVERESTIMATE its strength?
But you have an example in the question. By design, xkcd2 has ~44 bits of entropy, but your estimate is 160.5 bits.
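(In the comic each word is drawn from a list of roughly 2^11 = 2048 common words, so four words contribute about 4 × 11 = 44 bits; this is the same Nw ≈ 2^11 figure used in the question's addendum.)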

---
> Can anyone provide some insight on what this algorithm might be weak to? Specifically, can anyone think of situations where feeding a password to the algorithm would OVERESTIMATE its strength?
You’ve hinted at some in the preamble (dictionary attacks, etc.). Essentially, there are a number of common practices that an attacker can guess at, greatly lowering the search space. I am pretty sure that your algorithm will “overestimate” the following:
- everywhere
- Everywhere
- Everywhere1
The password is quite long, but it is trivially crackable, since the original word appears in a basic dictionary and the modifications are common enough to form part of any decent dictionary attack. Typical letter-to-number conversions (e.g. 3v3rywh3r3) should also be considered pretty weak, and you should penalise them.
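A minimal sketch of the kind of check that would catch these, assuming a hypothetical word list; it just undoes the common manglings (case changes, leet substitutions, appended digits/symbols) and looks the result up:

```python
import re

# Hypothetical word list for illustration; a real check would use a large dictionary.
WORDLIST = {"everywhere", "password", "monkey"}
LEET = str.maketrans("4310$5", "aeloss")

def is_mangled_dictionary_word(password):
    """True if the password reduces to a dictionary word after undoing
    common manglings: lower-casing, leet substitutions, trailing digits/symbols."""
    base = password.lower()
    stripped = re.sub(r"[\d!@#$%^&*]+$", "", base)   # drop appended digits/symbols
    candidates = {base, base.translate(LEET), stripped, stripped.translate(LEET)}
    return any(c in WORDLIST for c in candidates)

# All of these should be flagged and heavily penalised, not scored as random text:
for pw in ["everywhere", "Everywhere", "Everywhere1", "3v3rywh3r3"]:
    print(pw, is_mangled_dictionary_word(pw))   # True for each
```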
To a much lesser degree, other troublesome passwords may be ones with obvious patterns, such as:
- abcdefghijklmnop
- abcde12345
Although these are probably less likely to be targeted in actual dictionary attacks, they suffer from problems similar to your “aaaaa…” example.
I’m not sure whether passphrases are currently targeted in most dictionary attacks, but no doubt as they gain popularity they will be targeted more and more. I think the famous xkcd example takes this into account, since only 11 bits are assigned to each “common word”. Your algorithm overestimates these types of passwords as well.
So, to summarise, the algorithm does a fairly good job of the estimation, but it really should be taking into consideration the structure of the password and common, known patterns.