Tuesday, July 20, 2010

Password cracking

Password cracking is the process of recovering passwords from data that has been stored in or transmitted by a computer system. A common approach is to repeatedly try guesses for the password. The purpose of password cracking might be to help a user recover a forgotten password (though installing an entirely new password is less of a security risk, but involves system administration privileges), to gain unauthorized access to a system, or as a preventive measure by system administrators to check for easily crackable passwords. On a file-by file basis, password cracking is utilized to gain access to digital evidence for which a judge has allowed access but the particular file's access is restricted.

    1 Background
    2 Principal attack methods

        2.1 Weak encryption
        2.2 Guessing, dictionary and brute force attacks

            2.2.1 Guessing
            2.2.2 Dictionary attacks
            2.2.3 Brute force attack

        2.3 Precomputation

            2.3.1 Salting
            2.3.2 Early Unix password vulnerability

    3 Prevention
    4 Software
    5 References
    6 See also
    7 External links


Passwords to access computer systems are usually stored in a database so that the system can perform password verification when a user attempts to log in or access a restricted resource. To preserve confidentiality of system passwords, the password verification data is typically not stored in cleartext form, but instead a one-way function is applied to the password, possibly in combination with other data, and the resulting value is stored. When a user later attempts to authenticate by entering the password, the same function is applied to the entered value and the result is compared with the stored value. If they match, there is an extremely high probability that the entered password was correct. For simplicity in this discussion, we will refer to the one way function employed (which may be either an encryption function or cryptographic hash) as a hash and its output as a hashed password.

Even though functions that create hashed passwords may be cryptographically secure, possession of the hashed password provides a quick way to test guesses for the password by applying the one-way function to each guess, and comparing the result to the verification data. The most commonly used hash functions can be computed rapidly and the attacker can test guesses repeatedly with different guesses until one succeeds, meaning that the plaintext password has been recovered.

The term password cracking generally refers to recovery of one or more plaintext passwords from hashed passwords, but there are also many other ways of obtaining passwords illicitly. Without the hashed version of a password, the attacker can still attempt access to the computer system in question with guessed passwords. But well-designed systems limit the number of failed access attempts and can alert administrators to trace the source of the attack if that quota is exceeded. If he has the hashed password, the attacker can work undetected, and if the attacker has obtained several hashed passwords, the chance of cracking at least one is quite high.

Other ways to obtain passwords include social engineering, wiretapping, keystroke logging, login spoofing, dumpster diving, phishing, shoulder surfing, timing attack, acoustic cryptanalysis, using a Trojan Horse or virus, identity management system attacks (such as abuse of Self-service password reset) and compromising host security (see password for details). While those methods are not considered "password cracking" they are very popular among criminals (notably phishing) and remain very effective. They are often considered as the main vulnerability in password authentification systems.

Common methods for verifying users over a computer network often expose the hashed password. For example, use of a hash-based challenge-response authentication method for password verification may provide a hashed password to a network eavesdropper, who can then crack the password. A number of stronger cryptographic protocols exist that do not expose hashed passwords during verification over a network, either by protecting them in transmission using a high-grade key, or by using a zero-knowledge password proof.
Principal attack methods
Weak encryption

If a system uses a poorly designed password hashing scheme to protect stored passwords, an attacker can exploit any weaknesses to recover even 'well-chosen' passwords. One example is the LM hash that Microsoft Windows XP and previous versions use by default to store user passwords of less than 15 characters in length. LM hash converts the password into all uppercase letters then breaks the password into two 7-character fields which are hashed separately—which allows each half to be attacked individually.

Password encryption schemes that use stronger hash functions like MD5, SHA-512, SHA-1, and RIPEMD-160 can still be vulnerable to brute-force and precomputation attacks. Such attacks do not depend on reversing the hash function. Instead, they work by hashing a large number of words or random permutations and comparing the result of each guess to a user's stored password hash. Modern schemes such as MD5-crypt[1] and bcrypt use purposefully slow algorithms so that the number of guesses that an attacker can make in a given period of time is relatively low. Salting, described below, greatly increases the difficulty of such precomputation attacks, perhaps sufficiently to resist all attacks; every instance of its use must be evaluated independently, however.

Because progress in analyzing existing cryptographic hash algorithms is always possible, a hash which is effectively invulnerable today may become vulnerable tomorrow. Both MD5 and SHA-1, long thought secure, have been shown vulnerable to less than brute force efficiency attacks. For encryption algorithms (rather different than cryptographic hashes) the same has been true. DES has been broken (in the sense of more efficient than brute force attacks being discovered), and computers have become fast enough that its short key (56 bits) is clearly and publicly insecure against even brute force attacks. Passwords protected by these measures against attack will become vulnerable, and passwords still in use thereby exposed. Historical records are not always and forever irrelevant to today's security problems.
Guessing, dictionary and brute force attacks

The distinction between guessing, dictionary and brute force attacks is not strict. They are similar in that an attacker goes through a list of candidate passwords one by one; the list may be explicitly enumerated or implicitly defined, can incorporate knowledge about the victim, and can be linguistically derived. Each of the three approaches, particularly 'dictionary attack', is frequently used as an umbrella term to denote all the three attacks and the spectrum of attacks encompassed by them.

Passwords can sometimes be guessed by humans with knowledge of the user's personal information. Examples of guessable passwords include:

    blank (none)
    the words "password", "passcode", "admin" and their derivatives
    a row of letters from the qwerty keyboard -- qwerty itself, asdf, or qwertyuiop)
    the user's name or login name
    the name of a significant other, a friend, relative or pet
    their birthplace or date of birth, or a friend's, or a relative's
    their automobile license plate number, or a friend's, or a relative's
    their office number, residence number or most commonly, their mobile number.
    a name of a celebrity they like
    a simple modification of one of the preceding, such as suffixing a digit, particularly 1, or reversing the order of the letters.
    a swear word

Personal data about individuals are now available from various sources, many on-line, and can often be obtained by someone using social engineering techniques, such as posing as an opinion surveyor or a security control checker. Attackers who know the user may have information as well. For example, if a user chooses the password "YaleLaw78" because he graduated from Yale Law School in 1978, a disgruntled business partner might be able to guess the password.

Guessing is particularly effective with systems that employ self-service password reset. For example, in September 2008, the Yahoo e-mail account of Governor of Alaska and Vice President of the United States nominee Sarah Palin was accessed without authorization by someone who was able to research answers to two of her security questions, her zip code and date of birth and was able to guess the third, where she met her husband.Dictionary attacks

Main article: Dictionary attack

See also: Password strength and Password policy

Users often choose weak passwords. Examples of insecure choices include the above list, plus single words found in dictionaries, given and family names, any too short password (usually thought to be 6 or 7 characters or less), or any password meeting a too restrictive and so predictable, pattern (eg, alternating vowels and consonants). Repeated research over some 40 years has demonstrated that around 40% of user-chosen passwords are readily guessable by sophisticated cracking programs armed with dictionaries and, perhaps, the user's personal information.[1]

In one survey of MySpace passwords obtained by phishing, 3.8 percent of those passwords were a single word findable in a dictionary, and another 12 percent were a word plus a final digit; two-thirds of the time that digit was 1.[2]

Some users neglect to change the default password that came with their computer system account. And some administrators neglect to change default account passwords provided by the operating system vendor or hardware supplier. An infamous example is the use of FieldService as a user name with Guest as the password. If not changed at system configuration time, anyone familiar with such systems will have 'cracked' an important password; such service accounts often have higher access privileges than do a normal user accounts. Lists of default passwords are available on the Internet.[3] [4] Gary McKinnon, accused by the United States of perpetrating the "biggest military computer hack of all time"[5], has claimed that he was able to get into the military's networks simply by using a Perl script that searched for blank passwords; in other words his report suggests that there were computers on these networks with no passwords at all. [6]

Cracking programs exist which accept personal information about the user being attacked and generate common variations for passwords suggested by that information
Brute force attack

Main article: Brute force attack

A last resort is to try every possible password, known as a brute force attack. In theory, if there is no limit to the number of attempts, a brute force attack will always be successful since the rules for acceptable passwords must be publicly known; but as the length of the password increases, so does the number of possible passwords. This method is unlikely to be practical unless the password is relatively short, however techniques using parallel processing can reduce the time to find the password in inverse proportion to the number of computer devices (CPUs) in use. This depends heavily on whether the prospective attacker has access to the hash of the password as well as the hashing algorithm, in which case the attack is called an offline attack (it can be done without connection to the protected resource) or not, in which case it is called an online attack. Offline attack is generally much easier, because testing a password is reduced to a mathematical computation of the hash of the password to be tried and comparison with the hash of the real password. In an online attack the attacker has to try to authenticate himself with all the possible passwords, and rules and delays can be imposed by the system and the attempts can be logged.

A common password length recommendation is eight or more randomly chosen characters combining letters, numbers, and special characters (punctuation, etc). This recommendation makes sense for systems using stronger password hashing mechanisms such as md5-crypt and the Blowfish-based bcrypt, but is inappropriate for many Microsoft Windows systems because they store a legacy LAN Manager hash which splits the password into two seven character halves. On these systems, an eight character password is converted into a seven character password and a one character password. For better security, LAN Manager password storage should be disabled if it will not break supported legacy systems.[9] Systems which limit passwords to numeric characters only, or upper case only, or generally those which limit the range of possible password character choices, also make brute force attacks easier. Using longer passwords in these cases (if possible) can compensate for the limited allowable character set. Of course, even with an adequate range of character choice, users who limit themselves to an obvious subset of the available characters (e.g., use only upper case alphabetic characters, or only digits) make brute force attacks against their accounts much easier.

Generic brute-force search techniques are often successful, but smart brute-force techniques, which exploit knowledge about how people tend to choose passwords, pose an even greater threat. NIST SP 800-63 (2) provides further discussion of password quality, and suggests, for example, that an 8 character user-chosen password may provide somewhere between 18 and 30 bits of entropy (randomness), depending on how it is chosen. For example 24 binary digits of randomness is equivalent to 3 randomly chosen bytes, or approximately 5 random characters if they are restricted to upper case alphabetic characters, or 2 words selected from a 4000 word vocabulary. This amount of entropy is far less than what is generally considered safe for an encryption key.

How small is too small for offline attacks thus depends partly on an attacker's ingenuity and resources (e.g. available time and computing power). The second of these will increase as computers get faster. Most commonly used hashes can be implemented using specialized hardware, allowing faster attacks. Large numbers of computers can be harnessed in parallel, each trying a separate portion of the search space. Unused overnight and weekend time on office computers can also be used for this purpose.

Further information: Rainbow table

In its most basic form, precomputation involves hashing each word in the dictionary (or any search space of candidate passwords) and storing the word and its computed hash in a way that enables lookup on the list of computed hashes. This way, when a new encrypted password is obtained, password recovery is instantaneous. Precomputation can be very useful for a dictionary attack if salt is not used properly (see below), and the dramatic decrease in the cost of mass storage has made it practical for fairly large dictionaries.

Advanced precomputation methods exist that are even more effective. By applying a time-memory tradeoff, a middle ground can be reached - a search space of size N can be turned into an encrypted database of size O(N2/3) in which searching for an encrypted password takes time O(N2/3). The theory has recently been refined into a practical technique. Another example[10] cracks alphanumeric Windows LAN Manager passwords in a few seconds. This is much faster than brute force attacks on the obsolete LAN Manager, which uses a particularly weak method of hashing the password. Windows systems prior to Windows Vista/Server 2008 compute and store a LAN Manager hash by default for backwards compatibility.[9]

A technique similar to precomputation, known generically as memoization, can be used to crack multiple passwords at the cost of cracking just one. Since encrypting a word takes much longer than comparing it with a stored word, a lot of effort is saved by encrypting each word only once and comparing it with each of the encrypted passwords using an efficient list search algorithm. The two approaches may of course be combined: the time-space tradeoff attack can be modified to crack multiple passwords simultaneously in a shorter time than cracking them one after the other.

Further information: Salt (cryptography)

The benefits of precomputation and memoization can be nullified by randomizing the hashing process. This is known as salting. When the user sets a password, a short, random string called the salt is suffixed to the password before encrypting it; the salt is stored along with the encrypted password so that it can be used during verification. Since the salt is usually different for each user, the attacker can no longer construct tables with a single encrypted version of each candidate password. Early Unix systems used a 12-bit salt. Attackers could still build tables with common passwords encrypted with all 4096 possible 12-bit salts. However, if the salt is long enough, there are too many possibilities and the attacker must repeat the encryption of every guess for each user. Modern methods such as md5-crypt and bcrypt use salts of 48 and 128 bits respectively.[11]
Early Unix password vulnerability

Early Unix implementations limited passwords to 8 characters and used a 12-bit salt, which allowed for 4096 possible salt values. While 12 bits was conventionally considered good enough for most purposes in the 1970s, by 2005 disk storage had become cheap enough that an attacker can precompute the hashes of millions of common passwords, including all 4096 possible salt variations for each password, and store the precomputed values on a single portable hard drive. An attacker with a larger budget can build a disk farm with all 6 character passwords and the most common 7 and 8 character passwords stored in encrypted form, for all 4096 possible salts. And when several thousand passwords are being cracked at once, memoization still offers some benefit. Since there is little downside to using a longer salt, and because they render any precomputation or memoization hopeless, modern implementations choose to do so.

Main article: Shadow password

The best method of preventing password cracking is to ensure that attackers cannot get access even to the encrypted password. For example, on the Unix operating system, encrypted passwords were originally stored in a publicly accessible file /etc/passwd. On modern Unix (and similar) systems, on the other hand, they are stored in the file /etc/shadow, which is accessible only to programs running with enhanced privileges (ie, 'system' privileges). This makes it harder for a malicious user to obtain the encrypted passwords in the first instance. Unfortunately, many common network protocols transmit passwords in cleartext or use weak challenge/response schemes.
Modern Unix systems have replaced traditional DES-based password hashing with stronger methods based on MD5 and Blowfish.[14] Other systems have also begun to adopt these methods. For instance, the Cisco IOS originally used a reversible Vigenere cipher to encrypt passwords, but now uses md5-crypt with a 24-bit salt when the "enable secret" command is used.[15] These newer methods use large salt values which prevent attackers from efficiently mounting offline attacks against multiple user accounts simultaneously. The algorithms are also much slower to execute which drastically increases the time required to mount a successful offline attack.[11]

Solutions like a security token give a formal proof answer by constantly shifting password. Those solutions abruptly reduce the timeframe for brute forcing (attacker needs to break and use the password within a single shift) and they reduce the value of the stolen passwords because of its short time validity.