In this lab, you will implement a password cracker. A password cracker is a program that recovers users’ passwords from a database of hashed passwords. In operating systems and web applications it is standard practice to not store users’ passwords in the database of user accounts. Instead, the application typically runs each user’s password through a hash function and stores the result. A hash function must be consistent—every time I run my password through a given hash function I should get the same output—but these functions are meant to work only one way. I can easily convert passwords to hashed passwords, but it is very difficult to go from hashed passwords back to passwords. However, sometimes it is possible to recover passwords from hashes; a password cracker performs this feat by searching over the entire space of possible passwords, hashing each one and comparing it to the list of known password hashes. Any time there is a match we now know the users’ original password.
Searching over password hashes is quite difficult for secure hash functions, but luckily there are some insecure hash functions that are still widely used. One of these hash functions is MD5, which is no longer recommended for cryptographic uses. Your program will receive a list of usernames and MD5-hashed passwords. It should then search over all possible passwords, hashing each one, until it finds a match for each user’s password. The space of possible passwords is somewhat constrained (see the lab details below), which makes this search process feasible. Still, there are many candidate passwords to try. To complete the search in a reasonable amount of time, we will use POSIX threads to perform the search in parallel.
Start this lab as we usually do by forking the starter code on GitHub at https://github.com/csc213/password-cracker. Don’t forget to add your lab partner as a collaborator on the forked GitHub repository.
Group information is no longer available for this course.
The starter code includes an example call to the MD5
function, which is part of the OpenSSL library.
Your task for this part of the lab is to write code that generates all possible passwords, computes their MD5 hash, then compares this hash to an MD5 value provided on the command line.
While computing all possible passwords sounds like an especially difficult task, we can take advantage of two constraints of passwords on our system:
The basic process for cracking a password on this system is as follows:
I strongly recommend that you develop an encoding scheme to walk through all possible passwords in a systematic fashion. Choosing candidate passwords at random could work for short passwords, but six characters each selected from 26 possibilities means there are over 200 billion possible passwords to try; if you choose passwords at random you will almost certainly repeat the same candidate password, which cannot possibly help you find the real password.
Here is a sample interaction with a completed program for part A.
$ ./partA 76a2173be6393254e72ffa4d6df1030a
passwd
$ ./partA 49fdbf2e1eba6226c9e195bbbe598fc0
hiwrld
Once you have a working program that can crack a single hashed password, commit your changes and push them to GitHub before moving on to part B.
The starter code for part B includes an initialization routine to read in a database of usernames and passwords from a file.
There is a file containing some hashed passwords in the starter code, but if you would like to create your own you should print the username, a space, and then the hashed password in hexadecimal.
When you run the partB
program, pass the name of the file that contains the password database as a command line argument.
In this part of the lab, you will take advantage of a convenient fact about password databases: when you are trying to reverse engineer passwords you often don’t care which one you find first. Instead of searching for a specific user’s password, just try candidate passwords and see if any user in the database has that password.
You should reuse most of the code from part A, along with additional code to check for matching hashes across the entire list of users. As soon as you discover a match, print out the username, a space, and then the cracked password.
Here is a sample interaction with the program for part B:
$ ./partB passwords.txt
jordan passwd
taylor secret
This is assuming passwords.txt contains the following values:
jordan 76a2173be6393254e72ffa4d6df1030a
taylor 5ebe2294ecd0e0f08eab7690d2a6ee69
Don’t forget to push your changes to GitHub before moving on to the next part of the lab.
Now that you have a working password cracker, it’s time to make it faster. To do this, you will use POSIX threads to check candidate passwords in parallel. This is an instance of a type of problem often referred to as embarassingly parallel. You can have each thread generate candidate passwords and check their hashes against the database of password hashes without any sort of coordination or dependence between tasks. The trick here is to divide up the space of candidate passwords over the available threads so they don’t duplicate any effort.
I will evaluate all of your lab submissions on a database of 100 hashed passwords. The three fastest implementations will receive 10% extra credit on this lab. There are a few rules and implementation details that may help you with your implementation:
clang -g -O2 --std=c11
. The default compilation flags omit the -O2
flag, which makes it a little easier to debug your code.