Lab: Password Cracker

Assigned:: Tuesday, Feb 20, 2018
Due:: Tuesday, Feb 27, 2018 by 10:30pm
Collaboration:: Work with your assigned partner for this lab. You can use online resources for your lab, provided they do not provide complete answers and you cite them in your code. If you do not know whether it is acceptable use a specific resource you can always ask me.

Overview

In this lab, you will implement a password cracker. A password cracker is a program that recovers users’ passwords from a database of hashed passwords. In operating systems and web applications it is standard practice to not store users’ passwords in the database of user accounts. Instead, the application typically runs each user’s password through a hash function and stores the result. A hash function must be consistent—every time I run my password through a given hash function I should get the same output—but these functions are meant to work only one way. I can easily convert passwords to hashed passwords, but it is very difficult to go from hashed passwords back to passwords. However, sometimes it is possible to recover passwords from hashes; a password cracker performs this feat by searching over the entire space of possible passwords, hashing each one and comparing it to the list of known password hashes. Any time there is a match we now know the users’ original password.

Searching over password hashes is quite difficult for secure hash functions, but luckily there are some insecure hash functions that are still widely used. One of these hash functions is MD5, which is no longer recommended for cryptographic uses. Your program will receive a list of usernames and MD5-hashed passwords. It should then search over all possible passwords, hashing each one, until it finds a match for each user’s password. The space of possible passwords is somewhat constrained (see the lab details below), which makes this search process feasible. Still, there are many candidate passwords to try. To complete the search in a reasonable amount of time, we will use POSIX threads to perform the search in parallel.

Start this lab as we usually do by forking the starter code on GitHub at https://github.com/csc213/password-cracker. Don’t forget to add your lab partner as a collaborator on the forked GitHub repository.

Groups

Group information is no longer available for this course.

Part A: Cracking a Single Password

The starter code includes an example call to the MD5 function, which is part of the OpenSSL library. Your task for this part of the lab is to write code that generates all possible passwords, computes their MD5 hash, then compares this hash to an MD5 value provided on the command line. While computing all possible passwords sounds like an especially difficult task, we can take advantage of two constraints of passwords on our system:

All passwords must be exactly six characters
Passwords may contain only lowercase letters

The basic process for cracking a password on this system is as follows:

Generate a candidate password
Compute the MD5 hash of this candidate password
Compare the hash of our candidate password to the provided hash function
If the two hashes are equal, output the candidate password and exit. Otherwise go to step 1.

I strongly recommend that you develop an encoding scheme to walk through all possible passwords in a systematic fashion. Choosing candidate passwords at random could work for short passwords, but six characters each selected from 26 possibilities means there are over 200 billion possible passwords to try; if you choose passwords at random you will almost certainly repeat the same candidate password, which cannot possibly help you find the real password.

Here is a sample interaction with a completed program for part A.

$ ./partA 76a2173be6393254e72ffa4d6df1030a
passwd
$ ./partA 49fdbf2e1eba6226c9e195bbbe598fc0
hiwrld

Once you have a working program that can crack a single hashed password, commit your changes and push them to GitHub before moving on to part B.

Part B: Cracking a List of Passwords

The starter code for part B includes an initialization routine to read in a database of usernames and passwords from a file. There is a file containing some hashed passwords in the starter code, but if you would like to create your own you should print the username, a space, and then the hashed password in hexadecimal. When you run the partB program, pass the name of the file that contains the password database as a command line argument.

In this part of the lab, you will take advantage of a convenient fact about password databases: when you are trying to reverse engineer passwords you often don’t care which one you find first. Instead of searching for a specific user’s password, just try candidate passwords and see if any user in the database has that password.

You should reuse most of the code from part A, along with additional code to check for matching hashes across the entire list of users. As soon as you discover a match, print out the username, a space, and then the cracked password.

Here is a sample interaction with the program for part B:

$ ./partB passwords.txt
jordan passwd
taylor secret

This is assuming passwords.txt contains the following values:

jordan 76a2173be6393254e72ffa4d6df1030a
taylor 5ebe2294ecd0e0f08eab7690d2a6ee69

Don’t forget to push your changes to GitHub before moving on to the next part of the lab.

Part C: Cracking Passwords in Parallel

Now that you have a working password cracker, it’s time to make it faster. To do this, you will use POSIX threads to check candidate passwords in parallel. This is an instance of a type of problem often referred to as embarassingly parallel. You can have each thread generate candidate passwords and check their hashes against the database of password hashes without any sort of coordination or dependence between tasks. The trick here is to divide up the space of candidate passwords over the available threads so they don’t duplicate any effort.

Extra Credit: Password Cracking Competition

I will evaluate all of your lab submissions on a database of 100 hashed passwords. The three fastest implementations will receive 10% extra credit on this lab. There are a few rules and implementation details that may help you with your implementation:

Your implementation may not use any pre-computed values. The one exception is if you choose to use an alternative MD5 implementation that has a table of a few dozen constants. Alternative MD5 implementations are unlikely to help anyway, since the OpenSSL MD5 implementation is very fast.
I will discard any modifications you make to common.mk and build with the command clang -g -O2 --std=c11. The default compilation flags omit the -O2 flag, which makes it a little easier to debug your code.
You may only use the POSIX threads (pthreads) API for parallel computation; other parallel libraries like MPI, OpenCL, Cilk, and CUDA are not allowed.
I will run the evaluation on one of our lab machines, which have four processors. Matching the number of threads to the number of processors is usually a good idea, although you may want to experiment with a range of parameters.
I will measure time from the start of the program to when it exits. The program must crack all 100 passwords before exiting, but you probably do not want to keep trying candidate passwords after cracking the 100th password.