Lab: Password Cracker

Assigned

May 4, 2021

Due

May 11, 2021 by 11:59pm

Overview

In this lab, you will implement a password cracker. A password cracker is a program that recovers users’ passwords from a database of hashed passwords. In operating systems and web applications it is standard practice to not store users’ passwords in the database of user accounts. Instead, the application typically runs each user’s password through a hash function and stores the result. A hash function must be consistent—every time I run my password through a given hash function I should get the same output—but these functions are meant to work only one way. I can easily convert passwords to hashed passwords, but it is very difficult to go from hashed passwords back to passwords. However, sometimes it is possible to recover passwords from hashes; a password cracker performs this feat by searching over the entire space of possible passwords, hashing each one and comparing it to the list of known password hashes. Any time there is a match we now know the users’ original password.

Searching over password hashes is quite difficult for secure hash functions, but luckily there are some insecure hash functions that are still widely used. One of these hash functions is MD5, which is no longer recommended for cryptographic uses. Your program will receive a list of usernames and MD5-hashed passwords. It should then search over all possible passwords, hashing each one, until it finds a match for each user’s password. The space of possible passwords is somewhat constrained (see the lab details below), which makes this search process feasible. Still, there are many candidate passwords to try. To complete the search in a reasonable amount of time, we will use POSIX threads to perform the search in parallel.

Starter Code

To obtain a copy of the starter code, one member of your lab group should perform the following steps:

Open https://gpu.cs.grinnell.edu and log in with your Grinnell account. You will need to approve your login with Duo or enter in a one-time-password if you do not use Duo.
Start an XFCE session. If you see a blank page or some other error for the session, close the tab and then click on your session again to re-open it. This fixes most issues, but please ask for help if you run into persistent issues.
Open a terminal window in the XFCE Session.

Use the git command to check out a copy of the starter code for the lab:

$ git clone /home/curtsinger/csc213/labs/password-cracker ~/csc213/labs/

And now you can use the code command to open the starter code with Visual Studio Code.
```
$ code ~/csc213/labs/password-cracker
```
A Visual Studio Code window should appear with the password-cracker directory open in the file browser. You may see a welcome message, which you can close. You can also close any prompts to upgrade to a new version of VSCode.
Open a terminal inside of VSCode using the Terminal menu. By default, terminals appear on the bottom of the window. I find it more conveninent to move it to the right side; just right-click somewhere near the top of the panel that appears and choose Move Panel Right.
Now you can run make in the terminal to build the starter code, or just type ctrl+shift+b to run the default build task (which just runs make).

Now you can use the share button at the top bar of your remote session to share with your lab partner. If you have any trouble logging in, sharing your session, or with any other part of the startup process please ask for help.

Part A: Cracking a Single Password

The starter code includes a simple program that runs in two modes: one mode cracks a single hashed password, while the other reads a file containing a list of usernames and hashed passwords, and then cracks all of them. You should start by implementing the mode to crack a single password.

The first step in running the password cracker in this mode is to generate a hashed password. We can do this using some standard command line tools:

              $ echo -n "psswrd" | md5sum
63bddf0cbc21d36c8c19808e22784df2

Now that we have a hash of our bad six-character password, we can send it to the password cracker.

              $ ./password-cracker single 63bddf0cbc21d36c8c19808e22784df2
psswrd

That output is what you should see when you run the unmodified starter code, but it only works for the candidate password “psswrd”. You will need to complete the implementation to crack all potential passwords, but first we need to understand the rules for possible passwords.

Password Rules

To make our password cracking relatively efficient, we will constrain passwords slightly more than what you’d expect to find for a real password. Our password cracker will only work for passwords that adhere to the following rules:

Passwords are exactly six characters
All characters of a password will be lowercase alphabetic characters (a–z)

Cracking a Password

To actually crack a password, your program will need to perform the following steps:

Generate a candidate password
Compute the MD5 hash of this candidate password
Compare the hash of our candidate password to the provided hash value
If the two hashes are equal, output the candidate password and exit. Otherwise go to step 1.

The starter code includes an example call to the MD5 function that you will find helpful in your implementation.

I strongly recommend that you develop an encoding scheme to walk through all possible passwords in a systematic fashion. Choosing candidate passwords at random could work for short passwords, but six characters each selected from 26 possibilities means there are over 300 million possible passwords to try; if you choose passwords at random you will almost certainly repeat the same candidate password, which cannot possibly help you find the real password.

Part B: Cracking a List of Passwords

You will also need to complete the implementation of a list mode for the password cracker. Cracking passwords from a list is more efficient than cracking each password individually because you can hash a candidate password and compare the hash to every user’s hashed password.

The starter code include the logic to read in a list of users and hashed passwords. For each user and password, the starter code calls add_password to store the username and hashed password in a data structure. Once all passwords have been read, the starter code calls crack_password_list. You will need to design a password_set_t data structure to hold usernames and hashed passwords, implement init_password_set and add_password to interact with that data structure, and then implement crack_password_list, which should crack the passwords stored in a given list.

You may find it useful to reuse some elements of your code from part A, but you should not check each candidate password individually; that undoes the advantage of cracking passwords from a list. Make sure you do not break or modify the single password mode of your password cracker. Both modes must work in your final submission.

Here is an example run of the password cracker in list mode once you have completed your implementation:

              $ ./password-cracker list inputs/input1.txt
jordan passwd
taylor secret
Cracked 2 of 2 passwords.

            

The user/password lines could appear in any order, depending on how you walk through candidate passwords.

This is assuming passwords.txt contains the following values:

              jordan 76a2173be6393254e72ffa4d6df1030a
taylor 5ebe2294ecd0e0f08eab7690d2a6ee69

You should test your implementation on both input files in the inputs directory.

Part C: Cracking Passwords in Parallel

Now that your password cracker can run on a list of users and passwords, you will need to make it faster by dividing the work up across four threads. To do this, you will use POSIX threads to check candidate passwords in parallel. This is an instance of a type of problem often referred to as embarrassingly parallel. You can have each thread generate candidate passwords and check their hashes against the database of password hashes without any sort of coordination or dependence between tasks. The trick here is to divide up the space of candidate passwords over the available threads so they don’t duplicate any effort.

You will need to decide how to divide the work and pass that work on to threads. If your threads make any changes to the data structure as you crack passwords (e.g. by removing an entry once it has been cracked) you will need to include appropriate synchronization to prevent race conditions.

As with part B, your implementation may produce output in a different order, and that order may now change between runs; that is expected. Just make sure your implementation cracks every password and prints the cracked password with the correct username.

You do not need to preserve your part B implementation; just submit the version of your code with a working parallel password cracker.

Extra Credit: Password Cracking Competition

When you submit your password cracker to Gradescope it will run against a database of 1000 random passwords. These passwords will adhere to the password rules listed above, but will change on each run. The two fastest implementations will receive a 10% extra credit bonus on this lab. You will be able to see where your implementation ranks once you submit it using Gradescope’s leaderboard page.

There are a few rules and implementation details that may be helpful:

Your implementation may not use any pre-computed values. The one exception is if you choose to use an alternative MD5 implementation that has a table of a few dozen constants. Alternative MD5 implementations are unlikely to help anyway, since the OpenSSL MD5 implementation is very fast.
You can only change password-cracker.c; the evaluation will use the default Makefile and compilation options.
You may only use the POSIX threads (pthreads) API for parallel computation; other parallel libraries like MPI, OpenCL, Cilk, and CUDA are not allowed.
The evaluation will run on Gradescope’s virtual machine, which has four processors. Your implementation must use exactly four threads to perform the password cracking (plus one main thread).
I will measure time from the start of the program to when it exits. The program must crack all the passwords before exiting, but you probably do not want to keep trying candidate passwords after cracking the last password.