Last Updated: 2016-04-29 Fri 10:54

CS 499 Homework 4: Shared Memory Password Cracker

CHANGELOG:

Fri Apr 29 10:46:07 EDT 2016
For those seeking the Java extra credit, a file java.zip at the top of the spec now links a couple support classes to handle the encryption. The central support element is the MD5Crypt.cryptNoSalt(String pass) function which will encrypt a plaintext password to a format similar to the C encryption functions. Unfortunately this algorithm is slightly different so the encrypted digests produced are not identical. The java.zip file contains encrypted password files according to this algorithm which should be used for cracking in the java setting. The support class EncryptAll.java was used to produce this and provides a demonstration of the cryptNoSalt() function.

For either extra credit portion, include a description of your design and timing numbers comparing its performance to the other versions.

1 Overview

The HW will explore parallelizing a simple password cracking tool for a shared-memory architecture. A serial program is provided and the primary task is to derive a parallel version of it using OpenMP and PThreads so that it can exploit multiple processors on a shared memory architecture.

The HW distribution contains a number of C source and data files for the project. Here is a brief synopsis of the contents.

File                        State     Description
passcrack.c                 Provided  main() method to crack passwords
crack_funcs.c               Provided  Work horse functions for cracking, see try_crack() function
dict.c                      Provided  Utilities for reading files of words/passwords
dict_test.c                 Provided  Demonstrate reading files of words
encrypt_all.c               Provided  Utility to encrypt all words in a file
md5crypt_r.c                Provided  Utility function for encrypting a string
md5_demo.c                  Provided  Demonstration of how md5crypt_r works
time-crack.sh               Provided  Script to time and test the serial/parallel versions
Makefile                    Edit      Makefile to build programs; edit to enable parallel programs to be built
parallel_funcs.c            Edit      Utility functions for parallel programs
omp_passcrack.c             Edit      main() to perform parallel password cracking with OpenMP
pthread_passcrack.c         Edit      main() to perform parallel password cracking with PThreads
dict-files/*                Data      Directory containing dictionaries of words
english-10.txt              Data      10 English words
english-100.txt             Data      100 random English words
english-1K.txt              Data      1000 popular English words

pass-files/*                Data      Directory of plaintext and encrypted passwords

SMALL TESTS
passwords-5-2-from-10.txt   Data      5 plaintext passwords of 2 words drawn from english-10.txt
encrypted-5-2-from-10.txt   Data      Encrypted version of above file

passwords-5-1-from-1K.txt   Data      5 plaintext passwords of 1 word drawn from english-1K.txt
encrypted-5-1-from-1K.txt   Data      Encrypted version of above file

MEDIUM TESTS
passwords-5-2-from-100.txt  Data      5 plaintext passwords of 2 words drawn from english-100.txt
encrypted-5-2-from-100.txt  Data      Encrypted version of above file

1.1 passcrack Program

The program passcrack accepts a file of encrypted passwords along with 1 or more dictionary files. It then determines which plaintext string produces each encrypted password in the file, drawing words from the given dictionaries. The following session shows some single plaintext words as passwords and their encrypted versions. passcrack is then invoked with the dictionary of words english-1K.txt to break the passwords.

# plain passwords and encrypted version
> cat pass-files/passwords-5-1-from-1K.txt 
hello
world
zoo
saxophone
usually

> cat pass-files/encrypted-5-1-from-1K.txt 
$1$$/PWPe740uvaQxKzRH.Zxj1
$1$$i4oRQQGhVoiIqPCYC1mwb.
$1$$s99bhXTZygqY3YOYkFk5H/
$1$$/oyQtgb1/x4d5cyHoNKKm0
$1$$s/xx.xxmYyTqTpakj2bH//

# usage information for passcrack
> passcrack
usage: passcrack <encrypted_file> <dict1> [dict2] ...
  <encrypted_file> : encrypted password file, one per line
  <dict1>          : dictionary to try for passwords for word 1
  [dict2]          : additional dictionary to try for word 2
                   : further dictionaries may be specified for more words

# search for passwords in the dictionary of 1000 words
> passcrack pass-files/encrypted-5-1-from-1K.txt dict-files/english-1K.txt 
found 5 passwords to crack
dict 0: 1000 words
  0: SUCCES: $1$$/PWPe740uvaQxKzRH.Zxj1 <-- hello
  1: SUCCES: $1$$i4oRQQGhVoiIqPCYC1mwb. <-- world
  2: SUCCES: $1$$s99bhXTZygqY3YOYkFk5H/ <-- zoo
  3: SUCCES: $1$$/oyQtgb1/x4d5cyHoNKKm0 <-- saxophone
  4: SUCCES: $1$$s/xx.xxmYyTqTpakj2bH// <-- usually
5 / 5 passwords cracked

Multi-word passwords are addressed by passing multiple dictionaries into passcrack. The pattern of dictionaries dictates how many words will be combined and where they will be drawn from to generate trials. In the example below, a small dictionary of 10 words is tried with only single words, then with all pairs of words.

# plain passwords and encrypted version
> cat pass-files/passwords-5-2-from-10.txt
oneone
onetwo
fourtwo
tenten
nineseven
> cat pass-files/encrypted-5-2-from-10.txt
$1$$anQyl9ToYUP8DALuPL5dt1
$1$$HZ43DKz1thMVVU3EXVDM21
$1$$MW4kdPfNxDMND0tt4YAw10
$1$$WTzKuZkjxQxftQKHpZqzg1
$1$$pCoMF5tvMloTP.e2qaxUB1

# dictionary of 10 words
> cat dict-files/english-10.txt 
one
two
three
four
five
six
seven
eight
nine
ten

# passwords are not single words
> passcrack pass-files/encrypted-5-2-from-10.txt dict-files/english-10.txt
found 5 passwords to crack
dict 0: 10 words
  0: FAILED: $1$$anQyl9ToYUP8DALuPL5dt1 <-- ???
  1: FAILED: $1$$HZ43DKz1thMVVU3EXVDM21 <-- ???
  2: FAILED: $1$$MW4kdPfNxDMND0tt4YAw10 <-- ???
  3: FAILED: $1$$WTzKuZkjxQxftQKHpZqzg1 <-- ???
  4: FAILED: $1$$pCoMF5tvMloTP.e2qaxUB1 <-- ???
0 / 5 passwords cracked

# use a second dictionary to try all pairs of words
> passcrack pass-files/encrypted-5-2-from-10.txt dict-files/english-10.txt dict-files/english-10.txt
found 5 passwords to crack
dict 0: 10 words
dict 1: 10 words
  0: SUCCES: $1$$anQyl9ToYUP8DALuPL5dt1 <-- oneone
  1: SUCCES: $1$$HZ43DKz1thMVVU3EXVDM21 <-- onetwo
  2: SUCCES: $1$$MW4kdPfNxDMND0tt4YAw10 <-- fourtwo
  3: SUCCES: $1$$WTzKuZkjxQxftQKHpZqzg1 <-- tenten
  4: SUCCES: $1$$pCoMF5tvMloTP.e2qaxUB1 <-- nineseven
5 / 5 passwords cracked

1.2 Other Utilities

Several other programs are available in the distribution mainly to demonstrate functionality of some parts. These are useful to build your understanding but should require no modification. Briefly they are

  • dict_demo : Illustrates a few functions associated with dictionaries of words in the dict_t struct. It reads a dictionary file and retrieves a word from it.
  • md5_demo : Demonstrates how the low-level encryption function in md5crypt_r.c is used to encrypt passwords. The program encrypts a word passed on the command line.
  • encrypt_all : Loads all lines in a file from the command line and prints their encrypted version to standard output.

2 Architecture of the Cracker

2.1 Enumerating Possible Passwords

The main() function in passcrack.c is primarily responsible for loading the encrypted password file and dictionaries. It then iterates through the passwords, attempting to crack each one. The cracking itself is the primary responsibility of the try_crack() function located in crack_funcs.c. Its prototype and documentation are below.

// Work horse function to try all dictionary possibilities against a
// given encrypted password.  The array of dicts are used in turn to
// generate all combinations of words from each dictionary.  The
// method recursively descends increasing the dictionary position at
// each step down. Each level of recursion runs through its own loop
// to check all possibilities in a single dictionary.  Each layer
// appends a word onto the provided buf[] array before descending
// another layer.  When the bottom layer of recursion is reached,
// buf[] contains a complete plaintext password which is checked
// against the target encrypted password.
//
// Returns 1 on locating a plaintext password that matches which is
// left in the buf parameter (null terminated).
//
// Returns 0 if unable to locate a matching plaintext password.
int try_crack(char *target,
              dict_t **dicts, int dicts_len, int dict_pos,
              char *buf, int buflen, int bufpos)

try_crack() contains a loop and a recursive call. This is a typical approach for generating all possible combinations of some collection of objects, in this case words from dictionaries.

  • The recursion goes one layer deeper than the number of dictionaries used / words presumed in the password. For example, with 2 dictionaries / 2 word passwords, the recursion will go 3 layers deep:
    • First word added from the first dictionary in the first layer of recursion
    • Second word added from the second dictionary in the second layer of recursion
    • Third layer of recursion reaches the base case (no more dictionaries) and checks the candidate password by encrypting it and comparing it to the encrypted target password.
  • At each layer of recursion, a dictionary in the array of dictionaries is scanned through in a loop. A word is drawn from the dictionary and appended to the growing password. The recursive call is then made. If the recursion returns failure, the next word in the dictionary is written over the previous selection before recursing once again.
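
The loop-plus-recursion pattern described above can be sketched on its own, independent of the project's dict_t type. This is an illustration under assumptions, not the code in crack_funcs.c: the names wordlist_t and enumerate() are hypothetical, plain string arrays stand in for dictionaries, and the base case compares against a plaintext target with strcmp() instead of encrypting the candidate.

```c
#include <string.h>

// Hypothetical stand-in for a dictionary: a fixed array of words.
typedef struct {
  const char **words;
  int len;
} wordlist_t;

// Try every combination of one word per list, starting at list `pos`.
// buf[0..bufpos) already holds the words chosen by shallower layers.
// Returns 1 and leaves the null-terminated match in buf if some
// candidate equals target; returns 0 otherwise.
int enumerate(const char *target, const wordlist_t *lists, int nlists,
              int pos, char *buf, int buflen, int bufpos) {
  if (pos == nlists) {                 // base case: no more lists
    buf[bufpos] = '\0';
    return strcmp(buf, target) == 0;
  }
  for (int i = 0; i < lists[pos].len; i++) {
    const char *w = lists[pos].words[i];
    int wlen = (int)strlen(w);
    if (bufpos + wlen >= buflen) {     // skip words that would overflow buf
      continue;
    }
    memcpy(buf + bufpos, w, wlen);     // overwrite the previous selection
    if (enumerate(target, lists, nlists, pos + 1, buf, buflen, bufpos + wlen)) {
      return 1;
    }
  }
  return 0;
}
```

With two lists of 10 words each, the base case is reached at most 10 x 10 = 100 times, which is why the number of candidates grows quickly as dictionaries are added.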

Parallelizing the generation of candidate passwords is the primary purpose of this project.

2.2 Encryption Scheme

The MD5 Cryptographic Hash is frequently employed to generate a fixed-size hash or digest of a message. It is also often used to store hashes of passwords on Unix systems. When a user logs in, their typed plaintext password is run through the hash algorithm to produce a hash which is compared against a stored hash in a password file on the system. This avoids the need to store the plaintext passwords on the system, so theft of the password file does not spell immediate disaster.

The check_password() function in crack_funcs.c performs a check to determine whether a plaintext candidate password matches an encrypted target.

// Check whether a plaintext password hashes to the target encrypted
// password by encrypting it and comparing strings. Uses the
// md5crypt_r function. Returns 1 if the passwords match and 0
// otherwise.
int check_password(char *target, char *plain)

It utilizes the md5crypt_r() function to perform the encryption operation. This low-level function is ported from the OpenSSL project and adapted to make it easier to use in parallel programs. The code is quite technical and takes some time to execute but it should not require modification on your part.

int md5crypt_r(const char *passwd, const char *magic, const char *salt, char *out_buf)

Key features of this function are

  • passwd is the string to encrypt
  • magic is always the string "1" and salt is always the empty string. This is specific to our application and not generally true.
  • out_buf is a character buffer which has at least MD5CRYPT_SIZE characters in it and will be filled with the encrypted password.

3 Problem 1: omp_passcrack: OpenMP Parallelization

3.1 Overview

Use the directives available in OpenMP to create a parallel version of passcrack called omp_passcrack. There are several places where parallelism could be exploited.

Focus your efforts on parallelizing single password cracking rather than using different threads to crack different passwords.

To that end, you will want to spend time understanding the try_crack() function. However do not modify try_crack(). Instead, modify the following:

  • parallel_funcs.c : Add new functions you need for your OpenMP (and PThreads) version of the programs in this file. It is provided but is empty. This will allow you to make calls from your parallel functions to the original functions like try_crack() as they will be unmodified.
  • omp_passcrack.c : Adjust the main() method here to initiate parallel execution. This allows the serial passcrack.c to remain intact to compare its performance.
  • Makefile : Uncomment lines in here to compile the parallel_funcs.c and omp_passcrack.c files.

3.2 Number of Threads to Use

Allow the use of an environment variable to specify the number of threads to use for the parallel version. Setting this environment variable should alter the number of threads used when the program starts. The following environment variable and syntax to set it should be used for standard shells.

> export PASSCRACK_NUMTHREADS=4
> echo $PASSCRACK_NUMTHREADS
4
> export PASSCRACK_NUMTHREADS=2
> echo $PASSCRACK_NUMTHREADS
2

To retrieve an environment variable in a C program, use standard C code such as the following.

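#include <stdlib.h>

int main(int argc, char *argv[]){
  // Check environment variable for number of threads
  int nthreads = 4;                    // default if the variable is unset
  char *nthreads_str = getenv("PASSCRACK_NUMTHREADS");
  if(nthreads_str != NULL){
    nthreads = atoi(nthreads_str);     // atoi() yields 0 for non-numeric text
  }
  if(nthreads < 1){                    // fall back to the default on bad values
    nthreads = 4;
  }
  // ... remainder of main() ...
}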

Make an appropriate OpenMP call to set the number of threads that will be used for the remainder of the program.
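
One possible shape for this step is sketched below; it is not a required solution. omp_set_num_threads() is the standard OpenMP runtime routine for setting the thread count of later parallel regions. The helper names parse_thread_count() and configure_threads(), the default of 4, and the upper limit of 1024 are assumptions made for illustration. The _OPENMP guard lets the same file compile without OpenMP enabled.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: convert text to a positive thread count,
// returning `fallback` if s is NULL, not a whole number, or outside
// 1..1024 (the upper limit is an arbitrary sanity bound).
int parse_thread_count(const char *s, int fallback) {
  if (s == NULL) {
    return fallback;
  }
  char *end;
  long n = strtol(s, &end, 10);
  if (end == s || *end != '\0' || n < 1 || n > 1024) {
    return fallback;
  }
  return (int)n;
}

// Read PASSCRACK_NUMTHREADS and apply it to subsequent parallel
// regions. Returns the thread count that was requested.
int configure_threads(void) {
  int nthreads = parse_thread_count(getenv("PASSCRACK_NUMTHREADS"), 4);
#ifdef _OPENMP
  omp_set_num_threads(nthreads);
#endif
  return nthreads;
}
```

Calling configure_threads() once near the start of main(), before any parallel region, is enough for the setting to apply to the rest of the program.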

3.3 General Parallelization Strategy

The easiest way to parallelize using OpenMP is to establish a parallel region and use a directive to run a loop in parallel. Such a loop exists in try_crack() but you should not modify it directly as it will affect the serial version. Further, try_crack() is recursive which might cause problems if repeated layers of recursion attempted to spin up multiple loops.

Instead, write a wrapper function in parallel_funcs.c which takes care of the first "layer" of recursion that would be handled by try_crack(). Most of the code in this function is similar to try_crack() but has a parallel loop which causes threads to work on different iterations of the loop.
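
A minimal sketch of such a wrapper follows. It does not use the project's types: wordlist_t, serial_search(), par_search() and BUFLEN are hypothetical names, serial_search() merely stands in for try_crack(), and candidates are compared with strcmp() rather than encrypted. Each loop iteration builds its candidate in a local buffer so threads never share scratch space.

```c
#include <string.h>

#define BUFLEN 128   // assumed upper bound on candidate length

// Hypothetical stand-in for a dictionary.
typedef struct { const char **words; int len; } wordlist_t;

// Serial search over lists[pos..nlists), standing in for try_crack().
static int serial_search(const char *target, const wordlist_t *lists,
                         int nlists, int pos, char *buf, int bufpos) {
  if (pos == nlists) {
    buf[bufpos] = '\0';
    return strcmp(buf, target) == 0;
  }
  for (int i = 0; i < lists[pos].len; i++) {
    int wlen = (int)strlen(lists[pos].words[i]);
    if (bufpos + wlen >= BUFLEN) continue;
    memcpy(buf + bufpos, lists[pos].words[i], wlen);
    if (serial_search(target, lists, nlists, pos + 1, buf, bufpos + wlen)) {
      return 1;
    }
  }
  return 0;
}

// Wrapper for the first layer: iterations over lists[0] are divided
// among threads; each iteration runs the serial search on the rest.
// Requires nlists >= 1 and out to hold BUFLEN bytes.
// Returns 1 with the match copied to out, or 0 if nothing matched.
int par_search(const char *target, const wordlist_t *lists, int nlists,
               char *out) {
  int found = 0;
  #pragma omp parallel for shared(found)
  for (int i = 0; i < lists[0].len; i++) {
    char buf[BUFLEN];                  // private scratch for this iteration
    int wlen = (int)strlen(lists[0].words[i]);
    if (wlen >= BUFLEN) continue;
    memcpy(buf, lists[0].words[i], wlen);
    if (serial_search(target, lists, nlists, 1, buf, wlen)) {
      #pragma omp critical
      {
        strcpy(out, buf);              // publish the result safely
        found = 1;
      }
    }
  }
  return found;
}
```

This version keeps running the remaining iterations after a match; it only illustrates how the first layer can be split. Compiled without OpenMP support the pragmas are ignored and the loop runs serially with the same result.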

3.4 Obstacles with Parallelism

Keep in mind that the serial version of try_crack() uses a single character array to repeatedly write candidate passwords to check them. If multiple threads attempt to write to this area, errors are likely to result. Instead, find a way to either control this area of memory or give each thread its own array in which to write. Also keep in mind that if a thread finds the correct password, it should copy that password to a memory location from which the main thread can eventually print it.

Password cracking is a search problem, which means that once a thread finds an answer (the correct password), there is no need for more work and other threads can shut down. A typical serial C application would break from a search loop on finding an answer.

Unfortunately, OpenMP parallel loops may not contain break statements: this restriction comes directly from the standard. Instead, threads should check at the top of each loop iteration whether an answer has been found by inspecting some shared variable. If an answer is found, they can essentially skip through the iteration doing no work. This is possible with a continue or by enclosing the loop iteration in a conditional. Without such measures, threads will do more work than is necessary, resulting in a delay before work on the next password to crack is started.

Further complicating this, recursive calls to try_crack() or its OMP adaptation should check whether a solution has been found and terminate on seeing this. This will likely involve duplicating some code from the try_crack() method to avoid changing the original.
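
The idea of checking a shared flag at the top of each iteration can be sketched on a generic search, shown here with integers instead of passwords. The names par_find() and is_answer() are hypothetical, and is_answer() stands in for the expensive encrypt-and-compare step. The atomic read and write constructs used here assume an OpenMP 3.1 or later compiler.

```c
// Hypothetical expensive test; stands in for encrypting and comparing.
static int is_answer(int candidate, int secret) {
  return candidate == secret;
}

// Search 0..n-1 for `secret`, dividing iterations among threads.
// Returns the matching value or -1 if none matches. *checked receives
// the number of candidates that were actually tested.
int par_find(int n, int secret, int *checked) {
  int found = 0;      // shared flag: set once any thread succeeds
  int answer = -1;    // shared result
  int tested = 0;     // per-thread counts are summed by the reduction
  #pragma omp parallel for shared(found, answer) reduction(+:tested)
  for (int i = 0; i < n; i++) {
    int done;
    #pragma omp atomic read
    done = found;
    if (done) {
      continue;       // answer already known: skip this iteration's work
    }
    tested++;
    if (is_answer(i, secret)) {
      #pragma omp critical
      {
        answer = i;
      }
      #pragma omp atomic write
      found = 1;
    }
  }
  *checked = tested;
  return answer;
}
```

The loop still visits every index, but iterations after the match do almost nothing. A recursive helper can use the same check at the top of its own loop so that deep searches also stop promptly.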

3.5 Grading Criteria for OpenMP Code

Points Criteria
10 Code compiles using the provided Makefile and runs for sample inputs
5 NO CHANGES are made to the existing codebase such as crack_funcs.c and passcrack.c
5 Parallel functions are in the parallel_funcs.c and omp_passcrack.c files to implement the OpenMP program
5 The number of threads used can be controlled using environment variable PASSCRACK_NUMTHREADS
5 The parallel code is reasonably well documented and cleanly written
5 It is clear that a strategy was employed for threads to exit quickly once a solution is found
5 Some speedup is achieved over the serial version of the code as indicated by timing tests on zeus

4 Problem 2: pthread_passcrack: PThread Parallelization

4.1 Overview

Similar to the first problem, implement a parallel version of passcrack using Posix Threads called pthread_passcrack. As before, focus your efforts on parallelizing single password cracking rather than using different threads to crack different passwords.

As with the OpenMP version, use a number of threads that is obtained from the environment variable PASSCRACK_NUMTHREADS.

Do not alter any existing code. Instead put new functions you require for your PThreads version in parallel_funcs.c and use pthread_passcrack.c to implement a main function which will initiate parallel execution.

As with OpenMP, the general strategy is to use the first layer of try_crack() to start multiple threads running on different loop iterations. Other than that, there are some unique difficulties associated with PThreads that you will need to surmount.

4.2 Obstacles with Parallelism

Similar obstacles to parallelism occur with the PThreads version as did for the OpenMP version.

  1. Threads will need to coordinate on the character buffer that is used to build up candidate passwords. The ideal solution is for each thread to use its own character buffer and write a found password to a globally accessible area.
  2. Threads should quit if another thread finds a solution to avoid doing more work than is necessary. This will involve checking a shared memory area for a sign that a solution has been found and terminating if it occurs.

Some additional issues with PThreads are also present.

  • It is likely that worker threads will need to receive quite a few parameters (array of dictionaries, target password, thread number, etc.). For this it is suggested that you employ a parameter structure of some kind whose fields are filled with data for each worker thread. Keep in mind that pthread_create() allows for a single pointer argument so you will likely need an array of such parameter structs with data specialized to each thread.
  • Splitting loop iterations between threads will take more manual effort than OpenMP. Draw from your experience in class to use iteration variable tricks involving the thread ids and number of threads to make this as simple as possible.
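
The two points above, a per-thread parameter struct and a manual split of loop iterations by thread id, can be sketched as follows. Everything named here (worker_arg_t, worker(), pthread_find(), MAX_THREADS) is hypothetical, and a plain strcmp() against a plaintext target stands in for the real password check.

```c
#include <pthread.h>
#include <string.h>

#define MAX_THREADS 16   // assumed upper bound for this sketch

// Hypothetical per-thread parameter block.
typedef struct {
  int tid;                 // this thread's index, 0..nthreads-1
  int nthreads;            // total number of workers
  const char **words;      // shared, read-only word list
  int nwords;
  const char *target;      // plaintext stand-in for the encrypted target
  int *found_idx;          // shared result slot, guarded by *lock
  pthread_mutex_t *lock;
} worker_arg_t;

// Thread t examines words t, t+nthreads, t+2*nthreads, ...
static void *worker(void *p) {
  worker_arg_t *a = p;
  for (int i = a->tid; i < a->nwords; i += a->nthreads) {
    pthread_mutex_lock(a->lock);
    int done = (*a->found_idx >= 0);
    pthread_mutex_unlock(a->lock);
    if (done) break;                        // another thread succeeded
    if (strcmp(a->words[i], a->target) == 0) {
      pthread_mutex_lock(a->lock);
      *a->found_idx = i;
      pthread_mutex_unlock(a->lock);
      break;
    }
  }
  return NULL;
}

// Launch the workers, one struct each, and return the index of a
// matching word or -1 if there is none.
int pthread_find(const char **words, int nwords, const char *target,
                 int nthreads) {
  pthread_t threads[MAX_THREADS];
  worker_arg_t args[MAX_THREADS];
  int started[MAX_THREADS];
  pthread_mutex_t lock;
  int found_idx = -1;
  if (nthreads < 1) nthreads = 1;
  if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
  pthread_mutex_init(&lock, NULL);
  for (int t = 0; t < nthreads; t++) {
    args[t] = (worker_arg_t){ .tid = t, .nthreads = nthreads,
                              .words = words, .nwords = nwords,
                              .target = target,
                              .found_idx = &found_idx, .lock = &lock };
    started[t] = (pthread_create(&threads[t], NULL, worker, &args[t]) == 0);
    if (!started[t]) worker(&args[t]);      // fall back to the calling thread
  }
  for (int t = 0; t < nthreads; t++) {
    if (started[t]) pthread_join(threads[t], NULL);
  }
  pthread_mutex_destroy(&lock);
  return found_idx;
}
```

The args array must stay alive until every thread has been joined, since each worker holds a pointer into it. The stride of nthreads gives each thread an interleaved share of the iterations without computing explicit block boundaries.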

4.3 Grading Criteria for PThreads Code

Points Criteria
15 Code compiles using the provided Makefile and runs for sample inputs
5 Parallel functions are in the parallel_funcs.c and pthread_passcrack.c files to implement the PThreads program
5 The number of threads used can be controlled using environment variable PASSCRACK_NUMTHREADS
5 The parallel code is reasonably well documented and cleanly written
5 It is clear that a strategy was employed for threads to exit quickly once a solution is found
5 Some speedup is achieved over the serial version of the code as indicated by timing tests on zeus

5 Problem 3: Writeup and Submission

5.1 Writeup

As usual, include a document describing your program efforts. This should contain sections describing how you went about parallelizing passcrack for both parallel platforms.

You should also include an analysis of timings for the problem. Use the provided time-crack.sh script for this which generates timing output. This script also checks that the results of the parallel omp_passcrack and pthread_passcrack are identical to the serial passcrack executable. The script produces output like the following when the parallel programs are complete.

zeus-0> time-crack.sh
make: Nothing to be done for `programs'.
Running with data files pass-files/encrypted-5-2-from-100.txt dict-files/english-100.txt dict-files/english-100.txt
Serial code
real 10.97
user 10.71
sys 0.01

OMP nthreads 2
real 5.70
user 10.77
sys 0.06

OMP nthreads 4
real 5.37
user 17.55
sys 0.14

OMP nthreads 6
real 6.11
user 22.53
sys 0.05

OMP nthreads 8
real 5.38
user 19.18
sys 0.03

PTHREADS nthreads 2
real 5.50
user 10.35
sys 0.11

PTHREADS nthreads 4
real 8.32
user 28.39
sys 0.06

PTHREADS nthreads 6
real 7.80
user 28.81
sys 0.07

PTHREADS nthreads 8
real 7.91
user 28.76
sys 0.15

Several data files are available. Make sure to run at least what is shown above (the default).
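
One common way to summarize timings like these in the discussion is speedup (serial wall-clock time divided by parallel wall-clock time) and efficiency (speedup divided by thread count). The small helpers below are a hypothetical convenience, not part of the distribution; they use the "real" times reported by the script.

```c
// Speedup of a parallel run relative to the serial run, based on
// wall-clock ("real") seconds.
double speedup(double serial_real, double parallel_real) {
  return serial_real / parallel_real;
}

// Parallel efficiency: speedup divided by the number of threads used.
// A value of 1.0 would mean perfect scaling.
double efficiency(double serial_real, double parallel_real, int nthreads) {
  return speedup(serial_real, parallel_real) / nthreads;
}
```

For the sample output above, the OMP run with 2 threads gives 10.97 / 5.70, a speedup of about 1.92 and an efficiency of about 0.96. With 4 threads, 10.97 / 5.37 gives a speedup of about 2.04 and an efficiency of about 0.51.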

5.2 (20 /20) Grading Criteria for Writeup

Points Criteria
5 Description of design of the OpenMP parallelization
5 Description of design of the PThreads parallelization
10 Timing results and discussion of how effective the parallelization was

6 Extra Credit

6.1 Java Threads (20%)

We have discussed Java Threads to enable parallelism. For a 20% bonus on this HW score, you may attempt to port the password cracker to Java and use Java Threads to produce parallelism. For this, you will need to locate an implementation of MD5 hashing which is compatible with our project as porting it directly will be terrible.

Include a description of your design of the Java Threaded implementation and timing numbers comparing its performance to the other versions.

6.2 Cilk Implementation (10%)

For a 10% bonus on this HW, you can attempt to implement a Cilk version called cilk_passcrack. Do so by writing additional functions in parallel_funcs.c, creating another main routine, and adjusting the Makefile to compile these with the correct options.

Include a description of your design of the Cilk implementation and timing numbers comparing its performance to the other versions.


Author: Chris Kauffman (kauffman@cs.gmu.edu)
Date: 2016-04-29 Fri 10:54