CS 499 Homework 4: Shared Memory Password Cracker
- Due: Tuesday 5/3/2016 by 11:59 pm
- Approximately 10% of total grade
- Submit to Blackboard a zip of your code along with a Word/PDF report
- You may work in groups of 2 and submit one assignment per group.
- Code Distribution: distrib-hw4.zip
- Java Support Functions: java.zip
CHANGELOG:
- Fri Apr 29 10:46:07 EDT 2016
- For those seeking the Java extra credit, a file java.zip linked at the top of the spec now provides a couple of support classes to handle the encryption. The central support element is the MD5Crypt.cryptNoSalt(String pass) function, which encrypts a plaintext password to a format similar to the C encryption functions. Unfortunately this algorithm is slightly different, so the encrypted digests produced are not identical. The java.zip file contains encrypted password files produced by this algorithm, which should be used for cracking in the Java setting. The support class EncryptAll.java was used to produce them and demonstrates the cryptNoSalt() function. For either extra credit portion, include a description of your design and timing numbers comparing its performance to the other versions.
1 Overview
The HW will explore parallelizing a simple password cracking tool for a shared-memory architecture. A serial program is provided, and the primary task is to derive parallel versions of it using OpenMP and PThreads so that they can exploit multiple processors on a shared memory architecture.
The HW distribution contains a number of C source and data files for the project. Here is a brief synopsis of the contents.
File | State | Description |
---|---|---|
passcrack.c | Provided | main() method to crack passwords |
crack_funcs.c | Provided | Work horse functions for cracking, see try_crack() function |
dict.c | Provided | Utilities for reading files of words/passwords |
dict_test.c | Provided | Demonstrate reading files of words |
encrypt_all.c | Provided | Utility to encrypt all words in a file |
md5crypt_r.c | Provided | Utility function for encrypting a string |
md5_demo.c | Provided | Demonstration of how md5crypt_r works |
time-crack.sh | Provided | Script to time and test the serial/parallel versions |
Makefile | Edit | Makefile to build programs; edit to enable parallel programs to be built |
parallel_funcs.c | Edit | Utility functions for parallel programs |
omp_passcrack.c | Edit | main() to perform parallel password cracking with OpenMP |
pthread_passcrack.c | Edit | main() to perform parallel password cracking with PThreads |
dict-files/* | Data | Directory containing dictionaries of words |
english-10.txt | Data | 10 English words |
english-100.txt | Data | 100 random English words |
english-1K.txt | Data | 1000 popular English words |
pass-files/* | Data | Directory of plaintext and encrypted passwords |
SMALL TESTS | | |
passwords-5-2-from-10.txt | Data | 5 plaintext passwords of 2 words drawn from english-10.txt |
encrypted-5-2-from-10.txt | Data | Encrypted version of above file |
passwords-5-1-from-1K.txt | Data | 5 plaintext passwords of 1 word drawn from english-1K.txt |
encrypted-5-1-from-1K.txt | Data | Encrypted version of above file |
MEDIUM TESTS | | |
passwords-5-2-from-100.txt | Data | 5 plaintext passwords of 2 words drawn from english-100.txt |
encrypted-5-2-from-100.txt | Data | Encrypted version of above file |
1.1 passcrack Program
The program passcrack accepts a file of encrypted passwords along with 1 or more dictionary files. The program proceeds to determine what plaintext string would produce each encrypted password in the file, drawing words from the given dictionaries. The following session shows some single plaintext words as passwords and their encrypted versions. passcrack is then invoked using the dictionary of words english-1K.txt to break the passwords.
```
# plain passwords and encrypted version
> cat pass-files/passwords-5-1-from-1K.txt
hello
world
zoo
saxophone
usually
> cat pass-files/encrypted-5-1-from-1K.txt
$1$$/PWPe740uvaQxKzRH.Zxj1
$1$$i4oRQQGhVoiIqPCYC1mwb.
$1$$s99bhXTZygqY3YOYkFk5H/
$1$$/oyQtgb1/x4d5cyHoNKKm0
$1$$s/xx.xxmYyTqTpakj2bH//

# usage information for passcrack
> passcrack
usage: passcrack <encrypted_file> <dict1> [dict2] ...
  <encrypted_file> : encrypted password file, one per line
  <dict1> : dictionary to try for passwords for word 1
  [dict2] : additional dictionary to try for word 2
  : further dictionaries may be specified for more words

# search for passwords in the dictionary of 1000 words
> passcrack pass-files/encrypted-5-1-from-1K.txt dict-files/english-1K.txt
found 5 passwords to crack
dict 0: 1000 words
0: SUCCES: $1$$/PWPe740uvaQxKzRH.Zxj1 <-- hello
1: SUCCES: $1$$i4oRQQGhVoiIqPCYC1mwb. <-- world
2: SUCCES: $1$$s99bhXTZygqY3YOYkFk5H/ <-- zoo
3: SUCCES: $1$$/oyQtgb1/x4d5cyHoNKKm0 <-- saxophone
4: SUCCES: $1$$s/xx.xxmYyTqTpakj2bH// <-- usually
5 / 5 passwords cracked
```
Multi-word passwords are addressed by passing multiple dictionaries into passcrack. The pattern of dictionaries dictates how many words will be combined and where they will be drawn from to generate trials. In the example below, a small dictionary of 10 words is tried with only single words, then with all pairs of words.
```
# plain passwords and encrypted version
> cat pass-files/passwords-5-2-from-10.txt
oneone
onetwo
fourtwo
tenten
nineseven
> cat pass-files/encrypted-5-2-from-10.txt
$1$$anQyl9ToYUP8DALuPL5dt1
$1$$HZ43DKz1thMVVU3EXVDM21
$1$$MW4kdPfNxDMND0tt4YAw10
$1$$WTzKuZkjxQxftQKHpZqzg1
$1$$pCoMF5tvMloTP.e2qaxUB1

# dictionary of 10 words
> cat dict-files/english-10.txt
one
two
three
four
five
six
seven
eight
nine
ten

# passwords are not single words
> passcrack pass-files/encrypted-5-2-from-10.txt dict-files/english-10.txt
found 5 passwords to crack
dict 0: 10 words
0: FAILED: $1$$anQyl9ToYUP8DALuPL5dt1 <-- ???
1: FAILED: $1$$HZ43DKz1thMVVU3EXVDM21 <-- ???
2: FAILED: $1$$MW4kdPfNxDMND0tt4YAw10 <-- ???
3: FAILED: $1$$WTzKuZkjxQxftQKHpZqzg1 <-- ???
4: FAILED: $1$$pCoMF5tvMloTP.e2qaxUB1 <-- ???
0 / 5 passwords cracked

# use a second dictionary to try all pairs of words
> passcrack pass-files/encrypted-5-2-from-10.txt dict-files/english-10.txt dict-files/english-10.txt
found 5 passwords to crack
dict 0: 10 words
dict 1: 10 words
0: SUCCES: $1$$anQyl9ToYUP8DALuPL5dt1 <-- oneone
1: SUCCES: $1$$HZ43DKz1thMVVU3EXVDM21 <-- onetwo
2: SUCCES: $1$$MW4kdPfNxDMND0tt4YAw10 <-- fourtwo
3: SUCCES: $1$$WTzKuZkjxQxftQKHpZqzg1 <-- tenten
4: SUCCES: $1$$pCoMF5tvMloTP.e2qaxUB1 <-- nineseven
5 / 5 passwords cracked
```
1.2 Other Utilities
Several other programs are available in the distribution, mainly to demonstrate the functionality of some parts. These are useful to build your understanding but should require no modification. Briefly, they are:
- dict_demo: Illustrates a few functions associated with dictionaries of words in the dict_t struct. It reads a dictionary file and retrieves a word from it.
- md5_demo: Demonstrates how the low-level encryption function in md5crypt_r.c is used to encrypt passwords. The program encrypts a word passed on the command line.
- encrypt_all: Loads all lines in a file named on the command line and prints their encrypted versions to standard output.
2 Architecture of the Cracker
2.1 Enumerating Possible Passwords
The main() function in passcrack.c is primarily responsible for loading the encrypted password file and dictionaries. It then iterates through each password to crack it. Cracking is the primary responsibility of the try_crack() function located in crack_funcs.c. Its prototype and documentation are below.
```c
// Work horse function to try all dictionary possibilities against a
// given encrypted password. The array of dicts are used in turn to
// generate all combinations of words from each dictionary. The
// method recursively descends increasing the dictionary position at
// each step down. Each level of recursion runs through its own loop
// to check all possibilities in a single dictionary. Each layer
// appends a word onto the provided buf[] array before descending
// another layer. When the bottom layer of recursion is reached,
// buf[] contains a complete plaintext password which is checked
// against the target encrypted password.
//
// Returns 1 on locating a plaintext password that matches which is
// left in the buf parameter (null terminated).
//
// Returns 0 if unable to locate a matching plaintext password.
int try_crack(char *target, dict_t **dicts, int dicts_len, int dict_pos,
              char *buf, int buflen, int bufpos);
```
try_crack() contains a loop and a recursive call. This is a typical approach for generating all possible combinations of some collection of objects, in this case one word drawn from each dictionary.
- The recursion goes one layer deeper than the number of dictionaries used (the number of words presumed in the password). For example, with 2 dictionaries / 2-word passwords, the recursion goes 3 layers deep:
  - The first word is added from the first dictionary in the first layer of recursion.
  - The second word is added from the second dictionary in the second layer of recursion.
  - The third layer of recursion reaches the base case (no more dictionaries) and checks the candidate password by encrypting it and comparing it to the encrypted target password.
- At each layer of recursion, a dictionary in the array of dictionaries is scanned through in a loop. A word is drawn from the dictionary and appended to the growing password, and the recursive call is made. If the recursion returns failure, the next word in the dictionary is written over the previous selection before recursing once again.
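The loop-plus-recursion pattern above can be sketched in a self-contained form. This is a simplified stand-in, not the distribution's try_crack(): the toy_dict_t type and the toy_try_crack() and check_plain() functions are hypothetical, dictionaries are plain string arrays, and the candidate is compared directly against a plaintext target instead of being encrypted with md5crypt_r(), so the example runs without the distribution code.

```c
#include <string.h>

// Hypothetical stand-in for dict_t: an array of words and its length.
typedef struct {
  const char **words;
  int len;
} toy_dict_t;

// Hypothetical stand-in for check_password(): compares plaintext directly
// rather than hashing the candidate with md5crypt_r().
static int check_plain(const char *target, const char *candidate) {
  return strcmp(target, candidate) == 0;
}

// Appends one word from dicts[dict_pos] to buf at bufpos, then descends to
// the next dictionary. The base case (no dictionaries left) checks the
// complete candidate. Returns 1 with the match left in buf, else 0.
int toy_try_crack(const char *target, toy_dict_t *dicts, int dicts_len,
                  int dict_pos, char *buf, int buflen, int bufpos) {
  if (dict_pos == dicts_len) {       // base case: candidate is complete
    buf[bufpos] = '\0';
    return check_plain(target, buf);
  }
  toy_dict_t *d = &dicts[dict_pos];
  for (int i = 0; i < d->len; i++) {
    int wlen = (int)strlen(d->words[i]);
    if (bufpos + wlen >= buflen) {   // skip words that would overflow buf
      continue;
    }
    // Overwrite the previous word chosen at this layer, then recurse
    memcpy(buf + bufpos, d->words[i], (size_t)wlen);
    if (toy_try_crack(target, dicts, dicts_len, dict_pos + 1,
                      buf, buflen, bufpos + wlen)) {
      return 1;
    }
  }
  return 0;
}
```

Called with two copies of a 10-word dictionary and dict_pos = 0, bufpos = 0, this tries all 100 pairs, mirroring the second session in Section 1.1.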
Parallelizing the generation of candidate passwords is the primary purpose of this project.
2.2 Encryption Scheme
The MD5 cryptographic hash is frequently employed to generate a fixed-size hash, or digest, of a message. It is also often used to store hashes of passwords on Unix systems. When a user logs in, their typed plaintext password is run through the hash algorithm, and the resulting hash is compared against a stored hash in a password file on the system. This avoids the need to store plaintext passwords on the system, so theft of the password file does not spell immediate disaster.
The check_password() function in crack_funcs.c performs a check to determine whether a plaintext candidate password matches an encrypted target.
```c
// Check whether a plaintext password hashes to the target encrypted
// password by encrypting it and comparing strings. Uses the
// md5crypt_r function. Returns 1 if the passwords match and 0
// otherwise.
int check_password(char *target, char *plain);
```
It utilizes the md5crypt_r() function to perform the encryption operation. This low-level function is ported from the OpenSSL project and adapted to make it easier to use in parallel programs. The code is quite technical and takes some time to execute, but it should not require modification on your part.
```c
int md5crypt_r(const char *passwd, const char *magic, const char *salt,
               char *out_buf);
```
Key features of this function are:
- passwd is the string to encrypt.
- magic is always the string "1" and salt is always the empty string "". This is specific to our application and not generally true.
- out_buf is a character buffer which has at least MD5CRYPT_SIZE characters in it and will be filled with the encrypted password.
3 Problem 1: omp_passcrack: OpenMP Parallelization
3.1 Overview
Use the directives available in OpenMP to create a parallel version of passcrack called omp_passcrack. There are several places where parallelism could be exploited.
Focus your efforts on parallelizing single password cracking rather than using different threads to crack different passwords.
To that end, you will want to spend time understanding the try_crack() function. However, do not modify try_crack(). Instead, modify the following:
- parallel_funcs.c: Add any new functions you need for the OpenMP (and PThreads) versions of the program to this file, which is provided but empty. Because the original functions such as try_crack() remain unmodified, your parallel functions can still call them.
- omp_passcrack.c: Adjust the main() method here to initiate parallel execution. This allows the serial passcrack.c to remain intact so its performance can be compared.
- Makefile: Uncomment lines in here to compile the parallel_funcs.c and omp_passcrack.c files.
3.2 Number of Threads to Use
Allow the use of an environment variable to specify the number of threads to use for the parallel version. Setting this environment variable should alter the number of threads used when the program starts. The following environment variable and syntax to set it should be used for standard shells.
```
> export PASSCRACK_NUMTHREADS=4
> echo $PASSCRACK_NUMTHREADS
4
> export PASSCRACK_NUMTHREADS=2
> echo $PASSCRACK_NUMTHREADS
2
```
To retrieve an environment variable in a C program, use code such as the following standard C code.
```c
#include <stdlib.h>

int main(int argc, char *argv[]){
  // Check environment variable for number of threads
  int nthreads = 4;
  char *nthreads_str = getenv("PASSCRACK_NUMTHREADS");
  if(nthreads_str != NULL){
    nthreads = atoi(nthreads_str);
  }
  // ... remainder of main() ...
  return 0;
}
```
Make an appropriate OpenMP call to set the number of threads that will be used for the remainder of the program.
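A minimal sketch combining the environment lookup with the OpenMP runtime routine omp_set_num_threads(), which sets the thread count for subsequent parallel regions. The helper names parse_nthreads() and setup_threads() are hypothetical; the default of 4 mirrors the snippet above, and, as an added assumption, non-numeric or non-positive values also fall back to the default. The #ifdef _OPENMP guard lets the file compile serially when OpenMP is not enabled.

```c
#include <limits.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: convert text such as "4" to a thread count,
// returning fallback if the text is missing, non-numeric, or out of range.
int parse_nthreads(const char *s, int fallback) {
  if (s == NULL) {
    return fallback;
  }
  char *end;
  long n = strtol(s, &end, 10);
  // Reject empty text, trailing junk, and counts below 1 or above INT_MAX
  if (end == s || *end != '\0' || n < 1 || n > INT_MAX) {
    return fallback;
  }
  return (int)n;
}

// Hypothetical helper: read PASSCRACK_NUMTHREADS and tell the OpenMP
// runtime to use that many threads for later parallel regions.
int setup_threads(void) {
  int nthreads = parse_nthreads(getenv("PASSCRACK_NUMTHREADS"), 4);
#ifdef _OPENMP
  omp_set_num_threads(nthreads);
#endif
  return nthreads;
}
```

Calling setup_threads() once near the top of main() would make every later parallel region use the requested count.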
3.3 General Parallelization Strategy
The easiest way to parallelize using OpenMP is to establish a parallel region and use a directive to run a loop in parallel. Such a loop exists in try_crack(), but you should not modify it directly as that would affect the serial version. Further, try_crack() is recursive, which might cause problems if repeated layers of recursion attempted to spin up multiple parallel loops.
Instead, write a wrapper function in parallel_funcs.c which takes care of the first "layer" of recursion that would be handled by try_crack(). Most of the code in this function is similar to try_crack(), but it has a parallel loop which causes threads to work on different iterations of the loop.
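The wrapper idea can be sketched as follows, under stated assumptions: toy_dict_t, toy_descend(), omp_try_crack_sketch() and TOY_BUFLEN are all hypothetical, and a plaintext strcmp() stands in for check_password(). The first layer is handled by a `#pragma omp parallel for`; each iteration declares its own buffer inside the loop body (so it is private to the thread), and a `critical` section guards the copy of the answer into shared memory. The early-exit concerns from Section 3.4 are not handled here.

```c
#include <string.h>

#define TOY_BUFLEN 256   // hypothetical buffer size for candidates/results

typedef struct { const char **words; int len; } toy_dict_t;  // hypothetical

// Serial recursion for layers dict_pos..dicts_len-1 (plaintext compare as a
// stand-in for check_password()). Returns 1 with the match left in buf.
static int toy_descend(const char *target, toy_dict_t *dicts, int dicts_len,
                       int dict_pos, char *buf, int buflen, int bufpos) {
  if (dict_pos == dicts_len) {
    buf[bufpos] = '\0';
    return strcmp(target, buf) == 0;
  }
  for (int i = 0; i < dicts[dict_pos].len; i++) {
    const char *w = dicts[dict_pos].words[i];
    int wlen = (int)strlen(w);
    if (bufpos + wlen >= buflen) continue;   // would overflow buf
    memcpy(buf + bufpos, w, (size_t)wlen);
    if (toy_descend(target, dicts, dicts_len, dict_pos + 1,
                    buf, buflen, bufpos + wlen)) {
      return 1;
    }
  }
  return 0;
}

// Hypothetical wrapper: runs the first layer as a parallel loop.
// result must hold at least TOY_BUFLEN chars. Returns 1 if found.
int omp_try_crack_sketch(const char *target, toy_dict_t *dicts,
                         int dicts_len, char *result) {
  int found = 0;                       // shared: set once under critical
  if (dicts_len < 1) return 0;
  #pragma omp parallel for schedule(dynamic)
  for (int i = 0; i < dicts[0].len; i++) {
    char buf[TOY_BUFLEN];              // declared in the loop: thread-private
    const char *w = dicts[0].words[i];
    int wlen = (int)strlen(w);
    if (wlen >= TOY_BUFLEN) continue;
    memcpy(buf, w, (size_t)wlen);
    if (toy_descend(target, dicts, dicts_len, 1, buf, TOY_BUFLEN, wlen)) {
      #pragma omp critical
      {
        if (!found) {                  // keep only the first answer
          strcpy(result, buf);
          found = 1;
        }
      }
    }
  }
  return found;                        // implicit barrier: all threads done
}
```

schedule(dynamic) is one reasonable choice here because some first-layer words may lead to a match early while others run to completion; static scheduling is also worth timing.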
3.4 Obstacles with Parallelism
Keep in mind that the serial version of try_crack() uses a single character array to repeatedly write candidate passwords to check them. If multiple threads attempt to write to this area, errors are likely to result. Instead, find a way to either control this area of memory or give each thread its own array in which to write. Also keep in mind that if a thread finds the correct password, it should copy that password to a memory location from which the main thread can eventually print it.
Password cracking is a search problem, which means that once a thread finds an answer (the correct password), no more work is needed and the other threads can shut down. A typical serial C program would break out of the search loop on finding an answer.
Unfortunately, OpenMP parallel loops may not contain break statements; this restriction comes directly from the OpenMP standard. Instead, threads should check at the top of each loop iteration whether an answer has been found by inspecting some shared variable. If an answer has been found, they can essentially skip through the iteration doing no work. This is possible with a continue or by enclosing the loop iteration in a conditional. Without such measures, threads will do more work than necessary, resulting in a delay before work on the next password to crack is started.
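The skip-on-found pattern can be shown in isolation with a hypothetical parallel linear search, omp_find_word(), which is not part of the distribution. A shared found flag is read with `#pragma omp atomic read` at the top of every iteration; once it is set, the remaining iterations `continue` immediately. The finder records its answer inside a `critical` section so only one result is kept. If the word list contained duplicates, which index wins would depend on thread timing.

```c
#include <string.h>

// Hypothetical example: parallel search that stops doing real work once
// any thread has found the target. Returns the matching index or -1.
int omp_find_word(const char **words, int n, const char *target) {
  int found = 0;      // shared flag: becomes 1 once an answer is known
  int answer = -1;    // shared result, written only by the finder
  #pragma omp parallel for
  for (int i = 0; i < n; i++) {
    int done;
    #pragma omp atomic read
    done = found;
    if (done) {
      continue;       // break is not allowed here, so skip the iteration
    }
    if (strcmp(words[i], target) == 0) {
      #pragma omp critical
      {
        if (!found) {
          answer = i;
          #pragma omp atomic write
          found = 1;
        }
      }
    }
  }
  return answer;
}
```

In the cracker, the expensive step inside each iteration is the recursive descent and check_password() calls, so skipping them once the flag is set is what saves time.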
Further complicating this, recursive calls to try_crack() or its OpenMP adaptation should check whether a solution has been found and terminate on seeing one. This will likely involve duplicating some code from the try_crack() method to avoid changing the original.
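One way to sketch such a duplicated recursive helper, assuming a hypothetical toy_dict_t and a plaintext strcmp() in place of check_password(): toy_descend_flag() takes a pointer to a shared flag and re-reads it (atomically, when compiled with OpenMP) before trying each word at every layer, so all threads unwind quickly after any of them succeeds. It does not set the flag itself; the caller, such as a parallel wrapper, would set it when the helper returns 1.

```c
#include <string.h>

typedef struct { const char **words; int len; } toy_dict_t;  // hypothetical

// Hypothetical copy of the try_crack() recursion (not a modification of
// it). *found is shared among threads; every layer checks it before each
// word and abandons its branch once another thread has succeeded.
int toy_descend_flag(const char *target, toy_dict_t *dicts, int dicts_len,
                     int dict_pos, char *buf, int buflen, int bufpos,
                     int *found) {
  int done;
  if (dict_pos == dicts_len) {         // base case: check full candidate
    buf[bufpos] = '\0';
    return strcmp(target, buf) == 0;
  }
  for (int i = 0; i < dicts[dict_pos].len; i++) {
    #pragma omp atomic read
    done = *found;
    if (done) {
      return 0;                        // someone else found it: unwind
    }
    const char *w = dicts[dict_pos].words[i];
    int wlen = (int)strlen(w);
    if (bufpos + wlen >= buflen) {
      continue;                        // would overflow buf
    }
    memcpy(buf + bufpos, w, (size_t)wlen);
    if (toy_descend_flag(target, dicts, dicts_len, dict_pos + 1,
                         buf, buflen, bufpos + wlen, found)) {
      return 1;
    }
  }
  return 0;
}
```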
3.5 Grading Criteria for OpenMP Code
Points | Criteria |
---|---|
10 | Code compiles using the provided Makefile and runs for sample inputs |
5 | NO CHANGES are made to the existing codebase such as crack_funcs.c and passcrack.c |
5 | Parallel functions are in the parallel_funcs.c and omp_passcrack.c files to implement the OpenMP program |
5 | The number of threads used can be controlled using environment variable PASSCRACK_NUMTHREADS |
5 | The parallel code is reasonably well documented and cleanly written |
5 | It is clear that a strategy was employed for threads to exit quickly once a solution is found |
5 | Some speedup is achieved over the serial version of the code as indicated by timing tests on zeus |
4 Problem 2: pthread_passcrack: PThread Parallelization
4.1 Overview
Similar to the first problem, implement a parallel version of passcrack using POSIX Threads called pthread_passcrack. As before, focus your efforts on parallelizing single password cracking rather than using different threads to crack different passwords.
As with the OpenMP version, use a number of threads that is obtained from the environment variable PASSCRACK_NUMTHREADS.
Do not alter any existing code. Instead, put new functions you require for your PThreads version in parallel_funcs.c and use pthread_passcrack.c to implement a main function which will initiate parallel execution.
As with OpenMP, the general strategy is to use the first layer of try_crack() to start multiple threads running on different loop iterations. Beyond that, there are some unique difficulties associated with PThreads that you will need to surmount.
4.2 Obstacles with Parallelism
Similar obstacles to parallelism occur with the PThreads version as did for the OpenMP version.
- Threads will need to coordinate on the character buffer that is used to build up candidate passwords. The ideal solution is for each thread to use its own character buffer and write a found password to a globally accessible area.
- Threads should quit if another thread finds a solution to avoid doing more work than is necessary. This will involve checking a shared memory area for a sign that a solution has been found and terminating if it occurs.
Some additional issues with PThreads are also present.
- It is likely that worker threads will need to receive quite a few parameters (array of dictionaries, target password, thread number, etc.). For this it is suggested that you employ a parameter structure of some kind whose fields are filled with data for each worker thread. Keep in mind that pthread_create() allows for only a single pointer argument, so you will likely need an array of such parameter structs with data specialized to each thread.
- Splitting loop iterations between threads will take more manual effort than with OpenMP. Draw from your experience in class to use iteration variable tricks involving the thread ids and number of threads to make this as simple as possible.
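The points above can be combined in a short sketch. Everything here is hypothetical (worker_args_t, worker(), pthread_search_sketch(), MAX_THREADS, TOY_BUFLEN), and for brevity each worker searches a single word list with strcmp() instead of descending through several dictionaries and calling check_password(). It shows an array of per-thread parameter structs passed through pthread_create()'s single pointer argument, a strided loop `i = tid; i < n; i += nthreads` to split iterations, and a mutex-protected shared flag that lets workers quit early.

```c
#include <pthread.h>
#include <string.h>

#define MAX_THREADS 64
#define TOY_BUFLEN 256

// Hypothetical per-thread parameter struct.
typedef struct {
  int tid;                 // this worker's id, 0..nthreads-1
  int nthreads;            // total number of workers
  const char **words;      // toy single-word "dictionary"
  int nwords;
  const char *target;      // plaintext stand-in for the encrypted target
} worker_args_t;

// Shared state, protected by found_lock.
static pthread_mutex_t found_lock = PTHREAD_MUTEX_INITIALIZER;
static int found = 0;
static char found_word[TOY_BUFLEN];

static int check_found(void) {
  pthread_mutex_lock(&found_lock);
  int f = found;
  pthread_mutex_unlock(&found_lock);
  return f;
}

// Worker: strided iterations tid, tid + nthreads, ... split the loop.
static void *worker(void *arg) {
  worker_args_t *a = (worker_args_t *)arg;
  for (int i = a->tid; i < a->nwords; i += a->nthreads) {
    if (check_found()) {
      break;               // a plain break is fine outside OpenMP loops
    }
    if (strcmp(a->words[i], a->target) == 0) {
      pthread_mutex_lock(&found_lock);
      if (!found) {        // keep only the first answer
        strncpy(found_word, a->words[i], TOY_BUFLEN - 1);
        found_word[TOY_BUFLEN - 1] = '\0';
        found = 1;
      }
      pthread_mutex_unlock(&found_lock);
    }
  }
  return NULL;
}

// Start nthreads workers, join them, and copy any answer into result
// (which must hold TOY_BUFLEN chars). Returns 1 if found, else 0.
int pthread_search_sketch(const char **words, int nwords, const char *target,
                          int nthreads, char *result) {
  pthread_t threads[MAX_THREADS];
  worker_args_t args[MAX_THREADS];
  int created = 0;
  if (nthreads < 1 || nthreads > MAX_THREADS) return 0;
  found = 0;               // safe: no workers are running yet
  for (int t = 0; t < nthreads; t++) {
    args[t] = (worker_args_t){t, nthreads, words, nwords, target};
    if (pthread_create(&threads[t], NULL, worker, &args[t]) != 0) break;
    created++;
  }
  for (int t = 0; t < created; t++) {
    pthread_join(threads[t], NULL);
  }
  if (found) strcpy(result, found_word);   // joins ordered all writes
  return found;
}
```

In the full cracker, each worker would instead take its strided share of the first dictionary's words and run a flag-checking recursive helper with its own private buffer.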
4.3 Grading Criteria for PThreads Code
Points | Criteria |
---|---|
15 | Code compiles using the provided Makefile and runs for sample inputs |
5 | Parallel functions are in the parallel_funcs.c and pthread_passcrack.c files to implement the PThreads program |
5 | The number of threads used can be controlled using environment variable PASSCRACK_NUMTHREADS |
5 | The parallel code is reasonably well documented and cleanly written |
5 | It is clear that a strategy was employed for threads to exit quickly once a solution is found |
5 | Some speedup is achieved over the serial version of the code as indicated by timing tests on zeus |
5 Problem 3: Writeup and Submission
5.1 Writeup
As usual, include a document describing your programming efforts. This should contain sections describing how you went about parallelizing passcrack for both parallel platforms.
You should also include an analysis of timings for the problem. Use the provided time-crack.sh script for this, which generates timing output. This script also checks that the results of the parallel omp_passcrack and pthread_passcrack are identical to those of the serial passcrack executable. The script produces output like the following when the parallel programs are complete.
```
zeus-0> time-crack.sh
make: Nothing to be done for `programs'.
Running with data files pass-files/encrypted-5-2-from-100.txt dict-files/english-100.txt dict-files/english-100.txt
Serial code
real 10.97
user 10.71
sys 0.01
OMP nthreads 2
real 5.70
user 10.77
sys 0.06
OMP nthreads 4
real 5.37
user 17.55
sys 0.14
OMP nthreads 6
real 6.11
user 22.53
sys 0.05
OMP nthreads 8
real 5.38
user 19.18
sys 0.03
PTHREADS nthreads 2
real 5.50
user 10.35
sys 0.11
PTHREADS nthreads 4
real 8.32
user 28.39
sys 0.06
PTHREADS nthreads 6
real 7.80
user 28.81
sys 0.07
PTHREADS nthreads 8
real 7.91
user 28.76
sys 0.15
```
Several data files are available. Make sure to run at least what is shown above (the default).
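When discussing the timings, one common way (not required by this spec) to summarize them is speedup and parallel efficiency, computed from the real (wall-clock) times. Here T_serial is the serial real time and T_p the real time with p threads; the worked values use the OMP numbers from the sample output above.

```latex
S_p = \frac{T_{\text{serial}}}{T_p}, \qquad E_p = \frac{S_p}{p}

S_2 = \frac{10.97}{5.70} \approx 1.92, \qquad E_2 \approx 0.96

S_4 = \frac{10.97}{5.37} \approx 2.04, \qquad E_4 \approx 0.51
```

An efficiency well below 1 as p grows, together with user time rising faster than real time falls, suggests threads are doing extra or wasted work, which is worth commenting on in the writeup.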
5.2 (20 /20) Grading Criteria for Writeup
Points | Criteria |
---|---|
5 | Description of design of the OpenMP parallelization |
5 | Description of design of the PThreads parallelization |
10 | Timing results and discussion of how effective the parallelization was |
6 Extra Credit
6.1 Java Threads (20%)
We have discussed Java Threads to enable parallelism. For a 20% bonus on this HW score, you may attempt to port the password cracker to Java and use Java Threads to produce parallelism. For this, you will need to locate an implementation of MD5 hashing which is compatible with our project as porting it directly will be terrible.
Include a description of your design of the Java threaded implementation and timing numbers comparing its performance to the other versions.
6.2 Cilk Implementation (10%)
For a 10% bonus on this HW, you can attempt to implement a Cilk version called cilk_passcrack. Do so by writing additional functions in parallel_funcs.c, creating another main routine, and adjusting the Makefile to compile these with the correct options.
Include a description of your design of the Cilk implementation and timing numbers comparing its performance to the other versions.