Last Updated: 2016-03-28 Mon 23:16

CS 499 Homework 3: Unix IPC and Shared Memory Machine Basics

CHANGELOG: Empty

1 Overview

This assignment is divided into three parts:

  1. Unix System V IPC
  2. Basic Shared Memory Architecture Theory
  3. OpenMP Basics

Each problem has a written component, which you should complete in either a Word or PDF document and turn in along with any code you submit.

2 (40%) Problem 1: IPC Heat

Convert the heat.c program to use System V Interprocess Communication (IPC) calls to parallelize it. This program should be familiar to you from the prior programming assignments.

2.1 Implementation Notes

The restrictions on your implementation are as follows.

  • Name your program ipc_heat.c and adhere to the following usage pattern, whose final argument is the number of processes to use in the computation.
    > ipc_heat
    usage: ipc_heat max_time width print #PROCS
      max_time: int
      width: int
      print: 1 print output, 0 no printing
      #PROCS:   int, number of processes to use
    > ipc_heat 5 8 1 2
       |     0     1     2     3     4     5     6     7 
    ---+-------------------------------------------------
      0|  20.0  50.0  50.0  50.0  50.0  50.0  50.0  10.0 
      1|  20.0  35.0  50.0  50.0  50.0  50.0  30.0  10.0 
      2|  20.0  35.0  42.5  50.0  50.0  40.0  30.0  10.0 
      3|  20.0  31.2  42.5  46.2  45.0  40.0  25.0  10.0 
      4|  20.0  31.2  38.8  43.8  43.1  35.0  25.0  10.0
    
  • As was the case for the MPI version of heat, ipc_heat only needs to work when the number of processes used evenly divides the number of columns in the heat matrix.
  • Use fork() to spawn child processes to parallelize the program. Spawn a number of children equal to the last command line argument given. Take care that children do not spawn additional child processes, which is a common mistake; see the spawning sketch following this list.
  • Use a block of shared memory for the Heat Matrix that all processes will access. You will likely want to allocate and attach this shared memory prior to spawning children so that they can all access it. Keep in mind that calls like shmget() can only allocate 1D blocks of memory, so you may wish to set up an array of row pointers into it to simulate a 2D array using a loop like the following.
    // allocate shared memory with shmget()
    int shm_id = shmget(IPC_PRIVATE, rows*cols*sizeof(double), IPC_CREAT | 0600);
    // attach memory to shm_ptr using shmat()
    double *shm_ptr = shmat(shm_id, NULL, 0);
    // Below code makes mat look like a 2D array
    double **mat = malloc(rows * sizeof(double *));
    for(int i=0; i<rows; i++){
      mat[i] = shm_ptr + cols*i;
    }
    
  • Use either semaphores or message queues to coordinate updates to the Heat Matrix between processes.
  • If you employ message queues, keep in mind that you will definitely want to use the message "type" mechanism (the mtype field), which allows a message to be placed in a queue with an associated integer that may correspond to the recipient. I found it useful to have two message queues, one for messages to the right and one for messages to the left. See the message queue sketch following this list.
  • If you use semaphores, you may wish to use an array of semaphores or several arrays for left/right boundary values.
  • At the end of the computation, a single process should print out the entire Heat Matrix as was the case in the MPI version. This will be easiest if a block of shared memory is utilized rather than having each process privately allocate its own sections of the heat matrix.
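
The spawning pattern might look like the following minimal sketch. Here nprocs and do_work() are illustrative names, not part of any starter code; the key point is that a child calls exit() before the loop continues, so children never spawn children.

    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(int argc, char *argv[]){
      int nprocs = atoi(argv[argc-1]);   // last argument: number of processes
      for(int i = 0; i < nprocs; i++){
        pid_t pid = fork();
        if(pid == 0){                    // child process
          // do_work(i);                 // hypothetical: child i computes its share
          exit(0);                       // exit before the loop continues so the
        }                                // child never spawns children itself
      }
      for(int i = 0; i < nprocs; i++){   // parent waits for all children to finish
        wait(NULL);
      }
      return 0;
    }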
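
If you choose message queues, mtype-based addressing might look like the sketch below. The struct and the names boundary_msg_t, dest, my_id, and val are illustrative, not from the assignment; note that msgsnd()/msgrcv() take the payload size excluding the mtype field.

    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    typedef struct {
      long mtype;        // must be > 0; here: receiver's process index + 1
      double boundary;   // boundary value being exchanged
    } boundary_msg_t;

    int main(void){
      // create a queue; do this before fork() so children inherit the id
      int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
      int my_id = 0, dest = 0;           // illustrative process indices
      double val = 42.0;

      // send a boundary value addressed to process dest
      boundary_msg_t out = { .mtype = dest + 1, .boundary = val };
      msgsnd(qid, &out, sizeof(double), 0);

      // receive only messages addressed to me; blocks until one arrives
      boundary_msg_t in;
      msgrcv(qid, &in, sizeof(double), my_id + 1, 0);
      printf("received %.1f\n", in.boundary);

      msgctl(qid, IPC_RMID, NULL);       // remove the queue when done
      return 0;
    }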

2.2 Timing Script

Use the provided script time-heat.sh to produce a table of times for various sizes of heat matrices and numbers of processes. You should run your code on zeus.vse.gmu.edu as it has 4 physical cores (8 hardware threads counting "hyperthreading") to demonstrate any potential speedups.

> make
gcc -g -o heat heat.c
gcc -g -o ipc_heat ipc_heat.c
> time-heat.sh
 rows  cols  p  time
 1000  1000  1  0.02
 1000  1000  2  0.01
 1000  1000  4  0.01
...
20000 10000  4  0.84
20000 10000  8  0.98

2.3 What to Turn In

Submit both your code ipc_heat.c and a Word or PDF document which discusses the following issues regarding this program (and contains answers to the remaining problems).

  • Describe the overall design of your ipc_heat program. Discuss how you spawned processes and how many total processes you utilized for the program.
  • Include discussion of the System V IPC mechanisms you used to coordinate cooperating processes to finish the heat calculation.
  • Include a table generated using the time-heat.sh script provided as run on zeus.vse.gmu.edu.
  • Discuss whether the timings from zeus indicate it is possible to speed up the heat computation using IPC, or whether it suffers from the same communication overhead as the MPI version, negating the value of multiple cooperating processes.

3 (30%) Problem 2: Basics of Shared Memory Architecture

Answer the following questions from the textbook in your Homework Write-up.

3.1 Grama 2.7

What are the major differences between message-passing and shared-address-space computers? Also outline the advantages and disadvantages of the two.

3.2 Grama 2.8

Why is it difficult to construct a true shared-memory computer? What is the minimum number of switches for connecting P processors to a shared memory with M words (where each word can be accessed independently)?

3.3 Grama 2.9

Of the four PRAM models (EREW, CREW, ERCW, and CRCW), which model is the most powerful? Why? Define what you mean by "most powerful" here.

4 (30%) Problem 3: OpenMP Patternlets

Courtesy of the fine folks at CS in Parallel, the code distribution for this HW has a directory called OpenMP-patternlets which contains a series of programs with which to experiment. Each program contains instructions on the intended steps to take to learn from it, mostly involving commenting or uncommenting lines and observing the output.
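
For orientation, a patternlet is typically a tiny self-contained program along these lines; this sketch is written here for illustration and is not one of the actual files in the distribution.

    // compile with: gcc -fopenmp -o hello hello.c
    #include <stdio.h>
    #include <omp.h>

    int main(void){
      #pragma omp parallel               // spawn a team of threads here
      {
        int id  = omp_get_thread_num();  // this thread's unique id
        int tot = omp_get_num_threads(); // total threads in the team
        printf("Hello from thread %d of %d\n", id, tot);
      }
      return 0;
    }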

Answer the following questions based on your experience with this code. You may need to do some additional research to answer some of these questions, perhaps examining the required OpenMP Tutorial reading. Put your answers in your Homework Write-up.

  1. The #pragma omp parallel directive launches multiple threads to perform a computation. How many threads are used by default? How does one adjust the number of threads launched? There are at least two ways to change the number of threads, which you should describe.
  2. How do threads in OpenMP obtain a unique identifier and determine the total number of threads being used?
  3. OpenMP provides several easy ways to distribute multiple loop iterations over cooperating threads. Describe these and make sure to discuss the differences in how loop iterations are distributed to the different threads.
  4. When multiple threads alter the same shared variable in a parallel loop, the integrity of the variable's value can be compromised. Describe some ways that this can be avoided for loops that sum or count. Give a few code examples drawn from the exercises; one illustrative sketch also follows this list.
  5. Describe OpenMP's notion of a private variable. Demonstrate its use and the effects of making a variable private during a parallel loop.
  6. Several exercises deal with the critical and atomic section facilities. Do some research and report the difference between these two. Describe their relative strengths and limitations.
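
For question 4, one common approach is OpenMP's reduction clause. The following is a minimal illustrative sketch, not one of the patternlet files; your write-up should use examples drawn from the exercises themselves.

    // compile with: gcc -fopenmp -o sum sum.c
    #include <stdio.h>

    int main(void){
      int sum = 0;
      // each thread accumulates into a private copy of sum; the
      // copies are combined safely when the loop ends
      #pragma omp parallel for reduction(+:sum)
      for(int i = 1; i <= 100; i++){
        sum += i;
      }
      printf("sum = %d\n", sum);   // always 5050, regardless of thread count
      return 0;
    }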

Author: Chris Kauffman (kauffman@cs.gmu.edu)
Date: 2016-03-28 Mon 23:16