CS 707 Assignment 1 Spring 2006

Assignment #1 - Multi-threaded Web Proxy Server

Due Feb 20

Assignment overview:
In this assignment you are asked to build a multi-threaded Web proxy server that is capable of delivering Web content on behalf of a remote Web server. When a user's browser is configured to connect via a proxy server, the browser establishes a connection to the proxy server's machine and sends its complete request to the proxy server rather than to the end Web server. The proxy server accepts the user's HTTP requests and forwards them to their final destination; in effect, it introduces an extra "hop" between the user's browser (the client) and the Web server.

Multi-threading is not crucial to the functionality of a Web proxy server, but it is important to allow the server to process multiple simultaneous requests in parallel. Your proxy server will need to be multi-threaded for full credit (as discussed below).

Requirements in detail:
The proxy server acts as an HTTP server to client browsers while it acts as an HTTP client to the real Web server. The protocol specifications for versions 1.0 and 1.1 of HTTP are defined in RFC 1945 and RFC 2616 respectively. Note that while these specifications are quite detailed, you only need to be concerned with a small subset of the HTTP protocol. First, you will need to understand the formats of the request and response messages used by HTTP, since the proxy server will need to parse the contents of the messages it receives from the clients and Web servers. Second, you need to be aware of the approaches used by HTTP 1.0 and HTTP 1.1 to make connections to the Web server, i.e., the use of persistent or non-persistent connections.

A full Web server supports HEAD, POST, and GET methods. Your proxy server only needs to support the GET method. To serve each request, we first need to parse the request line and headers sent by the client. The request line for the proxy server typically looks like this:

 GET http://www.foo.com/bar.html HTTP/1.0  ... ...

The requested URL contains the Web server name www.foo.com and the requested file on that server, /bar.html. In this case, your proxy server should make a TCP connection to the Web server www.foo.com at the default port 80 and ask for the file /bar.html by sending the following request:

GET /bar.html HTTP/1.0\r\n\r\n
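
The parsing step described above can be sketched in Java as follows. This is only a minimal illustration, not a required design: the class name RequestLine and its fields are hypothetical, it assumes the request line has exactly three whitespace-separated tokens, and it relies on java.net.URI to split the absolute URL. Port 80 is used when the URL names no port.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: splits a proxy-style request line into its parts. */
final class RequestLine {
    final String host;
    final int port;
    final String path;

    private RequestLine(String host, int port, String path) {
        this.host = host;
        this.port = port;
        this.path = path;
    }

    /** Parses a line such as "GET http://www.foo.com/bar.html HTTP/1.0". */
    static RequestLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3 || !parts[0].equals("GET")) {
            throw new IllegalArgumentException("unsupported request line: " + line);
        }
        URI uri;
        try {
            uri = new URI(parts[1]);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("bad URL: " + parts[1], e);
        }
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("absolute URL required: " + parts[1]);
        }
        // Fall back to the default HTTP port when the URL does not name one.
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return new RequestLine(uri.getHost(), port, path);
    }

    /** Builds the request that the proxy sends to the end Web server. */
    String toOriginRequest() {
        return "GET " + path + " HTTP/1.0\r\n\r\n";
    }
}
```

For the example request line, parse yields host www.foo.com, port 80, and path /bar.html, and toOriginRequest produces the request shown above.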

After the proxy server sends a request to the end Web server, it will receive an HTTP response that includes the requested file (in the case of a GET request). The proxy server should then forward the content to the client. There are three parts in an HTTP response message: the status line, the response headers, and the entity body. The status line and response headers are terminated by an empty line (or an extra "\r\n"). In short, an HTTP response message may look like the following:

HTTP/1.0 200 OK
Server: Joe's Web proxy server
Date: Thu Sep 12 17:52:40 EDT 2002
Content-length: 3814
Last-Modified: Thu Sep 12 01:37:39 EDT 2002
Content-type: text/html    ... ...

The status line and the header fields should be forwarded to the client without modification.
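
One way to forward the response without modification is to copy raw bytes from the server connection to the client connection. The sketch below is a minimal illustration under one assumption: the end server closes its connection when the response is complete, as it does for a non-persistent HTTP/1.0 exchange. The class name Relay is hypothetical. The returned byte count is the number of bytes received from the end server, which is the "size" value the log entry asks for.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: relays a response byte for byte. */
final class Relay {
    /**
     * Copies everything from the server stream to the client stream
     * until the server closes its side, and returns the bytes copied.
     */
    static long copy(InputStream fromServer, OutputStream toClient) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = fromServer.read(buffer)) != -1) {
            toClient.write(buffer, 0, n);
            total += n;
        }
        toClient.flush();
        return total;
    }
}
```

Because the bytes are never decoded into strings, the status line, the headers, and a binary entity body (an image, for example) all pass through unchanged.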

The server should be able to handle multiple simultaneous service requests in parallel. This means that the Web proxy server is multi-threaded. In the main thread, the server listens at a fixed port. When it receives a TCP connection request, it sets up a TCP connection socket and services the request in a separate thread.
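
The thread-per-connection structure described above might look like the following sketch. It assumes Java 8 or later (for the lambda) and uses hypothetical names: AcceptLoop, and a Handler interface standing in for whatever code parses the request and relays the response. It is one possible arrangement, not the required one; a thread pool would also satisfy the requirement.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical sketch of the main thread's accept loop. */
final class AcceptLoop {
    /** Stand-in for the per-connection work (parse, connect, relay, log). */
    interface Handler {
        void handle(Socket client) throws IOException;
    }

    /** Accepts connections until the server socket is closed. */
    static void serve(ServerSocket server, Handler handler) {
        while (!server.isClosed()) {
            final Socket client;
            try {
                client = server.accept();
            } catch (IOException e) {
                break; // socket closed or accept failed: stop listening
            }
            // Each connection gets its own thread, so a slow client or
            // server does not block the other connections.
            Thread worker = new Thread(() -> {
                try (Socket c = client) {
                    handler.handle(c);
                } catch (IOException e) {
                    System.err.println("connection failed: " + e.getMessage());
                }
            });
            worker.start();
        }
    }
}
```

The main thread does nothing but accept; the try-with-resources block in the worker closes the client socket when the handler returns or fails.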

Logging:
The proxy server should keep track of all requests in a log file named proxy.log. Each log file entry should be of the form:

Date: browserIP URL size



where browserIP is the IP address of the browser, URL is the URL asked for, and size is the size in bytes of the object that was returned (essentially, the number of bytes received from the end server from the time a connection is opened to the time it is closed).

You will need to synchronize access to the log file so that only one thread can modify it at a time. If you do not synchronize access, the log file is likely to be corrupted.
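
A minimal sketch of synchronized logging in Java follows. The class name ProxyLog and its methods are hypothetical, and rendering the date with java.util.Date's default toString is an assumption; the handout fixes only the order of the fields. The synchronized keyword ensures that only one thread at a time executes record on a given ProxyLog object, so all threads must share a single instance.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Date;

/** Hypothetical sketch of a log shared by all worker threads. */
final class ProxyLog {
    private final PrintWriter out;

    ProxyLog(String fileName) throws IOException {
        // Open in append mode; the second PrintWriter argument turns on
        // automatic flushing after each println.
        this.out = new PrintWriter(new FileWriter(fileName, true), true);
    }

    /** Formats one entry as "Date: browserIP URL size". */
    static String format(Date when, String browserIp, String url, long size) {
        return when + ": " + browserIp + " " + url + " " + size;
    }

    /** Writes one whole entry while holding this object's lock. */
    synchronized void record(String browserIp, String url, long size) {
        out.println(format(new Date(), browserIp, url, size));
    }
}
```

A worker thread would call something like log.record(clientAddress, requestedUrl, bytesRelayed) once per request, after the connection to the end server has closed.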

Testing:
You should test and debug your proxy server initially using telnet as a client. Later, you can use a web browser as a client. For grading, your program will be tested on the Linux systems in the IT&E lab, so it is recommended that you compile and test your program in that environment before submitting it.

Submission Instructions:
You are asked to turn in your source files, a makefile if needed, and a README file. No matter what programming language you choose to use, your program should take a single parameter (its port number) on startup. You should also name your executable ProxyServer. If your program is written in Java, you should be able to launch your server using "java ProxyServer <port-number>". If your program is written in C/C++, you should be able to launch your server using "ProxyServer <port-number>". While testing your program, you should choose a port number that is large enough (e.g. 8080) that it can be used by your program without requiring special privileges.
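
For a Java submission, the single port-number parameter could be validated along the lines of the sketch below. The class name PortArgument and the error messages are hypothetical; only the one-argument convention comes from this handout.

```java
/** Hypothetical helper: validates the single command-line argument. */
final class PortArgument {
    /** Returns the port named by the only argument, or throws if it is invalid. */
    static int parse(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException("usage: ProxyServer <port-number>");
        }
        int port;
        try {
            port = Integer.parseInt(args[0]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("not a number: " + args[0]);
        }
        // Valid TCP port numbers are 1 through 65535.
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }
}
```

A main method in ProxyServer could pass its args array to PortArgument.parse and open its listening socket on the returned port.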

The README file should be in plain text format. It should contain a description of your design and state what is and what is not realized in your implementation. If your program requires any special compilation flag to build, you need to specify the full build command in the README file.

You should create a zip archive or tarball of all your files (source files, README file, and makefile, if needed) and email it to setia at cs.gmu.edu. You should also give me a hardcopy of your source files and README file in class.

Grading criteria:

60%: properly forward Web pages from remote servers to browsers.
30%: multi-threading that allows the server to service multiple client connections simultaneously.
10%: a clear README file, clarity of your source code and completeness of your comments.

Late turn-in policy:
Late turn-ins will be accepted for up to three days, with a 10% penalty for each late day. As discussed in class, you will have an opportunity to revise your program and resubmit it for an improved grade. If you choose to resubmit, the final grade will be the average of the grades for the initial submission and the resubmission.