Due Feb 20
Assignment overview:
In this assignment you are asked to build a multi-threaded Web proxy server that is capable of delivering Web content on behalf of a remote Web server.
When a user's browser is configured to connect via a proxy server, her browser establishes a connection to the proxy server's machine and forwards its complete request to the proxy server rather than the end Web server. The proxy server accepts the user's HTTP requests and forwards them to their final destination - essentially it introduces an extra "hop" between the user's browser (or the client) and the Web server.
Multi-threading is not crucial to the functionality of a Web proxy server, but it is important to allow the server to process multiple simultaneous requests in parallel. Your proxy server will need to be multi-threaded for full credit (as discussed below).
Requirements in detail:
The proxy server acts as an HTTP server to client browsers while it acts as an HTTP client to the real Web server. The protocol specifications for versions 1.0 and 1.1 of HTTP are defined in RFC 1945 and RFC 2616 respectively. Note that while these specifications are quite detailed, you only need to be concerned with a small subset of the HTTP protocol. First, you will need to understand the formats of the request and response messages used by HTTP, since the proxy server will need to parse the contents of the messages it receives from the clients and Web servers. Second, you need to be aware of the approaches used by HTTP 1.0 and HTTP 1.1 to make connections to the Web server, i.e., the use of persistent or non-persistent connections.
A full Web server supports HEAD, POST, and GET methods. Your proxy server only needs to support the GET method. To serve each request, we first need to parse the request line and headers sent by the client. The request line for the proxy server typically looks like this:
GET http://www.foo.com/bar.html HTTP/1.0 ... ...
The requested Web page name contains the Web server name www.foo.com and the requested file on that server, /bar.html.
In this case, your proxy server should make a TCP connection to the Web server www.foo.com at the default port 80 and ask for the file /bar.html by sending the following request:
GET /bar.html HTTP/1.0\r\n\r\n
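As a rough illustration of the parsing step described above (not a required design), the request line could be split into host, port, and file as sketched below. The class name RequestLine and its methods are hypothetical names chosen for this example; the sketch assumes the browser sends an absolute http:// URL and falls back to port 80 when the URL does not name one.

```java
import java.net.URI;

/** Hypothetical helper: splits a proxy-style request line into its parts. */
class RequestLine {
    final String host;
    final int port;
    final String path;

    RequestLine(String host, int port, String path) {
        this.host = host;
        this.port = port;
        this.path = path;
    }

    /** Parses a line such as "GET http://www.foo.com/bar.html HTTP/1.0". */
    static RequestLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3 || !parts[0].equals("GET")) {
            throw new IllegalArgumentException("unsupported request line: " + line);
        }
        URI uri = URI.create(parts[1]);
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("absolute URL expected: " + parts[1]);
        }
        // getPort() returns -1 when the URL has no explicit port.
        int port = (uri.getPort() == -1) ? 80 : uri.getPort();
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return new RequestLine(uri.getHost(), port, path);
    }

    /** Builds the request that the proxy sends to the end Web server. */
    String toOriginRequest() {
        return "GET " + path + " HTTP/1.0\r\n\r\n";
    }
}
```

With the example request line above, parse yields the host www.foo.com, port 80, and path /bar.html, and toOriginRequest produces the request shown in the text.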
After sending a request to the end Web server, an HTTP response including the requested file (in the case of a GET request) will be received at the proxy server. The proxy server should then forward the content to the client. There are three parts in an HTTP response message: the status line, the response headers, and the entity body. The status line and response headers are terminated by an empty line (or an extra "\r\n").
In short, an HTTP response message may look like the following:
HTTP/1.0 200 OK
Server: Joe's Web proxy server
Date: Thu Sep 12 17:52:40 EDT 2002
Content-length: 3814
Last-Modified: Thu Sep 12 01:37:39 EDT 2002
Content-type: text/html ... ...
The status line and the header fields should be forwarded to the client without modification.
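One possible way to forward a response without modification is to copy the bytes from the end server's socket straight to the client's socket. The sketch below assumes the end server closes its connection when the response is complete (the non-persistent behaviour that follows from sending an HTTP/1.0 request); the class name Relay and the method copy are hypothetical. The returned byte count is the kind of value a log entry could record as the size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: relays a response unchanged and counts its bytes. */
class Relay {
    /** Copies everything from the end server to the client; returns bytes copied. */
    static long copy(InputStream fromServer, OutputStream toClient) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        // read() returns -1 once the end server has closed the connection.
        while ((n = fromServer.read(buf)) != -1) {
            toClient.write(buf, 0, n);
            total += n;
        }
        toClient.flush();
        return total;
    }
}
```

Because the status line, headers, and entity body all pass through the same loop, nothing in the response is rewritten on its way to the client.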
The server should be able to handle multiple simultaneous service requests in parallel. This means that the Web proxy server is multi-threaded. In the main thread, the server listens at a fixed port. When it receives a TCP connection request, it sets up a TCP connection socket and services the request in a separate thread.
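The thread-per-connection structure described above might look roughly like the following in Java. This is only a sketch under stated assumptions: AcceptLoop, Handler, and startWorker are hypothetical names, and the actual request handling is left to whatever Handler is passed in.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical sketch of the main accept loop. */
class AcceptLoop {
    /** Hypothetical callback that services one client connection. */
    interface Handler {
        void handle(Socket client) throws IOException;
    }

    /** Main-thread loop: accept a connection, then hand it to a new thread. */
    static void serve(ServerSocket listener, Handler handler) throws IOException {
        while (true) {
            Socket client;
            try {
                client = listener.accept();
            } catch (IOException e) {
                if (listener.isClosed()) {
                    return; // listener was shut down; stop accepting
                }
                throw e;
            }
            startWorker(client, handler);
        }
    }

    /** Services one connection in its own thread so the main thread can keep accepting. */
    private static void startWorker(Socket client, Handler handler) {
        Thread worker = new Thread(() -> {
            try (Socket c = client) { // the socket is closed when the request is done
                handler.handle(c);
            } catch (IOException e) {
                System.err.println("request failed: " + e.getMessage());
            }
        });
        worker.start();
    }
}
```

A main method could create the listener with new ServerSocket(port), where port is parsed from the program's single command-line parameter, and then call AcceptLoop.serve with a handler that performs the proxying.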
Logging:
The proxy server should keep track of all requests in a log file named proxy.log. Each log file entry should be of the form:
Date: browserIP URL size
where browserIP is the IP address of the browser, URL is the URL asked for, and size is the size in bytes of the object that was returned (essentially, the number of bytes received from the end server from the time a connection is opened to the time it is closed). You will need to synchronize access to the log file so that only one thread can modify it at a time. If you do not synchronize access, the log file is likely to be corrupted.
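One way to synchronize access to the log file in Java is to funnel every write through a single synchronized method, as in the sketch below. The class name ProxyLog and its methods are hypothetical, and the date is written using the default java.util.Date text form, which is an assumption since the assignment does not prescribe a date format.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Date;

/** Hypothetical helper: appends entries to the log file one thread at a time. */
class ProxyLog {
    private final String fileName;

    ProxyLog(String fileName) {
        this.fileName = fileName;
    }

    /** Formats one entry as "Date: browserIP URL size". */
    static String format(Date when, String browserIp, String url, long size) {
        return when + ": " + browserIp + " " + url + " " + size;
    }

    /** Appends one entry; synchronized so only one thread writes at a time. */
    synchronized void record(String browserIp, String url, long size) throws IOException {
        // Open in append mode so earlier entries are preserved.
        try (PrintWriter out = new PrintWriter(new FileWriter(fileName, true))) {
            out.println(format(new Date(), browserIp, url, size));
        }
    }
}
```

The lock belongs to the ProxyLog object, so this only protects the file if all worker threads share one instance, for example a single new ProxyLog("proxy.log") created at startup.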
Testing:
You should test and debug your proxy server initially using telnet as a client. Later, you can use a web browser as a client. For grading, your program will be tested on the Linux systems in the IT&E lab, so it is recommended that you compile and test your program in that environment before submitting it.
Submission Instructions:
You are asked to turn in your source files, a makefile if needed, and a README file. No matter what programming language you choose to use, your program should take a single parameter (its port number) on startup. You should also name your executable ProxyServer. If your program is written in Java, you should be able to launch your server using "java ProxyServer <port-number>". If your program is written in C/C++, you should be able to launch your server using "ProxyServer <port-number>". While testing your program, you should choose a port number above 1023 (e.g. 8080) so that it can be used by your program without requiring special privileges.
The README file should be in plain text format. It should contain a description of your design and state what is and what is not realized in your implementation. If your program requires any special compilation flag to build, you need to specify the full build command in the README file.
You should create a zip archive or tarball of all your files (source files, README file, and makefile, if needed) and email it to setia at cs.gmu.edu. You should also give me a hardcopy of your source files and README file in class.
Grading criteria:
Late turn-in policy:
Late turn-ins will be accepted for up to three days, with a 10% penalty for each late day. As discussed in class, you will have an opportunity to revise your program and re-submit for an improved grade. If you choose to re-submit, the final grade will be the average of the grades for the initial submission and the resubmission.