4-Week Programming Project
As a final project, you will write a script that acts as a simple web server. It will read an HTTP request from STDIN and write the corresponding HTTP response to STDOUT. To understand what a web server is and what it does, see this tutorial.
Overview
So far we have been using Perl for scripts that a user might run at the command line. In this assignment we move from local users to non-local users.
Your script should:
- Read an HTTP request out of STDIN
- Find the file in your document root which was requested
- Output an appropriate HTTP response
- Be capable of logging server activities
What Your Server-Script Should Do
Have a Hard-Coded Root Directory
Similar to the command line interpreter's search path, your 'web server' should have a path variable whose value is the name of the directory in which files available for download are located. This directory will hereafter be referred to as either the 'root directory' or the 'document root'. So, for example, if the script has a hard-coded root directory of '/home/r_leguen/htdocs' and receives a request with the URL '/stuff/index.html', it should look for and read the file '/home/r_leguen/htdocs/stuff/index.html'. This is the file whose contents will go in the HTTP response's body.
If the requested file turns out to be a directory, the script should look for a file named "index.html" in that directory.
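The mapping from URL to file can be sketched as follows. This is only an illustration: the helper name `resolve_path` is hypothetical, and stripping the query string before building the path is an assumption about how you might organise your own script.

```perl
use strict;
use warnings;

# Hypothetical helper: map a request URL to a path under the document root.
sub resolve_path {
    my ($root, $url) = @_;
    (my $path = $url) =~ s/\?.*$//s;    # drop any query string
    my $file = $root . $path;
    # A request for a directory falls back to the index.html inside it.
    if (-d $file) {
        $file =~ s{/*$}{};              # avoid a doubled slash
        $file .= '/index.html';
    }
    return $file;
}

print resolve_path('/home/r_leguen/htdocs', '/stuff/index.html'), "\n";
```

The example call uses the root directory and URL from the paragraph above.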
Populate the %ENV Hash
Your script should add the following key-value pairs to the pre-defined %ENV hash. Some of these values are read out of the HTTP request, so the script must parse the HTTP request's headers to populate the %ENV hash. Any headers in the HTTP request which aren't included below should also be put in the %ENV hash, with 'HTTP_' prefixed to the key and any spaces or dashes in the header's name turned into underscores.
So, if a request provides headers 'Content-Type: multipart/form-data' and 'User-Agent: Mozilla/4.0 (compatible; MSIE 4.5; Mac_PowerPC)', they should be inserted into the %ENV hash as the following key-value pairs:
'CONTENT_TYPE'=>"multipart/form-data"
'HTTP_USER_AGENT'=>"Mozilla/4.0 (compatible; MSIE 4.5; Mac_PowerPC)"
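A minimal sketch of that naming rule is shown below. The helper name `env_key` and the `%special` lookup are hypothetical; the three special cases are the headers that the table of values maps to a key of their own.

```perl
use strict;
use warnings;

# Headers that get a dedicated key instead of the HTTP_ prefix.
my %special = (
    CONTENT_TYPE   => 'CONTENT_TYPE',
    CONTENT_LENGTH => 'CONTENT_LENGTH',
    HOST           => 'SERVER_NAME',
);

# Hypothetical helper: turn a request header name into its %ENV key.
sub env_key {
    my ($name) = @_;
    (my $key = uc $name) =~ s/[ -]/_/g;    # spaces and dashes become underscores
    return exists $special{$key} ? $special{$key} : "HTTP_$key";
}

for my $line ('Content-Type: multipart/form-data',
              'User-Agent: Mozilla/4.0 (compatible; MSIE 4.5; Mac_PowerPC)') {
    my ($name, $value) = $line =~ /^([^:]+):\s*(.*)$/ or next;
    $ENV{ env_key($name) } = $value;
}

print "$_ => $ENV{$_}\n" for qw(CONTENT_TYPE HTTP_USER_AGENT);
```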
Here are all the values your script will have to add to the %ENV hash:
| Key | Request Header | Value |
|---|---|---|
| CONTENT_LENGTH | Content-Length | The length (in bytes) of the body of the HTTP request, if provided by the HTTP request's header of the same name. |
| CONTENT_TYPE | Content-Type | The content type of the body of the HTTP request, if provided by the HTTP request's header of the same name. Example: multipart/form-data |
| DOCUMENT_ROOT | – | The home directory of your "web server"; all requested URLs are considered to be relative to the document root. Example: /home/r/r_leguen/htdocs |
| QUERY_STRING | – (in the request line) | The query information obtained from the requested URL (anything after a '?' character). Example: hl=en&q=richard+le+guen&btnG=Google+Search&meta= |
| REQUEST_METHOD | – (in the request line) | The HTTP method used by this request. Provided by the HTTP request's 'request line'. Example: GET |
| SERVER_NAME | Host | The server's name or IP address. Provided by the HTTP request's 'Host' header. Example: www.leguen.ca or localhost |
| SERVER_PROTOCOL | – (in the request line) | The name and revision of the HTTP request's protocol. Provided by the HTTP request's 'request line'. Example: HTTP/1.1 |
| SERVER_SOFTWARE | – | The name and version of your script which is answering these requests… try to have fun here, and give your script a cool name! Example: Apache/1.3.31 (Unix) mod_ssl/2.8.19 OpenSSL/0.9.7l |
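Three of these values come from the request line rather than from a header. One possible way to split the request line is sketched below; the helper name `parse_request_line` and the extra `REQUEST_PATH` entry are hypothetical conveniences, not keys required by the table.

```perl
use strict;
use warnings;

# Hypothetical helper: split a request line into the values taken from it.
# Returns an empty list when the line is malformed.
sub parse_request_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                       # strip the line ending
    my ($method, $url, $protocol) =
        $line =~ m{^([A-Z]+) (\S+) (HTTP/\d+\.\d+)$}
        or return;
    my ($path, $query) = split /\?/, $url, 2;   # anything after '?' is the query
    return (
        REQUEST_METHOD  => $method,
        QUERY_STRING    => defined $query ? $query : '',
        SERVER_PROTOCOL => $protocol,
        REQUEST_PATH    => $path,               # illustrative extra, not in the table
    );
}

my %req = parse_request_line("GET /search?hl=en&q=richard+le+guen HTTP/1.1\r\n");
@ENV{qw(REQUEST_METHOD QUERY_STRING SERVER_PROTOCOL)} =
    @req{qw(REQUEST_METHOD QUERY_STRING SERVER_PROTOCOL)};
```

An empty return value is a natural place to trigger the "400 Bad Request" case.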
Read and Parse an HTTP Request
Your script should read the HTTP request, and determine what file it is requesting to download. It should generate the appropriate Content-Type and Content-Length headers, and output the file's contents in the response's body. Choose the Content-Type based on the file's extension:
| Extension | Content-Type |
|---|---|
| .html | text/html |
| .htm | text/html |
| .css | text/css |
| .txt | text/plain |
| .xml | text/xml |
| .gif | image/gif |
| .jpeg | image/jpeg |
| .jpg | image/jpeg |
| .png | image/png |
| .mp3 | audio/mpeg |
| .pdf | application/pdf |
| other | application/octet-stream |
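The table translates naturally into a hash lookup. In this sketch the names `%type_for` and `content_type` are hypothetical, matching the extension case-insensitively is an assumption, and the `pdf` key assumes that the application/pdf row refers to the '.pdf' extension.

```perl
use strict;
use warnings;

my %type_for = (
    html => 'text/html',  htm => 'text/html',  css => 'text/css',
    txt  => 'text/plain', xml => 'text/xml',   gif => 'image/gif',
    jpeg => 'image/jpeg', jpg => 'image/jpeg', png => 'image/png',
    mp3  => 'audio/mpeg',
    pdf  => 'application/pdf',    # assumed extension for application/pdf
);

# Hypothetical helper: choose a Content-Type from a file name's extension.
sub content_type {
    my ($file) = @_;
    my ($ext) = $file =~ m{\.([^./]+)$};
    return 'application/octet-stream' unless defined $ext;
    my $type = $type_for{ lc $ext };
    return defined $type ? $type : 'application/octet-stream';
}

print content_type('/stuff/index.html'), "\n";
```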
Keep a Logfile
If your script is run with the 'l' command line option set, it should write to a log file. Since your script only runs for the duration of one request, it must append output to the logfile. Every time your script is invoked, it must append a line to the logfile similar to the following:
[10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "Mozilla/4.08 [en] (Win98; I ;Nav)"
This logging includes the following information:
- The current time and date when the script finished responding to the request, in square brackets, '[' and ']'.
- The request line from the HTTP request (in quotation marks).
- The numeric part of the response's status code.
- The size, in bytes, of the HTTP response's body.
- The value of the User-Agent header from the request.
Be sure to document in your readme.txt file how to determine or change the name of the log file.
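A log line in this format could be produced roughly as follows. The function names `log_line` and `append_log` are hypothetical, and the sketch assumes your platform's `strftime` supports the `%z` time zone offset.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format one log line in the layout shown above.
sub log_line {
    my ($request_line, $status, $bytes, $agent, $time) = @_;
    $time = time unless defined $time;
    my $stamp = strftime('%d/%b/%Y:%H:%M:%S %z', localtime($time));
    return sprintf qq{[%s] "%s" %d %d "%s"\n},
        $stamp, $request_line, $status, $bytes, $agent;
}

# Hypothetical helper: open in append mode so earlier entries are preserved.
sub append_log {
    my ($logfile, $line) = @_;
    open my $fh, '>>', $logfile or die "Cannot append to $logfile: $!";
    print {$fh} $line;
    close $fh or die "Cannot close $logfile: $!";
}

print log_line('GET /apache_pb.gif HTTP/1.0', 200, 2326,
               'Mozilla/4.08 [en] (Win98; I ;Nav)');
```

Opening the file with '>>' matters because each invocation of the script handles only one request.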
Output an HTTP Response
Your script should write an HTTP response to STDOUT. There are two main cases you have to consider: responses to requests for '.cgi' files (server-side scripts) and responses to requests for any other files. There are also some special [error] cases.
Requests For Non-Cgi (Normal) Files
If the request's URL corresponds to an existing file in the document root, the body of the response should consist of the contents of that file (see the paragraph on the document root, above).
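Assembling such a response might look like the sketch below. The names `build_response` and `slurp` are hypothetical, and the 'HTTP/1.1' in the status line is an assumption; you may prefer to echo the protocol from the request.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble a complete response from a status and a body.
sub build_response {
    my ($status, $type, $body) = @_;
    return join "\r\n",
        "HTTP/1.1 $status",
        "Content-Type: $type",
        'Content-Length: ' . length($body),
        '',                               # blank line separates headers from body
        $body;
}

# Hypothetical helper: read a whole file as raw bytes, so that
# Content-Length counts bytes rather than characters.
sub slurp {
    my ($file) = @_;
    open my $fh, '<', $file or return;
    binmode $fh;
    local $/;                             # read everything in one go
    my $data = <$fh>;
    close $fh;
    return $data;
}

print build_response('200 OK', 'text/plain', "hello\n");
```

When the body is an image or another binary file, calling `binmode STDOUT` before printing keeps the bytes intact.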
Requests For Server-Side Scripts
If your 'server' receives a request for a file which ends in a '.cgi' extension, it handles the request differently from requests for other files.
If the requested .cgi file has executable permissions set, the web server script should, after populating the %ENV hash, output only a status line: no headers and no response body. It should then use system() to execute the '.cgi' file, which will be responsible for outputting the headers and response body.
You will have to write and submit two '.cgi' scripts, written in Perl.
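One way to structure this step is sketched below. The helper name `run_cgi` and its return values are hypothetical. Turning on autoflush is a design choice: it ensures the status line reaches STDOUT before the child script starts writing.

```perl
use strict;
use warnings;

# Hypothetical helper: handle a request for a '.cgi' file and
# return the status that was (or should be) sent.
sub run_cgi {
    my ($file) = @_;
    return '404 Not Found'             unless -e $file;
    return '500 Internal Server Error' unless -x $file;
    $| = 1;                          # flush our output before the child writes
    print "HTTP/1.1 200 OK\r\n";     # status line only
    system($file);                   # the script prints its own headers and body
    return '200 OK';
}
```

Because %ENV is inherited by the child process, the '.cgi' script sees every value the server placed there.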
CGI Script: Interpreting a Post Request
The first will be a file upload script. It should be able to respond to an HTTP request sent from a form like this one: (note that you should copy-paste this HTML into a file in your document root for it to work)
```html
<html>
  <head>
    <title>File Upload Page</title>
  </head>
  <body>
    <div style='text-align:center;'>
      <h1>File Upload Page</h1>
      <form enctype='multipart/form-data'
            method='POST' action='file-upload.cgi'
            style='border:1px solid black;width:100%;
                   text-align:center;display:block;margin:auto;'>
        <p>
          Place this html file somewhere in your document root.
          You will have to write a script which
          - When you hit the "upload" button -
          saves the uploaded file somewhere in the document root.
        </p>
        <input type='file' name='uploaded-file' />
        <input type='submit' value='upload' />
      </form>
    </div>
  </body>
</html>
```
The request body will contain data for a file upload. That data should be read out of the request body, and the uploaded file should be saved somewhere in the document root. The response body should be HTML content explaining that the file has been uploaded and where it was saved, and it should provide a link to download the file again. If the request body does not contain the data needed for a file upload, the response should be no different from that for a GET request.
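Extracting the file from a multipart/form-data body can be sketched as follows. The helper name `extract_upload` is hypothetical, and the sketch assumes a single uploaded file and a boundary that never appears inside the file data. It also discards any directory part of the client's file name, which is an extra precaution rather than a stated requirement.

```perl
use strict;
use warnings;

# Hypothetical helper: return (filename, data) for the first uploaded file,
# or an empty list if the body holds no file upload.
sub extract_upload {
    my ($content_type, $body) = @_;
    my ($boundary) = $content_type =~ /boundary="?([^";]+)"?/ or return;
    # Parts are separated by lines of the form "--boundary".
    for my $part (split /(?:\r\n)?--\Q$boundary\E(?:--)?(?:\r\n)?/, $body) {
        my ($head, $data) = split /\r\n\r\n/, $part, 2;
        next unless defined $data;
        next unless $head =~ /filename="([^"]*)"/;
        my $name = $1;
        $name =~ s{.*[/\\]}{};        # keep only the base name
        next unless length $name;     # no file was selected in the form
        return ($name, $data);
    }
    return;
}

my $body = join "\r\n",
    '--XyZ',
    'Content-Disposition: form-data; name="uploaded-file"; filename="a.txt"',
    'Content-Type: text/plain',
    '',
    'hello',
    '--XyZ--',
    '';
my ($name, $data) = extract_upload('multipart/form-data; boundary=XyZ', $body);
print "$name: $data\n";
```

In the real script, the body would be read from STDIN (after `binmode STDIN`) using the length given in CONTENT_LENGTH, and the boundary would come from CONTENT_TYPE.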
CGI Script: Interpreting a Get Request
The second '.cgi' script will read a parameter "LinuxCommand" out of the query string. If it finds the parameter "LinuxCommand", it returns a plain text response body which is the man page for that Linux command. See this tutorial if you don't remember the man pages. Otherwise, it returns HTML content which explains that the parameter was missing.
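A sketch of the parameter lookup is given below. Both helper names, `query_param` and `man_page`, are hypothetical. Restricting the command name to a small set of characters and opening `man` in list form (so no shell is involved) are precautions assumed here, not requirements stated above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the decoded value of one query string parameter,
# or undef if the parameter is absent.
sub query_param {
    my ($query, $wanted) = @_;
    for my $pair (split /[&;]/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && $name eq $wanted;
        $value = '' unless defined $value;
        $value =~ tr/+/ /;                                  # '+' encodes a space
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;      # decode %XX escapes
        return $value;
    }
    return;
}

# Hypothetical helper: return the man page text, or undef for an unsafe name.
sub man_page {
    my ($cmd) = @_;
    return unless $cmd =~ /\A[\w.+-]+\z/;
    open my $man, '-|', 'man', $cmd or return;
    local $/;
    my $text = <$man>;
    close $man;
    return $text;
}

my $cmd = query_param('hl=en&LinuxCommand=ls', 'LinuxCommand');
print defined $cmd ? "would show: man $cmd\n" : "parameter missing\n";
```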
Special Cases
There are special cases where your script should produce an error page. This error page should contain the status code, a message about what the problem might be and the server information. Try creating errors on the Concordia servers and see the kinds of information they print: click here to see a Not Found error page.
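An error page with those three pieces of information could be generated along these lines. The names `error_page` and `escape_html`, the page layout, and the 'DemoServer/0.1' server string are all hypothetical.

```perl
use strict;
use warnings;

my %reason = (
    400 => 'Bad Request',        403 => 'Forbidden',
    404 => 'Not Found',          405 => 'Method Not Allowed',
    500 => 'Internal Server Error',
);

# Escape characters that would otherwise be read as HTML markup.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: return (status, html body) for an error code.
sub error_page {
    my ($code, $message, $server) = @_;
    die "unknown status code $code" unless exists $reason{$code};
    my $status = "$code $reason{$code}";
    my $html = "<html>\n<head><title>$status</title></head>\n<body>\n"
             . "<h1>$status</h1>\n"
             . '<p>' . escape_html($message) . "</p>\n"
             . '<hr />' . "\n"
             . '<address>' . escape_html($server) . "</address>\n"
             . "</body>\n</html>\n";
    return ($status, $html);
}

my ($status, $html) = error_page(404,
    'The requested URL /missing.html was not found on this server.',
    'DemoServer/0.1');
print $html;
```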
Bad Request
If the HTTP request is invalid (poorly formatted) the response's status code should be "400 Bad Request" (case sensitive) and the body should be HTML content (and thus the Content-Type header should be…) which displays an appropriate message reflecting the fact that the request was invalid.
Not Found
If the requested file does not exist, the generated response's status code should be "404 Not Found" (case sensitive) and the body should be HTML content (and thus the Content-Type header should be…) which displays an appropriate message reflecting the fact that the file does not exist.
Forbidden
If the server is unable to read the requested file due to permissions, the status code should be "403 Forbidden" (case sensitive) and the body should be HTML content (and thus the Content-Type header should be…) which displays an appropriate message reflecting the fact that the file is not readable due to permissions.
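The Not Found and Forbidden cases can be told apart with Perl's file test operators. In this sketch the helper name `file_status` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pick a status for a request for a normal file.
sub file_status {
    my ($file) = @_;
    return '404 Not Found' unless -e $file;    # no such file
    return '403 Forbidden' unless -r $file;    # exists but is not readable
    return '200 OK';
}

print file_status('/no/such/file/here'), "\n";
```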
Internal Server Error
A request for a .cgi file which does not have executable permissions set should produce an HTTP response with status code "500 Internal Server Error" (case sensitive) and the body should be HTML content which explains that the server "encountered an internal error or misconfiguration and was unable to complete the request."
Method Not Allowed
Lastly, your script should respond only to requests whose request method (in the request line, the first line of the HTTP request) is either GET or POST. Only requests for '.cgi' files (see above) are permitted to use POST. For any method other than GET or POST, or for a POST request for a non-cgi file, the HTTP response should have a status code of "405 Method Not Allowed" (case sensitive) and the body should be HTML content (and thus the Content-Type header should be…) which explains that only GET requests are permitted.
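The rule can be expressed as a small predicate. In this sketch the helper name `method_allowed` is hypothetical, and method names are compared case-sensitively.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the method may be used for the requested path.
sub method_allowed {
    my ($method, $path) = @_;
    return 1 if $method eq 'GET';
    return 1 if $method eq 'POST' && $path =~ /\.cgi$/;   # POST only for '.cgi'
    return 0;
}

for my $case (['GET', '/index.html'], ['POST', '/file-upload.cgi'],
              ['POST', '/index.html'], ['DELETE', '/index.html']) {
    my ($method, $path) = @$case;
    printf "%-6s %-17s %s\n", $method, $path,
        method_allowed($method, $path) ? 'allowed' : '405 Method Not Allowed';
}
```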
What to Submit
You should submit an archived (zipped) file containing your 'web server' script, at least one 'index.html' file for download, at least the two '.cgi' scripts, and a 'readme.txt' file explaining your script. Your 'readme.txt' should explain how to change the root directory of your web server, as well as how to choose the log file.
If you face any particular challenges, feel the assignment is ambiguous and have to make assumptions, or are uncertain of one of your solutions, explain this in your readme.txt file to potentially get partial marks.
You've seen everything you'll need for this assignment in the tutorials unless it is specified otherwise above. Do not go looking for extra Perl modules to use! Students lost marks for using non-standard Perl modules they didn't understand in past assignments.
Comment your code well, or you will be marked poorly.