CSCB09 2023 Summer Assignment 4

Due: August 9 Wednesday 11:59PM
This assignment is worth 10% of the course grade.

This assignment is on network stream socket programming.

As usual, you should aim for reasonably efficient algorithms and reasonably organized, comprehensible code.

Correctness (mostly auto-testing) is worth 90% of the marks; code quality is worth 10%.

HandMeUp File Submission System

The protocol in this assignment is a stripped-down toy example of submitting files to a central server. In short, a client sends a user name (we omit authentication tokens WLOG), a file name, the file size, and the file content. The server stores the file and sends back a serial number.

Since stream sockets are used, there is no “packet boundary”. We use newlines as delimiters. (In the real world, most protocols use CR-LF. None uses NUL; some protocols actually carry NUL as data, not as an end-of-string marker.)

Here is the detailed protocol sequence (IF neither clients nor servers malfunction):

The client connects to the server.
The client sends:
1. user name (1 to 8 letters or digits), newline
2. file name (1 to 100 bytes, should not contain '/' or newline), newline
3. file size (1 to 10 digits), newline
4. file content
Again, there is no guarantee that the above comes as one chunk, or four chunks nicely aligned with the four items, or one chunk for the first three then one chunk for the file content, or whatever. Splitting at any point is possible. You only have newlines and the promised file size to rely on.
The server checks that the user name, the file name, and the file size satisfy the constraints stated above.

If not, the server sends HDERR, newline; then the server disconnects. (“HD” stands for “header”.)
The server receives the number of bytes as promised by the client. (File storage is specified in a later section.)
The server sends a serial number (1 to 10 digits), newline. (Serial numbers are specified in a later section.)

Again, there is no guarantee that even these bytes (at most 11) arrive unsplit.
Both sides disconnect.
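
The header constraints above can be sketched as three small predicates. This is a minimal sketch, not the reference solution: the function names are hypothetical, and it assumes each field has already been read into a NUL-terminated string with its trailing newline removed. It also assumes the default "C" locale, in which isalnum() accepts only ASCII letters and digits.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* User name: 1 to 8 letters or digits. */
bool valid_username(const char *s) {
    size_t n = strlen(s);
    if (n < 1 || n > 8) return false;
    for (size_t i = 0; i < n; i++)
        if (!isalnum((unsigned char)s[i])) return false;
    return true;
}

/* File name: 1 to 100 bytes, no '/'. (A newline cannot appear here
   because the newline is what ended the field.) */
bool valid_filename(const char *s) {
    size_t n = strlen(s);
    return n >= 1 && n <= 100 && strchr(s, '/') == NULL;
}

/* File size: 1 to 10 decimal digits. */
bool valid_filesize(const char *s) {
    size_t n = strlen(s);
    if (n < 1 || n > 10) return false;
    for (size_t i = 0; i < n; i++)
        if (!isdigit((unsigned char)s[i])) return false;
    return true;
}
```

If any predicate returns false, the protocol calls for sending HDERR and a newline, then disconnecting.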

Client [4 marks]

Implement a client program hmu-client.c. The 4 command line arguments are: server address in dot notation, server port number, user name, file name. (When marking, they will be valid, and the file will be a readable regular file with size below 2³².) The client program is responsible for determining the file size.
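
One possible way for the client to determine the file size is stat(). The sketch below is only an illustration under that assumption; the helper name regular_file_size is hypothetical, and other approaches (fstat() on an open descriptor, for example) work as well.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Returns the size in bytes of the regular file at path,
   or -1 if stat() fails or the path is not a regular file. */
long long regular_file_size(const char *path) {
    struct stat st;
    if (stat(path, &st) == -1) {
        perror("stat");
        return -1;
    }
    if (!S_ISREG(st.st_mode)) return -1;
    return (long long)st.st_size;
}
```

The result is what the client would send as the file size line of the header.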

If opening the file for reading fails, or the IP address or the port number is invalid, or connection is unsuccessful, you may print an error message of your choice to stderr, and exit with a non-zero exit code.

If connection is successful, follow the protocol. At the end, print the received serial number to stdout before exiting (with exit code 0).

Server malfunctions happen all the time due to bugs and service disruptions. Here are the scenarios you must handle as prescribed:

Error when sending data, or error or EOF when trying to receive the serial number: Server disruption. Exit with a non-zero exit code or get terminated by SIGPIPE; you may print error messages to stderr.
When expecting a serial number: receiving non-digits, or more than 10 bytes, or no newline after: Server bug. Do not treat it as a valid serial number. Exit with a non-zero exit code; you may print error messages to stderr.
Extra bytes after the serial number and the newline: Just ignore.
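
The serial number rules above can be sketched as a check over whatever bytes have been received so far. This is a hedged sketch, not the reference solution: parse_serial and its three-way return convention are hypothetical. The caller is assumed to keep receiving while the result is 0 and to treat EOF or an error at that point as a server disruption.

```c
#include <ctype.h>
#include <stddef.h>

/* Examines the len bytes received so far.
   Returns  1 and stores the value in *out if they start with
              1 to 10 digits followed by a newline (extra bytes ignored);
   returns  0 if they are a valid prefix but the newline has not arrived;
   returns -1 if the reply is malformed (server bug). */
int parse_serial(const char *buf, size_t len, unsigned long long *out) {
    unsigned long long value = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (c == '\n') {
            if (i == 0) return -1;   /* newline with no digits before it */
            *out = value;
            return 1;
        }
        /* Non-digit, or an 11th byte that is not the newline. */
        if (!isdigit(c) || i >= 10) return -1;
        value = value * 10 + (unsigned long long)(c - '0');
    }
    return 0;
}
```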

Marking will test your client against a correct server and some malicious servers.

Server [6 marks]

Implement a server program hmu-server.c. The 2 command line arguments: port number to bind to, pathname of a helper program (explained in a later section). (When marking, they will be valid.)

The server should bind to the given port at address INADDR_ANY. We do expect this to fail all the time due to the “address already in use” error. If this happens, print an error message to stderr and exit with a non-zero exit code.
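
As an illustration of the binding step, here is a minimal sketch of creating a listening socket on INADDR_ANY. The helper name make_listener and the backlog of 5 are assumptions made for this example. It reports failure to the caller, which would then exit with a non-zero exit code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns a listening socket bound to INADDR_ANY:port, or -1 on failure
   (for example "address already in use"), after printing to stderr. */
int make_listener(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        perror("socket");
        return -1;
    }
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        perror("bind");
        close(fd);
        return -1;
    }
    if (listen(fd, 5) == -1) {
        perror("listen");
        close(fd);
        return -1;
    }
    return fd;
}
```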

A server should be responsive to multiple existing and incoming clients concurrently, even when a bad client stalls and wants the server to wait forever; bad clients happen all the time by bad luck, bugs, or malice. Well-known approaches are: forking a child process for each client, so the parent is just an accept-fork loop; or multiplexing by select() or epoll(); or multi-threading. You may choose which approach you want. (Forking is the easiest. Here is an indirect test on how well you can integrate multiple topics from this course and realize that the child process can handle split data with almost no code of your own. If you fail that test, you don’t necessarily lose marks (in this course anyway), you just lose hair and time re-inventing the wheel, which translates to losing marks in other courses and/or general misery.)

Busy polling is disallowed. Marking will be done under a tight limit on CPU time.

If you use forking: Zombie processes should not happen. And yet, the parent process should not hang indefinitely to wait for a child to terminate, since it must also stay responsive to new connection requests. (Here is an indirect test on how well you paid attention to lectures and realize that there is a dead simple one-liner solution. If you fail that test… you get my point.)

The server should not terminate until SIGTERM or SIGINT. Upon those signals, it should be terminated by the signal (least work, default action) or exit with exit code 0. (If you use forking, obviously this is required of the parent process only.)

Client malfunctions happen all the time, even more than server malfunctions. The Internet is full of fools, trolls, and foolish trolls. Here are the scenarios you should handle as prescribed:

Header errors: As covered in the protocol description.
Premature EOF when receiving header or file content: The client is gone. Just disconnect. (Do not send a serial number.)
Longer file content than the client promised: Ignore the extra bytes. (Proceed to sending the serial number.)

Marking will test your server against correct clients and some malicious clients.

Serial Numbers And File Storage

The server maintains a serial number. It starts from 0 when the server starts, and increases by 1 (post-increment) for every successful accept(). To be concrete, the first client connection gets serial number 0.

The serial number is for both saving the received file and sending to the client at the end.

The received file should be saved in the current working directory under the name user-serial-filename. Example: If the user name is trebla, the client-supplied filename is foo.c, and the serial number is 4, then save to trebla-4-foo.c. We assume no errors writing the file.
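
The naming rule can be sketched with snprintf(). The helper name make_save_name is hypothetical. With the header limits above, the longest possible name is 8 + 1 + 10 + 1 + 100 = 120 bytes plus the terminating NUL, so a buffer of 121 bytes or more suffices.

```c
#include <stdio.h>

/* Writes "user-serial-filename" into out (capacity cap bytes).
   Returns the length of the name, or -1 if it does not fit. */
int make_save_name(char *out, size_t cap, const char *user,
                   unsigned long serial, const char *filename) {
    int n = snprintf(out, cap, "%s-%lu-%s", user, serial, filename);
    if (n < 0 || (size_t)n >= cap) return -1;
    return n;
}
```

For user name trebla, serial number 4, and file name foo.c, this produces trebla-4-foo.c, matching the example above.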

If fewer bytes than the client promised are received, delete the saved file.

Helper Program

If you choose forking, you can also choose to have the child exec() a different program dedicated to handling a client. This can lead to tidy code and safeguard against common mistakes, e.g., accidentally running parent code in the child, which would call accept() and even fork(), hence a potential fork bomb.

If you choose to do this, put the code in hmu-helper.c. You create your own convention for how it knows the socket FD, the filename, the serial number, etc. (e.g., make your own command line arguments).

Marking may compile it to any filename under any directory, so please do not hardcode the filename in the exec() call. Instead, please use the 2nd command line argument of the server program. Marking will provide the right pathname.

If exec() fails (meaning typo in the pathname), please call exit() with a non-zero exit code. This safeguards against accidentally running parent code and calling fork() and… you get the point.

If you choose not to use a helper program (or you cannot because you prefer select()), please hand in the unchanged starter file for hmu-helper.c, just to make sure it still compiles when marking. Then your server code can ignore the 2nd argument.

File Size vs Memory Size

Do not assume that you have enough memory for the whole file. This applies to both clients and servers. This will be tested.

On the bright side, we assume that reading and writing regular files do not block.
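
One way to respect the memory constraint is to move the file through a small fixed-size buffer. The sketch below assumes plain read() and write() on file descriptors; the helper name copy_n_bytes and the 4096-byte buffer size are choices made for this example only.

```c
#include <errno.h>
#include <unistd.h>

/* Copies up to n bytes from in_fd to out_fd through a fixed buffer.
   Returns the number of bytes copied; a result smaller than n means
   EOF or an error occurred before n bytes were transferred. */
unsigned long long copy_n_bytes(int in_fd, int out_fd, unsigned long long n) {
    char buf[4096];
    unsigned long long done = 0;
    while (done < n) {
        size_t want = sizeof buf;
        if (n - done < want) want = (size_t)(n - done);
        ssize_t r = read(in_fd, buf, want);
        if (r == -1 && errno == EINTR) continue;  /* interrupted: retry */
        if (r <= 0) break;                        /* EOF or error */
        ssize_t off = 0;
        while (off < r) {                         /* handle short writes */
            ssize_t w = write(out_fd, buf + off, (size_t)(r - off));
            if (w == -1 && errno == EINTR) continue;
            if (w == -1) return done;
            off += w;
            done += (unsigned long long)w;
        }
    }
    return done;
}
```

Because it never asks for more than n bytes, the same loop also leaves any extra bytes beyond the promised size unread.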

Bonus Question [0 marks]

I have specified incrementing the serial number per client connection, meaning it increases even if no valid file is received (e.g., header error, too few bytes of file content).

What if I required instead: increment per valid file after receiving it?

Why is it trivial if you use select/epoll() or multi-threading?

If you use fork(), why is it hard to do without introducing a race condition? How could it be done?

(Don’t worry, select/epoll() makes something else hard, and multi-threading is hard to learn. The glass of water is always half empty, the other pasture is always greener, and every silver lining has tarnish.)

Debugging And Error Messages

If you like to print debugging or error messages for your own sake, please send them to stderr only.

Good-Citizen Policy

Marks can be deducted from this assignment if, on the Mathlab server or BV lab PCs, you have left-over processes that have been consuming more than 24 hours of CPU time (the TIME field in ps, top, and htop).

Testing Tips

Randomize Port Number

When you run a server on Mathlab, since everyone is doing the same, you should randomly choose a port number between 1024 and 65535 based on ((student number × 331) mod 64439) + 1024. If that still gives “address in use”, add 1 and repeat.
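
The port formula can be checked with a one-line function. The name suggested_port is hypothetical, and the student numbers in the usage note are made up for illustration.

```c
/* ((student number * 331) mod 64439) + 1024, as given above.
   The result always lies between 1024 and 65462 inclusive. */
unsigned int suggested_port(unsigned long long student_number) {
    return (unsigned int)((student_number * 331ULL) % 64439ULL) + 1024U;
}
```

For instance, a made-up student number of 195 gives 195 × 331 = 64545, then 64545 mod 64439 = 106, and finally 106 + 1024 = 1130.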

Manual Testing by `nc`

The nc program can let you manually act as one side to hand-test the other side. You enter to stdin what to send; you see received data on stdout. Quickstart:

To act as a client: nc [-v] [-q 1] DOTADDRESS PORT
To act as a server: nc [-v] [-q 1] -n -l [DOTADDRESS] PORT. Note that this calls accept() only once, at the beginning. It only serves one client, then quits.

Mathlab Server, PC Client

Mathlab is behind a firewall. A firewall blocks most ports for safety, including ports we need for testing this assignment. ssh can help solve this problem.

If you have a server running on Mathlab at port sssss, e.g.:

mathlab$ /path/to/server sssss /path/to/helper

Then “ssh local forwarding” allows connecting from your PC. Pick a random port number xxxxx (criterion: available on your PC). Then the ssh command goes like:

my-pc$ ssh -L xxxxx:127.0.0.1:sssss utorid@mathlab.utsc.utoronto.ca

Tell your client on your PC that the server address and port are:

my-pc$ /path/to/client 127.0.0.1 xxxxx user file

PC Server, Mathlab Client

Your home router has a firewall; Windows adds an extra one. A firewall blocks most ports for safety, including ports we need for testing this assignment. ssh can help solve this problem.

If you have a server running on your PC at port sssss, e.g.:

my-pc$ /path/to/server sssss /path/to/helper

Then “ssh remote forwarding” allows you to connect from Mathlab. Pick a random port number xxxxx (criterion: available on Mathlab). Then the ssh command goes like:

my-pc$ ssh -R xxxxx:127.0.0.1:sssss utorid@mathlab.utsc.utoronto.ca

Tell your client on Mathlab that the server address and port are:

mathlab$ /path/to/client 127.0.0.1 xxxxx user file

Sample Clients And Servers

I will have sample clients and servers (exe only, clearly) available on Mathlab next week.

Handing In

Please hand in hmu-client.c, hmu-server.c, hmu-helper.c.