Secure Distributed Backup:
Erasure Codes and Anonymous Message Delivery
Christopher Val Studholme
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto
Copyright © 2007 by Christopher Val Studholme
We address the problem of backing up one's important data files in an efficient, secure and robust manner by suggesting that copies of such files be sent to untrusted remote peers over the Internet. To provide incentive for such peers to store copies of your files, you would be required to devote some of your surplus disk space to storing copies of their files. In this manner, your data would be distributed widely about the planet and would thus be immune to loss due to all but the most global of disasters. We address a few of the major challenges that must be overcome to make such a network a reality.
First, we note that since remote peers are inherently unreliable, your data files must have some form a redundancy introduced to ensure complete recovery when needed; however, since wholesale replication of the files is neither space nor bandwidth efficient, we suggest the use of an erasure correcting code. We evaluate the use of a few existing codes for this task (Chapters 2 and 3) and propose a new class of codes, called windowed erasure codes, which aim to minimize time complexity while maintaining the greatest probability of successful recovery (Chapter 4).
To help ensure the security of one's data and to avoid the problem of an adversary engaging in a selective denial of service attack against a single participant, we propose the use of an anonymous message delivery technique for distributing file data (Chapter 6). The protocol we propose aims to be efficient in its use of network bandwidth while remaining secure in the face of clients who choose to collude against the others. Also, as a necessary primitive, we develop a multiparty protocol for generating secret permutations (Chapter 5).
Finally, a practical backup application will require a method of encoding a hierarchy of files and the changes to those files over time into a sequence of fixed length blocks to be used with the above protocols. For this, we propose an algorithm based on a rolling checksum which is capable of identifying and removing redundant blocks of data located anywhere within a directory hierarchy (Chapter 7).
My supervisory committee consisted of:
You may download a complete copy of the thesis from this site in either PDF (756kiB) or postscript (1.68MiB) format. Feel free to make a verbatim copy of the thesis for anyone who may be interested.
Papers related to this thesis: