Building Trust in Numbers
Information becomes very vulnerable when it is transformed into electronic form. Unlike their paper representations, electronic documents can be modified or eradicated fairly easily, leaving no traces behind. This can be a significant problem now that all kinds of documents, ranging from personal address books and census data to works of literature and music, appear with increasing frequency on the Internet or elsewhere in online electronic formats only. My research explores techniques for building trust within information repositories, which consist of a very large number of independent repositories and are run by mutually distrustful parties in the absence of a common supervisor whom everyone trusts.
My goal is to understand how to design, evaluate, and implement trusted information repositories run by very large communities of mutually distrustful strangers. The importance of such communities lies in the fact that they organize and maintain themselves without supervision by a universally trusted arbiter, and yet can be designed to remain impenetrable to the kinds of corruption that a small number of powerful organizations or individuals could cause.
Similar unsupervised communal repository services, often called "peer-to-peer services" in the Distributed Systems field of Computer Science, are fairly well known and are in popular use today. However, they are fundamentally not trusted since they offer no assurances on the authenticity of the information they maintain. Building trusted peer-to-peer information repositories can be essential in protecting freedom of speech and personal privacy, as well as deploying collaborative, incorruptible shared archives of information around the world.
One way an author can protect a sensitive document at the time of publication is to distribute it across multiple computers that are run by different individuals or organizations across different parts of the network and the world. When a reader needs to retrieve the document, he or she gathers a number of the copies originally distributed by the author. If the copies differ, because of foul play or benign failure, the version on which most copies agree, i.e., the one having the majority of the copyholders' opinions, is considered "official," thereby resolving the conflict. If all but a few of the computers holding copies remain incorrupt, then the document will survive and be accessible to others. A single malicious adversary wishing to subvert the document would have to either trick all readers into asking their computers for copies of the document, thinking they are requesting copies from distinct copy holders, or subvert all copies of the document distributed to computers around the world; both of these undertakings can be made arbitrarily expensive. Therefore, an author can be assured that their documents will defy eradication, prohibition, and unauthorized alteration even against the wishes of a small number of very powerful adversaries like governments, religions, and natural disasters.
This general approach relies on the assumption that no individual can afford to own, corrupt, coerce into corruption, or attack a significantly large fraction of the computers in the world. For example, a national government may be able to coerce through legal means all computers within its jurisdiction to not carry a specific document; it is unlikely, however, that any single government would be able to coerce many computers around the world to not carry that same document. Unfortunately, there is no straightforward way to assert that Computer A and Computer B can or cannot be biased against a document at the same time. Virtually all developed systems of this variety assume that biases against a document are independent. For example, they assume that corrupting any two computers against a document is at least twice as hard as corrupting one. In practice, however, this is not the case.
My research goals are to develop methods to evaluate distributed archival services against particular user requirements and to allow them to deal with interdependent biases in realistic settings. First, I intend to study the fundamental factors that influence the design of such systems. Some of these are availability, assurance, and storage overhead. Availability is a measure of how likely it is that the system will not be too busy to perform a task. Assurance is a measure of how convinced a user should be that the information they receive has not been compromised. Storage overhead is a measure of how much more hard disk space must be devoted to a document held within the system as opposed to if the document were just stored on its author's home computer. Although many distributed archival services are currently in development or use, they have not been studied comprehensively or compared convincingly to each other.
Second, I will address the major unresolved issue described above agreement in the face of dependent biases. To that end, I am exploring the concept of diversified majorities. When deciding which document version is official, the version supported by a more diverse, arguably independent group of copyholders, wins out. Arguable dependences are based on a number of common sense criteria, such as national boundaries, institutional affiliation, network connectivity, etc. For example, if two document versions are supported by two votes each, but one pair comprises a computer in Brazil and a computer in the U.S., whereas the other pair only comprises computers in the U.S., the former wins out. Although it is possible for a computer in Brazil and a computer in the U.S. to be interdependent (i.e. if they are owned by the same corporation), one can argue that such a dependence is less likely than a dependence between two computers in the U.S., where the owners of two distinct computers still have to abide by the same federal laws, court rulings, etc.
I am also looking into ways to infer which copyholders are independent, in a manner similar to the witch-hunts of the past. Periodically, say every month, all participants indulge in a rampant war of accusations, where each accuses other participant groups of collusion. For example, a large company, through its participants, would accuse all participants from its competitor of collusion. Once all participants reach agreement on who accuses whom (there are ways of doing that even in the presence of uncooperative participants) groups of computers that have not been accused of collusion by anyone are considered independent. Publishing this map of who is considered dependent on whom can allow document authors to distribute document copies to highly independent groups of computers, thereby making it harder to sway what the majority of copyholders think of as the official document. Of course, the map itself also has to be protected against tampering. However, since map updates, resulting from the periodic witch-hunts, occur infrequently compared to normal document publications, they can be made much harder to compromise by substantially increasing the number of map copies distributed around the network.
Though my research on this topic has just begun, I hope through the agenda I described above to help ensure the long-term well being of digital information on the Internet.
|Modified 15 January 2003 * Contact Us|