When you install the site software RPMs on your web server, you join the Reciprocal Net Site Network, an Internet community of crystallography labs that donate their time and their servers to the project. The primary function of the Reciprocal Net site software you installed on your server is to act as a web application, allowing authenticated users to submit and alter sample data and metadata, and unauthenticated users to access data and metadata for a select set of public samples. This all takes place in the standard web application paradigm, where human users direct their web browsers to your web site.
The Reciprocal Net Site Network performs a considerable amount of behind-the-scenes management in order to keep its distributed database consistent. The sites in the network frequently exchange messages and files with one another and with the Reciprocal Net Coordinator. At your site, this function is performed by recipnetd, the Reciprocal Net daemon. The remainder of this section provides an overview of this interaction.
Your site will disclose sample data and metadata in the following circumstances:
· when requested by unauthenticated users via the web application, provided the sample has been Released to Public.
· when requested by authenticated human users via the web application. The authenticated user is authorized to access a sample if any of the following conditions are met: 1) the authenticated user’s account belongs to the same lab from which the sample originated, 2) the authenticated user’s account belongs to the same provider from which the sample originated, 3) the authenticated user’s account has been authorized explicitly by an entry in the sample’s access control list, or 4) the sample has been Released to Public.
· when requested by a harvesting agent elsewhere on the Internet via OAI (the Open Archives Initiative Protocol for Metadata Harvesting), provided the sample has been Released to Public. OAI is a protocol layered on top of HTTP that is commonly used in the digital library community for automated, distributed metadata harvesting. It generally does not require client authentication.
· when requested by another site in the Site Network via bulk HTTP file transfer, provided the sample has been Released to Public.
· automatically, by broadcast to the other sites (as described later), whenever a sample’s metadata record is modified, provided the sample has been Released to Public.
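The access rules for the web application listed above can be restated as a small decision function. The following is only a sketch of the rules as described here, not the site software's actual code; the Sample and User classes, their field names, and may_access are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Sample:
    # Hypothetical model of the fields the access rules depend on.
    lab_id: int
    provider_id: int
    released_to_public: bool = False
    acl_user_ids: Set[int] = field(default_factory=set)


@dataclass
class User:
    # Hypothetical model of an authenticated account.
    user_id: int
    lab_id: Optional[int] = None       # set when the account belongs to a lab
    provider_id: Optional[int] = None  # set when the account belongs to a provider


def may_access(sample: Sample, user: Optional[User]) -> bool:
    """Return True if the sample may be disclosed via the web application.

    Passing user=None models an unauthenticated visitor.
    """
    if sample.released_to_public:
        return True  # public samples are visible to everyone
    if user is None:
        return False  # unauthenticated users see public samples only
    return (
        user.lab_id == sample.lab_id                # same originating lab
        or user.provider_id == sample.provider_id   # same originating provider
        or user.user_id in sample.acl_user_ids      # explicit ACL entry
    )
```

For example, under these assumptions may_access(Sample(lab_id=1, provider_id=10), None) is False, while the same call with a User whose lab_id is 1 is True.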
It is anticipated that outside organizations like the National Science Digital Library (NSDL) and the Cambridge Structural Database (CSD) will catalog public samples in the Site Network in an accurate and reliable fashion using the interfaces described above. It is also conceivable that additional interfaces may be created for such purposes in the future.
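As an illustration of how a harvesting agent might address the OAI interface, the sketch below builds a harvesting request URL. It assumes the site exposes a standard OAI-PMH endpoint; the base URL is a placeholder, and the actual endpoint location on a Reciprocal Net site is not specified in this section. ListRecords and the oai_dc metadata prefix come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for the given endpoint.

    base_url is whatever address the site publishes for its OAI interface
    (an assumption here; this section does not name it).
    """
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Placeholder host and path, for illustration only.
print(oai_list_records_url("http://site.example/oai"))
```

Only samples that have been Released to Public would appear in the response to such a request.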
Your site will receive, accept, and process signed messages from the Reciprocal Net Coordinator for the following purposes:
· new site initialization during the installation process, by way of the recipnet.sitegrant file installed on your computer.
· maintaining a globally consistent list of sites. A message is received whenever another site in the Site Network is added to the network, removed from the network, or information in its site record is modified.
· maintaining a globally consistent list of labs. A message is received whenever a lab in the Site Network is added, or when information in the lab record is modified by the Coordinator (lab modifications may also be made by the site on which the lab is currently hosted).
· maintaining a globally consistent list of sample IDs. A message is received whenever the Coordinator transfers sample IDs to other sites, or claims sample IDs for itself (perhaps with the intent of transferring them to a new site).
· collecting global statistics. A message is received from the Coordinator whenever statistics are being requested from your site. This may occur periodically. In response, your site will send a message to the designated statistics collection point in which certain operating statistics will be disclosed. This is discussed in more detail later.
Your site will broadcast information to all other sites in the Site Network for the following purposes:
· maintaining a globally consistent list of labs. An updated lab record is broadcast to the other sites whenever any information about a lab hosted at your site is changed, such as when an administrator uses the web app’s Edit Lab feature.
· maintaining a globally consistent list of providers. A provider record is broadcast to the other sites whenever any provider associated with a lab hosted at your site is changed (such as when an administrator uses the web app’s Edit Provider feature), or whenever a new provider is created (such as when an administrator uses the web app’s Add New Provider feature).
· maintaining a globally consistent set of sample metadata for public samples. A sample metadata record is broadcast to the other sites whenever a sample belonging to a lab hosted at your site becomes publicly visible (via the web app’s Release to Public feature), loses its publicly visible status, or has its metadata updated while publicly visible. In the future, an updated sample metadata record may be broadcast to the other sites whenever data files in the repository for a publicly visible sample are added, deleted, or modified.
· maintaining a globally consistent catalog of sample data for public samples. A sample data holding announcement is broadcast to the other sites whenever data for a public sample begins to be hosted at your site (such as when a new sample originated by a lab at your site is Released to Public via the web app, or public sample data arrives at your server by some other means). Another message is broadcast if the sample data later becomes unavailable.
· maintaining a globally consistent directory of sample IDs. Sample ID negotiation messages are broadcast to the other sites from time to time as your site generates new sample IDs and coordinates with other sites as they generate their own. A distributed algorithm is required for this process to ensure that the sample IDs used by your site are unique across the entire Site Network. It is theoretically possible that a malicious person elsewhere on the Site Network could track these broadcasts to infer, to the nearest thousand, the number of samples (both public and private) that have been created at your site. That person could not determine how the sample IDs were distributed among the individual labs at your site.
· in the future, real-time network-wide load balancing. General statistics regarding server/network load will be broadcast to the other sites periodically in order to facilitate load balancing across the distributed database. These statistics would be limited to data items like CPU load factor, number of simultaneous human web users (with no distinction made between those authenticated and those not), number of samples and bytes served recently (with no distinction made between samples that are public and those that are not), observed throughput to other sites, and so forth. (As of release 0.9.0, the site software does not yet have this capability.)
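The conditions under which a sample metadata record is broadcast (see the sample metadata item in the list above) reduce to a simple rule. The function below is a sketch of that rule as stated in this section, not the logic recipnetd actually uses; the function and parameter names are hypothetical.

```python
def should_broadcast_metadata(was_public: bool, is_public: bool,
                              metadata_changed: bool) -> bool:
    """Decide whether a sample metadata record should be broadcast.

    was_public / is_public describe the sample's visibility before and
    after the change being processed.
    """
    if was_public != is_public:
        # The sample was just Released to Public, or just lost that status.
        return True
    # Otherwise broadcast only updates to samples that are already public.
    return is_public and metadata_changed
```

Under this rule, edits to a sample that is private both before and after the change never trigger a broadcast, which matches the disclosure conditions given earlier.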
Your site will send private messages to other sites in the Site Network for the following purposes:
· negotiating sample IDs, as described previously.
· reporting aggregate statistics. Your site will disclose operating statistics (as described later) to the designated collection point in response to a signed request from the Coordinator.
Your site will disclose general statistics regarding its routine operation to the designated collection point in response to signed requests from the Reciprocal Net Coordinator. Such requests may be received on a regular basis. The statistics then are used by the Coordinator to publish whole-network utilization figures, track network growth, troubleshoot abnormal activity, and so forth. The statistics disclosed are:
· Total uptime and software version of recipnetd. These figures will not be published by the Coordinator in conjunction with any information that might identify your site.
· Total number of web app sessions, broken down by those in which users authenticated and those in which users did not. These figures will not be published by the Coordinator in conjunction with any information that might identify your site.
· Total number of “page views” on samples served via the web app, broken down by lab, and then broken down by visits to the Show Sample page versus visits to the Edit Sample page. These figures will not be published by the Coordinator in conjunction with any information that might identify your site or any lab at your site.
· Total combined size (in bytes) of all data files in the repository, along with the mean size of each data directory with figures broken down by lab. These figures will not be published by the Coordinator in conjunction with any information that might identify your site or any labs at your site.
· Total number of samples that have been released to the public, broken down by lab. This same information is already readily apparent to any unauthenticated user who browses Reciprocalnet.org; therefore, the Coordinator reserves the privilege of publishing this figure in conjunction with the name of your lab or site.
· Total number of messages exchanged with every other site in the Site Network, along with a list of each unique sequence number used. The actual contents of these messages are not disclosed. This information may facilitate debugging and troubleshooting. (The circumstances under which messages can be exchanged were described previously.) These figures will not be published by the Coordinator in conjunction with any information that might identify your site.
· Total number of OAI queries that were answered, along with a count of the total number of samples returned by those queries. These figures will not be published by the Coordinator in conjunction with any information that might identify your site.
· Total number of HTTP bulk file transfer requests that were accepted, along with a count of the total number of files and the total number of bytes that were transferred during those operations. These figures will not be published by the Coordinator in conjunction with any information that might identify your site.
· In the future, the National Science Foundation (NSF) and/or the National Science Digital Library (NSDL) may request information about the Site Network’s utilization that requires recipnetd to collect additional and possibly more specific statistics, and the Coordinator in turn to collect them. Such additional data collection activity would be well-defined and would not track web app visitors from visit to visit, track authenticated human users except in the aggregate, track the sort of detailed HTTP information normally written to a “web server log” except in the aggregate, or track information about non-public samples except in the aggregate and only when contrasted with public samples at the same site.
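To make the scope of the disclosed statistics concrete, the list above can be pictured as a single report record. The following is a sketch only: the class names, field names, and JSON serialization are assumptions made for illustration and do not describe the message format recipnetd actually sends.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class LabStats:
    # Per-lab figures (hypothetical field names).
    show_sample_views: int = 0
    edit_sample_views: int = 0
    mean_data_dir_bytes: float = 0.0
    public_samples: int = 0


@dataclass
class SiteStatsReport:
    # Site-wide figures (hypothetical field names).
    recipnetd_version: str
    recipnetd_uptime_seconds: int
    authenticated_sessions: int = 0
    unauthenticated_sessions: int = 0
    repository_bytes: int = 0
    labs: Dict[str, LabStats] = field(default_factory=dict)
    # Sequence numbers of messages exchanged, keyed by peer site;
    # message contents are never included.
    messages_by_site: Dict[str, List[int]] = field(default_factory=dict)
    oai_queries: int = 0
    oai_samples_returned: int = 0
    bulk_requests: int = 0
    bulk_files: int = 0
    bulk_bytes: int = 0

    def to_json(self) -> str:
        """Serialize the report; JSON is an illustrative choice only."""
        return json.dumps(asdict(self), sort_keys=True)


report = SiteStatsReport(recipnetd_version="0.9.0", recipnetd_uptime_seconds=86400)
report.labs["example-lab"] = LabStats(show_sample_views=12, public_samples=3)
report.messages_by_site["example-site"] = [101, 102]
print(report.to_json())
```

Note that every field is a count, a size, or a version string; nothing in the record identifies individual users, samples, or requests.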