The Daily Caveat is written by Michael Thomas, a recovering corporate investigator in the Washington, DC, area.

6/20/2005
Article: The Fading Memory of the State
The Daily Caveat has a soft spot for the National Archives and Records Administration (NARA). The agency, where at least one Caveat Research partner began his professional career, serves as the organ of our nation's memory and exists to provide the ready access to essential evidence that, quite frankly, is a prerequisite for the continued functioning of democracy in America. Of its institutional mandate, NARA says:
[The National Archives] is a public trust upon which our democracy depends. NARA enables people to inspect for themselves the record of what government has done. NARA enables officials and agencies to review their actions and helps citizens hold them accountable for those actions. And NARA ensures continuing access to essential evidence that documents the rights of American citizens, the actions of Federal officials, and the national experience.
In service of that public trust, the Archives is now embarking on a project of epic proportions that will shape the way we view transparency in government for the next century. The National Archives has been charged with establishing a system for accessioning, organizing and storing governmental records that are "born digital." At this moment, decisions are being made at NARA that will determine how the history of our era is written and what tools historians will have available to write it.

In "The Fading Memory of the State," Technology Review magazine offers some insight into this looming $100 million "Manhattan Project" for digital data storage. The parameters of this (no joke) Herculean project look something like this:
According to NARA's specifications, the system must ultimately be able to absorb any of the 16,000 other software formats believed to be in use throughout the federal bureaucracy--and, at the same time, cope with any future changes in file-reading software and storage hardware. It must ensure that stored records are authentic, available online, and impervious to hacker or terrorist attack.
NARA plans to roll out the database between 2007 and 2011. The agency is working with two primary contractors, Harris Corporation and Lockheed Martin, on the design and implementation of the system. And the rollout can't come soon enough.
...managing growing data collections is already a crisis for many institutions, from hospitals to banks to universities. Tom Hawk, general manager for enterprise storage at IBM, says that in the next three years, humanity will generate more data--from websites to digital photos and video--than it generated in the previous 1,000 years. "It's a whole new set of challenges to IT organizations that have not been dealing with that level of data and complexity," Hawk says...

...Still, NARA's problem stands out because of the sheer volume of the records the U.S. government produces and receives, and the diversity of digital technologies they represent. "We operate on the premise that somewhere in the government they are using every software program that has ever been sold, and some that were never sold because they were developed for the government," says Ken Thibodeau, director of the Archives' electronic-records program. The scope of the problem, he adds, is "unlimited, and it's open ended, because the formats keep changing."

The Archives faces more than a Babel of formats; the electronic records it will eventually inherit are piling up at an ever accelerating pace. A taste: the Pentagon generates tens of millions of images from personnel files each year; the Clinton White House generated 38 million e-mail messages (and the current Bush White House is expected to generate triple that number); and the 2000 census returns were converted into more than 600 million TIFF-format image files, some 40 terabytes of data. A single patent application can contain a million pages, plus complex files like 3-D models of proteins or CAD drawings of aircraft parts. All told, NARA expects to receive 347 petabytes (see "Definitions") of electronic records by 2022.

Currently, the Archives holds only a trivial number of electronic records. Stored on steel racks in NARA's 11-year-old facility in College Park, the digital collection adds up to just five terabytes. Most of it consists of magnetic tapes of varying ages, many of them holding a mere 200 megabytes apiece--about the size of 10 high-resolution digital photographs. (The electronic holdings include such historical gems as records of military psychological-operations squads in Vietnam from 1970 to 1973, and interviews, diaries, and testimony collected by the U.S. Department of Justice's Watergate Special Prosecution Force from 1973 to 1977.) From this modest collection, only a tiny number of visitors ever seek to copy data; little is available over the Internet.

Because the Archives has no good system for taking in more data, a tremendous backlog has built up. Census records, service records, Pentagon records of Iraq War decision-making, diplomatic messages--all sit in limbo at federal departments or in temporary record-holding centers around the country. A new avalanche of records from the Bush administration--the most electronic presidency yet--will descend in three and a half years, when the president leaves office. Leaving records sitting around at federal agencies for years, or decades, worked fine when everything was on paper, but data bits are nowhere near as reliable--and storing them means paying not just for the storage media, but for a sophisticated management system and extensive IT staff.

Academic and research centers from coast to coast, from the San Diego Supercomputer Center to the Massachusetts Institute of Technology, have been set to work on how to manage and convert data from literally every format ever invented into an archival standard practical for preservation, cataloging and continuing access.

The problems posed by a technological challenge of this scale are mammoth, but what is at stake in this "moon shot"-level project is at least as significant. As Ken Thibodeau, director of the National Archives' electronic-records program, puts it, "there's every reason to say that in 25 years, you won't be able to read this stuff." Without this work, Thibodeau warns, "Our present will never become anybody's past."

TDC highly recommends taking a gander at the full article, which can be found here.

-- MDT



all content © Michael D. Thomas 2010