Starbulletin.com



Facts of the Matter
Richard Brill






Digital age a nightmare for archivists

It is not always possible to recognize the historical significance of an event when it occurs, a notable example being the first electronic computer, which was assembled from 18,000 vacuum tubes in 1945.

Historical perspective sorts the significant from the trivial; history changes as the attitudes and preferences of people change.

We expect that the more recent the event, the more we know about it, because more records survive. But that's not necessarily the case in the digital age. Rather, we run the risk that our present will become nobody's past.

The documents of today are born digital. Unfortunately, they often die young, either because the formats in which data is stored are constantly changing or because, unlike the Rosetta stone, digital storage media have a short shelf life. In either case, documents only a few years old can be rendered unreadable.

Data stored magnetically on tapes and disks is vulnerable both to physical deterioration of the media and to magnetic fields; even a refrigerator magnet can make the data irretrievable.

Optical media such as CDs are advertised as having lifetimes of 100 years or more, but their actual longevity is unknown since the technology is only 25 years old.

In the United States, the National Archives and Records Administration (NARA), located in College Park, Md. (www.archives.gov), is the official repository of U.S. government records. NARA is charged with archiving and preserving government documents including White House records and around 2 percent of all other federal records.

The Declaration of Independence, the Constitution and the Bill of Rights are all stored there in a temperature-controlled environment, bathed in inert argon gas to prevent oxidation and other forms of deterioration.

In the long hallways, researchers thumb through boxes and drawers of old records for census, diplomatic or military information.

But don't go there to search for contemporary records. Those are largely electronic and mostly reside in storage at the federal agencies where they were created or in temporary holding centers.

NARA faces a crisis: it must preserve and catalog this digital data permanently before it "rots," or risk losing it, and with it the history of our era, forever.

It is not just a problem of history, but also one of science, an artifact of the digital information age.

The problem is multifaceted. Part of it is the nature of digital media, which require hardware, firmware and software to decipher. A 5.25-inch floppy disk written under a 1985 version of DOS is useless without the appropriate equipment and software needed to extract its information. It won't even make a good throw toy for the dog park!

On top of the limited life span of the electronic storage media, NARA also faces thousands of incompatible data formats that have been made obsolete by the computer industry.

NARA expects its purview to include data stored by every software program that has ever been sold, plus some programs that were never made public because they were developed specifically for one agency. By some estimates, 16,000 different formats are spread throughout the federal bureaucracy.

And you thought you had a problem because you couldn't open a Word document that was created in Windows 95.

A second facet of the problem is the overwhelming amount of data being generated.

Some have referred to the problem as the imminent tsunami of digital data that threatens to overwhelm not only NARA, but other government and private organizations as well.

The amount of information that NARA is expected to be responsible for is predicted to grow from a mere 5 petabytes (5 million gigabytes) at present to 350 petabytes in the next 15 years, a 70-fold increase.
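To put that 70-fold figure in yearly terms, here is a rough back-of-the-envelope sketch. It assumes steady compound growth over the 15 years, which is a simplification the prediction itself does not state; the variable names are only illustrative.

```python
import math

start_pb = 5    # petabytes held at present, per the figure above
end_pb = 350    # petabytes predicted
years = 15

# Overall growth, and the steady yearly rate that would produce it
# (assumption: growth compounds evenly from year to year).
growth_factor = end_pb / start_pb
annual_rate = growth_factor ** (1 / years) - 1
doubling_years = math.log(2) / math.log(1 + annual_rate)

print(f"{growth_factor:.0f}-fold overall")
print(f"about {annual_rate:.0%} per year")
print(f"doubling roughly every {doubling_years:.1f} years")
```

Under that assumption the holdings would grow by about a third each year and double roughly every two and a half years.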

The root problem is not unique to the government. It confronts anyone who needs to maintain records of reports, data and other information, including banks, health care institutions and universities. IBM estimates that humanity will generate more data in the next three years than in the preceding 1,000 years.

Records must ultimately be preserved in a form that is authentically verifiable, unalterable, impervious to hackers and terrorists, and retrievable by future systems, the nature of which can scarcely be imagined.
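One common building block for the "authentically verifiable" requirement is a cryptographic fingerprint recorded when a document enters the archive. The sketch below is illustrative only and is not a description of NARA's system; the function names and the sample record are hypothetical. It uses the SHA-256 hash from Python's standard library, and it can only detect a change after the fact, not prevent one.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a record as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_unaltered(data: bytes, recorded_digest: str) -> bool:
    """Compare a record against the digest stored when it was archived."""
    return fingerprint(data) == recorded_digest

# Hypothetical record, fingerprinted at the moment it is archived.
original = b"Census schedule, 1900, sheet 1"
digest_at_ingest = fingerprint(original)

# Years later: the same bytes still match; a one-character change does not.
tampered = b"Census schedule, 1900, sheet 2"
print(is_unaltered(original, digest_at_ingest))   # True
print(is_unaltered(tampered, digest_at_ingest))   # False
```

The digest must itself be stored somewhere trustworthy, and the check says nothing about whether a future system can still open the file, which is the harder half of the problem.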

The problem has financial as well as technical components. According to some estimates, the share of a typical company's budget devoted to data storage will have doubled, from 11 percent to 22 percent, in the decade from 1997 to 2007.

NARA has hired contractors to attempt the archiving of the massive amount of information and data that has been piling up. Congress has already authorized $100 million, and the 2006 federal budget authorizes another $56 million, but the total price tag for achieving NARA's mandate is unknown.

The agency plans to bring its system into service in stages over a five-year period, beginning in 2007. The scope is such that, if completed as hoped, it will attain the status of earlier "crash programs" such as the Manhattan Project, the Apollo space program or the human genome project.

Other government agencies, private industries and universities in the United States and elsewhere are developing some of the technology behind the archival "genome project" in addition to the efforts of NARA.

Hopefully these systems will be compatible with one another.

Several companies have developed virtual storage technologies that automatically spread terabytes of data across many storage devices (a terabyte is 1,000 gigabytes).
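The basic idea of spreading data across devices can be shown in a toy sketch. This is a simplified illustration, not any vendor's product: it deals fixed-size chunks out to devices in round-robin order and puts them back together, with the "devices" modeled as plain lists. Real systems also add redundancy so that losing one device does not lose data.

```python
def stripe(data: bytes, n_devices: int, chunk_size: int) -> list[list[bytes]]:
    """Deal fixed-size chunks of data out to devices in round-robin order."""
    devices: list[list[bytes]] = [[] for _ in range(n_devices)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        devices[chunk_index % n_devices].append(data[offset:offset + chunk_size])
    return devices

def reassemble(devices: list[list[bytes]]) -> bytes:
    """Read the chunks back in the same round-robin order."""
    total_chunks = sum(len(d) for d in devices)
    n = len(devices)
    return b"".join(devices[k % n][k // n] for k in range(total_chunks))

record = b"We the People of the United States"
spread = stripe(record, n_devices=3, chunk_size=4)
print([len(d) for d in spread])       # chunks held by each device
print(reassemble(spread) == record)   # True
```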

Digital archivists in Australia have developed open-source software to convert records into a standard format that will be adaptable to different users' needs and, at least in theory, readable by future technologies.

NARA and the National Science Foundation (NSF) are funding research to extract data from old formats and put it to use in modern form. One example is a project that took records of Vietnam War airdrops of such things as Agent Orange and propaganda pamphlets and reformatted them to be compatible with commonly used geographic information systems, known as GIS.

The digital tsunami is also creating a need for a new type of professional, one with a historian's eye, a computer scientist's knowledge and competence with storage technology, and a librarian's understanding of metadata.

One wonders how many of today's electronic records will survive and be readable in 250 years, as is the durable parchment wisely chosen for the founding documents of our democracy, which requires only a bath of argon gas to be preserved for the ages.

Richard Brill, professor of science at Honolulu Community College (home.honolulu.hawaii.edu/~rickb), teaches earth and physical science and investigates life and the universe. His column is published on the first and third Sundays of every month. E-mail questions and comments to rickb@hcc.hawaii.edu






© Honolulu Star-Bulletin -- https://archives.starbulletin.com
