Good Practices – International Solutions
Archiving Systems and Trustworthiness
Dr. Uwe Borghoff, Universität der Bundeswehr München, Institute for Software Technology
Preserving Born-Digital Public Records at The National Archives
Adrian Brown, Head of Digital Preservation, The National Archives, UK
The German Project kopal: Co-operative Development of a Long-Term Digital Information Archive
Tobias Steinke, kopal Project Manager, Deutsche Nationalbibliothek
Archiving Systems and Trustworthiness
Dr. Uwe Borghoff, Universität der Bundeswehr München, Institute for Software Technology
Vienna, 18 April 2007
Decision Process
Developing the Criteria Catalog
Criteria Catalog: General Attributes
Overall system architecture
  – Design principles, compliance with standards or recommendations (e.g., OAIS, OAI)
Explicit long-term features
  – E.g., file format registry, preservation metadata scheme
Object organization
  – E.g., single object, collections, identification
Metadata organization + rights / role management
  – Consumer / producer / archive operator
Functions
  – Ingest / access / archival storage / administration
Criteria Catalog: General Attributes (cont’d)
System / application integration
  – Library system / publishing system / other archives
  – Federation / cooperation / user communities
Software architecture
Hardware basis
Criteria Catalog: Functional Attributes (ingest)
Accepted submission formats
Object format / identification
  – E.g., file format restrictions
Object organization
  – E.g., hierarchies, links, versions, variants
Access procedures for producers
  – Metadata scheme incl. metadata entry procedure
Batch ingest / conversion / (formal) quality checking / dedicated workflow
For metadata: manual / automatic extraction / 3rd party
  – Overall throughput
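The "formal quality checking" criterion for batch ingest can be pictured with a minimal sketch. It assumes a policy of accepted file extensions and required metadata fields; `ACCEPTED_EXTENSIONS`, `REQUIRED_METADATA`, `Submission`, and both functions are hypothetical names invented for illustration and do not correspond to any of the systems evaluated here.

```python
from dataclasses import dataclass, field
from pathlib import PurePath

# Hypothetical policy values, chosen only for illustration.
ACCEPTED_EXTENSIONS = {".pdf", ".tif", ".xml"}
REQUIRED_METADATA = {"identifier", "title", "producer"}


@dataclass
class Submission:
    filename: str
    metadata: dict = field(default_factory=dict)


def check_submission(sub):
    """Return a list of problems; an empty list means the formal check passed."""
    problems = []
    ext = PurePath(sub.filename).suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"format not accepted: {ext or '(none)'}")
    missing = sorted(REQUIRED_METADATA - set(sub.metadata))
    if missing:
        problems.append("missing metadata: " + ", ".join(missing))
    return problems


def batch_ingest(submissions):
    """Split a batch into accepted filenames and rejected ones with reasons."""
    accepted, rejected = [], {}
    for sub in submissions:
        problems = check_submission(sub)
        if problems:
            rejected[sub.filename] = problems
        else:
            accepted.append(sub.filename)
    return accepted, rejected
```

A real ingest workflow would more likely identify formats from file content than from extensions; the extension test only keeps the sketch short.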
DigiTool Workflow at the Bayerische Staatsbibliothek
Criteria Catalog: Functional Attributes (access)
Access procedure for consumers
  – Remote vs. local / multilingual / help system / notification services / communication protocols
Search / retrieval
  – Metadata indexes / navigation / full-text search / inspection of class methods
Dissemination form of objects / metadata
  – Conversion on the fly / on demand
Accounting, e.g., as part of digital rights management
Federation
  – Access or replication transparency
Interoperation
Criteria Catalog: Functional Attributes (storage)
Physical storage
  – Media / interfaces / abstraction
Limits
  – E.g., number / size of objects (or relations)
Conceptual organization of objects and metadata
  – Object format (file format) / object identification
Versions (time lines) vs. variants (manifestations)
Relationships object – metadata
  – E.g., multiplicity (simultaneous support of various schemes)
Mapping of conceptual organization to logical elements
  – E.g., files / database tables
Criteria Catalog: Functional Attributes (admin)
Access procedures for administrators
  – Local / remote / special protection
Administration of objects and metadata
  – Deletion of collections / reorganization
  – Updates (for new elements) / controlled vocabulary
Administration of user access
  – OAIS roles such as producer / consumer / admin / management
Object-related rights
Administration of physical storage
  – E.g., allocation of storage for objects / collections / roles
Criteria Catalog: Functional Attributes (admin, cont’d)
Access to internal interfaces
  – E.g., to basic database schemes / storage system
Configuration / scaling
  – E.g., scalability transparency
Disaster management / trustworthiness
Backup / recovery
  – Redundancy / replication / fragmentation for availability
Monitoring / reporting
  – Trouble ticket systems / error reports / statistics / metrics
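Redundancy and replication only support trustworthiness if copies are checked regularly. The following minimal sketch assumes that a SHA-256 digest is recorded at ingest and that replicas are available as byte strings; the function names and replica labels are hypothetical, and the slides do not prescribe any particular checksum.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Digest recorded at ingest and recomputed at every check."""
    return hashlib.sha256(data).hexdigest()


def verify_replicas(expected_digest, replicas):
    """Map each replica name to True if its content matches the recorded digest."""
    return {
        name: sha256_hex(content) == expected_digest
        for name, content in replicas.items()
    }


def repair(expected_digest, replicas):
    """Overwrite every copy from the first intact one; raise if none is intact."""
    status = verify_replicas(expected_digest, replicas)
    intact = [name for name, ok in status.items() if ok]
    if not intact:
        raise ValueError("no intact replica left")
    good = replicas[intact[0]]
    return {name: good for name in replicas}
```

The result of `verify_replicas` is also the kind of figure a monitoring and reporting component could turn into error reports and statistics.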
Criteria Catalog: Non-Functional Attributes
Product Costs
  – Initial purchase / license / leasing / maintenance / updates
  – Training
  – Personnel resources
  – Initial installation / operating
End user support, e.g., hotline / newsletter / FAQ
Long-term preservation, e.g., monitoring applied (embedded) technologies / media migration
Quality w.r.t. manufacturer / product / support
  – Company structure / development status
  – Market penetration / user community
Comparison
Criteria: Persistent Identifier; METS-like export; Controlled vocabulary preservation metadata; UVC or comparable LTA features
DIAS: yes, no, planned?
MyCoRe: yes, no (XML export), no
DigiTool: yes, no
DSpace: yes, planned, no
EPrints: yes, no
Preserving Born-Digital Public Records at The National Archives
Adrian Brown, Head of Digital Preservation at The National Archives, UK
The National Archives (UK)
The National Archives (UK)
Both a government department and executive agency of the Secretary of State for Constitutional Affairs
Established 2003, brings together:
  – Public Record Office (1838)
  – Historical Manuscripts Commission (1869)
  – Office of Public Sector Information (2005)
  – Her Majesty’s Stationery Office (1786)
Based at Kew, London
Employs 580 staff
Collection
One of the largest archival collections in the world
Unbroken span of records from the 11th century to the present day
180 kilometres of paper records
250 TB of digitised and born-digital records
400 TB of new transfers scheduled by 2009
Drivers for change
eGovernment
2004/5 Modernising Government targets
Freedom of Information
New audiences and improving access
Increased efficiencies
Drivers for change
From kilometres… …to terabytes
Vision
Lead and transform information management
Guarantee the survival of today’s information for tomorrow
Bring history to life for everyone
Developments at TNA
National Digital Archive of Datasets (1996)
PRONOM (2002)
Digital Archive (2003)
Web Archiving Programme (2003)
Electronic Records Online (2005)
Seamless Flow ( )
Shared Services (2007-?)
National Digital Archive of Datasets
Established in 1996
Operated under contract by University of London
Holds over 150 datasets dating back to
NDAD
PRONOM
An online technical registry
A resource for anyone requiring impartial and definitive information about the file formats, software products and other technical components required to support long-term access to electronic records and other digital objects of cultural, historical or business value
A knowledge base to support automated preservation services
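How a format registry can feed an automated preservation service is sketched below, assuming the registry stores a leading byte signature for each format. The byte signatures are the well-known file headers of PNG, PDF and GIF; the `example/...` identifiers and the `REGISTRY` and `identify` names are placeholders invented for this sketch and are not actual PRONOM identifiers or interfaces.

```python
# Hypothetical registry entries: the identifiers are placeholders,
# not real PRONOM identifiers.
REGISTRY = [
    {"id": "example/png", "name": "Portable Network Graphics",
     "signature": b"\x89PNG\r\n\x1a\n"},
    {"id": "example/pdf", "name": "Portable Document Format",
     "signature": b"%PDF-"},
    {"id": "example/gif", "name": "Graphics Interchange Format",
     "signature": b"GIF8"},
]


def identify(data: bytes):
    """Return the registry id whose signature matches the start of the data, or None."""
    for entry in REGISTRY:
        if data.startswith(entry["signature"]):
            return entry["id"]
    return None
```

An identifier obtained this way can then serve as the key for looking up further registry information, such as the software products able to render the format.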
PRONOM
2002: First version released (internal)
2003: Made available on the Web
2005: Major new version released
2006: PRONOM Unique Identifier scheme launched
2007: New releases as part of Seamless Flow
PRONOM
Digital Archive
Operational in April 2003
Secure (air-gapped) storage for born-digital public records
Scalable storage to >1 PB
Records stored in robotic tape libraries with secure off-site backup
Metadata stored in Oracle database
Web Archiving Programme: Selection
Based on 6 core functions of government
Frequency based on content analysis and topicality
11 sites collected weekly
53 sites collected biannually
Flexible collection of additional sites
Now expanding to the whole of the .gov.uk domain
Web Archiving Programme
Contract with Internet Archive , and with European Archive from 2005
  – Regular crawls
Member of UK Web Archiving Consortium
  – Special crawls
Supporting UK website rationalisation
Developing UK web archiving strategy
Web Archiving Programme
Electronic Records Online
Seamless Flow
Passive Preservation
Appraisal & Selection
Transfer
Active Preservation
Resource Discovery
Delivery & Presentation
Seamless Flow
Macro appraisal system for electronic records
Online transfer system
New passive preservation system based on Digital Archive
New active preservation system centred on PRONOM
New version of Electronic Records Online
Shared Preservation Services
Intermediate archive for central government
Preservation of semi-current records for years
Eliminates wasteful duplication of effort
Passive preservation storage may be contracted out
Active preservation services provided by TNA
Currently seeking initial funding for 5 years
Collaboration
PLANETS
InSPECT
Preserv
Global Digital Format Registry
The National Archives (UK)
Thank you!
The German Project kopal: Co-operative Development of a Long-Term Digital Information Archive
Tobias Steinke, kopal Project Manager, Deutsche Nationalbibliothek
Overview
Motivation
Organisation
Technical solution
Status and outlook
kopal: Motivation
New Law on the German National Library (Gesetz über die Deutsche Nationalbibliothek)
  – Legal deposit and collection mandate now also cover online publications
  – Digital publications on physical media (e.g., CD-ROM) were already collected before
  – Deutsches Musikarchiv: collection of digital music
  – Since mid-2006
High demand among many German institutions for long-term storage of digital material
Existing material: electronic dissertations (DissOnline), e-journals, digitised items
In future: e-books, websites
kopal: Organisation
Project funded by the BMBF (2004 – 2007)
Partners and roles:
  – Deutsche Nationalbibliothek: overall project management, use, software development (koLibRI)
  – Niedersächsische Staats- und Universitätsbibliothek, Göttingen: use, software development (koLibRI)
  – IBM Deutschland: software development (DIAS)
  – Gesellschaft für wissenschaftliche Datenverarbeitung Göttingen (GWDG): system hosting
kopal: System Overview
GWDG (Göttingen)
DIAS by IBM
Account 1
Account 2
SUB Göttingen
Deutsche Nationalbibliothek (Frankfurt)
Local software
kopal: Technical Solution
Core based on DIAS by IBM for the National Library of the Netherlands and on standard software
Follows the OAIS reference model
Multi-tenancy: logically separated storage areas
Open-source software for local integration: koLibRI (kopal Library for Retrieval and Ingest)
Open archival object format with dedicated metadata for long-term preservation (Universelles Objektformat, Universal Object Format)
Long-term availability through future file format migration (conversion and version management)
Data backup by a service provider (GWDG)
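Two of these ideas, logically separated accounts and format migration with version management, can be pictured with a small in-memory model. This is only an illustrative sketch: `TenantArchive`, `ArchiveObject`, `Version`, and all method names are hypothetical and do not reflect the actual interfaces of DIAS or koLibRI.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    number: int
    file_format: str
    content: bytes


@dataclass
class ArchiveObject:
    object_id: str
    versions: list = field(default_factory=list)

    def latest(self):
        return self.versions[-1]


class TenantArchive:
    """In-memory stand-in for an archive with logically separated accounts."""

    def __init__(self):
        self._accounts = {}

    def ingest(self, account, object_id, file_format, content):
        objects = self._accounts.setdefault(account, {})
        if object_id in objects:
            raise KeyError(f"{object_id} already exists in account {account}")
        objects[object_id] = ArchiveObject(
            object_id, [Version(1, file_format, content)]
        )

    def get(self, account, object_id):
        # Lookups are scoped to one account, so tenants cannot see
        # each other's objects.
        return self._accounts[account][object_id]

    def migrate(self, account, object_id, target_format, convert):
        """Add a converted copy as a new version; earlier versions are kept."""
        obj = self.get(account, object_id)
        old = obj.latest()
        obj.versions.append(
            Version(old.number + 1, target_format, convert(old.content))
        )
        return obj.latest()
```

Keeping the original version next to each migrated one means a faulty conversion can be repeated later from the unchanged source.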
kopal: Status and Outlook
DIAS installed in Göttingen and ready for production use
koLibRI already freely available as a pre-release version
End of project: development of reuse scenarios
  – Participant: archiving with DNB or SUB Göttingen
  – Tenant: own account in the Göttingen DIAS
  – In-house operation: an additional DIAS system
Good Practices – Austrian Activities
The Digital Future at the Austrian State Archives
Director General Dr. Lorenz Mikoletzky, Österreichisches Staatsarchiv
Long-Term Archiving of Electronic Documents: A New Challenge for National Libraries
Director General Dr. Johanna Rachinger, Österreichische Nationalbibliothek