The SURF Programme Digital Academic Repositories (DARE) is a joint initiative of Dutch universities to make their academic output digitally accessible. The KB (National Library of the Netherlands), the KNAW (Royal Netherlands Academy of Arts and Sciences) and the NWO (Netherlands Organisation for Scientific Research) also cooperate in this unique programme. DARE is being coordinated by the SURF Foundation [1]. The programme will run from January 2003 until December 2006.
DARE has several goals:
The DARE Programme has been given financial support by the government, through the NAP (National Action Plan electronic highway) fund of €2 million for the period 2003-2006. With this grant the Dutch government is giving a strong boost to innovation in the provision of academic information in the Netherlands.
The DARE Programme has brought together all thirteen Dutch universities and three major academic institutions to create a network of digital repositories of Dutch academic output. The first year of DARE focused on implementing the basic infrastructure by setting up and linking the repositories. This resulted in the creation of a Dutch network of OAI (Open Archives Initiative) data providers. A demonstrator portal called DAREnet [2] has been set up to access this national network's academic output. With the DARE management at SURF as the beating OAI heart, all universities agreed on a few basics:
Other common agreements were:
The above represented the breadth of the ambition of the institutions' libraries as a whole and of the tasks that awaited them.
The DARE management created a platform on which knowledge was developed and shared with all DARE key people in the universities. This approach led to rapid results rather than to the illusion of a complete theoretical framework in which all pros and cons had been weighed. Breaking the OAI ambition down into two levels, a data level and a service level, kept issues relatively manageable. SURF stimulates and supports projects at the data level and funds projects at the service level that aim at the delivery of content from the academic community.
Figure 1 shows the data and service model that repositories use internationally. It comprises a) a basic facility (the data level) and b) services (the service level). These services are developed from the basic facility and provide added value for specific end-users.
- The data level is where the infrastructure is set up and maintained. A scientific institution establishes a repository that stores academic output from that institution and keeps it available for use (or reuse) according to uniform international standards. This includes working papers/preprints, dissertations, research reports, datasets, conference reports, teaching material, graduate essays, multimedia material, etc. including the corresponding metadata.
- The service level: basic material from the data level can be used to develop services providing added value for scientists, students, universities, funding agencies and other interested parties. The possibilities are numerous, including developing current or new services (e.g. management information or the updating of résumés); services the institutions themselves wish to offer, either individually or in association with others (e.g. subject portals); or services provided by third parties such as publishers (e.g. e-journals). The supply and development of services can take place at a local level (e.g. individual homepages), at a national level (e.g. a national academic output service, organised for instance by document type or discipline) or even at an international one (e.g. virtual communities). The most appropriate level of co-operation can be decided upon depending on the situation in question. However, it is crucial that institutions retain control of their own information.
The advantage offered by the data-services model is that it also provides a guide to making sound decisions about the level and extent of services that an institution wishes to offer and the costs involved therein. The data level offers every institution a basic facility for the reliable, structured digital storage of its own intellectual property. This is an important factor in the digitisation of formal scientific communication. The DARE Programme ensures that the data level is set up as simply as possible. As a result, the basic facility can be offered free of unnecessary extras, at the lowest possible operational cost. This can then also better guarantee its financial sustainability in the long term.
All additions to functionality (and thus work and manpower) aimed at activities above and beyond the basic facility belong to the service level. This makes it possible to ascertain what the corresponding costs are for each service, for whom such services are intended, whether they are worth the effort, and how best to finance them. The DARE Programme stimulates the development of services in such a way that good use is made of available (open) technology for improving their effectiveness and efficiency.
An important aspect of the DARE programme is the combination of individual responsibilities with joint actions. Every university is responsible for its own repository, has its own motivation for its implementation and decides itself which services it wants to offer. Within the DARE community, participants work together for greatest effect, in order to complete the job, share knowledge and experience and to achieve interoperability.
There is no single prescribed standard repository solution, as long as the repository complies with the agreed formats. The rule is to do locally what can be done locally and restrict centralised activity to the bare essentials like creating a frame of reference, setting preconditions where and when necessary and working together on shared issues. The DARE Programme stimulates the development of different approaches in concrete projects. This way the participants learn through practical experience, a pragmatic way of pioneering at both data and service levels.
It was considered an a priori requirement that public and other data of sufficient importance stored in the repositories would be automatically preserved for long-term future use. To that end the Koninklijke Bibliotheek (The Royal Library) has been involved from the very start of the project. The Royal Library has already set up a system for the long-term preservation of data, which will also be used for the data in the DARE repositories. This is therefore one less task for the local repository managers, as it is dealt with centrally by the Royal Library.
DAREnet is a demonstration Web site providing basic information services. It has been developed to check and demonstrate the interoperability and the potential such a network of institutional repositories can offer scholarly communication. It has not been set up as a permanent national harvesting service, but merely as a demonstrator.
DAREnet uses i-Tor [3] for harvesting the metadata of all repositories and for providing services to end-users (service level in Figure 1). Some of the DARE participants used i-Tor as a data provider for their institutional repository (the data level in Figure 1, see next section).
For end-users, the standard services include the browsing of the repositories and searching the metadata (on specific or all fields). Full-text searching of the digital objects (referenced in the metadata) is also a standard feature. For administrators with the appropriate permissions it is possible to edit the Web pages and to manage the harvesting of the various repositories.
The i-Tor tool was selected to implement DAREnet because its goal fits the goals of DARE: providing open access while retaining data at the source. It uses open standards and is an open source product, providing a Web content management system. It features the collaboration and coupling of existing data sources, including OAI repositories. This feature in particular was used intensively in DAREnet.
Whilst DAREnet was being implemented, i-Tor features for handling Open Archives were also under development. Although this meant occasional bugs, a distinct advantage was that missing features identified by DARE participants during their implementation became part of the i-Tor development process, resulting in a more tailor-made product for DARE. The fact that the software developers were a division of one of the DARE partners was a further advantage.
The ability to manage the site remotely, without needing technicians, turned out to be a very useful feature of i-Tor. As a consequence, changes could be made swiftly. The DAREnet site was implemented in both Dutch and English. Multi-lingual interfaces are a matter of concern in most systems. As regards the Web interface, i-Tor handles matters reasonably well by providing tools to maintain versions of pages for various languages. With regard to harvesting, however, separate harvesting has to be done for the Dutch and the English versions and this unfortunately leads to unnecessary additional work.
At the start of the DARE Programme, a specification document 'Specifications for a Networked Repository for Dutch Universities' [4] served as our compass for the DARE architecture. This document describes DARE as a network of local repositories, with local policies, local repository software, linked together via OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting).
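To make the linking mechanism concrete, here is a minimal sketch of how a harvester might address one of the local repositories over OAI-PMH. It is not the i-Tor implementation: the function names and the base URL `http://repository.example.org/oai` are assumptions made for illustration, and only the `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments and the XML namespace come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a repository (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH 2.0 resumptionToken is an exclusive argument:
        # it replaces metadataPrefix on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response."""
    root = ET.fromstring(xml_text)
    # Only header identifiers live in the OAI namespace; dc:identifier does not.
    ids = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token
```

A harvester would fetch the first URL, pass the response to `parse_page`, and repeat with the returned token until it is `None`. Fetching is left out here so that the sketch runs without a network connection.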
Starting positions for the DARE partners were very different. Some institutions already had a repository in place; some had a repository that was not OAI-compliant or that in some other way failed to satisfy requirements. Others started from scratch.
Those who had to start from the beginning found the choice of software problematic. It was hard to find up-to-date comparisons of software. One of the universities had a lot of problems with the available software. The lack of good installation manuals, together with software that had not been tested on various platforms, made it impossible to implement their own repository on time. However, by using the repository of another DARE partner on a temporary basis, they were able to join the effort.
Some institutions used their existing systems and made them OAI-compliant. Another university had to make a workaround (with the help of one of the other DARE partners being their data provider) because of problems with their existing system (DigiTool). For this library, difficulties with crosswalks, XML files and Dublin Core made it necessary to switch to DSpace. Transferring data from old databases, which had been set up for other purposes, to the new repository proved to be a considerable challenge for some.
Three universities were involved in the ARNO-project (Academic Research in the Netherlands Online). This project, also funded by the SURF Foundation, resulted in an OAI-compliant institutional document server that could function as a repository. These universities therefore had their repositories up and running, despite the problems caused by a new release. Another institution already had a lot of experience with their DSpace repository.
Five different systems are in use at present:
Each item in the repository is described by a set of metadata. The original specification document for DARE states that the OAI-PMH standard should be used within the DARE Programme. This affected the standard for metadata. The OAI protocol requires every repository to offer, as a minimum, simple (unqualified) Dublin Core (DC) with its 15 metadata elements.
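As an illustration of what the simple DC requirement amounts to in practice, the sketch below lists the 15 elements and flags anything else found in a record. The record representation (a dictionary mapping element names to lists of values) and the function name are assumptions for this example, not part of any DARE tool.

```python
# The 15 elements of simple (unqualified) Dublin Core.
SIMPLE_DC = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def unknown_elements(record):
    """Return the element names in a record that are not simple DC.

    `record` is assumed to be a dict of element name -> list of values;
    in simple DC every element is optional and repeatable.
    """
    return sorted(set(record) - set(SIMPLE_DC))
```

For example, a record carrying a local `faculty` field next to `title` and `creator` would be reported as containing one unknown element, which is the kind of mismatch that surfaces during harvesting tests.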
The application of standards always leaves room for local choices and implementations and the OAI world is hardly an exception to this rule. The OAI standard (OAI PMH 2.0) and metadata formats (oai_dc, qdc) led to a short but intensive study of the DARE use of Dublin Core metadata, which in turn led to a 'version 1.0' that was acceptable for the DARE community to begin with. This 'DARE use of Dublin Core metadata, version 1.0' [6] benefited a great deal from the best practice observed at UKOLN, Bath, with E-Prints. In this document, which was endorsed by all DARE partners in October 2003, DARE chose to use simple DC as the mandatory metadata set (because of OAI-PMH), and DARE-qualified DC (dare-qdc) as the optional metadata set within DARE. This was because the librarians soon concluded that simple Dublin Core was too simple and limited for DARE and its institutions' needs. At the time of writing, all DARE partners use simple DC for data exchange.
The project managers have spent quite some time discussing where to draw the line between the data and service levels. The basic question here was whether to first define services and then determine the use of metadata or vice versa. Some people argued that experience of Z39.50 showed that 'garbage in' produces 'garbage out'. It is difficult to imagine all possible services beforehand and therefore to think of all of the metadata needed in advance. Hence, you either get caught up in endless discussions on services or those on metadata. DARE chose the pragmatic solution to work with simple DC and leave more elaborate metadata exchange until such time as it becomes necessary.
We also discovered that discussing metadata within an OAI context is in fact a discussion of data mapping and data exchange. Each DARE partner has developed some form of mapping from internal systems and richer metadata to simple DC. Librarians tend to see DC as a set of cataloguing rules, which it is not: it is a format for data exchange. In a networked, co-operative setting like DARE, one needs some guidelines that answer the question of what content should go into which DC element. So some cataloguing rules can be applied to make sure the same language is spoken. Dublin Core, as is, is not specific enough for that purpose.
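The kind of mapping each partner had to write can be sketched as a small crosswalk table. The internal field names below (`titel`, `auteur`, `jaar`, and so on) are invented for illustration and do not describe any partner's actual system; the point is only that richer local fields are either mapped onto a simple DC element or dropped.

```python
# Hypothetical crosswalk from local field names to simple DC elements.
CROSSWALK = {
    "titel": "title",
    "auteur": "creator",
    "jaar": "date",
    "samenvatting": "description",
    "url": "identifier",
}


def to_simple_dc(internal):
    """Map a local record (dict) onto simple DC, element -> list of values."""
    dc = {}
    for field, value in internal.items():
        element = CROSSWALK.get(field)
        if element is None:
            # Local fields without a DC equivalent are not exchanged.
            continue
        values = value if isinstance(value, list) else [value]
        dc.setdefault(element, []).extend(values)
    return dc
```

A shared guideline is still needed on top of such a table, for instance to agree on how names and dates are written inside `creator` and `date`, since the crosswalk only decides where a value goes and not what it looks like.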
Once the DARE metadata guidelines were defined, we decided to freeze them for a period of six months. During the following months (October 2003 - March 2004) we kept track of issues, questions, problems and experiences with mapping, harvesting and using the metadata.
Some issues that arose were:
During the period when we were building the DARE demonstrator we were obliged to create additional guidelines for metadata in respect of the following elements:
Within the context of this article it would be inappropriate to explain all the details behind these issues. Some were solved and some were not. The biggest remaining issue is the identifier/jump-off page issue. DARE set up an international OAI expert meeting on this subject in May 2004.
Looking back on the implementation period, the DARE community is proud of having achieved its first major milestones:
A summary of the evaluation by the programme participants of all institutions involved is given below.
DARE, as a national initiative, not only made all universities aware of the necessity of an institutional repository but also made it a reality for all universities to set up and operate one. With the commitment from all parties and the central co-ordination by SURF, an operational network of repositories now exists, demonstrated by DAREnet. As a consequence of these achievements, a number of new services can now be developed based on this Dutch repository infrastructure.
DARE has given a Dutch boost to the dissemination and use of OAI technology. A simple standard as defined within the Open Archives Initiative has led to rapid consensus on the way to go about retrieving and exchanging data. In addition, the fear that, on presentation of the programme at the end of the project, there would be little to show unless all repositories were live, served as a considerable stimulus to activity. The interoperability of information and the profiling of the different organisations have also proved to be critical success factors.
Significant results have been achieved by a group of diverse people and institutions. Although the different cultures and approaches were apparent and sometimes represented something of a challenge, they never stood in the way of the common good of the project. Not all institutions started at the same time. Those who followed in a later phase had less time to set up servers, install the software and get content ready for the repository.
Copyright issues need a lot of organisation and will be a determining factor for many academics in deciding whether to join the 'DARE movement'. This issue, as well as that of filling the repositories with content, will be the major focus for the remainder of 2004 and beyond. It is therefore important to develop a strategy to engage scientists' and managers' commitment to contributing content to the repositories. You could say that the technical implementation was the 'easiest' part; the institutional and organisational embedding will undoubtedly be the challenge for the coming period.
Without a doubt, everyone has experienced the added value of working together. This added value was felt in both practical and technical terms. Indeed it turned out that partners frequently provided each other moral support as much as inspiration. Not all institutions started implementing a repository at the same time. Those who followed in a later phase were able to profit from the experience of others. Some confessed it would not have been possible to implement their repository at such short notice without the help of DARE partners. Those with an existing repository on the other hand were inspired and stimulated to enhance the content and management of their operational repositories. The central co-ordination and sharing of knowledge certainly represented a critical contribution to the results achieved.
The community proved to be a lively one, although decision-making on technical details was sometimes perceived to be too slow. To communicate and exchange information and knowledge online, in a private and secure area, an extranet was set up (making use of CommunityZero [7]). However, this extranet needed a bit of getting used to and the software could perhaps have been more user-friendly. Nevertheless, it has certainly helped to generate a feeling of community and to promote the exchange of information and experience.
Due to sound project management, local initiatives and the central approach have been combined to create a very good working environment. Some participants admit that having been urged to have the repository up and running has provided them with a basis for the further development that they needed.
Presentations by the DARE programme manager to different stakeholders have also had a positive effect. The unbounded enthusiasm from the programme management at SURF represented very tangible support as did good and timely information and communication.
The different institutions developed varying ways of dealing with the organisation of their institutional repositories.
Here are some examples:
One university set up a new unit that supports scientists with electronic publishing. This unit is also responsible for the IR (Institutional Repository). Another university set up a steering committee and instated project managers for the implementation process, very similar to the national DARE organisation of steering committee and project managers. These people were also responsible for communication within the university to get the DARE notion across. Commitment from a number of important users provided the impetus for the creation of a local repository at another university.
Although some universities were not quite ready for the implementation of a repository, the DARE initiative gave them a start. For some universities the lack of human resources was the main reason why they have only implemented the repository on a very small scale, meaning that they have concentrated only on the technical aspects of the IR and on making content more easily available.
Fragmentation within one of the institutions made it impossible for all institutional sub-groups to participate in the DARE Programme. One group was committed to making a mini-repository available but participants all had different ideas on how to go about this, especially concerning the issue of how to make the digital objects available. Much persuasion was needed to convince the participants. However, having seen DAREnet, they do now realise the potential of DARE and the benefits of sharing scientific information.
As for the submission of content, different organisational approaches have become apparent:
Most institutions use decentralised ways of collecting content. The national Metis research information system [8] is being used in several places for (decentralised) submission. However, such submission is often only done once a year, for the production of annual reports, and this does delay the availability of the digital object. One university library has started a project to integrate its library catalogue, documentation database and the repository in order to make the submission process more efficient and to improve on the quality of the metadata.
All DARE participants started off with readily available content from one or more faculties or departments. Most universities have dissertations and research reports available. Today, the content mainly consists of text documents, but some photographs and videos are also included. Although the repositories can contain digital objects of any kind, this is yet to be the case in practice.
An important success of the DARE Programme is combining individual responsibilities with joint actions. Each university is responsible for its own repository, has its own motivation for its introduction and decides itself which services it wants to offer. It was important for the DARE participants to know that they could hold onto their own ways of working, but also to learn of other ways of working. By using each other's knowledge and expertise, the community was able to achieve interoperability.
The DARE Programme encourages partners to try out different approaches in concrete projects. In this way the participants learned in a practical, pragmatic and pioneering manner at both data and services level.
As soon as a repository was harvestable, testing could begin. It is important to make sufficient time for the testing phase. This involves several cycles of adjustment and retesting. One reason for this is that with such a large number of diverse parties involved, several cycles are needed before the repositories deliver exactly what the harvester needs. In the DARE case, several problems arose due to the incorrect use (mostly misinterpretation) of the Dublin Core standard. In other cases problems were caused by a lack of specification, for example about which character encoding to use: OAI-PMH requires UTF-8, but one repository contained non-UTF-8 characters in its metadata. Another recommendation would be to set strict specifications with which each party has to comply, and provide examples. The repositories may resort to creating temporary databases, as was done by some of the DAREnet participants, just to be able to deliver the right format for harvesting purposes.
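The character encoding problem mentioned above is one that a repository can check for before it is harvested. The following is a minimal sketch, with a hypothetical function name, that reports where a byte string stops being valid UTF-8; it uses only Python's built-in decoder.

```python
def first_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None
```

Metadata exported from an older database in Latin-1, for instance, would typically fail this check at the first accented character, which is how such a record can be caught before it reaches the harvester.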
Every time repositories changed their output format, the time-consuming process of harvesting had to be performed again. New software releases of i-Tor (bug fixes or new features regarding harvesting) also occasionally made re-harvesting of all repositories necessary. The main bottleneck that made harvesting so time-consuming was the network connection between the DAREnet harvester and the repositories. This was a matter of trial and error and finally we succeeded in linking all repositories to DAREnet.
Another issue with harvesting turned out to be local resolving mechanisms, like the 'jump-off pages' often used in Open Archives. The metadata contains a link to the digital object, but this is not always a link to the object itself; often it is a link to a (jump-off) page on the site of the institution. That page in turn contains a link to the digital object. The jump-off page, however, is a more stable URL to link to. When, as is the case with DAREnet, the digital object is needed for the purpose of full-text searching, linking to a jump-off page is a problem. This problem still exists and cannot be solved easily.
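To show why the jump-off page is awkward for a full-text indexer, the sketch below takes the HTML of such a page and guesses which links point at the digital object, here simply by file extension. This is a heuristic written for illustration only, not what DAREnet does; the class and function names, the example URL and the choice of `.pdf` are all assumptions, and a real jump-off page may well defeat it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ObjectLinkFinder(HTMLParser):
    """Collect links on a jump-off page that look like digital objects."""

    def __init__(self, page_url, extensions=(".pdf",)):
        super().__init__()
        self.page_url = page_url
        self.extensions = extensions
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Guess by file extension; this is the fragile part of the heuristic.
        if href and href.lower().endswith(self.extensions):
            self.links.append(urljoin(self.page_url, href))


def find_object_links(page_url, html_text):
    """Return absolute URLs of candidate digital objects on a jump-off page."""
    finder = ObjectLinkFinder(page_url)
    finder.feed(html_text)
    return finder.links
```

Because every institution lays out its jump-off pages differently, a service provider would need either one such rule per repository or an agreed way of exposing the object location in the metadata, which is why the issue cannot be solved by the harvester alone.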
The original assumption that the required technology already existed proved to be true. Several technical problems encountered during this project point to the importance of good hardware, backups and software as well as measures regarding support and maintenance.
OAI-PMH is not a set of cataloguing rules. It is a lingua franca for data exchange between partners who have agreed to use simple DC. However, OAI-PMH also allows for data exchange based on other formats; this can be any metadata set. OAI-PMH does not specify how the digital object file itself is exchanged. An additional protocol is needed here, as in most cases service providers also want to have the object and not just the metadata. One solution might be to use MPEG-21 DIDL, as described by Herbert van de Sompel in his article in D-Lib Magazine of February 2004, 'Using MPEG-21 DIP and NISO OpenURL for the Dynamic Dissemination of Complex Digital Objects in the Los Alamos National Laboratory Digital Library' [9]. However, so far his solution has only been implemented at Los Alamos.
OAI-PMH is simple enough to get you started. From a librarian's perspective it might be too simple. From a user's perspective (scientific author, student, professor), it might be elaborate enough. At present we have not yet decided whether we need additional qualifiers in DC. This question is still under discussion and will have to be settled based on experience with actual service development.
The unique identification of metadata, the digital object and its creator has proved to be necessary and useful. DARE is therefore working on a so-called 'digital author identification'.
The libraries' involvement in the DARE experience has shown a shift from the traditional library issues of (online) access (i.e. metadata and infrastructure) and (digital) preservation towards active publishing and the possibility of the multiple use of metadata and digital objects.
Furthermore, libraries and scientists have found themselves in new, more interactive roles between one another. DARE fundamentally improves open access to the university's 'hidden' or 'deep Web' in which scientific publications were at best downloadable but less easy to find. This shift in opportunity, potential and competence is only just being explored and is currently being translated into services that meet the needs of scientists as well as those of the general public.
DARE's focus has been on this process in its second year, 2004. This also implies an emphasis on the more efficient reuse of metadata and digital objects throughout the whole process of their creation, access and long-term digital preservation, using today's solutions to link the separate systems that manage these sub-processes. As far as the metadata is concerned, further development in DARE's use of metadata is needed.
Communication and marketing are also important issues for the next year(s) in order to inform academics of the potential of repositories for academic communication and publishing. The involvement of the scholarly community in the further submission of content and in service development will therefore be a major area of effort.
Certain scientific communities already have long-standing experience of sharing their publications in OAI environments, such as Los Alamos. One could state that in these early adopter environments the need for quick access to different versions of a paper was an impetus for the use of the OAI protocol. It served a specific need: the rapid dissemination of information and knowledge and feedback which is often only relevant for a short period of time - a process defined as 'science in the making' by Bruno Latour [10]. Latour makes a distinction between 'science in action' (the process of 'science becoming fact') and 'science as fact'.
One could stress that the new challenge is to create environments, based on the OAI protocol, that give access to science once it has been established as fact. In other words: the pre-print phase and the demand for quick access serve the 'science in the making' aspect, which has already manifested itself in the early OAI-adopter environments, serving mainly scientists. The task now is to give the public open access to the static science corpus we now know to be 'scientific establishment', in environments which (re)use data in OAI data providers.
Many thanks to all DARE project managers involved in the DARE Programme who have given their input to this article. Also a note of special thanks to Marlon Domingus of Leiden University and Rob Maijers of www.maijers.com.
Ariadne is published every three months by UKOLN. UKOLN is funded by MLA the Museums, Libraries and Archives Council, the Joint Information Systems Committee (JISC) of the Higher Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based. Material referred to on this page is copyright Ariadne (University of Bath) and original authors.