Saturday, 13 November 2010

Oxford Research Data Management Pages

The University of Oxford has launched the Research Data Management Website. This thematic site has been developed by Research Services in collaboration with OUCS and OULS as part of the EIDCSR Project.

The RDM website is designed to support researchers with their research data management activities and includes information about:

  • research funder requirements in the area of research data management
  • services available within the University to assist researchers in this area
  • guidance on how to produce a data management plan as part of a funding application
  • further sources of advice and online guidance, updates and news, and tools and training available to help.

Previously, web-based information about research data management was available from a number of sites across the University, but it was felt that a single source of 'signposting' information would be a valuable resource for researchers from all subject disciplines and at differing stages of the research cycle, increasing understanding of the benefits of improved research data management as well as communicating the range of services available.

Monday, 27 September 2010

Databases in Quantum Chemistry

Last week I attended the workshop "Databases in Quantum Chemistry: Validation of methods and software, and repositories of reference computational results", which took place at ZCAM in Zaragoza.

"The workshop is devoted to address the issues related to databasing in Quantum Chemistry. A number of international experts has been invited to discuss the more relevant points in a flexible set up, with the objective of reaching a consensus view about the degree of necessity of organized repositories of high-level quantum chemical data, as well as the technical problems associated to their design, construction and maintenance"
The workshop started with talks dealing with the needs in quantum chemistry. Although the discussions indicated that these were many and diverse, there was general agreement that calculations would benefit from a space where they could be shared openly at the time of publication.

The following talks described current initiatives and experiences. My presentation "Implementing data repository services: issues and lessons learned from case studies" aimed to share some of the experiences from projects like EIDCSR or Sudamih.

Peter Murray-Rust has nicely described the workshop in a blog post. The main outcome is the setting up of the Quixote Project, with the aim of having a prototype repository within a month. Very exciting!

Tuesday, 13 July 2010

Open Repositories 2010 in Madrid

This year's Open Repositories Conference 2010 was held in Madrid, organised by the Spanish Foundation for Science and Technology (FECYT) and UNED, a Spanish public university that provides distance education.

Many of the talks discussed issues around research data and digital repositories. In the initial keynote, Prof. David de Roure emphasized the importance of capturing not only the research data but also the methods behind the data. In the future, repositories will have a role in managing knowledge packs made up of data, metadata, workflows, articles, presentations, results, etc.

The conference had a strong presence from activities using the eSciDoc repository system, which is based on Fedora. The BW eLab project uses this infrastructure to provide access to remote laboratory instruments as well as to manage the experimental data generated in the labs. During the workflow, an eSync daemon monitors the file system of the computer connected to the instruments; it replicates new files and sends them to a deposit service, where metadata is extracted and both data and metadata are deposited in eSciDoc. A similar synchronisation approach is used in the BRIL Project to monitor researchers' own desktops and capture as much data and metadata as possible.
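
The monitor-replicate-deposit pattern described above can be sketched in a few lines of Python. This is purely illustrative: the function names, the sidecar-JSON deposit format and the single-directory scan are my own assumptions, not the eSync or eSciDoc API.

```python
import hashlib
import json
import shutil
from pathlib import Path

def scan_new_files(watch_dir, seen):
    """Return files in watch_dir that have not been seen before."""
    return [p for p in sorted(Path(watch_dir).iterdir())
            if p.is_file() and p.name not in seen]

def deposit(path, deposit_dir):
    """Copy a file into the deposit area and write a metadata sidecar."""
    deposit_dir = Path(deposit_dir)
    deposit_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, deposit_dir / path.name)
    metadata = {
        "filename": path.name,
        "size_bytes": path.stat().st_size,
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }
    (deposit_dir / (path.name + ".meta.json")).write_text(json.dumps(metadata))
    return metadata

def sync_once(watch_dir, deposit_dir, seen):
    """One pass of the daemon loop: deposit any new files and record them."""
    results = []
    for path in scan_new_files(watch_dir, seen):
        results.append(deposit(path, deposit_dir))
        seen.add(path.name)
    return results
```

A real daemon would run `sync_once` on a timer or use OS file-change notifications, but the essential idea is the same: detect new instrument output, extract what metadata can be had automatically, and push data plus metadata into the repository together.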

Another interesting talk presented MIDAS, an open-source repository for medical scientific research. The system is used for the Insight Journal, which provides open access to articles, data, code and reviews, with an archive hosting public collections of image datasets such as MRIs.

Other repository frameworks included Hydra, a collaboration between the Universities of Hull, Stanford and Virginia, which uses a technical architecture based on Fedora with a toolkit of reusable components that can assist with a range of content management, access and preservation needs. The University of Hull IR provides a Hydra use case.

Microsoft announced the release of v2.0 of their repository platform Zentity, which makes use of the Open Data Protocol and uses Pivot for visualising and organising the data (see this example of Pivot in action). The installation supports services such as OAI-ORE and SWORD.

In the national approaches session, the results of the Australian institutional research repository data-readiness surveys 2010 were presented. Although repository managers are aware of ANDS and its services, there is little use of them, and fewer than half of respondents were planning to incorporate data in their repositories.

This has truly been a rewarding and stimulating conference.

Friday, 28 May 2010

The DCC's Data Management Planning Tool

The Digital Curation Centre has developed a web-based data management planning tool to assist with the preparation of basic Data Management Plans (DMPs) at the funding application stage, as well as to help build and maintain a more detailed DMP during the project's lifetime.

Back in July 2009, the EIDCSR project responded to the proposed DCC Data Management Plan Content Checklist. This test version of the DMP tool seems to have taken into account the comments made:
  • The objective of the tool, i.e. assisting with the production and maintenance of DMPs, is clear and pertinent.
  • The plans can be exported to PDF and HTML so they can easily be included in funding applications, websites, etc. Moreover, the plans incorporate the DMP Online logo, which should indicate to evaluators that the creators have taken the time and interest to use the tool.
  • The plans can easily be edited and adjusted if circumstances change. This makes a DMP a living document, helping to ensure its usefulness throughout the lifecycle of the project.
Some other aspects are still unclear or could be enhanced:
  • In terms of encouraging researchers to use the tool, is there any effort towards convincing the UK Research Councils to recommend that their bidders use it?
  • It is still unclear whether the DMP team provides support only for using the tool, or whether they can also help with the preparation of the DMPs themselves. Where data centres are in place, some might already provide this support, and this could be included in the guidance element of the tool.
  • Some of the information collected in the DMPs can be of great help later in the lifecycle when documenting the datasets that are produced. Hence it would be convenient if these data could be exported into more reusable formats.
  • Creating a data management plan from scratch can be an arduous task that could be eased by providing example plans in particular areas, to help guide and inspire those creating new ones.
  • In some cases researchers will want to create a DMP without necessarily having, or planning to have, funding from one of the UK research councils. This does not seem possible with the tool at the moment. A generic DMP that is not specific to any funding agency could be extremely useful.
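
The point about exporting plans into more reusable formats could mean, for instance, a machine-readable serialisation alongside the PDF and HTML exports, so downstream tools can pick up the answers when documenting the resulting datasets. A minimal sketch follows; the field names are hypothetical and not the DMP Online schema.

```python
import json

# Illustrative DMP fields -- invented for this sketch, not the real schema.
dmp = {
    "project": "EIDCSR",
    "funder": None,  # a generic, funder-agnostic plan
    "datasets": [
        {
            "name": "heart-imaging",
            "formats": ["TIFF"],
            "estimated_volume_gb": 500,
            "retention_years": 10,
        }
    ],
    "responsibilities": {"data_management": "research group"},
}

def export_dmp(plan):
    """Serialise a plan to JSON so other tools can reuse its answers."""
    return json.dumps(plan, indent=2, sort_keys=True)
```

Because the export is structured rather than a flat document, the dataset descriptions could later seed metadata records without re-keying.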

Overall, this test version of the DCC's Data Management Planning tool is shaping up nicely, and there is a clear need for it. Bringing together the RCUK statements on data management, the DCC's generic DMP clauses and guidance from a variety of reputable sources can help researchers immensely.

Thursday, 27 May 2010

Digital Curation Centre Workshop at Oxford on the 16th June, 2010 – How to Manage Research Data

I am very pleased to announce that the Digital Curation Centre will be paying a visit to Oxford on the 16th June to present a workshop on managing research data. The workshop is aimed primarily at researchers interested in bidding for funding for projects with a data output, although it should also appeal to those who assist and support research activities and who would like to find out more about the challenges of data curation.

Although the workshop will obviously be of relevance to those interested in either the Sudamih or EIDCSR projects, it will not focus exclusively on a particular academic discipline but should be useful across the board. Sessions will include: the roles and responsibilities associated with conceptualising, creating and managing research data during the life of a project; the responsibilities associated with the longer-term management of research data after a project has ended; developing a data management plan; and preparing data for long-term curation and re-use.

The workshop is free for members of the University of Oxford, £50 for non-members.

Anyone interested in attending the workshop should register at

Wednesday, 19 May 2010

Data management and curation cost modelling

The final report of the Keeping Research Data Safe 2 (KRDS2) project has now been published, delivering a survey of data preservation costs, an enhanced curation activity model, four in-depth case studies and a benefits framework.

Oxford, and in particular the research groups involved in EIDCSR, contributed one of the case studies. For this exercise, cost information was gathered on activities related to the generation of data and local data management, as well as on the curatorial activities undertaken as part of EIDCSR, such as metadata management and long-term archiving.

It is hard to make any inferences from these costs, as they represent a snapshot in time of one particular research project. Nonetheless, the Oxford cost information revealed that:

  • generating research data can be extremely expensive,
  • local data management may be modestly resourced in comparison with the value of the data,
  • start-up curation services, i.e. curation services in the process of development, can also be expensive,
  • the cost of established data management services, such as the long-term filestore, can be rather low in comparison to those services in the process of development.

The report contains more detailed information about the Oxford case study as well as the others, including the UK Data Archive, the Archaeology Data Service and the National Digital Archive of Datasets.

Friday, 7 May 2010

A new interesting project: Data Management for Bio-Imaging

A new data management project funded by JISC known as Data Management for Bio-Imaging has just created a wiki that will contain relevant information about the project.

The aim of the project is to generate better understanding and planning of data management for bio-imaging within the John Innes Centre.

The project plans to document the data flows and infrastructure in the Coen Lab and the JIC Bio-Imaging service. In both cases, sophisticated instruments such as light microscopy, CCD systems and confocal microscopy generate terabytes of imaging data.

To address their data management needs they are deploying OMERO, part of the Open Microscopy Environment, which offers features including:

  • managing and organising data
  • search and browsing
  • 3D projection
  • metadata, annotation and tagging
  • sharing, exporting and importing
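
The tagging-and-search idea in that feature list can be pictured with a toy in-memory catalogue. This is purely illustrative of the concept and bears no relation to the actual OMERO API; all names here are invented.

```python
from collections import defaultdict

class ImageStore:
    """Toy in-memory catalogue: tag images and search by tag."""

    def __init__(self):
        self._tags = defaultdict(set)  # tag -> set of image names

    def tag(self, image, *tags):
        """Attach one or more tags to an image."""
        for t in tags:
            self._tags[t].add(image)

    def search(self, tag):
        """Return all image names carrying the given tag, sorted."""
        return sorted(self._tags.get(tag, set()))

store = ImageStore()
store.tag("embryo_01.tif", "confocal", "GFP")
store.tag("embryo_02.tif", "confocal")
```

In a real deployment the annotations would of course live server-side next to the multi-gigabyte image data, which is precisely what makes a system like OMERO attractive for terabyte-scale imaging.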

In addition, they will train users, including post-docs, to use the system, as well as define strategies to handle user acceptance and encourage image processing.

This is an extremely interesting activity and we'll surely keep a close eye on it.