Friday, 23 October 2009

EIDCSR technical analysis: from soft to hard

Having conducted the EIDCSR audit and requirements analysis exercise, we have started converting the high-level requirements we gathered into technical requirements. The idea is to produce a systems design document that a Systems Developer can use to begin the implementation. Howard Noble, from Computing Services, is leading this exercise for the next two months.

To kick off the technical analysis, Howard and I had a very fruitful meeting this morning. We brainstormed ideas for a high-level system design, trying to identify the practical things that can be done to support the data management workflows of the research groups taking part in EIDCSR.

Using a board to produce a "rich picture" recording the processes we have encountered and our thoughts on them was extremely useful. We will now produce a "cleaner" version of this picture and present it to key people in the research groups at a workshop. This will hopefully help us communicate what the project aims to achieve and get feedback on the design, so that the researchers' requirements drive any development.

Thursday, 15 October 2009

First EIDCSR workshop and executive board meeting

Yesterday was a busy day for the EIDCSR Project.

In the morning, the first project event took place at Rewley House in Oxford with an exciting group of speakers brought together under the theme of "Data curation: from lab to reuse". Their presentations are now available on the project website and a report will be produced shortly.

The afternoon was given over to the first EIDCSR Executive Board meeting, where progress and next steps for the project were discussed with the extraordinarily helpful and encouraging members of the board.

Overall, a great day providing loads of food for thought.

Monday, 12 October 2009

"Science these days has basically turned into a data-management problem"

The New York Times has an article, "Training to Climb an Everest of Digital Data", about future scientists' ability to manage the large amounts of digital data being generated and about how companies such as IBM and Google are trying to help by contributing tools, computational power and access to large-scale datasets. It was actually two years ago this month that Google and IBM announced their partnership to provide universities with dedicated cluster computing resources, open source software, a dedicated website for collaboration, and a Creative Commons-licensed curriculum. In April this year the NSF funded projects at 14 US universities to take advantage of the IBM/Google Cloud Computing University Initiative, and the New York Times article highlights some of these projects.

The emphasis is certainly on the massive (big compute clusters, big datasets) and on data analysis. There is not much, though, on the ongoing management of, access to, and preservation of data, even if Professor Jimmy Lin (University of Maryland) is quoted as saying, "Science these days has basically turned into a data-management problem".