Research data in the cloud!? Here comes Amazon Glacier
Amazon Glacier is a low-cost storage service that aims to provide durable storage for data archiving and backup. To keep costs low, Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are acceptable. Data can be stored in Amazon Glacier for as little as $0.01 per gigabyte per month. More important, though, Amazon has introduced “Research Data” as one of Glacier's customer scenarios. This is really interesting because it shows that business is sensing the need to store research data and, as we have seen many times in the history of the Internet, science evolves around the needs of business.
Here is what they write:
Research and scientific organizations, such as pharmaceutical and bio-tech companies, as well as universities and research institutes, have large data archiving needs. An example use-case is drug development, where a substantial amount of data is generated and must be retained so researchers can verify experimental drug test results. Traditionally, this data has been stored on inflexible tape-based storage systems with copies stored in multiple sites and often with a copy vaulted offsite as well. Amazon Glacier reduces the cost of storing these data sets by eliminating the operational overhead involved in managing hardware and data centers. The service automatically stores redundant data in multiple facilities and on multiple devices within each facility and is built to be automatically self-healing, performing regular, systematic data integrity checks and using redundant data to perform automatic repairs if errors are discovered.
Digital preservationists in organizations such as libraries, historical societies, non-profit organizations and governments are increasing their efforts to preserve valuable but aging digital content such as websites, software source code, video games, user-generated content and other digital artifacts that are no longer readily available. These archive volumes may start small, but can grow to petabytes over time. Amazon Glacier makes highly durable, cost-effective storage accessible for data volumes of any size. This contrasts with traditional data archiving solutions that require large scale and accurate capacity planning in order to be cost-effective.
Now that Amazon is directing its attention to Research Data and Digital Preservation, we can predict that techniques and tools that support storing data in the cloud will become part of marketing strategies that influence scientists' decisions. I have no information on whether storage is a concern (a barrier) for scientists in their preservation activity, but if it ever was a barrier, Amazon is showing that we should no longer be concerned. Storage is cheap and can be secured. All we need is formalization and good models.
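To put the "storage is cheap" claim in perspective, here is a rough back-of-the-envelope sketch in Python based on the $0.01 per gigabyte per month figure quoted above. It is only an illustration: the function name and the example archive sizes are my own, 1 TB is taken as 1,000 GB, and the calculation covers the storage price only, ignoring any retrieval or request charges.

```python
# Rough storage-cost estimate using the price quoted in this post.
# Assumption: storage price only; retrieval and request fees are not modeled.
PRICE_PER_GB_MONTH = 0.01  # USD per gigabyte per month


def storage_cost_usd(gigabytes, months=1, price_per_gb_month=PRICE_PER_GB_MONTH):
    """Return the estimated storage cost in USD for the given size and duration."""
    if gigabytes < 0 or months < 0:
        raise ValueError("gigabytes and months must be non-negative")
    return gigabytes * months * price_per_gb_month


if __name__ == "__main__":
    # Hypothetical archive sizes, using decimal units (1 TB = 1,000 GB).
    archives = [("1 TB", 1_000), ("100 TB", 100_000), ("1 PB", 1_000_000)]
    for label, size_gb in archives:
        yearly = storage_cost_usd(size_gb, months=12)
        print(f"{label}: ${yearly:,.2f} per year")
```

At that price, a 1 TB archive works out to roughly $120 per year in storage fees, which is why the harder questions are about formalization and models rather than disk space.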