A few examples of data management based on Quality of Service (QoS) and Data Lifecycle Management:
- The user can specify the number of replicas and the QoS associated with each of them, e.g. one on fast storage (SSD-backed disks) and two on tape, in three different locations. The system should be able to automatically keep that policy satisfied over time.
- The user can specify that certain datasets always have a mirror, with the replica status checked in real time or near-real time.
- The user can specify that a number of replicas are created and that they must be accessible through different protocols (e.g. HTTP, XRootD, SRM).
- The user can specify transitions between QoS classes and/or changes in access controls based on data age (e.g. quarantine periods, moving old data to tape).
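To make these policies concrete, the sketch below expresses one as declarative data. It is a minimal illustration under an assumed schema: field names such as `qos_class`, `access_protocols`, and `lifecycle` are invented for this example and do not correspond to an existing specification.

```python
# Hypothetical replica policy for one dataset; the schema is illustrative.
dataset_policy = {
    "dataset": "run2023/raw",
    "replicas": [
        # One replica on fast storage (SSD-backed disks)...
        {"qos_class": "fast-disk", "site": "SITE-A"},
        # ...and two on tape, so the three copies sit in three locations.
        {"qos_class": "tape", "site": "SITE-B"},
        {"qos_class": "tape", "site": "SITE-C"},
    ],
    # Protocols through which each replica must remain accessible.
    "access_protocols": ["https", "xrootd", "srm"],
    # Lifecycle rule: after 180 idle days, keep only the tape copies.
    "lifecycle": {"max_idle_days": 180, "demote_to": "tape"},
}
```

The system, not the user, would then be responsible for continuously verifying and re-establishing this state.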
For example, the system could move unused data from fast storage systems (disks) to “glacier-like” locations (sites providing tape). As a complementary functionality, a smart engine should infer when data are becoming “hot” again and move them back to fast storage; one possible decision rule is sketched below. Note: this functionality should be available at the infrastructure level, based on inter-site data movement, not only as an intra-site data placement.
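The following is a minimal sketch of the decision logic such a smart engine might apply, assuming the only available signal is a per-dataset access count over a sliding window; the thresholds and the names `DatasetStats` and `plan_movements` are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class DatasetStats:
    """Access statistics for one dataset, as seen by the engine."""
    name: str
    accesses_last_30d: int   # reads observed in the last 30 days
    on_fast_storage: bool    # currently on disk (True) or tape only (False)

# Illustrative thresholds; real values would be tuned per community.
HOT_THRESHOLD = 10   # accesses/30 days above which data counts as "hot"
COLD_THRESHOLD = 1   # accesses/30 days at or below which data is "cold"

def plan_movements(datasets: list[DatasetStats]) -> list[tuple[str, str]]:
    """Return (dataset, action) pairs: stage hot data to fast storage,
    release disk copies of cold data (tape replicas required by the
    policy are kept). Movements are inter-site, not only intra-site."""
    plan = []
    for ds in datasets:
        if not ds.on_fast_storage and ds.accesses_last_30d >= HOT_THRESHOLD:
            plan.append((ds.name, "promote-to-fast-storage"))
        elif ds.on_fast_storage and ds.accesses_last_30d <= COLD_THRESHOLD:
            plan.append((ds.name, "demote-to-tape"))
    return plan
```

Separating the declarative policy from the engine's rule lets the same movement planner serve communities with different thresholds.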
When data are ingested by the infrastructure, the user can specify tasks and workflows to be executed on them before they are stored.
The system should be able to identify computing resources on which to perform the requested actions. The feature should be available at the infrastructure level, in a form that is pluggable with virtually any user-provided application or algorithm; the user community remains responsible for the application that will be executed (a possible shape for such plugins is sketched after the examples below).
Examples:
- experiment-independent quality checks before storing data
- data skimming
- metadata extraction
- indexing
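As an illustration of how such pluggable pre-storage tasks might look, here is a minimal sketch assuming the infrastructure simply calls user-registered functions in order before the file is stored; all names here (`IngestTask`, `run_ingest_pipeline`, the sample tasks) are hypothetical:

```python
from typing import Callable

# A pre-storage task receives the file path plus the metadata collected so
# far and returns (possibly updated) metadata. This signature is an
# assumption made for the sketch, not an existing interface.
IngestTask = Callable[[str, dict], dict]

def quality_check(path: str, meta: dict) -> dict:
    """Experiment-independent sanity check (non-empty file, checksum present)."""
    meta["qc_passed"] = meta.get("size", 0) > 0 and "checksum" in meta
    return meta

def extract_metadata(path: str, meta: dict) -> dict:
    """Stub for a user-supplied metadata extraction / indexing step."""
    meta.setdefault("keywords", []).append("auto-extracted")
    return meta

def run_ingest_pipeline(path: str, meta: dict, tasks: list[IngestTask]) -> dict:
    """Run each user-registered task in order before the file is stored."""
    for task in tasks:
        meta = task(path, meta)
    return meta

# The community supplies the callables; the infrastructure only identifies
# computing resources and schedules them.
final_meta = run_ingest_pipeline(
    "/ingest/run001.dat",
    {"size": 42, "checksum": "adler32:00000001"},
    [quality_check, extract_metadata],
)
```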