The continuing dramatic growth of data volumes and the increasing diversity of data-intensive applications demand urgent investigation of effective means for efficient storage management. In the summer of 2012, the volume of data in the world was around 10^21 bytes (one zettabyte), about 1.1TB per internet user, and this volume continues to grow at a compound annual growth rate (CAGR) of about 50%. Gartner has predicted that “By 2013, storage systems will no longer be manually tunable for performance or manual data placement. Similar to virtual memory management, the storage array’s algorithms will determine data placement” (The Future of Storage Management, Gartner 2010). Meeting service-level objectives and agreements (SLOs/SLAs) for data-intensive applications is not straightforward and will become increasingly challenging. In particular, there is a growing need for intelligent mechanisms to manage the underlying storage infrastructure, taking into account the advent of new device technologies.
To meet this challenge, we propose a research programme aligned with EPSRC’s theme “Towards an intelligent information infrastructure (TI3)”, specifically with reference to the “deluge of data” and the exploration of “emerging technologies for low power, high speed, high density, low cost memory and storage solutions”. Today, with storage widely distributed, for example in cloud storage solutions, it is difficult for an infrastructure provider to decide where data should reside: on what type of device, co-located with data belonging to which other (possibly competing) users, and even in which country. The need to meet energy-consumption targets compounds this problem. These decision problems motivate the present proposal, which aims to develop new model-based techniques and algorithms to support the effective administration of data-intensive applications and their underlying storage infrastructure.
We propose to develop techniques and tools for the quantitative analysis and optimisation of multi-tiered data storage systems. The primary objective is to develop novel modelling approaches that identify the most appropriate data placement and data migration strategies. These strategies share the common aim of placing data on the most effective target device in a tiered storage architecture. In the proposed research, the allocation algorithm will decide the placement strategy and trigger data migrations so as to optimise an appropriate utility function. Our research will also take into account the likely quantitative impact of evolving storage and energy-efficiency technologies by developing suitable models of these technologies and integrating them into our tier-allocation methodologies. In particular, our models will be specialised for different storage technologies and power sources (e.g. fossil fuel, solar, wind).
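The proposal does not fix a particular utility function or allocation algorithm. Purely as an illustration of the idea of utility-driven tier allocation, the sketch below assumes a hypothetical utility that penalises latency-weighted accesses plus per-GB power draw, and uses a simple greedy heuristic (hottest items first, each to the feasible tier with the highest utility). The names `Tier`, `Item`, `utility`, `place` and `migrations`, and all tier parameters, are hypothetical; a greedy allocator is not guaranteed to be optimal and stands in here only for the kind of model-based optimiser the research would develop.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity_gb: float
    latency_ms: float    # hypothetical mean access latency of the tier
    watts_per_gb: float  # hypothetical power draw per stored GB


@dataclass
class Item:
    name: str
    size_gb: float
    accesses_per_hour: float


def utility(item: Item, tier: Tier, energy_weight: float = 1.0) -> float:
    """Hypothetical utility: negated sum of access-latency cost and energy cost."""
    latency_cost = item.accesses_per_hour * tier.latency_ms
    energy_cost = energy_weight * item.size_gb * tier.watts_per_gb
    return -(latency_cost + energy_cost)


def place(items: list[Item], tiers: list[Tier], energy_weight: float = 1.0) -> dict[str, str]:
    """Greedy placement: visit items in decreasing access density
    (accesses per GB) and put each on the tier with spare capacity
    that maximises its utility."""
    free = {t.name: t.capacity_gb for t in tiers}
    plan: dict[str, str] = {}
    for item in sorted(items, key=lambda i: i.accesses_per_hour / i.size_gb, reverse=True):
        candidates = [t for t in tiers if free[t.name] >= item.size_gb]
        if not candidates:
            raise ValueError(f"no tier has room for {item.name}")
        best = max(candidates, key=lambda t: utility(item, t, energy_weight))
        free[best.name] -= item.size_gb
        plan[item.name] = best.name
    return plan


def migrations(current: dict[str, str], plan: dict[str, str]) -> dict[str, tuple]:
    """Map each item whose planned tier differs from its current tier to
    (current tier, planned tier); items with no current tier get None as source."""
    return {
        name: (current.get(name), tier)
        for name, tier in plan.items()
        if current.get(name) != tier
    }
```

For example, with a 100 GB SSD tier and a 1,000 GB HDD tier, a small, heavily accessed index is planned onto the SSD while bulk logs and archives stay on the HDD; if everything currently sits on the HDD, `migrations` reports only the index as needing to move. Raising `energy_weight` shifts cold data further towards low-power tiers.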
The models, optimisers and methodologies that we produce will be tested in pilot implementations on our in-house cloud (already purchased), on Amazon EC2 resources, and finally in a controlled industrial production environment as part of our collaboration with NetApp. This will provide feedback that enables us to refine, enhance and extend our techniques, and hence to further improve the utility of very large storage systems.