The computing paradigm is shifting as the volume of data on the web grows rapidly. This raises questions about the limitations of traditional database technologies in big data computing, such as scalable performance and flexible schemas. Over the past several years, the “NoSQL” community has emerged and rapidly expanded, aiming to solve various large-scale data storage, analysis and retrieval tasks in a non-relational database environment. This NoSQL movement has led to the development of a variety of NoSQL databases, some of which were proposed by influential Web 2.0 companies such as Amazon and Google. In general, NoSQL databases can be categorized into: 1) key-value data stores such as MemcacheDB, Redis and Voldemort; 2) document-oriented data stores such as MongoDB, CouchDB and Riak; 3) column-oriented data stores such as Apache’s HBase, Cassandra and Google’s Bigtable; 4) graph databases such as Neo4j, VertexDB and AllegroGraph.
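As a rough illustration of how the four categories differ, the following sketch models one made-up user record in each style using plain Python structures. It does not use any real database client; the record, the key formats, and the helper city_of are hypothetical and chosen only to show the shape of the data in each model.

```python
# Illustrative only: one hypothetical "user" record in four NoSQL data models,
# written with plain Python structures rather than any real database API.

# 1) Key-value: the store sees an opaque value looked up by a single key.
key_value = {"user:42": '{"name": "Ada", "city": "London"}'}

# 2) Document: the value is a structured document whose fields can be queried.
document = {"_id": 42, "name": "Ada", "address": {"city": "London"}}

# 3) Column-oriented: row key -> column family -> column -> value.
column_family = {
    "42": {
        "profile": {"name": "Ada"},
        "location": {"city": "London"},
    }
}

# 4) Graph: nodes plus labelled edges between them.
nodes = {42: {"name": "Ada"}, 7: {"name": "London"}}
edges = [(42, "LIVES_IN", 7)]


def city_of(user_id):
    """Follow the LIVES_IN edge from a user node and return the city name."""
    for src, label, dst in edges:
        if src == user_id and label == "LIVES_IN":
            return nodes[dst]["name"]
    return None
```

The same question ("which city does user 42 live in?") is answered by a key lookup plus parsing in the key-value case, by a nested field access in the document case, by a column-family lookup in the column-oriented case, and by an edge traversal in the graph case.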
The goal of this project is to investigate the design principles of NoSQL databases and the related issues in data modelling and query answering.
Students should have an understanding of relational databases and some programming experience.
(1) Bigtable: A Distributed Storage System for Structured Data, Fay Chang, Jeffrey Dean, et al., OSDI’06. (2) Dynamo: Amazon’s Highly Available Key-value Store, Giuseppe DeCandia et al., SOSP 2007. (3) Will NoSQL databases live up to their promise? Neal Leavitt, Computer, 43(2):12–14, 2010. (4) The MongoDB website: http://mongodb.org, including source code and documentation.