Rogas: Building A Unified Platform for Data Analysis

Rogas is a platform for network analytics which integrates a collection of graph analysis tools and algorithms into a unified framework in order to support various network analysis tasks efficiently and effectively.

The Rogas platform has a relational core for storing network data. It provides an integrated view of the network analytics tasks that have been performed, allows us to check the semantic integrity of network data, and helps us understand how analysis queries interact with each other. It also supports comparative network analytics in a dynamic modelling environment. At its core, Rogas uses a query engine, called the RG engine, to handle (possibly interactive) relational and graph queries in network analytics tasks. The RG engine is built on the open-source database system PostgreSQL, extending PostgreSQL's query engine with graph query processing and optimization.
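As a rough illustration of how a relational step and a graph step can be combined in one analysis task, the sketch below stores edges in a relation, filters them with SQL, and then runs a breadth-first traversal over the surviving edges. It uses Python's built-in sqlite3 module purely as a stand-in for a relational core; it is not the RG engine or RG-SQL, and the `edges` table, its columns, and the `reachable` function are hypothetical names chosen for this example.

```python
import sqlite3
from collections import deque


def reachable(conn, start, min_weight):
    """Return the set of nodes reachable from `start` over edges whose weight is at least `min_weight`."""
    # Relational step: let the database filter the edge relation.
    rows = conn.execute(
        "SELECT src, dst FROM edges WHERE weight >= ?", (min_weight,)
    ).fetchall()

    # Build an adjacency list from the filtered rows.
    adjacency = {}
    for src, dst in rows:
        adjacency.setdefault(src, []).append(dst)

    # Graph step: breadth-first traversal from the start node.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical sample data: a tiny directed, weighted edge relation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, weight REAL)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [("a", "b", 1.0), ("b", "c", 0.2), ("a", "d", 0.9)],
)

print(sorted(reachable(conn, "a", 0.5)))  # ['a', 'b', 'd']
```

In this toy setting the low-weight edge from b to c is removed by the relational filter, so c is not reached by the traversal.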

License: GPL-3.0

Code repository: Rogas' code on GitHub

Project task 1: Graph Query Optimization

 

Description

Network analysis queries are often computationally expensive. An efficient query optimizer is therefore vital for processing network analysis queries over graphs, relations, or a mix of both. The query engine in Rogas supports an extended SQL query language, called RG-SQL, which incorporates a number of primitive graph constructors and relational algebra operators in a unified manner.

The goal of this project is to determine the most efficient way to execute RG-SQL queries by considering the possible query plans. This requires:

  • building statistical models for different query patterns
  • analysing how relational data and graph data are used in queries, in relation to their storage and processing models
  • rewriting queries based on the algebraic properties of graph constructors and relational algebra operators
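The idea of comparing possible query plans using statistics can be sketched with a toy cost model. The example below is only an illustration under stated assumptions: the `EdgeStats` fields, the two candidate plans, and the per-row cost constants are all hypothetical and are not taken from Rogas or from PostgreSQL's planner.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class EdgeStats:
    """Hypothetical statistics about an edge relation and a filter on it."""
    row_count: int      # number of rows in the edge relation
    selectivity: float  # estimated fraction of rows matching the filter


# Assumed toy constants: a sequential read is cheaper per row than a random read.
SEQ_COST_PER_ROW = 1.0
RANDOM_COST_PER_ROW = 4.0


def seq_scan_cost(stats: EdgeStats) -> float:
    # A sequential scan reads every row, regardless of the filter.
    return stats.row_count * SEQ_COST_PER_ROW


def index_scan_cost(stats: EdgeStats) -> float:
    # An index scan reads only matching rows, but each read is a random access.
    return stats.row_count * stats.selectivity * RANDOM_COST_PER_ROW


PLANS: Dict[str, Callable[[EdgeStats], float]] = {
    "seq_scan": seq_scan_cost,
    "index_scan": index_scan_cost,
}


def choose_plan(stats: EdgeStats, plans: Dict[str, Callable[[EdgeStats], float]] = PLANS) -> str:
    """Return the name of the candidate plan with the lowest estimated cost."""
    return min(plans, key=lambda name: plans[name](stats))


print(choose_plan(EdgeStats(row_count=1000, selectivity=0.01)))  # index_scan
print(choose_plan(EdgeStats(row_count=1000, selectivity=0.9)))   # seq_scan
```

A real optimizer would derive the selectivity from collected statistics and would enumerate many more plans, including those produced by algebraic rewriting; the sketch only shows the final step of picking the cheapest candidate.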

Benefit for the Student

Gain a solid understanding of the latest technologies and tools for graph analytics. Get hands-on experience in developing a query engine for processing and optimizing queries over graphs.

Benefit for the Project

The development of an efficient query optimizer may improve the performance of network analysis queries. It will also provide a robust foundation for implementing graph analytics algorithms and tools within Rogas in the future.

Requirements

Strong skills in software development (Java, Python or C) and solid knowledge of database theory and implementation are required. Background knowledge in machine learning is also desirable.

More information

  1. S. Abiteboul, R. Hull and V. Vianu, "Foundations of Databases". Addison-Wesley, 1995.
  2. M. Liu, "Towards a Unified Framework for Network Analytics", Master's thesis, 2015 (for a copy of the thesis, please refer to Qing's website: http://users.cecs.anu.edu.au/~u5170295/projects.html).
  3. P. Zhao and J. Han, "On graph query optimization in large networks". Proceedings of the VLDB Endowment, 2010.
  4. S. Sakr, S. Elnikety and Y. He, "G-SPARQL: a hybrid engine for querying large attributed graphs". Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012.

Project task 2: Integrity Constraints of Graphs

 

Description

As more and more network analysis queries are performed from different perspectives, it becomes increasingly important to align these queries semantically and to mine the relationships between them. But how can we tell, given a number of network analysis queries, whether or not they are semantically relevant and consistent?

The goal of this project is to develop and implement practically useful integrity constraints over different types of graphs. This includes designing an automated verification approach that can detect inconsistencies of these graphs and determine their causes.
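To make the idea of automated verification more concrete, the sketch below checks two very simple integrity constraints over a graph and reports the edges that violate each one, which gives a basic indication of the cause of an inconsistency. The constraints, the function names, and the representation of a graph as a node set plus an edge list are assumptions made for this example; the constraints developed in the project would be richer and would cover different types of graphs.

```python
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str]
Check = Callable[[Set[str], List[Edge]], List[Edge]]


def dangling_edges(nodes: Set[str], edges: List[Edge]) -> List[Edge]:
    # Referential integrity: both endpoints of an edge must be declared nodes.
    return [e for e in edges if e[0] not in nodes or e[1] not in nodes]


def self_loops(nodes: Set[str], edges: List[Edge]) -> List[Edge]:
    # Example structural constraint: an edge must not connect a node to itself.
    return [e for e in edges if e[0] == e[1]]


# Hypothetical constraint catalogue: constraint name -> checking function.
CONSTRAINTS: Dict[str, Check] = {
    "endpoints_exist": dangling_edges,
    "no_self_loops": self_loops,
}


def verify(nodes: Set[str], edges: List[Edge],
           constraints: Dict[str, Check] = CONSTRAINTS) -> Dict[str, List[Edge]]:
    """Return a report mapping each violated constraint to the offending edges."""
    report = {}
    for name, check in constraints.items():
        violations = check(nodes, edges)
        if violations:
            report[name] = violations
    return report


report = verify({"a", "b"}, [("a", "b"), ("b", "c"), ("a", "a")])
print(report)  # {'endpoints_exist': [('b', 'c')], 'no_self_loops': [('a', 'a')]}
```

An empty report means that no constraint in the catalogue was violated.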

Benefit for the Student

Gain a solid understanding of the latest technologies and tools for graph analytics. Get hands-on experience in implementing integrity constraints over graphs and designing an automated verification approach.

Benefit for the Project

The development of integrity constraints is an important aspect of the RG framework. Specifying integrity constraints over graphs can bring several benefits for network analysis applications: (1) It enables semantic integrity checking across different analysis results. (2) It supports comparative analysis on different dimensions in order to predict trends and discover new insights. (3) It can improve query performance by reformulating queries in a way that can leverage existing results whenever possible.

Requirements

Strong skills in software development (Java, Python or C) and solid knowledge of database theory and graph theory are desired.

More information

  1. Q. Wang, "Network analytics ER model -- Towards a conceptual view of network analytics". Proceedings of ER, 2014.
  2. M. Liu, "Towards a Unified Framework for Network Analytics", Master's thesis, 2015 (for a copy of the thesis, please refer to Qing's website: http://users.cecs.anu.edu.au/~u5170295/projects.html).
  3. S. Abiteboul, R. Hull and V. Vianu, "Foundations of Databases". Addison-Wesley, 1995.

Project task 3: Dynamic Network Analysis

 

Description

Network analysis applications are “dynamic” by nature, and evolve over time. Can network analysis be dynamically performed at different scales or over different time periods so as to predict trends and patterns? To cope with this, the Rogas platform should be extended to support a variety of techniques for dynamic analysis of network data.

The goal of this project is to develop techniques that can support dynamic network analysis tasks. This consists of two tasks: (1) developing a visualization tool that can visualize graphs dynamically, and (2) designing dynamic analysis strategies that can provide a flexible and efficient way for conducting various network analysis tasks dynamically.
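One simple way to look at a network over different time periods is to split timestamped edges into fixed-size windows and compute a metric per window. The sketch below does this for node degree. The `(source, target, timestamp)` edge format, the integer timestamps, and the `degree_by_window` function are assumptions made for this illustration, not part of Rogas.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

TimedEdge = Tuple[str, str, int]  # (source, target, timestamp)


def degree_by_window(edges: Iterable[TimedEdge], window: int) -> Dict[int, Counter]:
    """Group edges into windows of `window` time units and count node degrees per window."""
    result: Dict[int, Counter] = {}
    for src, dst, ts in edges:
        index = ts // window  # window 0 covers [0, window), window 1 covers [window, 2 * window), ...
        counts = result.setdefault(index, Counter())
        # Treat edges as undirected for the degree count.
        counts[src] += 1
        counts[dst] += 1
    return result


edges = [("a", "b", 0), ("a", "c", 5), ("b", "c", 12)]
degrees = degree_by_window(edges, window=10)
print(degrees[0]["a"])  # 2
print(degrees[1]["b"])  # 1
```

Changing the `window` argument changes the time scale of the analysis, and comparing the counters of consecutive windows gives a first, crude view of how a node's connectivity evolves.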

Benefit for the Student

Gain a solid understanding of the latest technologies and tools for graph analytics. Get hands-on experience in developing visualization and analysis tools for networks.

Benefit for the Project

Once this work is complete, the RG framework can be extended to provide a dynamic view of the semantics of network analysis tasks. It also brings advantages for managing network analysis tasks, such as dynamically handling the semantic integration of different data analysis results and enabling comparative network analysis.

Requirements

Strong skills in software development (Java, Python or C) and solid knowledge of database theory and graph theory are desired.

More information

  1. Q. Wang, "A conceptual framework for network analytics". Data and Knowledge Engineering, 2015.
  2. M. Liu, "Towards a Unified Framework for Network Analytics", Master's thesis, 2015 (for a copy of the thesis, please refer to Qing's website: http://users.cecs.anu.edu.au/~u5170295/projects.html).
  3. C. Aggarwal and K. Subbian, "Evolutionary Network Analysis: A Survey". ACM Computing Surveys, 47(1), 2014.

Updated: 1 November 2018 / Responsible Officer: Dean, CECS / Page Contact: CECS Marketing