HPC Graph Analysis

SIAM Computational Science and Engineering (CSE) 2017

Atlanta, Georgia, USA

27 February - 2 March 2017

Table of Contents

1 Modeling and Computational Methods in Network Science and Applications

Monday, 27 February 2017

In recent years, there has been high demand for novel and accurate mathematical models, and for fast, stable and scalable computational techniques, to address problems emerging from applications on real-world networks, such as social media and power grids. In these applications, big data is often generated, collected, stored and/or processed in large-scale heterogeneous networks. New models and computational methods must tackle the challenges of inhomogeneous network structures, randomness of dynamics, and noise in data. This mini-symposium focuses on recent advances in mathematical modeling and numerical methods, as well as their applications in modern network science.

Organizers:

Talks:

  • MS7
    Influence Prediction for Continuous-Time Information Propagation on Networks Using Graph-Based Fokker-Planck Equation

    Xiaojing Ye, Georgia State University

    We consider the problem of predicting influence, defined as the expected number of infected nodes resulting from information propagating from any given set of source nodes on a network. We develop a novel and transformative framework that adaptively aggregates the activation states of the network according to the number of active nodes, leading to a system of differential equations that governs the time evolution of the state probabilities. This system is analogous to the Fokker-Planck equation in continuous space, and its solution readily yields the desired influence. This approach gives rise to a class of novel and scalable algorithms that work effectively for large-scale and dense networks. Numerical results on a variety of synthetic and real-world networks will be presented.
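
    As a rough illustration of the state-aggregation idea, the sketch below evolves a toy system in which the only tracked state is the number k of active nodes; the state probabilities then follow a small linear ODE system and the influence is the expected value of k. The rates `lam` are invented for illustration and are not taken from the talk.

    ```python
    # Hedged toy sketch: aggregate states by the number of active nodes and
    # evolve the state probabilities with a linear ODE system.
    import numpy as np
    from scipy.integrate import solve_ivp

    n = 20                                              # toy network size
    k = np.arange(n)                                    # states with an outgoing rate
    lam = 0.3 * k * (n - k) / n                         # made-up activation rates

    def master_equation(t, p):
        dp = np.zeros_like(p)
        dp[:-1] -= lam * p[:-1]                         # probability leaving state k
        dp[1:]  += lam * p[:-1]                         # probability entering state k+1
        return dp

    p0 = np.zeros(n + 1)
    p0[1] = 1.0                                         # one active source node at t = 0
    sol = solve_ivp(master_equation, (0.0, 10.0), p0, t_eval=[10.0])
    influence = np.dot(np.arange(n + 1), sol.y[:, -1])  # expected number of active nodes
    ```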

    Walk-Based Centrality and Communicability Measures: Algorithms and Applications

    Michele Benzi, Emory University

    In this talk I will review some network centrality and communicability measures based on walks. These measures can be given an elegant closed form in terms of functions of the adjacency matrix. I will introduce the notion of total communicability of a network as a measure of network connectivity and robustness and show that it can be computed very quickly even for large graphs. Finally, I will discuss efficient edge modification strategies (including edge removal, addition, and rewiring) that can be used to obtain networks with desirable communicability properties.

    The talk is based on recent work in collaboration with Christine Klymko (LLNL) and Francesca Arrigo (Strathclyde).
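
    The total communicability described above can be computed without ever forming exp(A) explicitly. A minimal sketch using SciPy's Krylov-based action of the matrix exponential (one possible implementation, not necessarily the one used in this work):

    ```python
    # Node total communicability TC(i) = [exp(A) 1]_i; the network total
    # communicability is the sum of the node values.
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import expm_multiply

    def total_communicability(A):
        ones = np.ones(A.shape[0])
        tc_nodes = expm_multiply(A, ones)   # computes exp(A) @ 1 without forming exp(A)
        return tc_nodes, tc_nodes.sum()

    # Small random undirected graph for demonstration.
    R = sp.random(1000, 1000, density=0.01, format="csr")
    A = ((R + R.T) > 0).astype(float)       # symmetrize to an unweighted adjacency matrix
    node_tc, network_tc = total_communicability(A)
    ```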

    Enhanced Community Detection in Multilayer and Temporal Networks through Layer Aggregation

    Dane Taylor, University of North Carolina at Chapel Hill; Rajmonda Caceres, Massachusetts Institute of Technology; Peter J. Mucha, University of North Carolina at Chapel Hill

    Inspired by real-world networks consisting of layers that encode different types of connections, such as a social network observed at different instances in time, we study community structure in multilayer networks. We analyze fundamental limitations on the detectability of communities by developing random matrix theory for the dominant eigenvectors of modularity matrices that encode an aggregation of network layers. Aggregation is often beneficial when the layers are correlated, and it represents a crucial step in the discretization of time-varying network data, whereby layers are binned into time windows. We explore two methods for aggregation: summing the layers’ adjacency matrices, and thresholding this summation at some value. We analyze detectability phase transitions that arise as either the density of within-community edges or the community size is varied. We identify layer-aggregation strategies that are optimal in that they minimize the detectability limit. Our results indicate good practices, in the context of community detection, for how to aggregate network layers, threshold pairwise-interaction data matrices, and discretize time-varying network data. We apply these results to synthetic and empirical networks, including a study of anomaly detection for the Enron email corpus.
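
    The two aggregation strategies studied here are easy to state concretely. Below is a small sketch of summation, optional thresholding, and the dominant eigenvector of the resulting (Newman) modularity matrix, written in plain NumPy and intended only as an illustration:

    ```python
    import numpy as np

    def aggregate_layers(layers, threshold=None):
        """Sum the layers' adjacency matrices; optionally threshold the sum."""
        S = np.sum(layers, axis=0)
        if threshold is not None:
            S = (S >= threshold).astype(float)
        return S

    def dominant_modularity_eigvec(A):
        """Leading eigenvector of B = A - k k^T / (2m); its sign pattern
        gives a two-community split when communities are detectable."""
        k = A.sum(axis=1)
        B = A - np.outer(k, k) / k.sum()
        vals, vecs = np.linalg.eigh(B)
        return vecs[:, -1]                  # eigenvector of the largest eigenvalue
    ```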

    Dynamic Processes Over Information Networks: Representation, Modeling, Learning and Inference

    Le Song, Georgia Institute of Technology

    Large-scale, high-resolution data on dynamic processes over networks are becoming increasingly available from online social platforms such as Twitter and Facebook. Such data provide great opportunities for understanding and modeling both macroscopic (network-level) and microscopic (node-level) patterns in human dynamics. They have also fueled growing efforts to develop methods for understanding, predicting, controlling, and distilling knowledge from these dynamic processes, and for answering queries such as "who will do what and when?" To tackle these challenges, I will present a framework based on point processes for representing and modeling such data, and for performing learning, inference, and control of dynamic processes over networks.

  • MS36
    Social Networks as Point Processes

    Martin Short, Georgia Institute of Technology

    The study of social networks, and the dynamics of activity taking place on those networks, is currently of great interest, thanks in part to the availability of datasets from sources such as Facebook and Twitter. In this talk, I will discuss networks in the context of the dynamics of events occurring within them, referencing two seemingly very different applications: a network of gang rivalries from Los Angeles and a network of e-mail activity among a small group of college students. I will describe how both of these datasets can be understood through a common mathematical framework, the self-exciting point process. This modeling framework allows us to study interesting and difficult questions regarding the data, such as: which gangs were involved in a given violent altercation, and how might one infer "leadership" of a social network by studying the underlying dynamics?
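
    A self-exciting point process is characterized by a conditional intensity that jumps after every event and then decays, so that activity begets further activity. A minimal univariate sketch with an exponential kernel (the kernel and parameters here are generic assumptions, not the models fitted in the talk):

    ```python
    import numpy as np

    def hawkes_intensity(t, event_times, mu, alpha, beta):
        """Conditional intensity of a self-exciting (Hawkes) process:
        lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))."""
        past = np.asarray(event_times)
        past = past[past < t]
        return mu + alpha * np.exp(-beta * (t - past)).sum()

    # Intensity right after a burst of three events is elevated above the baseline mu.
    print(hawkes_intensity(5.0, [1.0, 4.5, 4.8], mu=0.2, alpha=0.8, beta=1.5))
    ```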

    Real-Time In-Situ Seismic Imaging with Sensor Networks

    WenZhan Song, University of Georgia

    The seismic imaging process today involves massive seismic data collection, from hundreds or thousands of seismic sensors to a central place, for post-processing. The whole process is expensive and often takes days or even months to complete. There is great demand for real-time imaging, as it would reduce the costs and risks of E&P and mitigate environmental concerns. This talk presents an innovative Real-time In-situ Seismic Imaging (RISI) system that computes and visualizes 3D subsurface images in seconds. The RISI system is a mesh network that senses and processes seismic signals and computes the 3D subsurface image in situ in real time. Instead of collecting data and then post-processing it, the mesh network performs distributed data processing and inversion computing under severe bandwidth and resource constraints, and generates an evolving 3D subsurface image as more events arrive. Several innovative distributed seismic imaging algorithms have been successfully developed and validated using both synthetic and real-world seismic data sets. The hardware prototype system has also been implemented and can be extended as a general field instrumentation platform to incorporate new geophysical data processing and computing algorithms beyond seismic imaging.

    Abnormal Synchrony in Evolving Brain Networks

    Igor Belykh, Reimbay Reimbayev, and Kevin Daley, Georgia State University

    This talk addresses a fundamental question: how do pathological synchronized rhythms, associated with epileptic seizures and Parkinson’s tremors, appear in brain networks as a result of their complex spatial and temporal dynamics? There is experimental evidence that evolving brain networks change their functional structure during epileptic seizures, from a more regular to a more random structure. In this talk, we will discuss the role of evolving network structure in the switching between healthy and abnormal rhythms, which is accompanied by a change in the intrinsic dynamics of neurons.

2 High-Performance Graph Algorithms

Tuesday, 28 February 2017

Graph algorithms play an important role in the manipulation, distribution and analysis of large data sets arising from both traditional and emerging computational science applications. This minisymposium brings together researchers engaged in developing high-performance parallel graph algorithms with applications to parallel computing, linear system solvers, genomics, neuroscience, and cybersecurity.

Organizers:

Talks:

  • MS103
    Maintaining Connected Components for Infinite Graph Streams

    Jonathan Berry, Sandia National Laboratories; Matthew Oster, Pacific Northwest National Laboratory; Cynthia Phillips, Steve Plimpton, and Timothy Shead, Sandia National Laboratories

    We present an algorithm to maintain the connected components of a graph that arrives as an infinite stream of edges. Connectivity-related queries, including component spanning trees, are supported with some latency, returning the state of the graph at the time of the query. Because an infinite stream may eventually exceed the storage limits of any number of finite-memory processors, we assume an aging command or daemon where “uninteresting” edges are removed when the system nears capacity. Following an aging command the system will block queries until its data structures are repaired, but edges will continue to be accepted from the stream, never dropped. The algorithm will not fail unless a model-specific constant fraction of the aggregate memory across all processors is full. In normal operation, it will not fail unless aggregate memory is completely full.

    Unlike previous theoretical streaming models designed for finite graphs that assume a single shared memory machine or require arbitrary-size intermediate files, we distribute a graph over a ring network of finite-memory processors. We implemented our algorithm using an asynchronous message-passing system. We sketch the algorithm and give preliminary experimental results on synthetic and real graph streams.
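
    The distributed, aging-aware algorithm itself is more involved; as a single-process baseline, the connected components of an insert-only edge stream can be maintained incrementally with a union-find structure, sketched below under that simplifying assumption (no aging, no distribution):

    ```python
    class StreamingComponents:
        """Union-find over an insert-only edge stream (a baseline sketch;
        it supports neither edge aging nor distribution over processors)."""
        def __init__(self):
            self.parent = {}

        def find(self, v):
            self.parent.setdefault(v, v)
            while self.parent[v] != v:
                self.parent[v] = self.parent[self.parent[v]]  # path halving
                v = self.parent[v]
            return v

        def add_edge(self, u, v):
            ru, rv = self.find(u), self.find(v)
            if ru != rv:
                self.parent[ru] = rv        # merge the two components

        def same_component(self, u, v):
            return self.find(u) == self.find(v)
    ```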

    Updating Dynamic Networks in Parallel Using Graph Sparsification

    Sanjukta Bhowmick and Sriram Srinivasan, University of Nebraska, Omaha; Sajal Das, Missouri University of Science and Technology

    We present graph sparsification, an elegant technique for updating the properties of dynamic networks. Using graph sparsification, the original network is divided into several small subgraphs. Each subgraph contains a set of specially marked edges, known as key edges, that pertain to the property being updated. Each edge addition or deletion is processed with respect to these key edges. In our presentation, we will show how graph sparsification allows us to develop scalable algorithms for updating properties of weighted graphs such as the minimum spanning tree and single-source shortest paths.
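
    For reference, the basic sequential rule behind minimum spanning tree updates is the cycle rule: an inserted edge closes exactly one cycle in the tree, and the heaviest edge on that cycle can be discarded. A hedged sketch using NetworkX (the parallel, sparsification-based algorithm of the talk is organized differently):

    ```python
    import networkx as nx

    def insert_edge_into_mst(T, u, v, w):
        """Update a spanning forest T (nx.Graph with 'weight' attributes)
        for a newly arrived edge (u, v) of weight w using the cycle rule."""
        if u not in T or v not in T or not nx.has_path(T, u, v):
            T.add_edge(u, v, weight=w)      # edge connects two components
            return
        path = nx.shortest_path(T, u, v)
        heaviest = max(zip(path, path[1:]), key=lambda e: T[e[0]][e[1]]["weight"])
        if T[heaviest[0]][heaviest[1]]["weight"] > w:
            T.remove_edge(*heaviest)        # swap the heaviest cycle edge for the new one
            T.add_edge(u, v, weight=w)
    ```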

    Computing Graph Centrality

    Erik Saule, University of North Carolina, Charlotte; A. Erdem Sariyüce, Sandia National Laboratories; Kamer Kaya, Sabanci University, Turkey; Ümit V. Çatalyürek, Georgia Institute of Technology

    Computing the centrality of a graph is a useful tool for many applications, ranging from social network analysis to traffic analysis and from drug discovery to urban planning. While many definitions of centrality exist, the most commonly used – namely betweenness and closeness centrality – are variations on the theme of all-pairs shortest paths. With a complexity of O(|V||E|), computing centrality is difficult in practice, and computing centralities efficiently has attracted considerable interest in recent years.

    In this talk, we will review the techniques one has to deploy to reach the highest performance when computing graph centrality. For both static and incremental graphs, we will show that computing centrality on modern computing systems involves both algorithmic and system issues.
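
    For concreteness, the building block behind both measures is a single-source shortest-path sweep repeated from every vertex, which is where the O(|V||E|) cost comes from. Below is a plain BFS-based closeness computation for one source in an unweighted graph, a simple baseline rather than the optimized kernels discussed in the talk:

    ```python
    from collections import deque

    def closeness(adj, s):
        """Closeness of s within its connected component, for an unweighted
        graph given as a dict mapping each vertex to its neighbor list."""
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        reached = len(dist) - 1
        return reached / sum(dist.values()) if reached else 0.0
    ```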

    High-Performance Graph Traversal for De Bruijn Graph-Based Metagenome Assembly

    Vasudevan Rengasamy and Kamesh Madduri, Pennsylvania State University

    De Bruijn graph-based assembly is a popular technique for analyzing modern genomic and metagenomic DNA sequencing data. The technique is composed of several computational building blocks, with the ultimate goal of reconstructing genomes from overlapping shorter strings. This work describes new scalable, parallel implementations of several key compute- and memory-intensive steps of De Bruijn graph-based assembly. We focus on exploiting multicore parallelism and the parallel I/O capabilities available on current supercomputers, and we also target memory efficiency, i.e., fully in-memory execution using as few compute nodes as possible. We demonstrate the applicability of these optimized implementations by integrating them with the MEGAHIT metagenome assembler.
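
    For readers unfamiliar with the underlying data structure, a toy in-memory De Bruijn graph construction is sketched below: nodes are (k-1)-mers and each k-mer observed in a read contributes an edge. The scalable implementations described above are, of course, far more elaborate.

    ```python
    from collections import defaultdict

    def build_de_bruijn(reads, k):
        """Toy De Bruijn graph: maps each (k-1)-mer prefix to its (k-1)-mer
        suffixes with multiplicities, one edge per observed k-mer."""
        graph = defaultdict(lambda: defaultdict(int))
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                graph[kmer[:-1]][kmer[1:]] += 1
        return graph

    g = build_de_bruijn(["ACGTACGT", "CGTACGTA"], k=4)
    ```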

  • MS130
    Balanced Multi-Criteria Graph Partitioning

    Cedric Chevalier and Remi Barat, CEA/DAM, France; Francois Pellegrini, University of Bordeaux, France

    Multi-criteria graph partitioning is a key component for enabling highly scalable multi-physics simulations. The main challenge is to balance all the criteria at the same time. Existing multi-criteria partitioners such as ParMETIS are adapted from the mono-criterion case and focus on minimizing communication cost. In practice, however, this approach is not robust with respect to balancing all criteria, and it often produces results that do not satisfy the imbalance tolerance.

    We have implemented, in Scotch, specific multi-criteria partitioning algorithms, designed to meet balance constraints for all criteria. In this talk, we will present and discuss results of these new algorithms.

    Multilevel Acyclic Partitioning of Directed Acyclic Graphs for Enhancing Data Locality

    Julien Herrmann, Georgia Institute of Technology; Aravind Sukumaran Rajam, Ohio University; Fabrice Rastello, Inria, France; P. (Saday) Sadayappan, Ohio State University; Ümit V. Çatalyürek, Georgia Institute of Technology

    In modern computational systems, the cost of data movement across nodes or within the memory hierarchy inside a node is becoming increasingly significant compared with the cost of performing arithmetic operations. When executing a program, maximizing data reuse reduces the data movement cost and thus improves the overall execution time. Finding a good acyclic partition of the computational directed acyclic graph (DAG) associated with an algorithm can help find an execution order that improves data locality. We present a multilevel direct k-way partitioner for the acyclic partitioning of DAGs. The quality of the computed acyclic partitions is assessed at the graph level by computing the edge cut or the total volume of communication between components, and at the application level by computing the number of cache misses induced by the partition on various algorithms.
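
    The defining constraint of acyclic partitioning is that the quotient graph over the parts must itself be acyclic. A small feasibility check for that constraint (not the multilevel partitioner itself) can be sketched with NetworkX:

    ```python
    import networkx as nx

    def partition_is_acyclic(dag, part):
        """Return True if the vertex partition `part` (vertex -> part id) of
        the DAG induces an acyclic quotient graph over the parts."""
        q = nx.DiGraph()
        q.add_nodes_from(set(part.values()))
        for u, v in dag.edges():
            if part[u] != part[v]:
                q.add_edge(part[u], part[v])
        return nx.is_directed_acyclic_graph(q)
    ```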

    Partitioning Irregular Graphs at the Trillion-Edge Scale

    George M. Slota, Rensselaer Polytechnic Institute; Siva Rajamanickam, Sandia National Laboratories; Kamesh Madduri, Pennsylvania State University; Karen D. Devine, Sandia National Laboratories

    We introduce XtraPuLP, a new distributed-memory partitioner designed to scale to trillion-edge graphs. XtraPuLP is a significant extension of our prior shared-memory partitioner, PuLP, and is based on the label propagation community detection technique. Label propagation has previously been demonstrated to be an effective means of partitioning modern extreme-scale graph-structured datasets, including human social and interaction networks, web crawls, and brain graphs. On a collection of large sparse graphs, we show that XtraPuLP partitioning quality is comparable to that of other state-of-the-art partitioning methods. We also demonstrate that XtraPuLP can produce high-quality partitions of real-world graphs with more than a billion vertices in minutes. Additionally, we find that using XtraPuLP partitions for distributed-memory graph analytics leads to a significant end-to-end reduction in execution time.
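
    As a rough illustration of the label propagation idea, the toy serial sketch below lets each vertex repeatedly adopt the most common part among its neighbors, subject to a loose size cap; XtraPuLP's distributed, multi-constraint implementation is far more refined.

    ```python
    import random
    from collections import Counter

    def label_prop_partition(adj, k, max_iter=20, imbalance=1.1):
        """Toy label-propagation partitioner for a graph given as a dict of
        neighbor lists; returns a vertex -> part assignment."""
        verts = list(adj)
        cap = imbalance * len(verts) / k
        part = {v: random.randrange(k) for v in verts}
        sizes = Counter(part.values())
        for _ in range(max_iter):
            moved = False
            for v in verts:
                for p, _ in Counter(part[u] for u in adj[v]).most_common():
                    if p == part[v]:
                        break               # already in the locally dominant part
                    if sizes[p] < cap:
                        sizes[part[v]] -= 1
                        sizes[p] += 1
                        part[v] = p
                        moved = True
                        break
            if not moved:
                break
        return part
    ```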

    Sparse Matrix-Matrix Multiplication for Modern Architectures

    Mehmet Deveci, Erik G. Boman, and Siva Rajamanickam, Sandia National Laboratories

    Sparse matrix-matrix multiplication (SPMM) is an important kernel in high-performance computing that is heavily used in graph analytics as well as in multigrid linear solvers. Because of its highly sparse structure, it is usually difficult to exploit its parallelism on modern shared-memory architectures. Although various works have studied shared-memory parallelism for SPMM, some points are usually overlooked, such as the memory usage of the SPMM kernels. Since SPMM is a service kernel, it is important to respect the memory usage of the calling application in order not to interfere with its execution. In this work, we study memory-efficient, scalable, shared-memory parallel SPMM methods. We study graph compression techniques that reduce the size of the matrices and allow faster computations. Our preliminary results show speedups of up to 14x over NVIDIA’s cuSPARSE and of 30% over the SPMM implementation provided in the Intel Math Library, while using 65% less memory.

3 Advances in Dynamic Graphs: Algorithms, Applications and Challenges

Wednesday, 1 March 2017

Many interesting phenomena can be modeled as dynamic graphs. For example, in biology, complex interactions of microbes evolve over a period of time; in cybersecurity, malicious activities evolve over a period of time through complex interactions with the system; and in transportation networks, dynamic analysis of traffic is essential for modeling evacuation plans and contingency measures. Several graph algorithms such as dynamic community detection, network alignment, graph matching and edge cover formulations play a vital role in the study of real-world dynamic networks.

The goal of this minisymposium is to bring together computational researchers and domain experts exploring dynamic graph models, algorithms, and applications of interest. We propose a two-part minisymposium covering problems from application domains in biology (microbial communities, brain networks), cybersecurity (anomaly detection, control systems), and transportation networks. These talks will be augmented with talks on dynamic graph algorithmic modeling, design and development at scale.

Organizers:

Talks:

  • MS157
    Advances in Algorithms and Applications for Dynamic Graphs

    Ananth Kalyanaraman, Washington State University; Mahantesh Halappanavar, Pacific Northwest National Laboratory

    Over the last decade, dynamic graph formulations have emerged in multiple application sectors. In social networking, the webs of connectivity and social discourse are maintained as complex networks that continuously evolve at rates that make them intractable to store, let alone analyze. In the life sciences, high-throughput technologies for genotyping, phenotyping and imaging are being deployed at massive scales in order to capture dynamic data at various scales. Similar trends persist in other domains, e.g., cybersecurity and urban networks. While the machinery to collect dynamic data at scale has made significant strides, software capabilities, and in particular algorithmic modeling and network characterization, are mostly in their nascent stages.

    In this introductory talk, we will provide an overview of some of the notable algorithmic advances for dynamic network analysis, motivated by several application use-cases in the life sciences, cybersecurity and brain imaging. Specifically, we will focus on algorithms for community detection, graph matching, and network comparison, and their related challenges. Particular emphasis will be on exploring the connections (or gaps) between algorithmic modeling and real-world applications. The talk will be followed by other talks that highlight a subset of the specific application use-cases in more depth. The second part of the mini-symposium will focus on algorithms and network models for dynamic graph analyses.

    Dynamic Networks of Microbial Biofilms

    Radu Marculescu and Chieh Lo, Carnegie Mellon University

    Cell-based therapeutics is a key component of precision medicine, i.e., the new paradigm for disease prevention and treatment aimed at providing customized healthcare solutions on a patient-by-patient basis. While there has been significant progress toward understanding cellular behavior and controlling individual cells, we still lack a computational framework able to capture the population-level cell interactions and emergent behaviors that are critical to fighting diseases such as antibiotic-resistant infections or cancer.

    In this talk, we bring a new perspective on molecular communication and nano-networking at the population level (as opposed to the single-cell level), which is critical for engineering cell behavior, reprogramming cell-cell communication, and developing new strategies to control the dynamics of cell populations. In particular, we discuss the significance of the network-based approach for exploring the subtleties of bacterial inter-cellular networks and their implications for biofilm dynamics. Indeed, as computational models become more and more powerful, a network-centric approach to studying cell populations can improve our understanding of their social behaviors and possibly help control the infectious diseases they cause. This is a major step towards developing new drugs and targeted medical treatments to fight biofilm-related infections.

    Dynamic Brain Networks

    Kristian Eschenberg, Tom Grabowski, and David Haynor, University of Washington

    Current functional neuroimaging (fMRI) studies focus primarily on correlational network analyses, though some still apply classical frequency analysis methods. Few, however, consider global spatiotemporal patterns of cortical activity beyond identifying groups of voxels whose signals fluctuate together. In this talk, I will discuss a recently developed modal decomposition algorithm called Dynamic Mode Decomposition (DMD) and its applicability to studying global brain activity patterns in fMRI data acquired by the Human Connectome Project (HCP). DMD was developed by Schmid [J. Fluid Mech., 2010] as an equation-free method for modeling the behavior of complex dynamical systems by identifying spatially coherent modes in a flow field. Importantly, each mode is associated with an eigenvalue defining a unique frequency of oscillation and growth/decay rate. So far, DMD has been applied to the analysis of brain sleep spindles using EEG in a small data set by Brunton et al. [arXiv, 2014], but it has not yet been applied in a large-scale population study or to high-resolution fMRI data such as that acquired by the HCP. I will discuss how we can use DMD’s dynamic modes to build new graphical representations of the cortex, and also how we can use them to shed light on the relationship that factors such as age and neurological disorders (such as Alzheimer’s disease and schizophrenia) have with brain function.
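
    For orientation, the standard "exact DMD" computation on a matrix of snapshots is short enough to sketch (a generic formulation; the preprocessing and mode selection used in this study are not shown):

    ```python
    import numpy as np

    def dmd(X, r):
        """Exact DMD: given snapshots X[:, 0..m] of a dynamical system, fit a
        rank-r linear operator advancing each snapshot to the next and return
        its eigenvalues (oscillation / growth-decay) and spatial modes."""
        X1, X2 = X[:, :-1], X[:, 1:]
        U, s, Vh = np.linalg.svd(X1, full_matrices=False)
        U, s, Vh = U[:, :r], s[:r], Vh[:r]
        Atilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / s)
        eigvals, W = np.linalg.eig(Atilde)
        modes = X2 @ Vh.conj().T @ np.diag(1.0 / s) @ W
        return eigvals, modes
    ```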

    Quantitative Assessment of Transportation Network Vulnerability with Dynamic Traffic Simulation Methods

    Venkateswaran Shekar, University of Massachusetts, Dartmouth; Samrat Chatterjee and Mahantesh Halappanavar, Pacific Northwest National Laboratory; Lance Fiondella, University of Massachusetts, Dartmouth

    Safe, secure, and reliable operation of our nation's transportation networks is critical for sustaining modern society. This includes mitigating congestion and assessing impacts of natural, accidental, and intentional disruptions to such networks. Past research on transportation network vulnerability has primarily focused on static traffic models, many of which are formulated as traditional optimization problems. However, transportation networks are dynamic because their usage varies over time. As a result, a realistic characterization of network vulnerability must account for these dynamic properties. This talk describes a dynamic traffic simulation-based approach to assess the vulnerability of transportation networks to disruptions. The approach includes prioritization of critical links over time and is generalizable to the case where both link and node disruptions are of concern. Motivating case study examples are presented and insights into the time varying criticality of links are discussed.

  • MS184
    Models for Principled Characterization of Dynamic, Spatially Embedded, Multiscale Networks

    Danielle S. Bassett and Richard Betzel, University of Pennsylvania

    Advances in neuroimaging techniques have made it possible to reconstruct whole-brain networks composed of structural (physical fiber tracts) and functional (statistical relationships) connections among brain regions. Analysis of these static networks has revealed a host of non-random attributes, including highly connected hubs, modular architecture, and rich clubs. In this talk I will discuss two recent advances in brain network modeling. First, I will introduce the multi-layer network model for characterizing time-varying functional brain networks. I will cover recent work showing that the brain’s flexibility – the extent to which its multi-layer modular organization is stable across time – can be used to predict an individual’s learning rate, is associated with executive function and state of arousal, and is also altered in psychiatric diseases such as schizophrenia. Second, I will discuss the role that the brain’s intrinsic geometry plays in shaping its network architecture. I will cover two recent studies – one in which we show that simple wiring rules can explain a wide variety of the brain’s topological features and another in which we modify classical community detection tools to uncover space-independent community structure in brain networks.

    Scalable Algorithms for Graph Matching and Edge Cover Computations

    Alex Pothen and Arif Khan, Purdue University; Mostofa Patwary and Pradeep Dubey, Intel Labs

    Computing a matching and an edge cover in a graph are two basic problems in computer science, with many applications in network science, machine learning, computational science and engineering, etc. Most variants of matching and edge cover problems can be solved in polynomial time, yet the running times are high and the algorithms are sophisticated. It is even more challenging to design parallel algorithms, since many algorithms rely on searching for long paths in a graph, or implicitly communicate information along long paths, and thus have little concurrency.

    We design new approximation algorithms with high concurrency for variants of matching and edge cover problems. For b-Matching, we obtain a 1/2-approximation algorithm, and for b-Edge cover, we describe a 3/2-approximation algorithm. Both algorithms have been implemented on shared-memory and distributed-memory machines, and we report results from tens of thousands of cores. We also discuss some applications of b-Matchings and b-Edge covers.
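
    As a point of reference for the guarantee quoted above, even the classical greedy algorithm, which scans edges in decreasing weight and keeps an edge while both endpoints still have unused capacity b(v), achieves a 1/2-approximation for maximum-weight b-Matching. A serial sketch follows; the algorithms in the talk reach the same bound with far more concurrency.

    ```python
    def greedy_b_matching(edges, b):
        """Greedy 1/2-approximate maximum-weight b-Matching.
        edges: iterable of (u, v, weight); b: dict of per-vertex capacities."""
        remaining = dict(b)                 # residual capacity per vertex
        chosen = []
        for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
            if remaining.get(u, 0) > 0 and remaining.get(v, 0) > 0:
                chosen.append((u, v, w))
                remaining[u] -= 1
                remaining[v] -= 1
        return chosen

    # Toy triangle with capacity 2 at every vertex: all three edges are kept.
    edges = [("a", "b", 3.0), ("b", "c", 2.0), ("a", "c", 1.0)]
    print(greedy_b_matching(edges, {"a": 2, "b": 2, "c": 2}))
    ```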

    Massive-Scale Streaming Analytics for Dynamic Graphs

    David A. Bader, Georgia Institute of Technology

    Emerging real-world graph problems include: detecting community structure in large social networks; improving the resilience of the electric power grid; and detecting and preventing disease in human populations. Unlike traditional applications in computational science and engineering, solving these problems at scale often raises new challenges because of the sparsity and lack of locality in the data, the need for additional research on scalable algorithms and development of frameworks for solving these problems on high performance computers, and the need for improved models that also capture the noise and bias inherent in the torrential data streams. In this talk, the speaker will discuss the opportunities and challenges in massive data-intensive computing for applications in computational science and engineering.

    Dynamic Network Analysis: From Inference to Insight

    Tanya Y. Berger-Wolf, University of Illinois, Chicago

    From gene interactions and brain activity to cellphone calls and zebras grazing together, large, noisy, and highly dynamic networks of interactions are everywhere. Unfortunately, in this domain, our ability to analyze data lags substantially behind our ability to collect it. Moreover, we may be collecting the wrong data for the questions we want to answer in the first place. From collecting the data and inferring the networks to producing meaningful insight at scale, challenges are there every step of the way and computational approaches have been developed to meet those challenges.

    We will present computational approaches that address some of the questions about dynamic interaction networks: whom should we sample? how often? what is the “right” network? what are the meaningful patterns and trends? and how can we use the network to gain insight into other aspects of node behavior? The methods leverage the topological structure of the networks and the size of the available data to, somewhat counter-intuitively, produce more accurate results faster. We will demonstrate the scientific implications of the computational analysis on networks of zebras, baboons, and interacting brain cells.

4 High-Performance Streaming Graph Analysis

Thursday, 2 March 2017

Graph analysis extracts information from relationship data prevalent in network security, health informatics, finance, and many other fields. The modeled relationships change over time, sometimes very rapidly. Repeatedly re-analyzing massive graphs often cannot keep up with performance requirements. Applications that monitor and respond to data streams can have low latency requirements: for example, analyzing a 10 gigabit Ethernet link in real time requires handling over 130,000 network flows per second, i.e., an average latency of under 7.7 microseconds per flow. New algorithms and systems can leverage scattered, local changes in this streaming data for rapid response. Less latency-sensitive environments also benefit from more focused analysis and higher performance. This minisymposium surveys a range of practical algorithms, software systems, and new directions for analyzing real-world streaming graph data.

Organizers:

Talks:

  • MS200
    High-Performance Analysis of Streaming Graphs

    Jason Riedy, Georgia Institute of Technology

    Graph-structured data in social networks, finance, network security, and other domains are not only massive but also under continual change. These changes are often scattered across the graph. Stopping the world to run a single, static query is infeasible. Repeating complex global analyses on massive snapshots to capture only what has changed is inefficient. We discuss requirements for single-shot queries on changing graphs as well as recent high-performance algorithms that update rather than recompute results. These algorithms are incorporated into our software framework for streaming graph analysis, STING (Spatio-Temporal Interaction Networks and Graphs).

    Dense Subgraphs in Temporal Networks: Algorithms and Analysis

    A. Erdem Sariyüce and Ali Pinar, Sandia National Laboratories

    The underlying algorithmic challenge in finding dense subgraphs is that most standard formulations of the problem (such as clique, quasi-clique, and k-densest subgraph) are NP-hard. Furthermore, current dense subgraph finding algorithms usually optimize some objective in a stationary setting and only find a few such subgraphs, without providing any structural relations among them. However, the goal is rarely to find the "true optimum," but to identify many (if not all) dense substructures. On the other hand, most real-world networks are highly dynamic, yet existing practical dense subgraph finding algorithms can only handle stationary graphs. We present algorithms that find dense subgraphs in temporal networks and build the hierarchy structure among them. We also use our algorithms to analyze some real-world networks and present new insights.
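
    For context, the simplest peeling-based hierarchy, the k-core decomposition, can be sketched in a few lines; the algorithms in the talk build richer hierarchies of dense subgraphs, but the peeling flavor is similar.

    ```python
    import heapq

    def core_numbers(adj):
        """Min-degree peeling: the core number of a vertex is the (monotone)
        degree at which it is removed, yielding a nested hierarchy of
        increasingly dense subgraphs."""
        deg = {v: len(adj[v]) for v in adj}
        heap = [(d, v) for v, d in deg.items()]
        heapq.heapify(heap)
        core, removed, k = {}, set(), 0
        while heap:
            d, v = heapq.heappop(heap)
            if v in removed or d != deg[v]:
                continue                    # stale heap entry
            k = max(k, d)
            core[v] = k
            removed.add(v)
            for u in adj[v]:
                if u not in removed:
                    deg[u] -= 1
                    heapq.heappush(heap, (deg[u], u))
        return core
    ```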

    Time-Evolving Graph Processing on Commodity Clusters

    Anand Iyer and Ion Stoica, University of California, Berkeley

    Real-world graphs are seldom static. Applications that generate graph-structured data today do so continuously, giving rise to an underlying graph whose structure evolves over time. Mining these time-evolving graphs can be insightful, both from research and business perspectives. While several works have focused on some individual aspects of time-evolving graph processing such as efficient incremental computations (e.g., Naiad and its GraphLINQ library), consistent snapshot generation (e.g., Kineograph, LLAMA, DeltaGraph) or storage support for highly dynamic graphs (e.g., Weaver), there exists no general purpose distributed time-evolving graph processing engine.

    In this talk, we present Tegra, a time-evolving graph processing system built on a data flow framework. Tegra enables three broad classes of operations on evolving graphs. First, it enables efficient storage, retrieval and bulk transformation of multiple graph snapshots using an index based on a structure-sharing persistent data structure. Second, it supports temporal graph analysis tasks such as evolutionary queries using a novel timelapse abstraction that lets it process multiple snapshots simultaneously with low overhead. Finally, Tegra provides a lightweight dynamic computation model, based on the Gather-Apply-Scatter (GAS) decomposition model, that lets it perform sliding-window analytics on streaming graphs. Our evaluation of Tegra on real-world datasets shows promising results.

    Parallel and Streaming Methods for Real-Time Analysis of Dense Structures from Graphs

    Srikanta Tirthapura, Apurba Das, A. Pavan, and Michael Svendsen, Iowa State University; Kanat Tangwongsan, Mahidol University International College, Thailand; Kung-Lun Wu, IBM T.J. Watson Research Center
  • MS226
    On Betweenness Centrality Problems in Dynamic Graphs

    Elisabetta Bergamini and Henning Meyerhenke, Karlsruhe Institute of Technology, Germany

    Predicting Movement of Vertices Across Communities in Dynamic Networks

    Sriram Srinivasan and Sanjukta Bhowmick, University of Nebraska, Omaha

    As dynamic networks evolve, a certain percentage of vertices can migrate to a different community or form new communities. In this presentation, we describe an algorithm for identifying such vertices. We use a metric known as permanence, which measures how strongly a vertex belongs to its community. Identifying the migrating vertices can be used to predict the structure of the dynamic communities as well as the effect of perturbations on the network.
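
    For reference, one published definition of permanence combines a vertex's internal connections, its strongest pull from any single external community, and the clustering among its internal neighbors. The sketch below follows that form; the exact variant used in this work may differ.

    ```python
    from itertools import combinations

    def permanence(v, adj, community):
        """Roughly: Perm(v) = I(v) / (E_max(v) * D(v)) - (1 - c_in(v)), where
        I(v) counts neighbors in v's community, E_max(v) is the largest number
        of neighbors in any single other community, and c_in(v) is the
        clustering coefficient among the internal neighbors."""
        nbrs = adj[v]
        if not nbrs:
            return 0.0
        internal = [u for u in nbrs if community[u] == community[v]]
        external = {}
        for u in nbrs:
            if community[u] != community[v]:
                external[community[u]] = external.get(community[u], 0) + 1
        e_max = max(external.values(), default=1)
        pairs = list(combinations(internal, 2))
        closed = sum(1 for a, b in pairs if b in adj[a])
        c_in = closed / len(pairs) if pairs else 0.0
        return len(internal) / (e_max * len(nbrs)) - (1.0 - c_in)
    ```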

    Large-Scale Dynamic Graph Processing on HPC Systems

    Keita Iwabuchi, Tokyo Institute of Technology, Japan; Roger Pearce and Maya Gokhale, Lawrence Livermore National Laboratory; Satoshi Matsuoka, Tokyo Institute of Technology, Japan

    In many graph applications, the structure of the graph changes dynamically over time and may require real-time analysis. However, most prior work on processing large graphs on HPC systems, e.g., Graph500, has focused only on static rather than dynamic graphs. To address this issue, we have developed the algorithms, data structures, and infrastructure management necessary to support dynamic graph analysis at large scale on distributed HPC platforms, including next-generation supercomputers with locally attached NVRAM. We have explored support for online dynamic graph processing using an event-centric infrastructure in HavoqGT. When the graph structure or vertex/edge attributes change, HavoqGT triggers an algorithmic event that allows user-defined callbacks to perform the necessary application updates.

    In this talk we discuss our DegAwareRHH data store, designed for storing and indexing dynamic graphs persistently on node-local NVRAM storage. We demonstrate its performance, scaling out to store large, scale-free graphs by leveraging compact hash tables with high data locality. We extend DegAwareRHH to distributed memory using the event-centric infrastructure in HavoqGT. Using DegAwareRHH, we demonstrate the performance of a dynamic graph-coloring algorithm on a large-scale real-world dynamic graph from Wikipedia's edit history.

    Creating Dynamic Graphs from Temporal Data

    Anita Zakrzewska, Georgia Institute of Technology

    In recent years, there has been a growing interest in the analysis of dynamic graphs, which represent relationships and interactions changing over time. Often, analysis is performed under the assumption that a dynamic graph already exists. However, dynamic graphs usually must be created from an underlying temporal stream of edges and it is necessary to choose a method of doing so. Specifically, new data from the stream must be added and old or less relevant data must be aged off. The approach used affects the structure of the resulting dynamic graph and will therefore affect both data analysis results and performance. This talk analyzes methods of aging off old data to create dynamic graphs.
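
    One common aging policy is a fixed sliding time window over the edge stream, dropping edges once their timestamps fall out of the window. A minimal sketch of that policy (one of several possible choices, shown only for illustration):

    ```python
    from collections import deque, Counter

    def sliding_window_graph(edge_stream, window):
        """Yield, after each arrival (t, u, v), the set of undirected edges
        that have at least one occurrence within the last `window` time units."""
        live = deque()                      # (t, u, v) in arrival order
        mult = Counter()                    # multiplicity of each live edge
        for t, u, v in edge_stream:
            e = (min(u, v), max(u, v))
            live.append((t,) + e)
            mult[e] += 1
            while live and live[0][0] <= t - window:
                _, a, b = live.popleft()    # age off edges older than the window
                mult[(a, b)] -= 1
                if mult[(a, b)] == 0:
                    del mult[(a, b)]
            yield t, set(mult)
    ```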