Mining frequent subgraphs in multigraphs

doi:10.1016/j.ins.2018.04.001

Information Sciences

Volumes 451–452, July 2018, Pages 50-66

https://doi.org/10.1016/j.ins.2018.04.001

Abstract

For more than a decade, extracting frequent patterns from single large graphs has been an active research focus. However, in this era of data eruption, rich and complex data is being generated at an unprecedented rate. This complex data can be represented as a multigraph, a generic and rich graph representation. In this paper, we propose MuGraM, a novel frequent subgraph mining approach that can be applied to multigraphs. MuGraM is a generic frequent subgraph mining algorithm that discovers frequent multigraph patterns. It efficiently performs subgraph matching, which is crucial for computing the support measure, and further leverages several optimization techniques for swift discovery of frequent subgraphs. Our experiments reveal two things: MuGraM discovers multigraph patterns that existing approaches are unable to find; and, when applied to simple graphs, MuGraM outperforms the state-of-the-art approaches by at least one order of magnitude.

Introduction

Real world data can be easily modelled as a graph where entities are represented as nodes, and interactions between entities are represented as edges. When only one edge type is allowed between a pair of nodes, we refer to this graph structure as a single edge graph; when more than one edge type is allowed between a pair of nodes, we refer to it as a multigraph. The multigraph structure enables us to represent multiple relations between a pair of nodes [2], [3].

Many real world datasets can be modelled as a network where a set of nodes is interconnected by multiple relations. Multigraphs abound in various domains: social networks spanning the same set of people but covering different life aspects (e.g., social relationships on Facebook, Twitter, LinkedIn, etc.); protein-protein interaction multigraphs, where protein pairs have direct interactions or physical associations, or are co-localised [1]; gene multigraphs, where genes are connected by interactions that belong to different pathways; and Resource Description Framework knowledge graphs, where a subject/object node pair is connected by different predicates [17].

Since multigraphs allow more than one relation between a pair of nodes, we can represent real world data more succinctly, which in turn helps in mining patterns that cannot be discovered in simple graphs. For example, a recent work in the field of bioinformatics [16] creates multigraphs by merging heterogeneous genomic and phenotype data in order to identify disease genes. Many such applications can benefit from mining interesting and useful patterns.

One of the most important tasks in graph data management is frequent subgraph mining (FSM) [6], [11], [13], [14], [22], where the problem is to discover patterns that occur frequently in a graph database. Although plenty of approaches exist to mine frequent patterns in single edge graphs, to the best of our knowledge, no approach exists to mine frequent patterns in multigraphs.

Considering FSM in single edge graph data, the existing approaches can be categorized into two main families: (i) FSM for the transactional graph setting, where the graph database consists of a set of relatively small graphs called transactions, and (ii) FSM for the single large graph setting, where the graph database consists of a single large graph.

In transactional graph databases, a subgraph is frequent if it appears in at least δ transactions, where δ is a user defined frequency threshold. Several works have been proposed to address FSM in transactional graph databases [11], [13], [22]. Since FSM in the single large graph setting is more challenging than in the transactional one, several approaches [6], [14] have been proposed for it, relying on various frequency (or support) evaluation measures [4], [20].
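The transactional notion of frequency can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: graphs are undirected and unlabelled, the function names (`occurs_in`, `transaction_support`) and the toy transactions are hypothetical, and containment is checked by brute force over injective vertex mappings, which is workable only for very small graphs and is not how the cited systems proceed.

```python
from itertools import permutations


def occurs_in(pattern_edges, graph_edges):
    """Return True if the pattern has a (non-induced) embedding in the graph.

    Both arguments are lists of undirected, unlabelled edges (u, v) without
    self-loops. Brute force over injective vertex mappings: toy sizes only.
    """
    pattern = [tuple(e) for e in pattern_edges]
    graph = {frozenset(e) for e in graph_edges}
    p_nodes = list(dict.fromkeys(v for e in pattern for v in e))
    g_nodes = list(dict.fromkeys(v for e in graph_edges for v in e))
    for image in permutations(g_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if all(frozenset((mapping[u], mapping[v])) in graph for u, v in pattern):
            return True
    return False


def transaction_support(pattern_edges, transactions):
    """Number of transactions that contain the pattern at least once."""
    return sum(occurs_in(pattern_edges, t) for t in transactions)


# Hypothetical database of three small transactions.
transactions = [
    [(1, 2), (2, 3), (3, 1)],   # triangle
    [(1, 2), (2, 3), (3, 4)],   # path on four vertices
    [(1, 2)],                   # single edge
]
path2 = [("x", "y"), ("y", "z")]                 # path with two edges
triangle = [("x", "y"), ("y", "z"), ("z", "x")]
delta = 2                                        # frequency threshold

for name, pattern in [("path2", path2), ("triangle", triangle)]:
    support = transaction_support(pattern, transactions)
    print(name, support, "frequent" if support >= delta else "infrequent")
```

Each transaction contributes at most one to the support here, no matter how many embeddings it contains. In a single large graph there is only one "transaction", which is why that setting needs dedicated support measures.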

This work is motivated by the fact that the existing FSM approaches cannot be applied to multigraph data. That is, when the graph data contains multiple relations between a pair of entities, the existing FSM approaches cannot discover frequent patterns that contain only a subset of the multiedges. Whenever multiple relations (multiedges) exist between a pair of nodes, using an existing FSM approach requires mapping each set of relations to a unique value (a distinct edge label) before performing FSM. This does not yield the desired results, which makes the existing approaches incomplete. To the best of our knowledge, no existing work can discover frequent subgraphs in single large multigraphs. It is to be noted that one recent work, GraMi [6], claims to handle multi-labeled graphs (which we refer to as multigraphs). However, its authors neither provide any details about managing multigraphs in their paper, nor is their latest code capable of handling multigraph data.

Let us consider a typical scenario of performing FSM on a multigraph, as depicted in Fig. 1. The data multigraph in Fig. 1(a) is an extract of the real world AUCS dataset [12], which has five different relations (edge types), namely lunch, Facebook, coauthor, leisure, and work, defined among a set of university employees (nodes). If we perform FSM on this dataset with a frequency threshold $δ = 2$, the existing FSM approaches output no patterns, since they treat the set of relations between a pair of nodes as a single unique identifier rather than as a set of multiple relations. Thus, they are unable to discover the frequent patterns that span only a subset of the relations, as depicted in Fig. 1(b).
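The effect described here can be reproduced on a toy example. The sketch below uses hypothetical data (it is not the graph of Fig. 1) and considers only single-multiedge patterns. As a deliberate simplification, "support" is taken to be the number of multiedges that carry a pattern, which is an assumption made for illustration and not the support measure used by MuGraM.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each vertex pair maps to the set of relations
# (edge types) that hold between the two vertices.
multiedges = {
    ("u1", "u2"): {"lunch", "work"},
    ("u2", "u3"): {"lunch", "work", "facebook"},
    ("u3", "u4"): {"work", "coauthor"},
}
delta = 2  # frequency threshold


def frequent(counts, threshold):
    """Keep only the patterns whose count reaches the threshold."""
    return {p: c for p, c in counts.items() if c >= threshold}


# (a) Single edge graph view: each whole type set is one opaque edge label.
opaque = Counter(frozenset(types) for types in multiedges.values())

# (b) Multigraph view: a pattern occurs on a multiedge whenever its type set
#     is a subset of the multiedge's type set.
subsets = Counter()
for types in multiedges.values():
    for k in range(1, len(types) + 1):
        for combo in combinations(sorted(types), k):
            subsets[frozenset(combo)] += 1

print(frequent(opaque, delta))    # empty: every opaque label occurs once
print(frequent(subsets, delta))   # {lunch}, {work} and {lunch, work} survive
```

Under view (a) the three distinct labels each occur once, so nothing reaches the threshold. Under view (b) the type sets {lunch}, {work}, and {lunch, work} are each found on at least two multiedges.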

The objective of the proposed work is to fill this gap in the field of FSM with an approach that extracts frequent patterns from multigraph data, including patterns that span only a subset of the multigraph relations. To this end, we propose MuGraM (Frequent MultiGraph Miner), an algorithm that enumerates all frequent subgraph patterns in a single large multigraph. The major contributions of this work are:

• a set of efficient pruning rules to swiftly traverse the search space for multigraph pattern extraction;
• an efficient method to quickly evaluate the pattern support;
• a quantitative and qualitative evaluation of MuGraM on real world graph data.

The experimental evaluation reveals that MuGraM not only handles multigraph data efficiently but also outperforms the state-of-the-art approaches in extracting frequent subgraphs from single edge graph data.

The rest of the paper is organized as follows. In Section 2, we discuss the related works. In Section 3, we introduce some basic definitions and formalize the problem. In Section 4, we discuss the proposed multigraph mining algorithm MuGraM along with several optimization strategies. Detailed experimental evaluations are conducted in Section 5, followed by the conclusion in Section 6.

Section snippets

Related work

Several existing works address the problem of FSM for both transactional graph databases and single graph databases. For the transactional graph database setting, the work of Inokuchi et al. [11] laid the foundation for many later works. This work proposed an approach called AGM to efficiently mine the association rules among the frequently appearing substructures in a given graph data set, by treating a transaction as an adjacency matrix. Among the later works, a few are notable: FSG by

Preliminaries and problem definition

In this paper we address the problem of mining single large multigraphs with undirected edges and unlabelled vertices, which will be referred to as multigraphs. A multigraph $G$ is defined as a tuple $(V, E, L_E, T)$, where $V$ is a set of vertices, $T$ is a set of edge types, $E ⊆ V × V$ is a set of undirected edges, and $L_E: V × V → 2^T$ is a labelling function that assigns to each edge in $E$ a subset of the edge types in $T$. The labelling function $L_E$ thus maps each edge in $E$ to a multiedge, which makes $G$ a multigraph.
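The definition above translates fairly directly into a data structure. The following is a minimal sketch, not the representation used by MuGraM: the class name `Multigraph`, its method names, and the choice to store $L_E$ as a dictionary keyed by unordered vertex pairs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Multigraph:
    """Undirected multigraph G = (V, E, L_E, T) with unlabelled vertices."""
    vertices: set = field(default_factory=set)      # V
    edge_types: set = field(default_factory=set)    # T
    # L_E restricted to E: unordered vertex pair -> non-empty subset of T
    labels: dict = field(default_factory=dict)

    def add_edge(self, u, v, types):
        """Add the given edge types to the multiedge between u and v."""
        if u == v:
            raise ValueError("self-loops are not modelled in this sketch")
        types = frozenset(types)
        if not types:
            raise ValueError("a multiedge needs at least one edge type")
        key = frozenset((u, v))           # undirected: {u, v} == {v, u}
        self.vertices.update((u, v))
        self.edge_types.update(types)
        self.labels[key] = self.labels.get(key, frozenset()) | types

    def label(self, u, v):
        """L_E(u, v): the edge types between u and v (empty if no edge)."""
        return self.labels.get(frozenset((u, v)), frozenset())

    @property
    def edges(self):
        """E: the unordered vertex pairs that carry at least one edge type."""
        return set(self.labels)


g = Multigraph()
g.add_edge("a", "b", {"lunch", "work"})
g.add_edge("b", "a", {"facebook"})     # same multiedge, one more type
g.add_edge("b", "c", {"work"})
print(sorted(g.label("a", "b")))       # ['facebook', 'lunch', 'work']
```

Keying the dictionary by `frozenset((u, v))` makes the edge undirected by construction, and returning an empty set for absent pairs lets `label` be total over $V × V$, in line with the signature of $L_E$.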

MuGraM: an algorithm for mining multigraphs

In order to address the problem of FSM for single large multigraph data, we propose MuGraM, a frequent multigraph mining algorithm. Our proposed approach follows a framework similar to that of existing mining approaches, as introduced in [6], [14]. A generic framework (as depicted in Fig. 3) for mining single large graphs involves the following steps: (i) enumerate the frequent edges (frequent patterns of size $s = 1$), (ii) extend each frequent pattern by successively adding the frequent edges

Experimental analyses

In this section we evaluate the performance of MuGraM by carrying out both quantitative and qualitative analyses. For the quantitative evaluation, we compare the time performance of MuGraM with that of a recent state-of-the-art FSM approach, GraMi; for this evaluation, we use single edge graphs since no approach exists to perform FSM on multigraphs. Further, the qualitative analysis is performed on a few real-world datasets to demonstrate the nature of patterns extracted by the proposed multigraph

Conclusions

In this work we proposed a generic multigraph mining algorithm called MuGraM that can efficiently discover frequent patterns in single large multigraphs. The main contributions of this work include (i) a set of pruning techniques that reduce the search space exploration by avoiding the expensive support computation as much as possible and by further expediting the support computation, and (ii) an efficient support computation mechanism that relies on a backtracking approach to discover multigraph

Acknowledgment

This work has been funded by LabEx NUMEV integrated into the I-SITE MUSE (ANR-10-LABX-20).

References (22)

Z. Aidong, Protein Interaction Networks: Computational Analysis (2009)
B. Boden et al., Mining coherent subgraphs in multi-layer graphs with edge labels, Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2012)
F. Bonchi et al., Distance oracles in edge-labeled graphs, EDBT (2014)
B. Bringmann et al., What is frequent in a single graph?, Pacific-Asia Conference on Knowledge Discovery and Data Mining (2008)
A. Cardillo, J. Gómez-Gardenes, M. Zanin, M. Romance, D. Papo, F. del Pozo, S. Boccaletti, Emergence of network...
M. Elseidy et al., GRAMI: frequent subgraph and pattern mining in a single large graph, Proc. VLDB (2014)
M. Fiedler et al., Subgraph support in a single large graph, Seventh IEEE International Conference on Data Mining Workshops (2007)
J. Gonzalez et al., Efficient mining of graph-based data, Proceedings of the AAAI Workshop on Learning Statistical Models from Relational Data (2000)
L. Holder et al., Substructure discovery in the SUBDUE system, KDD Workshop (1994)
V. Ingalalli et al., SuMGra: querying multigraphs via efficient indexing, International Conference on Database and Expert Systems Applications (2016)
A. Inokuchi et al., An apriori-based algorithm for mining frequent substructures from graph data, European Conference on Principles of Data Mining and Knowledge Discovery (2000)

