Beyond rankings: comparing directed acyclic graphs

Data Mining and Knowledge Discovery

Abstract

Defining appropriate distance measures among rankings is a classic area of study which has led to many useful applications. In this paper, we propose a more general abstraction of preference data, namely directed acyclic graphs (DAGs), and introduce a measure for comparing DAGs, given that a vertex correspondence between the DAGs is known. We study the properties of this measure and use it to aggregate and cluster a set of DAGs. We show that these problems are \(\mathbf {NP}\)-hard and present efficient methods to obtain solutions with approximation guarantees. In addition to preference data, these methods turn out to have other interesting applications, such as the analysis of a collection of information cascades in a network. We test the methods on synthetic and real-world datasets, showing that the methods can be used to, e.g., find a set of influential individuals related to a set of topics in a network or to discover meaningful and occasionally surprising clustering structure.

Notes

  1. Most often the Kendall-tau distance is defined to be a value between 0 and 1 by normalizing with the total number of vertex pairs \(\binom{|V|}{2}\).

  2. The dataset can be downloaded at http://users.ics.aalto.fi/emalmi/artist_preference_data.zip.
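
The normalization mentioned in note 1 can be illustrated with a minimal Python sketch. It covers only the classic Kendall-tau distance between two full rankings of the same items, not the DAG measure proposed in the paper; the function name `kendall_tau_distance` and its `normalize` parameter are hypothetical names chosen for this illustration.

```python
from itertools import combinations
from math import comb


def kendall_tau_distance(rank_a, rank_b, normalize=False):
    """Count the item pairs that two full rankings order differently.

    rank_a and rank_b are sequences listing the same items, best first.
    With normalize=True the count is divided by C(n, 2), the total number
    of item pairs, which gives a value between 0 and 1.
    """
    if len(set(rank_a)) != len(rank_a) or len(set(rank_b)) != len(rank_b):
        raise ValueError("rankings must not contain repeated items")
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must contain the same items")

    # Position of every item in each ranking.
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}

    # A pair is discordant when the two rankings disagree on its order.
    discordant = sum(
        1
        for u, v in combinations(rank_a, 2)
        if (pos_a[u] - pos_a[v]) * (pos_b[u] - pos_b[v]) < 0
    )

    if not normalize:
        return discordant
    n = len(rank_a)
    # With fewer than two items there are no pairs to disagree on.
    return discordant / comb(n, 2) if n > 1 else 0.0
```

For example, a ranking and its exact reverse disagree on every pair, so their normalized distance is 1, while two identical rankings have distance 0. The quadratic pair enumeration is chosen for readability; it is adequate for short rankings.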

Acknowledgments

The authors are grateful to Nicola Barbieri for providing the Last.fm dataset. We also thank the anonymous reviewers for their constructive feedback. This work was supported by Academy of Finland grant 118653 (ALGODAN).

Author information

Corresponding author

Correspondence to Eric Malmi.

Additional information

Responsible editors: Joao Gama, Indre Zliobaite, Alipio Jorge, Concha Bielza.

About this article

Cite this article

Malmi, E., Tatti, N. & Gionis, A. Beyond rankings: comparing directed acyclic graphs. Data Min Knowl Disc 29, 1233–1257 (2015). https://doi.org/10.1007/s10618-015-0406-1

  • DOI: https://doi.org/10.1007/s10618-015-0406-1
