
Intra-Firm Information Flow: A Content-Structure Perspective

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7014)

Abstract

This paper endeavors to bring together two largely disparate areas of research. On the one hand, text mining methods treat each document as an independent instance, despite the fact that in many text domains documents are linked and their topics are correlated. For example, web pages on related topics are often connected by hyperlinks, and scientific papers from related fields are typically linked by citations. On the other hand, Social Network Analysis (SNA) typically treats edges between nodes according to "flat" attributes in binary form alone. This paper proposes a simple approach that addresses both of these issues in data mining scenarios involving corpora of linked documents. According to this approach, weights are first assigned to the edges between documents, based on the content of the documents associated with each edge; standard SNA and network theory tools are then applied to the resulting weighted network. The method is tested on the Enron email corpus and successfully discovers the central people in the organization and the relevant communications between them. Furthermore, our findings suggest that, owing to the non-conservative nature of information, conservative centrality measures (such as PageRank) are less adequate here than non-conservative centrality measures (such as eigenvector centrality).
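The two-step idea in the abstract (weight each edge by content, then run centrality measures on the weighted network) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy messages, the topic terms, the weighting rule (fraction of tokens that are topic terms), and the choice to score a person by the weighted messages they receive. The sketch contrasts a non-conservative measure, where a node passes its full score along every outgoing edge, with a conservative one, where a node divides a fixed amount of score among its outgoing edges.

```python
from collections import defaultdict

# Hypothetical toy corpus of (sender, recipient, text); not taken from the Enron data.
MESSAGES = [
    ("alice", "bob", "gas trading contract for the california market"),
    ("bob", "alice", "revised trading contract attached"),
    ("bob", "carol", "lunch on friday"),
    ("carol", "alice", "trading desk risk report"),
    ("alice", "carol", "contract risk summary for trading"),
    ("dave", "alice", "parking permit renewal"),
]
TOPIC_TERMS = {"trading", "contract", "risk"}  # assumed terms of interest


def content_weight(text, topic_terms=TOPIC_TERMS):
    """Assumed weighting rule: fraction of tokens in `text` that are topic terms."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in topic_terms for t in tokens) / len(tokens)


def build_weighted_graph(messages):
    """Sum content weights over all messages sharing the same (sender, recipient)."""
    weights = defaultdict(float)
    for sender, recipient, text in messages:
        weights[(sender, recipient)] += content_weight(text)
    nodes = sorted({n for s, r, _ in messages for n in (s, r)})
    return nodes, weights


def eigenvector_centrality(nodes, weights, iterations=200):
    """Non-conservative: a node's full score flows along each out-edge, scaled by weight."""
    x = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Shifted update (A + I): starting from the old scores avoids
        # oscillation of the power iteration on periodic graphs.
        nxt = dict(x)
        for (src, dst), w in weights.items():
            nxt[dst] += w * x[src]
        norm = sum(nxt.values())
        x = {n: v / norm for n, v in nxt.items()}
    return x


def pagerank(nodes, weights, damping=0.85, iterations=200):
    """Conservative: a node splits its score among out-edges in proportion to weight."""
    out_total = defaultdict(float)
    for (src, _), w in weights.items():
        out_total[src] += w
    n = len(nodes)
    p = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Nodes with no outgoing weight spread their score uniformly.
        dangling = sum(p[v] for v in nodes if out_total[v] == 0.0)
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for (src, dst), w in weights.items():
            if out_total[src] > 0.0:
                nxt[dst] += damping * p[src] * w / out_total[src]
        p = nxt
    return p


if __name__ == "__main__":
    nodes, weights = build_weighted_graph(MESSAGES)
    for name, scores in [("eigenvector", eigenvector_centrality(nodes, weights)),
                         ("pagerank", pagerank(nodes, weights))]:
        ranking = sorted(scores, key=scores.get, reverse=True)
        print(name, [(v, round(scores[v], 3)) for v in ranking])
```

In this sketch, off-topic messages receive zero weight and so contribute nothing to eigenvector centrality, whereas PageRank's teleportation term still gives every node a small baseline score. The toy graph is far too small to reproduce the paper's comparison on the Enron corpus; it only shows where the two measures differ mechanically.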





Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Berchenko, Y., Daliot, O., Brueller, N.N. (2011). Intra-Firm Information Flow: A Content-Structure Perspective. In: Gama, J., Bradley, E., Hollmén, J. (eds) Advances in Intelligent Data Analysis X. IDA 2011. Lecture Notes in Computer Science, vol 7014. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24800-9_6


  • DOI: https://doi.org/10.1007/978-3-642-24800-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24799-6

  • Online ISBN: 978-3-642-24800-9

  • eBook Packages: Computer Science, Computer Science (R0)
