Abstract
We study distributed resource sharing in peer-to-peer networks, focusing on the problem of information filtering. In our setting, subscriptions and publications are specified using an expressive attribute-value representation that supports both the Boolean and Vector Space models. We use an extension of the distributed hash table Chord to organise the nodes and store user subscriptions, and utilise efficient publication protocols that keep the network traffic and latency low at filtering time. To verify our approach, we evaluate the proposed protocols experimentally using thousands of nodes, millions of user subscriptions, and two different real-life corpora. We also study three important facets of the load-balancing problem in such a scenario and present a novel algorithm that manages to distribute the load evenly among the nodes. Our results show that the designed protocols are scalable and efficient: they achieve expressive information filtering functionality with low message traffic and latency.
Part of this work was performed while the authors were with the Technical University of Crete, Chania, Greece. C. Tryfonopoulos was partially supported by programme Heraclitus of the Greek Ministry of Education.
Notes
1. A proximity formula is an expression of the form \(w_1 \prec _{\xi _1} \cdots \prec _{\xi _{k-1}} w_{k}\), where each \(w_i\) is a word and each \(\xi _i\) is a distance interval from the set \(\{[l,u] : l,u \in {\mathbb {N}},\ l \ge 0 \ \text {and}\ l \le u\} \cup \{[l,\infty ) : l \in {\mathbb {N}} \ \text {and}\ l \ge 0\}\). The proximity operator \(\prec _{\xi }\) captures the concepts of order and distance between words in a text document, using intervals that impose lower and upper bounds on the distances between words.
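To make the proximity operator concrete, the following is a minimal sketch of how a proximity formula could be checked against a tokenised text. It is an illustration under stated assumptions, not the chapter's implementation: the function name `satisfies` is hypothetical, the distance between two word occurrences is assumed to be the number of words strictly between them, and one interval is assumed between each pair of consecutive words. An unbounded interval \([l,\infty )\) is written as `(l, math.inf)`.

```python
import math
from typing import Sequence, Tuple

# (l, u) stands for [l, u]; use math.inf as u to express [l, infinity).
Interval = Tuple[int, float]


def satisfies(tokens: Sequence[str],
              words: Sequence[str],
              intervals: Sequence[Interval]) -> bool:
    """Check whether `tokens` contains the words in order with admissible gaps.

    Assumption (not taken from the note): the distance between two
    occurrences is the number of tokens strictly between them.
    """
    if len(intervals) != len(words) - 1:
        raise ValueError("need one interval between each pair of consecutive words")

    def extend(i: int, prev: int) -> bool:
        # i is the index of the next word to place; prev is where word i-1 was found.
        if i == len(words):
            return True
        lo, hi = intervals[i - 1]
        for pos in range(prev + 1, len(tokens)):
            gap = pos - prev - 1  # tokens strictly between the two occurrences
            if gap > hi:
                break  # later positions only increase the gap
            if gap >= lo and tokens[pos] == words[i] and extend(i + 1, pos):
                return True
        return False

    # Try every occurrence of the first word as a starting point.
    return any(tokens[p] == words[0] and extend(1, p) for p in range(len(tokens)))


tokens = "peer to peer networks support scalable information filtering".split()
adjacent = satisfies(tokens, ["information", "filtering"], [(0, 0)])
wrong_order = satisfies(tokens, ["networks", "peer"], [(0, math.inf)])
too_far = satisfies(tokens, ["peer", "filtering"], [(0, 3)])
far_enough = satisfies(tokens, ["peer", "filtering"], [(4, math.inf)])
```

The backtracking search tries every occurrence of each word, so a formula is reported as satisfied whenever at least one ordered choice of occurrences respects all intervals.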
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Tryfonopoulos, C., Idreos, S., Koubarakis, M., Raftopoulou, P. (2014). Distributed Large-Scale Information Filtering. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XIII. Lecture Notes in Computer Science(), vol 8420. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54426-2_4
DOI: https://doi.org/10.1007/978-3-642-54426-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54425-5
Online ISBN: 978-3-642-54426-2
eBook Packages: Computer Science, Computer Science (R0)