Abstract
In the current big data era, data of many kinds has grown explosively. Most of this large volume of data is non-structured or semi-structured (e.g., tweets, weibos, or blogs) and is therefore difficult to manage and organize. An effective and efficient classification algorithm for such data is thus essential. In this article, we focus on a specific kind of non-structured/semi-structured data from daily life: recipe data. We propose a document model and a similarity-based classification algorithm for big non-structured and semi-structured recipe data. Using the proposed algorithm and system, we conduct an experimental study on a real-world dataset. The results of the experimental study verify the effectiveness of the proposed approach and framework.
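The abstract does not specify the document model or the similarity measure, so the following is only a generic illustration of what similarity-based classification can look like, not the method proposed in the paper. It assumes a bag-of-words document model, cosine similarity, and a nearest-neighbour decision rule; the function names (`to_vector`, `cosine`, `classify`) and the toy recipes and labels are hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Assumed document model: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, labeled):
    """Return the label of the labeled example most similar to `text`."""
    query = to_vector(text)
    best_label, _ = max(
        ((label, cosine(query, to_vector(doc))) for doc, label in labeled),
        key=lambda pair: pair[1],
    )
    return best_label


# Toy data, invented for illustration only.
examples = [
    ("beef stew with carrots and potatoes", "main"),
    ("chocolate cake with sugar and cream", "dessert"),
]
print(classify("sugar cookies with chocolate chips", examples))
```

In this sketch the query shares three terms with the second example and one with the first, so it is assigned the second example's label. A real system for recipe text would likely need proper tokenization (for instance, word segmentation for Chinese), term weighting, and a structure-aware document model.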
Acknowledgement
This work is supported by the Fundamental Research Funds of the Agricultural Information Institute, Chinese Academy of Agricultural Sciences (No. 2014-J-011), and the Project of the Ministry of Agriculture of China “Agricultural information monitoring and early-warning”.
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, W., Zhao, X. (2016). Similarity-Based Classification for Big Non-Structured and Semi-Structured Recipe Data. In: Gao, H., Kim, J., Sakurai, Y. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9645. Springer, Cham. https://doi.org/10.1007/978-3-319-32055-7_5
DOI: https://doi.org/10.1007/978-3-319-32055-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32054-0
Online ISBN: 978-3-319-32055-7
eBook Packages: Computer Science (R0)