Article

Predicting the Helpfulness of Online Restaurant Reviews Using Different Machine Learning Algorithms: A Case Study of Yelp

1 College of Business Administration, Capital University of Economics and Business, Beijing 100070, China
2 School of Business Administration, Southwestern University of Finance and Economics, Chengdu 611130, China
* Author to whom correspondence should be addressed.
Sustainability 2019, 11(19), 5254; https://doi.org/10.3390/su11195254
Submission received: 31 August 2019 / Revised: 18 September 2019 / Accepted: 23 September 2019 / Published: 25 September 2019
(This article belongs to the Section Economic and Business Aspects of Sustainability)

Abstract

Helpful online reviews can be used to create sustainable marketing strategies in the restaurant industry, which contributes to national sustainable economic development. In this study, the main aspects (including food/taste, experience, location, and value) were extracted empirically from 294,034 reviews on Yelp.com using Latent Dirichlet Allocation (LDA), and a positive or negative sentiment was assigned to each extracted aspect. Positive sentiments were associated with food/taste, while negative sentiments were associated with value. This study further shows that a robust classification algorithm based on a Support Vector Machine (SVM) with a Fuzzy Domain Ontology (FDO) outperforms traditional classification algorithms such as Naïve Bayes (NB) and NB combined with SVM in predicting the helpfulness of online reviews. This study enriches the literature on the managerial aspects of sustainability by analyzing a large amount of customer-generated plain text data. The results could be used as a sustainable marketing strategy by review website developers to design sophisticated, intelligent review systems that enable customers to sort and filter helpful reviews based on their preferences. The extracted aspects and their assigned sentiments could also help restaurateurs better understand how to meet diverse customers’ needs and maintain sustainable competitive advantages.

1. Introduction

Many people hold the misconception that sustainability relates only to the natural environment, overlooking the importance of promoting sustainable economic development and creating sustainable business strategies. Metrics of sustainable economic development include, but are not limited to, local economic growth, local and small business growth, and cost of living [1]. The restaurant industry plays an important role in promoting sustainable economic development in the U.S. According to the National Restaurant Association [2], as of 2019, revenues generated by the U.S. restaurant industry were estimated at $863 billion, accounting for 4% of U.S. gross domestic product (GDP). Moreover, the restaurant industry employs approximately 15.3 million people, about 10% of the overall U.S. workforce.
However, restaurants are struggling to survive due to a number of factors, such as intense competition, rising food prices, and high labor costs [3]. An earlier study demonstrated that around 60% of restaurants fail within three years [4]. Forbes [5] further reported that the restaurant failure rate is 30% within the first year, and that 30% of those that survive shutter within the following two years of operation. As such, achieving sustained business performance has become a critical issue for the restaurant industry [6]. Given that the development of the restaurant industry could reduce the unemployment rate and promote local economic growth, it is important to develop sustainable marketing strategies that drive restaurant performance.
In today’s digital era, eWOM (electronic word-of-mouth) has outweighed traditional marketing strategies in its influence on customers’ purchase decisions. According to the 2017 TripAdvisor Restaurant Marketing Survey, 94% of diners choose a restaurant based on online reviews. Restaurateurs agreed that an online listing service is one of the most effective marketing channels for driving more business [7]. The volume of online reviews is growing at an unprecedented rate. According to Statista, the number of reviews submitted to Yelp has reached 148.3 million, more than twice the 2014 total [8]. Customers often feel overwhelmed when confronting the abundance of messages online [9]. Kwon et al. [10] observed that customers tend to rely on a very limited number of reviews when making purchase decisions. Therefore, they may resort to helpful reviews to gain a general idea of products or services. A helpful review is defined as “a peer-generated product evaluation that facilitates the consumer’s purchase decision process” [11]. Presenting helpful reviews can reduce the time and effort customers spend searching for relevant information in a large volume of online reviews [12]. It is also valuable for marketers to obtain customer feedback to improve their products or services. Thus, it is important for restaurant owners and marketers to understand how to make use of helpful online reviews to make their businesses stand out from competitors listed online.
Numerous prior studies have identified factors influencing review helpfulness for both search goods (e.g., furniture, digital cameras, cell phones) and experience goods (e.g., restaurants, hotels) [11,13,14]. However, these factors are predominantly measured using Likert scales or numerical metrics (e.g., review volume, star rating, sentence length) [11,15], neglecting the more hidden semantic structures, such as emotions and linguistic styles, conveyed through textual content. With digital text growing in volume, a number of studies have applied machine learning (ML), which studies a computer’s ability to learn from data without being explicitly programmed [16], to predict review helpfulness. However, prior scholars have tended to predict review helpfulness in the hotel and e-commerce industries (e.g., Amazon.com); relatively few studies have predicted review helpfulness in the restaurant industry. Also, few studies have attempted to propose an appropriate ML-based text mining technique that predicts restaurant review helpfulness using both important dining aspects and emotional content.
To address these gaps in the literature, this study aims to extract both emotions and the most frequently mentioned dining aspects, and then predict review helpfulness by comparing different ML-based text mining techniques. This study is guided by the following three questions: (1) What dining aspects are most important to customers? (2) What attitudes (positive or negative) are expressed regarding each dining aspect? (3) Considering restaurant aspects and their sentiment, which machine learning method performs better in predicting review helpfulness? The results are expected to serve as guidance for website developers to design better review systems and to achieve a sustainable competitive advantage among the many online review websites. The extracted restaurant aspects and the results of the sentiment analysis could also help restaurateurs better understand how to continually improve customers’ dining experiences.
The rest of this paper proceeds as follows. Section 2 reviews related work on eWOM and review helpfulness prediction. The methods, including data collection and the data analysis process, are explained in Section 3. The main results are presented in Section 4. The final section presents the conclusion, a discussion of theoretical contributions and practical implications, limitations, and future research.

2. Literature Review

2.1. The Role of eWOM in Promoting Business Sustainability

With the proliferation of the Internet, eWOM has become a popular mode of communication. eWOM is defined as any positive or negative statement about a product or company made by actual, potential, or former customers to a multitude of people in an online format [17]. The importance of maintaining a sustainable business using eWOM lies in its impact on customers’ trust, purchase intentions, and sales performance [18,19,20,21]. It is reported that 87% of customers will not consider businesses with low ratings, while 92% of customers rely on online reviews to determine whether a business is good [22].
Online reviews are particularly important when purchasing experiential goods characterized by intangible attributes (e.g., hotels, restaurants), since prospective customers cannot experience the products in advance [23]. Given that the restaurant industry offers a collection of experiences dominated by intangible and impalpable elements, customers are unable to objectively assess the characteristics of products or services prior to consumption or travel. Therefore, customers tend to obtain detailed information from different sources before making a decision as a means to reduce perceived uncertainty and risk [24]. The volume of restaurant reviews customers write was found to be an indicator of a restaurant’s popularity [25]. Kim et al. [26] further found that the number of online reviews had a positive impact on restaurant performance.
Additionally, understanding online reviews is beneficial for business owners and marketers seeking clearer insight into customers’ attitudes and behavior, which practitioners can use to improve their service and create a sustainable competitive advantage. Prior studies have analyzed online reviews to understand customer experience and satisfaction across different service contexts, including the hotel, short-term rental, airline, and wellness industries [27,28,29]. In the restaurant industry, Pantelidis [30] empirically examined the meal experience using 2471 online restaurant comments, revealing six salient factors in a diner’s evaluation of a restaurant: food, service, ambience, price, menu, and décor. Yan et al. [31] analyzed the quantitative scores of 10,136 Chinese restaurant reviews and found a similar result, revealing that food quality, price and value, service quality, and atmosphere influenced customers’ revisit intention.

2.2. Studies on Review Helpfulness Prediction

A number of online platforms offer mechanisms for users to evaluate online reviews [32]. For example, Amazon.com and TripAdvisor allow customers to vote for the reviews they perceive as helpful in their decision-making process. The number of helpful votes can signal the quality of message content [33]. Retail website developers can also increase their website traffic by presenting helpful reviews as a differentiation strategy [11].
Numerous previous scholars have identified the factors influencing the “helpfulness” of online customer reviews. Based on the heuristic-systematic model (HSM), the explored factors influencing review helpfulness can be divided into two types: (1) central route cues, which are associated with review content features, such as review content quality, review length, review readability, review type, and review extremity [13,34,35]; and (2) peripheral cues, which are associated with information source features, such as reviewer expertise, reviewer gender, and reviewer reputation [36,37,38]. However, Hong et al. [39] noticed that studies of the predictors of review helpfulness yield inconsistent conclusions. They conducted a meta-analysis and found that review readability and review ratings did not significantly influence review helpfulness. It is also observed that the aforementioned determinants of review helpfulness are predominantly numerical features [15]. In recent years, research on review helpfulness has focused on semantic features and linguistic style in online reviews [15,40].
Table 1 summarizes studies on predicting review helpfulness on two popular business review sites (Yelp and TripAdvisor) during the period from 2015 to 2019. Racherla and Friske [37] examined the impact of reviewer factors and review factors on perceived review usefulness across three categories of services, among which restaurants were chosen as experiential-based services. The results indicated that the usefulness of reviews of experiential-based services is a function of individual taste. Liu and Park [38] later extended the work of Racherla and Friske [37] by focusing exclusively on restaurant reviews. They added review readability and customer perceived enjoyment as two qualitative antecedents of review usefulness and identified these two variables as the most influential factors of review helpfulness. Ngo-Ye and Sinha [41] further conducted an RFM analysis, based on recency (recency of purchase), frequency (total number of purchases), and monetary value (average amount spent per transaction), to predict review helpfulness. In addition to the widely examined review- and reviewer-specific characteristics, other factors influencing review helpfulness include review order [42]; emotions and linguistic styles [15]; and temporal, explanatory, and sensory cues [43]. Restaurants in New York City, Las Vegas, and Los Angeles are most likely to be selected as study samples [15,37,42]. Although recent studies have started to examine the deeper structure and patterns of textual data, few studies have taken into account both emotions and restaurant experience aspects in predicting restaurant review helpfulness.

2.3. Machine Learning for eWOM

A few decades ago, researchers tended to conduct content analysis manually to identify the product or service features most important to customers based on word frequency [30,44]. To better understand the aspects that contribute to a helpful review, machine learning for textual data analysis, which allows a machine to extract and classify online reviews, has been utilized to provide more insights and make predictions from high volumes of reviews [45]. Compared to traditional forms of manual content analysis, machine learning methods for text data are less time consuming and labor intensive. They also provide additional information, such as semantics, structure, sequence, and the context around nearby words.
Text classification, which labels unstructured data with relevant categories from a predefined set, is a fundamental text-mining task [46]. The most frequently used machine learning techniques for classification and regression analysis include Naïve Bayes (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and classical ontology [47]. NB usually produces less accurate predictive outcomes, but its high processing speed on big data is favored by scholars [48]. SVM is currently one of the most effective methods for categorizing unlabeled data [49]. Zhang et al. [50] analyzed restaurant reviews written in Cantonese, revealing that NB achieved equal or better accuracy than SVM. Rafi et al. [51] compared SVM and NB classifiers for text categorization with Wikitology and found that NB performed better. Lau et al. [52] found that fuzzy ontology-based semantic analysis outperformed other algorithms (e.g., SVM) embedded in an experimental system (OBPRM), given its effectiveness in automatically identifying the aspect-oriented sentiments captured in a product ontology. Ali et al. [47] proposed SVM with a Fuzzy Domain Ontology (FDO) as a more accurate and efficient algorithm for extracting hotel features, given its improved ability to remove irrelevant reviews and classify feature reviews into more degrees of polarity.
In sum, the predictive power of these classifiers varies across different online review contexts and can be influenced by interactions between classification models and feature options [50]. Therefore, comparisons should be made across different machine learning algorithms to determine which data-mining algorithm provides the highest precision and accuracy in the restaurant industry.

3. Methodology

3.1. Data Collection

A web crawler was programmed in Python to automatically retrieve reviews from Yelp.com. Data were collected from Yelp during 10–16 October 2018. The study selected three of the best cities to travel to in the U.S. based on TripAdvisor reviews [64]: New York, Los Angeles, and Las Vegas. During the crawling process, identifiable information about reviewers and restaurants was carefully removed for privacy protection. In total, 294,034 reviews were crawled by the program. The number of reviews extracted from each city is shown in Table 2. In addition to consumers’ textual feedback, other relevant information, such as the consumer’s elite status, the type of restaurant, the review date, and the individual reviewer’s star rating for the restaurant, was also collected. Additionally, the number of “useful” votes for each individual review was acquired to measure review usefulness in this study.
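For illustration only, the minimal sketch below shows the kind of crawl-and-parse loop such a collection step might use. The URL template, CSS selectors, and field names are hypothetical placeholders, not the authors' code and not Yelp's actual page structure.

```python
# Hypothetical sketch of a review crawler; the URL template and CSS
# selectors below are placeholders, not Yelp's real markup.
import csv
import time

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.example.com/biz/{biz_id}?start={offset}"  # placeholder

def crawl_reviews(biz_id, pages=3, page_size=20):
    """Fetch review text, star rating, date, and useful-vote count."""
    rows = []
    for page in range(pages):
        url = BASE_URL.format(biz_id=biz_id, offset=page * page_size)
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        for node in soup.select("div.review"):  # placeholder selector
            rows.append({
                "text": node.select_one("p.comment").get_text(strip=True),
                "stars": node.select_one("div.rating").get_text(strip=True),
                "date": node.select_one("span.date").get_text(strip=True),
                "useful_votes": node.select_one("span.useful").get_text(strip=True),
            })
        time.sleep(1)  # be polite between requests
    return rows

if __name__ == "__main__":
    reviews = crawl_reviews("some-restaurant-las-vegas")  # hypothetical business id
    with open("reviews.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "stars", "date", "useful_votes"])
        writer.writeheader()
        writer.writerows(reviews)
```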

3.2. Data Analysis Process

3.2.1. Step 1: Data Preprocessing

The text preprocessing procedure follows steps adapted from prior studies [65,66,67], including eliminating non-English characters and words, word text tokenization, part-of-speech tagging (POS tagging or POST), replacing common negative words, word stemming, and removing low frequency words (less than 2%) [65].
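A minimal sketch of such a preprocessing pipeline, assuming the NLTK toolkit, is shown below; the negation list and the interpretation of the 2% threshold as a document-frequency cutoff are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative preprocessing: negation replacement, tokenization,
# POS tagging, stemming, and low-frequency word removal.
import re
from collections import Counter

import nltk
from nltk.stem import PorterStemmer

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

NEGATIONS = {"don't": "do not", "isn't": "is not", "wasn't": "was not"}  # assumed list
stemmer = PorterStemmer()

def preprocess(review: str):
    text = review.lower()
    for neg, repl in NEGATIONS.items():
        text = text.replace(neg, repl)            # replace common negative words
    text = re.sub(r"[^a-z\s]", " ", text)         # drop non-English characters
    tokens = nltk.word_tokenize(text)             # tokenization
    tagged = nltk.pos_tag(tokens)                 # POS tagging
    return [stemmer.stem(w) for w, _tag in tagged]  # word stemming

def drop_rare_terms(docs, min_doc_ratio=0.02):
    """Remove words appearing in fewer than min_doc_ratio of documents (assumed reading of the 2% rule)."""
    doc_freq = Counter(w for doc in docs for w in set(doc))
    keep = {w for w, df in doc_freq.items() if df / len(docs) >= min_doc_ratio}
    return [[w for w in doc if w in keep] for doc in docs]

docs = drop_rare_terms([preprocess(r) for r in ["The food wasn't great...", "Great fresh tacos!"]])
```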

3.2.2. Step 2: Restaurant Aspect Extraction

After eliminating irrelevant and non-textual content in the preprocessing step, the reviews were transformed into proper vectors. This step aims at identifying the major dining aspects in the obtained reviews. Latent Dirichlet Allocation (LDA) was applied to identify the underlying aspects from the mass of reviews with minimal human intervention [65].
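As an illustration, the sketch below fits an LDA model with four topics to the preprocessed token lists using the gensim library; the number of topics, hyperparameters, and variable names are assumptions for demonstration, not the authors' exact configuration.

```python
# Illustrative LDA aspect extraction with gensim, assuming `docs` is the
# list of token lists produced by the preprocessing step.
from gensim import corpora
from gensim.models import LdaModel

docs = [["food", "tasti", "fresh"], ["locat", "park", "walk"], ["price", "valu", "portion"]]

dictionary = corpora.Dictionary(docs)               # token -> id mapping
corpus = [dictionary.doc2bow(doc) for doc in docs]  # bag-of-words vectors

lda = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=4,        # assumed to match the four aspects reported later
    passes=10,
    random_state=42,
)

# Top words per latent aspect (candidates for manual naming).
for topic_id, words in lda.show_topics(num_topics=4, num_words=20, formatted=False):
    print(topic_id, [w for w, _p in words])

# Document-aspect weights, i.e., the W matrix used in the sentiment step.
doc_topic_weights = [lda.get_document_topics(bow, minimum_probability=0.0) for bow in corpus]
```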

3.2.3. Step 3: Sentiment Detection

This stage aims at detecting customers’ sentiments toward the different restaurant aspects embedded in their reviews. First, each review is split into sentences by SentiStrength, and each sentence is then assigned a tuple of a positive value and a negative value, since, in reality, one sentence may simultaneously contain both positive and negative sentiments. SentiStrength also assigns fixed scores to dictionary tokens, which include common emoticons. For instance, “good” is scored {3, −1} and “bad” is scored {1, −4}. Note that a word is characterized by a score only when it appears in the dictionary. In addition, extra punctuation marks or exaggerated spellings may change a score; for example, “goood” is scored the same as “good!!!”, and such variants effectively extend the dictionary. Feature sentiments were calculated by applying SentiStrength as follows. Denote the collection of reviews by $R = \{r_1, r_2, \ldots, r_n\}$ and the collection of obtained aspects by $T = \{t_1, t_2, \ldots, t_m\}$. LDA outputs a matrix $W_{n \times m}$, whose entry $w_{i,j}$ represents the number of times a feature from the $i$th review is associated with the $j$th aspect. Subsequently, the sentiment score attached to a given aspect is the weighted average over the reviews. For every aspect $t_j$, we calculate the aspect sentiment score $ts_j$ as noted in Equation (1):
$$ts_j = \frac{\sum_{i=1}^{n} w_{i,j} \times s_i}{\sum_{i=1}^{n} w_{i,j}} \quad (1)$$
where $S = \{s_1, s_2, \ldots, s_l\}$ denotes the sentiment scores of the features associated with aspect $t_j$.
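A small numerical sketch of Equation (1) is shown below, assuming the LDA weight matrix W and the per-review SentiStrength scores s are already available as arrays; the values are toy data.

```python
# Weighted-average aspect sentiment (Equation (1)); W and s are toy values.
import numpy as np

# W[i, j]: number of features in review i associated with aspect j (from LDA)
W = np.array([[3, 0, 1],
              [1, 2, 0],
              [0, 4, 2]])
# s[i]: net SentiStrength score of review i
s = np.array([2.0, -1.0, 1.0])

aspect_sentiment = (W * s[:, None]).sum(axis=0) / W.sum(axis=0)  # ts_j per aspect
print(aspect_sentiment)  # one weighted-average sentiment score per aspect
```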

3.2.4. Step 4: Classifier Set Up

Previous helpfulness predictions have mainly relied on descriptive review features, such as review rating, review length, and review text, as the most useful features. In this exploratory analysis, however, review helpfulness depends more on review semantics and sentiment than on descriptive aspects. Specifically, the aspects and the sentiments expressed in a review are the key criteria for determining whether it is helpful.
Because of the scarcity of useful votes and the limited schema of review arrangement on the websites, we apply classical learning algorithms to the binary classification of online-review helpfulness, in other words, to discriminate whether a particular review in the collection is helpful or not, based on the emotion data and the best-performing features. These algorithms are: (1) NB combined with LR; (2) NB combined with SVM; and (3) SVM combined with FDO. Based on the aspects obtained via the aforementioned steps, the reviews were separated into training and testing datasets. The test data are used to estimate the performance of each machine-learning algorithm.
NB classifier. NB is a classifier based on Bayes’ rule [68] and grounded in statistics. Under its assumption, attributes are equally independent and equally important. To classify an unknown case, NB selects the class that is most likely given the evidence in the test case.
NB is widely applied in sentiment classification. For the classification of a given review document d into class c, the word likelihoods are estimated as in Equation (2).
$$p(x_i \mid c) = \frac{\text{count of } x_i \text{ in documents of class } c}{\text{total number of words in documents of class } c} \quad (2)$$
Based on Bayes’ law, the probability that a given document d belongs to class $c_i$ is given by Equation (3).
$$p(c_i \mid d) = \frac{p(d \mid c_i) \times p(c_i)}{p(d)} \quad (3)$$
In our context, the conditional independence hypothesis is adopted: given the particular class (yes or no), the words are assumed to be independent of one another. This is the reason why the model is called “naïve” (Equation (4)).
$$p(c_i \mid d) = \frac{\left( \prod_{k} p(x_k \mid c_i) \right) \times p(c_i)}{p(d)} \quad (4)$$
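A compact sketch of Equations (2)–(4) for a two-class (helpful vs. not helpful) setting is given below; the tiny training corpus and the Laplace smoothing term are illustrative additions, not part of the paper.

```python
# Multinomial Naïve Bayes by hand (Equations (2)-(4)), with Laplace smoothing
# added so unseen words do not zero out the product.
import math
from collections import Counter, defaultdict

train = [("great food fresh menu", "helpful"),
         ("bad value overpriced", "helpful"),
         ("ok", "not_helpful")]

word_counts = defaultdict(Counter)  # class -> word counts
class_counts = Counter()            # class -> number of documents
vocab = set()
for text, label in train:
    words = text.split()
    word_counts[label].update(words)
    class_counts[label] += 1
    vocab.update(words)

def log_posterior(doc, label):
    """log p(c) + sum_k log p(x_k | c), proportional to p(c | d)."""
    total = sum(word_counts[label].values())
    logp = math.log(class_counts[label] / sum(class_counts.values()))
    for w in doc.split():
        p_w = (word_counts[label][w] + 1) / (total + len(vocab))  # Equation (2) + smoothing
        logp += math.log(p_w)
    return logp

doc = "great fresh food"
pred = max(class_counts, key=lambda c: log_posterior(doc, c))
print(pred)  # expected: "helpful"
```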
Further, Logistic Regression (LR) was employed to examine relationships involving discrete variables. LR is often utilized when there is a dichotomous dependent variable, such as fault prone versus non-fault prone. Although this statistical technique performs better on numerical data, it allows the prediction of discrete variables from a mix of continuous and discrete predictors.
Thus, performance curves for different review volumes are displayed for the NB and LR classification methods. Based on the description of Afzal, NB and LR were united after a comparison of the two classifiers [69,70] (see Equation (5)).
$$\breve{p}(y = q \mid x, \alpha, \beta) = \frac{1}{1 + e^{-\left( \sum_{i=1}^{n} \alpha_i x_i + \beta \right)}} \quad (5)$$
In this derivation, $\breve{p}(y = q \mid x, \alpha, \beta)$ represents the association between NB and LR, where $\alpha$ and $\beta$ are the discrete outcomes of LR. It can also be observed from the equation that the two classifiers are linear. If the assumed data distribution is met, discriminant function analysis can yield better performance; when the processed outcome is continuous, multiple regression performs better under the given assumptions.
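The paper does not spell out the exact NB–LR combination, so the sketch below shows one common way to operationalize such a hybrid: NB per-class log-probabilities are fed into a logistic regression, which produces the final binary decision. The data, labels, and stacking scheme are illustrative assumptions.

```python
# One possible NB+LR combination (an assumption, not necessarily the authors'
# exact scheme): feed NB log-probabilities into a logistic regression.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

texts = ["great food and service", "terrible value", "fresh menu options", "rude staff"]
helpful = [1, 0, 1, 0]  # toy labels

X = CountVectorizer().fit_transform(texts)

nb = MultinomialNB().fit(X, helpful)
nb_features = nb.predict_log_proba(X)               # per-class log-probabilities from NB

lr = LogisticRegression().fit(nb_features, helpful)  # LR on top of NB outputs
print(lr.predict(nb_features))
```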
NB+SVM. SVM is an effective machine learning approach [71]. It builds a hyperplane, or a group of hyperplanes, in a high-dimensional space, defined by a weight vector w. A larger margin gives the classifier a lower error, so training aims to maximize the distance from the separating hyperplane to the nearest training data points (the support vectors) of any class. Hence, the problem becomes one of margin maximization.
This experiment employed kernel functions in the SVM training phase. The SVM classifier is trained on restaurant reviews with semantic annotation. While classifying reviews with SVM, training is performed to tune the kernel parameters, and the most appropriate kernel parameters are then identified. During training, the basic goal of SVM is to find the largest-margin hyperplane that solves the feature-review classification task.
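A sketch of this kernel-tuning step using scikit-learn's grid search is shown below; the TF-IDF features, parameter grid, and toy data are assumptions for illustration, not the authors' exact setup.

```python
# Illustrative kernel-parameter tuning for the SVM classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

texts = ["great food", "bad service", "fresh menu", "overpriced drinks"] * 5
helpful = [1, 0, 1, 0] * 5  # toy labels

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", SVC()),
])
param_grid = {
    "svm__kernel": ["linear", "rbf"],
    "svm__C": [0.1, 1, 10],
    "svm__gamma": ["scale", 0.1],
}
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1")
search.fit(texts, helpful)
print(search.best_params_)  # most appropriate kernel parameters found by the search
```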
SVM_FDO. The results are also calculated with SVM combined with FDO. A fuzzy ontology is a quadruple Ont = <X, C, R_XC, R_CC>, where X and C denote a set of objects and a set of concepts, respectively. The set of objects is mapped to the set of concepts by the fuzzy relation R_XC: X × C → [0,1], which assigns each object–concept pair a membership value. The fuzzy relation R_CC: C × C → [0,1] captures the fuzzy taxonomy relations among the concepts in C.
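To make the quadruple Ont = <X, C, R_XC, R_CC> concrete, the toy sketch below represents the fuzzy relations as dictionaries of membership degrees in [0, 1]; the objects, concepts, and membership values are purely illustrative, not taken from the paper.

```python
# Toy representation of a fuzzy domain ontology Ont = <X, C, R_XC, R_CC>.
X = ["pad thai", "waiter", "parking lot"]              # objects (review terms)
C = ["food/taste", "experience", "location", "value"]  # concepts (aspects)

# R_XC: X x C -> [0, 1], fuzzy membership of each object in each concept.
R_XC = {
    ("pad thai", "food/taste"): 0.9,
    ("pad thai", "value"): 0.3,
    ("waiter", "experience"): 0.8,
    ("parking lot", "location"): 0.9,
}

# R_CC: C x C -> [0, 1], fuzzy taxonomy relations among concepts.
R_CC = {
    ("food/taste", "value"): 0.4,
    ("experience", "food/taste"): 0.2,
}

def membership(relation, pair):
    """Unlisted pairs default to membership 0."""
    return relation.get(pair, 0.0)

print(membership(R_XC, ("pad thai", "food/taste")))  # 0.9
```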

3.3. Measurements

The performance of the helpfulness vote classification system is evaluated using prominent metrics noted in previous studies [47,72], with recall, precision, and the F-measure computed by means of Equations (6)–(8). The F1 score, i.e., the F-measure, measures the accuracy of a test by combining recall and precision as follows:
$$F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \quad (6)$$
$$\text{Recall} = \frac{\text{number of correct positive predictions}}{\text{number of positive examples}} \quad (7)$$
$$\text{Precision} = \frac{\text{number of correct positive predictions}}{\text{number of positive predictions}} \quad (8)$$
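A short sketch of Equations (6)–(8) using scikit-learn's metric functions is shown below; the label vectors are toy values for demonstration.

```python
# Precision, recall, and F1 (Equations (6)-(8)) on toy helpfulness labels.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]  # 1 = helpful review
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)  # correct positives / predicted positives
recall = recall_score(y_true, y_pred)        # correct positives / actual positives
f1 = 2 * precision * recall / (precision + recall)

assert abs(f1 - f1_score(y_true, y_pred)) < 1e-9
print(precision, recall, f1)  # 0.75, 0.75, 0.75
```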
The entire research process is illustrated in Figure 1.

4. Results

4.1. Descriptive Analysis of Online Reviews

A summary of all usable reviews for each city is presented in Table 3. Customer ratings of the restaurants in the three cities were dominated by 5 stars (53.62%), followed by 4 stars (22.78%) and 3 stars (9.94%). Of the textual reviews, 69.02% did not receive any helpfulness votes, while 15.45% received one helpfulness vote. Only approximately 1% of textual reviews received more than 5 helpfulness votes.

4.2. LDA Results

LDA, a generative probabilistic model for discovering latent semantic topics in a large text corpus, is utilized in this study to extract and label the dimensions of all customer-generated Yelp reviews. The four LDA-identified restaurant aspects and the top-20 frequent words within each aspect are shown in Figure 2, where font size is linearly proportional to word frequency. Two scholars initially named the restaurant aspects based on their recognition of the logical connections among the most frequently used words for each aspect; the naming was subsequently corroborated against other research. Four aspects were identified: value, food/taste, location, and experience. Specifically, food/taste described the tangible products (e.g., food, drinks) that the restaurant provided to reviewers. Experience was defined as customers’ internal responses to any direct interaction with staff in the restaurant; accordingly, the experience aspect described the greeting, serving, consumption, and after-sale processes involved in the reviewers’ dining experience. Location depicted the geographical convenience of the Yelp restaurant. Value captured both the monetary outcomes and the payoff, i.e., the difference between the benefit received and the cost paid. In terms of the order of importance of these aspects, food/taste and its associated words were mentioned most frequently in the online reviews (N = 1,509,172), followed by value (N = 1,219,085), experience (N = 1,123,405), and location (N = 967,192).
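A word cloud like Figure 2 can be reproduced with the wordcloud package, scaling font size by frequency; the frequency dictionary below is an invented placeholder, not the study's actual counts.

```python
# Illustrative word cloud for one aspect, with font size scaled by frequency.
import matplotlib.pyplot as plt
from wordcloud import WordCloud

food_taste_freq = {"food": 120, "menu": 60, "fresh": 45, "portion": 30, "special": 25}  # toy counts

cloud = WordCloud(width=600, height=400, background_color="white",
                  relative_scaling=1.0)  # make size roughly proportional to frequency
cloud.generate_from_frequencies(food_taste_freq)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.savefig("food_taste_wordcloud.png", dpi=150)
```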

4.3. Sentiment Results

After running the aspect extraction, lists of words were designated for each aspect. The extracted words contain not only restaurant features but also users’ likes or sentiments (e.g., great, bad, good, like, hate). Figure 3 presents the degree of sentiment for each derived restaurant aspect (shown in blue). The sentiment is positive if the point representing a restaurant aspect lies outside the inner diamond-shaped region (shown in gray) and negative if the point lies inside it. As shown in Figure 3, positive sentiments tend to be associated with food/taste, followed by experience and location. Customers’ perceived value of a restaurant was generally associated with negative feelings.

4.4. Model Comparison

The comparison results of the different classification methods are shown in Table 4. First, by adding SVM, the second method increased the F1 score from 67.68 to 71.20, indicating that NB with SVM classifies the helpfulness of online Yelp reviews better than NB alone. SVM+FDO was then used for a more precise examination. F1 score, recall, and precision all increased significantly in the case of SVM with FDO. Thus, the third method is the most efficient for helpfulness classification in opinion mining, compared with the simpler NB and NB+SVM schemes.

5. Concluding Remarks

5.1. Summary of Results and Discussion

On the basis of 294,034 reviews from Yelp, this study proposes a restaurant review helpfulness prediction model with an emphasis on both dining aspects and emotional aspects. It reveals that restaurant online reviews are associated with four fundamental aspects: food/taste, experience, value, and location. Most positive reviews are associated with food/taste, while negative reviews are associated with value. The SVM with FDO algorithm achieved the highest F1 score (79.59) and precision (81.62%) in predicting restaurant review usefulness for three U.S. cities on Yelp.
Among the four extracted fundamental dining aspects, food/taste, value, and experience are consistent with prior studies that apply text-mining analysis to discover hidden restaurant aspects in online reviews [73,74,75,76]; however, location is barely mentioned in previous studies. Based on aspect frequency, the quality of food appeared to be the most important aspect to customers, which is consistent with prior studies indicating that food is the greatest contributor to the success of any restaurant [77]. However, Cuizon et al. [73] found that service was the most frequently mentioned aspect. Good taste and food quality are more likely to generate positive online reviews. This study also highlights that customers tend to express negative emotions toward value, which differs from previous findings indicating that restaurant ambience had the lowest yet still positive sentiment score [78] and that customers tend to complain about service quality [79]. A potential reason might be that this study extracted restaurant reviews from three metropolitan U.S. cities where the cost of living is relatively high; negative feelings are therefore linked to low perceived value.
The comparison of the three algorithms for predicting review helpfulness revealed that SVM with FDO is superior to the two other algorithms: NB and the combination of NB and SVM. The SVM with FDO algorithm increased the F1 score, recall, and precision by 11.91, 13.31, and 10.23 points, respectively, compared to NB. This finding supplements prior studies that compared different algorithms for predicting review helpfulness without considering a combination of two algorithms [54,80].

5.2. Implications, Limitations, and Future Studies

The implications of this study can be viewed from both theoretical and managerial perspectives. First, unlike prior studies that used perceptional surveys to examine the impact of review content or reviewer characteristics on review helpfulness, this study contributes methodologically to the emerging review helpfulness literature in the hospitality and tourism industry. As one of the few such attempts, it shows that the SVM with FDO algorithm significantly improves the accuracy of predicting review helpfulness in the restaurant business domain. This approach is an innovative technique that combines traditional natural language processing with advanced machine learning algorithms to predict helpful reviews.
Second, this study provides new insight into sustainable economic development by informing sustainable marketing strategies that maximize the restaurant industry’s performance growth. As revealed in Figure 2, words associated with food/taste include menu, special, fresh, option, and portion. This suggests that customers care not only about food quality but also about the variety of menu options and large portion sizes. In addition, waiters’ service quality plays an important role in creating a good experience. Location can also have a big impact on restaurant performance: to maintain sustainable business development, restaurateurs should consider choosing a high-traffic location where the surrounding area has a well-developed transportation infrastructure. In terms of value, restaurateurs should put considerable thought into developing and prioritizing a food pricing strategy and take note of how much customers are willing to pay. As suggested by Cao et al. [81], review platforms could use this approach to develop a sorting or recommendation algorithm that accurately surfaces helpful and valuable reviews to increase readers’ stickiness to the review websites. The restaurant attributes identified in this study can help filter large amounts of online reviews and can serve as guidelines to assist restaurant marketers and managers in improving their services and developing sustainable online marketing strategies. Especially for small start-ups, making effective use of online reviews could increase the chances of being discovered and of gaining a sustainable competitive advantage. However, it is important to note that the sustainable aspects of a restaurant (e.g., green packaging, waste management, preservation of energy, and public relations on green activity) identified by Ju and Chang [82] did not emerge as frequently mentioned aspects. A potential reason might be that many U.S. customers are unable to define a green restaurant, even though they have eaten in one [83]. Given that customers are willing to pay more for the green restaurant experience [83] and that practices focused on food and the environment can form positive customer attitudes, which in turn lead to buying behavior [84], restaurateurs should adopt sustainable practices to attract customers or encourage customers to post reviews about their sustainability efforts. To raise customers’ awareness of a restaurant’s sustainability activities, review websites should consider taking the sustainable aspects of restaurants into account when designing ML-based techniques to predict review helpfulness.
Third, given the limited number of helpfulness votes, this study could also help review websites such as Yelp surface more helpful reviews, even those buried among thousands of reviews. The filter mechanism built on SVM+FDO shows substantially higher accuracy than those built on NB and SVM in previous studies. The outcomes of this study suggest an innovative review filtering mechanism that can promote more helpful reviews based on different aspects. The review websites’ improved ability to provide effective and useful information could thus attract more visitors. Consequently, adding pop-ups of predicted helpful reviews to online review websites contributes to the long-term development of these platforms.
Fourth, users who would like to become opinion leaders in online communities can write high-quality reviews to become Yelp Elite Squad members, a community of active evangelists and role models [85]. However, review manipulation occurs constantly [15], and review platform practitioners should make efforts to detect and ban fake helpful reviews.
This study is subject to some limitations. First, it focused only on restaurant reviews written in English, and the proposed helpfulness prediction approach might not be applicable to restaurant reviews written in other languages. Future work could test the accuracy of the FDO approach on other languages. Second, only one site, with reviews from restaurants located in three U.S. cities, was chosen for data collection, which limits the study sample. According to CNBC [86], in addition to New York, Los Angeles, and Las Vegas, the top ten foodie cities in the U.S. also include Portland, San Francisco, Miami, Orlando, Seattle, San Diego, and Austin. Future research should collect restaurant reviews from a larger sample of U.S. cities. It would also be interesting to compare the importance of dining aspects across different cities or regions. Third, this study does not take the temporal features of online reviews into account. Yang et al. [57] explained that older online reviews were more likely to receive helpful votes than recent reviews. However, this is not always the case, since customers tend to read the most recently posted reviews to obtain the most up-to-date information. Therefore, temporal dimensions should be controlled for in future studies. Fourth, this study predicted helpfulness on the basis of the emotions and restaurant features conveyed in online reviews. Future studies could examine the impact of reviewer characteristics (e.g., expertise; historical helpful votes) and dining context (e.g., dining purpose, dining companions), as highlighted in the study of Gan et al. [78], through experimental designs. Finally, this study examined only textual reviews; future studies could examine the impact of video and image presentation formats in predicting review helpfulness.

Author Contributions

Funding acquisition, X.X. and Y.L.; Conceptualization, methodology, software, data curation, and formal analysis, Y.L.; original draft preparation, review and editing, validation, and project management, X.X.

Funding

This research was funded by the National Natural Science Foundation of China (No. 71602125, Research on Generation Mechanism of Consumers’ Word-of-Mouth in Social Network from the Perspective of Scarcity Marketing), the Start-Up Research Grant for Newly Recruited Faculty (No. 00191965271305, Mining the Hidden dimensions from customer generated reviews: An application of tourism industry), and the Fundamental Research Funds for the Central Universities (No. JBK1801039).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. International Economic Development Council. Green Metrics: Common Measures of Sustainable Economic Development. 2017. Available online: https://www.iedconline.org/clientuploads/Downloads/edrp/IEDC_Greenmetrics.pdf (accessed on 29 August 2019).
  2. National Restaurant Association. Restaurant Industry Factbook. 2019. Available online: https://www.restaurant.org/Downloads/PDFs/Research/SOI/restaurant_industry_fact_sheet_2019.pdf (accessed on 26 July 2019).
  3. Lee, C.; Hallak, R.; Sardeshmukh, S.R. Innovation, entrepreneurship, and restaurant performance: A higher-order structural model. Tour. Manag. 2016, 53, 215–228. [Google Scholar]
  4. Parsa, H.G.; Self, J.T.; Njite, D.; King, T. Why restaurants fail. Cornell Hosp. Q. 2005, 46, 304–322. [Google Scholar]
  5. Forbes. Restaurants Don’t Fail, Lenders Do. 2013. Available online: https://www.forbes.com/sites/marccompeau/2013/12/03/restaurants-dont-fail-lenders-do/#4701cae121c6 (accessed on 22 July 2019).
  6. Hua, N.; Lee, S. Benchmarking firm capabilities for sustained financial performance in the U.S. restaurant industry. Int. J. Hosp. Manag. 2014, 36, 137–144. [Google Scholar]
  7. Guta, M. 94% Diners will Choose Restaurant Based on Online Reviews. Small Business Trends, June 2018. Available online: https://smallbiztrends.com/2018/06/how-diners-choose-restaurants.html (accessed on 20 August 2019).
  8. Statista. Cumulative Number of Reviews Submitted to Yelp from 2009 to 2017 (in Millions). Available online: https://www.statista.com/statistics/278032/cumulative-number-of-reviews-submitted-to-yelp/ (accessed on 29 July 2019).
  9. Malhotra, N.K. Reflections on the information overload paradigm in consumer decision making. J. Consum. Res. 1984, 10, 436–440. [Google Scholar]
  10. Kwon, B.C.; Kim, S.H.; Duket, T.; Catalán, A.; Yi, J.S. Do people really experience information overload while reading online reviews? Int. J. Hum. Comput. Interact. 2015, 31, 959–973. [Google Scholar]
  11. Mudambi, S.M.; Schuff, D. What makes a helpful online review? A study of customer reviews on amazon.com. Society for Information Management and The Management Information Systems Research Center. MIS Q. 2010, 34, 185–200. [Google Scholar]
  12. Li, M.; Huang, L.; Tan, C.; Wei, K. Helpfulness of online product reviews as seen by consumers: Source and content features. Int. J. Electron. Commer. 2013, 17, 101–136. [Google Scholar]
  13. Chua, A.Y.; Banerjee, S. Helpfulness of user-generated reviews as a function of review sentiment, product type and information quality. Comput. Hum. Behav. 2016, 54, 547–554. [Google Scholar]
  14. Huang, A.H.; Chen, K.; Yen, D.C.; Tran, T.P. A study of factors that contribute to online review helpfulness. Comput. Hum. Behav. 2015, 48, 17–27. [Google Scholar]
  15. Wang, X.; Tang, L.R.; Kim, E. More than words: Do emotional content and linguistic style matching matter on restaurant review helpfulness? Int. J. Hosp. Manag. 2019, 77, 438–447. [Google Scholar]
  16. Samuel, A.L. Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 1959, 3, 210–229. [Google Scholar]
  17. Hennig-Thurau, T.; Gwinner, K.P.; Walsh, G.; Gremler, D.D. Electronic word-of-mouth via consumer-opinion platforms: What motivates consumers to articulate themselves on the Internet? J. Interact. Mark. 2004, 18, 38–52. [Google Scholar]
  18. Blal, I.; Sturman, M.C. The differential effects of the quality and quantity of online reviews on hotel room sales. Cornell Hosp. Q. 2014, 55, 365–375. [Google Scholar]
  19. Ladhari, R.; Michaud, M. eWOM effects on hotel booking intentions, attitudes, trust, and website perceptions. Int. J. Hosp. Manag. 2015, 46, 36–45. [Google Scholar]
  20. Nieto-García, M.; Muñoz-Gallego, P.A.; González-Benito, Ó. Tourists’ willingness to pay for an accommodation: The effect of eWOM and internal reference price. Int. J. Hosp. Manag. 2017, 62, 67–77. [Google Scholar]
  21. Sparks, B.A.; Browning, V. The impact of online reviews on hotel booking intentions and perception of trust. Tour. Manag. 2011, 32, 1310–1323. [Google Scholar] [Green Version]
  22. Rampton, J. How Online Reviews Can Help Grow Your Small Business. Forbes. 31 May 2017. Available online: https://www.forbes.com/sites/johnrampton/2017/05/31/how-online-reviews-can-help-grow-your-small-business/#7ecbc990737b (accessed on 13 July 2019).
  23. Vlachos, G. Online Travel Statistics. Info Graphics Mania. 2012. Available online: http://infographicsmania.com/online-travel-statistics-2012/ (accessed on 6 August 2019).
  24. Mauri, A.G.; Minazzi, R. Web reviews influence on expectations and purchasing intentions of hotel potential customers. Int. J. Hosp. Manag. 2013, 34, 99–107. [Google Scholar]
  25. Zhang, Z.; Ye, Q.; Law, R.; Li, Y. The impact of e-word-of-mouth on the online popularity of restaurants: A comparison of consumer reviews and editor reviews. Int. J. Hosp. Manag. 2010, 29, 694–700. [Google Scholar]
  26. Kim, W.G.; Li, J.J.; Brymer, R.A. The impact of social media reviews on restaurant performance: The moderating role of excellence certificate. Int. J. Hosp. Manag. 2016, 55, 41–51. [Google Scholar]
  27. Guo, Y.; Wang, Y.; Wang, C. Exploring the Salient Attributes of Short-Term Rental Experience: An Analysis of Online Reviews from Chinese Guests. Sustainability 2019, 11, 4290. [Google Scholar] [Green Version]
  28. Jia, S.S. Leisure Motivation and Satisfaction: A Text Mining of Yoga Centres, Yoga Consumers, and Their Interactions. Sustainability 2018, 10, 4458. [Google Scholar] [Green Version]
  29. Nam, S.; Ha, C.; Lee, H. Redesigning In-Flight Service with Service Blueprint Based on Text Analysis. Sustainability 2018, 10, 4492. [Google Scholar] [Green Version]
  30. Pantelidis, I.S. Electronic meal experience: A content analysis of online restaurant comments. Cornell Hosp. Q. 2010, 51, 483–491. [Google Scholar]
  31. Yan, X.; Wang, J.; Chau, M. Customer revisit intention to restaurants: Evidence from online reviews. Inf. Syst. Front. 2015, 17, 645–657. [Google Scholar]
  32. Danescu-Niculescu-Mizil, C.; Kossinets, G.; Kleinberg, J.; Lee, L. How opinions are received by online communities: A case study on amazon.com helpfulness votes. In Proceedings of the 18th International Conference on World Wide Web, Madrid, Spain, 20–24 April 2009; pp. 141–150. [Google Scholar]
  33. Otterbacher, J. “Helpfulness” in online communities: A measure of message quality. In Proceedings of the Sigchi Conference on Human Factors in Computing Systems, Boston, MA, USA, 4–9 April 2009. [Google Scholar]
  34. Kwok, L.; Xie, K.L. Factors contributing to the helpfulness of online hotel reviews: Does manager response play a role? Int. J. Hosp. Manag. 2016, 28, 2156–2177. [Google Scholar]
  35. Pan, Y.; Zhang, J.Q. Born unequal: A study of the helpfulness of user-generated product reviews. J. Retail. 2011, 87, 598–612. [Google Scholar]
  36. Baek, H.; Ahn, J.; Choi, Y. Helpfulness of online consumer reviews: Readers’ objectives and review cues. Int. J. Electron. Commer. 2012, 17, 99–126. [Google Scholar]
  37. Racherla, P.; Friske, W. Perceived ‘usefulness’ of online consumer reviews: An exploratory investigation across three services categories. Electron. Commer. Res. Appl. 2012, 11, 548–559. [Google Scholar]
  38. Liu, Z.; Park, S. What makes a useful online review? Implication for travel product websites. Tour. Manag. 2015, 47, 140–151. [Google Scholar] [Green Version]
  39. Hong, H.; Xu, D.; Wang, G.A.; Fan, W. Understanding the determinants of online review helpfulness: A meta-analytic investigation. Decis. Support Syst. 2017, 102, 1–11. [Google Scholar]
  40. Luo, Y.; Tang, R.L. Understanding hidden dimensions in textual reviews on Airbnb: An application of modified latent aspect rating analysis (LARA). Int. J. Hosp. Manag. 2019, 80, 144–154. [Google Scholar]
  41. Ngo-Ye, T.L.; Sinha, A.P. The influence of reviewer engagement characteristics on online review helpfulness: A text regression model. Decis. Support Syst. 2014, 61, 47–58. [Google Scholar]
  42. Zhou, S.; Guo, B. The order effect on online review helpfulness: A social influence perspective. Decis. Support Syst. 2017, 93, 77–87. [Google Scholar]
  43. Li, H.; Wang, C.R.; Meng, F.; Zhang, Z. Making restaurant reviews useful and/or enjoyable? The impacts of temporal, explanatory, and sensory cues. Int. J. Hosp. Manag. 2018. [Google Scholar] [CrossRef]
  44. Barreda, A.; Bilgihan, A. An analysis of user-generated content for hotel experiences. J. Hosp. Tour. Technol. 2013, 4, 263–280. [Google Scholar]
  45. Lewis, S.C.; Zamith, R.; Hermida, A. Content analysis in an era of big data: A hybrid approach to computational and manual methods. J. Broadcast. Electron. Media 2013, 57, 34–52. [Google Scholar]
  46. Allahyari, M.; Pouriyeh, S.; Assefi, M.; Safaei, S.; Trippe, E.D.; Gutierrez, J.B.; Kochut, K. A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques. 2017. Available online: https://arxiv.org/pdf/1707.02919.pdf (accessed on 4 August 2019).
  47. Ali, F.; Kwak, K.S.; Kim, Y.G. Opinion mining based on fuzzy domain ontology and Support Vector Machine: A proposal to automate online review classification. Appl. Soft Comput. 2016, 47, 235–250. [Google Scholar]
  48. Aliandu, P. Sentiment analysis to determine accommodation, shopping and culinary location on foursquare in Kupang city. Procedia Comput. Sci. 2015, 72, 300–305. [Google Scholar]
  49. Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: New York, NY, USA, 2013. [Google Scholar]
  50. Zhang, Z.; Ye, Q.; Zhang, Z.; Li, Y. Sentiment classification of internet restaurant reviews written in Cantonese. Expert Syst. Appl. 2011, 38, 7674–7682. [Google Scholar]
  51. Rafi, M.; Hassan, S.; Shaikh, M.S. Content-Based Text Categorization Using Wikitology. 2012. Available online: https://arxiv.org/pdf/1208.3623.pdf (accessed on 6 August 2019).
  52. Lau, R.Y.K.; Li, C.; Liao, S.S.Y. Social analytics: Learning fuzzy product ontologies for aspect-oriented sentiment analysis. Decis. Support Syst. 2014, 65, 80–94. [Google Scholar]
  53. Qazi, A.; Syed KB, S.; Raj, R.G.; Cambria, E.; Tahir, M.; Alghazzawi, D. A concept-level approach to the analysis of online review helpfulness. Comput. Hum. Behav. 2016, 58, 75–81. [Google Scholar]
  54. Hu, Y.H.; Chen, K. Predicting hotel review helpfulness: The impact of review visibility, and interaction between hotel stars and review ratings. Int. J. Inf. Manag. 2016, 36, 929–944. [Google Scholar]
  55. Fang, B.; Ye, Q.; Kucukusta, D.; Law, R. Analysis of the perceived value of online tourism reviews: Influence of readability and reviewer characteristics. Tour. Manag. 2016, 52, 498–506. [Google Scholar]
  56. Lee, M.; Jeong, M.; Lee, J. Roles of negative emotions in customers’ perceived helpfulness of hotel reviews on a user-generated review website: A text mining approach. Int. J. Contemp. Hosp. Manag. 2017, 29, 762–783. [Google Scholar]
  57. Yang, S.B.; Shin, S.H.; Joun, Y.; Koo, C. Exploring the comparative importance of online hotel reviews’ heuristic attributes in review helpfulness: A conjoint analysis approach. J. Travel Tour. Mark. 2017, 34, 963–985. [Google Scholar]
  58. Hu, Y.H.; Chen, K.; Lee, P.J. The effect of user-controllable filters on the prediction of online hotel reviews. Inf. Manag. 2017, 54, 728–744. [Google Scholar]
  59. Gao, B.; Hu, N.; Bose, I. Follow the herd or be myself? An analysis of consistency in behavior of reviewers and helpfulness of their reviews. Decis. Support Syst. 2017, 95, 1–11. [Google Scholar]
  60. Filieri, R.; Raguseo, E.; Vitari, C. When are extreme ratings more helpful? Empirical evidence on the moderating effects of review characteristics and product type. Comput. Hum. Behav. 2018, 88, 134–142. [Google Scholar]
  61. Ma, Y.; Xiang, Z.; Du, Q.; Fan, W. Effects of user-provided photos on hotel review helpfulness: An analytical approach with deep leaning. Int. J. Hosp. Manag. 2018, 71, 120–131. [Google Scholar]
  62. Lee, P.J.; Hu, Y.H.; Lu, K.T. Assessing the helpfulness of online hotel reviews: A classification-based approach. Telemat. Inform. 2018, 35, 436–445. [Google Scholar]
  63. Liang, S.; Schuckert, M.; Law, R. How to improve the stated helpfulness of hotel reviews? A multilevel approach. Int. J. Contemp. Hosp. Manag. 2019, 31, 953–977. [Google Scholar]
  64. Michaels, M. The 25 Best Places to Travel in the US This Year, According to TripAdvisor Reviews. Business Insider. March 2018. Available online: https://www.businessinsider.com/tripadvisor-best-places-to-travel-america-2018-3 (accessed on 1 July 2019).
  65. Guo, Y.; Barnes, S.J.; Jia, Q. Mining meaning from online ratings and reviews: Tourist satisfaction analysis using latent dirichlet allocation. Tour. Manag. 2017, 59, 467–483. [Google Scholar] [Green Version]
  66. Hong, Y.; Lu, J.; Yao, J.; Zhu, Q.; Zhou, G. What reviews are satisfactory: Novel features for automatic helpfulness voting. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, Portland, OR, USA, 12–16 August 2012; pp. 495–504. [Google Scholar]
  67. Kim, S.M.; Pantel, P.; Chklovski, T.; Pennacchiotti, M. Automatically assessing review helpfulness. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, Sydney, Australia, 22–23 July 2006; pp. 423–430. [Google Scholar]
  68. Metsis, V.; Androutsopoulos, I.; Paliouras, G. Spam filtering with näive bayes-which näive bayes? In Proceedings of the Third Conference on Email and Anti-Spam (CEAS), Mountain View, CA, USA, 27–28 July 2006; Volume 17, pp. 28–69. [Google Scholar]
  69. Ng, A.Y.; Jordan, M.I. On discriminative vs. generative classifiers: A comparison of logistic regression and näive Bayes. In Advances in Neural Information Processing Systems; Morgan Kaufmann Publishers: San Mateo, CA, USA, 2002; pp. 841–848. [Google Scholar]
  70. Gladence, L.M.; Karthi, M.; Anu, V.M. A statistical comparison of logistic regression and different Bayes classification methods for machine learning. ARPN J. Eng. Appl. Sci. 2015, 10, 5947–5953. [Google Scholar]
  71. Suykens, J.A.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300. [Google Scholar]
  72. Malik, M.; Hussain, A. Helpfulness of product reviews as a function of discrete positive and negative emotions. Comput. Hum. Behav. 2017, 73, 290–302. [Google Scholar] [CrossRef] [Green Version]
  73. Cuizon, J.C.; Lopez, J.; Jones, D.R. Text mining customer reviews for aspect-based restaurant rating. Int. J. Comput. Sci. Inf. Technol. 2019, 10, 43–51. [Google Scholar]
  74. Liu, H.; He, J.; Wang, T.; Song, W.; Du, X. Combining user preferences and user opinions for accurate recommendation. Electron. Commer. Res. Appl. 2013, 12, 14–23. [Google Scholar]
  75. Pronoza, E.; Yagunova, E.; Volskaya, S. Aspect-Based Restaurant Information Extraction for the Recommendation System. In Lecture Notes in Computer Science, Proceedings of the Human Language Technology. Challenges for Computer Science and Linguistics (LTC 2013), Poznań, Poland, 7–9 December 2013; Vetulani, Z., Uszkoreit, H., Kubis, M., Eds.; Springer: Cham, Switzerland, 2016; Volume 9561. [Google Scholar]
  76. Gao, S.; Tang, O.; Wang, H.; Yin, P. Identifying competitors through comparative relation mining of online reviews in the restaurant industry. Int. J. Hosp. Manag. 2018, 71, 19–32. [Google Scholar]
  77. Alonso, A.D.; O’neill, M.; Liu, Y.; O’shea, M. Factors driving consumer restaurant choice: An exploratory study from the Southeastern United States. J. Hosp. Mark. Manag. 2013, 22, 547–567. [Google Scholar]
  78. Gan, Q.; Ferns, B.H.; Yu, Y.; Jin, L. A text mining and multidimensional sentiment analysis of online restaurant reviews. J. Qual. Assur. Hosp. Tour. 2017, 18, 465–492. [Google Scholar]
  79. Bilgihan, A.; Seo, S.; Choi, J. Identifying restaurant satisfiers and dissatisfiers: Suggestions from online reviews. J. Hosp. Mark. Manag. 2018, 27, 601–625. [Google Scholar]
  80. Park, Y.-J. Predicting the helpfulness of online customer reviews across different product types. Sustainability 2018, 10, 1735. [Google Scholar]
  81. Cao, Q.; Duan, W.; Gan, Q. Exploring determinants of voting for the “helpfulness” of online user reviews: A text mining approach. Decis. Support Syst. 2011, 50, 511–521. [Google Scholar]
  82. Ju, S.; Chang, H. Consumer perceptions on sustainable practices implemented in foodservice organizations in Korea. Nutr. Res. Pract. 2016, 10, 108–114. [Google Scholar] [PubMed] [Green Version]
  83. Dewald, B.; Bruin, B.J.; Jang, Y.J. US consumer attitudes towards “green” restaurants. Anatolia 2014, 25, 171–180. [Google Scholar]
  84. Namkung, Y.; Jang, S.C. Effects of restaurant green practices on brand equity formation: Do green practices really matter? Int. J. Hosp. Manag. 2013, 33, 85–95. [Google Scholar]
  85. Yelp. What Is Yelp’s Elite Squad? Available online: https://www.yelp-support.com/article/What-is-Yelps-Elite-Squad?l=en_US (accessed on 30 July 2019).
  86. CNBC. 10 Best Foodie Cities in America (No.1 May Surprise You). November 2018. Available online: https://www.cnbc.com/2018/11/05/wallethub-best-food-cities-in-america.html (accessed on 8 August 2019).
Figure 1. The research process.
Figure 2. Word cloud generated for each extracted aspect.
Figure 3. Results of sentiment analysis of four topics.
Table 1. Summary of literature on predicting the helpfulness of reviews from Yelp and TripAdvisor.

Author (Year) | Antecedents of Review Helpfulness/Usefulness | Review Platform | Number of Reviews | Targeted Location | Review Category | Methods | Main Conclusion
Liu and Park [38] | Reviewer characteristics (identity disclosure; expertise; reputation); review content features (review star rating; review length; review readability; review sentiment) | Yelp | 5090 | New York City, London | Restaurant | Tobit regression model | A combination of both reviewer and review characteristics positively influences review helpfulness
Qazi et al. [53] | Average number of concepts per sentence; number of concepts per review; review types | TripAdvisor | 1366 | NA | Hotel | Tobit regression model | The number of concepts contained in a review, the average number of concepts per sentence, and the review type contribute to the perceived helpfulness of online reviews
Hu and Chen [54] | Review content; review sentiment; review author; review visibility | TripAdvisor | 349,582 | Las Vegas, Orlando | Hotel | Model tree | Review visibility and the interaction effect of hotel star class and review rating improve the prediction accuracy
Fang et al. [55] | Review readability; review sentiment; reviewer mean rating; reviewer rating habit (skewness of rating distribution) | TripAdvisor | 19,674 | New Orleans | Attractions | Negative binomial regression and Tobit regression model | Text readability and reviewer characteristics affect perceived review helpfulness
Kwok and Xie [34] | Number of words; number of sentences; reviewer gender; reviewer age; ratings; reviewer experience (status; membership; city visited) | TripAdvisor | 56,284 | Austin, Dallas, Fort Worth, Houston, San Antonio | Hotel | Linear regression | The helpfulness of online hotel reviews is positively affected by manager response and reviewer status
Lee et al. [56] | Negative emotional expressions | TripAdvisor | 520,668 | New York City | Hotel | Negative binomial regression | Negative reviews are more influential than positive reviews when potential customers read online hotel reviews for their future stay
Yang et al. [57] | Heuristic attributes (reviewer location, reviewer level, reviewer helpful vote, review rating, review length, and review photo) | TripAdvisor | 1158 | New York City | Hotel (a single case) | Conjoint analysis | Review rating and reviewer helpful vote attributes are the two most important factors in predicting review helpfulness
Hu et al. [58] | Review quality; review sentiment; reviewer characteristics | TripAdvisor | 1,434,004 | New York City, Las Vegas, Chicago, Orlando, Miami | Hotel | Linear regression, reduced error-pruning tree, random forest | Review rating and number of words predict review helpfulness across different users’ travel regions, travel seasons, and travel types
Zhou and Guo [42] | Review order | Yelp | 70,610 | Atlanta, Chicago, Los Angeles, New York, Washington, D.C. | Restaurant | Negative binomial regression | A review’s position in the sequence of reviews influences review helpfulness
Gao et al. [59] | Reviewer characteristics (e.g., absolute rating bias; number of cities visited; total number of reviews); hotel rating | TripAdvisor | 8676 | New York City | Hotel | Ordinary least squares (OLS) and ordered logistic regression | Reviewers’ higher absolute rating bias in the past influences the helpfulness of their future reviews
Filieri et al. [60] | Extreme rating | TripAdvisor | 11,358 | France | Hotel | Tobit regression analysis | Extreme reviews that are long and accompanied by the reviewers’ photos are perceived to be more helpful
Ma et al. [61] | Textual content; visual content | TripAdvisor; Yelp | 37,392 | Orlando | Hotel | Decision tree, Support Vector Machine with linear kernel (SVM), logistic regression | Deep learning models combining both review texts and user-provided photos were more useful in predicting review helpfulness than other models
Lee et al. [62] | Review quality; review sentiment; reviewer characteristics | TripAdvisor | 1,170,246 | New York City, Las Vegas, Chicago, Orlando, Miami | Hotel | Classification-based approach | Reviewer characteristics are good predictors of review helpfulness, whereas review quality and review sentiment are poor predictors
Li et al. [43] | Temporal cues (time-related words); explanatory cues (causation-related words); sensory cues (see, hear, feel) | Yelp | 186,714 | Las Vegas | Restaurant | Negative binomial regression | Temporal cues have the strongest impact on review usefulness
Liang et al. [63] | Review content quality (review depth; review extremity; review readability); reviewer characteristics (expertise; reputation; identity disclosure; cultural background); hotel features (ratings; ranking; number of rooms and photos) | TripAdvisor | 246,963 | Beijing, Shanghai, Guangzhou, Hong Kong (China) | Hotel | Multilevel model | Informative and readable reviews accompanied by extreme ratings are perceived to be more helpful
Wang et al. [15] | Emotional content; linguistic style | Yelp | 262,205 | San Diego, Philadelphia, Houston, Atlanta, Las Vegas, Miami, Anaheim, Chicago, New York City, and Orlando | Restaurant | Negative binomial regression | Joy, sadness, anger, fear, trust, disgust, and linguistic style matching impact review helpfulness
Table 2. Number of reviews extracted in each city.

City | Number of Reviews
Las Vegas | 85,558
Los Angeles | 105,513
New York | 102,963
Grand Total | 294,034
Table 3. Descriptive summary of reviews in each overall rating category.

Star Rating | Review Count | %
1 | 22,802 | 7.76%
2 | 17,356 | 5.90%
3 | 29,240 | 9.94%
4 | 66,974 | 22.78%
5 | 157,662 | 53.62%
Grand Total | 294,034 | 100%
Table 4. Performance of each algorithm used for Yelp usefulness prediction.

Model | F1 | Recall (%) | Precision (%)
NB (Naïve Bayes) | 67.68 | 64.34 | 71.39
NB+SVM (Support Vector Machine) | 71.20 | 72.96 | 69.52
SVM_FDO (SVM with Fuzzy Domain Ontology) | 79.59 | 77.65 | 81.62

Share and Cite

MDPI and ACS Style

Luo, Y.; Xu, X. Predicting the Helpfulness of Online Restaurant Reviews Using Different Machine Learning Algorithms: A Case Study of Yelp. Sustainability 2019, 11, 5254. https://doi.org/10.3390/su11195254

