Introduction

In the summer of 2019, a Twitter spat between the fast-food giants Popeyes and Chick-fil-A went viral. The so-called “Chicken Sandwich War” led to an unexpected surge in demand that caused Popeyes restaurants across the United States to run out of their newly released chicken sandwich (Suddath 2019). The heightened social media activity provided information about a product launch and increased customer engagement around the brand. This story illustrates how online word-of-mouth (WoM) communication can drive traffic to offline stores.

With sales at brick-and-mortar stores representing 85.2% of U.S. retail (U.S. Census Bureau 2021), it is crucial to understand how online activity and communication affect physical retailers. Additionally, the average internet user spends 147 min a day on social media (Statistica 2022). Current research on retail and social media focuses predominantly on e-commerce data, and those insights do not necessarily apply to brick-and-mortar stores. The limited literature that does measure social media’s impact on physical retail evaluates only a single product, company, or product category, without estimating global elasticities. Our article addresses this research gap and provides a means of modeling hierarchical data, such as products in stores or stores in countries.

Our research question is twofold: which measures of social media activity lead to changes in retail foot traffic, and what are the precise magnitudes of those effects for comparable stores? The study addresses this gap in the literature by estimating the elasticities of offline store visits with respect to online WoM communication (i.e., sentiment, disagreement, subjectivity, popularity, likes, followers, and recency). We consider these seven metrics within the framework of Social Impact Theory (Latané 1981) and test which of them dampen or amplify the effect of a brand mention on consumer behavior. This paper is meant to show branded retailers how to use publicly available social media data to anticipate near-future changes in demand.

Our research makes three primary contributions to measuring the impact of social media activity on consumer visits to stores of nationally known retail brands. The first is to connect brand-specific social media activity to offline purchases for those brands. There is a wide literature on how social media influences e-commerce sales (Kim et al. 2019) and financial products (Antweiler and Frank 2004; Bollen et al. 2011), but these studies disregard where most retail sales occur. Consider You et al. (2015), who performed the most extensive meta-analysis of sales elasticities from electronic WoM communication. Of the fifty articles they reviewed, only two exclusively measured how online communication affected goods sold in person. Our research is meant to expand the understanding of how online chatter spills over into the physical world.

The second contribution of this paper is to assess how social media activity affects a broad array of retailers. When online activity is connected to in-person sales, it is often for a narrow type of location, such as movie theaters (Kim et al. 2019; Liu et al. 2016) or restaurants (Cheung and Thadani 2012; Luca 2016). Other research relates electronic WoM communication to in-person sales for only a single product type (Deloitte 2013; Sanchez et al. 2020; Zhang et al. 2012) or a single retail brand (Pauwels et al. 2016). Such results are narrow in scope and subject to the consumption patterns of an individual good, product category, or retailer. Our paper advances this research by leveraging a large dataset covering a wider array of retail store brands. On the social media side, we collect 2.7 million tweets mentioning one of fifteen brands. Those brands span grocery/general, fast-food, and specialty merchandise stores and include 3870 unique locations across the United States. Our approach is less sensitive to variability associated with individual brands and is more generalizable to out-of-sample retailers wanting to quantify how social media activity affects store foot traffic. Additionally, we use actual foot traffic data taken from anonymized cell phone users. This improves on current research that measures only purchase intention (Mainardes and Cardoso 2019) or relies on participants aware that they are part of a study (Godes and Mayzlin 2009).

The third primary contribution is our mixed-level specification, which is critical to statistically control for the distinct relationship each store and each brand has with social media. It allows us to estimate how a brand-level variable (i.e., daily measures of social media activity) can impact a store-level variable (i.e., daily customer visits), as shown in the example below (Fig. 1).

Fig. 1 Hierarchical data structure for a single brand

This specification with hierarchical regression provides foot traffic estimates for each store and avoids the ecological fallacy of interpreting aggregated data at the individual level (Hox et al. 2010). It also avoids assumptions of homogeneity across stores (Snijders and Bosker 2011), which is especially important given the diversity of brands in our analysis.

We find that a one standard deviation increase in either the per-tweet popularity or disagreement about a brand on Twitter leads to a 0.04 standard deviation increase (3–4%) in next-day foot traffic to stores of that brand. The results are slightly stronger when measured over the next 3 days but weaker when using less common measures of social media activity, such as per-like or per-follower measures. Although this indicates that social media plays a small role globally in total retail visits, the results are statistically significant and economically meaningful when extrapolated across all stores of a national brand. We qualify these results as sensitive to different measures of social media activity and as representing only an average store. Our results also show that social media activity has a statistically significant relationship with foot traffic for only a few days, and any effect fully dissipates within a week. Additionally, the effects are much stronger for individual brands and stores, indicating wide heterogeneity.

Literature

Social impact theory

Early research into social communication was pioneered by Hovland (1948), who defined the aspects of communication necessary to make it “social.” Hovland defined the four components of social communication (which map readily onto our research) as a communicator (sender) who transmits a stimulus (tweet) to a communicatee (user reading the tweet) who responds to that communication (considers visiting a store). Latané (1981) elaborated by defining mediators of how social connections affect our feelings, thoughts, or behaviors. His Social Impact Theory identifies three such mediators: source strength (e.g., salience, power, potency), number of sources, and source immediacy (closeness in time or space). An increase in any of those dimensions makes the receiver of a message more likely to act. Latané then defines an associated Social Forces Law whereby the three mediators have a multiplicative relationship. This means that the impact of a message is determined not only by the strength, number, and immediacy of its sources but by the interaction of all three. Our research does not measure the direct interaction of these terms; instead, we estimate the effect of isolated changes in each variable. This approach resembles that of Argo et al. (2005) but is applied to different subsets of data. The cross-tabulation approach is ideal in scenarios with insufficient data to estimate interaction terms, or where the methods for doing so are too computationally demanding.
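To see why the multiplicative form matters for an isolated-change approach, the Python sketch below assumes the simplest version of the Social Forces Law, impact = strength × immediacy × number. The functional form, the unit scales, and the function names are illustrative assumptions, not Latané's exact specification. Under this form, the effect of a one-unit increase in source strength depends on the levels of the other two mediators.

```python
def social_impact(strength: float, immediacy: float, number: float) -> float:
    """Illustrative multiplicative impact, I = S * I * N (assumed functional form)."""
    return strength * immediacy * number


def marginal_effect_of_strength(immediacy: float, number: float,
                                step: float = 1.0,
                                baseline_strength: float = 1.0) -> float:
    """Change in impact from raising source strength by `step`, others held fixed."""
    before = social_impact(baseline_strength, immediacy, number)
    after = social_impact(baseline_strength + step, immediacy, number)
    return after - before


# The same one-unit increase in strength matters more when sources are
# more numerous and more immediate.
low_context = marginal_effect_of_strength(immediacy=1.0, number=2.0)
high_context = marginal_effect_of_strength(immediacy=3.0, number=4.0)
print(low_context, high_context)
```

An estimate of an isolated change therefore reflects the levels of the other mediators that happen to prevail in the sample.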

More recent researchers have applied Social Impact Theory (SIT) to modern retail settings. Kwahk and Ge (2012) used SIT to examine how social media affects consumers’ visit intention and purchase intention in e-commerce. Chenga and Linb (2020) extended SIT to find that a consumer’s perceived confidence in an online message increases purchase intention in social commerce. Finally, Naeem and Ozuem (2021) applied SIT to how social media affects fashion retail and found that source strength and immediacy lead to greater brand engagement.

We first add to the literature by offering a newer, more economically meaningful application of Social Impact Theory to actual foot traffic across a variety of retail brands. Second, we expand on SIT by proposing and testing novel definitions of source strength and number of sources relevant to different online platforms. Our approach can be used by others looking to apply SIT to different social media contexts by leveraging data on message content and impressions.

Word-of-mouth communication

Social networks inform and influence how we live and what we buy (Jun and Park 2016). Online consumers prefer peer reviews to corporate messaging, as peer reviews better focus on and evaluate usage situations for a good from the user’s perspective (Kim and Kim 2018). This “word-of-mouth” communication, or simply “WoM” (traditionally oral communication between two non-commercial individuals about a product or service), is a key form of social learning used in many purchase decisions. One of the earliest large-scale WoM examples comes from Antweiler and Frank (2004), who used 1.5 million posts on finance message boards to find that messages help predict stock market volatility. The subsequent research is best summarized by You et al. (2015) in a meta-analysis that considers hundreds of volume and sentiment WoM elasticities of product sales from fifty-one studies. They estimate a 0.236 volume elasticity and a 0.417 sentiment elasticity (p. 19); however, they also emphasize the heterogeneity of results and the ways context matters. For example, they find both elasticities to be higher for goods that are privately consumed, have low trialability, and come from less competitive industries. They also find many cases where either the volume or the sentiment elasticity is statistically significant but not the other.
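As a worked illustration of these magnitudes, the Python sketch below converts the two meta-analytic elasticities into the sales response to a hypothetical 10% increase in WoM volume or sentiment. The 10% change, the function name, and the constant-elasticity reading are our illustrative assumptions, not calculations reported by You et al. (2015).

```python
def pct_change_from_elasticity(elasticity: float,
                               pct_change_driver: float) -> tuple[float, float]:
    """Return (linear approximation, constant-elasticity) % change in sales."""
    approx = elasticity * pct_change_driver
    # Under a constant elasticity, sales scale with (1 + change) ** elasticity.
    exact = ((1 + pct_change_driver / 100) ** elasticity - 1) * 100
    return approx, exact


for label, e in [("volume", 0.236), ("sentiment", 0.417)]:
    approx, exact = pct_change_from_elasticity(e, 10.0)
    print(f"{label}: ~{approx:.2f}% (linear), {exact:.2f}% (constant elasticity)")
```

For small changes, the linear approximation (elasticity × percent change) is close to the constant-elasticity figure; the gap widens as the change grows.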

Social media and retail

Since its launch in 2006, Twitter has become one of the most popular social media sites worldwide, with 500 million tweets sent per day (Stricker 2014). Twenty-two percent of U.S. adults are on Twitter, and 9% of U.S. adults use the platform daily (Pew Research Center 2020). These users broadcast short messages (“tweets”) to their subscribers (“followers”). Followers may “like” or “retweet” a post, which broadcasts the message to their own followers.

Twitter is an ideal source of WoM data because of its written and public communication format. Other top U.S. social media sites are less accessible due to their privacy walls (Facebook, Snapchat, WhatsApp), visual content emphasis (Instagram, Pinterest, YouTube), lower popularity (Nextdoor, Reddit), or professional focus (LinkedIn). By contrast, Twitter is an “information intermediary” that allows users to create new information, compile existing information, and disseminate it to their followers. This includes daily chatter, conversations, information sharing, and news reporting (Java et al. 2009). The platform is also important for retail since, relative to users of other social networks, Twitter users are more likely to keep the brand central to a message and less likely to include self-promotion (Smith et al. 2012). Due to its culture of public sharing, Twitter also makes our research more replicable for academics and practitioners.

Microblogging sites like Twitter also reflect the non-virtual world: they help predict stock market movements (Bollen et al. 2011) and can even serve as early detectors of earthquakes (Sakaki et al. 2010). Current research, however, is limited to social media’s effects on e-commerce sales (Kim et al. 2019) or even just on a consumer’s purchase intention (Mainardes and Cardoso 2019). When an article does address in-person sales, it often considers an individual product, such as soft drinks (Sanchez et al. 2020), video games (Deloitte 2013), or movies (Kim et al. 2019; Liu et al. 2016). There is no research on whether social media drives foot traffic to multiple retail brands; however, adjacent work by Kalyanam et al. (2018) found that higher Google Ads spending leads to higher sales volumes in various brick-and-mortar stores.

Theory

Models of changed behavior

This paper considers whether online behavior on Twitter affects offline foot traffic to brick-and-mortar retailers. We apply Social Impact Theory, discussed in the “Social impact theory” section, to a four-step process that describes a customer’s journey from viewing social media activity mentioning a brand to deciding whether to visit one of that brand’s stores. The diagram (Fig. 2) applies to physical or virtual stores as well as to user-generated or sponsored content. It is most similar to the integrative framework that Cheung and Thadani (2012, p. 464) construct for the impact of electronic WoM communication on purchases. We add value by offering a more parsimonious representation based on Social Impact Theory.

The framework considers a single user on social media seeing WoM communication around a specific brand over a defined time interval. The user observes seven components of that social media activity, the first three of which relate to message content. Sentiment is the average tone of posts and represents how positive, negative, or neutral their text is. Disagreement is the dispersion of sentiment, which represents the diversity of opinions about a brand. Subjectivity is the average degree to which posts mentioning a brand use opinion-based rather than fact-based language. Together, these three form the source strength of discussion about a brand.

The next components of social media activity reflect the number of sources speaking about a given brand. This is visible to the user in three ways: popularity, the volume of messages about the brand; likes, the number of likes a message receives; and followers, the number of followers of the user who sent a branded message.

Finally, recency measures how close in time social media activity is to the decision to visit a retail store. Recency maps solely onto the immediacy of social communication. These seven components of social media activity thus form the three dimensions of Social Impact Theory, which predict the directional impact on retail visits. Latané (1981) conjectures that an increase in source strength, number of sources, or immediacy increases the likelihood that communication leads to action.

Hypotheses

The arrows from Step 2 to Step 3 in Fig. 2 show how social media activity (SMA) mediates an individual’s perception of a brand, making them more or less likely to visit a particular retailer. We use these channels of influence to hypothesize the expected directional relationship between each component of SMA and a user’s visit decision (Fig. 2).

Fig. 2 User’s path from social media activity (SMA) to retail visit

H1

Sentiment of brand tweets positively impacts store visits.

When brand opinions on social media improve, sentiment increases, which is expected to increase store traffic. Positive changes in online sentiment increase video game sales (Deloitte 2013), box office sales (Liu et al. 2016), and restaurant patronage (Luca 2016). We expect a more positive online discussion of a brand to increase its source strength, making consumers more likely to visit a store when peers speak highly of that brand.

H2

Disagreement of brand tweets negatively impacts foot traffic.

Recall that disagreement is the deviation in sentiment, which can arise from either (1) a lack of consensus about a brand or (2) a diversity of posts instead of a single, viral message (Chen et al. 2011). When disagreement about a brand is low (i.e., there is consensus), users will update their perceptions and be more likely to visit that brand’s stores because there is less uncertainty. This is shown in the model by a dashed arrow connecting disagreement and source strength, indicating a negative relationship. Conversely, high disagreement also indicates that many users are sharing diverse opinions, providing more information and signaling greater engagement with the brand (Cui et al. 2012). Heightened conversation can lead to more awareness of others’ behavior, more awareness of features, and, accordingly, more visits. Therefore, we hypothesize that the cumulative effect of disagreement is ambiguous.

H3

Subjectivity of brand tweets negatively impacts store visits.

We predict that fact-based conversations about a brand are more likely to discuss features of that retailer (Loria et al. 2014). If so, more objective messages offer more information about a retailer and should increase sales. This is consistent with Archak et al. (2011), who found that customer reviews mentioning specific product features have a greater impact on sales than the customer reviews alone. We therefore represent subjectivity with a dashed arrow to source strength because of its expected negative relationship with store visits.

H4

Popularity of a brand positively impacts store visits.

H5

Number of likes for brand tweets positively impacts store visits.

H6

Number of followers for a user tweeting about a brand positively impacts store visits.

Popularity, likes, and followers all connect to the number of sources with a solid arrow, indicating a positive anticipated impact on foot traffic. They follow the same mechanism: they allow more users to see a branded message and make a single user more likely to see multiple branded messages. Higher visibility of a brand on Twitter leads to greater attention, bringing the brand to the top of mind for more consumers (Kim et al. 2019). Additionally, through observational learning and peer influence, an increase in posts about a brand signals to a user the value of visiting a store (Deloitte 2013; Joshi and Musalem 2021). In short, these three hypotheses contend that hype matters.

We could also connect popularity, likes, and followers to the source strength, meaning that more social media activity not only increases views but also signals a higher quality user, higher quality content, or merely a higher impact on other users who opted to follow them (Araujo et al. 2017; Ismagilova et al. 2020). However, such a specification would not change any hypotheses, as the directional impact on retail visits remains positive.

H7

Recency in days from brand tweets positively impacts store visits.

Social media activity is more likely to affect consumer behavior in the short run, as documented by Lovett et al. (2019) and You et al. (2015). Social media platforms tend to be ephemeral, and microblogging sites like Twitter especially emphasize trending content over historical posts. We expect stronger short-run effects here as well and therefore represent a positive connection between recency and immediacy.
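The seven hypotheses can be summarized as a small lookup structure that maps each component to its Social Impact Theory dimension and expected sign, which is convenient when checking estimated coefficients against the theory. This Python sketch is our own illustrative encoding: the class and function names are hypothetical, H2 is recorded as ambiguous following the discussion under H2 rather than its headline wording, and the helper checks only the sign, not statistical significance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hypothesis:
    component: str                # social media activity (SMA) component
    dimension: str                # Social Impact Theory mediator it feeds
    expected_sign: Optional[int]  # +1 or -1; None where the net effect is ambiguous


HYPOTHESES = {
    "H1": Hypothesis("sentiment", "source strength", +1),
    "H2": Hypothesis("disagreement", "source strength", None),
    "H3": Hypothesis("subjectivity", "source strength", -1),
    "H4": Hypothesis("popularity", "number of sources", +1),
    "H5": Hypothesis("likes", "number of sources", +1),
    "H6": Hypothesis("followers", "number of sources", +1),
    "H7": Hypothesis("recency", "immediacy", +1),
}


def sign_matches(hypothesis_id: str, coefficient: float) -> Optional[bool]:
    """Compare an estimated coefficient's sign with the hypothesized sign.

    Returns None when the hypothesis makes no directional prediction.
    Statistical significance is not checked here.
    """
    expected = HYPOTHESES[hypothesis_id].expected_sign
    if expected is None:
        return None
    return coefficient * expected > 0
```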

Methodology

We test the above hypotheses on how social media activity about brands affects store visits. Our process (Fig. 3) follows an approach similar to that of Liu et al. (2016).

Fig. 3 System architecture

Information extraction

The process begins with raw Twitter data on mentioned brands, which undergoes natural language processing to define the variables of interest. It also uses the number of tweets, the number of likes each tweet receives, and the number of followers of the user who sent each tweet to create a system of weights for the social media variables. The data is then centered and represented as various lags and moving averages. Foot traffic data is centered but not lagged. We estimate a baseline linear regression followed by hierarchical linear regressions on different permutations of the data before evaluating model variance, marginal effects, and the causal direction of those effects.

Feature engineering

We also consider different ways to weight social media activity. The primary purpose is to test separately whether each measure of social media activity affects foot traffic, as described in H4, H5, and H6. The secondary purpose of the weights is to serve as a robustness check on whether our results are sensitive to a specific definition of social media activity. We define three distinct weights. The first, our preferred specification, treats all tweets equally. The second considers the engagement a post receives by weighting it by accumulated likes; for example, a brand-day sentiment of 0.5 represents the average sentiment per like received. The third considers each tweet’s potential reach, weighting measures by the number of followers who could have seen each tweet.
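The three weighting schemes can be sketched in Python with pandas as a per-tweet mean and two weighted means. The column names and tweet values are hypothetical, and returning NaN when no tweet on a brand-day received any likes is our assumption about an edge case the text does not address.

```python
import pandas as pd

# Hypothetical tweet-level data for one brand-day (illustrative values).
tweets = pd.DataFrame({
    "sentiment": [0.8, -0.2, 0.5],
    "likes": [10, 0, 30],
    "followers": [200, 5_000, 800],
})


def weighted_mean(values: pd.Series, weights: pd.Series) -> float:
    """Weighted average; NaN when every weight is zero (assumed handling)."""
    total = weights.sum()
    return float((values * weights).sum() / total) if total > 0 else float("nan")


per_tweet = tweets["sentiment"].mean()                                  # all tweets equal
per_like = weighted_mean(tweets["sentiment"], tweets["likes"])          # engagement
per_follower = weighted_mean(tweets["sentiment"], tweets["followers"])  # potential reach
print(per_tweet, per_like, per_follower)
```

In this toy brand-day, the same three tweets yield a mildly positive, a more positive, and a slightly negative sentiment, depending on the weighting, because the one negative tweet comes from the account with the most followers.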

Each brand-day value of a variable is then converted to the number of standard deviations from that brand’s own average. This centering allows cross-brand comparisons by measuring social media activity in relative terms. Additionally, standardization improves interpretation in hierarchical linear models and facilitates convergence when estimating parameters by maximum likelihood (Hox et al. 2010, p. 63).
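A minimal pandas sketch of this within-brand standardization follows, assuming a brand-day table with hypothetical column names and the sample standard deviation (pandas' default, ddof=1); the text does not specify which standard deviation is used.

```python
import pandas as pd


def standardize_within(df: pd.DataFrame, group_col: str,
                       value_cols: list[str]) -> pd.DataFrame:
    """Express each value as SDs from its own group's mean (within-group z-score)."""
    out = df.copy()
    grouped = df.groupby(group_col)[value_cols]
    out[value_cols] = (df[value_cols] - grouped.transform("mean")) / grouped.transform("std")
    return out


# Hypothetical brand-day popularity for a heavily and a lightly tweeted brand.
brand_days = pd.DataFrame({
    "brand": ["Costco"] * 3 + ["Qdoba"] * 3,
    "popularity": [9000, 10000, 11000, 40, 50, 60],
})
z = standardize_within(brand_days, "brand", ["popularity"])
print(z)
```

Although the two brands differ in raw tweet volume by two orders of magnitude, their standardized series are identical, which is what makes coefficients comparable across brands.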

Next, we consider multiple temporal definitions of social media activity: same-day, previous-day, previous-3-days, and previous-7-days. To recap, social media activity is represented with one of three weights and one of four temporal measures. We compare these twelve combinations to obtain a broader view of how Twitter affects retail visits.
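The four temporal measures can be built with pandas' `shift` and `rolling`. This sketch assumes that the previous-3-days and previous-7-days measures are trailing means over days t−1 through t−k that exclude day t; the text says only that variables are represented as lags and moving averages, so the exact window convention is our assumption.

```python
import pandas as pd


def temporal_measures(series: pd.Series) -> pd.DataFrame:
    """Same-day value, 1-day lag, and trailing means of the previous 3 and 7 days.

    Assumes `series` holds one brand's daily values in date order.
    """
    lagged = series.shift(1)  # value from day t-1
    return pd.DataFrame({
        "same_day": series,
        "prev_1": lagged,
        "prev_3": lagged.rolling(window=3).mean(),  # mean of days t-1..t-3
        "prev_7": lagged.rolling(window=7).mean(),  # mean of days t-1..t-7
    })


days = pd.date_range("2019-11-12", periods=10, freq="D")
sentiment = pd.Series(range(10), index=days, dtype=float)
measures = temporal_measures(sentiment)
print(measures)
```

The first days of the sample are missing for the longer windows, so the 7-day measure loses a week of observations at the start of the panel.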

Baseline model

Consider the following model:

$$\widetilde{visits}_{tij} = \alpha + \boldsymbol{\beta}^{\prime}\tilde{\mathbf{X}}_{\tau j} + \gamma\, date_{t} + \theta\, store_{ij} + \delta\, brand_{j} + R_{tij} \tag{1}$$

Observations occur at time t for store i of brand j. The tilde on the dependent variable, \(\widetilde{visits}_{tij}\), indicates that it is measured in standard deviations from the store’s day-of-week average number of visits. Similarly, \(\tilde{\mathbf{X}}_{\tau j}\) is a vector of Twitter variables measured in standard deviations from each brand’s average, and \(\boldsymbol{\beta}\) is the vector of their marginal effects on visits. The subscript τ denotes the different time lags of social media activity, and \(R_{tij}\) is the error term. Since the dependent variable is centered around zero at the store level, we expect α, θ, and δ to be zero.

We first test for a relationship between social media activity and foot traffic by estimating Eq. (1) with ordinary least squares (OLS) linear regression. One important assumption of OLS, with implications discussed later, is that the error term \(R_{tij}\) is independently and identically distributed.
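A minimal sketch of estimating Eq. (1) by OLS on synthetic data follows, using NumPy's least-squares solver rather than whatever software the study used. Under stated assumptions: the panel sizes, the two hypothetical Twitter variables, and their true effects are invented; each store inherits its brand's daily Twitter values; and the store and brand terms are omitted because, as noted above, centering makes them zero in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_brands, stores_per_brand, n_days = 5, 20, 60
beta_true = np.array([0.04, 0.02])  # hypothetical effects of two Twitter variables

# Standardized brand-day Twitter variables, shared by all stores of a brand.
x_brand_day = rng.standard_normal((n_brands, n_days, 2))
date_effect = rng.normal(0, 0.1, n_days)

# Rows are ordered brand -> store -> day.
brand = np.repeat(np.arange(n_brands), stores_per_brand * n_days)
day = np.tile(np.arange(n_days), n_brands * stores_per_brand)
X = x_brand_day[brand, day]
y = date_effect[day] + X @ beta_true + rng.normal(0, 0.5, len(day))

# Design: intercept, Twitter variables, and date dummies (day 0 omitted).
date_dummies = (day[:, None] == np.arange(1, n_days)[None, :]).astype(float)
design = np.column_stack([np.ones(len(y)), X, date_dummies])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_hat = coef[1:3]
print(beta_hat)
```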

Full hierarchical linear model

Now we extend the baseline model to a hierarchical specification with random effects, where \(\mathbf{U}_{j}\) represents the brand-specific impact of the social media variables on visits and \(\mathbf{V}_{ij}\) represents the store-specific impact.

$$\widetilde{visits}_{tij} = \alpha + \left(\boldsymbol{\beta}^{\prime} + \mathbf{U}_{j}^{\prime} + \mathbf{V}_{ij}^{\prime}\right)\tilde{\mathbf{X}}_{\tau j} + \gamma\, date_{t} + \theta\, store_{ij} + \delta\, brand_{j} + R_{tij} \tag{2}$$

Note that fixed effects are represented by Greek letters and random effects by Latin letters. The random effects express the store-specific and brand-specific ways in which social media activity influences foot traffic. The fixed effects express the constant, expected impact of social media activity on any given store within the dataset. We are most interested in β, the global impact across all brands.
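To build intuition for the random slopes in Eq. (2), the Python sketch below simulates a single Twitter variable whose effect on a store is β + U_j + V_ij, then estimates a separate OLS slope for each store. All parameter values are hypothetical, and averaging per-store slopes is only a two-step approximation for illustration; it is not the maximum-likelihood HLM estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n_brands, stores_per_brand, n_days = 15, 40, 110
beta = 0.04                       # global fixed effect (hypothetical)
sd_brand, sd_store = 0.05, 0.03   # SDs of U_j and V_ij (hypothetical)

U = rng.normal(0, sd_brand, n_brands)        # brand-specific deviations
x = rng.standard_normal((n_brands, n_days))  # brand-day Twitter variable

store_slopes = []
for j in range(n_brands):
    V = rng.normal(0, sd_store, stores_per_brand)  # store-specific deviations
    for i in range(stores_per_brand):
        slope = beta + U[j] + V[i]                 # store's total effect
        y = slope * x[j] + rng.normal(0, 0.5, n_days)
        store_slopes.append(np.polyfit(x[j], y, 1)[0])  # per-store OLS slope

store_slopes = np.array(store_slopes)
print(store_slopes.mean(), store_slopes.std(), (store_slopes < 0).mean())
```

In this setup, the store-level slopes spread widely around β, and some are even negative, which illustrates why a single global coefficient can understate how strongly individual brands and stores respond.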

There are three primary benefits of estimating a hierarchical linear model (HLM). First, it addresses the correlation of errors across stores by accounting for heterogeneous brand-level effects, which improves estimates of standard errors (Hox et al. 2010, p. 3). Second, HLM guards against the ecological fallacy (interpreting aggregated data at the individual level) and the atomistic fallacy (interpreting individual data at the aggregate level) (ibid). Finally, HLM controls for within-group dependence of errors and within-group heteroskedasticity (De Leeuw et al. 2008, p. 14). Other variable-centered methods, such as structural equation modeling, can also meet these criteria for grouped data. However, HLM performs better in use cases like ours, where (1) there are more than two levels, (2) groups are indistinguishable (e.g., Store 1 of Panera has no relation to Store 1 of Aldi), and (3) there are many low-level observations (Huta 2014).

Data

Our first step was to identify fifteen retailers with nationwide brand awareness that vary in market, size, and target customer. These brands represent diverse subsectors, namely grocery/general, fast food, and specialty merchandise. We intentionally selected brands that met specific technical requirements, had available foot traffic data, and provided reliable search results with minimal false positives (e.g., searching for “Staples” stores but getting results for “staples” office supplies) or false negatives (e.g., searching for “T.J. Maxx” but not getting results for “TJ Maxx” or “T.J. Max”). Table 1 describes more specific exclusion criteria and the brands we removed from consideration. Ultimately, we had to make a judgment call as to which of the brands made available to us were most appropriate and representative for this analysis.

Table 1 Exclusion criteria

Some of our selection criteria were driven by technical considerations, such as the Twitter API replacing punctuation with whitespace in queries. For example, Dunkin’ was included because the query “Dunkin” alone returns “Dunkin’” (with the apostrophe), “Dunkin Donuts,” and “Dunkin’ Donuts.” The same applies to “Baskin Robbins,” which returns both “Baskin Robbins” and “Baskin-Robbins.” We excluded Popeyes because the query “Popeyes” would not have returned tweets mentioning “Popeye’s.” We do not claim our results capture all tweets mentioning a brand. For example, we miss mentions of the abbreviation “DD,” which would have returned more false positives than true positives for tweets discussing Dunkin’. Instead, we contend that querying primary brand names captures a sufficient share of the conversation about each brand on Twitter. See also the “Social media data” and “Centering” sections for additional standardizations that better represent deviations from typical behavior.
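The Python sketch below illustrates the matching behavior described above: punctuation is turned into whitespace and a query matches when its tokens appear consecutively. The `normalize` and `matches` helpers are hypothetical stand-ins written for illustration. They are not Twitter's actual search implementation, and the example tweets are invented.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with whitespace, and split into tokens.

    A rough stand-in for the behavior described above, not Twitter's matcher.
    """
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def matches(query: str, text: str) -> bool:
    """True if the query's tokens appear contiguously among the text's tokens."""
    q, t = normalize(query), normalize(text)
    return any(t[k:k + len(q)] == q for k in range(len(t) - len(q) + 1))


print(matches("Dunkin", "Grabbing Dunkin’ Donuts before work"))  # apostrophe split off
print(matches("Baskin Robbins", "Baskin-Robbins has a new flavor"))
print(matches("Popeyes", "Popeye’s sandwich is sold out"))  # "popeye" + "s" != "popeyes"
print(matches("T.J. Maxx", "TJ Maxx haul"))  # a false negative
```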

We collected 394,998 unique observations over 110 days, from November 12, 2019, to February 29, 2020, a window that ends before the panic buying of March 2020 caused by the coronavirus pandemic. We then created our variables of interest from the raw data, as summarized in Table 2 and further described within this section.
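As a quick consistency check on these counts, the Python arithmetic below confirms that the window spans 110 days inclusive and compares the observation count with the number of possible store-days for fifteen brands with 258 stores each. Treating each observation as one store-day is our assumption. The resulting coverage share is our own arithmetic on the reported figures, not a statistic reported by the study.

```python
from datetime import date

start, end = date(2019, 11, 12), date(2020, 2, 29)
n_days = (end - start).days + 1     # inclusive of both endpoints
n_stores = 15 * 258                 # fifteen brands, 258 sampled stores each
max_store_days = n_stores * n_days  # if every store appeared every day
coverage = 394_998 / max_store_days
print(n_days, n_stores, max_store_days, f"{coverage:.1%}")
```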

Table 2 Construction of study variables

Foot traffic data

Foot traffic data is superior to traditional surveys since the customer reveals their preference by incurring the travel cost to visit a store in person. Foot traffic data, which comes from SafeGraph, has been used in diverse areas of research, such as evaluating the financial impact of Starbucks’s open bathroom policy (Gurun et al. 2020). The data measures only a subset of store visitors and likely does not represent a random sample; however, an internal analysis by SafeGraph asserts that its users are representative of the broader country on multiple observable characteristics (Squire 2019).

We received store visit data for the fifteen brands listed in Table 3, covering 60,295 U.S. stores. Of these, 13.6% were dropped for missing data or for presumably being closed (e.g., averaging fewer than two visits a day). The brand with the most remaining stores is Dollar General with 14,518, almost double the next highest, Walgreens, with 7667. Finally, we randomly select 258 stores per brand to create a balanced dataset.
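The filtering and balanced sampling steps can be sketched in Python with pandas' `GroupBy.sample`. Under stated assumptions: the column names are hypothetical, the store data is synthetic, the two-visit threshold follows the example in the text, and the missing-data filter is not shown.

```python
import numpy as np
import pandas as pd


def sample_balanced(stores: pd.DataFrame, n_per_brand: int,
                    min_avg_visits: float = 2.0, seed: int = 0) -> pd.DataFrame:
    """Drop presumably closed stores, then draw n_per_brand stores per brand."""
    open_stores = stores[stores["avg_daily_visits"] >= min_avg_visits]
    return open_stores.groupby("brand").sample(n=n_per_brand, random_state=seed)


# Synthetic store list for two brands with unequal store counts.
rng = np.random.default_rng(0)
stores = pd.DataFrame({
    "store_id": range(1000),
    "brand": np.repeat(["Dollar General", "Walgreens"], [600, 400]),
    "avg_daily_visits": rng.gamma(shape=2.0, scale=5.0, size=1000),
})
sample = sample_balanced(stores, n_per_brand=258)
print(sample["brand"].value_counts())
```

Sampling the same number of stores per brand keeps large chains such as Dollar General from dominating the pooled estimates.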

Table 3 Foot traffic descriptive statistics

We calculate descriptive statistics from the 258-store samples and report average store-day values by brand. Costco has the highest average daily foot traffic, at over 213 user visits, followed by Home Depot with 73. AutoZone has the lowest average number of visits, at fewer than eight. This small value is a reminder that SafeGraph data measures only a subset of total visits, collected from mobile phones. All brands have at least one store-day with no visitors, as shown by the minimum column of all zeroes. The highest maximum daily foot traffic is 1473 user visits, at a Dunkin’ in a convention center.

Social media data

Social media data comes from the Twitter API, with search terms defined in Table 4. We collected 2.7 million tweets, each between 2 and 8 days old at the time of collection, so that sufficient time had passed for likes to accumulate. We find that more than 90% of all likes occur within the first 2 days after a tweet is posted. Minor technical issues resulted in several days with missing social media data; however, these affected less than 0.73% of the brand-days under consideration. Only public, English-language tweets that mention one of the fifteen brands are considered. Replies and retweets are treated the same as original tweets, since they are visible to the posting user’s followers.

Recall that popularity is the daily number of tweets mentioning a brand. For sentiment, we score the text of each tweet using three algorithms and average the scores to the brand-day level. Tweet sentiment ranges from -1 (completely negative) to 1 (completely positive). Disagreement is the standard deviation of tweet-level sentiment for a brand on a given day. Finally, subjectivity is also scored with SentimentAnalyzer and ranges from 0 (completely objective) to 1 (completely subjective). Popularity is summed to the brand-day level, and sentiment and subjectivity are averaged to the brand-day level. Table 4 reports the descriptive statistics.
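A pandas sketch of the brand-day aggregation follows, assuming tweet-level sentiment and subjectivity scores have already been computed. The column names and values are hypothetical, and using the sample standard deviation (pandas' default) for disagreement is our assumption. Under that convention, a brand-day with a single tweet has an undefined (NaN) disagreement.

```python
import pandas as pd

# Hypothetical pre-scored tweets for two brands on one day.
tweets = pd.DataFrame({
    "brand": ["Petco", "Petco", "Petco", "Qdoba", "Qdoba"],
    "date": ["2020-01-05"] * 5,
    "sentiment": [0.6, 0.2, 0.4, -0.5, 0.5],
    "subjectivity": [0.9, 0.3, 0.6, 0.5, 0.7],
})

brand_day = tweets.groupby(["brand", "date"]).agg(
    popularity=("sentiment", "size"),       # number of tweets mentioning the brand
    sentiment=("sentiment", "mean"),        # average tone
    disagreement=("sentiment", "std"),      # dispersion of tweet-level sentiment
    subjectivity=("subjectivity", "mean"),  # average use of opinion-based language
).reset_index()
print(brand_day)
```

In this toy example, Qdoba's average sentiment is neutral, but its disagreement is high because the two tweets point in opposite directions.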

Table 4 Average brand-day Twitter variables

The first column shows the exact query used when searching for tweets of each brand. Next are the three measures of popularity on Twitter. It is unsurprising that Costco is mentioned the most in tweets and receives the most likes since this is also the brand with the highest foot traffic. AutoZone is the least popular in terms of tweets and likes but Qdoba is the least popular in terms of followers. Tweets that mention Nordstrom reach the most followers (59.7 million)—more than double that of Costco (26.4 million).

The next section of variables describes our three measures of sentiment by brand. Positivity is more common on Twitter than negativity, as shown by each brand’s positive average daily sentiment. Petco has the highest sentiment on all three measures. Turning to the overall mean for all brands, sentiment per like is approximately the same as sentiment per tweet, indicating that users on average do not “like” tweets more when they are more positive. Conversely, sentiment per follower is higher, indicating that more influential users also tweet more positively.

We next see that disagreement is similar across brands on all three measures, except for Dollar General, which has the greatest consensus (i.e., the lowest disagreement). Subjectivity, by contrast, is more heterogeneous across brands. Dollar General is again an outlier: tweets mentioning the brand have the highest degree of subjectivity.

Centering

We next center and standardize the variables of interest, which provides multiple benefits. First, centering allows cross-brand comparisons for stores of different sizes and lends more credence to out-of-sample interpretations. Second, foot traffic data represents only a share of the true number of visitors to each store. Since that share is unknown, it would be inappropriate to interpret the dependent variable in terms of the absolute number of store visits; centering overcomes this problem. Third, we can control for predictable weekly fluctuations by centering each store’s visits on that store’s own day-of-week average. Fourth, centering reduces the overall range of observations and, accordingly, the impact of outliers. Social media data, however, does not exhibit similar weekly patterns, so we center the Twitter variables at the brand average.
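A pandas sketch of centering visits on each store's own day-of-week average follows. The column names, the weekly visit pattern, and the use of the sample standard deviation are hypothetical assumptions. Each store-day becomes the number of standard deviations from the mean for that store and weekday.

```python
import pandas as pd


def center_visits(visits: pd.DataFrame) -> pd.Series:
    """SDs of each store-day's visits from that store's own day-of-week mean."""
    dow = pd.to_datetime(visits["date"]).dt.dayofweek
    grouped = visits.groupby([visits["store_id"], dow])["visits"]
    return (visits["visits"] - grouped.transform("mean")) / grouped.transform("std")


dates = pd.date_range("2020-01-06", periods=28, freq="D")  # starts on a Monday
base = {0: 10, 1: 10, 2: 10, 3: 12, 4: 15, 5: 30, 6: 25}   # hypothetical weekly pattern
week_shift = [-1, 0, 1, 0]                                 # hypothetical week-to-week noise
example = pd.DataFrame({
    "store_id": 1,
    "date": dates,
    "visits": [base[d.dayofweek] + week_shift[k // 7] for k, d in enumerate(dates)],
})
z = center_visits(example)
print(z.head(7))
```

A busy Saturday and a quiet Monday receive the same standardized value when both deviate equally from their usual levels, so the predictable weekly cycle drops out of the dependent variable.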

Results

Linear regression results

We first use Ordinary Least Squares (OLS) regression to estimate Eq. (1). Table 5 presents these results, which show the baseline relationship between social media activity and retail foot traffic.

Table 5 OLS results

In this specification, virtually every measure of social media activity has a statistically significant relationship with visits. The coefficients have the same sign across different definitions and time dimensions of social media activity. These strong results provide preliminary evidence that online word-of-mouth communication affects foot traffic to brick-and-mortar stores. Social media activity from the previous day, shown in Columns (4) through (6), is also associated with store visits the following day, which suggests a causal direction running from online activity to store visits.

A brand’s popularity on social media has consistently positive and statistically significant effects on visits to that brand’s stores. For example, consider Column (1), which measures social media activity on the same day as store visits. The coefficient indicates that a one standard deviation (SD) increase in the average number of tweets about a brand is associated with a 0.047 SD increase in a store's visits relative to its expected level for that day of the week. Column (5) tells a similar story: a one SD increase in the average number of likes received by tweets that mention a brand is associated with a 0.022 SD increase in average foot traffic to that brand’s stores the next day.

Increases in sentiment and disagreement about a brand are also both associated with higher expected foot traffic. Conversely, subjectivity has a negative relationship with store visits. These results hold whether social media activity is measured on a per-tweet, per-like, or per-follower basis and whether it comes from the current day or the previous 1, 3, or 7 days.

Unfortunately, OLS has one critical limitation. It assumes the errors are independently and identically distributed, and violations can yield biased inference, in particular understated standard errors. Given the nested structure of the data, it is unreasonable to presume daily visits do not fluctuate in a way that is store-dependent or brand-dependent. In other words, OLS assumes there is no random error unique to a store or brand (Snijders and Bosker 2011, p. 46). Additionally, OLS estimates global coefficients, meaning the relationship between social media and foot traffic is assumed constant across stores and brands. The fixed effects in Eq. (1) control for the level of visits by day, store, and brand, but they do not control for the ways brands react differently to online communication. This heterogeneity is evident in Fig. 4.
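The independence concern can be quantified with the intraclass correlation (ICC), the share of outcome variance that lies between clusters such as stores or brands. Below is a minimal sketch of the textbook one-way ANOVA estimator for balanced clusters; it illustrates the idea with made-up inputs and is not a statistic reported in this study.

```python
from statistics import mean


def icc1(groups):
    """One-way ANOVA estimate of the intraclass correlation, ICC(1).

    groups: list of equally sized lists, one per cluster (e.g. daily visits
    per store). Near 0: little clustering. Near 1: observations in a cluster
    are nearly identical, so treating them as independent (as OLS does)
    overstates the effective sample size. Sampling noise can push the
    estimate below 0.
    """
    n_groups = len(groups)
    if n_groups < 2:
        raise ValueError("need at least two groups")
    k = len(groups[0])  # observations per group
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("groups must be balanced with at least two observations each")
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n_groups * (k - 1))
    denom = ms_between + (k - 1) * ms_within
    # Undefined when every observation is identical.
    return (ms_between - ms_within) / denom if denom > 0 else float("nan")
```

For instance, three clusters whose values sit tightly around 1, 5, and 9 give an ICC above 0.99, whereas clusters with identical means give a negative estimate.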

Figure 4 shows scatter plots for select brands of daily store visits (measured in standard deviations from each store’s day-of-week average) against daily popularity on social media (measured same-day, per tweet, and in standard deviations from the brand average), along with the line of best fit for these two variables. The first brand, Qdoba, shows a positive relationship between the number of tweets about Qdoba and foot traffic. Hobby Lobby also exhibits a positive relationship, and its steeper slope indicates a stronger relationship than that of Qdoba. The flat line for Pizza Hut suggests no relationship between popularity on social media and foot traffic. Surprisingly, the scatter plot for Costco reveals a negative relationship, visible in the downward-sloping line. This is comparable to the heterogeneous impact of flyers on grocery store foot traffic observed in Gijsbrechts et al. (2003, p. 12). Overall, Fig. 4 provides compelling evidence of heterogeneity between brands and suggests that a random effects model, which allows social media activity to affect foot traffic differently across stores and brands, is appropriate.
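Each line of best fit in Fig. 4 amounts to a separate simple regression for one brand. A minimal sketch with hypothetical points (brands A, B, and C are invented and do not correspond to the brands in the figure) shows how rising, falling, and flat slopes can coexist in pooled data, which is exactly what a single global OLS coefficient hides.

```python
from collections import defaultdict


def slopes_by_group(points):
    """Least-squares slope of y on x computed separately within each group.

    points: iterable of (group, x, y) tuples, e.g. (brand, tweets_z, visits_z).
    Groups with no variation in x get NaN.
    """
    by_group = defaultdict(list)
    for group, x, y in points:
        by_group[group].append((x, y))
    slopes = {}
    for group, xy in by_group.items():
        n = len(xy)
        mean_x = sum(x for x, _ in xy) / n
        mean_y = sum(y for _, y in xy) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in xy)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in xy)
        slopes[group] = sxy / sxx if sxx > 0 else float("nan")
    return slopes


# Hypothetical standardized points: brand A rises, B falls, C is flat.
points = [
    ("A", -1, -0.5), ("A", 0, 0.0), ("A", 1, 0.5),
    ("B", -1, 1.3), ("B", 0, 1.0), ("B", 1, 0.7),
    ("C", -1, 2.0), ("C", 0, 2.0), ("C", 1, 2.0),
]
brand_slopes = slopes_by_group(points)
```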

Fig. 4 Scatter plots of select brands

Hierarchical linear regression results

We use a hierarchical linear model (HLM) to estimate Eq. (3) below. This is equivalent to Eq. (2) without the fixed effects for stores and brands. Excluding those factors makes it easier for the model to converge and should not materially affect the estimates, because the dependent variable is already centered on each store’s day-of-week average.

$$\widetilde{visits}_{tij} = \alpha + \left( \boldsymbol{\beta}' + \boldsymbol{U}_{j}' + \boldsymbol{V}_{ij}' \right)\tilde{\boldsymbol{X}}_{\tau j} + \gamma \, date_{t} + R_{tij}$$
(3)
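Reading Eq. (3) term by term, the slope vector that applies to store i of brand j combines a global component with a brand-level and a store-level deviation. Assuming the usual HLM conventions of mean-zero, normally distributed random effects that are independent of the residual (conventions this section does not restate), the model can be summarized as:

$$\boldsymbol{\beta}_{ij}' = \boldsymbol{\beta}' + \boldsymbol{U}_{j}' + \boldsymbol{V}_{ij}', \qquad \widetilde{visits}_{tij} = \alpha + \boldsymbol{\beta}_{ij}'\tilde{\boldsymbol{X}}_{\tau j} + \gamma \, date_{t} + R_{tij},$$

$$\boldsymbol{U}_{j} \sim N(\boldsymbol{0}, \boldsymbol{\Sigma}_{U}), \qquad \boldsymbol{V}_{ij} \sim N(\boldsymbol{0}, \boldsymbol{\Sigma}_{V}), \qquad R_{tij} \sim N(0, \sigma^{2}).$$

Under this reading, β is the fixed (global) effect reported in the tables, and the diagonal elements of Σ_U and Σ_V are the random-slope variances discussed with the random effects component.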

We estimate Eq. (3) using the lme4 package in R and present results in Table 6.

Table 6 Full HLM results for same day and prior day social media activity

Columns (13) through (15) evaluate the same-day impact of social media on retail foot traffic, while Columns (16) through (18) use social media activity from the previous day. Within each set, the columns measure social media activity differently: per tweet, per like, or per follower. We first note that likelihood ratio (LR) tests show, at the 0.001 level of significance, that every HLM specification fits the data better than OLS.
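For readers reproducing the model comparison, the LR statistic is twice the difference in maximized log-likelihoods, referred to a chi-square distribution whose degrees of freedom equal the number of extra parameters. The sketch below is a standard-library illustration, not the authors' code (they report using lme4 in R), and the log-likelihood values in the example are hypothetical. When the restricted model fixes a variance component at zero, that null value lies on the boundary of the parameter space, and the naive chi-square p-value computed here is known to be conservative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, z) with a = df / 2 and z = x / 2, then returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    if math.isinf(total):
        return 0.0  # series overflowed: x is so far in the tail that p underflows
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def lr_test(loglik_restricted, loglik_full, extra_params):
    """LR statistic 2 * (ll_full - ll_restricted) and its naive chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf(stat, extra_params)
```

For example, hypothetical log-likelihoods of -1000 (restricted) and -990 (full) with two extra parameters give a statistic of 20 and a p-value of e^(-10), about 4.5e-5.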

We now turn to the random effects component of the model. The variance is small (< 0.01) for all Twitter coefficients compared to a residual variance of 0.78. This indicates that store- and brand-specific responses to the Twitter variables capture only a small portion of the total variability in foot traffic, which is intuitive given that social media is far from the dominant reason consumers visit a store.
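To see why slope variances below 0.01 are small, note that a mean-zero random slope with variance s² that multiplies an independent, mean-zero predictor with variance 1 (as with standardized variables) adds roughly s² to the variance of the outcome. The sketch below compares that contribution with the residual variance. It is a rough illustration only: it ignores random-intercept variance and any covariances among random effects.

```python
def slope_variance_share(slope_variances, residual_variance, x_variance=1.0):
    """Approximate share of outcome variance due to random-slope deviations.

    For a mean-zero random slope with variance s2, independent of a mean-zero
    predictor with variance x_variance, the product contributes
    s2 * x_variance to the outcome variance. Slopes are treated as mutually
    independent, and random-intercept variance is ignored.
    """
    contribution = sum(s2 * x_variance for s2 in slope_variances)
    return contribution / (contribution + residual_variance)
```

With a single slope variance at the reported upper bound of 0.01 and a residual variance of 0.78, `slope_variance_share([0.01], 0.78)` is about 0.013, roughly 1.3 percent of the combined variance.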

We next consider the fixed effect component of the model. The widespread statistical significance found with OLS in Table 5 is absent. The popularity of a brand on social media is associated with a positive and statistically significant (at the 0.1 level) increase in foot traffic to that brand’s stores. For example, Column (13) shows that a 1 SD increase in popularity (measured here as the number of tweets sent that same day) is associated with a 0.0451 SD increase in average store visits. This effect, however, is not found in Column (14), where social media activity is measured on a per-like basis. Disagreement has a comparable effect: when disagreement per tweet increases by 1 SD on the same day, average store visits are 0.0364 SD higher. On the other hand, measures of sentiment and subjectivity on social media do not have a statistically significant impact on foot traffic in any of the three specifications.

It is important to consider how endogeneity enters Columns (13) through (15). The model framing implies that social media drives foot traffic to retail stores, but it is also plausible that foot traffic leads social media users to post about the brands they recently visited. To address this possible reverse causality, we also consider how the previous day’s social media activity affects store visits the following day. As shown in Columns (16) through (18), lagged social media activity still has a statistically significant relationship with foot traffic. We therefore discount concerns of reverse causality here, since we believe it is implausible that customers visit a store because they anticipate tweeting about it the next day.

In two of the three specifications, a 1 SD increase in popularity leads to a 0.0245 to 0.0364 SD increase in store visits, statistically significant at the 0.05 level. Unlike in the same-day models, sentiment now has a positive and statistically significant effect on foot traffic in two of the three specifications. Finally, disagreement has a similar positive and significant impact, and subjectivity still has no discernible impact.

We now consider multi-day averages of social media activity to discern how longer time horizons of online activity affect store visits (Table 7). Columns (19) through (21) show that a 3-day average of social media activity has a weaker impact on foot traffic than the previous day alone. Popularity is qualitatively the same, with two of the three measures positive and statistically significant, but sentiment and disagreement are now statistically significant (and positive) in only one of the three measures. Furthermore, subjectivity is never statistically different from zero. Turning to Columns (22) through (24), only one of the twelve combinations of measures is statistically significant at the 0.01 level or below for the 7-day average. This indicates that social media’s effects last only a few days, which is consistent with other research (Lovett et al. 2019).
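The previous-day and multi-day predictors can be built from a daily series as trailing averages. This sketch assumes the k-day window covers the k days before day t and excludes day t itself; the text does not state whether the current day is included, so that choice, the function name, and the example counts are assumptions.

```python
def trailing_means(series, windows=(1, 3, 7)):
    """Lagged k-day averages of a daily series, one list per window length k.

    Entry t of the list for window k is the mean of series[t-k:t], i.e. the
    k days before day t (day t itself is excluded). Days with fewer than k
    prior observations get None. With k = 1 this is the previous day's value.
    """
    features = {}
    for k in windows:
        values = []
        for t in range(len(series)):
            if t < k:
                values.append(None)
            else:
                values.append(sum(series[t - k:t]) / k)
        features[k] = values
    return features


# Hypothetical daily tweet counts for one brand.
daily_tweets = [1, 2, 3, 4, 5, 6, 7, 8]
lags = trailing_means(daily_tweets)
```

Under this assumption, the k = 1 list plays the role of the previous-day predictors in Columns (16) through (18), and the 3- and 7-day lists correspond to Columns (19) through (24). Excluding day t keeps the lagged predictors free of same-day activity that could itself be a response to store visits.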

Table 7 Full HLM results for 3- and 7-day average social media activity

The results are also meaningful in how much they differ from the previous WoM communication literature. For example, our popularity and sentiment elasticities are approximately one-tenth and one-twentieth, respectively, of the corresponding elasticities measured in the meta-analysis by You et al. (2015). This is also consistent with Pauwels et al. (2016), who find WoM elasticities between online reviews and in-person sales to be half the elasticities between online reviews and online sales. It makes intuitive sense that the further online activity is from the point of sale, the smaller its impact. Our results provide a sobering reminder that not all online activity is created equal and that estimated elasticities should be expected to change between communication platforms.

Evaluating hypotheses

We now test the seven hypotheses from “Hypotheses” section against the hierarchical linear model results (Table 8).

Table 8 Summary of hypotheses and results

Sentiment is not statistically significant in the preferred per-tweet measures of social media activity. However, when social media activity is measured on a per-like or per-follower basis, more positive sentiment is associated with increases in foot traffic to those brands’ stores. The overall effect of sentiment is limited but positive in all instances where the variable is statistically significant. For this reason, we conclude there is modest, yet ultimately inconclusive, evidence that H1 holds.

Disagreement on social media, as measured by the standard deviation of sentiment, is also positive and statistically significant in some specifications. This result is contrary to H2, where we expected that lower disagreement would correspond to greater message strength about a brand. The finding is similar to Cui et al. (2012), who show that divergence in sentiment on social media predicts new product sales better than sentiment itself. Subjectivity is not statistically different from zero in any model, a stark lack of evidence to support H3. In testing H1–H3, we find that the source strength dimension of the Social Impact Theory is only weakly related to foot traffic in our setting: its effect is loosely tied to these measures and difficult to measure overall.

Next, popularity broadly has a positive impact on foot traffic, as speculated in H4. This follows Cui et al. (2012), who show popularity is more important than sentiment for experience products, which would include shopping experiences such as those at the retailers considered here. In testing H5, we find that the number of likes is at times associated with higher levels of retail visits, but the effects are smaller than the per-tweet impact from H4. Finally, there is virtually no effect on store visits when social media activity is measured on a per-follower basis. Thus, we fail to reject the null hypothesis in favor of H6. Overall, our results show a statistically significant effect across most, but not all, specifications measuring the number of sources within the Social Impact Theory. We therefore cautiously interpret these results to mean that online popularity generally leads to more foot traffic, although context can shape how online popularity matters.

In H7, we speculated that more recent tweets have a stronger effect on foot traffic than older tweets. We find a more nuanced result: social media has weak effects on the same-day decision to visit a store, stronger effects over the next 1 to 3 days, and virtually no effect by 7 days. This is a clear signal that the immediacy of a source, as stated in the Social Impact Theory, corresponds to higher foot traffic; however, it may take a day or two for an action to occur.

Discussion

The overall results point to a modest but economically meaningful relationship between social media activity and retail foot traffic. In a world where customers have thousands of reasons to visit a store, these results show that a sliver of motivation comes from word-of-mouth communication on social media. We qualify these outcomes in light of the study’s limitations. The data comes from fifteen high-recognition brands with stores located around the United States. Therefore, these results point to the impact social media activity has on established, well-known brands.

The results also demonstrate how the definition of social media activity matters. We recommend focusing on the per-tweet models, as that measure is the most widely used in the literature. The results are dampened when social media activity is measured on a per-like or per-follower basis; however, the values are often either directionally the same or virtually indistinguishable from zero. This broader consistency shows that the results in the paper reflect a global impact that is present but modest. By comparing the weaker hierarchical results against the stronger OLS results, we conclude that the predominant impact social media has on retailers is specific to the brand and the store. Companies, therefore, should be skeptical of external research and instead quantify the unique ways social media impacts their own business.

This brand salience relates to Luca (2016), who finds that a higher rating on Yelp.com raises a restaurant’s revenue unless it is a national chain. Online reviews also have a lower impact on video game sales (Zhu and Zhang 2010) and DVD/Blu-ray sales (Ho-Dac et al. 2013) when the product is a so-called “strong brand.” Importantly, these results focus on a generalizable impact of social media, but there is still wide heterogeneity between brands and even between stores of the same brand. Much of the previous literature showed how social media influences product sales; we show that social media about a retail brand can also affect foot traffic. Our results also demonstrate how sensitive these empirical effects are to the way social media is measured.

Since we find wide evidence of store and brand heterogeneity, marketing managers should evaluate the distinct, and sometimes opposing, ways social media activity influences their own customers and stores. We also suggest marketing managers can use publicly available social media data to forecast retail foot traffic. Our results show that activity on Twitter can serve as a leading indicator of future store visits, suggesting a direction of causality. The demand signal is most useful for anticipating next-day visits; however, it largely dissipates by 7 days. Finally, brand managers should monitor the less-common measure of sentiment spread (i.e., disagreement), which we find is more important than sentiment itself.

This article also contributes to diverse academic fields. In economics, the research shows how social media provides a demand signal to monopolistically competitive firms. In marketing, the article demonstrates that using hierarchical data in a nonhierarchical model can lead to overstated claims in how the predictors are associated with the outcome variable. In communications, we describe and formulate methods for considering distinct ways to quantify social media activity in terms of measures, weights, and time horizons. We also believe hierarchical methods deserve more attention in the economics and marketing literature, where there are many opportunities to measure how variables on one level affect the outcome on another level. Some laudable exceptions, however, include Aiello and Bonanno (2018) assessing the profitability of banks nested within local markets and Keller et al. (2019) assessing the effect of a promotion event on sales of brands nested within retailers.

Conclusion

The first aspect of our research question was to identify which measures of social media activity lead to changes in retail foot traffic. We find that disagreement among brand mentions on Twitter has a moderate impact on store traffic, whereas sentiment measured per tweet has no discernible effect. Although the effect of brand disagreement is rather consistent, brand sentiment becomes positive and meaningful in magnitude and significance when defined on a per-like or per-follower basis. Subjectivity has no discernible influence. We show that the strongest effect of social media activity on foot traffic comes from the number of tweets that mention a brand, followed by the number of likes received by those tweets. However, increases in the number of followers of users who mention a brand have no discernible effect. Finally, the recency of posts matters. Same-day social media activity around retail brands has minimal impact on same-day store visits, but the effect is greatest within 1 to 3 days and negligible by a week. These results support the Social Impact Theory, showing how the number of sources, the immediacy of a source, and, to a lesser extent, the strength of a source all lead to greater changes in consumer behavior.

The second aspect of our research question was to estimate the marginal impact of changes in social media activity on changes to the number of retail visits. We first identify the wide heterogeneity between the ways traffic to individual brands and individual stores responds to changes in online chatter about a brand. We then control for that heterogeneity using a hierarchical linear model (HLM). The resulting elasticities provide better estimates than could be obtained from ordinary linear regression, which is based on overly strong assumptions of error independence among nested observations.

For example, we find that a one standard deviation increase in brand mentions or disagreement will increase foot traffic the next day to stores of those brands by about 0.04 standard deviations (3–4%). Although this may appear small, it is economically significant when considering all stores of a national brand. Overall, the results point to a modest but meaningful relationship between social media activity and foot traffic to retailers.
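Standardized effects can be translated back into raw visits once a store's mean and standard deviation of visits are known. The helpers below are a hypothetical illustration of that arithmetic and of scaling a per-store effect to a whole chain. The section does not state how the 3–4% figure was derived, so this is one plausible conversion rather than necessarily the authors', and all input values are made up.

```python
def sd_effect_to_percent(beta_sd, sd_visits, mean_visits):
    """Percent change in visits implied by an effect of beta_sd standard deviations."""
    return 100.0 * beta_sd * sd_visits / mean_visits


def chain_extra_visits(beta_sd, sd_visits, n_stores):
    """Extra daily visits across a chain if every store shifts by beta_sd SDs.

    Assumes, for illustration only, that all stores share the same visit SD.
    """
    return n_stores * beta_sd * sd_visits
```

For example, with a hypothetical visit SD of 20 and mean of 25 per day, a 0.04 SD effect corresponds to a 3.2 percent change; across 1,000 such stores it adds about 800 visits per day.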

We provide improved global estimates for recognizable brands outside of our sample; however, the diversity we observe should serve as a sober reminder that all measures represent average effects. Therefore, we recommend wider application of hierarchical linear models to nested data, such as product sales within stores. At the same time, we recognize there are limitations in how average effects can be applied to business situations with wide heterogeneity, such as our research problem. Because of this, we recommend reproducing the methodology outlined here to estimate the unique way social media can impact other organizations. This paper shows how Twitter, being publicly available, is a useful source of large-scale, online word-of-mouth communication that can signal near-future consumer demand.

There are multiple ways to extend this paper. First, future work could test whether these results hold across multiple platforms, each with its own unique set of consumers and rules on how users interact and share information. It would be notable if the magnitudes outlined here also applied to a larger, more private virtual space like Facebook. Second, it would be meaningful to see whether the described elasticities hold for lower-recognition retail brands. Luca (2016) suggests that local brands are more susceptible to online reviews; future research could test whether this also applies to other online formats. Third, although we did not find any long-term effect of social media on retail foot traffic, our analysis was limited to daily measures. An expanded model could consider the impact of accumulated social media impressions on store visits. Finally, we built a framework based on the Social Impact Theory to measure the directional influence of social media on foot traffic. Additional research could relax this perspective and consider whether retail visits lead to more activity on Twitter.