1 Introduction

Human immunodeficiency virus (HIV) is a serious and challenging public health concern affecting an estimated 38 million people globally [1]. Although the infection cannot currently be cured, recent research suggests that regularly monitoring personal health information (e.g., medication adherence, CD4 counts, and viral loads) may improve the health and wellness of people living with HIV. Likewise, preventive measures delivered through informational resources can potentially reduce the risk of acquiring HIV. Affected people, their partners, and family members use various sources (e.g., primary care providers, community support groups, printed material, and online channels) to seek guidance and support about HIV. Furthermore, information disseminated through some of these resources also targets the general public.

HIV surveillance systems play a significant role in public health monitoring. They inform and support HIV prevention and intervention initiatives such as the UNAIDS 90–90–90 targets and Ending the HIV Epidemic: A Plan for America by the U.S. Department of Health and Human Services (HHS). Although HIV surveillance systems have progressed considerably over the years, several key challenges remain. The traditional data sources for monitoring HIV are often inaccurate, incomplete, unrepresentative, and partially disconnected from each other [2, 3]. These shortcomings limit the ability to track, report, and respond to emerging HIV epidemiological trends in a timely and effective manner.

Recent advances in, and the prevalence of, digital technologies have greatly expanded the opportunities for sharing and seeking public health information. By aggregating and analyzing data from various sources (e.g., social media, search engines, mobile phones, location services, crowdsourcing, and wearable devices), digital surveillance complements traditional public health surveillance systems [3,4,5]. The distinctive properties of digital data (e.g., real-time availability, organic generation, and high volume) from location logs, web searches, and online reviews have supported the tracking, prediction, and prevention of diseases [6, 7] as well as other critical public health concerns such as substance abuse [8], public attitudes towards vaccination [9], and self-injurious thoughts and behaviors [10]. Similarly, social media has recently emerged as a significant source for digital surveillance. Platforms such as Twitter, Facebook, Yelp, and Instagram offer “novel settings” and have been increasingly used for communicating and understanding numerous public health and wellness issues. The wide range of information and resources on these platforms facilitates social interactions between care providers and patients, and among members of the public, around various health issues. Given these promising contributions to public health surveillance, there is growing interest in developing tools and techniques that use digital data originating outside the public health system, which usually relies on hospital, outpatient, or laboratory-based systems.

New media technologies, including online forums, computerized counseling tools, gamified interventions, specialized portals (e.g., CDC HIV Resource Library, WHO AIDS, and HIV.Gov), and social media, facilitate the delivery of a wide spectrum of information on HIV, STDs, and AIDS. These digital solutions offer evidence-based HIV education, preventive measures, testing promotions, and treatment and adherence programs [4, 11,12,13]. Recent research on HIV prevention and care points to a significant rise in people seeking, sharing, and engaging with information on social media [14, 15]. Within the context of social media surveillance for HIV, most of the work focuses on developing and testing interventions aimed at risk reduction and prevention [16, 17]. Regarding vulnerable or high-risk groups, statistics reveal that the “risk of acquiring HIV is 26 times higher among gay men or other men who have sex with men (MSM), 29 times higher among people who inject drugs, 30 times higher for sex workers, and 13 times higher for transgender people” [1]. Studies of social media interventions tailored for vulnerable, high-risk, and underserved communities show promising potential for disseminating information and delivering effective interventions for HIV care and prevention [18].

Given the global prevalence of social media and its emerging role in public health surveillance, the current study investigated HIV-related discussions on social media. To advance the current understanding of social media conversations about HIV, our study addresses the following research aims: (1) determine the characteristics of the most engaging tweets; (2) understand the salient events that drove engagement; and (3) identify the prevalent HIV-related discussions on Twitter. Answering these research questions will enable us to draw relevant implications and facilitate subsequent discussion regarding HIV surveillance systems.

2 Methods

The current study consisted of three stages. In the first stage, we used the Twitter API to collect tweets containing the hashtag #HIV over a one-year period (November 12, 2018, to November 12, 2019). In total, we collected 160,658 tweets together with associated metadata (full tweet text, numbers of retweets and favorites for each tweet, numbers of followers and friends, and user geolocation). The retrieved data were then cleaned by removing duplicate tweets (n = 2,948), tweets containing only URLs, mentions, or hashtags (n = 10,023), and non-English tweets (n = 24,880). The cleaning process resulted in an analytical sample of 122,807 tweets.
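The three cleaning filters can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: it assumes each tweet is a dict carrying the `full_text` and `lang` fields returned by the Twitter API v1.1 in extended mode, treats two tweets as duplicates when their stripped text is identical, and uses a simple regular expression (an assumption) to decide whether a tweet contains nothing but URLs, mentions, and hashtags.

```python
import re

# A token is "non-text" if it is a URL, an @mention, or a #hashtag (assumed patterns)
NON_TEXT_TOKEN = re.compile(r"^(?:https?://\S+|@\w+|#\w+)$")


def only_urls_mentions_hashtags(text):
    """True if every whitespace-separated token is a URL, mention, or hashtag."""
    return all(NON_TEXT_TOKEN.match(token) for token in text.split())


def clean_tweets(tweets):
    """Drop duplicates, URL/mention/hashtag-only tweets, and non-English tweets,
    in that order, and count how many tweets each filter removed."""
    removed = {"duplicates": 0, "links_mentions_hashtags_only": 0, "non_english": 0}
    seen = set()
    kept = []
    for tweet in tweets:
        text = tweet["full_text"].strip()
        # Duplicate = identical text (an assumption; tweet IDs could be used instead)
        if text in seen:
            removed["duplicates"] += 1
            continue
        seen.add(text)
        if only_urls_mentions_hashtags(text):
            removed["links_mentions_hashtags_only"] += 1
            continue
        # Rely on the API's language tag to keep English tweets
        if tweet.get("lang") != "en":
            removed["non_english"] += 1
            continue
        kept.append(tweet)
    return kept, removed


sample = [
    {"full_text": "Get tested today #HIV", "lang": "en"},
    {"full_text": "Get tested today #HIV", "lang": "en"},
    {"full_text": "#HIV https://example.org", "lang": "en"},
    {"full_text": "Hazte la prueba #HIV", "lang": "es"},
]
kept, removed = clean_tweets(sample)
```

Applying the filters in this order follows the sequence described above. The reported counts are also internally consistent: 160,658 − (2,948 + 10,023 + 24,880) = 160,658 − 37,851 = 122,807.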

In the second stage, we pre-processed the analytical sample to transform the data into a form that can be reliably used for further analysis. Standard techniques, including stop-word removal, basic normalization, and lemmatization, were applied to the tweets. Furthermore, URLs, hashtags, and non-printable characters were removed from the corpus.
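A dependency-free sketch of this pre-processing step is shown below. The stop-word list is a small illustrative assumption, and lemmatization is left out to keep the example self-contained; in practice, a lemmatizer such as NLTK's `WordNetLemmatizer` could be applied to each token. Hashtags are extracted before they are stripped, since the hashtag frequency analysis relies on them.

```python
import re

# Illustrative stop-word list (an assumption); a full list would be used in practice
STOP_WORDS = {"a", "an", "and", "are", "be", "for", "in", "is", "it", "of",
              "on", "the", "this", "to", "with", "you", "your"}
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def preprocess(text):
    """Return (tokens, hashtags) for one tweet; lemmatization is omitted here."""
    # Remove URLs first so that a "#fragment" inside a link is not read as a hashtag
    text = URL_RE.sub(" ", text)
    # Keep the hashtags for the frequency analysis, then strip them from the text
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(text)]
    text = HASHTAG_RE.sub(" ", text)
    # Replace non-printable characters (tabs, newlines, control codes) with spaces
    text = "".join(ch if ch.isprintable() else " " for ch in text)
    # Basic normalization: lowercase and keep ASCII alphanumeric runs only
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS], hashtags


tokens, hashtags = preprocess(
    "Know your status!\tGet tested today #HIV #KnowYourStatus https://example.org"
)
```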

In the final stage, we conducted multiple analyses to answer our research questions. An engagement analysis provided insights into the prominent users and descriptive characteristics of the top 20 tweets in terms of their numbers of retweets and favorites. A temporal analysis charted the volume of tweets per week. The hashtags separated during the pre-processing stage were used to identify the 20 most frequently used hashtags in the dataset.

Finally, to infer the salient topics in HIV-related conversations on Twitter, we applied Latent Dirichlet Allocation (LDA), an unsupervised generative modeling technique [19]. LDA is a prominent topic-modeling approach that represents each document (here, a tweet) as a mixture of latent topics, each characterized by a probability distribution over words. Within each topic, the words bearing the highest weights provide an overview of its content. The algorithm requires the number of topics as an input, which we selected on the basis of model coherence scores. The topic coherence score evaluates a topic by calculating the degree of semantic similarity between its high-scoring words (i.e., words with a high probability of occurrence in that topic). A higher coherence score indicates a better model, because the words within each topic are more similar to one another, and it is therefore used to determine the optimal number of topics [20]. For model selection, we trained multiple LDA models to find the one that combined a high coherence score with a small number of topics. An elbow plot was used to pick the model with the highest score before the curve flattens (see Fig. 1). Our final model specified 8 topics, as this offered an optimal balance between a high coherence score and the number of topics.
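The paper does not state which LDA implementation was used; gensim's `LdaModel` and scikit-learn's `LatentDirichletAllocation` are common choices. To show what the model estimates, the sketch below implements LDA with collapsed Gibbs sampling, an inference method chosen here purely for illustration. The hyperparameters (`alpha`, `beta`, `iterations`) are hypothetical defaults, and documents are assumed to be the token lists produced by pre-processing.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (hyperparameter defaults are assumptions).

    docs: list of token lists. Returns (vocab, phi, theta), where phi[k][v] is
    the weight of word vocab[v] in topic k and theta[d][k] is the share of
    topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic
    ndk = [[0] * K for _ in docs]
    nkw = [[0] * V for _ in range(K)]
    nk = [0] * K

    # Start from a random topic assignment for every token
    z = []
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(K)
            assignments.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling its topic
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Smoothed point estimates of the topic-word and document-topic distributions
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


docs = [
    ["prep", "cost", "side", "effects"],
    ["stigma", "status", "people"],
    ["prep", "side", "effects", "access"],
    ["stigma", "people", "treatment"],
]
vocab, phi, theta = lda_gibbs(docs, num_topics=2, iterations=50, seed=1)
```

`phi[k]` holds the word weights that characterize topic k, and `theta[d]` holds tweet d's topic proportions; these are the two quantities inspected when reading and labeling topics.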

Fig. 1 Topic model graph with coherence scores and number of topics
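The coherence-based selection summarized in Fig. 1 can be approximated with the sketch below. The paper does not name its coherence measure, so this uses the UMass coherence measure as a stand-in: for a topic's top words ordered by decreasing probability, it sums log((D(w_i, w_j) + 1) / D(w_j)) over word pairs, where D counts the documents containing the given words. The elbow rule in `pick_num_topics` and its `min_gain` threshold are hypothetical, and the scores in the usage lines are invented for illustration rather than taken from the study.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic's top words, ordered by decreasing probability.

    Assumes every top word occurs in at least one document, which holds when
    the words come from a model trained on the same documents.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain all of the given words
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log(
                (doc_freq(top_words[i], top_words[j]) + 1) / doc_freq(top_words[j])
            )
    return score


def pick_num_topics(scores, min_gain=0.01):
    """Hypothetical elbow rule: return the smallest number of topics after which
    adding more topics raises coherence by less than min_gain."""
    ks = sorted(scores)
    for k, k_next in zip(ks, ks[1:]):
        if scores[k_next] - scores[k] < min_gain:
            return k
    return ks[-1]


docs = [["prep", "cost", "side"], ["prep", "side", "access"],
        ["stigma", "status"], ["stigma", "people", "status"]]
coherent = umass_coherence(["stigma", "status"], docs)   # log(3/2)
incoherent = umass_coherence(["prep", "stigma"], docs)   # log(1/2)

# Hypothetical coherence per number of topics (higher is better; not the study's values)
scores = {2: 0.31, 4: 0.40, 6: 0.46, 8: 0.465, 10: 0.47}
best_k = pick_num_topics(scores)  # 6
```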

One of the authors labeled each topic by reviewing its 10 highest-weighted words and the 20 tweets with the highest probability for that topic. The topic labels were then refined until all authors reached consensus. Following Twitter’s privacy policies and terms of use, usernames and tweets were not reported verbatim in the current study.
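Topic labeling itself was a manual, consensus-based step, but the material reviewed for each topic can be extracted mechanically. The helpers below are a sketch that assumes the fitted model is available as nested lists: `phi` (topic-word weights) and `theta` (tweet-topic probabilities). The function names and the toy values are hypothetical.

```python
def top_words(phi_row, vocab, n=10):
    """The n words with the largest weights in one topic, highest first."""
    order = sorted(range(len(vocab)), key=lambda v: phi_row[v], reverse=True)
    return [vocab[v] for v in order[:n]]


def top_tweets(theta, tweets, topic, n=20):
    """The n tweets with the highest probability for the given topic."""
    order = sorted(range(len(tweets)), key=lambda d: theta[d][topic], reverse=True)
    return [tweets[d] for d in order[:n]]


# Toy model output (hypothetical values)
vocab = ["care", "prep", "status", "stigma"]
phi = [[0.10, 0.05, 0.35, 0.50],   # topic 0
       [0.30, 0.60, 0.05, 0.05]]   # topic 1
theta = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
tweets = ["tweet A", "tweet B", "tweet C"]

print(top_words(phi[0], vocab, n=2))            # ['stigma', 'status']
print(top_tweets(theta, tweets, topic=1, n=2))  # ['tweet B', 'tweet C']
```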

3 Results

3.1 Engagement analysis

Engagement with the #HIV tweets was assessed along two dimensions: favorite and retweet counts. Favorites (or likes) and retweets (or shares) are robust measures of active user participation on social media [21, 22]. To determine the most engaging tweets within the analytical sample, we identified the 20 tweets with the highest numbers of favorites and retweets (see Table 1). Additionally, the user type and location of each account were recorded manually by visiting each of the Twitter profiles in the list. Among the most retweeted tweets, the largest share came from users based in Europe (n = 9), followed by North America (n = 4). Likewise, among the most favorited tweets, the largest shares came from users based in Europe (n = 12) and North America (n = 4). Concerning user type, activists accounted for the most top tweets by retweets (n = 7) and by favorites (n = 8), followed by physicians and personal accounts.
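Ranking tweets by engagement is a simple sort. The sketch below assumes tweet dicts with the Twitter API v1.1 `retweet_count` and `favorite_count` fields, plus hypothetical `region` and `user_type` fields standing in for the attributes recorded manually from each profile. Retweets and favorites are ranked separately, and the attributes of each top list are then tallied.

```python
from collections import Counter


def most_engaging(tweets, metric, n=20):
    """The n tweets with the highest value of `metric`, highest first."""
    return sorted(tweets, key=lambda t: t[metric], reverse=True)[:n]


def tally(tweets, field):
    """Count a manually recorded attribute (e.g. region or user type)."""
    return Counter(t[field] for t in tweets)


# "region" and "user_type" are hypothetical stand-ins for manually recorded attributes
tweets = [
    {"id": 1, "retweet_count": 120, "favorite_count": 300, "region": "Europe", "user_type": "activist"},
    {"id": 2, "retweet_count": 80, "favorite_count": 500, "region": "North America", "user_type": "physician"},
    {"id": 3, "retweet_count": 15, "favorite_count": 40, "region": "Europe", "user_type": "personal"},
]
top_retweeted = most_engaging(tweets, "retweet_count", n=2)
top_favorited = most_engaging(tweets, "favorite_count", n=2)
print(tally(top_retweeted, "region"))  # Counter({'Europe': 1, 'North America': 1})
```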

Table 1 Top 20 tweets with the highest numbers of retweets and favorites

3.2 Temporal analysis

The temporal analysis provided deeper insight into the volume of #HIV tweets over the one-year period and helped us understand the triggers of increased activity. Within the data corpus, we were able to identify and demarcate some of the major events behind the peaks. Key events to which Twitter users responded actively include World AIDS Day, National Black HIV/AIDS Awareness Day, National Women and Girls HIV/AIDS Awareness Day, and National HIV Testing Day (see Fig. 2).

Fig. 2 Temporal Analysis of #HIV: Days view
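The tweet-volume series behind Fig. 2 can be built by bucketing timestamps. This sketch parses the `created_at` format used by the Twitter API v1.1, counts tweets per UTC calendar day and per ISO week, and flags peak days with a z-score rule. The choice of UTC and the `z = 2.0` threshold are assumptions for illustration; the study does not describe an automatic peak rule.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev

# created_at format used by the Twitter API v1.1, e.g. "Sat Dec 01 14:05:00 +0000 2018"
CREATED_AT = "%a %b %d %H:%M:%S %z %Y"


def daily_and_weekly_counts(timestamps):
    """Tweet counts per calendar day (in each timestamp's zone) and per ISO (year, week)."""
    days = Counter(ts.date() for ts in timestamps)
    weeks = Counter(tuple(ts.isocalendar())[:2] for ts in timestamps)
    return days, weeks


def peak_days(days, z=2.0):
    """Days whose count exceeds the mean by more than z population standard
    deviations (hypothetical rule). Days with no tweets are absent from `days`
    and are not included in the mean."""
    values = list(days.values())
    threshold = mean(values) + z * pstdev(values)
    return sorted(day for day, count in days.items() if count > threshold)


raw = ["Sat Dec 01 14:05:00 +0000 2018"] * 10 + [
    f"{dow} Dec {day:02d} 10:00:00 +0000 2018"
    for dow, day in [("Sun", 2), ("Mon", 3), ("Tue", 4), ("Wed", 5), ("Thu", 6), ("Fri", 7)]
]
timestamps = [datetime.strptime(s, CREATED_AT) for s in raw]
days, weeks = daily_and_weekly_counts(timestamps)
peaks = peak_days(days)  # [datetime.date(2018, 12, 1)]
```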

3.3 Content analysis

Based on the LDA model, we identified 8 topics in HIV discussions on Twitter. The first topic relates to HIV-associated stigma and contains keywords such as “people”, “stigma”, “uequalsu”, “treatment”, and “status”. Topic 2 focuses on HIV prevention and care facilities, with keywords such as “care”, “prevent”, “treatment”, “support”, “women”, and “communities”. Keywords in Topic 3, including “epidemic”, “people”, “countries”, and “fund”, capture commentary on the HIV situation in developing countries and their need for support. Tweets in three topics draw on activities and events, including World AIDS Day, HIV Testing Day, and academic conferences. Finally, two topics (5 and 7) relate to tweets about HIV treatment and cure and to discussions about pre-exposure prophylaxis (PrEP). Table 2 displays each topic together with the words identified by the topic model and a representative tweet.

Table 2 Topic clusters identified through topic modeling

Computing hashtag frequencies within the dataset provided further insight into the key themes being discussed in relation to HIV. One theme concerns diseases related to HIV, including #AIDS, #STI, #STD, and #TB. Another theme relates to campaigns and events, including #WorldAidsDay, #UequalsU, and #KnowYourStatus. Discussions of prevention and care can be observed in #PrEP and #HIVPrevention. Finally, tweets relevant to high-risk communities that are often marginalized and stigmatized are represented by #LGBTQ and #Gay.
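Counting hashtags across tweets can be sketched with `collections.Counter`. Case-folding (an assumption) merges variants such as #PrEP and #PReP, and the optional `exclude` argument (hypothetical) allows the query hashtag #HIV, which appears in nearly every collected tweet, to be left out of the ranking.

```python
from collections import Counter


def top_hashtags(hashtag_lists, n=20, exclude=()):
    """The n most frequent hashtags across tweets. Case-folding and the
    `exclude` option are assumptions, not steps reported in the study."""
    excluded = {tag.casefold() for tag in exclude}
    counts = Counter(
        folded
        for tags in hashtag_lists
        for folded in (tag.casefold() for tag in tags)
        if folded not in excluded
    )
    return counts.most_common(n)


per_tweet = [["HIV", "PrEP"], ["HIV", "PReP", "WorldAidsDay"], ["HIV", "KnowYourStatus", "PrEP"]]
ranking = top_hashtags(per_tweet, n=2, exclude=["HIV"])
print(ranking)  # [('prep', 3), ('worldaidsday', 1)]
```

Ties are broken by first appearance, because `Counter.most_common` orders elements with equal counts in the order they were first encountered.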

4 Discussion

Harnessing data from new media platforms for epidemiological research is an emerging approach that complements traditional public health surveillance systems. Real-time discussions and trends on social media can provide timely reflections of critical public health issues that can facilitate prevention, control, and treatment efforts. Several studies have used social media data to detect and assess the spread of pandemics and infectious disease outbreaks [23, 24], to deliver interventions [25], and to assess public attitudes, behaviors, and concerns towards health and wellness topics [26,27,28,29]. For instance, Fung et al. (2017) studied global health-related tweets about malaria, HIV, tuberculosis, and other tropical diseases, showing conversations among advocates, policymakers, and health professionals [30].

Seminal scholarly work has also shown the utility of Twitter for a broader understanding of public opinions and sentiments to support the prevention, care, and treatment of various diseases, including diabetes and flu [31,32,33]. Within the context of social media surveillance for HIV, most of the work focuses on developing and testing interventions aimed at risk reduction and prevention [16, 17, 34]. Other studies have examined major annual events to understand how such global issues are highlighted. A case study of HIV/AIDS-related tweets around World AIDS Day found differences in user responses associated with variations in income across countries [35]. In the current study, we extend the body of knowledge within the public health domain by examining the engagement factors, the reasons for surges in activity, and the key topics being discussed about HIV on Twitter, a highly popular social media platform.

Based on the temporal analysis, we identified the triggers (major events and influencers) behind the amplified tweet activity (see Fig. 2). From the engagement perspective, tweets from influencers received higher numbers of retweets and favorites. In addition to World AIDS Day and HIV Vaccine Awareness Day, which are observed to raise awareness about the HIV/AIDS pandemic, many other annual campaigns dominated the tweets and engagement activity. The events detected through the activity peaks are predominantly US-based awareness campaigns targeting specific demographics such as women, Black people, transgender people, and seniors. Given the effectiveness of social media in reaching high-risk and stigmatized populations [15, 36,37,38], promoting HIV, AIDS, and STD/STI-related campaigns through Twitter can raise awareness among these cohorts and help deliver subsequent interventions.

Concerning the engagement analysis (retweet and favorite counts), the most engaging tweets were sent from user accounts based in Europe and North America. Tweets from HIV/AIDS/LGBTQ rights campaigners and feminist activists were highly engaging, as were tweets from physicians and employees of public health agencies. Despite the large follower bases of global and local organizations, only a handful of them received high engagement. This finding may suggest that Twitter users endorse and trust individuals working to safeguard the interests and rights of people affected by HIV and AIDS. Organizations involved in HIV/AIDS awareness and activism might therefore benefit from teaming up with prominent figures such as leading activists and celebrities to project their messaging to a wider audience. Given the strong potential to reach an extensive and diverse audience through these platforms [39], greater social media participation is needed from entities working on HIV and AIDS awareness and treatment in developing countries, as well as from media organizations.

Results from the topic modeling reveal a wide range of conversations about HIV on Twitter. The identified topics can be broadly categorized under five key themes: events and activities, preventive measures, treatment options, stigma, and calls for additional resources. Campaigns and events such as World AIDS Day, HIV Testing Day, and U = U received considerable attention. In these tweets, individuals and other entities encouraged their followers to know their status by getting tested for HIV. This finding also links to the temporal analysis, in which testing and awareness days for at-risk and marginalized communities such as transgender people, women, and seniors were observed. Treatment options are another significant theme, focusing on antiretroviral therapy (ART), care locations, and the latest advances in scientific research. Although no cure for HIV is currently available, diagnosis and proper medical care can slow the progression of HIV in the human body.

Another important theme relates to tweets about various preventive measures that may slow the spread of HIV. Within the prevention and care facilities topic, many tweets point out different sites for testing and clinical care. Several tweets also discuss individual counseling, screenings, and access to biomedical interventions such as PrEP. Furthermore, several tweets refer to self-management tools, games and gamified solutions, and smartphone apps for the prevention and treatment of HIV. These findings echo prior research highlighting the potential of new media technologies for disseminating innovative interventions and building virtual communities, particularly among key populations such as sex workers, prisoners, transgender people, and people who inject drugs (PWID) [5, 40, 41]. Similarly, social media platforms have been deemed effective for promoting general awareness, behavioral modification, and HIV testing [34, 42, 43].

PrEP is another relevant theme, covering various medications and related concerns about their costs and side effects, alongside other preventive measures. Although PrEP is a relatively new mode of HIV prevention, it is regarded as one of the most significant breakthroughs in HIV prevention. Recent studies suggest that, in addition to associated stigma, accessibility, and sociocultural barriers, high costs and concerns about side effects are some of the key barriers responsible for the slow uptake of PrEP [44, 45]. Even though some of these concerns are being discussed on Twitter, there is a strong need to use social media more effectively to counter these concerns and conspiracy-related beliefs regarding PrEP. Several studies indicate that social media can facilitate PrEP uptake and care, including treatment initiation, monitoring, adherence, viral suppression, and linkage to care facilities [5, 46, 47]. Likewise, social media interventions targeted towards populations at high risk of HIV acquisition (e.g., transgender people, MSM, and youth) can be an effective way to reach them.

Finally, stigma is another topic of great significance observed in the dataset. Despite numerous efforts ranging from the global to the community scale, HIV/AIDS stigma remains widespread and is often bundled with misinformation. Stigma imposes hardships not only on individuals living with HIV but also on their loved ones and communities. Social media platforms such as Twitter provide social (e.g., networking, communal events), emotional (e.g., seeking empathy and voicing emotional states), and informational support (e.g., requesting and sharing information and resources) that can help people cope with the stigma associated with HIV. Furthermore, these platforms can effectively reach and serve people at higher risk of HIV, such as bisexual and gay people and other vulnerable groups, in various world regions [1, 15, 36,37,38]. There is also a need to understand the platform policies, legal frameworks, and societal norms that may lead to varying levels of discussion on certain issues such as HIV. For example, Youku, Weibo, and WeChat in China and VK in Russia operate under different content-sharing models. HIV-positive people use social media to create social ties and build a sense of community, for example by accessing and sharing health information and obtaining emotional support from others in similar circumstances [36, 48]. Likewise, these platforms can help support treatment adherence and access to prevention and testing services. At the same time, many tweets within the stigma topic emphasize the de-stigmatization of people living with HIV and address misinformation about the causes and spread of the virus. One such example revealed during the analysis is the U = U (Undetectable = Untransmittable) campaign, which aims to dismantle HIV stigma by encouraging testing and treatment [1]. Given that individuals with a stably suppressed viral load cannot transmit the virus sexually, social media may further reinforce this scientific consensus and refute associated misinformation.

4.1 Limitations and future directions

The current study focuses solely on Twitter, a platform prevalent among adults. Future research on HIV and AIDS should investigate sharing behaviors, engagement patterns, and conversations on other social media platforms popular among youth and older adults, such as Instagram, YouTube, Pinterest, and Facebook. Restricting the analysis to a single language is another limitation. Because the current study analyzed English-only tweets, there are likely other engaging tweets from users based in non-English-speaking countries. Including non-English tweets (e.g., in Chinese, Russian, Spanish, Arabic, and Urdu) may yield a richer understanding of salient topics and issues. Likewise, comparative studies of social media platforms popular among particular communities and regions may provide further insights. Big data analytics may not capture nuances in the communication styles of users from different geographical regions or personal backgrounds. Given this limitation, qualitative studies of social media data may extend our understanding of why specific groups communicate about HIV in particular ways. Because social media, notably Twitter, is being used actively and deliberately to spread fake news and misinformation (e.g., about vaccine safety and efficacy), researchers should investigate the extent of HIV-related misinformation on these platforms. There is a further need to understand how effectively hospitals, health agencies, and HIV/AIDS associations leverage social media to dispel fears and myths about HIV in general, its transmission routes, and antiretroviral-based treatment and prevention options such as highly active antiretroviral therapy (HAART) and pre-exposure prophylaxis (PrEP).

4.2 Conclusions

The current study applied a data-driven approach to publicly available Twitter data on HIV to understand the conversations and the key entities and events driving engagement. Findings from the current study provide actionable insights that may support enhancing awareness and prevention messaging, developing testing services, bolstering PrEP uptake, and optimizing clinical and community interventions. The wide adoption of social media and its use for health purposes as a participatory and interactive source hold significant implications for HIV education, care, and prevention efforts. Social media holds great potential for digital surveillance and interventions related to HIV because it makes it easier to reach diverse, underserved, ethnic minority, and high-risk groups. Effective use of social media to offer relevant, timely, and factual information and interventions can be highly cost-effective and may help change individual behaviors and social norms about HIV.