1 Introduction

With the rapid development of the Internet, crowdfunding has emerged as a new way of raising funds (Mollick 2014). Zhang and Chen (2019) defined crowdfunding as a fundraising method in which individuals, groups or companies raise funds from the public through the Internet (Cox et al. 2018; Kim et al. 2017). On Kickstarter, the total amount of funds raised has exceeded US$5.2 billion, and the total number of participants has exceeded 18.47 million (Kickstarter.com August 2020). Crowdfunding offers a new fundraising channel for startups, which often face insufficient resources in their early stages (Cosh et al. 2009; Kim et al. 2017). Unlike traditional fundraising channels, it also lets fundraisers build relationships with investors on the platform, who may eventually become partners or customers. Moreover, entrepreneurs can use crowdfunding to involve customers in the development of new products, so as to improve the products and better meet consumer needs (Petruzzelli et al. 2019).

However, the success rate of crowdfunding is low. According to Kickstarter statistics from August 2020, the overall success rate is only about 37.97%, which shows that launching a crowdfunding project is far from a guarantee of success. Consequently, how to improve the success rate of crowdfunding projects is one of the major concerns of fundraisers, and many researchers have paid attention to this issue. For instance, Liang et al. (2020) found that the number of pictures, the number of videos, the number of comments, and the number of updates have a positive impact on the success of crowdfunding, whereas readability has a negative impact. Fernandez-Blanco et al. (2020) found that comments and updates are helpful to the development of crowdfunding projects. Wang et al. (2017) argued that the text content of the crowdfunding project description affects the decision-making of supporters. Lagazio and Querci (2018) reported that detailed text content is more effective than introduction videos. Zhou et al. (2018) found that the length, readability, and semantics of the project description, as well as the fundraiser’s past fundraising experience, affect the success of fundraising. Kim et al. (2017) stated that detailed project descriptions have a positive impact on the success of fundraising projects. These studies show that the project description affects the success of a crowdfunding project, but no research tells fundraisers which success factors should be included in the content of the project description.

Moreover, related works usually use qualitative research methods with data collected through questionnaires. Besides requiring a lot of manpower and time, such surveys may suffer from sampling bias. In recent years, text mining and feature selection have been successfully applied to the huge volume of text reviews in social media (Wang et al. 2017). Feature selection mainly processes numerical data, whereas the natural language processing (NLP) techniques in text mining can process text data. However, if we simply combine feature selection and text mining, we cannot obtain organized factors: the selected terms are difficult to interpret, and the number of extracted features is very large (Chen et al. 2015).

Besides, in terms of total fundraising amount and number of projects, game projects rank first on crowdfunding platforms. Unfortunately, only a few existing studies focus on game crowdfunding. To fill this gap, we present a lexicon-based feature selection method that aims to discover the crucial content features of project descriptions for “video game” and “mobile game” projects.

To sum up, this study collects real video game and mobile game projects as research data. In the proposed method, we define “content features” that may affect the success of game crowdfunding projects based on a literature review, and establish lexicons to determine the values of these features. The natural language processing (NLP) techniques in text mining are used to process the collected text data. Then, three feature selection methods, namely decision trees (DT), Least Absolute Shrinkage and Selection Operator (LASSO), and support vector machine–recursive feature elimination (SVM–RFE), are employed to select candidate key success factors, and finally support vector machines (SVM) are used to evaluate the performance of the candidate feature subsets. The discovered key success factors can give fundraisers a basis for designing crowdfunding projects and help them increase the success rate of video game and mobile game projects.

2 Literature review

2.1 Crowdfunding

Crowdfunding is an innovative fundraising method for many entrepreneurs (Yang et al. 2020; Borrero-Dominguez et al. 2020). Traditional fundraising methods are very difficult for most startups because they cannot propose equivalent values (Ramadani 2009), so investors cannot assess risks (Schwienbacher and Larralde 2010). Unlike traditional fundraising channels, crowdfunding has lower costs and is very suitable for startups, non-profit organizations and individuals (Vanacker and Manigart 2010; Liang et al. 2020). Crowdfunding has now become one of the most important fundraising methods (Cumming et al. 2020).

In crowdfunding, fundraisers initiate projects on platforms, present their products, services or ideas through these projects, and use the power of the Internet to obtain funds (Ziegler et al. 2018; Hollas 2013; Colgren 2014; Mollick 2014; Mollick and Robb 2016). Gerber et al. (2012) argued that crowdfunding can also be used to build relationships and form a social network. While sponsors generally invest to obtain income or returns, some want to gain recognition from others (Bretschneider et al. 2014). The results of crowdfunding can also be considered an indicator of the prospects of a new product or service before it enters the market (Mollick 2014; Meyskens and Bird 2015).

Most studies aim to discover the success or failure factors of crowdfunding projects (Short et al. 2017), but some explore the contribution behavior and motivation of supporters (Xu et al. 2016; Cox et al. 2018; Shneor and Munim 2019). Other scholars have studied game crowdfunding projects. For example, Lax (2017) interviewed crowdfunding creators in three different game industries and found six success factors: project rewards, project goals, the quality of products and projects, project team, fundraiser information and project preparation. Smith (2015) discussed the fundraising process of game fundraising projects and observed common occurrences between developers and players to study how interaction affects product development and production. Some published works (Wang et al. 2017; Kim et al. 2017; Lagazio and Querci 2018; Zhou et al. 2018) have reported that the project description is important for the success of a crowdfunding project, but no available study tells fundraisers which success factors should be included in the content of the project description. Therefore, this study uses real game crowdfunding projects for experiments and analysis, and tries to discover the key content features of game crowdfunding projects.

2.2 Factors of games

In the available literature, relatively few studies focus on game crowdfunding. In order to find the key success factors of game crowdfunding projects, this study looks for potential game-related factors in the literature. For example, Caci et al. (2018) analyzed players of “Pokémon GO”, discussed the interaction between players’ play motivation, personality, and play habits, and studied individual differences in play motivation and personality characteristics. The work of Chan et al. (2017) shows that the information, interactivity, and entertainment of mobile games have a positive impact on the trust and loyalty of players. Hsiao and Chen (2016) identified the factors that affect the perceived value and loyalty of games, including fun, flexibility of play time, interactivity, and rich rewards. Chen et al. (2010) defined 19 online game quality elements for massively multiplayer online role-playing games. Zhao and Fang (2009) showed that online game technical factors (game stories, game images, game duration, game control, and the quality of game service) have a significant impact on the game fun of players; game fun and social norms have a positive impact on game intentions; and social norms, the quality of online game communities and game intentions are important predictors of online game loyalty. Therefore, we used the results of this literature review to build content features.

2.3 Text mining

Text mining can be regarded as a process of editing, organizing, and analyzing a large number of documents. Its main task is to convert text into data for analysis through language analysis and natural language processing (Dreisbach et al. 2019). In this era of information explosion, these large amounts of unstructured or semi-structured text data need to be processed with text mining technology to discover hidden knowledge (Thomaz et al. 2017). Miner et al. (2012) defined the main procedures of text mining as data retrieval and processing, word segmentation, feature selection, classification and clustering, text representation and interpretation.

Nowadays, many studies use text mining to find potential characteristics in the introductions of crowdfunding projects. For example, Wang et al. (2017) analyzed the text content of crowdfunding project descriptions and the emotions of the project founders when writing the descriptions. Du et al. (2015) studied the quality and source credibility of the descriptions of crowdfunding projects, and analyzed their impact on the success of crowdfunding projects. Text mining is also used in various other areas. Loureiro et al. (2020) used text mining technology to analyze the full text of VR and AR-related journals and conference papers. Zhong et al. (2020) combined deep learning and text mining to automatically analyze the hazard records of building construction.

Moreover, traditional questionnaire surveys are prone to experimental effects, whereas the information carried by online text content is more objective, more abundant, and less biased in sampling than questionnaire data (Schuckert et al. 2015). As a result, this study uses text mining methods to process the unstructured text content of crowdfunding projects. However, if we only use text mining approaches that acquire knowledge from a term-document matrix (TDM), we cannot obtain organized factors, and the number of extracted features is very large (Chen et al. 2015). Therefore, this study proposes a lexicon-based feature selection method that uses NLP and lexicons to construct content features from project descriptions.

2.4 Feature selection

Feature selection aims to find important features from the original feature set (Zhao et al. 2020). Using the selected crucial features can improve the prediction accuracy of the classifier and reduce the training time (Devi and Sabrigiriraj 2018; Dash and Liu 1997). Therefore, this study uses three feature selection methods to find, among the project features and content features, the important factors that affect the success of crowdfunding.

2.4.1 Decision trees

DT is an easy-to-understand classification method. It is not only easy to use but can also quickly find rule conditions for high-dimensional data (Yang 2019; Ma et al. 2016; Sok et al. 2016). In this study, the leaf nodes of the rule tree are considered important factors. As one of the best-known feature selection methods, DT has been successfully applied in various fields. For example, Kwon et al. (2020) used decision trees to determine important factors that affect shopping behavior in second-hand stores. Namazkhan et al. (2020) used decision trees to analyze key factors affecting household natural gas consumption. Chen et al. (2021) employed decision trees to select the crucial factors for increasing customer satisfaction from online customer reviews. Consequently, this study uses DT as one of the feature selection methods for crowdfunding projects.

2.4.2 LASSO

LASSO is a regression analysis method that performs feature selection and regularization at the same time. By shrinking some coefficients exactly to zero, it aims to obtain the feature subset that minimizes the prediction error (Tibshirani 1996). LASSO obtains a more refined model by introducing a penalty function: it minimizes the residual sum of squares while constraining the sum of the absolute values of the coefficients to be no larger than a constant (Wang et al. 2018).

LASSO has been successfully applied in various fields. For example, McEligot et al. (2020) used LASSO to determine the most relevant variables that cause breast cancer. Schmidt et al. (2020) utilized LASSO to detect and identify global positioning system (GPS) fraud. Guenther and Sawodny (2019) employed LASSO to determine the important factors affecting the temperature comfort of open offices in Singapore. Therefore, this study uses LASSO as one of the feature selection methods.

2.4.3 SVM–RFE

SVM–RFE is another well-known feature selection technique. It can find the relatively important features among a large number of features. Using the weight vector obtained during training, all features are ranked in descending order, the feature with the smallest coefficient is deleted in each iteration, and the classifier is then retrained and the remaining features re-ranked. These steps are repeated until no features remain, which yields a ranking of all features in descending order of importance (Frank et al. 2006; Witten et al. 2011). SVM–RFE has been successfully applied in many areas. For example, Wang et al. (2019) used SVM–RFE to identify autism. Shao et al. (2017) applied SVM–RFE to predict the price of electricity in electricity market analysis. Liu et al. (2017) used SVM–RFE to locate protein sub-cells in biomedicine. Chang et al. (2020) applied SVM–RFE as one of the feature selection methods to determine the important factors influencing O2O trust. As a result, this study uses SVM–RFE as one of the feature selection methods.

2.5 SVM

SVM (Cortes and Vapnik 1995) is a supervised machine learning model used for classification and regression analysis. Because SVM can be applied to various linear and non-linear classification problems, and has a complete theoretical framework and tools, it has been widely used (Paul et al. 2016). For example, Gamal et al. (2019) utilized SVM with a variety of other classifiers to perform sentiment analysis and classification of Arabic tweets and comments on Twitter. Vijayakumar and Muhammad (2019) employed SVM, NB and maximum entropy with natural language processing methods to identify spam comments on online forums. Krammer et al. (2019) utilized SVM, RBF, and MLP to analyze online comments for abnormal behavior. To sum up, SVM performs well on text classification, so this study uses SVM to evaluate the performance of the selected candidate feature subsets.

3 Proposed methodology

This section introduces the implementation procedure of the proposed lexicon-based feature selection method. The procedure shown in Fig. 1 can be divided into nine steps. First, we collect data from a real crowdfunding website. Next, the “project features” and “content features” are derived from the relevant literature. For the content features, we focus on the aspects related to a game’s success. Then, we set up the lexicons for the content features, and compute the attribute values based on the built lexicons. Later, feature selection methods are utilized to find the candidate feature subsets, and the SVM classifier is used to evaluate the performance of the selected feature subsets. After that, we can identify important features, and make specific suggestions accordingly. The following describes the steps of the method used in this study.

Fig. 1 The implementation procedure of the proposed method

Step 1: Collect data

We use Kickstarter (https://www.kickstarter.com), one of the most popular crowdfunding platforms in the world, as the data source. The scope of the collection is the “video game” and “mobile game” projects under the “game” category of the platform. Accordingly, this research is divided into experiment #1 (video game projects) and experiment #2 (mobile game projects).

A web crawler written in Python is used to collect the data. The crawled data is divided into two parts. The first part is the content of the project descriptions, from which the content features are computed. The second part consists of the project features, including the target amount, video clip, number of updates, number of comments, project duration, fundraising experience, number of pictures, description length, and number of investors.

Step 2: Define factors

This study divides the features into two parts, project features and content features. The project features are derived from the literature on crowdfunding projects. The content features are derived from the literature on games, from which we identify the aspects and words related to the success of a game. For content features, we gather feature-related keywords to build lexicons. Each lexicon also includes the similar words, synonyms, and antonyms of its keywords. Based on the constructed lexicons, we can calculate the values of the features. How the project features and content features are defined is explained below.

Step 2.1 Define project features

Based on the relevant literature on crowdfunding, we identify project features that may affect the success of crowdfunding projects and give them clear definitions, since these features have not yet been used for game crowdfunding projects. The definitions of the project features are shown in Table 1.

Table 1 Project features

Step 2.2 Define content features

The lexicons of content features are constructed from game-related literature. From these published works, we collect relevant keywords, factors, and dimensions, then organize them into content features and representative words. Next, we use the website https://www.thesaurus.com/ to find similar words, synonyms, antonyms and related vocabulary for these representative words to build a lexicon. The defined content features and their lexicons are shown in Table 2. Once the lexicons are established, they are compared with the text of each project description, the frequency of the lexicon words is calculated, and this word frequency is defined as the value of the corresponding content feature.

Table 2 Content features and their lexicons

Step 3: Pre-process collected data

The captured project descriptions are unstructured text data, so they need to be pre-processed before they can be compared with the constructed lexicons. The pre-processing steps are as follows:

Step 3.1: Delete meaningless words, characters, or icons

We delete non-English text, special characters and emoticons.

Step 3.2: Segment sentences

We use uni-gram for word segmentation. The uni-gram method uses a word as a unit to segment the sentence. Take “Nice to meet you” for example. After uni-gram segmentation, we will get “Nice”, “to”, “meet” and “you”.

Step 3.3: Remove stop words

This study uses the stop word list from the Snowball website (http://snowball.tartarus.org/algorithms/english/stop.txt) to delete stop words such as “and”, “or”, and “with”.
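Steps 3.1 to 3.3 can be illustrated with a minimal sketch. It assumes a regular-expression filter for non-English characters, lower-casing of tokens, and a short hypothetical stop word list in place of the full Snowball list; none of these details are specified in this study, which performs the segmentation and stop word removal in QDA Miner.

```python
import re

# Hypothetical, abbreviated stop word list; the study uses the full Snowball English list.
STOP_WORDS = {"and", "or", "with", "to", "the", "a", "of", "you"}

def preprocess(text, stop_words=STOP_WORDS):
    """Return lower-cased unigram tokens without special characters or stop words."""
    # Step 3.1: replace everything except ASCII letters, digits, apostrophes and
    # whitespace (emoticons, punctuation, non-English text) with a space.
    cleaned = re.sub(r"[^A-Za-z0-9'\s]", " ", text)
    # Step 3.2: unigram segmentation, one token per word (lower-casing is an assumption).
    tokens = cleaned.lower().split()
    # Step 3.3: remove stop words.
    return [token for token in tokens if token not in stop_words]

print(preprocess("Nice to meet you 😀, 玩家!"))
```

For the example sentence of Step 3.2, only the tokens that are neither stop words nor special characters remain.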

Step 4: Build term-document matrix

This step is divided into two parts. The first part is to calculate the word frequencies of the built lexicons, which serve as the values of the content features. The second part is to establish the term-document matrix (TDM). In this study, we build TDMs from content features, project features, and all features, respectively.

Step 4.1: Compute attributes’ values of content features

This study uses QDA Miner to process the textual data. We compare the constructed lexicons with the collected data, calculate the occurrence frequency of the lexicon words, and use this frequency as the value of the corresponding content feature. After calculating the word frequencies of all content features, we discretize each content feature: a value above the average is defined as 1, and a value below the average is defined as 0.
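The lexicon matching and discretization of Step 4.1 can be sketched as follows. The two lexicons and the three tokenized documents are hypothetical placeholders (the real lexicons are those of Table 2), and treating a value exactly equal to the mean as 0 is an assumption based on the wording above.

```python
# Hypothetical lexicons; the real ones are listed in Table 2.
LEXICONS = {
    "fun": {"fun", "enjoy", "entertaining"},
    "story": {"story", "plot", "narrative"},
}

def content_feature_counts(tokens, lexicons=LEXICONS):
    """Count, per content feature, how many tokens belong to its lexicon."""
    return {name: sum(1 for token in tokens if token in words)
            for name, words in lexicons.items()}

def discretize_by_mean(rows):
    """Map each value to 1 if it is above that feature's mean over all documents, else 0."""
    names = list(rows[0].keys())
    means = {name: sum(row[name] for row in rows) / len(rows) for name in names}
    return [{name: 1 if row[name] > means[name] else 0 for name in names}
            for row in rows]

# Three hypothetical, already pre-processed project descriptions.
docs = [
    ["fun", "story", "fun", "battle"],
    ["plot", "map"],
    ["enjoy", "fun", "fun", "fun"],
]
counts = [content_feature_counts(doc) for doc in docs]
binary = discretize_by_mean(counts)
print(counts)
print(binary)
```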

Step 4.2: Build TDM and normalize data

We use project features and content features to build the TDM. The output value is Y. If Y equals 1, the project is successful (the raised funds exceed the target amount); if Y equals −1, the project has failed. Then, all collected data are normalized into the interval [0, 1] according to Eq. (1). In this equation, \(X^{\prime}_{n,m}\) is the value to be normalized for document \(d_{m}\) under feature \(X_{n}\), \(X^{\prime}_{n,min}\) is the minimum value over all documents under feature \(X_{n}\), \(X^{\prime}_{n,max}\) is the corresponding maximum value, and \(X_{n,m}\) is the normalized value.

$$ X_{n,m} = \frac{{X^{\prime}_{n,m} - X^{\prime}_{n,min} }}{{X^{\prime}_{n,max} - X^{\prime}_{n,min} }}. $$
(1)

After normalization, we shuffle all the data randomly and run a fivefold cross-validation (CV) experiment, which divides the data into 5 equal parts. In turn, 4 parts are used as the training set and the remaining part as the test set.
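A minimal sketch of the normalization in Eq. (1) and of the shuffled fivefold split is given below. The handling of a constant feature column (mapped to zeros) and the fixed random seed are assumptions made only to keep the example well defined and reproducible.

```python
import random

def min_max_normalize(column):
    """Apply Eq. (1) to one feature column; a constant column maps to all zeros (assumption)."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        return [0.0 for _ in column]
    return [(value - lowest) / (highest - lowest) for value in column]

def five_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and yield (train, test) index lists, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

print(min_max_normalize([10, 20, 40]))
for train, test in five_fold_indices(10):
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every project is used for testing once and for training four times.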

Step 5: Implement feature selection

We use DT, LASSO, and SVM–RFE to select key features. Since we run a fivefold CV experiment, each method yields five feature selection results. We pick the features that are selected at least 3 times across the five folds as the candidate feature subset.
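The fold-voting rule can be sketched as follows, assuming a threshold of at least 3 out of 5 folds as stated in Steps 5.2 and 5.3. The feature names and per-fold selections are hypothetical.

```python
from collections import Counter

def candidate_subset(selected_per_fold, min_votes=3):
    """Keep the features that were selected in at least `min_votes` folds."""
    votes = Counter(feature for fold in selected_per_fold for feature in set(fold))
    return sorted(feature for feature, count in votes.items() if count >= min_votes)

# Hypothetical selections of one method over the five folds.
folds = [
    ["comments", "updates", "fun"],
    ["comments", "updates"],
    ["comments", "story"],
    ["comments", "updates", "story"],
    ["comments"],
]
print(candidate_subset(folds))  # ['comments', 'updates']
```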

Step 5.1: Decision tree

This study uses the C5.0 algorithm to build a decision tree. C5.0 is an intuitive and highly efficient supervised machine learning model that is suitable for classification and regression. It takes up few system resources and little memory, and its execution speed is its major advantage over other machine learning methods. When it is used for feature selection, we treat the leaf nodes of the rule tree as important factors. The steps to build a decision tree are as follows.

(1) Define the input (X) and output (Y) variables of the decision tree.

(2) Establish a decision tree using C5.0 algorithm.

(2.1) Establish an initial rule tree.

(2.2) Prune this rule tree.

(2.3) Select the best-performing rule tree and leaf nodes.
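To illustrate why features that appear in a tree can be read as important, the sketch below computes the entropy-based information gain of two binary features. This is only the kind of split criterion used by the C4.5 family of tree learners, assumed here for illustration; it is not the C5.0 implementation in See 5, and the feature names and values are hypothetical.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * log2(p)
    return result

def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: Y = 1 (successful) or -1 (failed), two binary features.
y = [1, 1, -1, -1]
many_updates = [1, 1, 0, 0]   # separates the classes perfectly
has_video = [1, 0, 1, 0]      # carries no information about the class
print(information_gain(many_updates, y), information_gain(has_video, y))
```

A tree learner would split on the first feature and never use the second, so only the first would show up in the resulting tree.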

Step 5.2: LASSO

LASSO can perform feature selection and regularization at the same time. The main reference formula is shown in Eq. (2), where λ is the adjustment coefficient, K is the number of samples, and x denotes the explanatory variables.

$$ \min \sum_{k = 1}^{K} \left( y_{k} - \beta_{0} - x_{1,k} \beta_{1} - \cdots - x_{g,k} \beta_{g} \right)^{2} ,\quad \text{s.t.} \sum_{i = 1}^{g} \left| \beta_{i} \right| \le \lambda . $$
(2)

According to Eq. (2), the regression parameters \(\beta_{i}\) are restricted by a specific penalty criterion, so that appropriate variables can be selected. Given g explanatory variables, the value of λ affects the estimate of the parameter \(\hat{\beta }\). When λ approaches infinity, the estimate of \(\hat{\beta }\) is no longer restricted by the constraint in Eq. (2). On the contrary, when λ is set to 0, all parameter estimates are zero.

When the value of λ is gradually increased from 0, the coefficients of the explanatory variables x that are strongly correlated with \(y_{k}\) become non-zero first, which indicates that these variables are important. Therefore, in this experiment, a feature is identified as important if its coefficient is non-zero. We perform LASSO on each of the five folds, extract the features with non-zero coefficients from each result, and keep the features that are selected at least 3 times to build a feature subset.
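The selection rule "non-zero coefficient means important" can be illustrated with a small coordinate-descent sketch. It solves the penalized form of LASSO, minimizing (1/(2K)) times the residual sum of squares plus alpha times the sum of absolute coefficients, which is equivalent to the constrained form of Eq. (2) with a larger alpha corresponding to a smaller bound λ. The solver, the name `alpha`, and the toy data (centered, so the intercept is omitted) are assumptions for illustration; the study itself uses Matlab.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; values inside the band become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 for centered X and y."""
    n, g = len(X), len(X[0])
    beta = [0.0] * g
    for _ in range(n_iter):
        for j in range(g):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(g) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return beta

def selected_features(beta):
    """Indices of the features whose coefficient is not zero."""
    return [j for j, b in enumerate(beta) if b != 0.0]

# Toy data: y depends only on feature 0; feature 1 is uncorrelated with y.
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-6, -3, 0, 3, 6]
beta = lasso_coordinate_descent(X, y, alpha=0.5)
print(beta, selected_features(beta))  # [2.75, 0.0] [0]
```

With a sufficiently strong penalty (for these data, any alpha of 6 or more) every coefficient is shrunk to zero, which mirrors the case λ = 0 in the constrained form.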

Step 5.3 SVM–RFE

The detailed SVM–RFE algorithm (Chang et al. 2020) is shown below.

figure a

This research uses a fivefold CV experiment. In each fold, we execute SVM–RFE once, which yields one sorted feature set per fold. From each fold, we extract the features whose weights are larger than the median. Then, we keep the features that are extracted at least 3 times to build a feature subset.
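The elimination loop described in Sect. 2.4.3 can be sketched as follows. The sketch assumes a linear SVM without bias term, trained by plain subgradient descent on the regularized hinge loss, and the squared weight as ranking criterion; the training routine, its hyperparameters and the toy data are illustrative assumptions and do not reproduce the Weka implementation used in this study.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss (no bias term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # the sample violates the margin
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def svm_rfe(X, y):
    """Return the feature indices ranked from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        # Ranking criterion: squared weight; remove the feature with the smallest one.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]

# Toy data: feature 0 is strongly informative, feature 2 weakly, feature 1 not at all.
X = [[1, 1, 0.3], [1, -1, 0.3], [-1, 1, -0.3], [-1, -1, -0.3]]
y = [1, 1, -1, -1]
print(svm_rfe(X, y))  # [0, 2, 1]
```

The uninformative feature is removed in the first round and therefore ends up last in the ranking.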

Step 6: Evaluate the performance of selected feature subsets by SVM

This study uses the SVM classifier to evaluate the original feature set and the candidate feature subsets selected in step 5. We use the radial basis function (RBF) kernel to build the SVM classifier. The performance measures include positive accuracy (PA), negative accuracy (NA), the geometric mean (G-Mean, GM) of PA and NA, overall accuracy (OA), F1-measure (F1), and the training time needed to build the classifier. If a classifier built on a candidate subset with fewer features achieves similar or even better classification performance, the extracted feature subset carries more information per feature, and its features are regarded as the few important ones.
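The measures listed above can be computed from the confusion counts as sketched below. Their formulas are not given in this section, so the sketch assumes the usual definitions: PA as the recall on successful projects (Y = 1), NA as the recall on failed projects (Y = −1), GM as the square root of PA times NA, and F1 as the harmonic mean of precision and PA. Ratios with an empty denominator are set to 0 by assumption.

```python
from math import sqrt

def evaluate(y_true, y_pred, positive=1, negative=-1):
    """Return PA, NA, GM, OA and F1 for binary labels coded as 1 / -1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t == negative and p == negative)
    fp = sum(1 for t, p in pairs if t == negative and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p == negative)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    pa = ratio(tp, tp + fn)          # accuracy on successful projects
    na = ratio(tn, tn + fp)          # accuracy on failed projects
    precision = ratio(tp, tp + fp)
    return {
        "PA": pa,
        "NA": na,
        "GM": sqrt(pa * na),
        "OA": ratio(tp + tn, len(pairs)),
        "F1": ratio(2 * precision * pa, precision + pa),
    }

# A classifier that predicts "failed" for every project of an imbalanced sample.
y_true = [1, 1, -1, -1, -1, -1]
y_pred = [-1, -1, -1, -1, -1, -1]
print(evaluate(y_true, y_pred))
```

In this example OA is still about 0.67 although no successful project is recognized, whereas PA, GM and F1 all drop to 0, which is why GM and F1 are reported alongside OA.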

Step 7: Identify important features

In this step, the important features will be identified based on the evaluation results of step 6. In addition to confirming important content features, we will also confirm the impact on content features after adding project features. In other words, we will confirm the performance of feature selection based on content features, project features, and all features, respectively. Then, we can identify the crucial content features and project features that affect the success rate of video game and mobile game projects, individually.

Step 8: Make discussion and conclusions

For the identified important features, we carry out a statistical analysis of each individual feature and examine the relationship between these features and fundraising success. Finally, based on the experimental results, this research gives fundraisers suggestions that will help improve the success rate of video game and mobile game crowdfunding projects.

4 Results

4.1 Data collection and preparation

This study uses the “Python” programming language to write a web crawler program that extracts the project introduction text and the project features. The data source consists of the “Video Game” and “Mobile Game” projects under the game category on the Kickstarter platform. The crawled text content of the project descriptions is used to compute the 11 content features. An example of a project content description is shown in Fig. 2. In addition, 9 project features are collected.

Fig. 2 An example of project content description in Kickstarter.com

For video game projects, a project whose achievement rate is greater than or equal to 100% is labeled as successful, and a project whose achievement rate is less than 100% is labeled as failed. The total number of video game projects is 14,495. Because this total exceeds 10,000 and the Kickstarter platform imposes restrictions, many projects could not be crawled. Therefore, this study collected about one-tenth of the video game projects as experimental data: a total of 1400 projects from 2018 to 2020, comprising 700 successful projects and 700 failed projects.

For mobile game projects, a project whose achievement rate is greater than or equal to 100% is labeled as successful, and a project whose achievement rate is less than 100% is labeled as failed. The total number of mobile game projects is 3642. This study collected all mobile game projects from 2013 to 2020, comprising 336 successful projects and 3306 failed projects. The descriptive statistics of the collected data are shown in Tables 3 and 4.

Table 3 Descriptive statistics of video game projects
Table 4 Descriptive statistics of mobile game projects

4.2 Data preprocessing

For the raw data, we first delete special characters and non-English words from the text to avoid errors in subsequent experiments. The software package “QDA Miner” is then employed to segment sentences (unigram) and remove stop words. Examples of non-English word deletion and stop word removal are shown in Table 5.

Table 5 Examples of pre-process collected data

Next, we use the built lexicons to calculate the word frequencies as the values of the content features and then discretize the content features into + 1 (larger than or equal to the mean value) and − 1 (below the mean value). Then, we normalize the data into [0, 1] and build the TDMs of content features, project features, and all features, respectively.

4.3 Feature selection

This study uses three feature selection methods to select features from “all features”, “project features” and “content features”, respectively. DT is run with “See 5”, the commercial version of C5.0, LASSO is programmed in “Matlab R2013b”, and SVM–RFE is run in “Weka 3.8” to find important features and build candidate feature subsets.

See 5 uses the default “global pruning” and “pruning CF = 25%” settings. The alpha parameter of LASSO is set to 1. The sequence of λ values is derived from the maximum λ value computed by Matlab, the ratio between the smallest and largest values in the sequence is set with the Lambda Ratio option, and fivefold cross-validation (CV = 5) is used to find the optimal λ value. In SVM–RFE, the attribute evaluator parameter is set to “ReliefFattributeEval”, and the other parameters keep their default values. The results of experiment #1 (video game projects) and experiment #2 (mobile game projects) are shown in Table 6.

Table 6 Summary of feature selection results

4.4 Evaluated by SVM

This study uses the SVM algorithm as the classifier, with the RBF kernel function and C-SVM classification in “LIBSVM”. The grid.py tool provided with LIBSVM is used to find the optimal parameter settings. These optimal parameters are then employed to build SVM classifiers, evaluate their performance, and find the best feature subset.
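As a small illustration of what is being tuned, the sketch below evaluates the RBF kernel \(K(a, b) = \exp(-\gamma \lVert a - b \rVert^{2})\) and enumerates an exponential grid of (C, γ) pairs. The exponent ranges are assumed to mirror the default search ranges of grid.py and should be checked against the LIBSVM distribution; in a full grid search, each pair would be scored by cross-validated accuracy, which is omitted here.

```python
from math import exp

def rbf_kernel(a, b, gamma):
    """RBF kernel value exp(-gamma * ||a - b||^2) for two feature vectors."""
    return exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def parameter_grid(log2c=range(-5, 16, 2), log2g=range(3, -16, -2)):
    """All (C, gamma) pairs of the exponential grid (ranges are assumed defaults)."""
    return [(2.0 ** c, 2.0 ** g) for c in log2c for g in log2g]

grid = parameter_grid()
print(len(grid), grid[0])
print(rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0))
```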

4.4.1 Results of experiment #1 (video game)

Table 7 summarizes the evaluation results of “all features” of video game projects. According to Table 7, it can be seen that the DT1 feature subset is superior to the original features and other feature subsets on all metrics, and the overall accuracy (OA) reaches 97.64%. Therefore, the DT1 feature set is selected as the best feature subset for “all features”.

Table 7 Evaluation results of all features in video game projects

The performance evaluation results of “project features” are shown in Table 8. From this table, it can be seen that all the metrics of the DT2 and DT3 feature subsets are very similar, and that both are better than the original feature set and the other feature subsets. Their overall accuracies reach 97.49% and 97.79%, respectively, while the DT3 feature subset contains only 2 features, compared with 4 features in the DT2 feature subset. In other words, fewer features achieve the same performance. Therefore, the DT3 feature subset is selected as the best feature subset for “project features”.

Table 8 Evaluation results of project features in video game projects

The performance evaluation results of “content features” are shown in Table 9. From this table, we can see that the overall accuracy (OA) of the DT5 and LASSO4 feature subsets reaches 64.21% and 64.07%, respectively, the F1 measure reaches 63.13% and 63.05%, respectively, and the training time is 0.19 s and 0.05 s, respectively. However, LASSO4 has only three features. A smaller number of features thus achieves nearly the same performance, which means that these three features are more important. Therefore, the LASSO4 feature subset is selected as the best feature subset for “content features”.

Table 9 Evaluation results of content features in video game projects

4.4.2 Results of experiment #2 (mobile game)

Table 10 shows the performance evaluation results of “all features”. All indicators of the DT#1 and DT#2 feature sets are quite close and are remarkably better than those of the original set and the other feature subsets. Their overall accuracies (OA) reach 99.31% and 99.40%, respectively, and their F1 indicators also outperform the others. Moreover, the DT#2 feature set uses only 2 features yet achieves comparable performance. Therefore, the DT#2 feature set is selected as the best feature set for “all features”.

Table 10 Evaluation results of all features in mobile game projects

The performance evaluation results of “project features” are summarized in Table 11. All metrics of the DT#3 and DT#4 feature sets are quite close and are dramatically better than those of the original set and the other feature subsets. Their overall accuracies reach 99.40% and 98.93%, respectively, and the F1 indicator shows the same pattern. Because the DT#3 feature set performs better than the DT#4 feature set, it is selected as the best feature set for “project features”.

Table 11 Evaluation results of project features in mobile game projects

Table 12 lists the evaluation results of the “content features”. For the original feature set and all feature subsets, PA is 0, which means that the constructed classifier cannot recognize any minority example (successful project, i.e. positive example). This is a typical class imbalance problem. A probable cause is that the number of failed projects is about 10 times the number of successful projects, and the attribute values of the content features are not very similar between successful and failed projects.

Table 12 Evaluation results of content features in mobile game projects
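The effect described above can be reproduced with simple arithmetic. The sketch below assumes that PA denotes the share of successful (positive) projects that are correctly identified and that OA denotes the share of all projects classified correctly. It uses the class sizes of the mobile game data (336 successful and 3306 failed projects) and a hypothetical classifier that labels every project as failed. This is an illustrative extreme, not the classifier built in the study.

```python
def overall_accuracy(tp, fn, tn, fp):
    """Share of all examples that are classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def positive_accuracy(tp, fn):
    """Share of positive (successful) examples that are classified correctly."""
    return tp / (tp + fn) if (tp + fn) else 0.0


n_success, n_failed = 336, 3306

# Hypothetical classifier that predicts "failed" for every project:
# no successful project is found, but every failed project is correct.
tp, fn = 0, n_success
tn, fp = n_failed, 0

oa = overall_accuracy(tp, fn, tn, fp)
pa = positive_accuracy(tp, fn)
print(f"OA = {oa:.4f}, PA = {pa:.4f}")  # OA = 0.9077, PA = 0.0000
```

Under these assumptions, the overall accuracy stays above 90% even though no successful project is detected, which is why the minority class needs to be examined separately.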

Because of the class imbalance problem caused by the skewed class distribution in the “content features” data set, we use under-sampling and over-sampling (Synthetic Minority Oversampling Technique, SMOTE) to process the imbalanced data, and then redo feature selection and rebuild the classifiers. The under-sampling method randomly samples from the majority class examples (failed projects), extracting as many samples as there are minority class examples (successful projects) to rebalance the data. We then use the three feature selection methods to select candidate feature subsets and use SVM for performance evaluation. After under-sampling, there are 336 successful projects and 336 failed projects.
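A minimal sketch of random under-sampling is given below, using the class sizes of the mobile game data. The function name `random_under_sample`, the fixed seed and the placeholder records are assumptions made for illustration; the study does not state how the random draw was implemented.

```python
import random


def random_under_sample(majority, minority, seed=0):
    """Draw, without replacement, as many majority examples as there are minority examples."""
    rng = random.Random(seed)
    kept_majority = rng.sample(majority, len(minority))
    return kept_majority, list(minority)


# Placeholder records standing in for the real project feature vectors.
failed = [("failed", i) for i in range(3306)]
successful = [("success", i) for i in range(336)]

kept_failed, kept_successful = random_under_sample(failed, successful)
print(len(kept_failed), len(kept_successful))  # 336 336
```

Because most failed projects are discarded, the result depends on the random draw, so repeating the procedure with different seeds would be one way to check its stability.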

SMOTE is a well-known over-sampling method. Its main principle is to generate synthetic minority class examples (successful projects), by interpolating between existing minority examples and their nearest minority class neighbors, until the number of minority samples equals the number of majority class examples (failed projects). After over-sampling, the three feature selection methods are used to find feature subsets, and finally SVM is used for performance evaluation. After SMOTE processing, there are 3306 successful projects and 3306 failed projects.
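As usually described, SMOTE places each new minority example at a random position on the line segment between an existing minority example and one of its k nearest minority neighbors, rather than making an exact copy. The sketch below is a simplified variant under stated assumptions: the base example is picked at random, k = 5 is a common default rather than a value reported in the study, and the toy three-dimensional points stand in for the real content feature vectors. The function name `smote` is hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority point
    and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k minority points closest to base, excluding base itself.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority class: 336 random points with three features each.
data_rng = random.Random(42)
minority = [tuple(data_rng.random() for _ in range(3)) for _ in range(336)]

# Add synthetic points until the minority class reaches 3306 examples.
synthetic = smote(minority, n_new=3306 - len(minority))
balanced_minority = minority + synthetic
print(len(balanced_minority))  # 3306
```

Since every synthetic point lies between two real minority points, the method adds variety within the region already occupied by the minority class but cannot create information outside it.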

The results of under-sampling are shown in Table 13. All performance indicators of all feature subsets are quite close, and the under-sampling technique resolves the class imbalance problem. Therefore, the SVM–RFE-II feature set, which has the highest OA, is selected as the best feature set for under-sampling.

Table 13 Results of implementing under-sampling (content features of mobile game projects)

The results of SMOTE are shown in Table 14. All indicators of the original feature set and the DT-III feature set are better than those of the LASSO-II and SVM–RFE-III feature sets. However, the DT-III feature set contains the same features as the original feature set, so selecting it would not reduce the number of features at all. Therefore, only the LASSO-II and SVM–RFE-III feature sets are compared. All indicators of the LASSO-II feature set are superior to those of the SVM–RFE-III feature set, so the LASSO-II feature set is selected as the best feature set for the over-sampling of “content features”.

Table 14 Results of implementing SMOTE (content features of mobile game projects)

Table 15 summarizes the comparison results of under-sampling and SMOTE. It can be seen that all the indicators of SMOTE’s LASSO-III feature set are better than the under-sampling SVM–RFE-III feature set, so finally the over-sampling LASSO-III feature set is selected as the best feature set.

Table 15 Comparison of under-sampling and SMOTE in content features

4.4.3 Concluding remarks

All selected important features are summarized in Table 16. In terms of content features, we have identified three features for video game projects, namely Promotion (X15), Sound effects (X17), and Appearance (X20). For mobile game projects, we found 7 important features, namely Interaction (X11), Role (X12), System (X13), Promotion (X15), Sound effects (X17), Operation (X18), and Price (X19). The selected features of the two game types are clearly different, and the narrative content emphasized by successful fundraising projects also differs considerably. Because video games involve physical goods, promotion, sound effects, and appearance are very important. Mobile games, in contrast, are mainly downloaded by players and rely on in-game purchases as the main profit model, so the emphasis is on price, interactivity, fluency of operation, and game experience.

Table 16 The discovered important features for video and mobile game crowdfunding projects

5 Discussions and conclusions

5.1 Discussions

This study uses “video game” and “mobile game” projects on Kickstarter.com as research objects. Through experiments, we identify the important factors shown in Table 16. From this table, we can see that the target amount, number of updates, promotion, and sound effects are important factors common to video game and mobile game projects. This study regards them as the key factors affecting the success of game crowdfunding projects.

In both experiments, the results of feature selection indicate that the target amount and the number of updates are two key factors that affect the success of game crowdfunding. However, judging from the performance of the classifiers built on project features and on content features, the classification accuracy of project features is much higher than that of content features. This may be because the content features carry less information than the project features.

In addition, the important factors differ in several ways between video games and mobile games. Among the selected features, the project features of video games include one additional feature, “number of investors”, but the number of investors is beyond the control of fundraisers. This result therefore only shows that the number of investors affects the success of video game projects, while mobile game projects are less affected. In terms of content features, in addition to the common “promotion” and “sound effects”, “appearance” is also an important success factor for video games, whereas “interaction”, “role”, “system”, “operation” and “price” are key success factors for mobile games.

We also analyzed the important factors of the two types of projects, as shown in Tables 17 and 18. The analysis results for video game projects are shown in Table 17. Among the project features, 46% of successful projects have a target amount below 5000 USD, which is higher than the 34.1% of failed projects. The proportion of successful projects with more than 5 updates (84.9%) is also much higher than that of failed projects (18.9%). Among the content features, words related to “promotion” and “appearance” are mentioned significantly more often in the introductions of successful projects than in those of failed projects.

Table 17 Analysis of selected important feature between successful and failed video games projects
Table 18 Analysis of selected important feature between successful and failed mobile games projects

Table 18 summarizes the analysis results for mobile game projects. Among the project features, more than half of the successful fundraising projects (53.6%) have a target amount of less than 5000 USD, while only 33.2% of the failed projects do. Moreover, successful projects are updated far more often than failed projects: in the examples we collected, 70.2% of successful projects were updated more than five times, compared with only 9.2% of failed projects. For the content features, we count how often the related words appear. Interaction, role, system, promotion and other factors are clearly mentioned more frequently in successful fundraising projects than in failed projects.

5.2 Conclusion

This study attempts to discover the key success factors in project descriptions that influence the success of video game and mobile game crowdfunding projects. We propose a lexicon-based feature selection method that uses real crowdfunding projects on Kickstarter. We define 9 candidate project features that might affect project success based on the related literature, and we summarize game-related aspects and vocabulary into lexicons that form 11 candidate content features. Then, three feature selection methods, namely DT, LASSO and SVM–RFE, are employed to select candidate feature subsets. Finally, SVM is used to evaluate the performance of the candidate feature subsets and determine the key success factors of game crowdfunding projects.

We consolidate the important factors of the two types of projects into the key factors affecting the success of game crowdfunding projects and give some suggestions to help improve the success rate of crowdfunding projects, as shown in Table 19. From this table, we can see that the target amount, number of updates, promotion (content feature) and sound effects (content feature) are the key factors that affect the success of game crowdfunding projects.

Table 19 Suggestions of the selected important features

The target amount is a controllable factor for fundraisers. Setting it below US$5000 is recommended, since such projects show a higher success rate. “Update” frequency is also a controllable factor. Fundraisers are advised to post more updates during the fundraising period to avoid information asymmetry, so that investors can keep abreast of the project and fundraisers can interact with them more. In addition, when writing a project introduction, fundraisers are advised to use more words related to “promotion”, “discount”, “offer”, “music” and “sound”, which will help increase the project’s probability of success.

In addition to the above-mentioned key factors, the important features of video games and mobile games differ in some respects. Among the project features, the number of investors is one of the important factors for video games. This indicates that the number of investors affects the success of video game projects, but it is not a controllable factor.

In terms of content features, “appearance” is also one of the important features of video games. This means that investors pay attention to appearance and packaging when considering whether to invest in video game projects, so fundraisers should pay more attention to these aspects. “Interaction”, “role”, “system”, “operation”, and “price” are important features of mobile games, which means that investors in mobile games pay more attention to interactivity, character design, system stability, operation and control, and price.

Moreover, according to the results of feature selection, the project features are more important than the content features. On the one hand, the project features may simply matter more; on the other hand, the lexicons may not be complete and can be improved in the future. Therefore, fundraisers who create game fundraising projects are advised to pay more attention to the setting of project features in addition to the text description. For example, the “target amount” should be set appropriately, and the project should be updated regularly so that information is communicated more quickly and transparently and investors can better understand the project.

As potential directions for future work, other types of game projects or other well-known crowdfunding platforms can be considered. The amount of data can be increased to make the results more accurate. Other feature selection methods can be included to find the most suitable one. Finally, the built lexicons can be extended. Regarding COVID-19, the epidemic has already had a huge impact on the entire world. Future researchers can focus on changes in project types or participant comments before and during the epidemic to find out its impact on the direction of fundraising.