Abstract

When material desires are satisfied, people begin to pursue higher spiritual goals. Health exercises have an excellent auxiliary effect on people’s flexibility and physical fitness, so more and more people choose health exercises. However, it is difficult to judge whether health exercise movements meet the standard, which affects the efficiency of physical training. Therefore, we have designed a sports competition assistance system based on fuzzy big data and a health exercise recognition algorithm. First, a standard score comparison database is created by extending the standard action data. The system architecture is then presented, together with the design of the key 3D data acquisition module. The depth features filtered by the Fourier pyramid are fused with the skeleton features, and the merged data is classified by the recognition engine, forming the action recognition unit. A hidden Markov model (HMM) human action recognition algorithm based on pose selection is proposed. This method applies the affinity propagation (AP) clustering algorithm twice to the features, automatically selects the key postures of each action, and maps them to the hidden states of the HMM. These hidden state labels are used to initialize the parameters of the HMM, the model is trained, and the trained model is used for action classification. The results show that the proposed design achieves more accurate recognition, providing a powerful tool for referee scoring. Using the Fourier pyramid filtering method and comparison against a large number of health exercises, the ability to judge how well a health exercise meets the standard is significantly improved: efficiency is increased by 25%, and accuracy is increased by 15%.

1. Introduction

With the rapid development of computers, artificial intelligence and fuzzy big data are riding the same “fast train” of progress [1]. As cutting-edge technologies of the twenty-first century, big data and artificial intelligence have been widely used in all aspects of our lives. Human behavior recognition, as a part of artificial intelligence [2, 3], not only plays an irreplaceable role in the recognition of health exercises but is also widely used in intelligent video surveillance, human-computer interaction, robotics, abnormal human behavior monitoring, and other fields [4]. At present, a large number of computer vision researchers are engaged in research on human action recognition [5]. The release of Microsoft’s inexpensive depth camera Kinect has once again inspired researchers to use depth and skeleton data for in-depth research on human behavior recognition [6].

This article uses posture to examine whether the exercises are standard. Posture refers to the external configuration of the various parts of the body, such as the fingers, upper limbs, torso, and lower limbs, during an action [7]. Health exercises play a very good role in regulating fitness and posture. Good control of the body and posture is a basic requirement for health exercises and an essential factor in improving the quality of human action and artistic expression.

Due to the importance of action recognition, more and more teams have begun to research it and have achieved good results. For example, Jamshidi and Ali proposed an expert-system-based decision support method for state judgment, but the positioning it performs is not very accurate and is rather subjective [8]. More and more big data comes from sensor nodes; in order to monitor environmental conditions, Xia, Xu, and others use sensors for motion recognition and positioning, but this method is also prone to errors [8, 9].

Omar Boutkhoum proposed a hybrid decision-making method based on affinity graphs, the Fuzzy Analytic Hierarchy Process (FAHP), and the fuzzy Technique for Order Preference by Similarity to Ideal Solution (FTOPSIS) to evaluate, rank, and select the most appropriate cloud solution for hosting and managing big data projects. In fact, the strategic focus of many companies is to create competitive advantage through the use of newly available technologies, processes, and governance mechanisms such as big data and cloud computing. As these technologies are constantly advancing, the problem many companies face is how to exploit the flexibility that cloud computing provides in order to benefit from big data. In this case, choosing the most suitable cloud solution to host a big data project is a complex issue that requires an extensive evaluation process. However, this method is not very accurate [10].

The innovations of this paper are as follows: (1) Human behavior recognition based on the fusion of skeleton and depth data. When people perform health exercises, the bones of the body swing; we use this swing to check whether the exercises meet the standard. Human actions can be recognized by a hidden Markov model (HMM) classifier using the observed features. (2) Classification of action methods. The extracted features are processed to minimize the influence of other factors, and the processed data is then fed to the classifier to recognize the movements of the human body. Many types of classifiers can be used for action recognition, such as nonparametric naive Bayes models and KNN classifiers; researchers also often use statistical pattern recognition classifiers to recognize human actions. (3) LSTM is used to predict action sequences. With the help of global background information, the new LSTM network achieves high recognition performance. However, deep neural network methods are very time-consuming and require a large amount of labeled data for training; on small-sample databases, overfitting usually occurs.

2. Sports Competition Assistance System Based on Fuzzy Big Data and Health Exercise Recognition Algorithm

2.1. Motion Recognition Algorithm Based on Trajectory Tracking

The spatiotemporal method can use image representations to recognize simple human actions but cannot handle complex behaviors. With the advent of depth sensors, it has become possible to monitor three-dimensional parts of the human body and detect complex human behavior. Recognizing person-to-person and person-to-object interactions depends heavily on tracking and labeling the objects involved. Algorithms that can be used include hidden Markov models (HMM), convolutional neural networks (CNN), conditional random fields (CRF), etc. [11, 12].

We divide the main human joints as shown in Figure 1. The corresponding joints are analyzed through people’s movements: the movement of each joint is analyzed and then compared with the standard range. In this way, concrete actions can be analyzed through data, and we can better judge whether an action is standard. The architecture is shown in Figure 1.
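
To make this joint-based comparison concrete, the sketch below checks a few joint angles against standard ranges. It is a minimal illustration only: the joint names, the angle ranges, and the `check_pose` helper are hypothetical placeholders, not the system’s actual comparison database.

```python
import numpy as np

# Hypothetical standard ranges (degrees) for a few joints during one
# health-exercise movement; real ranges would come from the standard
# score comparison database described in the abstract.
STANDARD_RANGES = {
    "left_elbow":  (80.0, 110.0),
    "right_elbow": (80.0, 110.0),
    "left_knee":   (150.0, 180.0),
}

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by 3D points a-b-c."""
    v1, v2 = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def check_pose(measured_angles):
    """Return, per joint, whether the measured angle lies in the standard range."""
    return {j: lo <= measured_angles[j] <= hi
            for j, (lo, hi) in STANDARD_RANGES.items()
            if j in measured_angles}

# Example: elbow angle computed from shoulder, elbow, and wrist coordinates.
angle = joint_angle([0.1, 1.4, 0.2], [0.3, 1.1, 0.2], [0.5, 1.0, 0.4])
print(check_pose({"left_elbow": angle, "right_elbow": 95.0, "left_knee": 100.0}))
```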

2.2. Action Template Recognition Method

The basic process of template recognition is as follows: first, the action sequence is transformed into a static feature representation through preprocessing such as feature extraction; then, it is compared with previously stored static templates; finally, the result for the test sequence is obtained from the similarity between the test sample and the known standard and a specified threshold. The template-based method is one of the most effective methods available. Common approaches include template matching and dynamic time warping. Template matching is simple and effective: it determines whether the performer’s action conforms to the specification based on the similarity between the action test sample and the action template sample [13]. The distance between the test sample and the template sample is usually represented by the nearest-neighbor distance; when the components of the feature vector have different weights or are correlated, the Mahalanobis distance is usually used. The template matching method is concise, simple to apply, and fast to compute, but it places high requirements on the data samples and cannot adapt to large-scale spatiotemporal variations.
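
As a rough illustration of the template matching idea, the following sketch assigns a test feature vector to the nearest action template, optionally using the Mahalanobis distance when an inverse covariance matrix is supplied. The template labels and feature vectors are invented for the example and do not come from the paper’s data.

```python
import numpy as np

def mahalanobis(u, v, VI):
    """Mahalanobis distance between feature vectors u and v, given the
    inverse covariance matrix VI."""
    d = np.asarray(u) - np.asarray(v)
    return float(np.sqrt(d @ VI @ d))

def nearest_template(test_feat, templates, VI=None):
    """Return the label of the action template closest to the test sample.
    `templates` maps action label -> reference feature vector."""
    if VI is None:                      # fall back to Euclidean distance
        dist = lambda u, v: float(np.linalg.norm(np.asarray(u) - np.asarray(v)))
    else:
        dist = lambda u, v: mahalanobis(u, v, VI)
    return min(templates, key=lambda label: dist(test_feat, templates[label]))

# Toy example: two action templates and one test sample.
templates = {"arm_raise": np.array([1.0, 0.2]), "squat": np.array([0.1, 1.1])}
features = np.stack(list(templates.values()))
VI = np.linalg.pinv(np.cov(features, rowvar=False) + 1e-3 * np.eye(2))
print(nearest_template(np.array([0.9, 0.3]), templates, VI))   # -> "arm_raise"
```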

Suppose there are two time series A and B with lengths x and y, respectively; one is the reference template, and the other is the test template. The values in the sequences are the feature values of each frame [14].

First, construct an $x \times y$ distance matrix $S$, whose element $S(e, f)$ is the Euclidean distance between frame $e$ of sequence A and frame $f$ of sequence B:

$$S(e, f) = \left\| A_e - B_f \right\|_2 ,$$

that is, the similarity between a single frame of sequence A and a single frame of sequence B.

Dynamic time warping, as the name implies, finds the route with the optimal distance through the distance matrix, so that the sequence points the route passes through have the smallest total Euclidean distance and the highest similarity. Define the warping path as

$$W = (w_1, w_2, \ldots, w_K), \qquad w_k = (e_k, f_k),$$

where the $k$-th element $w_k = (e_k, f_k)$ represents a mapping between frame $e_k$ of sequence A and frame $f_k$ of sequence B [15, 16].

The selection of the warping path satisfies the following three constraints: (1) Boundary conditions: $w_1 = (1, 1)$ and $w_K = (x, y)$. The speed of an action may vary, but the order of its parts cannot change, so the start and end of the warping path must be the start and end of the two time series. (2) Continuity: suppose the current point of the path is $w_k = (a, b)$ and the next point is $w_{k+1} = (a', b')$; then $a' - a \le 1$ and $b' - b \le 1$, i.e., the path may only move to adjacent or diagonal elements of the distance matrix. (3) Monotonicity: under the same notation, $a' - a \ge 0$ and $b' - b \ge 0$, i.e., the warping path must be monotonic in time.

Subject to these constraints, the path with the smallest warping cost is chosen from among the many candidate warping paths:

$$\mathrm{DTW}(A, B) = \min_{W} \frac{1}{K} \sum_{k=1}^{K} S(w_k),$$

where $K$ is the length of the corresponding path, which differs from path to path.

Best path: the path with the smallest accumulated distance between the start and end points is the best path, and it can easily be determined with a dynamic programming algorithm. Define a cumulative distance $\gamma(a, b)$: the two sequences A and B are matched starting from the point $(1, 1)$, and at each point the distances of all preceding points on the path are accumulated,

$$\gamma(a, b) = S(a, b) + \min\{\gamma(a-1, b-1),\ \gamma(a-1, b),\ \gamma(a, b-1)\}.$$

After the end point $(x, y)$ is reached, the cumulative distance is the total distance, which gives the similarity between sequences A and B [17, 18].
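
The following sketch implements the dynamic time warping procedure described above (distance matrix plus cumulative-distance recurrence) in plain NumPy; it is a textbook-style implementation under the stated constraints, not the paper’s original code.

```python
import numpy as np

def dtw_distance(A, B):
    """Dynamic time warping distance between two sequences of frame
    features A (x frames) and B (y frames), each frame a feature vector.
    Implements the distance matrix S and the cumulative-distance
    recurrence described above."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    x, y = len(A), len(B)
    S = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # x-by-y distance matrix

    D = np.full((x + 1, y + 1), np.inf)          # cumulative distance
    D[0, 0] = 0.0
    for e in range(1, x + 1):
        for f in range(1, y + 1):
            D[e, f] = S[e - 1, f - 1] + min(D[e - 1, f - 1],   # diagonal
                                            D[e - 1, f],       # vertical
                                            D[e, f - 1])       # horizontal
    return D[x, y]

# Toy example: a slowed-down copy of a sequence still matches closely.
ref  = np.array([[0.0], [1.0], [2.0], [1.0], [0.0]])
test = np.array([[0.0], [0.0], [1.0], [2.0], [2.0], [1.0], [0.0]])
print(dtw_distance(ref, test))   # small value -> similar actions
```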

2.3. Obtaining Body Movements

Our current methods for acquiring people’s body movements are mainly as follows: one is to use color cameras or other types of cameras to capture image information, and the other is to use inertial sensors with wearable devices to collect acceleration and angular velocity changes during the action [19, 20].

Traditionally, human motion recognition is based on video image information, involving the fusion of multiple technologies such as digital image processing and pattern recognition [21]. Although there are many methods for image-based recognition, the overall process is the same: collect human movement information through the camera, separate the human body from the background, distinguish the contours and parts of the human body, and extract the basic features of the human body; finally, human movements are identified according to the changes from frame to frame. Although computer hardware and image processing have improved significantly, the shooting environment and occlusion of the capture target can leave the acquired image data with insufficient information content, leading to recognition errors or unrecognizable situations [22, 23]. When image-based action recognition is combined with a specific body model to acquire human features, a specific spatial environment and shooting angle are required for image data collection, and the equipment installation process is complicated and poorly portable [22].

Accelerometers are used to measure the acceleration of different parts of the human body when people are moving, and gyroscopes are used to measure the angular velocity of objects during motion; the acceleration and angular velocity are combined to analyze the current motion of an object [24, 25]. Eliminating the limitations of wired transmission allows more flexible actions. Compared with image-based data, the process of acquiring inertial data is more direct and intuitive: by wearing an inertial sensor on a key part of a limb, the changes in acceleration and angular velocity caused by movement reflect the displacement of that limb. With the rapid development of micro-electromechanical systems (MEMS), inertial sensors have achieved miniaturization and high integration, small size, low power consumption, and low cost, making it more convenient and faster to use inertial sensors to obtain kinematic parameters [26]. However, no method is perfect, and inertial sensors also have their own shortcomings: human movements change rapidly and with small amplitude, the sensor output data is noisy, and the real-time performance is poor [27]. A single sensor cannot achieve accurate identification [28]. Therefore, a combination of multiple tools is required to make the data more convincing and more helpful to our research [29].

3. Research on the Sports Competition Auxiliary System Based on Fuzzy Big Data and Health Exercise Recognition Algorithm

3.1. Feature Extraction on a Subset of Joint Points

Local feature extraction in human action recognition focuses on the regions of interest where the action changes relatively strongly, so there is no need to locate and track the entire human body. Local features are robust against external environmental factors such as changes in body shape, shooting angle, lighting, and occlusion. They have been widely used in human action recognition, and many classifiers are available for them. Next, we elaborate on the feature extraction methods and the selection of the skeleton joint subset.

3.1.1. Displacement Vector Characteristics

The second local feature extracted is the relative position feature, which is a highly discriminative spatial feature; many current action recognition methods based on skeleton data extract it, and this paper follows the literature in doing so. In addition, the coordinate differences between the skeleton joints in the current frame and the previous frame, and between the current frame and the initial frame, are also calculated. The relative position feature is obtained by subtracting the coordinates of the b-th skeleton joint from the coordinates of the a-th skeleton joint in the c-th frame:

$$p_{ab}^{c} = J_{a}^{c} - J_{b}^{c}, \qquad a \neq b,$$

where $J_{a}^{c}$ denotes the 3D coordinates of joint $a$ in frame $c$; of course, the a-th and b-th joints are not the same.
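
A possible NumPy sketch of these two skeleton features is shown below. The joint-index subset `JOINT_SUBSET` is a hypothetical example; the actual subset would be chosen per action as described in Section 3.1.2.

```python
import numpy as np

# Hypothetical subset of informative joints (indices into the Kinect
# skeleton); the actual subset would be chosen per action.
JOINT_SUBSET = [0, 3, 7, 11, 13, 19]

def relative_position_features(skeleton):
    """Pairwise coordinate differences J_a - J_b (a != b) between the
    selected joints of a single frame; skeleton has shape (num_joints, 3)."""
    joints = skeleton[JOINT_SUBSET]                       # (m, 3)
    diff = joints[:, None, :] - joints[None, :, :]        # (m, m, 3)
    a, b = np.triu_indices(len(joints), k=1)              # keep each pair once
    return diff[a, b].ravel()

def displacement_features(sequence):
    """Frame-to-previous-frame and frame-to-initial-frame displacements of
    the selected joints; sequence has shape (num_frames, num_joints, 3)."""
    joints = sequence[:, JOINT_SUBSET, :]
    to_prev = joints[1:] - joints[:-1]                    # current minus previous frame
    to_init = joints[1:] - joints[:1]                     # current minus initial frame
    return np.concatenate([to_prev.reshape(len(to_prev), -1),
                           to_init.reshape(len(to_init), -1)], axis=1)

seq = np.random.rand(30, 20, 3)                           # 30 frames, 20 Kinect joints
print(relative_position_features(seq[0]).shape, displacement_features(seq).shape)
```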

3.1.2. Selection of a Subset of Joint Points

Most public human action recognition data sets contain skeleton data, generally extracted from depth maps captured by Kinect. Kinect represents the entire human skeleton with 20 or 15 skeleton joints. The skeleton models of MSR Action 3D and MSR Daily Activity 3D and the label of each joint are shown in Figure 2. However, for any movement, not all skeleton joints change in position and angle. If the features of all skeleton joints are extracted, it brings a “curse of dimensionality” to the action classification stage; moreover, redundant features interfere with the recognition results and make real-time recognition of human movements impossible. The feature dimension computed from the selected joints is also relatively small, which is conducive to the clustering algorithm applied later.

3.1.3. Preprocessing the Bones

On the one hand, differences in human body shape require standardized processing of the 3D data; on the other hand, differences in the testers’ speed and style result in different action sequence lengths. Therefore, the skeletons need to be preprocessed. Action classification based on 3D skeleton data describes human actions by the temporal sequence of 3D joint coordinates (i.e., 3D trajectories). However, this representation depends on the reference coordinate system, which varies with body shape and recording environment; this can be solved by a coordinate system transformation. Literature [19] places the skeleton in a common coordinate system to make the joint coordinates comparable.
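
One plausible way to carry out this preprocessing is sketched below: each frame is translated to a hip-centered coordinate system and scaled by a torso length to compensate for body-shape differences, and the sequence is resampled to a fixed number of frames to compensate for speed differences. The joint indices and the target length of 40 frames are assumptions for illustration, not the exact scheme used in [19].

```python
import numpy as np

def normalize_skeleton(sequence, hip_index=0, torso_pair=(0, 1)):
    """Place every frame's joints in a common coordinate system: translate
    so the hip joint is the origin, then scale by the hip-to-spine length
    to compensate for body-shape differences. Joint indices are
    illustrative and depend on the skeleton model used."""
    hips = sequence[:, hip_index:hip_index + 1, :]            # (T, 1, 3)
    centered = sequence - hips
    torso_len = np.linalg.norm(
        sequence[:, torso_pair[0]] - sequence[:, torso_pair[1]], axis=-1)
    return centered / (torso_len[:, None, None] + 1e-8)

def resample_sequence(sequence, target_len=40):
    """Linearly interpolate a variable-length action sequence to a fixed
    number of frames so sequences of different speeds are comparable."""
    T = len(sequence)
    src = np.linspace(0.0, 1.0, T)
    dst = np.linspace(0.0, 1.0, target_len)
    flat = sequence.reshape(T, -1)
    out = np.stack([np.interp(dst, src, flat[:, k]) for k in range(flat.shape[1])], axis=1)
    return out.reshape(target_len, *sequence.shape[1:])

seq = np.random.rand(53, 20, 3)                               # 53 frames, 20 joints
norm = resample_sequence(normalize_skeleton(seq))
print(norm.shape)                                             # (40, 20, 3)
```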

3.2. Clustering of Features

It is not advisable to use the extracted features directly to train the classifier. On the one hand, the number of extracted features is large and contains many redundant features; if they are fed directly into the classifier, computation is very slow, and the redundant features also affect the final recognition result. On the other hand, if the features are processed by traditional methods, the earlier experimental results show that the recognition rate is not high. This part therefore applies the affinity propagation algorithm twice to the extracted features, selects the key frames that best represent each action, and uses them to initialize the parameters of the hidden Markov model, which makes parameter initialization more efficient while reasonably increasing the computation speed.

3.3. Hidden Markov Model (HMM)

Before introducing the clustering algorithm, we use an easy-to-understand example to introduce the hidden Markov model, its five elements, and its three basic problems. Suppose there are three different dice: the first is a common six-sided die (denoted D6), with faces numbered 1 to 6, each appearing with probability 1/6; the second die has four sides (denoted D4), with faces numbered 1 to 4, each appearing with probability 1/4; the third die is an octahedron (denoted D8), and the probability of each face is 1/8.

First, randomly select one of the three dice and throw it; the probability of picking any one is 1/3. Rolling the dice yields any one of 1, 2, 3, 4, 5, 6, 7, or 8. Suppose the above process is performed 10 times, and a string of numbers is obtained: 1, 7, 3, 2, 8, 5, 6, 2, 4, 1. This string of numbers is called the visible state chain; there is also a corresponding hidden state chain, which is invisible. In this example, the hidden state chain is the die selected before each throw, for example: D4, D8, D6, D4, D6, D8, D8, D6, D8, D4. Figure 3 shows the corresponding graphs of the visible and hidden states in the hidden Markov model.

It can be seen from Figure 3 that there are transition probabilities between the hidden states of the hidden Markov model. Additional conditions can be imposed: for example, D6 can only be followed by D8, the probability of D8 after D8 is 0.8 and of D4 is 0.2, and D8 cannot be followed by D6, in which case the transition probabilities change. Here, we only consider the simplest case, in which the transition probability between any two hidden states is 1/3, giving the hidden state transition diagram shown in Figure 3. A hidden state can output a visible state, and the probability of doing so is called the observation probability (also the output probability); for example, the probability that D6 produces a 3 is 1/6. If the die is tampered with (loaded), the output probabilities change.
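
The dice example translates directly into the five HMM elements (hidden states, visible states, initial, transition, and observation probabilities). The sketch below encodes them with NumPy and answers the first basic problem, computing the probability of the visible chain with the forward algorithm; face values are converted to 0-based indices.

```python
import numpy as np

# Hidden states: which die is thrown (D6, D4, D8); visible states: numbers 1..8.
states = ["D6", "D4", "D8"]
pi = np.full(3, 1.0 / 3.0)                    # initial probability: pick any die
A = np.full((3, 3), 1.0 / 3.0)                # simplest case: uniform transitions

# Emission (observation) probabilities: row = die, column = face value 1..8.
B = np.zeros((3, 8))
B[0, :6] = 1.0 / 6.0                          # D6 outputs 1..6
B[1, :4] = 1.0 / 4.0                          # D4 outputs 1..4
B[2, :8] = 1.0 / 8.0                          # D8 outputs 1..8

def forward_probability(obs):
    """Probability of a visible chain under the HMM (first basic problem),
    computed with the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

visible_chain = [0, 6, 2, 1, 7, 4, 5, 1, 3, 0]   # the sequence 1,7,3,2,8,5,6,2,4,1 (0-based)
print(forward_probability(visible_chain))
```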

4. Sports Competition Auxiliary System Based on Fuzzy Big Data and Health Exercise Recognition Algorithm

4.1. Key Pose Selection Based on Two Affine Propagation Algorithms

Feature extraction only completes the initial step of action recognition; clustering the extracted features to select key frames that can represent each action is the key to improving the recognition rate of the algorithm. In the literature [54], K-means was used to cluster the extracted features and was verified on multiple databases. In order to minimize human involvement in the action recognition process, this paper does not use a traditional clustering algorithm but instead the affinity propagation (AP) algorithm introduced above. This algorithm does not need the number of cluster centers to be set in advance and, for small data sets, achieves very good automated clustering results.

For all sample sequences of an action, the conventional idea would be to use AP directly to cluster each sequence and stitch the clustering results of all sample sequences together to represent the action. However, for simple actions the key poses obtained from the different sequences are then largely repetitive, and for complex movements it is still difficult to choose representative postures; the remaining redundant postures also affect the subsequent classification results. This paper therefore applies the AP algorithm a second time on the basis of the initial clustering, as sketched below, to reduce redundancy and improve the calculation speed and recognition accuracy.
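
A minimal sketch of this two-stage clustering is given below using scikit-learn’s `AffinityPropagation`; the per-sequence frame features are random placeholders, and the damping and iteration settings are illustrative choices rather than the paper’s tuned parameters.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def two_stage_ap(sequences, damping=0.9):
    """Two-stage affinity propagation: first cluster the frames of each
    sample sequence to get its candidate key poses, then cluster the union
    of all candidates to remove redundant poses across samples. The number
    of clusters does not need to be specified in advance."""
    candidates = []
    for frames in sequences:                         # frames: (T_i, feature_dim)
        ap = AffinityPropagation(damping=damping, max_iter=1000,
                                 random_state=0).fit(frames)
        candidates.append(ap.cluster_centers_)       # key poses of this sample
    pool = np.vstack(candidates)
    ap2 = AffinityPropagation(damping=damping, max_iter=1000,
                              random_state=0).fit(pool)
    return ap2.cluster_centers_                      # key poses of the whole action

# Toy example: three sample sequences of one action (random features here).
rng = np.random.default_rng(0)
sequences = [rng.normal(size=(40, 12)) for _ in range(3)]
key_poses = two_stage_ap(sequences)
print(key_poses.shape)                               # (num_key_poses, 12)
```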

4.2. Experiments and Analysis on MSR Action 3D Database

As a public data set, MSR Action 3D has been used in many experiments. There are two ways to split the data in this dataset. The first is the cross-subject validation method proposed by Wang et al. [11]: subjects 1, 3, 5, 7, and 9 are used for training, and subjects 2, 4, 6, 8, and 10 are used for testing. The second divides the data into three sub-data sets based on the similarity and complexity of the actions. This article divides the experimental data according to the first validation method. Because the bone data of certain actions is severely corrupted, only 652 samples were used in this experiment: 350 samples for training and the remaining 302 samples for testing. Table 1 and Figure 2 show the comparison with mainstream algorithms at home and abroad.

It can be seen from Table 1 and Figure 2 that the recognition rate of this method is 94.42%, which exceeds most mainstream algorithms at home and abroad. The algorithm in this article far exceeds the currently very popular recurrent neural network algorithm, whose recognition rate is only 90.03%. Neural network algorithms can achieve good recognition results when the training set is large enough, but on small-sample databases they suffer from serious overfitting.

In addition, the study in [19] applied two metric learning algorithms in the model training stage, which is very time-consuming; this means that it is difficult for the algorithm in [19] to achieve real-time recognition of an unfamiliar action. Table 2 and Figure 4 compare the algorithm proposed in this paper and the one in [19] on the training time of a single sample at each stage. The computer used is an Intel(R) Core(TM) i7-4790 CPU at 3.60 GHz with 16 GB RAM, a 64-bit operating system, and Matlab R2014b. The feature extraction is the algorithm introduced in Section 3.1; PCA is then used to process the extracted features, the AP algorithm is applied twice to cluster the features, and the EM algorithm is used to train the classifier.

4.3. Effect of Feature Category and Number of Joints on the Recognition Rate

In this section, some extended experiments are carried out on the basis of the above experiments. In order to verify that the relative position feature and the displacement vector feature extracted from the skeleton data are complementary, experiments are run with each feature separately. As shown in the first three rows of Table 3, using the displacement vector feature alone, the recognition rate is 91%, a decrease of 4.4 percentage points; using the relative position feature alone, the recognition rate is 75.8%, a decrease of 19.6 percentage points compared with the original. The experiments show that the two features are complementary, and each contributes substantially to the final recognition result.

In addition, as shown in the fourth row of Table 3 and in Figure 5, when the relative position features and displacement vector features of all 15 skeleton joints are extracted, the final recognition rate is only 92.2%, which shows that not all skeletal joints contribute to an action. Selecting the main joints reduces data redundancy, increases the calculation speed, and improves the accuracy of the final motion classification.

In order to reduce the randomness of the clustering process, the experimental settings on UTKinect are the same as those in [7]: nine persons in the database are used for training and one person for testing, i.e., leave-one-actor-out cross-validation (LOOCV); the experiment is repeated 10 times, and the final result is the average of the ten runs. The comparison with other works is shown in Table 4 and Figure 6. It can be seen from the table that the recognition rate of this algorithm on UTKinect exceeds the recognition rates of essentially all current works. It is easy for an algorithm to obtain a relatively high recognition rate on this database, but it is very difficult to gain a few more percentage points on the existing basis.

The recognition rates of the algorithm in this paper and the algorithm in [19] on the two simple human motion databases are comparable. In order to verify that the algorithm in this paper adapts better to different input data, with the same parameter settings as in the previous experiments, both algorithms are also tested on the more challenging person-object interaction data set MSR Daily Activity 3D (see the detailed introduction of the database in Section 2.3). The experiment uses a cross-subject split: subjects 1, 3, 5, 7, and 9 are used for training, and subjects 2, 4, 6, 8, and 10 are used for testing. The recognition results are shown in Table 5. Since the skeleton feature does not contain information about person-object interaction, the recognition rates of both methods are relatively low, but the method in this paper is 2.5% higher than that in [19], which demonstrates the adaptability of the algorithm across databases.

According to the data provided with the database, the skeleton joint coordinates of the MSR Daily Activity 3D dataset can be expressed in two ways: world coordinates, and screen coordinates plus depth. In the feature extraction stage, the relative position features of the world coordinates of the skeleton joints and the relative position features of the screen coordinates are extracted, and these two feature sets are concatenated to form the features on the skeleton data. The results are shown in Table 6 and Figure 7.
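
A small sketch of this fusion step is shown below: relative-position features are computed separately from the world coordinates and from the screen coordinates plus depth of one frame and then concatenated; the 20-joint layout and random inputs are placeholders.

```python
import numpy as np

def fused_bone_features(world_joints, screen_joints_with_depth):
    """Concatenate relative-position features computed from world
    coordinates and from screen coordinates plus depth into one bone
    feature vector for a frame. Both inputs have shape (num_joints, 3)."""
    def rel_pos(joints):
        a, b = np.triu_indices(len(joints), k=1)
        return (joints[a] - joints[b]).ravel()
    return np.concatenate([rel_pos(world_joints), rel_pos(screen_joints_with_depth)])

frame_world  = np.random.rand(20, 3)          # x, y, z in metres
frame_screen = np.random.rand(20, 3)          # pixel u, v and depth
print(fused_bone_features(frame_world, frame_screen).shape)   # (1140,)
```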

5. Conclusions

Human action recognition is a research direction that combines computer vision and artificial intelligence. It has been applied in abnormal human behavior recognition, intelligent nursing, action comparison, and other fields, and it is one of the important technologies for the intelligent development of people’s lives, with real practical significance. The main advantages of this paper are as follows: (1) it is based on big-data skeleton data; (2) in feature processing, it applies the clustering algorithm twice and initializes the parameters of the classifier with the clustering results; (3) it fuses skeleton and depth features to recognize person-object interactions.

However, the research in this article still has the following shortcomings: (1) the amount of survey data is not sufficient; (2) there are still unavoidable errors in the research process.

The research in this article can be taken further in the following directions: (1) automatically learning action features; (2) combining scene information to recognize human behavior; (3) action recognition in unsegmented videos; (4) group action recognition.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare no conflicts of interest.