Abstract

The research purpose is to study the standardization and scientizing of physical training actions. Stacking denoising auto encoder (SDAE), a BiLSTM deep network model (SDAL-DNM) (a kind of training action model), and an unsupervised transfer model are used to deeply study the action problem of physical training. Initially, the physical training action discrimination model adopted here is a combination of stacked noise reduction self-encoder and bidirectional depth network model. Then, this model can collect data for five actions in physical training and further analyze the importance of action standardization for physical training. Afterward, the SDAL-DNM implemented here fully integrates the advantages of SDAE and BiLSTM. Finally, the unsupervised transfer model adopted here is based on SDAL-DNM deep learning (DL). The movement data of the physical training crowd are collected, and then the unsupervised transfer model is trained. According to the movement characteristics of physical training, the data difference between trainers is calculated so that the actions of each trainer can be continuously adapted according to the model, and finally, the benefits of effectively distinguishing the training actions can be achieved. The research shows that before and after unsupervised learning, the average decline of the model used is 1.69%, while the average decline of extreme learning machine (ELM) is 5.5%. The conclusion is that the unsupervised transfer model can improve the discrimination accuracy of physical training actions and provide theoretical support to effectively correct mistakes in physical training actions.

1. Introduction

With the continuous development of science and technology, technology in all aspects of society is constantly improved, and people’s lifestyles have undergone earth-shaking changes. With the continuous improvement of material life, people began to take some physical exercises in their spare time. In this case, physical training [13] has become a new way of life. Sports can not only play the role of physical exercise but also promote the comprehensive development of people’s spiritual quality. At present, there are various types of physical training in society. Nowadays, traditional physical training is no longer the only way of doing exercise in daily life [46] but also competitive physical training, recreational physical training, and medical physical training in the new era. Based on this, sports have become an indispensable part of life.

Relevant researchers have carried out a lot of research work according to the methods of physical training and the matching and adaptability of these methods to the human body. With the development of smart devices, portable and wearable smart devices are widely used. On this basis, the data collection of physical training [79] has provided great assistance to relevant researchers at home and abroad. Research shows that, based on the use of intelligent equipment, relevant scholars have confirmed that wearable devices based on embedded sensor networks can accurately detect the intensity of physical training and other indicators. Additionally, relevant scholars have studied the physical training with the random forest method of machine learning [1012]. It is pointed out that this method can effectively distinguish various actions and behaviours in physical training on datasets. At present, the research on information retrieval of physical training action pictures is divided into two dimensions. On the one hand, it is the retrieval mode of physical training pictures based on text information. In this dimension, the image text database needs manual maintenance, each image needs manual description, and each image has more information, which brings more workload. On the other hand, it is the retrieval method of training action image information based on content, which is mostly used in the fields of shopping and industry. However, due to the large amount of processed data, there are high requirements for servers and developers. Jonker et al. [13] once pointed out that the main characteristics of image and text information processing for large-scale training actions are as follows: the data volume of action images is large, the feature dimension is high, and the response speed required by users is fast. However, the traditional action image retrieval algorithm is calculated based on similarity, which cannot adapt to the retrieval process of large-scale images at all. Nguyen et al. [14] once pointed out that the hashing algorithm has gradually become a research hotspot in image information retrieval because of its small storage and high retrieval efficiency. Research of Chang et al. [15] showed that the hash algorithm could greatly reduce the computation of training action image retrieval, but the algorithm also losed part of the information of action images, which reduced the performance of the model. As a research hotspot of machine learning, deep learning (DL) has obvious advantages in the field of physical training image detection due to its strong learning ability. Yu and He [16] found that DL was introduced into the field of image retrieval of physical training action. The feature detection efficiency of the image can be effectively improved. Although many scholars have carried out research in these two aspects, there are still some problems in the image retrieval of physical training motion; especially in the face of large-scale motion image information, the current algorithm is still difficult to calculate efficiently.

According to the current global research situation, as well as the existing problems and shortcomings, the unsupervised transfer method under the model of the DL algorithm is adopted to conduct in-depth research on various actions in physical training. Based on this, the combination is implemented on the effective discrimination of physical training actions and unsupervised transfer methods to study the research topic. The innovation is the combination of the unsupervised transfer method of the DL model with physical training for a brand-new study, which is of great practical significance for scholars engaged in research related to physical training.

2. Materials and Methods

2.1. Sports Action Discrimination Model

The model of physical training action discrimination here is a SDAL-DNM model. This model is used to collect data for five actions in physical training and further analyze the importance of action standardization for physical training. Wearable devices are used to collect data. SDAL-DNM constructed can fully integrate the advantages of SDAE and BiLSTM. Figure 1 signifies the frame of training action discrimination.

One of the key points is that SDAL can connect previous information to the current task. In some cases, the current tasks can be performed only by the given previous information. SDAL can deal with such long-term dependency problems. People can carefully select parameters to solve the primary form of such problems. Yet, the traditional neural network model cannot successfully learn these knowledge points in practice. SDAL is deliberately designed to avoid long-term dependency problems. In practice, long short-term memory (LSTM) networks can remember the long-term information by default instead of obtaining information at great costs. By contrast, SDAL can remove or increase information to the cell state through a well-designed “gate” structure, a gateway for selective information passage. The gate contains a sigmoid neural network layer and a multiplication operation. For example, a language model might input a pronoun and output either another pronoun or verb-related information based on the input. Then, the model must consider the person and number of the input pronoun and know the inflectional verb rules to correct output information.

In the SDAL-DNM model, the loss function [17, 18] adopted is the cross-entropy loss. Its principle can be expressed as

Here, and refer to the true and the predicted values of training action data, respectively.

Deep convolutional neural network (DCNN) is a typical neural network structure, which is widely used in computer vision fields such as image recognition, target location, and face recognition. Local perception and weight sharing in convolution layer reduce the number of parameters and the complexity of the network model. Pooling reduces the size of feature graph and the number of parameters. There is always the problem of gradient disappearance when CNN activates the function with sigmoid, and especially in the complex network model, CNN is easy to overfit. Here, CNN is used to analyze and process the data collected during physical training. Figure 2 shows the data collected by sensors.

Sensor detection is the frontier technology of modern science and technology. It is the basis of manufacturing automation and informatization. Meanwhile, it is a basic course suitable for college Majors, such as Electromechanical, Automation, Aviation, Navigation, and Aerospace. Sensor detection technology involves measuring, transforming, and processing various physical quantities, chemical quantities, and biomass. It is widely used and significant to industrial and agricultural production and national defense. For example, sports technological giants are developing smart clothing and smart insoles with the help of Internet of things (IoT) sensors to track the health and performance of athletes and collect sports data for analysis. An American sports equipment manufacturer has manufactured shoes with built-in IoT sensors to store and share data, such as athletes’ running distance, speed, mileage, and fatigue. The data collected from sensors can be integrated with the team’s internal systems to analyze athletes’ performance, health status, stress, and injuries. Having realized the potential of the IoT in the sports fields, many enterprises invest heavily in advanced sports venues and sporting goods. However, the market of the IoT in the sports field is still quite primary. Therefore, sports institutions need to be prepared for IoT technological takeover. To do this, organizations can create their own strategies and build their infrastructure accordingly.

The deep feature extraction of SDAE is used to distinguish the training actions. Self-encoder consists of three parts: the first part is input layer, the second part is hidden layer, and the third part is output layer. Its training process is divided into two processes: coding and decoding. The process of feeding back the input data to the hidden layer is coding, and its process can be expressed as

Here, is the output value. and represent weight and deviation. stands for the sigmoid function.

Here, and represent the weight and deviation. is the sigmoid function. Here, the cross entropy loss can be applied to calculate the error between the initial data and the output data. Equation (4) expresses the detailed calculation.

According to the results, updating can be realized on the overall network. Equation (5) is the conversion form when the coding layer performs coding operation.

A lot of noise reduction self-encoders are stacked to be turned into deeper networks. At this time, the SDAE network is obtained, and Figure 3 displays its structure.

2.2. Unsupervised Transfer Model

Transfer learning [19, 20] is actually a process of knowledge transfer in two different fields, and for new fields, transfer learning can be good assistance. Figure 4 illustrates the specific flow of the learning.

CNN can recognize the image scenes and provide relevant titles. It can also identify daily objects, humans, and animals. Recently, CNN has also played a great role in some natural language processing (NLP) tasks, such as sentence classification. Thus, CNN is becoming an important tool for most machine learning practitioners. Channel traditionally refers to a specific component of an image. Standard digital camera images have three channels (red (R), green (G), and blue (B)) and can be modeled as three stacked two-dimensional matrices mathematically, each with pixel values between 0 and 255. First, the image input mechanism into the neural network must be understood. Since the computer is more suitable for matrix operation than humans, scientists try to model images into matrices. All color images are superimposed by three RGB channels stored in the computer through three matrices. Then, the computer visualization technique presents layers with different shapes or structures. When these images are input to a CNN, these differently shaped and structured layers become different filters or operators. Then, each CNN neuron will select an operator to extract the input image features of the previous layer to get a stronger and higher-level representation. Therefore, compared with ordinary neural networks, CNN’s every layer has a different mathematical representation. Yet, no matter how its mathematical representation changes, there is always a way to calculate its gradient using the piecewise function or convolution layer derivation functions. Only the convolution layer derivation functions might be more complex. The unsupervised transfer model here is based on SDAL-DNM DL. With this foundation, the movement data of the physical training crowd are collected to train the SDAL-DNM unsupervised transfer model. According to the movement characteristics of physical training, the data difference between trainers is calculated. Furthermore, the actions of each trainer can be constantly self-adapted according to the model, and finally, effectively distinguishing the training actions can be achieved. Figure 5 explains the working principle of the unsupervised transfer model.

In the unsupervised transfer learning of action discrimination for the process of physical training [21], invariant features are extracted according to the ambiguous features of target and source, of which data are further classified. Figure 6 is a schematic diagram of the model structure for feature point extraction.

Generally speaking, the status of the system of unsupervised transfer is determined by time series, and the system model equation is as follows:

Here, represents the action state at time k. is the state function at time k. is the action state at time k. Generally, the action states meet the Gaussian distribution. In order to realize physical training or training action prediction, observation and analyzation of the state in the training process need to be realized in various ways. The measurement of its system can be expressed as

Here, represents the body state at time k. is the state transfer function. denotes the observed state, which normally tallies with Gaussian distribution and is independent from the training action . Based on Bayesian estimation, after the observation system data is obtained, the prior knowledge is applied to estimate the state of the system, which includes prediction and update. The observed value of the system is used to deduce the probability density of the current state of the system .

Through equations (8)–(11), the recursion operation from time k to time k − 1 is obtained, but the concrete data value of probability analysis on the difficult movements in physical training is difficult to calculate, which leads to the impossibility of realizing the recursion operation. Therefore, the DL algorithm is introduced to predict physical training actions. The flow of the DL algorithm is as follows: Step 1: data are collected, and the convolution kernel and weights are randomly initialized. Step 2: according to the characteristics of forward propagation, relevant operations are carried out to calculate the output probabilities of multiple classification at the same time. Step 3: the output probability is compared with the actual physical training to calculate the error between them. Step 4: the gradient descent algorithm is used to reduce the loss in propagation. Figure 7 signifies the detailed flow of the algorithm.

The CNN algorithm in DL [2224] is used to extract features from the data collected in physical training. CNN can minimize the occurrence of preprocessing and can directly extract the most expressive features from the original data input by specifying features manually. In CNN, the main function of the pooling layer is to implement downsampling operation on the input feature map in two dimensions: length and width. It can reduce the number of parameters in the physical training model by implementing the downsampling operation on the input characteristic data. According to the index of the number of parameters, the data complexity of different actions in physical training is reduced. It can reduce the degree of overfitting of the neural network and the probability of falling into local minimum. In addition, the pooling layer can enhance the robustness of the model to translation and distortion in images.

According to the knowledge of DL, in the model of CNN [2527], the action data of physical training go through multiple convolution layers and pooling layers, and then they are connected with each other by one or more full connection layers. In order to improve the network performance, ReLU function is usually used in the excitation function used by all neurons. Meanwhile, in order to reduce the amount of computation and enhance CNN’s generalization performance, AlexNet (a model of deep CNN [2830] with more network layers and stronger learning ability) is chosen. Further improvement is made on the functional layer of the convolution layer of the AlexNet model, from “local normalization and then pooling” to “pooling and then local normalization.” This improvement has two main advantages: first, it can further enhance the generalization ability of the AlexNet [3133] network, weaken the phenomenon of overfitting, and greatly reduce the training time. Secondly, overlapping pooling before local normalization can not only keep more data information and weaken redundant information in the pooling process but also accelerate the convergence rate of the training action judgment prediction model in the physical training process. It can highlight the advantages of overlapping maximum pooling over previous maximum pooling methods.

In the improved AlexNet [3436] algorithm flow, and s are initialized as

Here, , is sorted in ascending, in order to find out the smallest top ks among them. And, refers to the nearest k samples from . is updated according to the equation as follows:

The eigenvectors corresponding to the smallest c eigenvalues of Laplace matrix can be used to form function F. Through equation (13), can be solved.

The physical training features are extracted more deeply. The tth feature map of the lth convolution layer is sampled in the way of overlapping pooling.

Here, s represents the moving step of the pooling layer, and indicates the width of the pooling layer .

After the first and second pooling layers of the AlexNet [37] model, local normalization layers are added to standardize the feature map .

Here, are all superparameters, which can be set as 2, 0.78, 10−4, and 7 in turn. N stands for the total number of convolution cores in the lth convolution layer. In order to prevent the problem of “gradient dispersion” in the network model [38], the ReLU function is used as the activation function for convolution output .

Among parameters in equations (16) and (17), refers to the ReLU activation function. If the weight is set as , and the gradient is expressed as , then the biased first-order estimation s and second-order estimation r of the updated gradient can be calculated as follows:

In equation (18) and (19), means estimated instantaneous exponential decay rates, 0.9 and 0.999 by default. Deviations and of the corrected first moment s and second moment r can be separately expressed as follows:

Finally, the updated weight is obtained.

Here, represents a learning step. is a constant (10−8 by default). Figure 8 is the schematic diagram of convolutional network pooling.

In order to prevent overfitting, the parameter of dropout operation is set to 0.5. When l equals 5 in equation (15), all feature images are reconstructed into a high-dimensional single-layer neuron structure C5; then, calculation can be made on the input of the ith neuron in the 6th fully connected layer as

Here, and are the weight and offset of the ith neuron in layer 6.

In the process of improving generalization ability, the neuron is output without weight in the 6th and 7th fully connected layers. If , then the input of the ith neuron in the 7th and 8th fully connected layers can be represented by . And the output of the ith neuron in the 6th and 7th fully connected layers can be referred by or . Finally, the input of the ith neuron in the 8th fully connected layer can be obtained through the equation as follows:

Meanwhile, the cross-entropy loss [39] suitable for classification task is used as the error function of the model, and its equation is as follows:where k is the number of categories. , , and represent the sample’s actual category distribution, network output, and classification result. The input of the softmax function is an N-dimensional real number vector, which is set as x, and the equation is as follows:

At the perspective of the essence, softmax function can map an N-dimensional arbitrary with real vector to an N-dimensional vector in which the values of each element are in the range of (0,1), thus realizing the normalization of the vector. In order to reduce the computational complexity of the model system, the compression-expansion conversion reduces the output data amount to 28, i.e.,  = 255, to improve the prediction efficiency of the model.

2.3. Dataset Collection

Firstly, the SDAL-DNM model collects five sporting processes of athletes: forward motion, backward motion, left motion, right motion, and squat motion. The model can capture and calculate athletes’ acceleration in the sporting process. Then, through unsupervised transfer learning, the motion accuracy of athletes is captured and analyzed by extracting invariant features from the fuzzy features of targets and sources. The obtained results are sorted and statistically analyzed through visualization. Finally, the data are analyzed to demonstrate the research conclusions.

3. Results

3.1. Analysis of Training Action Model Results

The data of five training actions are collected and analyzed by the SDAL-DNM model. Figure 9 presents the specific data of acceleration in three dimensions of different training actions. Among them, the acceleration change of x axis in Figure 9(a) is the largest, and the absolute value of the maximum value reaches 16.9 m/s2; Figure 9(b) also indicates that the acceleration of the x axis changes the most, with a specific value of 18.7 m/s2; the x-axis acceleration changes of Figures 9(c)9(e) are not as good as those of Figures 9(a) and 9(b). Figure 9 shows the acceleration curves of various actions.

3.2. Result Analysis of Unsupervised Transfer Model

The specific data collected here represent the concrete situation of every movement in the physical training. After the unsupervised transfer model, the specific analysis results are presented in Figure 10. In Figure 10(a), according to the accuracy before and after learning by using the research method, the average decline of the model studied is 1.69%, while that of extreme learning machine (ELM) is 5.5%. According to the situation of Figures 10(b)10(e), the model has the lowest average decline. This fully demonstrates that the stability and transfer ability of the model adopted in this paper are superior to those of ELM. Figure 10 signifies the comparison of model accuracy before and after learning.

ELM is a feedforward neural network. It does not need gradient-based backpropagation to adjust the weight but sets it through Moore–Penrose generalized inverse. A series of calculations are completed by multiplying the input by the weight, adding the offset, calculating the activation function, calculating the output, and error backpropagation. The gradient descent takes a longer training time than the matrix inversion, and the accuracy and number of nodes are most critical of these calculated results. Then, in the first two datasets, different sizes of BP networks are used to obtain the same results as ELM. In the first case, the BP network’s size is five times the original, and in the second case, the size of the BP network is two times the original. The experimental results indicate that the ELM method can accurately approximate datasets. The training process of ELM can be reduced to a nonlinear optimization problem. When the activation function of the hidden layer node is differentiable, the input weight of the network and the threshold of the hidden layer node can be randomly assigned. Given the defects of ELM, now more scholars are trying to improve the ELM using the variable scale optimization algorithm of compound chaos to obtain better generalization performance and higher prediction accuracy in one-step and multistep prediction problems. It might work to use ELM as the experimental technical support for the present work because it can accurately learn and calculate the athletes’ movement speed under specific circumstances. However, ELM cannot capture the athletes’ movements effectively, so the present work does not choose ELM.

4. Conclusions

Today, with the rapid development of science and technology, people’s lifestyles are constantly changed. Meanwhile, physical training, as a way to exercise, has gradually become the trend of the new era. Under this background, relevant studies are carried out according to the physical training actions. Aimed at the accuracy of action discrimination in physical training, the SDAL-DNM model and the unsupervised transfer model are put forward, and the standardization and scientizing of physical training actions are deeply studied. By collecting the action data of physical training lovers, the unsupervised transfer model is implemented and the data differences among different trainers are calculated so that the actions of trainers can be adjusted adaptively, thereby achieving the effect of efficiently identifying training actions. According to the experimental results, the average decline of the model before and after unsupervised learning is about 3.8% less than that of ELM, which shows that the model proposed here has obvious advantages in performance. The final conclusions are as follows: (I) unsupervised transfer model can improve the discrimination accuracy of physical training actions and provide theoretical support for effective action-correction measures. (II) the SDAL model can effectively analyze the athletes’ instantaneous state data and accurately capture and analyze their acceleration or real-time speed. The deficiency lies in the lack of training movement data collected, which has a certain influence on the universality and rigor. In future research, this is what needs to be paid attention to.

Data Availability

The data used to support the findings of the study are included within the article.

Conflicts of Interest

The authors declare no conflicts of interest.