Abstract

In recent years, the Mean shift algorithm has been widely applied in video tracking. Its advantages are low computational cost, a small memory footprint, and good tracking performance. However, the existing algorithm has shortcomings: it cannot adapt as the target size changes, it is prone to positioning errors when similar objects are present, and it is prone to tracking failure under occlusion. This paper proposes an improved Continuously Adaptive Mean shift (Camshift) method for high-precision positioning and tracking. The traditional Camshift method uses only the hue component of HSV to extract features. This paper combines the H and S components of HSV space into a two-dimensional color feature histogram and fuses it with the image’s LBP feature histogram to increase tracking accuracy. To handle target occlusion and nonlinear changes during tracking, this paper introduces a Gauss-Hermite particle filter whose estimate is updated by a Kalman filter. Experimental results demonstrate that the proposed method performs better in real-time tracking than Mean shift, Camshift, a simple particle filter, and a Kalman filter.

1. Introduction

With the wider application of video tracking, scholars continue to propose various methods [1, 2]. Among video tracking algorithms [3–6], Mean shift [7, 8] has many merits, such as a small amount of calculation, good robustness, and high tracking accuracy, but it also has limitations [9]. For example, a fixed tracking window and a back projection that relies only on the brightness structure in HSV cause the tracking effect to deteriorate. Besides, when the target rotates or deforms greatly, the target is easily lost and tracking fails. For this reason, [10, 11] proposed adaptively changing the tracking window, which lets the original algorithm handle changes in target size. To cope with interference from similar objects, [12] uses the H and V components of HSV to construct the histogram, which enhances the target features and increases the tracking accuracy; however, fast-moving targets were not discussed. [13, 14] proposed a method combined with the Kalman algorithm, in which the Kalman algorithm [15] is used as a predictor to forecast the target globally and obtain more accurate target coordinates. This improvement reduces the tracking error when the object moves too fast. [16] proposed a tracking algorithm that combines multiple feature templates with the SIFT algorithm to improve the tracking effect; however, the number of array operations is too large to satisfy the requirements of real-time tracking. [17] combines the back projection of Camshift with the motion state obtained by the optical flow method to achieve target prediction, but this algorithm shares the shortcomings mentioned above and has poor real-time performance. [18] uses a chromaticity-differential two-dimensional joint feature to establish the target model: the maximum difference value of each pixel in its 8-neighborhood describes the relative and detailed information of the pixel. The experimental result of this method is good, but the amount of calculation is too large. In [19], back projection is used to obtain the chromaticity-differential two-dimensional joint probability distribution map of the tracked image.

In [20, 21], the Kalman filter is used to enhance robustness when the object is partially occluded. It handles partial occlusion when the motion of the target is linear, but it does not address nonlinear problems. [22] proposed a particle filter-based method that improves the tracking of nonlinear motion. [23] proposed finding the local maximum of each particle iteratively. In this way, individual particles are aggregated into a larger particle set, which reduces divergence and gives a better tracking result. In [24], the Mean shift particle filter requires only a few particles when dealing with fast-moving targets and occlusions. However, since each particle must be iterated by Mean shift, the tracking speed is still very slow even with fewer particles. In addition, [25] performs arithmetic operations on each particle set, which leads to a large amount of calculation and makes the particle set too concentrated. As a result, tracking fails when the tracked object is partially or completely occluded and then reappears. To address these shortcomings, [26, 27] proposed moving from the local characteristics of the target to its overall characteristics. The overall feature model of the target mostly satisfies linear or Gaussian motion, and only a small part of the object does not. For objects that satisfy linear or Gaussian models, the Kalman filter is used for motion estimation; for objects that do not satisfy linear motion, particle filters are used for local estimation. [28] linearizes each particle to obtain more aggregated particles and performs nonlinear processing on each particle. This method greatly reduces the number of particles without adding any iterations, so the calculation time remains the same. The algorithm proposed in [29] suffers from degeneracy, and the key to suppressing it is to choose a good importance density function. In view of the above shortcomings, we make the following improvements: (1) we extend the original H-component histogram to the H and S components to build the target histogram; (2) we use the target’s LBP feature histogram together with the H and S histograms to establish the back projection; (3) we use a hybrid filter to improve the tracking effect during occlusion.

In this paper, we use the weighted feature histograms of H and S in HSV to describe the characteristics of the target and then use the back projection of the color histogram to improve the tracking accuracy. The Gauss-Hermite particle filter proposed in [30] is used to generate the importance density function of the particle filter, and the Kalman filter is used for global motion estimation to obtain better state estimation accuracy. The rest of this paper is organized as follows: Section 2 introduces the proposed algorithm, Section 3 details the improved Camshift method, Section 4 presents the experiments, and Section 5 gives the conclusion. The references are listed in Section 6.

2. The Proposed Method

The Camshift algorithm evolved from the Mean shift algorithm, which has the advantages of simple operation and small memory use. Mean shift uses nonparametric density gradient estimation to locate and track objects, iteratively seeking the extremum of the probability distribution. This section therefore briefly introduces the Mean shift algorithm.

The algorithm flow in this paper is as follows:

(1) Select the object and record the input position and object information.

(2) Find the zero-order moment of the object:

$$M_{00} = \sum_{x}\sum_{y} I(x, y)$$

Find the first-order moments:

$$M_{10} = \sum_{x}\sum_{y} x\,I(x, y), \qquad M_{01} = \sum_{x}\sum_{y} y\,I(x, y)$$

Compute the centroid of the search window:

$$x_c = \frac{M_{10}}{M_{00}}, \qquad y_c = \frac{M_{01}}{M_{00}}$$

where $I(x, y)$ is the value of the back projection at pixel $(x, y)$ inside the search window.

(3) Set the size of the search window as $s = 2\sqrt{M_{00}/256}$; the length is $1.2s$.

(4) Adjust the size of the window, move its center to the centroid, and compare the distance moved with a fixed threshold. If the distance moved is greater than the threshold, repeat (2), (3), and (4) until the distance moved in one iteration is less than the threshold or the number of iterations reaches the preset maximum.
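The iteration in steps (2) to (4) can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the experiments: it assumes the back projection is given as a 2-D list `prob` of non-negative values, and the function names, the threshold `eps`, and the iteration cap `max_iter` are hypothetical. The window-size rule in `camshift_window_size` is the common Camshift convention for an 8-bit back projection and is likewise an assumption here.

```python
import math


def window_moments(prob, x0, y0, w, h):
    """Zero-order and first-order moments of prob inside the window."""
    m00 = m10 = m01 = 0.0
    for y in range(max(y0, 0), min(y0 + h, len(prob))):
        for x in range(max(x0, 0), min(x0 + w, len(prob[0]))):
            p = prob[y][x]
            m00 += p
            m10 += x * p
            m01 += y * p
    return m00, m10, m01


def mean_shift(prob, window, eps=1.0, max_iter=10):
    """Shift a fixed-size window (x0, y0, w, h) toward the local centroid."""
    x0, y0, w, h = window
    for _ in range(max_iter):
        m00, m10, m01 = window_moments(prob, x0, y0, w, h)
        if m00 == 0:  # empty window: nothing to follow
            break
        xc, yc = m10 / m00, m01 / m00  # centroid of the window
        # Top-left corner that centers the window on the centroid.
        nx = int(round(xc - (w - 1) / 2))
        ny = int(round(yc - (h - 1) / 2))
        shift = math.hypot(nx - x0, ny - y0)
        x0, y0 = nx, ny
        if shift < eps:  # converged: movement below the threshold
            break
    return x0, y0, w, h


def camshift_window_size(m00):
    """Assumed Camshift rule: width s = 2*sqrt(M00/256), length 1.2*s."""
    s = 2.0 * math.sqrt(m00 / 256.0)
    return s, 1.2 * s
```

In a full Camshift loop, the window returned by `mean_shift` for one frame would serve as the initial window for the next frame.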

2.1. Target Model

When the tracking target is selected, [31] proposes constructing a histogram from the color information of the target; the Epanechnikov kernel density estimation function [32] then weights the pixels according to their position, so that pixels at different positions contribute differently to the color characteristics. From $K_E$ in (7), we can see that the pixels in the center of the target contribute the most to the color statistics and the pixels at the edge of the target contribute the least. The target model is

$$\hat{q}_u = C \sum_{i=1}^{n} K_E(x_i)\, \delta\left[b(x_i) - u\right], \quad u = 1, \ldots, m \tag{6}$$

where $m$ is the number of histogram intervals obtained by segmentation, $u$ indexes the intervals, $x_i$ is the $i$-th pixel, and $b(x_i)$ is the index of the interval that the value of pixel $x_i$ falls into. $\delta$ is the Kronecker delta function [33], and $C$ is the normalization constant such that $\sum_{u=1}^{m} \hat{q}_u = 1$. The function $\delta[b(x_i) - u]$ determines whether pixel $x_i$ belongs to interval $u$ of the color histogram. $K_E(\cdot)$ is the kernel density estimation function, which weights the pixels to highlight the features of the target. Its expression is given by (7):

$$K_E(x) = \begin{cases} c\left(1 - \lVert x \rVert^{2}\right), & \lVert x \rVert \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

where $x$ represents the normalized pixel position in the target area and $c$ is a constant.
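The kernel-weighted histogram described above can be sketched as follows. The sketch assumes that each pixel of the target patch has already been quantized to a bin index, that positions are normalized so the patch border lies near the unit circle, and that the constant factor of the Epanechnikov kernel is absorbed by the normalization; the function names and the input layout are hypothetical.

```python
def epanechnikov(r2):
    """Epanechnikov profile: 1 - r^2 inside the unit circle, 0 outside."""
    return 1.0 - r2 if r2 < 1.0 else 0.0


def target_model(bins, m):
    """Kernel-weighted histogram of a patch.

    bins is a 2-D list holding the bin index (0 .. m-1) of every pixel.
    Pixels near the patch center receive larger weights than edge pixels.
    """
    h, w = len(bins), len(bins[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    q = [0.0] * m
    for y in range(h):
        for x in range(w):
            # Squared distance from the center, normalized by the half-size.
            r2 = ((x - cx) / (w / 2.0)) ** 2 + ((y - cy) / (h / 2.0)) ** 2
            q[bins[y][x]] += epanechnikov(r2)
    total = sum(q)
    # Normalize so that the histogram sums to one.
    return [v / total for v in q] if total > 0 else q
```

For a 3 x 3 patch whose center pixel falls in one bin and whose eight neighbors fall in another, the center bin receives more than the 1/9 share it would get from an unweighted histogram, which is the center-emphasis effect described in the text.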

2.2. Candidate Target Model

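In the selection of candidate targets, it is assumed that $\{x_i\}_{i=1,\ldots,n_h}$ are the pixels in the candidate area, $h$ is the bandwidth that determines the range of pixels used in the positioning process, and $(y - x_i)/h$ is the normalized pixel position of the target candidate centered on $y$ in the current frame. The expression of its probability function is then given by (8):

$$\hat{p}_u(y) = C_h \sum_{i=1}^{n_h} K_E\left(\frac{y - x_i}{h}\right) \delta\left[b(x_i) - u\right] \tag{8}$$

$C_h$ is the normalization constant so that $\sum_{u=1}^{m} \hat{p}_u(y) = 1$, that is, $C_h = 1 / \sum_{i=1}^{n_h} K_E\left((y - x_i)/h\right)$.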

2.3. Similarity Measure

After using (6) and (8) to obtain the weighted probability functions of the target and the candidate target, respectively, we use a similarity function to express their similarity quantitatively. We calculate the similarity between the two discrete distributions, the target model $\hat{q}$ and the candidate model $\hat{p}(y)$, using the Bhattacharyya distance in [34]:

$$d(y) = \sqrt{1 - \rho(y)} \tag{9}$$

$$\rho(y) = \rho\left[\hat{p}(y), \hat{q}\right] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\,\hat{q}_u} \tag{10}$$

The Bhattacharyya (BH) coefficient $\rho$ in (10) takes values in the interval [0, 1] and reflects the degree of similarity between the target and the candidate target. The larger its value, the more similar the candidate region model is to the target model, and the more likely the candidate is the tracked target.
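As a small worked illustration of the similarity measure, the following sketch computes the Bhattacharyya coefficient and the corresponding distance for two normalized histograms. The function names are hypothetical, and both inputs are assumed to be lists of equal length that each sum to one.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p_u * q_u); equals 1 for identical normalized histograms."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p, q):
    """Distance derived from the coefficient; 0 means identical histograms."""
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q)))
```

Identical histograms give a coefficient of 1 and a distance of 0, while histograms with no overlapping bins give a coefficient of 0 and a distance of 1.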

3. Improved Camshift Tracking Algorithm

Camshift extends the Mean shift algorithm [35] to consecutive video sequences. It performs the Mean shift operation [36, 37] on every frame of the video and uses the size and center of the search window from the previous frame as the initial search window for the Mean shift algorithm in the next frame, so as to find the optimal iterative result. In this way, the target can be tracked continuously. The Camshift process is as follows:
(1) Initialize the search window.
(2) Establish a back projection according to the HSV feature of the video frame.
(3) Use the Mean shift algorithm to obtain the position and size of the target, which define the new search window.
(4) In the next video frame, reinitialize the size and position of the search window with the values from (3), and then return to (2).

Camshift effectively handles changes in target size and partial occlusion, requires few system resources, tracks well in real time, and achieves good results when the background is not complicated. However, because it considers only the H component when constructing the color histogram, the tracking effect deteriorates, and tracking may even fail, when the background is more complicated or when there is interference from pixels whose color is similar to the target. To improve the performance of the system, this paper divides the H and S components into intervals. H is divided into $m$ intervals, where $m$ takes the value in (6), so the quantization step size of each interval is $360/m$; in the same way, S is divided into $n$ intervals, so its quantization step size is $255/n$. We obtain the color feature histogram of the target from the resulting H and S intervals and the weighting function. The H and S components are then transformed by (11) to obtain the normalized lookup table of the back projection distribution, and the hue and saturation histograms are back projected and passed to the Mean shift operation to obtain the position of the region of interest. Formula (11) amounts to a color probability lookup table that converts the H and S components of the back projection into the same distribution:

Among them, and represent the pixel values in the H and S components, respectively, and and represent the maximum value of and . The is on behalf of the probability distribution of the colors of the H and S components.
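The quantization and lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the exact form of (11): hue is assumed to lie in [0, 360) and saturation in [0, 255], the bin counts `h_bins` and `s_bins` are hypothetical, and the histogram is scaled so that its largest bin maps to 255, which is the usual Camshift convention.

```python
def hs_bin(h, s, h_bins, s_bins):
    """Map a (hue, saturation) pair to its two-dimensional bin index."""
    hb = min(int(h / (360.0 / h_bins)), h_bins - 1)
    sb = min(int(s / (255.0 / s_bins)), s_bins - 1)
    return hb, sb


def hs_histogram(pixels, h_bins=16, s_bins=8):
    """Two-dimensional H-S histogram of an iterable of (h, s) pixels."""
    hist = [[0.0] * s_bins for _ in range(h_bins)]
    for h, s in pixels:
        hb, sb = hs_bin(h, s, h_bins, s_bins)
        hist[hb][sb] += 1.0
    return hist


def back_project(image, hist):
    """Replace every (h, s) pixel by its scaled histogram value (0 .. 255)."""
    h_bins, s_bins = len(hist), len(hist[0])
    peak = max(max(row) for row in hist)
    scale = 255.0 / peak if peak > 0 else 0.0
    out = []
    for row in image:
        out_row = []
        for h, s in row:
            hb, sb = hs_bin(h, s, h_bins, s_bins)
            out_row.append(hist[hb][sb] * scale)
        out.append(out_row)
    return out
```

Pixels whose hue and saturation are common in the target receive values near 255 in the back projection, and pixels whose colors do not occur in the target receive 0.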

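The basic principle of LBP is to take a pixel as a center point and to compute its response by comparing its gray value with the gray values of its neighboring pixels. The original LBP formula is given by (12) and (13):

$$\mathrm{LBP}(x_c, y_c) = \sum_{p=0}^{7} 2^{p}\, s(i_p - i_c) \tag{12}$$

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{13}$$

where $(x_c, y_c)$ is the center pixel, $i_c$ is its gray value, and $i_p$ is the gray value of the $p$-th pixel in its 8-neighborhood.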
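The basic LBP operator on a 3 x 3 neighborhood can be sketched as follows. The clockwise neighbor order starting at the top-left pixel is an assumption (other orderings give a permutation of the same bits), border pixels are skipped for simplicity, and the function names are hypothetical.

```python
# Neighbor offsets (dy, dx), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(gray, y, x):
    """8-bit LBP code of the pixel at (y, x); requires a full neighborhood."""
    center = gray[y][x]
    code = 0
    for p, (dy, dx) in enumerate(OFFSETS):
        # Bit p is set when neighbor p is at least as bright as the center.
        if gray[y + dy][x + dx] >= center:
            code |= 1 << p
    return code


def lbp_image(gray):
    """LBP codes of all interior pixels of a 2-D gray image."""
    h, w = len(gray), len(gray[0])
    return [[lbp_code(gray, y, x) for x in range(1, w - 1)]
            for y in range(1, h - 1)]
```

A histogram of the resulting codes over the target region then gives the LBP feature histogram referred to in the text.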

The back projection of the LBP feature map and the back projection obtained from HSV are added in a weighted ratio. The improved process is as follows:

Step 1. Initialize:
Initialize the search window W and the initial position of the selected target in the search window.

Step 2. Establish the HSV and LBP feature maps:
Obtain the HSV and LBP features of the selected target, and use LBP, H, and S to build a two-dimensional weighted feature histogram.

Step 3. Calculate the back projection:
Establish a back projection using the H and S components and the LBP feature of the image.

Step 4. Track the target:
Starting from the initialized window position on the back projection, call the Mean shift algorithm to acquire the position and size of the target, and use these as the initial value for the next frame. Camshift adjusts the size of the window according to the Mean shift result to determine the location of the target.

Step 5. Loop:
In the next search, use the position information acquired in the previous step as the initial value, continue adjusting the search window, and return to Step 3.
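The weighted combination of the two back projections used in the steps above can be sketched as follows. The mixing weight `alpha` is hypothetical, since the text does not state the ratio, and both maps are assumed to have the same size and value range.

```python
def fuse_back_projections(color_bp, lbp_bp, alpha=0.7):
    """Pixel-wise weighted sum: alpha * color + (1 - alpha) * texture."""
    return [[alpha * c + (1.0 - alpha) * t for c, t in zip(c_row, t_row)]
            for c_row, t_row in zip(color_bp, lbp_bp)]
```

With `alpha = 0.75`, a pixel with color response 100 and texture response 0 is mapped to 75, and a pixel with color response 0 and texture response 200 is mapped to 50. The fused map would then be passed to the Mean shift step in place of the color-only back projection.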

The improved Camshift described above handles changes in target size and situations in which the target resembles the background. However, the target model that appears during tracking sometimes fails to meet linear or Gaussian conditions, which leads to tracking failure. For this situation, this paper combines a Gauss-Hermite particle filter with a Kalman filter. Specifically, the algorithm applies the Gauss-Hermite particle filter on top of the improved Camshift method to handle nonlinear and non-Gaussian problems and then uses the Kalman filter to update the prediction, which enhances tracking accuracy. Unlike the Kalman filter alone, the method in this paper is also suitable for nonlinear system models: it only needs the prior distribution of the target variable and does not depend strongly on the initial conditions, whereas the Kalman filter may diverge and lose the target under nonlinear systems or inappropriate initial conditions. The improved algorithm passes the region of interest tracked by Camshift as input to the Gauss-Hermite particle filter, which uses the Camshift state to define the number of particles. The processed state is then passed to the Kalman filter [38] for a prediction update, and the updated result has better tracking accuracy than before. At the same time, unlike Kalman filter tracking over the global scope, the method in this paper restricts the particles to tracking moving targets in the local field of view. Combining the two filters mitigates the problems of occlusion and nonlinear motion. In addition, because the feature extraction in this paper is based on color density features, which suit the Gauss-Hermite (GH) particle filter [39], GH quadrature is used to construct the importance density function. The one-dimensional GH integral formula (16) is expressed as:

Gauss point is ; the weight coefficient is , and is the value of Gauss points.
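To make the role of the Gauss points and weight coefficients concrete, the following sketch applies a three-point Gauss-Hermite rule. It assumes the classical form of the rule, in which the weight function is exp(-x^2); the number of points, the function names, and the change of variable used for a Gaussian expectation are illustrative choices and are not taken from formula (16).

```python
import math

# Three-point Gauss-Hermite rule for the weight function exp(-x**2).
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6.0,
              2.0 * math.sqrt(math.pi) / 3.0,
              math.sqrt(math.pi) / 6.0)


def gh_integral(f):
    """Approximate the integral of f(x) * exp(-x**2) over the real line."""
    return sum(w * f(x) for x, w in zip(GH_NODES, GH_WEIGHTS))


def gaussian_expectation(g, mean, std):
    """Approximate E[g(X)] for X ~ N(mean, std**2).

    Uses the substitution x = mean + sqrt(2) * std * t, which turns the
    Gaussian density into the Gauss-Hermite weight exp(-t**2) / sqrt(pi).
    """
    scale = math.sqrt(2.0) * std
    return gh_integral(lambda t: g(mean + scale * t)) / math.sqrt(math.pi)
```

A three-point rule is exact for polynomials up to degree five, so for X ~ N(1, 2^2) it reproduces E[X^2] = 1 + 4 = 5 up to rounding error.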

Assume that the posterior probability of the system state is approximated by a Gaussian distribution and that the process noise and the observation noise are mutually independent zero-mean Gaussian noises with known variances [40]. Gauss-Hermite filtering, like Bayes filtering, iterates between prediction and update, and it also uses the transition equation and the observation equation to obtain posterior probability estimates. The specific steps are as follows:
(1) Forecast: given the system state at the previous moment and its variance, use the Gauss point transformation to propagate the Gauss points through the state transition equation and calculate the predicted system state and its variance. The prediction state and variance are expressed as:
(2) Update: according to the prediction results, apply the Gauss point transformation again to obtain new Gauss points and pass them through the observation model. The system state and its variance are updated as follows, where

Based on the above theory, the GHPF algorithm steps are as follows:

Step 1. Importance sampling. The Gauss-Hermite filter is applied to the sample set at the previous time step to obtain the state and variance of each particle, which are taken as the importance density function, namely:

Generate prediction sample set:

The importance weights corresponding to the particles are:

Normalized weight is:

Step 2. Resampling of particles. Get the particle sample set , then the posterior probability density is:

Step 3. Estimate the posterior mean of the state.
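The three steps above can be sketched for scalar particles as follows. The sketch assumes the standard importance-weight form (likelihood times transition density divided by proposal density) and uses systematic resampling, which is one common choice that the text does not specify; all function names are hypothetical.

```python
import random


def importance_weight(likelihood, transition, proposal):
    """Unnormalized weight: p(z | x) * p(x | x_prev) / q(x | x_prev, z)."""
    return likelihood * transition / proposal


def normalize_weights(weights):
    """Scale the weights to sum to one (uniform if they are all zero)."""
    total = sum(weights)
    n = len(weights)
    return [w / total for w in weights] if total > 0 else [1.0 / n] * n


def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples with probability given by the weights."""
    n = len(particles)
    start = rng.random() / n  # single random offset in [0, 1/n)
    out = []
    i, cum = 0, weights[0]
    for k in range(n):
        u = start + k / n
        # Advance to the first particle whose cumulative weight exceeds u.
        while u >= cum and i < n - 1:
            i += 1
            cum += weights[i]
        out.append(particles[i])
    return out


def posterior_mean(particles, weights):
    """Weighted mean of the particles, used as the state estimate."""
    return sum(w * x for x, w in zip(particles, weights))
```

After resampling, all particles carry equal weight, so the posterior mean reduces to the plain average of the resampled set.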

The state predicted in (19) is updated through the Kalman filter to obtain a satisfactory tracking effect. The motivation is as follows: when the observation noise is large, the particles cannot accurately describe the posterior probability, which degrades the estimation performance. To limit this influence on the state estimate, we pass the predicted state through the Kalman filter for a prediction update. Since equations (18) and (19) are approximately linear, Kalman filtering can be used to obtain the optimal estimate of the state. We define the state space equations of the Kalman filter system as follows:

Equation of state:

$$x_k = A x_{k-1} + B u_k + w_k$$

Measurement equation:

$$z_k = H x_k + v_k$$

where $x_k$ is the state vector, $z_k$ is the measurement vector, $A$ is the state transition matrix, $u_k$ is the control vector, $B$ is the control matrix, $H$ is the measurement matrix, and $w_k$ and $v_k$ are the system noises, which are Gaussian. The Kalman filter is a recursive process that continually alternates between prediction and update. Its predicted state value is given by (31):

Minimum prediction error:

Updated status:

Measurement error:

Measurement covariance:

Optimal Kalman gain:

Correction status value:

Corrected minimum mean square error:
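The prediction and update recursion can be illustrated with a scalar Kalman filter. The sketch uses the textbook form of the filter for a one-dimensional state; the parameter names (`a`, `b`, `h`, `q`, `r`) and their default values are hypothetical and do not come from the experiments.

```python
def kalman_predict(x, p, a=1.0, b=0.0, u=0.0, q=0.01):
    """Prediction: propagate the state estimate x and its error variance p."""
    x_pred = a * x + b * u   # predicted state
    p_pred = a * p * a + q   # predicted error variance
    return x_pred, p_pred


def kalman_update(x_pred, p_pred, z, h=1.0, r=1.0):
    """Update: correct the prediction with the measurement z."""
    innovation = z - h * x_pred        # measurement residual
    s = h * p_pred * h + r             # residual variance
    k = p_pred * h / s                 # Kalman gain
    x_new = x_pred + k * innovation    # corrected state
    p_new = (1.0 - k * h) * p_pred     # corrected error variance
    return x_new, p_new, k
```

For example, starting from a state of 0 with variance 1, no process noise, and unit measurement noise, a measurement of 2 yields a gain of 0.5, a corrected state of 1, and a corrected variance of 0.5. A tracker would normally use the matrix form with position and velocity in the state vector.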

The location information of the tracked target is obtained after the predictive update. The steps of the aforementioned hybrid particle filter algorithm (Algorithm 1) are as follows:

1. Initialization: sample the prior probability distribution p(x0) to get the initial particles
2. For each time step:
3. Update the particles according to formula (19)
4. Update the importance weights according to equation (29)
5. Normalize the weights according to equation (32)
6. Resample the particles to obtain the particle density (34)
7. According to equation (35), obtain the Kalman gain
8. According to formula (37), obtain the posterior estimation error covariance matrix
9. According to equation (36), obtain the optimal estimate of the state
10. End

4. Experiment

The experiments run on 64-bit Windows 10 with an Intel(R) Core(TM) i5-9400 CPU @ 2.90GHz and 16.0 GB of RAM, and are implemented in Python 3.7 using the PyCharm editor. Three sets of videos, whose names, frame counts, and sizes are listed in Table 1, are used to verify the tracking effect. The three sets of experiments cover environments of different complexity. The first experiment tracks a car on an expressway; the environment is relatively complex and includes total occlusion, partial occlusion, and abrupt changes of motion state. The second experiment takes place at an intersection. Compared with the first, it contains more kinds of interference, including vehicles, pedestrians, zebra crossings, and roadside public facilities. The third experiment takes place under intense light; the interference comes mainly from light, roadside trees, and moving vehicles. The three experiments thus have different emphases. To analyze the tracking results in detail, we conducted a quantitative analysis of the data from the three experiments: the change in the centroid coordinates of the tracked object represents the change in its position, and the difference between centroids on the $x$- and $y$-axes represents the tracking error. By analyzing the errors in the different experiments, we verify the tracking effects of the different algorithms and compare their advantages and disadvantages.

In the first experiment, we track a vehicle in a 2480-frame highway video (its resolution is listed in Table 1) with a relatively complex environment. The environment includes partial occlusion, total occlusion, sudden changes of target speed, lane changes of the target vehicle, and interference from similar targets. The tracking effect of this experiment is shown in Figure 1, where 8 frames of the video are selected to represent the experiment. Under the same conditions, our method tracks relatively well: the updated prediction lets it keep tracking the target under occlusion, and the combination of the improved particle filter and the Kalman filter lets it cope with velocity changes. It also keeps tracking when the target changes lanes. To demonstrate the robustness of our method, this paper compares the predicted value of each method with the actual value and plots the difference as an error graph for quantitative analysis. The tracking effect of experiment 1 is shown in Figure 2, and the experimental error results are shown in Figure 3.

The second experiment uses a 252-frame video clip of a crossroad, in which the interfering objects are more complex. They include pedestrians, vehicles moving in different directions, the white zebra crossing, and two cars with similar outlines parked on the roadside. Because the environment is complex, the results of the various methods differ. The results of this experiment are shown in Figure 4. Figure 4 shows that the other methods can still track the target in the first few frames when interference exists, but after some tracking updates they all drift onto different interfering objects and cannot recover, so tracking fails. The algorithm presented in this paper performs well under interference and turning, and it also tracks well when the size of the target changes. To better analyze the tracking effect, we analyzed the centroid of the target. Figure 5 shows that the centroid positions produced by our method are closer to the real positions, which meets the precision requirements of real-time tracking.

In the third experiment, the video has 2160 frames (its resolution is listed in Table 1) and was recorded under intense illumination, which is the main interfering factor. Figure 6 shows that our method tracks the target throughout, even when similar objects interfere during tracking. The other methods, however, cannot adapt quickly to changes in the object and need more frames to initialize, so their tracking is relatively poor under the same conditions. As time passes, the tracking of the other methods gradually improves, but their accuracy remains lower than that of our method. Figure 7 shows that our method tracks well with a low relative error.

To analyze our experimental results quantitatively, this paper compares the actual centroid coordinates of the tracked target with the centroid coordinates of the observed position, where $x$ and $y$ denote the coordinates of the centroid of the tracked object on the $x$- and $y$-axes. Figures 3–7 show the changes in the centroid of the tracked objects in videos 1, 2, and 3. From Figures 3–7, we can see that under the same conditions, our method tracks relatively well, and its real-time performance and tracking accuracy meet the tracking requirements.
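The centroid-based error measure can be sketched as follows. The sketch assumes that the actual and tracked centroids are available as lists of (x, y) pairs, one per frame, and that the error on each axis is the absolute difference; the function names and the use of a per-axis mean as a summary are illustrative assumptions.

```python
def centroid_errors(actual, tracked):
    """Per-frame absolute centroid error on the x- and y-axes."""
    return [(abs(ax - tx), abs(ay - ty))
            for (ax, ay), (tx, ty) in zip(actual, tracked)]


def mean_axis_errors(errors):
    """Average error over all frames, reported separately for each axis."""
    n = len(errors)
    return (sum(e[0] for e in errors) / n,
            sum(e[1] for e in errors) / n)
```

Plotting the per-frame values over the frame index gives an error curve of the kind used to compare the tracking methods.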

Figures 3–7 show the coordinate position of the centroid of the tracked target on the $x$ and $y$ coordinate axes. This article analyzes the video tracking error based on these coordinate values. From the line graphs in Figures 8–10, we can draw the following conclusions. In the first set of videos, large errors occur because of occlusion and sudden changes in the motion state, but the adaptive Mean shift (Camshift) and particle filter methods perform similarly to our method, and their tracking effect is relatively good. In the second, more complex environment, the advantage of our method becomes apparent: its error is small and its accuracy is high, whereas the other methods fail to track. In the third set of experiments, the tracking errors of our method and the other methods do not differ greatly. In the later stages of tracking, the accuracy of the other methods improves and basically meets the tracking requirements, but the error of our method is still smaller and its tracking effect is better; the error diagram is shown in Figure 10.

To further demonstrate the accuracy of our method, we took the data of the second experiment and compared our method with the KCF and TLD methods. Table 2 shows that our method has small errors and meets the accuracy requirements of real-time tracking.

5. Conclusion

The improved Camshift method proposed in this paper uses a weighted kernel function to weight the pixels around the target, increasing the weight of the pixels close to the target center in order to highlight the target features. The original Camshift algorithm, which uses a one-dimensional HSV (hue, saturation, value) histogram containing only the hue component, is extended to a two-dimensional color feature histogram built from the hue and saturation components of HSV space, combined with the LBP feature of the image, to highlight the target. These features enhance the accuracy of tracking among similar targets. To handle occlusion and nonlinear, non-Gaussian changes in the tracking model, we introduced a Gauss-Hermite particle filter and then used the Kalman filter to update the prediction, which makes the tracking more accurate. The experiments show that the improved method satisfies the requirements of real-time tracking. In terms of accuracy, the difference between the actual position coordinates and the predicted coordinates is small enough to meet the tracking requirements. The experimental analysis shows that our method is more accurate than Mean shift, Continuously Adaptive Mean shift (Camshift), a simple particle filter, and a Kalman filter. Consequently, this algorithm has good application prospects in the field of target tracking and location.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Ethical Approval

This article does not contain any studies with human participants performed by any of the authors.

Conflicts of Interest

Sijie Du declares that he has no conflict of interest. Hongxin Xu declares that he has no conflict of interest. Tianping Li declares that he has no conflict of interest.

Authors’ Contributions

Funding acquisition was done by Tianping Li. Methodology was done by Sijie Du and Hongxin Xu. Project administration was done by Tianping Li. Software was done by Sijie Du. Validation was done by Hongxin Xu. Writing the original draft was done by Sijie Du. Writing the review and editing were done by Tianping Li. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

This work was supported in part by the NSFC (61572286 and 61472220), the NSFC Joint with Zhejiang Integration of Informatization and Industrialization under Key Project (U1609218), and the Fostering Project of Dominant Discipline and Talent Team of Shandong Province Higher Education.