1. Introduction
Video content is evolving into more complex forms of media, with a variety of combinations of spatial resolution, dynamic range, color range, codecs, containers and frame rates. Among others, it is well known that high-frame-rate (HFR) video content is important for high-quality consumption and especially relevant for further video investigation, since transmission bandwidth and storage are affected by new frame rate formats [1]. Generating new video and immersive media comes with many challenges for traditional infrastructures and sharing possibilities, particularly for higher resolutions, including temporal resolution. HFR contributes to an increase in perceived quality, but in most practical applications, video still rarely exceeds 60 frames per second (fps) [1,2].
Low frame rates (LFR) or standard frame rates (SFR) have become an obvious limitation, especially for sports, fast-action genres (as in cinema and gaming) and immersive virtual reality (VR) and augmented reality (AR) content [2]. This is also recognized in Recommendation BT.2020, the ultra-high-definition (UHD) television standard, where frame frequencies of up to 100 and 120 fps are adopted for further exploitation [
3]. Moreover, in the Advanced Television Systems Committee (ATSC) 3.0 ecosystem, high or very high frame rates are expected, as well as video services developed by interconnecting 5G communication networks and ATSC 3.0 broadcasting. Currently, in these cases, high-efficiency video coding (HEVC or H.265) is adopted to deal with novel video technology formats like HFR [
4,
5,
6]. Progressive formats are accompanied by picture rates that allow for SFR and HFR (e.g., 120 fps) recovery and temporal filtering [
5,
6]. Media over Internet Protocol (IP) provides a high level of flexibility, and new broadcast systems are already using IP infrastructures [
7]. Such infrastructures enable HFR distribution, but handling video in this manner also leads to significant changes in bandwidth requirements. Besides future formats and services, compression technologies are of crucial interest [
6]. The perceptual quality improvement resulting from HFR is recognized in industrial and academic communities [
8]. HFR is preferable in applications that aim to enhance a smooth end-user experience and to produce various effects [
9,
10]. It is not easy to select and adopt frame rates, since frame frequency changes produce distortions. In typical workflows, the frame rate is decided before acquisition. This leads to the general suggestion that the frame rate be kept as high as possible during the production phase, with end-user deliverables adapted to the final required frame rates [
10]. An increasing number of modern content creators capture and share their activities using HFR on social networks and IP sharing platforms [
11,
12,
13]. Working with fast-forward video and similar content that does not consist of temporally consecutive frames is especially challenging, since the quality depends jointly on frame rate and compression [
14]. Video reproductions in HFR have been reported and analyzed in [
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23]. Video conversion, on the other hand, usually means frame rate upscaling or upconversion, often referred to as frame interpolation [
24,
25]. One should keep in mind that such conversion is also accompanied by a compression format [
26,
27,
28,
29,
30,
31,
32].
There has been extensive analysis of video tracing by long-range dependency (LRD) and multifractals [
33,
34,
35,
36,
37,
38]. One of the most popular tracing tools is the Fast Forward Moving Picture Experts Group (Fast Forward MPEG or ffmpeg) solution [
33], while self-similarity is considered one of the most powerful properties [
34], where LRD and multifractals have been used in many applications related to different types of sequences [
35], statistical modeling and analysis of video traffic [
36,
37] and queuing performance [
38]. Video traces obtained after frame parsing have been examined for two main purposes: traffic modeling [
39,
40,
41,
42,
43,
44,
45,
46,
47,
48,
49,
50,
51,
52,
53,
54,
55] or characterization of compressed video focusing on a specific standard [
56,
57,
58,
59]. For traffic modeling, studies are related to specific protocols [
39], queuing [
40], variable bitrates [
41], specific prediction models [
41,
42,
43,
44,
45,
46,
47,
48,
49,
50,
51], buffering [
52], dynamic bandwidth allocation [
53], attacks [
54] and various content [
55]. On the other hand, characterization of compressed video using a specific standard and video traces has been considered for MPEG-4 Advanced Simple Profile (ASP) [
56], MPEG-4 version 2 and H.263 in [
57] and advanced video coding (AVC or H.264) [
58] and its extensions in [
59]. So far, to the author’s knowledge, LRD- and multifractal-based studies oriented towards the characterization of compressed video focusing on a specific standard and different compression factors have only been examined up to MPEG-4/AVC content [
56,
57,
58,
59], while HEVC must deal with modern HFR video content. Since each standard affects tracing differently, it is crucial to understand the LRD and multifractal behavior of HEVC HFR compressed content while keeping in mind different compression qualities. Moreover, studies related to the detection of frame rate upconversion, i.e., temporal resolution recovery (TRR), are presented in [
60,
61,
62,
63]. In recent works, MPEG-4 traces have been investigated [60]. These are mostly examined for original LFR content: frame rates of 24 fps upconverted to 30 fps [
61], frame rates of 15 fps [
62] and frame rates of 15 to 30 fps [
63]. Previous studies have shown that it is possible to perform TRR detection [
60,
61,
62,
63] using MPEG-4/AVC SFR video content/traces.
The purpose of this work is to analyze video traces corresponding to HFR HEVC compressed content using LRD and multifractals and to tackle the issue of TRR detection by examining the effects found in TRR. Namely, this paper addresses HEVC (or H.265) frame size traces extracted from HFR video content, considered from an LRD and multifractal point of view. As a consequence, the analysis of HFR goes hand in hand with compression, observed using constant rate factors. Here, the focus is on UHD HEVC video traces collected from data corresponding to HFR with frequencies up to 120 fps, where tests have been performed using publicly available reference HFR video content. Temporal resolution recovery has also been examined. The contributions of this paper are as follows:
- HFR HEVC frame size traces show specific behavior in LRD- and multifractal-based analysis, where differences before and after temporal resolution recovery (TRR) exist.
- The experimental results are obtained for HEVC compressed HFR video frame size traces for the first time in the multifractal domain, which may contribute to the recognition of possible changes like TRR.
- Having in mind the obtained results and spectra behavior, a novel method is proposed for TRR detection regardless of the compression level expressed through constant rate factors.
- The proposed TRR detection model based on a weighted k-nearest neighbors (weighted kNN or WkNN) classifier shows high detection accuracy in the performed experimental analysis using a relatively low number of features.
This paper is organized as follows. After the introduction, in
Section 2, a brief description of HFR, video quality and coding is given. In
Section 3, additional details on works related to multifractal analysis of compressed video content are presented. Video frame size traces and data gathering are explained in
Section 4. Applied methods for LRD and multifractal spectrum calculation are described in
Section 5. HFR content is characterized using LRD and multifractal properties before and after TRR. Moreover, a novel model for TRR detection is proposed based on HFR video multifractal analysis, the WkNN classifier and a relatively low number of multifractal features. The experimental results on 4k 120 fps HFR content are shown in
Section 6, where a high accuracy percentage is obtained for different content and compression rate factors. Finally, conclusions are given in
Section 7.
2. HFR Processing and Challenges
It is well known that frame rate impacts the quality of experience, relating to how realistic the consumed content appears or which style one desires to obtain, such as motion blur, slow motion or fast forward. HFR video is expected to approach realism when much action is happening, as in sports, busy movie scenes and gaming, but also in live content aiming for a realistic experience with crisp information. One of the benefits is increased realism, where video seems more immersive, making the viewer’s experience more lifelike.
Frame rate is generally described as the number of frames per second (fps), which is illustrated in
Figure 1, where HFR means that temporal resolution is increased and more images are captured in a given amount of time. HFR can be described as video content captured or displayed at a frame rate of 60 fps or higher. This is in contrast to the SFR/LFR that is typically used for television [
5,
6]. Besides the realistic experience, which is tied to the perception of motion, HFR reduces motion blur when an object is moving and enables its clearer representation, described as smooth motion. Even so, differences in frame rates between acquisition and reproduction may produce uneven pacing and sometimes longer frames. Inconsistent frame times, known as judder, and low frame rates producing so-called stutter effects are only some of the issues [
12]. Video quality estimation may produce different results due to varying video behavior when frame rate transforms are made according to specific quality mode selections. Wearable and lightweight cameras, like action cameras, are popular in the consumer industry, meaning that both professional and nonprofessional content contains a variety of distortions [
13].
Temporal resolution changes through objective measurements are still mainly analyzed using standard full-reference approaches [
17]. Quality assessments are usually made by mean squared error, peak signal-to-noise ratio, structural similarity index, video multimethod assessment fusion and similar metrics [
18,
19,
20]. In [
12], frame rate differences have been considered by using video multimethod assessment fusion (VMAF) and entropy differences. Video collections like Youtube-UGC [
21] or Konstanz KoNViD-1k [
22] are made for research using different quality scores, compression results and distortion diversity, and can be used for purposes like constructing general no-reference models. Still, only a few studies have specifically considered HFR with publicly available video sets acquired at frame rates equal to or above 60 fps, like Waterloo HFR [
16], LIVE-YT-HFR [
12] and BVI-HVR [
1]. Primarily, for HFR experimental analysis, the video tracing and compression analysis Ultra Video Group (UVG) dataset can be used, since it consists of 120 fps sequences of even higher spatial resolution, containing RAW video content up to 4k 120 fps [
23]. This is the reason why this dataset is chosen here. The general suggestion in production is to keep the frame rate as high as possible, where the choice of video frame rate may be intentionally HFR [
10].
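As a simple illustration of the full-reference metrics mentioned above, the peak signal-to-noise ratio (PSNR) between a reference and a distorted frame can be computed as follows (a minimal NumPy sketch; the toy frames and 8-bit peak value are assumptions for illustration, not data from the experiments):

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two frames (8-bit by default)."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_value ** 2 / mse)

# Hypothetical 8-bit grayscale frames
ref = np.zeros((4, 4), dtype=np.uint8)
dist = ref.copy()
dist[0, 0] = 16  # introduce a single-pixel error
print(round(psnr(ref, dist), 2))  # → 36.09
```

Metrics such as SSIM and VMAF follow the same full-reference pattern but model structural and perceptual aspects rather than pixel-wise error alone.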
One should keep in mind that even if an acquired video is HFR, this can be a significant barrier for many systems and devices. Devices with limited processing power may require downsampling, like frame dropping. HFR may even be experienced as unnatural and hard for the human visual system to follow, leading to frame rates being decreased. Moreover, interoperability between components of a system may force lower frame rates for HFR processing tasks. The frame rate can be downscaled, leading to a significant decrease in storage or streaming costs. On the other hand, it is well known that decreasing the frame rate can also result in a choppy video experience. This means that video conversion can also be followed by frame rate upscaling, usually referred to as frame interpolation, where temporal resolution is increased by adding frames between the known frames [
24,
25]. This represents temporal resolution recovery or TRR.
Any conversion is difficult and comes with many challenges, especially in the temporal domain. HFR leads to larger video file sizes and bandwidth challenges. Since frame rate affects storage and the capacity of the telecommunication channel, HFR quality is accompanied by compression. This inevitably introduces possible unwanted artifacts and undesirable components in the resulting motion picture, where coding and compression solutions make it possible to decrease the video size while keeping the video quality high. Appropriate insight into such content is therefore needed.
HFR assessment goes hand in hand with video compression. In a nutshell, a variety of coding standards and codecs is needed to achieve specific tasks. MPEG is dedicated to efficient coding and compression algorithms, where the MPEG-x and H.26x standards have been popular over the years [
26,
27,
28,
29,
30,
31]. Algorithms are becoming more complex, and advancements are being made to deal with new video technologies. In general, coding steps include block-oriented intra- and interprediction, transformation and quantization, filtering and entropy coding. MPEG-2 became popular over the years in practical applications like broadcasting, while MPEG-4 continues to be a leading choice for streaming implementations. H.264, or the AVC standard (MPEG-4 Part 10), was introduced in 2003 by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) and the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) [
26,
27,
It is a video compression standard based on block-oriented, motion-compensated coding, supported by a wide range of devices and systems, and is still one of the most widely accepted standards, known for high compression efficiency and high-definition television implementation.
AVC was followed by HEVC, introduced in 2013, also known as H.265 or MPEG-H Part 2 [
29,
30]. Briefly speaking, as a next-generation standard, HEVC compresses video to approximately half the size produced by AVC. It supports streaming and broadcasting at higher resolutions, and HEVC is widely used in action cameras and smartphones for HFR purposes. Also, there are many other available solutions for IP delivery, like VP9 and its successor AV1 [
31,
32,
33]. Nevertheless, it should not be neglected that AVC is still in force for various implementations, but when it comes to HFR, a transition to standards like HEVC is expected [
26]. The effects of HFR HEVC compressed video content have not been considered to a large extent, although they are relevant to practical implementations. For example, recently, in [
32], HFR was analyzed from a perceptual quality point of view for HEVC and VP9 using full high-definition (HD) video sequences and five constant quality factor values, showing better performance of HEVC at higher rates according to standard metrics.
Frame size trace sequences for AVC HD and 4k/UHD HEVC video can be seen in
Figure 2 for a sequence taken from the publicly available UVG dataset described in [
23]. Comparison after frame frequency alignment in
Figure 2 shows different trace sequence behavior. Still, self-similarity-based analysis related to HFR HEVC has not been performed so far to the best of the author’s knowledge.
There are a lot of challenges related to HFR processing that need to be further investigated, such as effects due to compression and codec settings, effects due to differences in frame rates, no-reference HFR content characterization, HFR reproduction, editing and hardware utilization. Here, only some of these issues are tackled. Effects due to compression are considered in this paper for HEVC standard and TRR, but other standards like VP9 and content modifications can also be taken into account. No-reference characterization and quality estimation is of general interest for HFR processing. A relatively large amount of different raw HFR video content is needed in the research community. Also, HFR reproduction, decoding and editing require additional resource utilization compared with SFR due to the high number of frames per second, where possible effects of available acceleration approaches need to be researched further.
3. Self-Similarity and Multifractal Analysis of Compressed Video Content
Video can be manipulated in many ways, affecting and controlling the overall quality. The most frequent choices are setting a constant quality factor or buffer size, or using constant, constrained or variable bit rates [
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43,
44,
45,
46]. A large number of physical systems and nonstationary signals tend to show similar behavior at different scales, known as having self-similarity properties [
34,
35,
36]. These properties have been analyzed using fractal and multifractal theory for compressed video content and video tracing.
In order to ensure a desirable quality of service, self-similarity is investigated for constant and variable bit rates in [
37]. Self-similar patterns are explored for high-speed network traffic in [
38]. In early works, long-range dependency or LRD in video traffic represented by traces has been mostly quantified by a single-parameter Hurst exponent [
37,
38]. LRD means that traces exhibit correlations over a range of time scales. Besides standard statistical video traffic metrics, like the mean and variance of video traces, additional self-similarity properties have been investigated under different conditions, where LRD is only one feature of fractal-like behavior. In [
39], Transmission Control Protocol (TCP) traffic, collected as the number of bytes arriving per time unit, is shown to be multifractal and is analyzed using spectra, enabling valuable statistical estimation. Multifractals are applied to the behavior of a queuing system in [
40]. Tail distributions in a multifractal sense while measuring variable bit rate are compared in [
41].
Generally, there are two main directions in the analysis of video traces. The first one is traffic modeling. Self-similarity has been widely recognized, and multifractal-based traffic modeling has been found suitable for video tracing [
39,
40,
41,
42,
43,
44,
45]. Different self-similarity models are tested for network traffic analysis and prediction: the fractional Brownian motion traffic model [
41], wavelets [
42,
43], multiplicative approaches [
44,
45], autoregressive models [
46,
47,
48,
49] and Markov chains [
50,
51]. Experimental multifractal analysis is applied for dimensioning, buffer capacity interpretation and statistical multiplexing of video streams [
52], as well as for dynamic bandwidth allocation [
53]. Multifractal spectra have been compared during normal operation and during attacks in a communication network [
54], while differentiation between spectra is used to show the consistency of LRD [
55].
The second direction in investigating self-similarity properties of video traces is oriented towards the characterization of compressed video, having in mind specific standards. Most of the research uses MPEG-4 traces for testing, MPEG-4 being one of the most practically valuable standards, as in [
56,
57]. MPEG-4 Advanced Simple Profile (ASP)-based encoded traffic is tested for estimating queuing performance [
56]. In [
57], MPEG-4 version 2 and H.263 video traces are compared using accompanying parsers in order to extract ten sequences, so-called frame size traces, which are found statistically valuable for performance testing. This work has been continued on newer encoders like H.264/MPEG-4 AVC [
58,
59]. H.264 video compressed traces are analyzed using multifractal and fractal approaches [
58]. Trace analysis with extended encoding standards is performed in [
59]. The whole encoding or transcoding process takes time and, due to differing settings, it is hard to compare previously generated traces with new ones [
56,
58]. The frame size trace sequence differs for each standard.
4. HFR HEVC Video Traces and Temporal Recovery Data
Temporal recovery or frame upconversion detection has been examined in [
60,
61,
62,
63] using MPEG-4/AVC traces. In [
62], motion-compensated frame rate upconversion is proposed together with the possibility of its detection via an optical flow algorithm, where the original frame rates were 15 fps. Frame rate conversion detection is also analyzed in [
63], having in mind interpolation schemes like the common nearest-neighbor interpolation. An automatic approach using machine learning is proposed for four original frame rates from 15 to 30 fps and conversions up to 30 fps. Multifractality may be useful in recovery detection, having in mind video tracing and multifractal analysis [
56,
57,
58,
59,
64,
65,
66]. For example, machine learning and multifractal features are applied for an intrusion detection system in an unmanned aerial system in [
65]. Moreover, the Legendre multifractal spectrum is applied in [
66] for animation frame analysis and its differentiation from real and partially animated content, especially due to self-similarity properties also found in video traffic analysis.
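To make the Legendre multifractal spectrum concrete, the following sketch estimates the partition-function scaling exponents τ(q) of a positive frame size trace and derives the (α, f(α)) spectrum via the Legendre transform. The box sizes and moment orders are illustrative assumptions, not the settings of the cited works:

```python
import numpy as np

def legendre_spectrum(trace, qs, box_sizes):
    """Estimate the Legendre multifractal spectrum (alpha, f(alpha)) of a
    positive 1D series via partition-function scaling Z(q, eps) ~ eps^tau(q)."""
    mu = np.asarray(trace, dtype=float)
    mu = mu / mu.sum()                      # normalized measure
    tau = np.empty(len(qs))
    log_eps = np.log([s / len(mu) for s in box_sizes])
    for i, q in enumerate(qs):
        log_z = []
        for s in box_sizes:
            n = len(mu) // s
            boxes = mu[:n * s].reshape(n, s).sum(axis=1)
            boxes = boxes[boxes > 0]        # skip empty boxes
            log_z.append(np.log(np.sum(boxes ** q)))
        tau[i] = np.polyfit(log_eps, log_z, 1)[0]   # slope of log Z vs. log eps
    alpha = np.gradient(tau, qs)            # alpha(q) = d tau / d q
    f = qs * alpha - tau                    # Legendre transform
    return alpha, f

# Monofractal sanity check: a uniform trace collapses to the point (1, 1)
alpha, f = legendre_spectrum(np.ones(1024), qs=np.linspace(-5, 5, 21),
                             box_sizes=[4, 8, 16, 32, 64])
print(alpha.round(2).min(), alpha.round(2).max())  # → 1.0 1.0
```

A genuinely multifractal trace would instead produce a spread of α values and a concave f(α) curve, whose width is one of the descriptors used in such analyses.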
In this paper, the focus is on multifractal analysis of HFR frame size traces of HEVC compressed video sequences. LRD and self-similarity effects are considered for compressed video characterization, with special attention to their application in video change/modification detection. Here, HFR video traces are collected similarly to the procedure explained in the previous section, where frame size sequences are extracted using an accompanying parser [
57]. HEVC compressed video represents the input for the parser, which is applied to obtain the XML trace file needed for statistical analysis, as shown in
Figure 3.
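Once the XML trace file is produced, the frame size sequence can be read out with a few lines of standard-library Python. The element and attribute names below are hypothetical, since the exact schema depends on the parser used:

```python
import xml.etree.ElementTree as ET

def frame_sizes_from_xml(xml_text):
    """Extract a frame size trace (bytes per frame, in order) from an XML
    trace file. The <frame size="..."> schema here is a hypothetical example."""
    root = ET.fromstring(xml_text)
    return [int(frame.get("size")) for frame in root.iter("frame")]

# Hypothetical minimal trace file
xml_text = """
<trace codec="hevc" fps="120">
  <frame number="0" type="I" size="84211"/>
  <frame number="1" type="B" size="1290"/>
  <frame number="2" type="B" size="1475"/>
</trace>"""
print(frame_sizes_from_xml(xml_text))  # → [84211, 1290, 1475]
```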
Video trace sequences are generated using ffmpeg v5.1.2 for different content. If audio exists within a file, it is removed. The constant rate factor, denoted as crf (here, two-pass crf), is selected as the model for controlling the output. The crf option is available for popular codecs and maintains the output quality level through its rate control method, which is applied in practical implementations. Lower crf values in compressed data correspond to higher video quality. Six crf values are used, ranging here from 20 to 40. The supported preset option, which trades off speed and codec complexity, is set to the default, meaning medium, for the video trace collection, and no additional tuning is applied. The trace/data collection and the experimental analysis are carried out on an Intel(R) Core(TM) i7-10750H Central Processing Unit (CPU) at 2.60 GHz with 16 GB of Random-Access Memory (RAM) on a Windows 10 Pro 64-bit operating system, without specific graphical acceleration.
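The encoding setup described above can be reproduced with command lines following the pattern below. This is a sketch: the sequence file name, the YUV geometry flags and the six evenly spaced crf values are assumptions (the text states only the range 20 to 40), while `-c:v libx265`, `-crf`, `-preset` and `-an` are standard ffmpeg options:

```python
def hevc_encode_cmd(src_yuv, crf, width=3840, height=2160, fps=120):
    """Build an ffmpeg command encoding a raw YUV source with libx265 at a
    given constant rate factor; audio is dropped with -an."""
    return (f"ffmpeg -f rawvideo -pix_fmt yuv420p -s {width}x{height} -r {fps} "
            f"-i {src_yuv} -c:v libx265 -crf {crf} -preset medium -an "
            f"{src_yuv.rsplit('.', 1)[0]}_crf{crf}.mp4")

# Hypothetical six crf levels covering the stated 20-40 range
commands = [hevc_encode_cmd("Jockey_3840x2160_120fps.yuv", crf)
            for crf in (20, 24, 28, 32, 36, 40)]
print(commands[0])
```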
Since it is of interest to investigate the behavior of video traces, the testing circumstances are selected to be as simple as possible. In order to perform the analysis, the experimental procedure employs LRD and multifractal methods for the estimation of HFR. The most common HFR video change is temporal resolution recovery, named here TRR. The scenario of typical TRR found in practice includes temporal filtering, followed by temporal resolution matching. Significant savings in memory and channel capacity can be made, possibly temporarily, by decreasing the frame frequency in the temporal domain; this is called temporal filtering [
5,
6]. Frame frequency alignment leads to HFR TRR. It is generated by the increase in frame number after a loss of original data, where specific temporal upsampling is ignored, as in [
67], to avoid the choice between different methods and the addition of undesirable artifacts. Here, it is assumed that the self-similarity properties of video traces may be observed in the TRR scenario. TRR is valuable, since practical implementations often need savings and further comparisons in the HFR domain.
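The TRR scenario above can be sketched as temporal filtering (frame decimation) followed by frame frequency alignment via frame repetition. This is a simplified illustration of the two steps, not the exact recovery chain of any particular tool:

```python
import numpy as np

def temporal_filter(frames, factor=2):
    """Temporal filtering: keep every factor-th frame (e.g., 120 fps -> 60 fps)."""
    return frames[::factor]

def frequency_align(frames, factor=2):
    """Frame frequency alignment: repeat each frame to restore the original
    frame count (nearest-neighbor style recovery, no interpolation)."""
    return np.repeat(frames, factor, axis=0)

original = np.arange(8)                 # stand-in for 8 consecutive frames
recovered = frequency_align(temporal_filter(original))
print(recovered.tolist())               # → [0, 0, 2, 2, 4, 4, 6, 6]
```

The recovered sequence matches the original frame rate but has lost half of the original data, which is exactly the kind of loss expected to leave a signature in the frame size traces.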
This HFR TRR after temporal filtering can be considered common in practice where a matching frame rate is needed, as in the original HFR video case. In this paper, the focus is on differentiating these TRR sequences from original ones. Additionally, it is possible to decrease the frame rate to match the original one, expecting traces similar to the original ones, but it is evident that in the HFR 120 fps case, this is still not common in practice. The losses in HFR video recovery may produce specific tracing behavior that may contribute to possible detection of such changes. Besides temporal resolution changes, the selected compression quality is expressed here through crf. For the experimental analysis, reference and publicly available HFR video sequences are selected. Additional tests are made using an action camera.
The basis of the experiments is the video trace collection made according to the UVG dataset [
23]. The benchmark was widened in 2020 with additional sequences, where of particular interest here are the 120 fps source files, available in YUV format at 4k/UHD or 2160p spatial resolution. The source files representing HFR YUV 8-bit video sequences used for the analysis are listed in
Table 1.
Each source video file contributes original and TRR video trace sequences corresponding to different crf values. The applied methods are related to LRD, or more precisely, Hurst index evaluation, as well as multifractal spectrum calculation for further comparison and testing.
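As an illustration of Hurst index evaluation, a minimal rescaled-range (R/S) estimator over a frame size trace can be sketched as follows. The window sizes are illustrative; production estimators use more careful fitting and bias corrections:

```python
import numpy as np

def hurst_rs(series, window_sizes):
    """Estimate the Hurst exponent via rescaled-range (R/S) analysis:
    fit log(R/S) against log(window size); the slope is the estimate."""
    x = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(x) - n + 1, n):
            w = x[start:start + n]
            dev = np.cumsum(w - w.mean())          # cumulative deviation
            r = dev.max() - dev.min()              # range of the deviation
            s = w.std()                            # standard deviation
            if s > 0:
                rs_vals.append(r / s)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs_vals)))
    return np.polyfit(log_n, log_rs, 1)[0]

rng = np.random.default_rng(0)
white = rng.standard_normal(4096)                  # uncorrelated series: H near 0.5
print(round(hurst_rs(white, [16, 32, 64, 128, 256]), 2))
```

An LRD trace would yield an estimate clearly above 0.5, which is the behavior reported for frame size traces in the related literature.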
7. Conclusions
This paper presents experimental results obtained for HEVC compressed HFR video frame size traces for the first time in the multifractal domain. The analysis shows that HFR trace sequences manifest long-range dependence and multifractal behavior. In the comparison between temporally recovered UHD 120 fps HFR data and the corresponding nonrecovered or original HFR data, lower Hurst indices are obtained, as well as often wider multifractal spectra. By analyzing the spectra of sequences compressed with different crf values, it is found that it is possible to differentiate TRR signals from original ones. The proposed WkNN approach, using the Mahalanobis distance measure, was able to detect recovered video data. Also, the feature vector is of low length, and the features are extracted without a reference. The input can be a TRR sample, which can be detected without prior assumptions about constant rate factors and without direct comparison between the modified and original sequence. The proposed detection approach achieved above 98% accuracy in cross-validation.
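A distance-weighted kNN decision of the kind summarized above can be sketched as follows (pure NumPy; the two-dimensional toy features, class labels and k = 3 are illustrative stand-ins, not the actual multifractal features or settings of the proposed model):

```python
import numpy as np

def wknn_predict(X_train, y_train, x, k=3):
    """Weighted kNN: vote with inverse Mahalanobis distances of the
    k nearest training samples."""
    vi = np.linalg.inv(np.cov(X_train, rowvar=False))      # inverse covariance
    diff = X_train - x
    d = np.sqrt(np.einsum("ij,jk,ik->i", diff, vi, diff))  # Mahalanobis distances
    nearest = np.argsort(d)[:k]
    weights = 1.0 / (d[nearest] + 1e-12)                   # inverse-distance weights
    classes = np.unique(y_train)
    scores = [weights[y_train[nearest] == c].sum() for c in classes]
    return classes[np.argmax(scores)]

# Toy features: class 0 (e.g., original) vs. class 1 (e.g., TRR)
X = np.array([[0.9, 0.4], [0.8, 0.5], [0.85, 0.45],
              [0.6, 0.9], [0.55, 0.95], [0.65, 0.85]])
y = np.array([0, 0, 0, 1, 1, 1])
print(wknn_predict(X, y, np.array([0.62, 0.9])))  # → 1
```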
Overall, the differences between TRR and original HFR video are not easy to notice, although Hurst indices such as those calculated using R/S statistics reveal these differences. Multifractal spectra and their characteristics are also indicative in differentiating between the two groups consisting of various content and motion. Through examination of the reference UVG dataset included in the video trace analysis, this research shows that multifractal descriptors and the trained model may be adequate for detection. The model enabled high-accuracy results regardless of compression rate or content. However, further development of the proposed model should be oriented towards other distortion possibilities in the HFR domain.
Integrity and authentication issues may arise, and the approach may be useful in TRR or frame rate upconversion detection. Namely, there are many challenges associated with HFR, and HFR needs to be properly addressed in order to truly realize its potential. The results obtained in this work can be considered valuable for future research. It can be concluded that HFR represents a significant advancement in the field of video technologies, and tracing analysis is important for dealing with the specific behavior that HFR brings.