1 Introduction

Today, a massive amount of video and multimedia data is processed. Cameras observe systems in manufacturing, food production and road traffic. They provide information in autonomous cars and advanced driver assistance systems as well as in entertainment systems. In a similar way, audio and further data are used and sometimes combined into one file or a group of files to form multimedia data. The amount of this data grows as rapidly as its complexity: resolutions increase to HD and beyond, sound advances from 5.1 to 22.2 surround, and content becomes 3D or 360-degree. The field of application expands from TV and computer screens to huge projectors and small smartwatch-like devices. Nevertheless, the examination of accessibility, correctness, performance and especially quality is commonly done with old, small single-media samples like Fig. 1a, Fig. 1b and Fig. 1c from the last century, reaching SD resolution with stereo sound at most.

Fig. 1. Commonly used test images and test video

In principle, the data passes through the different steps of a processing chain (Fig. 2) in order to improve, store or display it. Each step has its own characteristics and adds errors, which can be noticed e.g. as the picture artefacts shown in Fig. 3. The type of artefacts and their frequency of occurrence depend heavily on numerous parameters like the transcoding system, its implementation and settings, as well as the input data. Different test patterns exist for different types of image artefacts; given the innumerable variety of artefacts, a testset should provoke as many artefact types as possible and make them detectable.

Fig. 2. Processing chain for images: "C, input from camera; G, grab image (digitize and store); P, preprocess; R, recognize (i, image data; a, abstract data)." [1]

Fig. 3. Common artefacts in digital images

Many artefacts are nevertheless not detectable this way, since they do not appear in a single test pattern. They are the results of movements, quick image switching or other conditional image transformations which appear only in image sequences such as videos.

Testsets should not only cause the expected errors, but also make them clearly visible and detectable. In single images or natural movies it is, for example, difficult to see slight color differences or single pixel errors. Furthermore, in some areas like image understanding [3], image retrieval or digital archiving [2], the testsets have to be as compact as possible, since otherwise extensive tests with such an amount of data would hardly, or not at all, be feasible. Facing these problems, we conceptualized a highly flexible synthetic testset adapted to these specific purposes.

Fig. 4. Schema of the framework to generate the testsets

2 Framework Structure

To generate a synthetic, versatile and flexible testset which is able to reveal designated picture artefacts, a highly adaptable framework is necessary from first to last, as proposed in Fig. 4. As the foundation we need a vectorized description of the different test patterns, like the Scene Description Language used in the open source raytracing software POV-Ray. This abstract scene description is based on parameters and coordinates, making it independent of the desired resolution, aspect ratio and file format. It is therefore easy to change the testset or add further test cases in order to adapt the test patterns to new purposes. In the next step we defined the test cases through the scene description parameters within a Python script which generates the workflow control file. This control file is constructed in such a way that all test cases, a selected subset or even a single test case can be created. It uses Apache Ant to call POV-Ray and to pass the parameters to it. These test cases serve as the input data for the programs under test and as the original material for the comparison with the transcoded results.
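As a minimal sketch, such a control-file generator could look like the following Python fragment; the parameter lists, file names and Ant layout are illustrative assumptions, not the actual implementation:

    # Sketch of a workflow-control generator: emits one Ant target per
    # test case, each calling POV-Ray with the case's parameters.
    # All names and paths below are illustrative assumptions.
    import itertools

    RESOLUTIONS = [(1920, 1080), (1024, 768), (1009, 631), (997, 13)]
    COLOR_SETS = ["gray", "complementary", "random_rgb", "green_a", "green_b"]
    PATTERNS = ["blocks", "sections", "stripes", "siemens_star"]

    with open("build.xml", "w") as f:
        f.write('<project name="testset" default="all">\n')
        names = []
        for pattern, (w, h), colors in itertools.product(PATTERNS, RESOLUTIONS, COLOR_SETS):
            name = f"{pattern}_{w}x{h}_{colors}"
            names.append(name)
            f.write(f'  <target name="{name}">\n'
                    f'    <exec executable="povray">\n'
                    f'      <arg value="+I{pattern}.pov"/>\n'
                    f'      <arg value="+W{w}"/>\n'
                    f'      <arg value="+H{h}"/>\n'
                    f'      <arg value="+O{name}.png"/>\n'
                    f'    </exec>\n'
                    f'  </target>\n')
        deps = ",".join(names)
        f.write(f'  <target name="all" depends="{deps}"/>\n')
        f.write('</project>\n')

Because every test case becomes an independent Ant target, Ant can run all of them, a selected subset or a single one, which is exactly the selectivity described above.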

3 Testset Generation

We used the scene description language of the raytracing renderer POV-Ray to define a set of test patterns in an abstract, target- and size-independent way, as shown in Fig. 5a. At the same time, a set of descriptions is defined in the Python programming language to form the test sequences (Fig. 5b) as well as the way their execution is handled by the program under test. This yields an Ant-based control file which allows parallel as well as independent execution of each test (Fig. 5c). Further processing steps can be added if the program under test needs them to handle the input, as shown in Fig. 4 for the cases listed in Table 1.
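For a rotating pattern, the whole 360-frame sequence can, for example, be rendered with a single POV-Ray call that steps the clock variable over the animation frames (the scene file name and output prefix are assumptions):

    # Render 360 animation frames of one test pattern with POV-Ray;
    # the scene file name and output prefix are assumptions.
    import subprocess

    subprocess.run([
        "povray",
        "+Istripes.pov",     # abstract scene description (SDL)
        "+W1920", "+H1080",  # target resolution
        "+KFI1", "+KFF360",  # frames 1..360; the clock runs from 0 to 1
        "+Oframe_.png",      # POV-Ray appends the frame number to the name
    ], check=True)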

Fig. 5. Sample of the control elements to generate a test case with rotation, and its execution

The created test patterns can be divided into four groups of pattern designs. The first is a Cartesian grid of square blocks, provided in edge lengths of 1, 4, 5, 8 and 10 pixels; except for the 1-pixel design, these pattern sequences also exist in a rotating version. The images of the second design are composed of 1, 2 or 4 rectangular sections; additionally, the 2-section sequences are translated perpendicular to their separation line, and the 4-section sequences also rotate. The third group contains images with stripes in widths of 1, 4, 5, 8 and 10 pixels, which are also rotated and translated in various ways. The last category shows a Siemens star in different sizes and with different numbers of beams, available in a rotating version as well.
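Purely for illustration, this parameter space can be summarized in a compact structure (the names and groupings below are our own shorthand, not the original scene descriptions):

    # Shorthand summary of the four pattern families described above;
    # the keys and groupings are illustrative assumptions.
    PATTERN_FAMILIES = {
        "blocks":   {"edge_px": [1, 4, 5, 8, 10], "rotating_px": [4, 5, 8, 10]},
        "sections": {"count": [1, 2, 4], "translated": [2], "rotating": [4]},
        "stripes":  {"width_px": [1, 4, 5, 8, 10], "transforms": ["rotation", "translation"]},
        "siemens":  {"variants": "several sizes and beam counts", "rotating": True},
    }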

Table 1. Usability of image sequences as direct input data
Fig. 6. The stripe sample (a), rotating around the picture center, leads with FFmpeg, H.264 and 1 MBit/s to the results in (b) and (c). Unevenly arising spots of varying intensity show up, with complete dissolution of the original pattern in the first case.

Every test pattern sequence exists in four different resolutions, two commonly used and two unusual ones. On the one hand \(1920\times 1080\), a frequently used resolution for videos and movies as well as for displays and video projectors, and \(1024\times 768\), an older but still used resolution, e.g. in smaller displays, netbooks and mobile devices. On the other hand, we constructed two resolutions from prime numbers: \(1009\times 631\) with a usual aspect ratio of about 16:10 and \(997\times 13\) with an uncommon aspect ratio of about 77:1. Moreover, each test pattern was generated in five different color sets: a grayscale set, a color set consisting of the six complementary colors red, green, blue, cyan, magenta and yellow, a set in which all 16,777,216 colors of the RGB color space change randomly, and two green color sets. These permutations finally result in over 900 test cases with different structures, transformations, colors and resolutions.
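The stated testset size can be checked with a quick calculation; only the four resolutions and five color sets are given above, so the number of base pattern/transformation sequences below is an assumption chosen to match "over 900":

    # Back-of-the-envelope check of the testset size; the number of
    # base sequences (46) is an assumption, not a figure from the paper.
    base_sequences = 46   # assumed pattern/transformation variants
    resolutions = 4       # 1920x1080, 1024x768, 1009x631, 997x13
    color_sets = 5        # gray, complementary, random RGB, two green sets
    print(base_sequences * resolutions * color_sets)  # 920, i.e. over 900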

Every test case sequence consists of 360 images, in which the respective transformations and colors change from frame to frame. We used FFmpeg 2015, Telestream Episode 6.4.6 and Adobe Media Encoder CC 2015.0.1 (7.2) to combine and transcode the respective 360 frames into different video formats at various quality levels. We used the video codecs H.264/x264, which is employed on Blu-ray Discs, in the digital satellite TV broadcast standard DVB-S2 as well as in internet movies and in MP4 files for mobile devices, and MPEG-2, which serves as the video format e.g. for DVDs and digital TV broadcast. The test sequences were transcoded to videos with bit rates of 400 kBit/s and 1.5 MBit/s, which are the minimum speed for high quality video calling and the recommended speed for HD video calling in Skype, respectively; furthermore 4.976 MBit/s, 31.668 MBit/s and 83.11 MBit/s, which are the lowest and the highest possible bit rates in DVB-T as well as the highest in DVB-C, and 15 MBit/s, which e.g. equals the MPEG-2 Main Level bit rate. Refresh rate and resolution were not changed during the transcoding process.
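For illustration, one such transcoding step could be scripted as follows, here calling FFmpeg with the x264 encoder (the frame naming, frame rate and output name are assumptions):

    # Combine 360 rendered frames into an H.264 video at 1.5 MBit/s;
    # frame-name pattern, frame rate and output name are assumptions.
    import subprocess

    subprocess.run([
        "ffmpeg",
        "-framerate", "25",      # keep the source refresh rate unchanged
        "-i", "frame_%03d.png",  # the 360 numbered input frames
        "-c:v", "libx264",       # H.264 via the x264 encoder
        "-b:v", "1500k",         # target bit rate of 1.5 MBit/s
        "out_h264_1500k.mp4",
    ], check=True)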

Fig. 7. The motionless Siemens star sample (a) leads with Episode Engine, MPEG-2 and 1.5 MBit/s to the results in (b), (c) and (d). After substantial artefact formation in the first frame, a frame-wise quality improvement follows, leading to (c). The next frame (d) shows substantial artefacts again and starts a new improvement sequence, ending in relatively good quality.

4 Experimental Results and Discussion

The generated test sequences were processed by the video encoders and empirically examined for remarkable events. The results show that large single-colored structures as well as fence-like vertical or horizontal structures are well encodable. In contrast, the stripe-like content as well as the Siemens star pattern create clearly visible artefacts, as shown in Figs. 6 and 7. Some are merely disturbing, as in Fig. 6c, whereas others impair the whole image, as in Fig. 6b.

A further effect of video test sequences is shown in Fig. 7: still, nearly identical frames are modified by the video encoding process, and a non-existing movement is created.
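This pseudo-motion can be quantified with a simple consecutive-frame difference, as in the following sketch (assuming the NumPy and imageio packages; the file names and reporting rule are ours):

    # Compare consecutive decoded frames of a motionless test sequence;
    # for an ideal encoder the difference after the first frame is zero.
    # File names and the reporting rule are assumptions.
    import numpy as np
    import imageio.v2 as imageio

    prev = None
    for i in range(1, 361):
        frame = imageio.imread(f"decoded_{i:03d}.png").astype(np.int16)
        if prev is not None:
            diff = np.abs(frame - prev).mean()  # mean absolute pixel change
            if diff > 0:
                print(f"frame {i}: mean abs. difference {diff:.2f}")
        prev = frame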

5 Future Work

In this paper we proposed a new framework for the generation of testsets for multimedia systems. Since the description of the tests is kept separate, the framework is very flexible in creating arbitrary testsets and executing them in different environments. We showed that the generated testsets can be used to search for effects that badly influence performance and quality, and we presented cases which are too complex to be detected with old-style test images and video sequences.

The next steps incorporate further, more complex test patterns and composite sequences to address more artefacts. Test patterns with sound as well as 3D and embedded metadata are to be added. An automatic preliminary investigation of the results could be used to find candidates for problematic test cases. An application to other fields of image processing, like robustness research in the field of pedestrian detection, is also possible.