AVS trick modes for PVR and VOD services

https://doi.org/10.1016/j.image.2008.12.008

Abstract

AVS1-P2 is a recently completed video compression standard developed by the Audio and Video Coding Standard (AVS) Workgroup of China. The standard promises comparable compression efficiency to the H.264/MPEG-4 AVC video codec with lower implementation complexity and royalty fees. AVS1-P2 is the Chinese next generation national video coding standard with an increasing amount of industrial importance as it is being required for different applications and services. It is expected that AVS1-P2 will be prominent in emerging Chinese digital video application markets, especially for personal video recorder (PVR) and video on demand (VOD).

This paper discusses how trick modes for PVR and VOD can be performed with AVS1-P2 content. It begins with generating an index table to facilitate AVS1-P2 trick modes followed by a discussion of basic and advanced trick modes. Next, the minimum decoder speed and display frame buffer requirements are analyzed for smooth trick play, especially for 1× rewind. VOD stream delivery strategies for various trick modes and transitions are then discussed. Finally, an overview of other topics that may affect implementation such as content protection and encoding is provided. Many of the discussions are also applicable to other video coding standards like MPEG-2, H.264/AVC, etc.

Introduction

PVR and VOD are two methods to satisfy the constantly increasing appetite of consumers for content in today's world. They are revolutionary consumer video applications that free users from the time constraint of viewing scheduled live television broadcasting [1], [2], [3], [4]. PVR and VOD also provide the versatility of select playback and associated special trick play features such as pause, fast forward, slow forward, rewind, slow reverse and frame advance.

This paper defines PVR as using a local hard disk or other storage medium to record and play back broadcast material, and defines VOD as using a network server or other storage medium to stream content to the local device. Note that the terms PVR and VOD are commonly interchanged, e.g., the use of terms like “network PVR” to represent PVR from a network storage device and “push VOD” to represent content that is “pushed” onto a local hard disk before viewing. Nevertheless, it is useful to briefly discuss both technologies before discussing the new Audio and Video Coding Standard (AVS) video compression standard and then how trick play features can be implemented for AVS PVR and VOD deployments. It is also important to note that an operator does not need to choose between PVR and VOD. Both technologies have their pros and cons but can also be effectively used together.

Traditional PVR offers consumers a local hard disk within their set-top box (STB) that digitally records live television programs. The additional cost of the local hard disk for PVR is alleviated by the fact that storage prices are continually getting cheaper and the service scales efficiently since each user retrieves content from the local hard disk instead of the network. While the local hard disk has smaller capacity than typical headend or network storage, PVR consumers can configure the recordings on their local device to match their interests.

VOD has traditionally been used to describe a system where content is stored on a network server. A user can then select and play programs from the network to their local device. Early VOD solutions left consumers with few content choices and did not provide the necessary interactivity to drive mainstream home viewing, making them niche solutions at best. Initial VOD infrastructures were also expensive and had scalability issues that limited their growth potential. However, with recent advances in networking and storage capabilities as well as interactive digital video server technology, VOD is now a reality and provides a simple method for operators to deliver content to a large number of consumers on their network.

AVS1-P2, also known as AVS video, is a recently completed video compression standard developed by the Audio and Video Coding Standard Workgroup of China (AVS Workgroup) [5]. The standard promises comparable coding efficiency to the H.264/MPEG-4 AVC codec [6], [7] with lower implementation complexity and royalty fees [8], [9]. In February 2006, AVS1-P2 was approved as a Chinese national standard. After that, the industrial marketplace began to embrace the AVS technology. Many new Chinese video services including IPTV, terrestrial digital broadcasting and HD-DVD require AVS1-P2 support [10]. In March 2008, Broadcom released the BCM7405 [11], a high-performance SoC that provides full support for AVS1-P2 Jizhun Profile HD decoding [12]. (The BCM7405 is a next generation high-definition satellite, cable and IP set-top box solution offering integrated AVS1-P2, H.264/AVC, MPEG-2 [13], MPEG-4 Part 2 [14], DivX [15], Xvid [16] and VC-1 [17] video decoding technology.) As the AVS industry continues to mature, AVS1-P2 is a very promising codec for Chinese digital video application markets.

A detailed overview of AVS1-P2 is beyond the scope of this paper, but the following text provides a brief overview highlighting some of the main aspects of AVS1-P2 that impact PVR and VOD.

There are several profiles specified in AVS1-P2. For example, the Jizhun profile is the “baseline” profile and the Zengqiang profile is the “advanced” profile. In this paper, only the Jizhun profile is used to describe and illustrate PVR and VOD concepts and techniques.

The AVS1-P2 syntax is composed of two main layers: the sequence layer and the picture layer.

The sequence layer provides sequence level information, such as a stream's profile and level, picture size, frame rate, etc., and it also specifies random access locations. Note that unlike MPEG-2, the first picture in an AVS1-P2 bitstream after the sequence header in decoding order must be an I-picture. Furthermore, if a video_edit_code is transmitted before the sequence header, the bitstream may have been modified before the sequence header, and the first group of B-pictures following the first I-picture may lack forward reference pictures.

The picture layer provides picture level information, such as picture type, picture level quantization parameter, progressive frame flag, etc. AVS1-P2 supports both progressive pictures and interlaced pictures. The two fields of a progressive picture come from the same time instant, while the two fields of an interlaced picture are separated by one field time interval. In AVS1-P2, progressive pictures are coded as a single frame (and referred to as a progressive frame). Interlaced pictures can be coded as a single frame (an interlaced frame picture) or as two separate fields (an interlaced field picture). Note that the two fields of an interlaced field picture share the same picture header.

There are three different picture types in AVS1-P2: I-pictures, P-pictures and B-pictures. More details on each of the picture types are listed below. Note that the most important aspect for trick-mode implementation is picture dependency. The frame ordering and bumping process for AVS1-P2 is similar to the one used in MPEG-2.

  • I-picture: a picture coded independently of any other pictures in the stream. This definition is similar to that of an MPEG-2 I-picture. In AVS1-P2, an I-picture may serve as a random access point. The P-pictures following an I-picture may refer to pictures before the I-picture. An exception is that if the I-picture is the first picture following the sequence header, subsequent P-pictures are prohibited from referencing pictures before that I-picture. The first picture after the sequence header in coding order must be an I-picture.

  • P-picture: a picture that is coded using motion compensated prediction from previous I- or P-pictures. P-pictures in AVS1-P2 are conceptually closer to P-pictures in MPEG-2 Main Profile video than to P-pictures in H.264/AVC. In AVS1-P2, P-pictures may use at most two previous I- or P-pictures as reference pictures. (In MPEG-2 Main Profile video, P-pictures can only use the immediately previous I- or P-picture as a reference picture, whereas in H.264/AVC a large number of pictures that have been decoded and stored can be used as reference pictures.)

  • B-picture: a picture that is coded using motion compensated prediction from the immediately previous and/or future I- or P-pictures. B-pictures in AVS1-P2 are not used for predicting any other picture, i.e., B-pictures in AVS1-P2 can always be dropped without affecting the decoding of other pictures. Therefore, B-pictures in AVS1-P2 are conceptually closer to B-pictures in MPEG-2 Main Profile video than to B-pictures in H.264/AVC. (In H.264/AVC, B-pictures are allowed to be used as reference pictures.)
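The dependency rules above suggest two simple picture-selection policies for trick play. The following is a minimal sketch, not the authors' implementation: the function name `select_pictures` and the mode labels `"drop_b"` and `"i_only"` are hypothetical, and pictures are represented only by their type letters in decoding order.

```python
from typing import List

def select_pictures(picture_types: List[str], mode: str) -> List[int]:
    """Return the indices (in decoding order) of pictures to keep.

    mode "drop_b": keep I- and P-pictures. This is safe because B-pictures
                   are never used as references in AVS1-P2.
    mode "i_only": keep only I-pictures, which are coded independently.
    """
    if mode == "drop_b":
        wanted = {"I", "P"}
    elif mode == "i_only":
        wanted = {"I"}
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [i for i, t in enumerate(picture_types) if t in wanted]

# Hypothetical picture sequence in decoding order.
stream = list("IBBPBBPBBIBBP")
kept_ip = select_pictures(stream, "drop_b")
kept_i = select_pictures(stream, "i_only")
```

Note that keeping only some of the P-pictures is not generally safe, since a P-picture may reference up to two previous I- or P-pictures; the sketch therefore offers no such mode.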

The AVS1-P2 syntax does not define a structure like the MPEG-2 group of picture (GOP) structure. For ease of discussion in this paper, an AVS GOP will be defined as follows:

  • An AVS GOP is a group of successive pictures within a coded video stream that begins with an I-picture preceded by a sequence header and ends before the next video_edit_code, sequence header or sequence_end_code.
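The AVS GOP definition above can be sketched as a small grouping routine. This is an illustrative sketch under simplifying assumptions: the stream is modeled as a list of hypothetical labels (`"SEQ_HEADER"`, `"VIDEO_EDIT"`, `"SEQ_END"`, `"I"`, `"P"`, `"B"`), and any user data or extension data between a sequence header and its I-picture is assumed to have been filtered out beforehand.

```python
from typing import List, Optional

# Elements that terminate an AVS GOP under the definition used in this paper.
BOUNDARIES = {"SEQ_HEADER", "VIDEO_EDIT", "SEQ_END"}

def split_into_gops(elements: List[str]) -> List[List[str]]:
    """Group pictures into AVS GOPs.

    A GOP starts at an I-picture that directly follows a sequence header and
    ends before the next video_edit_code, sequence header or sequence_end_code.
    Pictures that precede the first such I-picture are ignored.
    """
    gops: List[List[str]] = []
    current: Optional[List[str]] = None
    previous: Optional[str] = None
    for element in elements:
        if element in BOUNDARIES:
            if current is not None:
                gops.append(current)   # close the open GOP
                current = None
        elif element == "I" and previous == "SEQ_HEADER":
            current = ["I"]            # open a new GOP
        elif current is not None:
            current.append(element)    # includes I-pictures inside a GOP
        previous = element
    if current is not None:
        gops.append(current)
    return gops

example = ["SEQ_HEADER", "I", "B", "B", "P", "I", "P",
           "SEQ_HEADER", "I", "P", "VIDEO_EDIT",
           "SEQ_HEADER", "I", "SEQ_END"]
gops = split_into_gops(example)
```

In this example the second I-picture of the first GOP is not preceded by a sequence header, so it stays inside that GOP instead of starting a new one.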

Video coding standards like MPEG-2 are usually designed to support a broad range of applications. Application specific standards (e.g., DVB [18]) are designed to constrain video coding standards to the appropriate tools and/or parameters for a particular application. Currently, the application specific requirements and standards for AVS are not as mature as they are for other video codecs [18], [19], [20]. There is scarce literature discussing how AVS can be applied to specific applications. This paper addresses how PVR and VOD can be implemented for AVS video, specifically targeting the trick play modes. Index table generation, the process of extracting the exact locations and characteristics of different pictures in an AVS bitstream, is discussed in Section 2. Section 3 discusses PVR trick-mode implementation. Various stream delivery strategies for trick modes and transitions in VOD are discussed in Section 4. Section 5 discusses other implementation considerations including content protection and the effect of encoding and packetization. Final conclusions are given in Section 6.

Section snippets

AVS1-P2 index table generation

An important aspect of any type of picture manipulation algorithm such as trick play is to first determine the exact locations and characteristics of different pictures in the bitstream. This allows one to carefully control exactly which pictures are decoded and displayed to create the desired visual effect.
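One way to locate pictures is to scan the elementary stream for start codes and record their byte offsets. The sketch below is illustrative only and is not the index format described in the paper. The start-code values in `KINDS` are assumptions that should be verified against the AVS1-P2 specification, and the names `IndexEntry` and `build_index` are hypothetical. Distinguishing P-pictures from B-pictures would additionally require parsing the picture header, which is omitted here.

```python
from typing import List, NamedTuple

START_CODE_PREFIX = b"\x00\x00\x01"

# Assumed AVS1-P2 start-code values (verify against the specification).
KINDS = {
    0xB0: "sequence_header",
    0xB1: "sequence_end",
    0xB3: "i_picture",
    0xB6: "pb_picture",
    0xB7: "video_edit",
}

class IndexEntry(NamedTuple):
    offset: int  # byte offset of the start-code prefix
    kind: str    # one of the values in KINDS

def build_index(data: bytes) -> List[IndexEntry]:
    """Scan for start codes and record those relevant to trick play."""
    entries: List[IndexEntry] = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(data):
        kind = KINDS.get(data[pos + 3])
        if kind is not None:           # slice and other codes are skipped
            entries.append(IndexEntry(pos, kind))
        pos = data.find(START_CODE_PREFIX, pos + 3)
    return entries

# Synthetic byte string with filler payloads, for illustration only.
sample = (b"\x00\x00\x01\xB0" + b"\xAA" * 4 +
          b"\x00\x00\x01\xB3" + b"\x11" * 5 +
          b"\x00\x00\x01\x00" + b"\x22" * 3 +   # slice start code, ignored
          b"\x00\x00\x01\xB6" + b"\x33" * 2 +
          b"\x00\x00\x01\xB1")
index = build_index(sample)
```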

When and how this information is extracted depends on the specific application scenario. Consider a scenario where information is extracted when a bitstream is being recorded

AVS PVR systems

There are various methods to implement trick modes for PVR systems. For simplicity, this paper will discuss two different trick-mode implementations: basic trick modes and advanced trick modes.

Basic trick modes involve manipulating the recorded stream to create a reassembled/reconstructed bitstream that is sent to the decoder. The decoder operates in its normal decoding mode with minimal or no knowledge that this bitstream is different from a standard bitstream. The visual effect of the trick

An AVS VOD system with basic trick mode

VOD systems stream content to a set-top box, allowing viewing in real time. The majority of cable, satellite and IPTV television providers offer pay-per-view services whereby a user buys or selects a movie or television program and it begins to play on the television set almost instantaneously.

Streaming video on demand systems provide the user with a large set of PVR functionality including pause, fast forward, fast rewind, slow forward, slow rewind, jump to previous/future picture, etc. These

Content protection

PVR has become a popular service that is an important revenue generator for many entertainment video service providers. As discussed earlier, an important aspect of PVR is to create an index table which contains the exact locations and characteristics of different pictures in the bitstream. This index table allows a decoder to further manipulate the stream such as removing certain pictures without parsing the entire stream each time stream manipulation is desired.
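As a rough illustration of why the index table avoids reparsing, the sketch below removes pictures using only the recorded offsets. It assumes a hypothetical index of `(offset, kind)` pairs sorted by offset, in which each unit extends to the next entry's offset; the kind labels and the function name `extract_units` are illustrative and are not taken from the paper.

```python
from typing import List, Set, Tuple

def extract_units(data: bytes,
                  index: List[Tuple[int, str]],
                  keep: Set[str]) -> bytes:
    """Concatenate the byte ranges of indexed units whose kind is in `keep`.

    Only the offsets in `index` are consulted; the payload bytes are copied
    without being parsed.
    """
    out = bytearray()
    for i, (offset, kind) in enumerate(index):
        # A unit ends where the next indexed unit begins (or at end of data).
        end = index[i + 1][0] if i + 1 < len(index) else len(data)
        if kind in keep:
            out += data[offset:end]
    return bytes(out)

# Synthetic data: each letter stands for one byte of the named unit.
recorded = b"S" * 4 + b"I" * 6 + b"B" * 3 + b"P" * 5
table = [(0, "sequence_header"), (4, "i_picture"),
         (10, "b_picture"), (13, "p_picture")]
without_b = extract_units(recorded, table,
                          {"sequence_header", "i_picture", "p_picture"})
```

A real system would typically also have to deal with timing and transport-level details, which this sketch ignores.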

As described earlier, the

Conclusion

In this paper, AVS trick modes are discussed for PVR and VOD applications. AVS trick play can be facilitated by a pre-generated AVS index table. Using the index table, basic trick modes manipulate the recorded stream to create a specially processed bitstream that is sent to the decoder while the decoder operates in its normal decoding mode. To achieve a better visual effect during trick play, especially for 1× rewind, advanced trick modes use special commands to change the decoder operations.

References (25)

  • A. Puri et al., Video coding using the H.264/MPEG-4 AVC compression standard, Signal Processing: Image Communication (2004)
  • A. Oliphant et al., TV gets personal, IEE Review (2001)
  • P. Hearty, Carriage of digital video and other services by cable in North America, Proceedings of IEEE (2006)
  • Information technology – Advanced coding of audio and video – Part 2: Video, N1370, Beijing, August...
  • Advanced video coding for generic audiovisual services, ISO/IEC 14496-10 | ITU-T H.264,...
  • L. Fan, S. Ma, F. Wu, Overview of AVS video standard, in: Proceedings of the IEEE International Conference on...
  • Z. Jiang, W. Gao, H. Liu, An AVS based IPTV system network trial, in: Proceedings of the 8th Pacific Rim Conference on...
  • BCM7405 Product Brief,...