
Pattern Recognition Letters

Volume 33, Issue 10, 15 July 2012, Pages 1338-1348

Video synchronization based on events alignment

https://doi.org/10.1016/j.patrec.2012.02.009

Abstract

This paper presents a method for synchronizing two video sequences. Changes in the kinematic status of feature points are treated as events. The basic idea is to temporally align the events observed by the two cameras: an algorithm scores each candidate event correspondence so that false correspondences with low scores can be discarded. The recovered event correspondences are then used to coarsely estimate the synchronization parameters via the Hough transform. Finally, these parameters are refined by solving an optimization problem, recovering synchronization to sub-frame accuracy. The method is evaluated quantitatively on synthetic sequences and demonstrated qualitatively on several real sequences. Experimental results show that the method handles multiple features, a single feature, different frame rates, and even a single feature observed by two cameras in relative motion.

Highlights

► An events-alignment-based method of synchronizing two video sequences is proposed.
► An algorithm is proposed to score each candidate event correspondence.
► Applicable to cases of multiple features and a single feature.
► Suitable for cases of different frame rates.
► Can handle a single feature with relative motion between the cameras.

Introduction

Multi-camera video sequences are used in many applications, such as human motion tracking (Mori et al., 2001), multiple-object tracking (Stein, 1999), multi-camera scene surveillance (Takemura and Ishiguro, 2010), dynamic depth recovery (Zhou and Tao, 2003), urban 3D reconstruction (Pollefeys et al., 2008) and sport scenes (Inamoto and Saito, 2003, Saito et al., 2004, Inamoto and Saito, 2005). Multi-camera systems require the cameras to be synchronized with each other for tracking objects and accurately measuring depth. Video synchronization ensures that image features matched between sequences also correspond to the same instant in time before spatial matching. Some applications assume the input video sequences are already synchronized, whether manually or via time stamps (e.g. EyeVision, 2011), while others use optional, expensive built-in hardware that synchronizes the cameras with a trigger signal from a single source, so that every camera opens its shutter at the same time. In practice, however, much stock footage is captured without hardware or time-stamp synchronization, so an alternative solution is necessary. Prior research has shown that the video sequences themselves can provide sufficient constraints to synchronize the cameras (Caspi et al., 2006, Reid and Zisserman, 1996, Tuytelaars and Gool, 2004b, Tresadern and Reid, 2009, Caspi and Irani, 2000). The ability to synchronize sequences from the available feature locations alone makes such methods applicable to stock footage where hardware synchronization is unavailable.

An early approach to video synchronization was manual: the geometric distance of a point in one view from its epipolar line in the other served as a measure of temporal correspondence between the two sequences (Reid and Zisserman, 1996). Methods of this kind require the epipolar geometry to be known (Zhou and Tao, 2003, Pooley et al., 2003, Carceroni et al., 2004). Stein (1999) and Lee et al. (2000) synchronized two sequences by assuming a homography relationship between the cameras. Because they exhaustively search over candidate time offsets between the sequences, their methods are computationally expensive.
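As a concrete illustration of this epipolar-distance measure, the minimal sketch below (our own construction, assuming a known 3×3 fundamental matrix F and using NumPy; function names are ours) computes the distance of a point in the second view from the epipolar line induced by its match in the first view:

import numpy as np

def epipolar_distance(F, x, x_prime):
    # Distance of x_prime (view 2) from the epipolar line l' = F @ x
    # induced by x (view 1). F is a known 3x3 fundamental matrix.
    x_h = np.array([x[0], x[1], 1.0])          # homogeneous point, view 1
    line = F @ x_h                             # epipolar line in view 2
    a, b, _ = line
    xp_h = np.array([x_prime[0], x_prime[1], 1.0])
    return abs(xp_h @ line) / np.hypot(a, b)   # point-to-line distance

Under this measure, the frame pairing that minimizes the summed distance over all tracked points is taken as the best temporal correspondence.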

Caspi et al. (2006) proposed a feature-based sequence-to-sequence alignment method. The alignment is done both in time and in space, where the spatial alignment can be modeled by a homography or a fundamental matrix. Because a homography or fundamental matrix is adopted, the alignment is completed concurrently in space and time, but the cameras are also required to remain fixed relative to each other.

There is another class of spatio-temporal alignment methods that also exploits 3D epipolar geometry (Rao et al., 2003, Cao et al., 2010) or a 2D homography (Tresadern and Reid, 2009), but without computing the fundamental matrix or homography during alignment. If a system of homogeneous linear equations has a non-trivial solution, the rank of its coefficient matrix must be less than the number of unknowns. These methods use only this necessary condition: when the temporal alignment is exact, a non-trivial solution exists, so the rank of the coefficient matrix is less than 9 (3D epipolar case) or 4 (2D homography case). Computing the fundamental matrix or homography is thus avoided, but compared with the method of Caspi et al. (2006), these methods need more pairs of corresponding trajectories between the two views, and the extreme case of non-overlapping cameras handled in Caspi et al. (2006) becomes difficult to solve.
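A minimal sketch of this rank test follows (our own construction: it stacks rows from a single trajectory pair, searches only integer offsets, and uses illustrative names). At the correct offset a single fundamental matrix explains every correspondence, so the 9-column constraint matrix drops rank and its smallest singular value dips toward zero:

import numpy as np

def rank_deficiency_score(traj1, traj2, offset):
    # Stack one row of the 9-parameter epipolar constraint per frame at
    # which both trajectories are defined under the candidate offset.
    rows = []
    for t in range(len(traj1)):
        tp = t + offset
        if 0 <= tp < len(traj2):
            x, y = traj1[t]
            xp, yp = traj2[tp]
            rows.append([xp * x, xp * y, xp, yp * x, yp * y, yp, x, y, 1.0])
    A = np.asarray(rows)
    # Smallest singular value measures how close A is to rank deficiency.
    return np.linalg.svd(A, compute_uv=False)[-1]

# Coarse search: best = min(offsets, key=lambda d: rank_deficiency_score(t1, t2, d))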

In this paper, we present a trajectory-to-trajectory temporal alignment approach based on events alignment. This method likewise does not require computing a fundamental matrix or homography. Ideally, we only need to align a few event points (often far fewer than the number of frames) of a pair of corresponding trajectories. Even the extreme case of non-overlapping cameras in relative motion, which cannot be solved by the other known models, can still be handled.

An “event” here refers to a significant change in the status of the structure or scene, such as a change of illumination, a car starting or stopping, or a basketball rebounding. In this paper, we mainly consider events to be kinematic changes of moving objects: for instance, when an object suddenly accelerates (or decelerates) or changes its direction of movement, an event is considered to have occurred. When such an event occurs, the corresponding time, called an “event point” in this paper, is recorded. Aligning the event points of the two video sequences then completes the synchronization. This is the main idea of event-based video synchronization.
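As a hedged illustration of this idea (our own construction; the paper's actual event detector is not reproduced here), the sketch below flags event points as spikes in the discrete acceleration of a tracked feature, which covers both sudden speed changes and sharp direction changes:

import numpy as np

def detect_event_points(traj, accel_thresh=2.0):
    # traj: (N, 2) array of image positions of one tracked feature.
    # The threshold (pixels/frame^2) is illustrative, not from the paper.
    traj = np.asarray(traj, dtype=float)
    accel = np.diff(traj, n=2, axis=0)             # discrete acceleration
    mag = np.linalg.norm(accel, axis=1)
    # Frames where acceleration magnitude spikes are the "event points".
    return np.where(mag > accel_thresh)[0] + 1     # +1: diff shortens array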

Section snippets

Problem formulation

Suppose S and S′ are two input image sequences, where S denotes the reference (first) sequence and S′ denotes the second sequence. Let x̃ = (x, y, t) be a spatio-temporal point in the reference sequence S (i.e. a pixel (x, y) at frame t) and let x̃′ = (x′, y′, t′) be the matching spatio-temporal point in the sequence S′. The recorded scene can change dynamically, namely, it can include moving objects. The cameras can be either stationary or moving. Note that there is no restriction of fixed internal …
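For reference, a linear temporal model is commonly assumed in this setting (the symbols α and Δt are our notation, not taken from the paper):

t′ = αt + Δt,

where α accounts for the ratio of the two frame rates and Δt for the temporal offset, which may be non-integer, hence the sub-frame accuracy mentioned in the abstract.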

Synchronization based on events alignment

At first we consider the basic pinhole camera model (Hartley, 2003). A point in 3D space with coordinates X = (X, Y, Z)ᵀ (in non-homogeneous coordinates) is mapped to the point on the image plane where the line joining X to the center of projection meets the image plane. By similar triangles, the point (X, Y, Z)ᵀ is mapped to the point (x, y)ᵀ on the image plane, where x = fX/Z and y = fY/Z. Taking partial derivatives with respect to t, one can easily get

∂x/∂t = fẊ/Z − fXŻ/Z²
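To make this relation concrete, the following self-contained NumPy check (our own, with illustrative values) projects a moving 3D point and verifies the analytic image velocity numerically:

import numpy as np

f = 1.0                                # focal length (illustrative)
X = np.array([2.0, 1.0, 10.0])         # 3D point (X, Y, Z)
V = np.array([0.5, -0.2, 1.5])         # its velocity (Xdot, Ydot, Zdot)

x, y = f * X[0] / X[2], f * X[1] / X[2]          # pinhole projection

# Quotient rule: dx/dt = f*Xdot/Z - f*X*Zdot/Z^2, and likewise for y.
dx_dt = f * V[0] / X[2] - f * X[0] * V[2] / X[2] ** 2
dy_dt = f * V[1] / X[2] - f * X[1] * V[2] / X[2] ** 2

# Numerical check with a small time step agrees with the analytic value.
eps = 1e-6
Xe = X + eps * V
assert abs((f * Xe[0] / Xe[2] - x) / eps - dx_dt) < 1e-4

Because the image velocity depends on both the 3D position and the 3D velocity, a sudden change in the 3D motion generically produces a sudden change in the image-plane motion; this is why kinematic events survive projection and can be detected in each view independently.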

Algorithm

We next outline the video synchronization algorithm based on events alignment. Each step of the algorithm is then explained in more detail below; a condensed sketch of the coarse estimation stage follows the list.

  • 1.

    Specify some feature point correspondences between the first frames of the two videos, and denote their image coordinates as (x1, y1), (x2, y2), …, (xm, ym) and (x′1, y′1), (x′2, y′2), …, (x′m, y′m), respectively. Here m is the total number of point correspondences.

  • 2.

    Construct corresponding feature trajectories, i.e., track the feature points in the two videos to acquire …
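To connect these steps with the coarse estimation stage described in the abstract, here is a hedged sketch of Hough-style voting over the linear temporal model t′ = αt + Δt (the function name, grids and tolerance are our own, not the paper's):

import numpy as np

def hough_synchronize(events1, events2, alpha_grid, dt_grid, tol=1.0):
    # events1, events2: event-point frame indices from a pair of
    # corresponding trajectories. Grids and tolerance are illustrative.
    events1 = np.asarray(events1, dtype=float)
    events2 = np.asarray(events2, dtype=float)
    votes = np.zeros((len(alpha_grid), len(dt_grid)), dtype=int)
    for i, alpha in enumerate(alpha_grid):
        for j, dt in enumerate(dt_grid):
            pred = alpha * events1 + dt        # map events into sequence 2
            # one vote per mapped event landing near an event in sequence 2
            votes[i, j] = sum(np.min(np.abs(events2 - t)) < tol for t in pred)
    i, j = np.unravel_index(np.argmax(votes), votes.shape)
    return alpha_grid[i], dt_grid[j]           # coarse (alpha, dt) estimate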

Experiment and discussion

We have applied our algorithm to both synthetic and real video sequences. The synthetic sequences are used to demonstrate the algorithm and, more importantly, to evaluate its accuracy, since the synchronization parameters are preassigned. The real applications are then presented.

Conclusion

This paper has presented a method for synchronizing two sequences based on events alignment. Building on an analysis of event detection, changes in the kinematic status of feature points are treated as events, and the event correspondences between the two cameras are recovered. An algorithm that assigns a match score to each candidate correspondence is proposed to recover the correct event correspondences when only one pair of feature trajectories is available. Finally, coarsely estimate the …
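As a hedged sketch of the final refinement step, the following stand-in (the cost function is our own construction; the derivative-free Nelder–Mead simplex method follows the cited Lagarias et al., 1998) polishes the coarse Hough estimate to sub-frame accuracy:

import numpy as np
from scipy.optimize import minimize

def refine_sync(events1, events2, alpha0, dt0):
    # Sub-frame refinement starting from the coarse estimate (alpha0, dt0).
    e1 = np.asarray(events1, dtype=float)
    e2 = np.asarray(events2, dtype=float)

    def cost(p):
        alpha, dt = p
        pred = alpha * e1 + dt
        # summed distance from each mapped event to its nearest counterpart
        return float(sum(np.min(np.abs(e2 - t)) for t in pred))

    # Nelder-Mead needs no gradients, so the non-smooth cost is acceptable.
    res = minimize(cost, x0=[alpha0, dt0], method='Nelder-Mead')
    return res.x                               # refined (alpha, dt)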

Acknowledgments

This research was supported by the National Natural Science Foundation under Grants 61173182, 61179071 and 60736046, the Program for New Century Excellent Talents in University under Grant NCET-08-0370, Specialized Research Fund for the Doctoral Program of Higher Education under Grant 20090181110052, Projects of International Cooperation and Exchanges, Sichuan Province under Grant 2010HH0031, the National Basic Research Program under Grant 2009CB320803.

References (32)

  • EyeVision, 2011....
  • R. Hartley. Multiple view geometry in computer vision (2003)
  • N. Inamoto et al. Immersive observation of virtualized soccer match at real stadium model
  • N. Inamoto et al. Free viewpoint video synthesis and presentation from multiple sporting videos. IEICE Trans. Inform. Syst. (2005)
  • T. Kasparis et al. Detail-preserving adaptive conditional median filters. J. Electron. Imag. (1992)
  • J.C. Lagarias et al. Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM J. Optim. (1998)