ABSTRACT
Recent advances in Computer Vision and Deep Learning have made it possible to efficiently extract structured information from frames of video feeds. As such, a stream of objects and their associated classes, along with unique object identifiers derived via object tracking, can be generated, identifying each unique object as it is captured across frames. In this paper we initiate a study of temporal queries involving objects and their co-occurrences in video feeds. For example, queries that identify video segments during which the same two red cars and the same two humans appear jointly for five minutes are of interest to many applications ranging from law enforcement to security and safety. We take the first step and define such queries so that they incorporate certain physical aspects of video capture, such as object occlusion. We present an architecture consisting of three layers, namely object detection/tracking, intermediate data generation, and query evaluation. We propose two techniques, Marked Frame Set (MFS) and Sparse State Graph (SSG), to organize all detected objects in the intermediate data generation layer; given the queries, these techniques effectively minimize the number of objects and frames that have to be considered during query evaluation. We also introduce an algorithm called SSG-CM that processes incoming frames against the SSG and efficiently prunes objects and frames unrelated to query evaluation, while maintaining all states required for succinct query evaluation. We present the results of a thorough experimental evaluation utilizing both real and synthetic data, establishing the trade-offs between MFS and SSG. We stress various parameters of interest in our evaluation and demonstrate that the proposed query evaluation methodology, coupled with the proposed algorithms, can evaluate temporal queries over video feeds efficiently, achieving orders of magnitude performance benefits.
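To make the query class concrete, the following is a minimal Python sketch of a naive evaluator for a co-occurrence query such as "the same two cars and the same two persons appear jointly for at least N consecutive frames". It is not MFS, SSG, or SSG-CM, whose details the abstract does not give; it is only a simple baseline written under stated assumptions. The function name `cooccurrence_segments`, the representation of a frame as a set of `(object_id, class)` pairs, and the `required` and `min_frames` parameters are all hypothetical, and occlusion is ignored (an object missing from a single frame ends its run). A duration such as five minutes would be converted to frames using the feed's frame rate.

```python
from collections import Counter


def cooccurrence_segments(frames, required, min_frames):
    """Naive baseline (hypothetical, not the paper's method).

    frames:     list of sets of (object_id, class_name) pairs, one set per frame
    required:   dict mapping class_name -> minimum number of distinct objects
    min_frames: minimum number of consecutive frames of joint appearance
    Returns a list of (start_frame, end_frame, objects) tuples.
    """

    def satisfies(objs):
        counts = Counter(cls for _, cls in objs)
        return all(counts[cls] >= n for cls, n in required.items())

    # Each state is a set of objects that has appeared jointly in every
    # frame since `start`; maps frozenset -> start frame index.
    states = {}
    segments = []

    def step(current, t):
        nonlocal states
        next_states = {}
        for objs, start in states.items():
            kept = objs & current
            # The run of `objs` ends at frame t - 1 if some object vanished.
            if kept != objs and t - start >= min_frames and satisfies(objs):
                segments.append((start, t - 1, objs))
            if kept:
                # Surviving objects inherit the earliest start seen so far.
                next_states[kept] = min(start, next_states.get(kept, start))
        if current:
            next_states.setdefault(current, t)
        states = next_states

    for t, frame in enumerate(frames):
        # Objects of classes the query never mentions cannot affect the answer.
        relevant = frozenset(o for o in frame if o[1] in required)
        step(relevant, t)
    step(frozenset(), len(frames))  # flush runs still open at end of feed
    return segments


group = {(1, "car"), (2, "car"), (7, "person"), (8, "person")}
frames = [
    set(group),
    group | {(9, "dog")},
    set(group),
    group - {(2, "car")},  # car 2 leaves the scene for one frame
    set(group),
]
for start, end, objs in cooccurrence_segments(frames, {"car": 2, "person": 2}, 3):
    print(start, end, sorted(objs))
# prints: 0 2 [(1, 'car'), (2, 'car'), (7, 'person'), (8, 'person')]
```

Because only consecutive frames count, the tracked states are nested intersections of the most recent frames, so their number never exceeds the number of query-relevant objects in the current frame. The sketch still touches every frame and every relevant object, which is the cost the abstract says the proposed techniques aim to reduce.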