Abstract
In this paper, we propose a generative inpainting-based method to detect anomalous images in human monitoring via self-supervised multi-task learning. Our previous methods employ a deep captioning model to find salient regions in an image, exploit the caption information of each region, and detect anomalies in human monitoring at the region level by considering the relations between overlapping regions. Here, we focus on image-level detection, which is preferable when humans want an immediate alert and prefer to handle the anomaly themselves. In this setting, however, the previous methods can fall short because they rely on salient regions and neglect non-overlapping regions. Moreover, they treat all regions as equally important, so their performance is easily influenced by unimportant regions. To alleviate these problems in image-level detection, we first employ inpainting techniques with a designed local and global loss to better capture the relation between a region and its surrounding area in an image. Then, we propose an attention-based Gaussian weighting anomaly score that combines all the regions according to their importance, mitigating the influence of unimportant regions. The attention mechanism exploits multi-task learning for higher accuracy. Extensive experiments on two real-world datasets demonstrate the superiority of our method over the baselines in terms of AUROC, precision, and recall. Compared with the best baseline, the AUROC improved from 0.933 to 0.989 and from 0.911 to 0.953 on the two datasets, respectively.
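To make the idea of combining region-level evidence into a single image-level score concrete, the following Python sketch computes an attention-weighted average of Gaussian-weighted inpainting errors. It is a hypothetical illustration, not the GIAD-ST formulation: the per-pixel error map, the box format `(x0, y0, x1, y1)`, the Gaussian centered on each box with a standard deviation proportional to the box size (`sigma_scale`), and the normalized, non-negative attention weights are all assumptions introduced here.

```python
import numpy as np


def gaussian_window(h: int, w: int, sigma_scale: float = 0.5) -> np.ndarray:
    """Return an h x w 2-D Gaussian centered on the patch, normalized to sum to 1.

    Assumption: the standard deviation on each axis is sigma_scale times the
    patch size on that axis.
    """
    ys = np.arange(h) - (h - 1) / 2.0
    xs = np.arange(w) - (w - 1) / 2.0
    sy = max(sigma_scale * h, 1e-6)
    sx = max(sigma_scale * w, 1e-6)
    g = np.exp(-(ys[:, None] ** 2) / (2 * sy**2) - (xs[None, :] ** 2) / (2 * sx**2))
    return g / g.sum()


def region_score(error_map: np.ndarray, box: tuple) -> float:
    """Gaussian-weighted inpainting error inside one box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    patch = error_map[y0:y1, x0:x1]
    return float((gaussian_window(*patch.shape) * patch).sum())


def image_score(error_map: np.ndarray, boxes: list, attention: list) -> float:
    """Image-level score: attention-weighted average of region scores (hypothetical)."""
    scores = np.array([region_score(error_map, b) for b in boxes])
    a = np.asarray(attention, dtype=float)
    weights = a / a.sum()  # assumed: non-negative attention, normalized to sum to 1
    return float((weights * scores).sum())


if __name__ == "__main__":
    err = np.zeros((8, 8))
    err[0:4, 0:4] = 2.0  # a high inpainting error in the top-left region only
    boxes = [(0, 0, 4, 4), (4, 4, 8, 8)]
    print(round(image_score(err, boxes, [3.0, 1.0]), 6))  # 1.5
```

With equal attention weights (`[1.0, 1.0]`) the same error map would score 1.0, so in this sketch it is the attention term that lets the region judged important dominate the image-level score, while the Gaussian emphasizes the center of each region over its borders.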
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Notes
Refer to https://github.com/Vious/LBAM_Pytorch/generateMask.py for more details on generating partial masks.
We adopted the standard implementation from their public code, which is available at: https://github.com/jcjohnson/densecap.
For simplicity, we use such expressions to represent the setting of a layer, e.g., K5S2P1 denotes a kernel size of 5, a stride of 2, and a padding of 1.
We did not use their real-time detection on an autonomous robot.
Their first target was anomalous image region detection.
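The K/S/P shorthand in the notes maps directly onto the arguments of a 2-D convolution. The Python sketch below parses strings such as `K5S2P1` and applies the standard output-size formula for a convolution without dilation, floor((n + 2p - k) / s) + 1. The function and type names and the 256-pixel input in the example are illustrative assumptions, not values taken from the paper.

```python
import re
from typing import NamedTuple

# Matches shorthands such as "K5S2P1" (kernel, stride, padding).
_LAYER_RE = re.compile(r"K(\d+)S(\d+)P(\d+)")


class ConvSetting(NamedTuple):
    kernel: int
    stride: int
    padding: int


def parse_setting(spec: str) -> ConvSetting:
    """Parse a shorthand such as 'K5S2P1' into kernel, stride and padding sizes."""
    m = _LAYER_RE.fullmatch(spec.strip())
    if m is None:
        raise ValueError(f"not a KxSyPz layer setting: {spec!r}")
    k, s, p = (int(g) for g in m.groups())
    return ConvSetting(k, s, p)


def conv_output_size(n: int, spec: str) -> int:
    """Spatial output size of a convolution (no dilation) on an n-pixel input."""
    k, s, p = parse_setting(spec)
    return (n + 2 * p - k) // s + 1


if __name__ == "__main__":
    print(parse_setting("K5S2P1"))          # ConvSetting(kernel=5, stride=2, padding=1)
    print(conv_output_size(256, "K5S2P1"))  # 127
```

For a transposed convolution without dilation or output padding, as is common in decoders, the output size is instead s(n - 1) - 2p + k, so the same shorthand can describe both directions as long as the layer type is stated separately.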
References
Akcay, S., Atapour-Abarghouei, A., & Breckon, T.P. (2018). GANomaly: Semi-supervised anomaly detection via adversarial training. In Asian conference on computer vision, ACCV (pp. 622–637). https://doi.org/10.1007/978-3-030-20893-6_39
Akcay, S., Atapour-Abarghouei, A., & Breckon, T.P. (2019). Skip-GANomaly: Skip connected and adversarially trained encoder-decoder anomaly detection. In International joint conference on neural networks, IJCNN (pp. 1–8). https://doi.org/10.1109/IJCNN.2019.8851808
Arjovsky, M., & Bottou, L. (2017). Towards principled methods for training generative adversarial networks. In International conference on learning representations, ICLR.
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), 1–58. https://doi.org/10.1145/1541880.1541882
Chen, T., Zhai, X., Ritter, M., & et al. (2019). Self-supervised GANs via auxiliary rotation loss. In Conference on computer vision and pattern recognition, CVPR (pp. 12154–12163). https://doi.org/10.1109/CVPR.2019.01243
Choi, M.J., Torralba, A., & Willsky, A.S. (2012). Context models and out-of-context objects. Pattern Recognition Letters, 33(7), 853–862. https://doi.org/10.1016/j.patrec.2011.12.004
Deguchi, Y., Takayama, D., Takano, S., & et al. (2017). Skeleton clustering by multi-robot monitoring for fall risk discovery. Journal of Intelligent Information Systems, 48(1), 75–115. https://doi.org/10.1007/s10844-015-0392-1
Dong, N., Hatae, Y., Fadjrimiratno, M.F., & et al. (2020). Experimental evaluation of GAN-based one-class anomaly detection on office monitoring. In International symposium on methodologies for intelligent systems, ISMIS (pp. 214–224). https://doi.org/10.1007/978-3-030-59491-6_20
Dong, N., & Suzuki, E. (2021). GIAD: Generative inpainting-based anomaly detection via self-supervised learning for human monitoring. In Pacific Rim international conference on artificial intelligence, PRICAI, Part II (pp. 418–432). https://doi.org/10.1007/978-3-030-89363-7_32
Esterwood, C., & Robert, L.P. (2020). Personality in healthcare human robot interaction (H-HRI): A literature review and brief critique. In International conference on human-agent interaction, HAI (pp. 87–95). https://doi.org/10.1145/3406499.3415075
Fadjrimiratno, M.F., Hatae, Y., Matsukawa, T., & et al. (2021). Detecting anomalies from human activities by an autonomous mobile robot based on “Fast and Slow” thinking. In International joint conference on computer vision, imaging and computer graphics theory and applications, VISIGRAPP, Subvolume for VISAPP (Vol. 5, pp. 943–953). https://doi.org/10.5220/0010313509430953
Gidaris, S., Singh, P., & Komodakis, N. (2018). Unsupervised representation learning by predicting image rotations. In International conference on learning representations, ICLR.
Godard, C., Mac Aodha, O., & Brostow, G.J. (2017). Unsupervised monocular depth estimation with left-right consistency. In Conference on computer vision and pattern recognition, CVPR (pp. 270–279). https://doi.org/10.1109/CVPR.2017.699
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., & et al. (2014). Generative adversarial nets. In Neural information processing systems, NIPS (pp. 2672–2680).
Hatae, Y., Yang, Q., Fadjrimiratno, M.F., & et al. (2020). Detecting anomalous regions from an image based on deep captioning. In International joint conference on computer vision, imaging and computer graphics theory and applications, VISIGRAPP, Subvolume for VISAPP (Vol. 5, pp. 326–335). https://doi.org/10.5220/0008949603260335
Johnson, J., Karpathy, A., & Fei-Fei, L. (2016). DenseCap: Fully convolutional localization networks for dense captioning. In Conference on computer vision and pattern recognition, CVPR (pp. 4565–4574). https://doi.org/10.1109/CVPR.2016.494
Kahneman, D. (2011). Thinking, fast and slow. New York: Macmillan.
Kimura, D., Chaudhury, S., Narita, M., & et al. (2020). Adversarial discriminative attention for robust anomaly detection. In Winter conference on applications of computer vision, WACV (pp. 2172–2181). https://doi.org/10.1109/WACV45572.2020.9093428
Kingma, D.P., & Ba, J. (2015). Adam: A method for stochastic optimization. In International conference on learning representations, ICLR.
Krishna, R., Zhu, Y., Groth, O., & et al. (2017). Visual Genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1), 32–73. https://doi.org/10.1007/s11263-016-0981-7
Lawson, W., Bekele, E., & Sullivan, K. (2017). Finding anomalies with generative adversarial networks for a Patrolbot. In Conference on computer vision and pattern recognition, CVPR Workshops (pp. 12–13). https://doi.org/10.1109/CVPRW.2017.68
Li, C.-L., Sohn, K., Yoon, J., & et al. (2021). CutPaste: Self-supervised learning for anomaly detection and localization. In Conference on computer vision and pattern recognition, CVPR (pp. 9664–9674).
Liu, H., & Hoeber, O. (2011). A Luhn-inspired vector re-weighting approach for improving personalized web search. In International conferences on web intelligence and intelligent agent technology (pp. 301–305). https://doi.org/10.1109/WI-IAT.2011.130
Liu, Z., Nie, Y., Long, C., & et al. (2021). A hybrid video anomaly detection framework via memory-augmented flow reconstruction and flow-guided frame prediction. In International conference on computer vision, ICCV (pp. 13588–13597). https://doi.org/10.1109/ICCV48922.2021.01333
Liu, G., Reda, F.A., Shih, K.J., & et al. (2018). Image inpainting for irregular holes using partial convolutions. In European conference on computer vision, ECCV (pp. 85–100). https://doi.org/10.1007/978-3-030-01252-6_6
Liu, G., Zhang, Q., Cao, Y., & et al. (2021). Online human action recognition with spatial and temporal skeleton features using a distributed camera network. International Journal of Intelligent Systems, 36(12), 7389–7411. https://doi.org/10.1002/int.22591
Miyato, T., Kataoka, T., Koyama, M., & et al. (2018). Spectral normalization for generative adversarial networks. In International conference on learning representations, ICLR.
Nguyen, B., Feldman, A., Bethapudi, S., & et al. (2021). Unsupervised region-based anomaly detection in brain MRI with adversarial image inpainting. In International symposium on biomedical imaging, ISBI (pp. 1127–1131). https://doi.org/10.1109/ISBI48211.2021.9434115
Oh, J., Kim, H.-I., & Park, R.-H. (2017). Context-based abnormal object detection using the fully-connected conditional random fields. Pattern Recognition Letters, 98, 16–25. https://doi.org/10.1016/j.patrec.2017.08.003
Pang, G., Shen, C., Cao, L., & et al. (2021). Deep learning for anomaly detection: A review. ACM Computing Surveys, 54(2), 1–38. https://doi.org/10.1145/3439950
Pathak, D., Krahenbuhl, P., Donahue, J., & et al. (2016). Context encoders: Feature learning by inpainting. In Conference on computer vision and pattern recognition, CVPR (pp. 2536–2544). https://doi.org/10.1109/CVPR.2016.278
Ravanbakhsh, M., Nabi, M., Sangineto, E., & et al. (2017). Abnormal event detection in videos using generative adversarial nets. In International conference on image processing, ICIP (pp. 1577–1581). https://doi.org/10.1109/ICIP.2017.8296547
Ravanbakhsh, M., Sangineto, E., Nabi, M., & et al. (2019). Training adversarial discriminators for cross-channel abnormal event detection in crowds. In Winter conference on applications of computer vision, WACV (pp. 1896–1904). https://doi.org/10.1109/WACV.2019.00206
Sabokrou, M., Khalooei, M., Fathy, M., & et al. (2018). Adversarially learned one-class classifier for novelty detection. In Conference on computer vision and pattern recognition, CVPR (pp. 3379–3388). https://doi.org/10.1109/CVPR.2018.00356
Schlegl, T., Seeböck, P., Waldstein, S.M., & et al. (2017). Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In Information processing in medical imaging, IPMI (pp. 146–157). https://doi.org/10.1007/978-3-319-59050-9_12
Schlegl, T., Seeböck, P., Waldstein, S.M., & et al. (2019). f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. Medical Image Analysis, 54, 30–44. https://doi.org/10.1016/j.media.2019.01.010
Selvaraju, R.R., Cogswell, M., Das, A., & et al. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In International conference on computer vision, ICCV (pp. 618–626). https://doi.org/10.1109/ICCV.2017.74
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations, ICLR.
Singh, M., Mandal, M.K., & Basu, A. (2005). Gaussian and Laplacian of Gaussian weighting functions for robust feature based tracking. Pattern Recognition Letters, 26(13), 1995–2005. https://doi.org/10.1016/j.patrec.2005.03.015
Sultani, W., Chen, C., & Shah, M. (2018). Real-world anomaly detection in surveillance videos. In Conference on computer vision and pattern recognition, CVPR (pp. 6479–6488). https://doi.org/10.1109/CVPR.2018.00678
Wang, Z., Bovik, A.C., Sheikh, H.R., & et al. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612. https://doi.org/10.1109/TIP.2003.819861
Xie, C., Liu, S., Li, C., & et al. (2019). Image inpainting with learnable bidirectional attention maps. In International conference on computer vision, ICCV (pp. 8858–8867). https://doi.org/10.1109/ICCV.2019.00895
Yu, J., Lin, Z., Yang, J., & et al. (2018). Generative image inpainting with contextual attention. In Conference on computer vision and pattern recognition, CVPR (pp. 5505–5514). https://doi.org/10.1109/CVPR.2018.00577
Yu, J., Lin, Z., Yang, J., & et al. (2019). Free-form image inpainting with gated convolution. In International conference on computer vision, ICCV (pp. 4471–4480). https://doi.org/10.1109/ICCV.2019.00457
Zaheer, M.Z., Lee, J.-H., Astrid, M., & et al. (2020). Old is Gold: Redefining the adversarially learned one-class classifier training paradigm. In Conference on computer vision and pattern recognition, CVPR (pp. 14183–14193). https://doi.org/10.1109/CVPR42600.2020.01419
Zavrtanik, V., Kristan, M., & Skočaj, D. (2021). Reconstruction by inpainting for visual anomaly detection. Pattern Recognition, 112, 107706. https://doi.org/10.1016/j.patcog.2020.107706
Zhang, Y., Bai, Y., Ding, M., & et al. (2020). Multi-task generative adversarial network for detecting small objects in the wild. International Journal of Computer Vision, 128(6), 1810–1828. https://doi.org/10.1007/s11263-020-01301-6
Zhang, K., Fadjrimiratno, M.F., & Suzuki, E. (2021). Context-based anomaly detection via spatial attributed graphs in human monitoring. In International conference on neural information processing, ICONIP (pp. 450–463). https://doi.org/10.1007/978-3-030-92185-9_37
Zhang, T., Ramakrishnan, R., & Livny, M. (1996). BIRCH: An efficient data clustering method for very large databases. ACM SIGMOD Record, 25(2), 103–114. https://doi.org/10.1145/233269.233324
Zhang, M., Tseng, C., & Kreiman, G. (2020). Putting visual object recognition in context. In Conference on computer vision and pattern recognition, CVPR (pp. 12985–12994). https://doi.org/10.1109/CVPR42600.2020.01300
Zhao, H., Gallo, O., Frosio, I., & et al. (2017). Loss functions for image restoration with neural networks. IEEE Transactions on Computational Imaging, 3(1), 47–57. https://doi.org/10.1109/TCI.2016.2644865
Acknowledgements
This work was partially supported by Japan Science and Technology Agency (JST) SPRING, Grant Number JPMJSP2136.
Funding
Japan Science and Technology Agency (JST) SPRING, Grant Number JPMJSP2136.
Ethics declarations
Conflict of Interests
The authors declare no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Dong, N., Suzuki, E. GIAD-ST: Detecting anomalies in human monitoring based on generative inpainting via self-supervised multi-task learning. J Intell Inf Syst 59, 733–754 (2022). https://doi.org/10.1007/s10844-022-00722-8
DOI: https://doi.org/10.1007/s10844-022-00722-8