Elsevier

Forensic Science International

Volume 268, November 2016, Pages 46-61

Pornography classification: The hidden clues in video space–time

https://doi.org/10.1016/j.forsciint.2016.09.010

Highlights

  • Temporal Robust Features (TRoF) are proposed for pornographic motion description.

  • The Pornography-2k dataset is introduced as a new benchmark for video porn detection.

  • The proposed solution, which relies on BoVW and TRoF, achieves more than 90% detection accuracy.

  • This solution surpasses commercial detectors and performs on par with the literature.

  • In particular, the proposed solution can be implemented to meet real-time requirements.

Abstract

As web technologies and social networks become part of the general public's life, the problem of automatically detecting pornography is on every parent's mind: nobody feels completely safe when their children go online. In this paper, we focus on video-pornography classification, a hard problem in which traditional methods often employ still-image techniques, labeling frames individually prior to a global decision. Frame-based approaches, however, ignore the significant information conveyed by motion. Here, we introduce a space-temporal interest point detector and descriptor called Temporal Robust Features (TRoF). TRoF was custom-tailored for efficient (low processing time and memory footprint) and effective (high classification accuracy and low false negative rate) motion description, particularly suited to the task at hand. We aggregate local information extracted by TRoF into a mid-level representation using Fisher Vectors, the state-of-the-art model of Bags of Visual Words (BoVW). We evaluate our original strategy, contrasting it both with commercial pornography detection solutions and with BoVW solutions based upon other space-temporal features from the scientific literature. The performance is assessed using the Pornography-2k dataset, a new challenging pornographic benchmark, comprising 2000 web videos and 140 h of video footage. The dataset is also a contribution of this work and is highly varied, including both professional and amateur content, and it depicts several genres of pornography, from cartoon to live action, with diverse behavior and ethnicity. The best approach, based on a dense application of TRoF, yields a classification error reduction of almost 79% when compared to the best commercial classifier. A sparse description relying on the TRoF detector is also noteworthy: it yields a classification error reduction of over 69% with a 19× smaller memory footprint than the dense solution, and it can be implemented to meet real-time requirements.

Introduction

Pornography diffusion over the Internet has systematically increased in recent years [1]. This poses a challenge as web technologies reach broader uses and audiences, since pornographic content is unwelcome in many contexts, especially where underage viewers are concerned.

The need to regulate the diffusion of Internet pornography clashes with the international, distributed, and large-scale nature of the web. Trying to regulate the diffusion from the side of creators and distributors is a Sisyphean task. Regulation from the consumer side, in the form of content filtering, is more promising, and thus is employed by governments, companies, tutors, and parents against inappropriate access to pornography. If we are to keep pace with the daunting growth rates of content creation, this content filtering has to be automated.

From the point of view of health and social sciences, the understanding of the impacts of pornography production and consumption on society is still incipient and understudied [2], with inconclusive results [3]. Regardless of that, some modalities of porn are illegal, with child pornography being the obvious case in most countries [4]. Because of that, pornography detection receives growing attention in law enforcement and forensic activities. Besides the selection of relevant material for attaching to legal dossiers, detecting pornographic files (i.e., the fast filtering of pornographic content among millions of files) at crime scenes brings great benefits, including the immediate arrest of criminals. Furthermore, once all porn-related files are singled out, we can employ additional techniques, such as those involving face detection and recognition, child-pornography detection, and age estimation, to further select the videos of higher importance for an investigation. The method could be used directly on servers for monitoring, during search-and-seizure for proper confiscation of suspected materials and equipment, or even on police premises to quickly scan seized hard disks, thus decreasing the amount of human police resources currently devoted to this type of analysis.

Most conventional, commercially available content-filtering solutions regulate access to pornographic content by blacklisting URLs and looking at metadata (keywords in file names and descriptions, parental advisory metadata, etc.). In contrast, analyzing the visual information itself is essential for robust pornography filtering, since visual information, unlike meta-information, is much more difficult to conceal. Therefore, a few off-the-shelf solutions include visual-content analysis among their features [5], [6], [7], [8]. However, according to the experimental results we report in this paper, those tools are still far from effective.

In the literature, the first efforts at pornography detection conservatively associated pornography with nudity. Since then, plenty of solutions have been proposed that aim at identifying nude people by means of skin detection [9], [10], [11], [12], [13], [14], [15]. However, those strategies suffer from high rates of false positives in situations of non-pornographic body exposure (e.g., swimming, sunbathing, baby breastfeeding, etc.).

In contrast to nudity detection, in the scope of this work, we want to classify pornography as “any explicit sexual matter with the purpose of eliciting arousal” [1]. In this vein, the current state of the art of pornography classification relies on Bags-of-Visual-Words (BoVW)-based strategies to reduce the semantic gap between the low-level visual data representation (e.g., pixels) and the high-level target concept of pornography [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29]. However, it is still very common to extend the still-image solutions to video by labeling the frames independently and then thresholding the quantity of sensitive samples [20], [21], [22], [23], [24]. That strategy misses opportunities because motion pictures offer extra space–time information, where one can look for additional features. Motion information, for example, can be very revealing about the presence of pornographic content. Thus, in this work, we aim at taking a step further by incorporating temporal information into the task of video pornography classification, in pursuit of more effective and efficient solutions.
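The frame-based baseline described above (label each frame independently, then threshold the quantity of sensitive frames) can be sketched as follows. This is only a minimal illustration, not the implementation of any cited method: the function name, both threshold values, and the example scores are hypothetical, and the per-frame scores are assumed to come from some still-image classifier.

```python
from typing import Sequence


def classify_video_by_frame_votes(
    frame_scores: Sequence[float],
    frame_threshold: float = 0.5,   # hypothetical per-frame decision threshold
    video_threshold: float = 0.3,   # hypothetical fraction of flagged frames
) -> bool:
    """Flag a video when enough of its frames are individually flagged."""
    if not frame_scores:
        raise ValueError("at least one frame score is required")
    # Each frame is labeled on its own, with no reference to its neighbors.
    flagged = sum(1 for score in frame_scores if score >= frame_threshold)
    # The global decision thresholds the fraction of sensitive frames.
    return flagged / len(frame_scores) >= video_threshold


# Hypothetical scores produced by a still-image classifier, one per frame.
scores = [0.9, 0.8, 0.1, 0.2, 0.7, 0.6, 0.1, 0.1, 0.1, 0.1]
is_sensitive = classify_video_by_frame_votes(scores)

# Reordering the frames cannot change the decision: the baseline is blind to
# temporal order, and therefore to motion.
same_when_reversed = classify_video_by_frame_votes(list(reversed(scores)))
```

Because the decision depends only on how many frames are flagged, any permutation of the frames yields the same label, which is the limitation that motivates time-aware descriptors.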

This paper proposes an end-to-end BoVW-based framework for video-pornography classification. The framework allows temporal information to be incorporated in different ways, according to the choice of low-level time-aware local descriptor (e.g., Space Temporal Interest Points (STIP) [30] or Dense Trajectories [31]), which is then aggregated into a BoVW-based mid-level representation of the entire video footage. To perform experiments and validation, we introduce the Pornography-2k dataset, a new challenging pornographic benchmark that comprises 2000 web videos and is available upon request and the signing of a proper responsibility agreement.
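To make the idea of a mid-level representation for an entire video more concrete, the sketch below aggregates a set of local descriptors into a single fixed-length vector. Note that it uses the plain hard-assignment BoVW histogram, which is a simpler stand-in and not the Fisher Vector encoding used in the paper. The codebook, the two-dimensional descriptors, and all function names are hypothetical; in practice the codebook would be learned from training descriptors.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bovw_histogram(descriptors: Sequence[Vector],
                   codebook: Sequence[Vector]) -> List[float]:
    """L1-normalized histogram of nearest-codeword assignments.

    Assumption: hard assignment, a simpler scheme than Fisher Vectors.
    """
    if not codebook:
        raise ValueError("codebook must not be empty")
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        # Assign the local descriptor to its closest visual word.
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(descriptor, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Hypothetical 3-word codebook and four local descriptors from one video.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.3)]
video_vector = bovw_histogram(descriptors, codebook)
```

Whatever the number of local descriptors extracted from a video, the output has one entry per visual word, so videos of different lengths become comparable inputs for a classifier.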

Additionally, we introduce Temporal Robust Features (TRoF), a novel space-temporal interest point detector and descriptor, which provides a speed compatible with real-time video processing and has a low memory footprint. TRoF yields essentially the same classification accuracy as Dense Trajectories [31], the current state-of-the-art space-temporal video descriptor, with a 50× smaller memory footprint.

We organize the remainder of this paper into six sections. In Section 2 we explore related work, while in Section 3 we present the proposed framework to classify video pornography. In Section 4 we introduce TRoF, while in Section 5 we explain the experimental setup. In turn, in Section 6 we discuss the obtained results and, finally, in Section 7 we conclude the paper and elaborate on future work.

Section snippets

Related work

In this section, we survey some of the literature on pornography detection, focusing on BoVW-based approaches and relevant nudity classifiers. Table 1 summarizes these solutions. In addition, we explore commercial tools that block web sites or scan computers for pornographic content.

The first efforts to detect pornography conservatively associated pornography with nudity, where the solutions tried to identify nude or scantily-clad people [9], [11], [10], [12], [13], [14], [15].

Proposed framework

Web filters and scan-based software play an important role in preventing pornography from reaching unintended or inappropriate audiences, in particular, underage viewers. However, as stated in Section 2, the vast majority of those solutions and a great number of the published methods are based on human skin detection in images. Besides suffering from high rates of false positives, there are plenty of pornographic materials where very little skin is exposed, (from pornographic cartoons, e.g.,

Temporal Robust Features (TRoF)

Local space-temporal features are a successful representation for action recognition [38], [31]. Nevertheless, one important factor deterring the consideration of these features for real-time applications is the high computational cost, regarding both memory footprint and computational time.

To solve this problem, we propose a fast space-temporal video approach, with low-memory footprint, which can be performed on limited hardware, such as mobile devices. To deal with the memory usage issue, we

Experimental setup

This section describes the experimental setup, including the parameter values used for each approach. First, it is worth mentioning that previous work in the pornography classification literature presented limited validation, with no standard datasets or metrics, except for the published methods in [20], [19], [21], [17], [18], which used the Pornography-800 dataset [21] with 800 videos. Hence, aiming at providing a standard validation benchmark, we augmented that dataset to 2000 videos,

Experiments and validation

This section evaluates the performance of different methods on the Pornography-2k dataset. The results are compared in Table 3. We report the accuracy rate (ACC) and the F2 measure (F2), on a 5 × 2-fold cross-validation protocol. Additionally, we report the true positive (TPR) and true negative (TNR) rates, to give the reader a broader view of the classification results.
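The reported measures can be illustrated with a short sketch, given here under stated assumptions: it uses the standard definitions of ACC, TPR, TNR, and the F-beta measure with beta = 2, and it reads the 5 × 2-fold protocol as five repetitions of a random split into two halves, each half serving once for training and once for testing. The confusion counts are made up, the helper names are hypothetical, and the splits are not stratified by class, which the actual protocol may well do.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Confusion:
    tp: int  # pornographic videos correctly flagged
    fn: int  # pornographic videos missed
    tn: int  # non-pornographic videos correctly passed
    fp: int  # non-pornographic videos wrongly flagged


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def true_positive_rate(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def true_negative_rate(c: Confusion) -> float:
    return c.tn / (c.tn + c.fp)


def f_beta(c: Confusion, beta: float = 2.0) -> float:
    """F-beta from counts; beta = 2 weights recall higher than precision."""
    b2 = beta * beta
    return (1 + b2) * c.tp / ((1 + b2) * c.tp + b2 * c.fn + c.fp)


def five_by_two_splits(n_items: int, seed: int = 0
                       ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield 10 (train, test) index pairs: 5 shuffles, 2 folds each."""
    rng = random.Random(seed)
    for _ in range(5):
        indices = list(range(n_items))
        rng.shuffle(indices)
        half = n_items // 2
        first, second = indices[:half], indices[half:]
        yield first, second   # train on the first half, test on the second
        yield second, first   # then swap the roles of the two halves


# Made-up counts for a single test fold of 200 videos.
fold = Confusion(tp=90, fn=10, tn=80, fp=20)
acc, tpr, tnr, f2 = (accuracy(fold), true_positive_rate(fold),
                     true_negative_rate(fold), f_beta(fold))
splits = list(five_by_two_splits(2000))
```

With beta = 2, a missed pornographic video (false negative) costs more than a false alarm, which matches the emphasis on a low false negative rate.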

As one might observe, the BoVW-based approaches remarkably outperform the third-party solutions. Not surprisingly, the

Conclusion and future work

In this work, we proposed a BoVW-based framework for video-pornography classification, novel both in the low and mid-level stages.

In the low-level stage, we have introduced TRoF, a new space-temporal interest point detector and local video descriptor that quickly detects an optimized number of interest points, allowing us to describe the video space–time sparsely and very quickly.

In addition, to the best of our knowledge, it was the first time that the dense application of STIP [30] and Dense

Acknowledgements

Part of the results presented in this paper were obtained through the project “Sensitive Media Analysis”, sponsored by Samsung Eletrônica da Amazônia Ltda., in the framework of law No. 8,248/91. We also thank the financial support of the Brazilian Council for Scientific and Technological Development — CNPq (Grants #477662/2013-7, #304472/2015-8), the São Paulo Research Foundation — Fapesp (DéjàVu Grant #2015/19222-9), and the Coordination for the Improvement of Higher Level Education Personnel

References (56)

  • W. Fisher et al.

    Pornography, sex crime, and paraphilia

    Curr. Psychiatry Rep.

    (2013)
  • Child Pornography: Model Legislation & Global Review....
  • Media Detective....
  • Snitch Plus....
  • PornSeer Pro....
  • M. Polastro et al.

    Nudetective: a forensic tool to help combat child pornography through automatic nudity detection

  • C. Platzer et al.

    Skin Sheriff: a machine learning solution for detecting explicit images

  • M. Jones et al.

    Statistical color models with application to skin detection

    Int. J. Comput. Vis.

    (2002)
  • D. Forsyth et al.

    Automatic detection of human nudes

    Int. J. Comput. Vis.

    (1999)
  • D. Forsyth et al.

    Body plans

  • M. Fleck et al.

    Finding naked people

  • F. Souza et al.

    An evaluation on color invariant based local spatiotemporal features for action recognition

  • E. Valle et al.

    Content-based filtering for video sharing social networks

  • C. Caetano et al.

    Pornography detection using BossaNova video descriptor

  • C. Caetano et al.

    Representing local binary descriptors with BossaNova for visual recognition

  • S. Avila et al.

    BOSSA: extended bow formalism for image classification

  • C. Jansohn et al.

    Detecting pornographic video content by combining image features with motion information

  • A. Lopes et al.

    Nude detection in video using bag-of-visual-features
