Image and Vision Computing

Volume 77, September 2018, Pages 21-32

Can we teach computers to understand art? Domain adaptation for enhancing deep networks capacity to de-abstract art

https://doi.org/10.1016/j.imavis.2018.06.009

Highlights

  • We train a deep network from scratch to reach top performance in painting genre recognition.

  • We identify style abstraction as the main limitation on further improvement.

  • We test the potential for improvement by transfer from various domains.

  • Normal photographs (no adaptation) are as beneficial as style transfer.

Abstract

Humans comprehend a natural scene at a single glance; painters and other visual artists, through their abstract representations, stressed this capacity to the limit. The performance of computer vision solutions has matched that of humans in many problems of visual recognition. In this paper we address the problem of recognizing the genre (subject) of digitized paintings using Convolutional Neural Networks (CNNs), as part of the more general problem of dealing with abstract and/or artistic representations of scenes. Initially we establish state-of-the-art performance by training a CNN from scratch. In the next level of evaluation, we identify aspects that hinder the CNNs' recognition, such as artistic abstraction. Further, we test various domain adaptation methods that could enhance the subject recognition capabilities of the CNNs. The evaluation is performed on a database of 80,000 annotated digitized paintings, which is tentatively extended with artistic photographs, either original or stylized, in order to emulate artistic representations. Surprisingly, the most efficient domain adaptation is not neural style transfer. Finally, the paper provides an experiment-based assessment of the abstraction level that CNNs are able to achieve.

Introduction

This paper aims to investigate the differences between the level of abstraction achieved by deep convolutional neural networks and human performance in the context of painting analysis. To capture the motivation, let us recall Pablo Picasso's words: "There is no abstract art. You must always start with something. Afterward you can remove all traces of reality". Art historians and enthusiasts, recalling major artistic works throughout history, may note that the level of abstraction has steadily increased.

In parallel, recent years have seen works that use computer vision techniques to analyze visual art increase in both the quantity and the quality of the reported results. Two trends favored these developments. First, there were consistent efforts to digitize more and more paintings, so that modern systems may learn from large databases. Two such popular efforts are Your Paintings (now Art UK1), which contains more than 200,000 paintings tightly connected with historical British culture, and WikiArt2, which contains around 100,000 paintings gathered from multiple national cultures. These databases come with multiple annotations. For this work, we are particularly interested in annotations dealing with the painting's subject or scene type. From this point of view, the more complete database is the WikiArt collection, where the labelling category is named genre. The second trend is purely technical and concerns the development of deep neural networks, which allowed classification performance not imagined before. In this work, we will use the popular Convolutional Neural Networks (CNNs) to recognize the painting genre.

Let us now establish the meaning of "genre" and its relation to the scene and to the image subject. A list of definitions for various painting genres is presented in Table 1. To label a painting with a specific genre, in most cases a user has to identify the subject of that painting. The exceptions are "Abstract Art", "Design", "Illustration" and "Sketch and Study", where the main characteristic is related to the depiction mode. In the majority of cases, the subject is related to the scene represented in the work of art. The term "genre" is typical of the art domain and is a more general, inclusive concept than mere "subject" or "scene type". In this work, while referring to paintings, we will use all three with the same meaning of "genre". In comparison, for a non-artistic photograph, as there is no artistic intervention in the depiction mode, the subject is more related to the scene, while the genre is hard to define. For artistic photos, "genre" becomes meaningful again.

Starting from the idea that deep neural networks share similarities with human vision [1] and the fact that such networks have already proven efficient in other perception-inspired areas, like object recognition or even the creation of artistic images, we ask ourselves whether they can pass the abstraction limit of artistic paintings and correctly recognize the scene type of such a work.

In this paper, we will first work with a Residual Network (ResNet) on the standard WikiArt database so as to obtain state-of-the-art results. Afterwards, we will test different domain transfer augmentations to see whether they can increase the recognition rate; we will also study whether the network is capable of passing the abstraction limit and learning from different types of images that contain the same type of scene. Furthermore, we introduce several alternatives for domain transfer to achieve a dual task: improving scene recognition performance and understanding the abstraction capabilities of machine learning systems.

Regarding deep networks, multiple improvements have been proposed. In many situations, if the database for the given task is small, better performance is reachable if the network parameters are first trained for a different task on a large database, such as ImageNet, and then updated for the given task. This procedure is called fine-tuning and is a case of transfer learning. As our investigation concerns a different kind of domain transfer, we will avoid using the two simultaneously, in order to draw clearer conclusions. To compensate, we rely on the recent Residual Network architecture (ResNet [2]), which was shown to overcome the problem of vanishing gradients, reaching better accuracy than previous architectures for the same number of parameters.
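To make the distinction concrete, the sketch below shows the from-scratch regime used here, with the avoided fine-tuning alternative noted in a comment. This is a minimal PyTorch sketch under assumed settings: the directory layout, network depth, genre count and hyper-parameters are placeholders, not the paper's actual configuration.

```python
# Minimal sketch (assumed settings): a ResNet trained from scratch on
# genre-labelled paintings. Paths and hyper-parameters are placeholders.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_GENRES = 10  # placeholder: set to the number of genre labels used

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Assumes an ImageFolder-style layout: one sub-directory per genre.
train_set = datasets.ImageFolder("wikiart/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64,
                                     shuffle=True, num_workers=4)

# weights=None => random initialization, i.e. training from scratch.
# The fine-tuning alternative avoided in the paper would instead start
# from weights="IMAGENET1K_V1" and update all parameters.
model = models.resnet34(weights=None)
model.fc = nn.Linear(model.fc.in_features, NUM_GENRES)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(90):  # placeholder epoch budget
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```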

This paper extends our previous works [3, 4], being mostly developed from Ref. [3], where we initiated the discussion about the efficiency of various methods of transferring information from the photographic domain to the painting domain, so that CNN recognition of painting genre is improved. In this paper we significantly extend that discussion by including other transfer methods and by adding more significant results that allow crisper conclusions. In the second work [4], we showed that artistic style transfer remains just as effective even if a reduced number of iterations is performed while imposing the style of an artistic painting and the content of a photograph onto a new image, following the neural style transfer introduced by Gatys et al. [5].
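For reference, the optimization-based transfer of Gatys et al. [5] can be sketched as below: the pixels of an output image are iteratively adjusted so that its VGG features match the photograph's content and the painting's Gram-matrix style statistics. This is a hedged sketch; layer indices, loss weights and the step count are illustrative assumptions, not the settings used in our experiments.

```python
# Sketch of optimization-based neural style transfer in the spirit of
# Gatys et al. [5]; all numeric choices below are illustrative.
import torch
import torch.nn.functional as F
from torchvision import models

vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = [0, 5, 10, 19, 28]   # conv1_1 .. conv5_1, a common choice
CONTENT_LAYER = 21                  # conv4_2, a common content layer

def features(x, layers):
    # Run the VGG feature stack, keeping the requested layer outputs.
    out = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            out[i] = x
    return out

def gram(f):
    # Gram matrix of a (1, C, H, W) feature map: channel co-activations.
    _, c, h, w = f.shape
    f = f.view(c, h * w)
    return f @ f.t() / (c * h * w)

def stylize(content, style, steps=300, style_weight=1e6):
    # Fewer steps trade quality for speed; [4] reports the transfer
    # stays effective with a reduced iteration count.
    target_c = features(content, [CONTENT_LAYER])[CONTENT_LAYER]
    target_g = {i: gram(f) for i, f in features(style, STYLE_LAYERS).items()}
    img = content.clone().requires_grad_(True)
    opt = torch.optim.Adam([img], lr=0.02)
    for _ in range(steps):
        opt.zero_grad()
        feats = features(img, STYLE_LAYERS + [CONTENT_LAYER])
        loss = F.mse_loss(feats[CONTENT_LAYER], target_c)
        for i in STYLE_LAYERS:
            loss = loss + style_weight * F.mse_loss(gram(feats[i]), target_g[i])
        loss.backward()
        opt.step()
    return img.detach()
```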

Overall, this paper claims several contributions along a number of directions. In one direction, we investigate which aspects, comprehensible by humans, hinder CNNs in understanding a painting's genre; subsequently, by means of domain transfer, we retrieve information about the internal description and the organization of the painting clusters. To accomplish this task, we annotate artistic photographs with scene types related to genres, and we stylize a large corpus of photographs using different style transfer methods. All this data will be made publicly available for use in other research works.

In a second direction, this paper is the first to objectively evaluate the efficiency of the currently popular neural style transfer methods. Existing solutions [[5], [6], [7], [8]] compare themselves in terms of speed, stability within video sequences, or the number of transferable styles. By quantifying the improvement obtained while adapting photographs to the painting domain, we reach a surprising conclusion: they are less efficient than, or at most as efficient as, non-neural style transfer solutions. Moreover, a CNN finds the original photographs, without any style transfer applied, just as informative.

The remainder of the paper is organized as follows: Section 2 presents relevant prior work, Section 3 summarizes the CNN choices made, and Section 4 discusses different aspects of painting understanding. Section 5 presents the databases used, while implementation details and results are presented in Section 6. The paper ends with a discussion of the impact of the results.


Related work

This work investigates the capabilities of CNNs to recognize the subject of paintings as compared with human performance. Thus, relevant prior work refers to solutions for object and scene recognition in paintings. As paintings are abstractions of real images, scene recognition in photographs is also relevant. Lastly, we aim to adapt information from photographs to paintings by means of style transfer.

Object and scene recognition in paintings. Computer-based painting analysis has

CNNs: architectures and training

Following AlexNet's [19] performance in the ImageNet challenge, recent years have witnessed a steep increase in the popularity of Convolutional Neural Networks (CNNs) for the task of image classification. Especially after Donahue et al. [30] showed that ImageNet pre-trained CNNs provide very good descriptors, it is very hard to find a large enough database where state-of-the-art performance is not related to CNNs. As such, the bulk of our experiments revolve, in one way or
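The descriptor idea of Donahue et al. [30] amounts to stripping the classifier from an ImageNet pre-trained network and reusing the penultimate activations as a fixed-length image descriptor. The sketch below illustrates it with a ResNet backbone as an assumed stand-in (the original work predates ResNet); preprocessing values are likewise assumptions.

```python
# Sketch: an ImageNet pre-trained CNN as a generic image descriptor.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

backbone = models.resnet34(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()      # keep the 512-d penultimate features
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

@torch.no_grad()
def describe(path):
    # Returns a fixed-length descriptor for one image file.
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return backbone(x).squeeze(0)

# The descriptors can then feed any off-the-shelf classifier (SVM, etc.).
```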

Painting understanding and domain transfer

The main task of this work is to generate results about machine learning systems' (in our case, deep CNNs') grasp of art. In such a case, one needs tools to ease the comprehension of the system's internal mechanisms.

For CNNs, the most popular visualization tool was proposed by Zeiler and Fergus [32], who introduced deconvolutional layers and visualized activation maps onto features. Attempts to visualize CNNs for scene recognition using this technique indicated
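The full deconvolutional pipeline of [32] is involved; as a lighter, assumed stand-in, the sketch below captures intermediate activation maps with PyTorch forward hooks, which already indicates which image regions drive a given layer. The choice of layer3 is arbitrary, for illustration only.

```python
# Sketch: inspecting intermediate CNN activations via forward hooks.
import torch
from torchvision import models

model = models.resnet34(weights="IMAGENET1K_V1").eval()
captured = {}

def hook(name):
    def fn(module, inputs, output):
        captured[name] = output.detach()
    return fn

model.layer3.register_forward_hook(hook("layer3"))

@torch.no_grad()
def activation_map(x):
    model(x)                          # forward pass fills `captured`
    fmap = captured["layer3"][0]      # (C, H, W) activations
    return fmap.abs().mean(dim=0)     # channel-averaged saliency map
```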

Databases

For the various experiments undertaken, several databases have been employed. These are either collections of digitized paintings or collections of digital photographs. In the next paragraphs we summarize their main characteristics.

Implementation and results

The main goal of this work is to study the performance of CNNs in recognizing a painting's genre and, in parallel, to investigate various ways in which this performance can be increased. This includes experiments on the classification methods themselves, in order to establish both the state-of-the-art performance and a baseline for further experimentation. Afterwards, we follow with experiments on various alterations brought to the database in the context of domain transfer.
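As a sketch of the augmentation protocol, assuming an ImageFolder-style layout in which the (stylized) photograph folders reuse the painting genre labels, the extended training set can be built as the union of the two collections; training then proceeds exactly as for the paintings-only baseline. Directory names are placeholders.

```python
# Sketch: extend the painting training set with (stylized) photographs
# that carry the same genre/scene labels, then train on the union.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.Resize(256),
                                transforms.RandomCrop(224),
                                transforms.ToTensor()])

paintings = datasets.ImageFolder("data/wikiart_train", transform=transform)
stylized = datasets.ImageFolder("data/stylized_photos", transform=transform)

# Classes must align: the photo folders reuse the painting genre labels.
assert paintings.classes == stylized.classes

combined = torch.utils.data.ConcatDataset([paintings, stylized])
loader = torch.utils.data.DataLoader(combined, batch_size=64, shuffle=True)
```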

Discussion and conclusions

In this paper, we discussed the capability of CNNs to recognize the scene (genre) of a painting. The first contribution is that we clearly showed that machine learning systems (deep CNNs) are confused by the level of abstraction in art. The experiment with abstract art showed that they cannot easily generalize with respect to style and that the more abstract a style is, the lower the recognition rate of the genre (painting subject). In this sense, the CNN is similar to humans, who also find

Acknowledgments

The work was supported by grants of the Romanian National Authority for Scientific Research and Innovation, CNCS-UEFISCDI, project number PN-II-RU-TE-2014-4-0733, and CCCDI-UEFISCDI, project number 96BM, respectively. The authors would like to thank NVIDIA Corporation for donating the Tesla K40c GPU that helped run the experimental setup for this research.

References (50)

  • J. Johnson et al., Perceptual losses for real-time style transfer and super-resolution

  • D. Ulyanov et al., Texture networks: feed-forward synthesis of textures and stylized images

  • X. Huang et al., Arbitrary style transfer in real-time with adaptive instance normalization

  • A. Bentkowska-Kafel et al., Computer vision and image analysis of art

  • S. Agarwal et al., Genre and style based painting classification

  • S. Karayev et al., Recognizing image style

  • Y. Bar et al., Classification of artistic styles using binarized features derived from a deep neural network

  • C. Florea et al., Artistic movement recognition by boosted fusion of color structure and topographic description

  • E.J. Crowley et al., The art of detection

  • R. Condorovici et al., Painting scene recognition using homogenous shapes

  • B. Saleh et al., Large-scale classification of fine-art paintings: learning the right metric on the right feature

  • W.R. Tan et al., Ceci n'est pas une pipe: a deep convolutional network for fine-art paintings classification

  • A. Krizhevsky et al., ImageNet classification with deep convolutional neural networks

  • M. Aubry et al., Painting-to-3D model alignment via discriminative visual elements, ACM Trans. Graph. (2013)

  • P. Hall et al., Cross-depiction problem: recognition and synthesis of photographs and artwork, Comput. Vis. Media (2015)
This paper has been recommended for acceptance by Yannis Panagakis.