Can we teach computers to understand art? Domain adaptation for enhancing deep networks capacity to de-abstract art☆
Introduction
This paper aims to investigate the differences between the level of abstraction achieved by deep convolutional neural networks as compared to human performance in the context of painting analysis. To synthesize the motivation, let us recall Pablo Picasso’s words: “There is no abstract art. You must always start with something. Afterward you can remove all traces of reality”. Recalling major artistic works throughout history, art historians and enthusiasts can note that the level of abstraction has steadily increased.
In parallel, in recent years, works that use computer vision techniques to analyze visual art have increased in both the quantity and the quality of reported results. Two trends favored these developments. First, there have been consistent efforts to digitize more and more paintings, so that modern systems may learn from large databases. Two such popular efforts are Your Paintings (now Art UK1), which contains more than 200,000 paintings tightly connected with historical British culture, and WikiArt2, which contains around 100,000 paintings gathered from multiple national cultures. These databases come with multiple annotations. For this work, we are particularly interested in annotations describing the painting's subject or scene type. From this point of view, the more complete database is the WikiArt collection, where the relevant labelling category is named genre. The second trend is purely technical and concerns the development of deep neural networks, which allowed classification performance previously out of reach. In this work, we will use the popular Convolutional Neural Networks (CNNs) to recognize the painting genre.
Let us now establish the meaning of “genre” and its relation to the scene and the image subject. A list of definitions for various painting genres is presented in Table 1. To label a painting with a specific genre, in most cases, a user has to identify the subject of that painting. The exceptions are “Abstract Art”, “Design”, “Illustration” and “Sketch and Study”, where the main characteristic is related to the depiction mode. In the majority of cases, the subject is related to the scene represented in the work of art. The term “genre” is typical of the art domain, and is a more general, encompassing concept than mere “subject” or “scene type”. In this work, while referring to paintings, we will use all three terms with the same meaning of “genre”. In comparison, for a non-artistic photograph, as there is no artistic intervention in the depiction mode, the subject is more closely related to the scene, while the genre is hard to define. For artistic photos, the “genre” acquires meaning again.
Starting from the idea that deep neural networks share similarities with human vision [1] and the fact that such networks have already proven efficient in other perception-inspired areas, such as object recognition or even the creation of artistic images, we ask whether they can pass the abstraction limit of artistic paintings and correctly recognize the scene type of such a work.
In this paper, we will first train a Residual Network (ResNet) on the standard WikiArt database so as to obtain state-of-the-art results. Afterwards, we will test different domain-transfer augmentations to see whether they can increase the recognition rate; we will also study whether the network is capable of passing the abstraction limit and learning from different types of images that contain the same type of scene. Furthermore, we introduce several alternatives for domain transfer to achieve a dual task: improving the scene recognition performance and understanding the abstraction capabilities of machine learning systems.
Regarding deep networks, multiple improvements have been proposed. In many situations, if the database for the given task is small, better performance is reachable if the network parameters are first trained for a different task on a large database, such as ImageNet, and then updated for the given task. This procedure is called fine-tuning and is a case of transfer learning. As our investigation concerns a different kind of domain transfer, we will avoid using the two simultaneously, in order to draw clearer conclusions. To compensate, we rely on the recent Residual Network architecture (ResNet [2]), which was shown to overcome the problem of vanishing gradients, reaching better accuracy than previous architectures for the same number of parameters.
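The key idea behind ResNet's resistance to vanishing gradients is the identity shortcut: each block computes F(x) + x, so gradients can flow through the addition even when the learned branch F contributes little. A minimal numpy sketch of a single residual block (fully connected weights stand in for the paper's convolutional layers; names are ours, not from ResNet's code):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """Basic residual block computing relu(F(x) + x).

    x  : (n,) input feature vector
    w1 : (n, n) weights of the first layer (stand-in for a conv layer)
    w2 : (n, n) weights of the second layer
    """
    out = relu(w1 @ x)    # first transformation with non-linearity
    out = w2 @ out        # second transformation, no activation yet
    return relu(out + x)  # identity shortcut, then activation

# With the residual branch zeroed out, the block reduces to the
# identity for non-negative inputs -- the shortcut carries the signal.
x = np.array([1.0, 2.0, 3.0])
w_zero = np.zeros((3, 3))
y = residual_block(x, w_zero, w_zero)
```

This identity-by-default behaviour is why very deep stacks of such blocks remain trainable: an untrained or unhelpful block degrades gracefully to passing its input through.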
This paper extends our previous works [3], [4], being mostly developed from Ref. [3], where we initiated the discussion about the efficiency of various methods of transferring information from the photographic domain to the painting domain so that the recognition of painting genre by CNNs is improved. In this paper we significantly extend that discussion by including other transfer methods and by adding more significant results that allow crisper conclusions. In the second work [4], we showed that artistic style transfer remains equally efficient even if a reduced number of iterations is performed while imposing the style of an artistic painting and the content of a photograph onto a new image, following the neural style transfer introduced by Gatys et al. [5].
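The neural style transfer of Gatys et al. [5] represents style as channel-correlation statistics (Gram matrices) of CNN feature maps, and matches them between the style image and the generated image. A minimal numpy sketch of that statistic (function names and the normalisation constant are our illustrative choices, not taken from the paper's code):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a feature map, as used in Gatys-style losses.

    features : (c, h, w) activation map; entry G[i, j] is the
    normalised inner product between flattened channels i and j.
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)  # channel correlations

def style_loss(gram_a, gram_b):
    """Squared Frobenius distance between two Gram matrices."""
    return float(np.sum((gram_a - gram_b) ** 2))

rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 8, 8))
g = gram_matrix(fmap)
```

Because the Gram matrix discards spatial layout and keeps only channel co-activation statistics, minimising this loss transfers texture-like style without copying the style image's scene content, which is exactly the property exploited when stylizing photographs toward the painting domain.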
Overall, this paper claims several contributions along a number of directions. In one direction, we investigate which aspects, comprehensible by humans, hinder CNNs in understanding a painting's genre; subsequently, by means of domain transfer, we retrieve information about the internal description and organization of the painting clusters. To accomplish this task, we annotate artistic photographs with scene types related to genres and we stylize a large corpus of photographs using different style transfer methods. All these data will be made publicly available for use in other research works.
In a second direction, this paper is the first to objectively evaluate the efficiency of the currently popular neural style transfer methods. Existing solutions [5], [6], [7], [8] compare themselves by speed, stability within video sequences, or number of transferable styles. By quantifying the improvement obtained while adapting photographs to the painting domain, we reach a surprising conclusion, namely that they are less efficient than, or at most as efficient as, non-neural style transfer solutions. Moreover, a CNN finds the original photographs, without any style transfer applied, just as informative.
The remainder of the paper is organized as follows: Section 2 presents relevant prior work, Section 3 summarizes the CNN choices made, and Section 4 discusses different aspects of painting understanding. Section 5 presents the databases used, while implementation details and results are presented in Section 6. The paper ends with a discussion of the impact of the results.
Section snippets
Related work
This work investigates the capabilities of CNNs to recognize the subject of paintings as compared with the performance of humans. Thus, relevant prior work refers to solutions for object and scene recognition in paintings. As paintings are an abstraction of real images, scene recognition in photographs is also relevant. Finally, we aim to adapt information from photographs to paintings by means of style transfer.
Object and scene recognition in paintings. Computer-based painting analysis has
CNNs: architectures and training
Following AlexNet's [19] performance in the ImageNet challenge, in recent years we have witnessed a steep increase in the popularity of Convolutional Neural Networks (CNNs) for the task of image classification. Especially after Donahue et al. [30] showed that ImageNet pre-trained CNNs provide very good descriptors, it is very hard to find a large-enough database where state-of-the-art performance does not involve CNNs. As such, the bulk of our experiments revolve, in one way or
Painting understanding and domain transfer
The main task of this work is to generate results about how well machine learning systems (in our case, deep CNNs) grasp art. In such a case, one needs tools to ease the comprehension of the system's internal mechanisms.
For CNNs, the most popular visualization tool was proposed by Zeiler and Fergus [32], who introduced deconvolutional layers and visualized activation maps onto features. Attempts to visualize CNNs for scene recognition using this technique indicated
Databases
For the various experiments undertaken, several databases have been employed. These are either collections of digitized paintings or collections of digital photographs. In the next paragraphs we summarize their main characteristics.
Implementation and results
The main goal of this work is to study the performance of CNNs in recognizing a painting's genre and, in parallel, to investigate various ways in which this performance can be increased. This includes experiments on the classification methods themselves, in order to establish both the state-of-the-art performance and a baseline for further experimentation. Afterwards, we follow with experiments on various alterations brought to the database in the context of domain transfer.
Discussion and conclusions
In this paper, we discussed the capabilities of CNNs to recognize the scene (genre) in a painting. The first contribution is that we clearly showed that machine learning systems (deep CNNs) are confused by the abstraction level found in art. The experiment with abstract art showed that they cannot easily generalize with respect to style and that the more abstract a style is, the lower the recognition rate of the genre (painting subject). In this sense, the CNN is similar to humans, who also find
Acknowledgments
The work was supported by grants of the Romanian National Authority for Scientific Research and Innovation, CNCS UEFISCDI, number PN-II-RU-TE-2014-4-0733 and respectively, CCCDI-UEFISCDI, project number 96BM. The authors would like to thank NVIDIA Corporation for donating the Tesla K40c GPU that helped run the experimental setup for this research.
References (50)
- et al., Morphological analysis for investigating artistic images, Image Vis. Comput. (2014)
- Neural structures and mechanisms involved in scene recognition: a review and interpretation, Neuropsychologia (2011)
- et al., How does the brain solve visual object recognition?, Neuron (2012)
- et al., Transfer learning using computational intelligence: a survey, Knowl.-Based Syst. (2015)
- Negative results in computer vision: a perspective, Image Vis. Comput. (2018)
- et al., Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence, Sci. Rep. (2016)
- et al., Deep residual learning for image recognition
- et al., Domain transfer for delving into deep networks capacity to de-abstract art
- et al., Efficient domain adaptation for painting theme recognition
- et al., Image style transfer using convolutional neural networks
- Perceptual losses for real-time style transfer and super-resolution
- Texture networks: feed-forward synthesis of textures and stylized images
- Arbitrary style transfer in real-time with adaptive instance normalization
- Computer vision and image analysis of art
- Genre and style based painting classification
- Recognizing image style
- Classification of artistic styles using binarized features derived from a deep neural network
- Artistic movement recognition by boosted fusion of color structure and topographic description
- The art of detection
- Painting scene recognition using homogenous shapes
- Large-scale classification of fine-art paintings: learning the right metric on the right feature
- Ceci n’est pas une pipe: a deep convolutional network for fine-art paintings classification
- ImageNet classification with deep convolutional neural networks
- et al., Painting-to-3D model alignment via discriminative visual elements, ACM Trans. Graph.
- et al., Cross-depiction problem: recognition and synthesis of photographs and artwork, Comput. Vis. Media
☆ This paper has been recommended for acceptance by Yannis Panagakis.