Abstract
One transfer learning approach that has gained wide popularity recently is attribute-based zero-shot learning. Its goal is to learn novel classes that were never seen during the training stage. The classical route toward realizing this goal is to incorporate prior knowledge, in the form of a semantic embedding of classes, and to learn to predict classes indirectly via their semantic attributes. Despite the amount of research devoted to this subject recently, no known algorithm has reported a predictive accuracy that exceeds the accuracy of supervised learning with very few training examples. For instance, the direct attribute prediction (DAP) algorithm, which forms a standard baseline for the task, is known to be only as accurate as supervised learning trained with as few as two examples from each hidden class on some popular benchmark datasets! In this paper, we argue that this lack of significant results in the literature is not a coincidence: attribute-based zero-shot learning is fundamentally an ill-posed strategy. The key insight is the observation that the mechanical task of predicting an attribute is, in fact, quite different from the epistemological task of learning the “correct meaning” of the attribute itself. In more precise mathematical terms, attribute-based zero-shot learning is equivalent to the mirage goal of learning with respect to one distribution of instances in the hope of being able to predict with respect to any arbitrary distribution. We demonstrate this overlooked fact on synthetic and real datasets. The data and software related to this paper are available at https://mine.kaust.edu.sa/Pages/zero-shot-learning.aspx.
1 Introduction
Humans are capable of learning new concepts from a few empirical observations. This remarkable ability is arguably accomplished via transfer learning techniques, such as the bootstrapping learning strategy, where agents learn simple tasks first before tackling more complex activities [22]. For instance, humans begin to cruise and crawl before they learn how to walk. Learning to cruise and crawl allows infants to improve their locomotion skills, body balance, control of limbs, and perception of depth, all of which are crucial prerequisites for learning the more complex activity of walking [11, 16].
In many machine learning applications, a similar transfer learning strategy is desired when labeled examples that faithfully represent the entire target set \(\mathcal {Y}\) are difficult to obtain. This is often the case, for example, in image classification and in neural image decoding [14, 17]. The transfer learning strategy typically employed in this setting is called few-shot, one-shot, or zero-shot learning, depending on how many labeled examples are available during the training stage [10, 14]. Here, a desired target set \(\mathcal {Y}\) (classes) is learned indirectly by learning semantic attributes instead. These attributes are then used to predict the classes in \(\mathcal {Y}\).
The motivation behind the attribute-based learning approach with scarce data is close in spirit to the rationale of the bootstrapping learning strategy. In brief terms, it helps to learn simple tasks first before attempting to learn more complex activities. In the context of classification, semantic attributes are (chosen to be) abundant, where a single attribute spans multiple classes. Hence, labeled examples for the semantic attributes are more plentiful, which makes the task of predicting attributes relatively easy. Moreover, the target set \(\mathcal {Y}\) is embedded in the space of semantic attributes, a.k.a. the semantic space, which makes it possible, perhaps, to predict classes that were rarely seen, if ever, during the training stage.
In this paper, we focus on the attribute-based zero-shot learning setting, where a finite number of semantic attributes is used to predict novel classes that were never seen during the training stage. More formally [14]:
Definition 1
(Attribute-Based Zero-Shot Setting). In the attribute-based zero-shot setting, we have an instance space \(\mathcal {X}\), a semantic space \(\mathcal {A}\), and a target set \(\mathcal {Y}\), where \(|\mathcal {A}|<\infty \) and \(|\mathcal {Y}|<\infty \). A sample S comprises m examples \(\{(X_i, Y_i, A_i)\}_{i=1,\ldots , m}\), with \(X_i\in \mathcal {X}\), \(Y_i\in \mathcal {Y}\), and \(A_i\in \mathcal {A}\). Moreover, \(\mathcal {Y}\) is partitioned into two non-empty subsets: the set of visible classes \(\mathcal {Y}_V = \bigcup _{(X_i,Y_i,A_i)\in S} \{Y_i\}\) and the set of hidden classes \(\mathcal {Y}_H = \mathcal {Y}{\setminus } \mathcal {Y}_V\). The goal is to use S to learn a hypothesis \(f:\mathcal {X}\rightarrow \mathcal {Y}_H\) that can correctly predict the hidden classes \(\mathcal {Y}_H\).
The key part of Definition 1 is the final goal. Unlike the traditional setting of learning, we no longer assume that the sample size m is large enough for all classes in \(\mathcal {Y}\) to be seen during the training stage. In general, we allow \(\mathcal {Y}\) to be partitioned into two non-empty subsets \(\mathcal {Y}_V\) and \(\mathcal {Y}_H\), which, respectively, correspond to the visible and the hidden classes in the given sample S. The classical argument for why the goal of learning to predict the hidden classes is possible in this setting is that the hidden classes \(\mathcal {Y}_H\) are coupled with the instances \(\mathcal {X}\) and the visible classes \(\mathcal {Y}_V\) via the semantic space \(\mathcal {A}\) [14].
To illustrate the traditional argument for attribute-based zero-shot learning, let us consider the simple polygon shape recognition problem shown in Fig. 1. In this problem, the instance space \(\mathcal {X}\) is the set of images of polygons, i.e. two-dimensional shapes bounded by a closed chain of a finite number of line segments, while the target set \(\mathcal {Y}\) is the set of the five disjoint classes shown in Fig. 1.
In the traditional setting of learning, a large sample of instances S would be collected and a classifier would be trained on the sample (e.g. using one-vs-all or one-vs-one). One of the fundamental assumptions in the traditional setting of learning for guaranteeing generalization is the stationarity assumption; examples in the sample S are assumed to be drawn i.i.d. from the same distribution as the future examples. Along with a few additional assumptions, learning in the traditional setting can be rigorously shown to be feasible [1, 5, 19, 23].
In the zero-shot learning setting, by contrast, it is assumed that the target set \(\mathcal {Y}\) is partitioned into two non-empty subsets \(\mathcal {Y}=\mathcal {Y}_V\cup \mathcal {Y}_H\). During the training stage, only instances from the visible classes \(\mathcal {Y}_V\) are seen. The goal is to be able to predict the hidden classes correctly. This goal is arguably achieved by introducing a coupling between \(\mathcal {Y}_V\) and \(\mathcal {Y}_H\) via the semantic space. For example, we recognize that the five classes in Fig. 1 can be completely determined by the values of the following three binary attributes:
- \(a_1\): Does the polygon contain 4 sides?
- \(a_2\): Are all sides in the polygon of equal length?
- \(a_3\): Does the polygon contain, at least, one acute angle?
The set of all possible answers to these three binary questions forms a semantic space \(\mathcal {A}\) for the polygon shape recognition problem. Given an instance X with the semantic embedding \(A=(a_1,a_2,a_3)\in \{0,1\}^3\), its class can be uniquely determined. For example, any equilateral triangle has the semantic embedding (0, 1, 1), which means that the polygon (1) does not contain four sides, (2) has sides that are all of equal length, and (3) contains at least one acute angle. Among the five classes in \(\mathcal {Y}\), only the class of equilateral triangles satisfies this semantic embedding. Similarly, the four remaining classes have unique semantic embeddings as well.
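This class-from-embedding lookup can be sketched in a few lines. The table below is a sketch under stated assumptions: the embeddings of non-square rectangles, non-rectangular parallelograms, and equilateral triangles are taken from the text, while the encodings of squares and of the assumed fifth class (non-equilateral triangles) are derived from the attribute definitions, since Fig. 1 is not reproduced here.

```python
# Exact lookup from a semantic embedding (a1, a2, a3) to a polygon class.
# Entries marked "assumed" are not stated explicitly in the text.
EMBEDDING_TO_CLASS = {
    (1, 1, 0): "square",                         # 4 sides, equal, no acute angle (assumed)
    (1, 0, 0): "non-square rectangle",           # stated in the text
    (1, 0, 1): "non-rectangular parallelogram",  # stated in the text
    (1, 1, 1): "non-rectangular parallelogram",  # stated in the text (rhombus case)
    (0, 1, 1): "equilateral triangle",           # stated in the text
    (0, 0, 1): "non-equilateral triangle",       # assumed fifth class
}

def decode(a1, a2, a3):
    """Map a predicted attribute vector to a class, or None if no class matches."""
    return EMBEDDING_TO_CLASS.get((a1, a2, a3))
```

Because every class has a distinct embedding (or set of embeddings), a perfect attribute predictor would indeed determine the class uniquely; the rest of the paper questions whether such a predictor can be learned from visible classes alone.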
Because the classes can be inferred from the values of the three binary attributes mentioned above, it is often argued that hidden classes can be predicted by, first, learning to predict the values of the semantic attributes based on a sample (training set) S, and, second, by using those predicted attributes to predict the hidden classes in \(\mathcal {Y}_H\) via some hand-crafted mappings [7, 9, 13–15, 17, 20, 21]. In our example in Fig. 1, suppose the class of non-square rectangles is never seen during the training stage. If we know that a polygon has the semantic embedding (1, 0, 0), which means that it has four sides, its sides are not all of equal length, and it does not contain any acute angles, then it seems reasonable to conclude that it is a non-square rectangle even if we have not seen any non-square rectangles in the sample S. Does this imply that zero-shot learning is a well-posed approach? We will show that the answer is, in fact, negative. The key ingredient in our argument is the fact that the mechanical task of predicting an attribute is quite different from the epistemological task of learning the correct meaning of the attribute.
The rest of the paper is structured as follows. We first explain why the two tasks of “predicting” an attribute and “learning” an attribute are quite different from each other. We illustrate this overlooked fact on the simple shape recognition problem of Fig. 1 and demonstrate it in greater depth on synthetic and real datasets afterward. Next, we use this distinction between “predicting” and “learning” to argue that the attribute-based zero-shot learning approach is fundamentally ill-posed, which, we believe, explains why the zero-shot learning algorithms previously proposed in the literature have not performed significantly better than supervised learning with very few training examples.
2 Why Learning and Predicting Are Two Different Tasks
2.1 The Polygon Shape Recognition Problem
Let us return to the original polygon shape recognition example of Fig. 1. Suppose that the two classes of non-square rectangles and non-rectangular parallelograms are hidden from the sample S. That is, \(\mathcal {Y}_H = \{\text {non-square rectangles},\; \text {non-rectangular parallelograms}\}\), while \(\mathcal {Y}_V\) comprises the remaining three classes.
In the attribute-based zero-shot learning setting, we learn to predict the three semantic attributes \((a_1,a_2,a_3)\) mentioned earlier based on the sample S, which only contains examples from the three visible classes. Once we learn to predict these attributes correctly, we are supposed to be able to recognize the two hidden classes via their semantic embeddings. The semantic embedding of non-square rectangles is, in this example, (1, 0, 0), while the set of semantic embeddings of non-rectangular parallelograms is \(\{(1,0,1), (1,1,1)\}\).
To see why this is, in fact, an incorrect approach, we note that the task of predicting an attribute aims, by definition, at utilizing all the relevant information in the sample S that aids the prediction task. In our example, since only the three visible classes \(\mathcal {Y}_V\) are seen in the sample S, a good predictor should infer from S the following logical assertions:
1. If a polygon does not contain four sides, then it contains, at least, one acute angle. Formally:
$$\begin{aligned} (a_1=0)\rightarrow (a_3=1) \end{aligned}$$
From this, the contrapositive assertion \((a_3=0)\rightarrow (a_1=1)\) is deduced as well.
2. If the sides of a polygon are not all of equal length, then it does not contain four sides. Formally:
$$\begin{aligned} (a_2=0)\rightarrow (a_1=0) \end{aligned}$$
Again, its contrapositive assertion also holds.
3. If the polygon does not contain an acute angle, then all of its sides are of equal length. Formally:
$$\begin{aligned} (a_3=0)\rightarrow (a_2=1) \end{aligned}$$
These logical assertions and others are likely to be used by a good predictor, at least implicitly, since they are always true in the sample S. In addition, such a predictor would have a good generalization ability if the instances continued to be drawn i.i.d. from the same distribution as the training sample S, i.e. if the set of visible classes remained unchanged.
If, on the other hand, an instance is now drawn from a hidden class in \(\mathcal {Y}_H\), then some of these logical assertions no longer hold, and the original algorithm that was trained to predict the semantic attributes fails. This follows from the fact that instances drawn from the hidden classes have a different distribution. Therefore, the fact that classes can be uniquely determined by the values of the semantic attributes is of little importance here, because the semantic attributes are likely to be predicted correctly for the visible classes only. Needless to say, this violates the original goal of the attribute-based zero-shot learning setting.
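The breakdown of assertion 2 can be made concrete in a few lines. The attribute vectors below are assumptions consistent with the discussion: the three visible-class embeddings all satisfy \((a_2=0)\rightarrow (a_1=0)\), while the hidden classes include the embedding (1, 0, 0) that violates it.

```python
# Visible-class embeddings (a1, a2, a3): assumed to be squares and the two
# triangle classes, consistent with the hidden set named in the text.
VISIBLE = {(1, 1, 0), (0, 1, 1), (0, 0, 1)}
# Hidden-class embeddings: non-square rectangles and non-rect. parallelograms.
HIDDEN = {(1, 0, 0), (1, 0, 1), (1, 1, 1)}

def assertion2(a):
    """Assertion 2: (a2 = 0) -> (a1 = 0), i.e. unequal sides imply not four sides."""
    a1, a2, a3 = a
    return a2 == 1 or a1 == 0

holds_on_visible = all(assertion2(a) for a in VISIBLE)  # regularity present in S
holds_on_hidden = all(assertion2(a) for a in HIDDEN)    # broken by (1, 0, 0)
```

A predictor that exploits this regularity maximizes accuracy on S, yet it is guaranteed to misfire on exactly the hidden class whose embedding breaks the rule.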
2.2 Optical Digit Recognition
To show that the previous argument on the polygon shape recognition problem is not a contrived argument, let us look into a real classification problem, in which we can visualize the decision rule used by the predictors. We will use the optical digit recognition problem to illustrate our argument. In order to be able to interpret the decision rule used by the predictor, we will use the linear support vector machine (SVM) algorithm [4, 6], trained without the bias term using the LIBLINEAR package [8].
One way of introducing a semantic space for the ten digits is to use the seven-segment display shown in Fig. 2. That is, the instance space \(\mathcal {X}\) is the set of noisy digits, the classes are the ten digits \(\mathcal {Y}=\{0,1,2,\ldots , 9\}\), and the semantic space is \(\mathcal {A}=\{-1,+1\}^7\) corresponding to the seven segments. For example, using the order of segments (a,b,c,d,e,f,g) shown in Fig. 2, the digit 0 in Fig. 2 has the semantic embedding (1, 1, 1, 0, 1, 1, 1) while the digit 1 has the semantic embedding (0, 0, 1, 0, 0, 1, 0), and so on.
In our implementation, we run the experiment as follows (see Footnote 1). First, a perfect digit is generated, which is later contaminated with noise. In particular, every pixel is flipped with probability 0.1. As a result, the instance space is the set of noisy digits, as depicted in Fig. 3. Then, five digits are chosen to be in the visible set of classes, and the remaining digits are hidden during the training stage. We train classifiers that predict the attributes (i.e. segments) using the visible set of classes, and use those classifiers to predict the hidden classes afterward.
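The noise model can be sketched as follows; the rendering of a perfect digit into a binary image is abstracted away here, so any binary array stands in for it.

```python
import random

def add_noise(image, p=0.1, rng=random):
    """Flip each binary pixel of `image` independently with probability p,
    mimicking the contamination step described above."""
    return [[pixel ^ (rng.random() < p) for pixel in row] for row in image]

# Example: a (hypothetical) 2x3 fragment of a rendered digit.
clean = [[1, 0, 1], [0, 1, 0]]
noisy = add_noise(clean, p=0.1)  # each pixel flipped with probability 0.1
```

Setting `p=0` reproduces the clean image and `p=1` its complement, which is a quick sanity check on the flipping logic.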
Now, the classical argument for attribute-based zero-shot learning goes as follows:
1. Every digit can be uniquely determined by its seven-segment display. When an exact match is not found, one can carry out a nearest-neighbor search [7, 9, 15, 17, 20, 21] or a maximum a posteriori estimation method [13, 14].
2. Every segment in {a,b,c,d,e,f,g} is a concept class by itself that spans multiple digits. Hence, the number of training examples available for each segment is large, which makes it easier to predict.
3. Because each of the seven segments spans multiple classes, we no longer need to see all of the ten digits during the training stage in order to learn to predict the seven segments reliably.
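The decoding step in point 1 can be sketched as a Hamming-distance nearest-neighbor search. The encodings below follow the (a,b,c,d,e,f,g) order of Fig. 2; only the rows for digits 0 and 1 are stated in the text, and the remaining rows are assumptions based on a standard seven-segment font.

```python
# Seven-segment embeddings per digit; digits 2-9 are assumed encodings.
SEGMENTS = {
    0: (1, 1, 1, 0, 1, 1, 1), 1: (0, 0, 1, 0, 0, 1, 0),
    2: (1, 0, 1, 1, 1, 0, 1), 3: (1, 0, 1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 0, 1, 0), 5: (1, 1, 0, 1, 0, 1, 1),
    6: (1, 1, 0, 1, 1, 1, 1), 7: (1, 0, 1, 0, 0, 1, 0),
    8: (1, 1, 1, 1, 1, 1, 1), 9: (1, 1, 1, 1, 0, 1, 1),
}

def hamming(u, v):
    """Number of positions where two binary vectors disagree."""
    return sum(x != y for x, y in zip(u, v))

def decode(predicted, candidates=SEGMENTS):
    """Return the digit whose segment embedding is closest to the prediction."""
    return min(candidates, key=lambda d: hamming(candidates[d], predicted))
```

An exact match decodes directly; a prediction with one corrupted segment still snaps to the nearest class embedding.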
This argument clearly rests on the assumption that “learning” a concept is equivalent to the task of reliably “predicting” it. As the earlier discussion of the polygon shape recognition problem shows, this assumption is, in fact, invalid. Figure 4 shows what happens when only a proper subset of the ten digits is seen during the training stage. As shown in the figure, the linear classifier trained using SVM exploits the relevant information available in the training set to maximize its prediction accuracy on the attributes.
For example, when we train a classifier to predict the segment ‘a’ using the visible classes \(\{0,1,2,3,4\}\), a good predictor would exploit the fact that the target segment ‘a’ always co-occurs with the segment ‘g’ in these classes. By contraposition, the absence of ‘g’ then signals the absence of ‘a’. This is clearly seen in Fig. 4 (top left corner). Of course, what the predictor learns is even more complex than this, as shown in Fig. 4. When novel instances from the hidden classes arrive, these correlations no longer hold and the algorithm fails to predict the semantic attributes correctly. To reiterate, such a failure is fundamentally due to the fact that the hidden classes constitute a different distribution of instances from the one seen during the training stage.
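The shortcut above can be reproduced with a hand-written rule. The per-digit values of the top segment ‘a’ and the bottom segment ‘g’ below are assumptions based on a standard seven-segment font (only the full encodings of digits 0 and 1 are given in the text).

```python
# Assumed values of segments 'a' (top) and 'g' (bottom) for each digit.
A_SEG = {0: 1, 1: 0, 2: 1, 3: 1, 4: 0, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1}
G_SEG = {0: 1, 1: 0, 2: 1, 3: 1, 4: 0, 5: 1, 6: 1, 7: 0, 8: 1, 9: 1}

def shortcut_predict_a(digit):
    """Predict the top segment 'a' by simply reading off the bottom segment 'g',
    the spurious rule available on the visible classes."""
    return G_SEG[digit]

acc_visible = sum(shortcut_predict_a(d) == A_SEG[d] for d in range(5)) / 5
acc_hidden = sum(shortcut_predict_a(d) == A_SEG[d] for d in range(5, 10)) / 5
# The shortcut is perfect on the visible digits {0,...,4} but breaks on the
# hidden digit 7, whose top segment is on while its bottom segment is off.
```

This toy rule achieves 100% attribute accuracy on the visible classes while learning nothing about what the segment ‘a’ actually is.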
The results of applying a linear SVM with binary relevance to predict the seven segments are shown in Fig. 4. In this figure, the blue regions correspond to the pixels that contribute positively to the decision rule for predicting the corresponding segment, while the red regions contribute negatively. There are two key takeaways from this figure. First, the prediction rule used by the classifier does not correspond to the “true” meaning of the semantic attribute. After all, the goal of classification is to “predict” the attribute, as opposed to learning what it actually means. Second, changing the set of visible classes can change the prediction rule for the same attribute quite notably. Both observations challenge the rationale behind the attribute-based zero-shot learning setting.
2.3 Zero-Shot Learning on Popular Datasets
Next, we examine the performance of zero-shot learning on benchmark datasets. Two of the most popular datasets for evaluating zero-shot learning algorithms are the Animals with Attributes (AwA) dataset [14] and the aPascal-aYahoo (aP-aY) dataset [9] (see Footnote 2). We briefly describe each dataset next.
The Animals with Attributes (AwA) Dataset: The AwA dataset was collected by querying search engines, such as Google and Flickr, for images of 50 animals. Afterward, these images were manually curated to remove outliers and duplicates. The final dataset contains 30,475 images, where the minimum number of images per class is 92 and the maximum is 1,168. In addition, 85 attributes are introduced. In the zero-shot learning setting, 40 (visible) classes are used for training and 10 (hidden) classes are used for testing [14].
The aPascal-aYahoo (aP-aY) Dataset: The aP-aY dataset contains 12,695 images, which were chosen from the PASCAL VOC 2008 dataset [9]. These images are used during the training stage of the zero-shot learning setting. In addition, a total of 2,644 images were collected from the Yahoo image search engine to be used during the test stage. The two sets of images have disjoint classes: the training dataset contains 20 classes while the test dataset contains 12 classes. Moreover, every image has been annotated with 64 binary attributes.
Results: Table 1 presents fairly recent results reported on the two datasets, AwA and aP-aY. The zero-shot learning algorithms included in the table are the direct attribute prediction (DAP) algorithm proposed in [13, 14], which is one of the standard baseline methods for this task; the indirect attribute prediction (IAP) algorithm proposed in [13, 14]; the embarrassingly simple zero-shot learning algorithm proposed in [18]; and the zero-shot random forest algorithm proposed in [12]. The best reported prediction accuracy is 49.3% for the AwA dataset and 26.0% for the aP-aY dataset.
In order to properly interpret the reported results, we have also provided in Table 1 the number of training examples from the hidden classes that would suffice, in a traditional supervised learning setting, to obtain the same accuracy reported by the zero-shot learning algorithms in the literature. These latter figures are obtained from the experimental study conducted in [14]. Note that while about 600 examples per visible class are used during the training stage, the best reported zero-shot prediction accuracy on the hidden classes is equivalent to the accuracy of supervised learning using fewer than 20 training examples per hidden class. In fact, the zero-shot learning accuracy reported on the aP-aY is worse than the accuracy of supervised learning when as few as 2 training examples per hidden class are used.
When the area under the curve (AUC) is used as a performance measure, which is known to be more robust to class imbalance than the prediction accuracy, the apparent merit of zero-shot learning becomes even more questionable. For instance, the popular direct attribute prediction (DAP) algorithm on the AwA dataset achieves an AUC of 0.81, which is equivalent to the performance of supervised learning using as few as 10 training examples from each hidden class (cf. Tables 4 and 7b in [14]). Recall, by contrast, that over 600 examples per visible class are used for training.
3 A Mathematical Formalism
The above argument and empirical evidence on the ill-posedness of attribute-based zero-shot learning can be formalized mathematically. Incidentally, this will allow us to identify paradigms of zero-shot learning for which the above argument no longer holds.
As stated in Sect. 2 and illustrated in Fig. 4, the fundamental problem with attribute-based zero-shot learning is that it aims at learning concept classes (semantic attributes) with respect to one distribution of instances (i.e. when conditioned on the visible set of classes) with the goal of being able to predict those concept classes for an arbitrary distribution of instances (i.e. when conditioned on the unknown hidden set of classes). Clearly, this is an ill-posed strategy that violates the core assumptions of statistical learning theory.
To remedy this problem, we can cast zero-shot learning as a domain adaptation problem [18]. In the standard domain adaptation setting, it is assumed that the training examples are drawn i.i.d. from some source distribution \(\mathcal {D}_S\), whereas future test examples are drawn i.i.d. from a different target distribution \(\mathcal {D}_T\). Let \(h:\mathcal {X}\rightarrow \mathcal {Y}\) be a predictor. Then, the average misclassification error rate of h with respect to \(\mathcal {D}_T\) is bounded by:
$$\begin{aligned} \Pr _{(X,Y)\sim \mathcal {D}_T}\big [h(X)\ne Y\big ] \;\le \; \Pr _{(X,Y)\sim \mathcal {D}_S}\big [h(X)\ne Y\big ] + d_{TV}(\mathcal {D}_S,\, \mathcal {D}_T), \end{aligned}$$
(1)
where \(d_{TV}(\mathcal {D}_S, \mathcal {D}_T)\) is the total-variation distance between the two probability distributions \(\mathcal {D}_S\) and \(\mathcal {D}_T\) [2]. Similar bounds that also hold with a high probability can be found in [3]. Hence, learning a good predictor h with respect to some source distribution \(\mathcal {D}_S\) does not guarantee a good prediction accuracy with respect to an arbitrary target distribution \(\mathcal {D}_T\) unless the two distributions are nearly identical.
Therefore, in order to turn zero-shot learning into a well-posed strategy, it is imperative that a common representation R(X) is used, such that the induced distribution of R(X) remains nearly unchanged when the instances X are conditioned on the visible set of classes or on the hidden set of classes. Then, by learning to predict semantic attributes given R(X), generalization bounds, such as the one provided in Eq. (1), guarantee a good prediction accuracy in the zero-shot setting. One method that can accomplish this goal is to divide the instances \(X_i\) into multiple local segments \(X_i\rightarrow (Z_{i,1}, Z_{i,2},\ldots )\in \mathcal {Z}^r\) such that a classifier \(h:\mathcal {Z}\rightarrow \mathcal {A}\) is trained to predict the semantic attributes in every local segment separately. If these local segments have a stable distribution across the visible and hidden set of classes, then zero-shot learning is feasible. A prototypical example of this approach is segmenting sounds into phonemes in word recognition systems and using those phonemes to recognize words (classes) [17].
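The role of the total-variation distance in Eq. (1) can be illustrated numerically. The two discrete distributions below are hypothetical stand-ins for the instance distributions induced by the visible and hidden classes.

```python
def total_variation(p, q):
    """d_TV between two discrete distributions given as dicts over a finite support:
    half the L1 distance between the probability vectors."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

# Hypothetical source (visible-class) and target (hidden-class) distributions.
D_S = {"x1": 0.5, "x2": 0.5, "x3": 0.0}
D_T = {"x1": 0.1, "x2": 0.1, "x3": 0.8}

gap = total_variation(D_S, D_T)
# gap = 0.8: the target error may exceed the source error by up to 0.8, so a
# predictor that is perfect on D_S carries almost no guarantee on D_T.
```

A representation R(X) that makes the two induced distributions nearly identical drives this gap toward zero, which is exactly the condition under which the bound becomes useful.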
4 Conclusion
Attribute-based zero-shot learning is a transfer learning strategy that has been widely studied in the literature. Its aim is to learn to predict novel classes that are never seen during the training stage by learning to predict semantic attributes instead. In this paper, we argue that attribute-based zero-shot learning is an ill-posed strategy because the two tasks of “predicting” and “learning” an attribute are fundamentally different. We demonstrate our argument on synthetic and real datasets and use it, finally, to explain the poor performance results that have been reported so far in the literature for various zero-shot learning algorithms on popular benchmark datasets.
Notes
1. The MATLAB implementation codes that generate the images in this section are available at https://mine.kaust.edu.sa/Pages/zero-shot-learning.aspx.
2. These datasets are available at:
References
Abu-Mostafa, Y.S., Magdon-Ismail, M., Lin, H.T.: Learning from Data. AMLBook (2012)
Alabdulmohsin, I.: Algorithmic stability and uniform generalization. In: NIPS, pp. 19–27. Curran Associates, Inc. (2015)
Ben-David, S., Blitzer, J., Crammer, K., Pereira, F.: Analysis of representations for domain adaptation. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19, pp. 137–144. MIT Press, Cambridge (2006). http://books.nips.cc/papers/files/nips19/NIPS2006_0838.pdf
Boser, B.E., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Fifth Annual Workshop on Computational Learning Theory, pp. 144–152 (1992)
Bousquet, O., Boucheron, S., Lugosi, G.: Introduction to statistical learning theory. In: Bousquet, O., von Luxburg, U., Rätsch, G. (eds.) Machine Learning 2003. LNCS (LNAI), vol. 3176, pp. 169–207. Springer, Heidelberg (2004)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
Dinu, G., Baroni, M.: Improving zero-shot learning by mitigating the hubness problem. In: ICLR: Workshop Track (2015). arXiv:1412.6568
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. JMLR 9, 1871–1874 (2008)
Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: CVPR, pp. 1778–1785. IEEE (2009)
Fei-Fei, L., Fergus, R., Perona, P.: One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 594–611 (2006)
Haehl, V., Vardaxis, V., Ulrich, B.: Learning to cruise: Bernstein’s theory applied to skill acquisition during infancy. Hum. Mov. Sci. 19(5), 685–715 (2000)
Jayaraman, D., Grauman, K.: Zero-shot recognition with unreliable attributes. In: NIPS, pp. 3464–3472 (2014)
Lampert, C.H., Nickisch, H., Harmeling, S.: Learning to detect unseen object classes by between-class attribute transfer. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 951–958. IEEE (2009)
Lampert, C.H., Nickisch, H., Harmeling, S.: Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Pattern Anal. Mach. Intell. 36(3), 453–465 (2014)
Liu, J., Kuipers, B., Savarese, S.: Recognizing human actions by attributes. In: CVPR, pp. 3337–3344. IEEE (2011)
Rader, N., Bausano, M., Richards, J.E.: On the nature of the visual-cliff-avoidance response in human infants. Child Dev. 51(1), 61–68 (1980)
Palatucci, M., Pomerleau, D., Hinton, G.E., Mitchell, T.M.: Zero-shot learning with semantic output codes. In: NIPS, pp. 1410–1418 (2009)
Romera-Paredes, B., Torr, P.: An embarrassingly simple approach to zero-shot learning. In: ICML, pp. 2152–2161 (2015)
Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014)
Shigeto, Y., Suzuki, I., Hara, K., Shimbo, M., Matsumoto, Y.: Ridge regression, hubness, and zero-shot learning. In: Appice, A., Rodrigues, P.P., Santos Costa, V., Soares, C., Gama, J., Jorge, A. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9284, pp. 135–151. Springer, Heidelberg (2015). doi:10.1007/978-3-319-23528-8_9
Socher, R., Ganjoo, M., Manning, C.D., Ng, A.: Zero-shot learning through cross-modal transfer. In: NIPS, pp. 935–943 (2013)
Thrun, S., Mitchell, T.M.: Lifelong robot learning. Rob. Auton. Syst. 15, 25–46 (1995)
Vapnik, V.N.: An overview of statistical learning theory. IEEE Trans. Neural Netw. 10(5), 988–999 (1999)
Acknowledgment
Research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST) and the Saudi Arabian Oil Company (Saudi Aramco).
© 2016 Springer International Publishing AG
Alabdulmohsin, I., Cisse, M., Zhang, X. (2016). Is Attribute-Based Zero-Shot Learning an Ill-Posed Strategy?. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2016. Lecture Notes in Computer Science(), vol 9851. Springer, Cham. https://doi.org/10.1007/978-3-319-46128-1_47