Novel Relevance Feedback Approach for Color Trademark Recognition Using Optimization and Learning Strategy

Latika Pinjarkar; Manisha Sharma; Smita Selot

doi:10.1515/jisys-2017-0022

Open Access Published by De Gruyter July 20, 2017

From the journal Journal of Intelligent Systems

https://doi.org/10.1515/jisys-2017-0022

Abstract

The trademark registration process, now common to all organizations, involves recognizing and retrieving similar trademark images from trademark databases. Trademark retrieval is an important application area of content-based image retrieval. The main challenges in designing and developing this application are reducing the semantic gap, obtaining higher accuracy, and reducing computational complexity and, consequently, execution time. The proposed work focuses on these challenges. This paper proposes a relevance feedback system for trademark recognition that uses optimization and an unsupervised learning technique as a preprocessing stage. The search space is reduced by optimizing the database feature set with particle swarm optimization, followed by clustering with a self-organizing map. The relevance feedback technique is implemented over this preprocessed feature set. Experimentation is done using the FlickrLogos-32 PLUS dataset. To introduce variations between the training and query images, transformations are applied to each query image, viz. rotation, scaling, and translation. The same query image is tested for various combinations of transformations. The results indicate that the proposed technique is invariant to these transformations while maintaining significant performance.

Keywords: Content-based image retrieval (CBIR); particle swarm optimization; relevance feedback; self-organizing map (SOM); trademark

1 Introduction

A trademark image is a graphical image containing a picture, text, or both that is widely used by industries and organizations for their identification. Every registered company or organization must have its own unique logo design accepted by a competent authority. An automatic trademark approval system matches an input query image against the set of registered logos before accepting it as the final trademark image of the company. Automatic searching in a logo image database is a major application of content-based image retrieval (CBIR). CBIR searches for images of interest to the user based on the visual contents of an image. Most retrieval-based CBIR applications entail image searching, image matching, and image retrieval. The main challenge lies in converting the high-level semantic content of an image, as interpreted by humans, to a low-level feature representation. The broad range of applications facing these challenges has created significant scope for research in pattern matching, recognition, and retrieval of images.

An exhaustive database of feature vectors representing color, texture, and shape of trademark images is generated. A query image is submitted to the system; features are extracted and matched with the features set of stored images. These stored features might contain some irrelevant and redundant features not useful in the retrieval process. Hence, the feature set is optimized and retrieval accuracy is improved using the particle swarm optimization (PSO) technique.

PSO is an optimization technique that performs a parallel search over multiple points that are moved within the search space. The method was suggested by Kennedy and Eberhart in 1995. As a population-based optimization technique, PSO can be used to solve a variety of optimization problems. Its advantage is fast convergence compared with other optimization techniques such as genetic algorithms and global optimization algorithms [11]. The optimized database feature set is then trained using artificial neural network-based training, i.e. a self-organizing map (SOM). The SOM is an unsupervised learning technique especially suited to high-dimensional data. Many data analysis tasks, such as gene clustering in the medical field, multimedia applications, pattern recognition, and web-based content, make use of this technique. The important property of the SOM is its non-linear mapping of a high-dimensional input space onto a two-dimensional grid of artificial neural units. During the training phase of the SOM, a topological map is formed in which input vectors that are near each other are mapped to nearby map units [14]. Feedback taken from the user about the relevance of the images retrieved by the CBIR system is called relevance feedback (RF) [18, 23]. The information from the feedback is used to improve the query, which in turn enhances the results of the image retrieval system in terms of image recognition.

The significant challenges in designing a logo or trademark registration framework are as follows: (i) interactive search, reducing semantic gap between user perception and image representations; (ii) assessment with importance on representative test sets; and (iii) reducing the computation complexity and thus subsequently lessening the processing time.

The organization of the paper is as follows. Section 2 describes the novelty and contribution of the work. Related work done in this area is discussed in Section 3. The proposed approach and details about dataset used are represented in Section 4. Section 5 reports experimentation results, and the conclusion is presented in Section 6.

2 Novelty and Contribution of the Work

Machine learning techniques play a vital role in adding an intelligent dimension to the problem-solving process, thereby generating a class of intelligent algorithms. The proposed work incorporates them in two phases.

Firstly, at the preprocessing stage, the database feature set is optimized through optimization (PSO) and clustered using unsupervised learning (SOM).
Secondly, basic RF is embedded with short-term learning (STL) and long-term learning (LTL), for trademark recognition and retrieval.

STL is implemented through a query improvement process of RF for an initial query. A log of feedback information is maintained to infer the navigation patterns for the next query phases under LTL. These techniques in the RF algorithm have helped bridge the semantic gap between the low-level features and user perception. The incorporation of optimization (PSO) and machine learning technique (SOM) at the preprocessing stage has narrowed the search space and reduced the computation complexity of the proposed framework, resulting in less processing time. The retrieval results of the proposed system proved to be promising when compared with the recent works of Iandola et al. [10] and Bao et al. [4].

3 Related Work

Kameyama et al. [11] suggested an approach using PSO for tuning the parameters of the relevance evaluation algorithm of a CBIR system, optimizing them according to the appropriateness of the retrieved results. The retrieval ranking score was enhanced by tuning the parameters that affect the similarity evaluation in a binary shape-matching CBIR system. Laaksonen et al. [14] proposed SOMs as a relevance mechanism for CBIR. The PicSOM system contained an exclusive tree-structured SOM for every individual feature type. Rusiñol et al. [22] designed an efficient query-by-example retrieval system for trademark images. Vienna codes and the visual contents of the images were used to describe the images. The RF technique was used to improve the retrieval efficiency of the system. The database included 30,000 trademark images. Wang and Hong [27] implemented a trademark retrieval algorithm combining global and local image features. Zernike moments were extracted and sorted on the basis of similarity. Scale-invariant feature transform (SIFT) features were used as a similarity measure. The standard image set “MPEG7 CE Shape-2 Part-B” of 3621 trademark images was used as an image database. Hou and Shi [8] introduced a Logo on Map system consisting of three modules: picture extraction module (PEM), logo matching module (LMM), and web mapping module. The PEM was based on a keyword textual search, while the LMM was a visual search using the SIFT algorithm. Bagheri et al. [3] recommended a system based on shape feature extraction techniques. Zernike moments and Fourier descriptors were used to describe the shape feature. For recognition of the logo images, the Dempster-Shafer theory combination strategy was used with the three classifiers. Alaei and Delalandre [1] modeled a logo recognition system for document images. This system detected the regions of interest in the document images. The piece-wise painting algorithm and probability features, along with a decision tree technique, were used for this detection.

Laaksonen et al. [13] recommended a scheme for CBIR in big databases. The system, called PicSOM, was based upon tree-structured SOMs (TS-SOMs) and the RF technique. Image feature descriptors such as color, texture, or shape were used to form the TS-SOM. Suganthan [25] implemented a shape indexing method using SOM. Pair-wise relational attribute vectors were used to extract the structural information contained in the geometric shape. These vectors were quantized using a SOM. Two SOMs, named SOM1 and SOM2, were used in this work. SOM1 contained global histograms of relational attribute vectors, which were given as input vectors to SOM2. SOM1 was used to confine the shape properties of the objects, and SOM2 generated a topology-conserving mapping for the structural shapes. The database, from the UK Trade Mark Registry office, contained >10,000 trademark images. Vesanto et al. [26] used the SOM as a vector quantization technique that places the sample vectors on a regular low-dimensional grid in an ordered fashion, referred to as the SOM toolbox. Hussain and Eakins [9] presented a new approach for visual clustering of multicomponent trademark images using the topological characteristics of the SOM. The approach was implemented in two stages. In the first stage, the features extracted from image components were used to construct a two-dimensional map. In the second stage, a component similarity vector was derived from a query image. The experimentation was performed using a database of 10,000 trademark images.

Okayama et al. [17] proposed two approaches based on the user’s feedback for optimization of retrieval in a CBIR system. In the first approach, a supervised training technique was applied to the database feature set to generate a map, based on the information from the user’s feedback. The second approach used the PSO technique to optimize the parameters in the fine-matching relaxation action according to the user’s assessment of the retrieved image ranking. Ma et al. [16] discussed a hybrid method for image retrieval and clustering using PSO and support vector machine (SVM). The RF method based upon linear/quadratic estimators was used to obtain better retrieval results. Xue et al. [29] conducted a study on multiobjective PSO for feature selection and suggested two PSO-based multiobjective feature selection algorithms. The first algorithm introduced the concept of non-dominated sorting into PSO to handle feature selection problems. The second applied the ideas of crowding, mutation, and dominance to PSO to search for the Pareto front solutions.

In Ref. [21], Romberg et al. designed a scalable logo recognition framework. The quantized representation of the logo regions was derived by means of the local features and the spatial structure compositions. The framework was evaluated with the FlickrLogos-32 dataset. Revaud et al. [19] showed how to learn a statistical model for the distribution of wrong detections yielded by an image matching algorithm. The experiments were conducted on the BelgaLogos and FlickrLogos datasets. In Ref. [20], a logo recognition framework using feature bundling was proposed. The local features and the features from the spatial neighborhood were combined into bundles. The experiments used the FlickrLogos-32 dataset for evaluation and testing. Recent works in the field [4, 5, 6, 7, 10] implemented deep learning mechanisms for logo recognition. Bianco et al. [5] and Eggert et al. [6] tested the logo recognition framework by using pretrained convolutional neural networks (CNNs) and synthetically generated data. Hoi et al. [7] applied deep learning techniques for logo detection and brand recognition, testing deep region-based convolutional network procedures for object detection. Iandola et al. [10] employed deep CNNs (DCNNs) for logo recognition. DCNN architectures such as GoogLeNet-GP, GoogLeNet-FullClassify, and Full-Inception were proposed and tested for accuracy on the FlickrLogos-32 dataset. FRCN was presented as an improved framework for object detection over region-based CNN (R-CNN), with higher accuracy and faster training time. Existing DCNN architectures, namely AlexNet and VGG-16, were integrated with the FRCN framework and tested with the FlickrLogos-32 dataset. Bao et al. [4] investigated the suitable design and settings of R-CNN for logo detection and tested the system with the FlickrLogos-32 dataset.

The work done by most of the above researchers in the area of trademark image recognition is based upon only one of the color, texture, or shape features. Multiple feature-based image retrieval has its own advantage, so the proposed system is designed based upon color, texture, and shape feature extraction together. Many proposed trademark image retrieval systems deal with all categories of trademark images (only image, only text, or a combination of both); however, the recognition results of these systems are not satisfactory compared with systems intended to deal with only one category of trademark, as reported in Ref. [28]. The proposed framework deals with all three categories of trademark. As found in the literature survey, PSO and SOM are popular optimization and learning techniques in image retrieval applications, where they enhance retrieval results. To the best of our knowledge, very few researchers have used the PSO and SOM techniques integrated with the RF strategy for trademark image retrieval. In the proposed framework, the RF (with machine learning) strategy is combined with the PSO and SOM algorithms for trademark image retrieval. The proposed framework is also tested for robustness by giving a transformed (translated, rotated, and scaled) query image as input to the system.

4 Proposed Approach

Based on the analysis of the work done in the field, a search space with a large feature set adds substantially to the execution time of the algorithm. Incorporating intelligent agents into CBIR-based image retrieval helps the system improve its performance. Hence, learning techniques are incorporated at the preprocessing stage and in the RF algorithm. Experiments are performed on the dataset and the results are reported. The overall process of the proposed system is summarized in the following steps:

1. Generate the image database with stored feature vectors.
2. Optimize the feature set.
3. Apply SOM for creating clusters.
4. Select the query image and apply transformations upon the query image (translation, rotation, and scaling).
5. Give the transformed query image as an input.
6. Use RF for finding the relevant images from a few clusters until the input image is selected or rejected as a trademark.

PSO is based on swarm intelligence where each solution is represented by a particle in search space. The underlying principle of PSO is based not only on local interaction of particles but also the global exchange of information among particles, analogous to the social behavior of a flock of birds. This evolutionary computation technique is not only simple to implement but reveals promising results in knowledge optimization. However, a major concern in PSO is that it easily gets trapped into local optimum in high-dimensional space and has a low convergence rate in the iterative process, and may not guarantee an optimum solution. Hence, with a reduced feature set, search space is clustered using SOM and search is performed on the topmost clusters.

4.1 Generating the Dataset of Feature Vectors

The experimentation is done using the FlickrLogos-32 PLUS dataset. The details of this dataset are given in Table 1. The publicly available FlickrLogos-32 dataset contains some non-logo images. These non-logos are removed and some more logo images are added to the set, generating the FlickrLogos-32 PLUS dataset.

Table 1:

Details of Logo Dataset Used in the Experimentation.

Dataset	Total no. of images	Logos	Brands
Available: FlickrLogos-32 dataset	8240	32	32
Used: FlickrLogos-32 PLUS dataset	8500	37	37

Some sample logo images from the FlickrLogos-32 PLUS dataset used in the framework are shown in Figure 1. The visual content of an image is described in terms of its low-level features, i.e. color, texture, and shape [15]. The features of the trademark images in the database are extracted by implementing color, texture, and shape feature extraction. Color histogram, color moments, and color correlogram techniques are used for color feature extraction. Gabor wavelet and Haar wavelet are implemented for texture feature extraction, and Fourier descriptor and circularity features are implemented for shape feature extraction.

Figure 1:

Sample Trademark Images from FlickrLogos-32 PLUS Dataset.

4.2 Optimizing Feature Sets

The PSO technique is implemented to obtain a minimized feature set that is then fed as input to the SOM. Image retrieval systems use large numbers of features to describe a dataset. Some of these features may be irrelevant or redundant and therefore not useful in the retrieval process. Feature selection/optimization is thus required to remove them from the input feature space. In the framework of an optimization search in a parameter space, each particle has a present coordinate in the search space. Every particle keeps a record of the coordinates in the problem space associated with the best solution it has achieved so far. This value is called Abest. Another best position is the best value obtained by any particle among all particles in the region. This value is known as Bbest. The jth particle is denoted as Y_j=(Y_j₁, Y_j₂, …, Y_jn), and the rate of change of the particle's position is called its velocity, denoted as V_j=(V_j₁, V_j₂, …, V_jn). The particle changes its position and velocity using the following equations:

(1) $V_j(m+1) = V_j(m) + d_1 q_1 \left(A_{best} - Y_j(m)\right) + d_2 q_2 \left(B_{best} - Y_j(m)\right),$

(2) $Y_j(m+1) = Y_j(m) + V_j(m),$

where V_j (m) is velocity of particle j at iteration m, Y_j (m) is the position of particle j at iteration m, V_j (m+1) is velocity of particle j at iteration m+1, Y_j (m+1) is the position of particle j at iteration m+1, q₁ and q₂ are the random numbers between (0,1), d₁ is cognitive acceleration coefficient, and d₂ is social acceleration coefficient.
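
As a minimal sketch, one update of a single particle under Eqs. (1) and (2) could look like the following. The function name `pso_step` and the choice to pass q₁ and q₂ in explicitly (rather than drawing them at random) are assumptions made so that the example is deterministic; how particle positions are mapped to a selected feature subset is not specified here and is left out.

```python
from typing import List, Tuple


def pso_step(
    y: List[float],
    v: List[float],
    a_best: List[float],
    b_best: List[float],
    d1: float,
    d2: float,
    q1: float,
    q2: float,
) -> Tuple[List[float], List[float]]:
    """Apply Eqs. (1) and (2) once to a single particle."""
    # Eq. (1): new velocity from the cognitive (Abest) and social (Bbest) pulls.
    new_v = [
        vi + d1 * q1 * (ai - yi) + d2 * q2 * (bi - yi)
        for vi, yi, ai, bi in zip(v, y, a_best, b_best)
    ]
    # Eq. (2) as printed: the position advances by the velocity V_j(m).
    # Many PSO formulations use the updated velocity V_j(m+1) here instead.
    new_y = [yi + vi for yi, vi in zip(y, v)]
    return new_y, new_v


new_y, new_v = pso_step([0.0], [1.0], [2.0], [4.0], d1=2.0, d2=2.0, q1=0.5, q2=0.5)
print(new_y, new_v)  # [1.0] [7.0]
```

In a full run, q₁ and q₂ would be redrawn from (0,1) at every iteration, and Abest and Bbest would be refreshed after each fitness evaluation.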

PSO is an evolutionary computational technique that exhibits swarm intelligence and is used here to optimize feature sets. As discussed, it is a promising optimization method but does not guarantee an optimal solution [2]; hence, the feature set is clustered into groups using SOM before searching. The clusters whose average feature values are closest to the features of the input query are ranked, and they define the new reduced search space for implementing RF.

4.3 SOM for Creating Clusters

The SOM defines a mapping from the input data set Dⁿ onto a regular two-dimensional array of nodes (map network). Every node j in the map is associated with a parametric reference vector n_j∈Dⁿ. In the proposed approach, the array of nodes is arranged on a rectangular lattice. To map an input, every input vector y∈Dⁿ is compared with the reference vectors n_j using the Euclidean distance, and the best match is found. The winning node d is calculated using the following equation:

(3) $\|y - n_d\| = \min_j \{\|y - n_j\|\}.$

The input y is mapped onto d, relative to the parameter values n_j. Nodes that are topographically near one another in the array learn from the same input. The update formula is given as

(4) $n_j(t+1) = n_j(t) + h_{d,j}(t)\,[y(t) - n_j(t)],$

where t is the discrete-time coordinate and h_d,j is the neighborhood function. The initial values of the n_j are random [12].
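
The winner search of Eq. (3) and the update of Eq. (4) can be sketched as a single training step. The Gaussian form of the neighborhood function, along with the `rate` and `radius` parameters, are assumptions for illustration; the text does not state which neighborhood function was used.

```python
import math
from typing import List, Tuple

Grid = List[List[List[float]]]  # grid[row][col] is a reference vector n_j


def som_step(grid: Grid, y: List[float], rate: float, radius: float) -> Tuple[int, int]:
    """One SOM update: pick the winner by Eq. (3), then apply Eq. (4)."""
    # Eq. (3): the winning node d minimizes the Euclidean distance to y.
    cells = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    win = min(cells, key=lambda rc: math.dist(y, grid[rc[0]][rc[1]]))
    for r, c in cells:
        # Assumed neighborhood function h_{d,j}: Gaussian in lattice distance.
        lattice_sq = (r - win[0]) ** 2 + (c - win[1]) ** 2
        h = rate * math.exp(-lattice_sq / (2.0 * radius ** 2))
        # Eq. (4): move n_j toward y in proportion to h_{d,j}.
        grid[r][c] = [n + h * (yi - n) for n, yi in zip(grid[r][c], y)]
    return win


grid = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
winner = som_step(grid, [0.9, 0.9], rate=0.5, radius=1.0)
print(winner)  # (0, 1)
```

Repeating this step over all database feature vectors, typically with a shrinking rate and radius, yields the topological map from which clusters are read.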

4.4 RF Implementation

RF implementation is based on the query improvement strategy (STL) and the mining concept (LTL). The query improvement strategy is implemented through the new query point (NQP), query development (QDE), and query rewriting (QRW) methods. The proposed framework is tested for 10 iterations of RF. The visual query points (of relevant/positive images) of every iteration are grouped into clusters using the k-means clustering algorithm. A cluster number is assigned to each cluster. The RF information is kept as log records in four different tables that are used for mining frequently occurring patterns for the next query sessions. Data structures needed for storing this log information are described as follows:

Unique record table – contains query image name, iteration number, query point, and relevant image name.
Query position table – contains query image name and query point.
Navigation operation table – contains query image name, iteration number, and cluster number. The entries of this table are implemented as follows:
1. Based on retrieval results after every iteration, query points are clustered using k-means clustering.
2. The cluster number is assigned to each cluster and is saved in this table along with the iteration number.
3. These clusters are traversed for constructing a pattern for every query. For example, given a query image, if the retrieved images are available in cluster 2 of iteration 1, cluster 1 of iteration 2, and cluster 2 of iteration 3, then the pattern is N21, N12, and N23. These sequential patterns constructed are used for the mining process; the frequently occurring patterns are mined using the Apriori mining algorithm.
Record partition table – contains query point and relevant image name.
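
To illustrate how entries of the navigation operation table become sequential patterns, the sketch below builds labels in the N⟨cluster⟩⟨iteration⟩ form used in the example above and counts how often each label recurs across query sessions. The function names are hypothetical, and the support count shown is only the first pass of an Apriori-style miner, not the full algorithm.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_pattern(log: List[Tuple[int, int]]) -> List[str]:
    """Turn (iteration, cluster) log entries into labels such as 'N21'."""
    # Label format follows the example in the text: N<cluster><iteration>.
    return [f"N{cluster}{iteration}" for iteration, cluster in sorted(log)]


def frequent_items(patterns: List[List[str]], min_support: int) -> Dict[str, int]:
    """Count in how many query sessions each label occurs (Apriori's first pass)."""
    counts = Counter(label for pattern in patterns for label in set(pattern))
    return {label: n for label, n in counts.items() if n >= min_support}


session_a = build_pattern([(1, 2), (2, 1), (3, 2)])
session_b = build_pattern([(1, 2), (2, 2), (3, 2)])
print(session_a)  # ['N21', 'N12', 'N23']
print(frequent_items([session_a, session_b], min_support=2))
```

With 10 or more iterations or clusters, the concatenated digits become ambiguous, so a real log would likely store the cluster and iteration numbers separately.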

The information related to each feedback is stored in the unique record table after every iteration of the feedback. The navigation operation table is used for mining the navigation patterns. The query position table and record partition table are used for searching the relevant images. The navigation pattern tree is constructed from the sequential patterns determined. Each branch of this tree represents a sequential pattern. The query of every sequential pattern is used as the seed of that tree, referred to as the query pit.

The proposed logo retrieval system performs the search process by taking the inputs as follows:

Group of positive and negative images taken from the previous feedback.
Navigation pattern trees with the query pit, referred to as “pn”, and the sequential patterns.

The search process is executed as per the following steps:

a) The NQP is determined by taking the average of features of positive images.
b) The closest query pits (root) are determined to get the matching sequential patterns.
c) These matching navigation pattern trees are used to determine the closest leaf nodes.
d) Then the top “r” relevant query points from the group of the closest leaf nodes are determined.
e) The top “t” relevant images are retrieved as output.

In the proposed methodology for NQP generation, step “a” is implemented. Steps “b–e” are followed for QDE. The new feature weights are determined using the features of positive images given by the user at every feedback for executing the QRW process. The details of the NQP generation, QRW, and QDE are described as follows.

4.4.1 NQP Generation

Suppose a set of images is retrieved by the query point old_qp in the preceding feedback. An NQP new_qp is determined by taking the average of the features of the positive images P given by the user as feedback. Suppose the positive images are given as P={p₁, p₂, …, p_k} and the m dimensions of the jth feature extracted from the yth positive image are R_j={r₁^y, r₂^y, …, r_m^y}. Then, the NQP new_qp indicated by P can be given as [23]

(5) $new_{qp} = \{\bar{R}_1, \bar{R}_2, \ldots, \bar{R}_c\}$, where $1 \le j \le c$,

$\bar{R}_j = \{\bar{r}_1, \bar{r}_2, \ldots, \bar{r}_m\}$ and $\bar{r}_t = \dfrac{\sum_{1 \le y \le s,\; r_t^y \in R_j} r_t^y}{s}$.

new_qp and the images marked as positive are saved into the unique record table.
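
The averaging behind the new query point can be sketched in a few lines. For simplicity, this example assumes that each positive image is represented by one flat feature vector (all feature types concatenated); the function name is hypothetical.

```python
from typing import List


def new_query_point(positives: List[List[float]]) -> List[float]:
    """Average each feature dimension over the positive images (Eq. (5))."""
    if not positives:
        raise ValueError("at least one positive image is required")
    s = len(positives)  # number of positive images
    # zip(*positives) groups the t-th dimension of every positive image.
    return [sum(dim) / s for dim in zip(*positives)]


positives = [[1.0, 4.0], [3.0, 8.0]]
print(new_query_point(positives))  # [2.0, 6.0]
```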

4.4.2 QRW

Suppose a set of positive images is given as P={p₁, p₂, …, p_k} determined by the old query point old_qp in the earlier feedback. The new weight of the jth feature R_j is given as [23]

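(6) $T_j = \dfrac{\sum_{x=1}^{a} \frac{\alpha_x}{\alpha_j}}{\sum_{y=1}^{a} \sum_{x=1}^{a} \frac{\alpha_x}{\alpha_y}},$

where

(7) $\alpha_j = \dfrac{\sum_{z=1}^{m} \sum_{i=1}^{b} \left(r_i^z - r_i^{old_{qp}}\right)^2}{b},$

and $1 \le j \le a$.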
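
A small sketch of the weight computation follows. It reads Eq. (6) as the ratio of the sum of α_x/α_j over x to the same ratio summed over all pairs (x, y), which is an interpretation of the formula following Ref. [23]. The spread values α are taken as given inputs, and the function name is hypothetical.

```python
from typing import List


def feature_weights(alphas: List[float]) -> List[float]:
    """Compute T_j from Eq. (6) given one spread value alpha_j per feature."""
    if any(a <= 0 for a in alphas):
        raise ValueError("every alpha must be positive")
    # Numerator of Eq. (6) for feature j: sum over x of alpha_x / alpha_j.
    numer = [sum(ax / aj for ax in alphas) for aj in alphas]
    # Denominator: the same ratio summed over every pair (x, y).
    denom = sum(numer)
    return [n / denom for n in numer]


weights = feature_weights([1.0, 2.0, 4.0])
print(weights)
```

Under this reading, the weights sum to 1 and a feature whose values vary less among the positive images (smaller α) receives a larger weight.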

4.4.3 QDE

The weighted k nearest neighbor (KNN) search is done by using the QDE method. First, the closest query pit to every P is determined, termed as the positive query pit, and the closest query pit to each N is termed as the negative query pit. There may be some query pits present in both the positive query pit group and the negative query pit group. To handle this situation, a token pn.chk is assigned to every pit. If the pit has a greater number of negative examples than positive examples, then pn.chk=0; otherwise, pn.chk=1. After this, relevant query pits are determined. The navigation pattern tree is traversed to find a set of matching leaf nodes. The new feature weights as calculated in Eq. (6) are used in the search process to find the required images. The search process is classified into two steps. In the first step, the relevant visual query points are generated and in the second step the relevant images are determined. The query position table is used to determine the related query points. Top “r” similar query points are determined in step 1. The record partition table is searched for this top “r” similar query points to get the relevant images. The top “t” images, which are closer to new_qp, are retrieved as a result using the KNN search method.
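
The pn.chk rule described above can be sketched as follows. Query pits are represented here as named feature vectors and closeness is measured with the Euclidean distance; both choices, and the function name, are assumptions for illustration.

```python
import math
from typing import Dict, List


def pit_tokens(
    pits: Dict[str, List[float]],
    positives: List[List[float]],
    negatives: List[List[float]],
) -> Dict[str, int]:
    """Return pn.chk per query pit: 0 if negatives outnumber positives, else 1."""

    def closest(point: List[float]) -> str:
        return min(pits, key=lambda name: math.dist(point, pits[name]))

    pos = {name: 0 for name in pits}
    neg = {name: 0 for name in pits}
    for p in positives:
        pos[closest(p)] += 1
    for n in negatives:
        neg[closest(n)] += 1
    return {name: 0 if neg[name] > pos[name] else 1 for name in pits}


pits = {"pit_a": [0.0, 0.0], "pit_b": [10.0, 10.0]}
tokens = pit_tokens(
    pits,
    positives=[[1.0, 1.0]],
    negatives=[[9.0, 9.0], [8.0, 9.0], [0.5, 0.0]],
)
print(tokens)  # {'pit_a': 1, 'pit_b': 0}
```

Here pit_a attracts one positive and one negative example and keeps pn.chk=1, while pit_b attracts only negatives and is marked 0.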

Algorithm: The proposed approach is implemented as per the following steps:

Step 1: Preprocessing of the database images
a) Extract the color feature by i) color histogram, ii) color correlogram, iii) color moments
b) Extract the texture feature by i) Gabor wavelet, ii) Haar wavelet
c) Extract the shape feature by i) Fourier descriptor, ii) circularity feature

Step 2: Apply PSO
Use the PSO algorithm to determine the optimized (best) feature set

Step 3: Implement SOM
Train the database with the optimized feature set using SOM

Step 4: Select query image and apply transformations

Step 5: Extract features of the transformed query image and give them as input to the system

Step 6: Retrieve images
Identify images from the database relevant to the query image

Step 7: Take feedback
The user marks each image as relevant or not relevant
If relevant, the image is categorized as positive (P)
Else, the image is categorized as negative (N)

Step 8: Determine modified query point (new_qp)

Step 9: Retrieve images from the database
Retrieve images as relevant output using this NQP.
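
The final retrieval step can be illustrated with a weighted nearest-neighbor ranking around the new query point. This is a sketch under stated assumptions: one weight per feature dimension, a weighted Euclidean distance, and an in-memory dictionary standing in for the database; none of these details are confirmed by the text.

```python
import math
from typing import Dict, List


def retrieve_top(
    database: Dict[str, List[float]],
    query_point: List[float],
    weights: List[float],
    t: int,
) -> List[str]:
    """Return the names of the t images closest to the query point."""

    def weighted_distance(features: List[float]) -> float:
        return math.sqrt(
            sum(w * (f - q) ** 2 for w, f, q in zip(weights, features, query_point))
        )

    return sorted(database, key=lambda name: weighted_distance(database[name]))[:t]


database = {"logo_1": [0.0, 0.0], "logo_2": [1.0, 5.0], "logo_3": [4.0, 0.5]}
print(retrieve_top(database, [0.0, 0.0], weights=[1.0, 0.0], t=2))  # ['logo_1', 'logo_2']
```

Changing the weights changes the ranking: with equal weights on both dimensions, logo_3 would rank ahead of logo_2 for the same query point.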

5 Experiments and Results

5.1 Experimental Setup

The experimentation is done using the FlickrLogos-32 PLUS dataset. The PSO algorithm reduced the feature space from a total of 5034 features to 2519 selected (best) features. The feature set optimized by PSO is given as input to the SOM, which arranges these features into classes/clusters. When a query is given to the system, the top 3 relevant clusters/classes are identified using the Euclidean distance between the query features and the average of the features of the images in each cluster. The search for relevant images is done in only these three clusters.
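
The cluster selection described above can be sketched as ranking clusters by the Euclidean distance between the query features and each cluster's mean feature vector. The data layout (a dictionary from cluster id to member feature vectors) and the function name are assumptions for illustration.

```python
import math
from typing import Dict, List


def top_clusters(
    clusters: Dict[int, List[List[float]]], query: List[float], k: int = 3
) -> List[int]:
    """Rank clusters by distance from the query to each cluster's mean vector."""

    def mean(vectors: List[List[float]]) -> List[float]:
        return [sum(dim) / len(vectors) for dim in zip(*vectors)]

    ranked = sorted(clusters, key=lambda cid: math.dist(query, mean(clusters[cid])))
    return ranked[:k]


clusters = {
    0: [[0.0, 0.0], [0.0, 2.0]],
    1: [[5.0, 5.0]],
    2: [[1.0, 1.0], [2.0, 1.0]],
    3: [[9.0, 9.0]],
}
print(top_clusters(clusters, [1.0, 1.0]))  # [2, 0, 1]
```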

The system first returns top 20 relevant images from the database. Feedback is taken from the user upon these 20 retrieved images. Finally, the top 9 relevant images based on color, texture, and shape similarity are retrieved as output. The performance of the system is evaluated by using standard evaluation parameters like precision, recall, and accuracy.

To introduce variations between the training and test images, transformations are applied to each query image, viz. rotation, scaling, and translation, where translation shifts the image by a number of pixels in the X and Y directions (referred to as TranslateX and TranslateY).
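
As a rough illustration of what these transformations do to a single pixel coordinate, the sketch below applies a translation, a rotation, and a scaling in that order. The order of operations, rotation about the origin, and the function name are assumptions; the text does not specify how the image transformations were implemented.

```python
import math
from typing import Tuple


def transform_point(
    x: float, y: float, tx: float, ty: float, angle_deg: float, sx: float, sy: float
) -> Tuple[float, float]:
    """Translate, then rotate about the origin, then scale one pixel coordinate."""
    x, y = x + tx, y + ty  # TranslateX / TranslateY shift
    theta = math.radians(angle_deg)
    xr = x * math.cos(theta) - y * math.sin(theta)
    yr = x * math.sin(theta) + y * math.cos(theta)
    return xr * sx, yr * sy


print(transform_point(1.0, 0.0, tx=5.0, ty=10.0, angle_deg=0.0, sx=1.0, sy=1.0))  # (6.0, 10.0)
```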

5.2 Time Complexity Analysis

The time complexity of the PSO algorithm is O (2ns+n²+2ns), where n is the number of particles and s is the dimension of search space. The time complexity of the SOM algorithm is O (p²), where p is the size of sample given as input. The costs of time and memory of the RF algorithm are linear with the total feature dimension [24]. The time complexity of the RF algorithm is given by

$\sum_{j=1}^{n} O(\text{total number of images} \times m_j) = O(\text{total number of images}) \times M,$

where $M = \sum_{j=1}^{n} m_j$ and m_j is the feature dimension of feature j.

The total time complexity of the proposed algorithm is given by T=T₁+T₂+T₃, where T₁, T₂, and T₃ are the time complexities of the PSO, SOM, and RF algorithms, respectively. T₁ and T₂ are one-time costs incurred at the preprocessing stage, whereas T₃ is incurred at every query session. The execution time required by the system is 1.6 min without optimization and learning and 0.5 min after implementation of the optimization and learning techniques, as depicted in Table 7. The optimization technique narrows the search space by 49.96%, which is followed by clustering using SOM, thereby reducing the overall execution time of the algorithm by 68.75%.
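
Both percentages can be checked from figures reported in this section: the feature count drops from 5034 to 2519, and the execution time drops from 1.6 min to 0.5 min. The helper name below is hypothetical.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a quantity shrinks going from before to after."""
    return (before - after) / before * 100.0


print(round(percent_reduction(5034, 2519), 2))  # 49.96
print(round(percent_reduction(1.6, 0.5), 2))    # 68.75
```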

5.3 Retrieval Results

For the input query image transformed with Translate X=5, Translate Y=10, rotation angle=30°, and scaling to the size 128×256, the final retrieved images based on color, texture, and shape similarity are depicted in Tables 2–4, respectively.

Table 2:

Final Output Retrieved Images Based on Color Similarity.

Table 3:

Final Output Retrieved Images Based on Texture Similarity.

Table 4:

Final Output Retrieved Images Based on Shape Similarity.

The evaluation parameters precision, recall, and accuracy can be defined as follows:

Precision=TP/(TP+FP),
Recall=TP/(TP+FN),
Accuracy=(TP+TN)/(TP+TN+FP+FN),

where TP=true positive: case was positive and predicted positive, TN=true negative: case was negative and predicted negative, FP=false positive: case was negative but predicted positive, and FN=false negative: case was positive but predicted negative.

For example, suppose that in one iteration of RF, for 20 retrieved output images, the counts are found to be TP=8, TN=10, FP=1, and FN=1.

Then, accuracy is determined from these counts as follows:

Accuracy=(TP+TN)/(TP+TN+FP+FN)

=(8+10)/(8+10+1+1)

=18/20=0.9=90%.
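
The three evaluation parameters defined above can be computed directly from the four counts. The sketch below uses the example counts from the text; the function name is hypothetical.

```python
from typing import Dict


def retrieval_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall, and accuracy from confusion counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


m = retrieval_metrics(tp=8, tn=10, fp=1, fn=1)
print(m["accuracy"])  # 0.9
```

For the same counts, precision and recall both equal 8/9, or about 0.889.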

The results in terms of precision, recall, and accuracy for the sample query image, for different sets of transformation factors, are shown in Table 5. The system also calculates the similarity (%) in terms of color, texture, and shape of the top 9 retrieved images, as depicted in Table 6. Table 7 provides the comparison of the proposed approach with Su et al. [23]. The comparison for Iandola et al. [10] and Bao et al. [4] with the proposed approach is represented in Table 8.

Table 5:

Results in Terms of Precision, Recall, and Accuracy with the Transformed Query Image.

Query image	Transformation factors (translation, rotation, scaling)	Precision	Recall	Accuracy (%)
	Translate X: 5 Translate Y: 10 Rotate: 30° Resize: 128×256	0.962	0.916	92.81
	Translate X: 10 Translate Y: 10 Rotate: 60° Resize: 256×128	0.973	0.901	93.47
	Translate X: 10 Translate Y: 5 Rotate: 90° Resize: 128×128	0.969	0.931	92.98
	Translate X: 10 Translate Y: 10 Rotate: 120° Resize: 128×256	0.969	0.896	93.22
Average		0.968	0.911	93.12
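
The Average row of Table 5 can be re-derived from the four rows above it. The short Python sketch below simply re-enters the tabulated values; the variable names are illustrative and not part of the original work.

```python
from statistics import mean

# (precision, recall, accuracy in %) for the four transformation sets in Table 5
rows = [
    (0.962, 0.916, 92.81),
    (0.973, 0.901, 93.47),
    (0.969, 0.931, 92.98),
    (0.969, 0.896, 93.22),
]

# Average each column separately.
avg_precision, avg_recall, avg_accuracy = (mean(col) for col in zip(*rows))
print(f"{avg_precision:.3f} {avg_recall:.3f} {avg_accuracy:.2f}")
```

To the precision shown in the table, the column means agree with the reported averages.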

Table 6:

Color, Texture, and Shape Similarity with Respect to the Sample Query Image.

S. no	Color similarity (%)	Texture similarity (%)	Shape similarity (%)
1	100	100	100
2	85.284	87.033	95.654
3	81.605	83.126	94.12
4	80.769	85.012	93.561
5	78.721	83.117	89.274
6	72.214	86.813	85.891
7	72.661	86.929	80.721
8	69.939	84.862	78.632
9	65.127	82.321	75.519

Table 7:

Comparison of the Proposed Approach with Su et al. [23].

S no.	Method	Dataset used	Precision	Recall	Execution time
1	Su et al. [23]	Seven classes of different categories of 200 images each (1400 images)	0.910	–	1.184667 s (for each class of 200 images)
2	Baseline framework using RF (with transformations upon query image)	FlickrLogos-32 PLUS dataset	0.938	0.897	1.6 min
3	Proposed approach (framework using PSO+SOM+RF) (with transformations upon query image)	FlickrLogos-32 PLUS dataset	0.968	0.911	0.5 min

Table 8:

Comparison of the Proposed Approach with Iandola et al. [10] and Bao et al. [4].

S no.	Method	Dataset used	Mean average precision	Accuracy
1	Iandola et al. [10] (a) FRCN+AlexNet (b) FRCN+VGG16	FlickrLogos-32 dataset	(a) 73.5% (b) 74.4%	89.6% (highest with GoogLeNet-GP architecture)
2	Bao et al. [4]	FlickrLogos-32 dataset	84.2%	–
3	Proposed approach (framework using PSO+SOM+RF with machine learning)	FlickrLogos-32 PLUS dataset	96.8%	93.12%

The comparison in Table 7 illustrates that the performance of the system improved appreciably once the PSO and SOM strategies were integrated, at the preprocessing stage, into the baseline framework based on RF. These techniques incur a one-time cost but reduce the search space, so the execution time dropped from 1.6 min to 0.5 min, a reduction of 68.75% relative to the baseline model. Precision and recall also improved, from 0.938 to 0.968 and from 0.897 to 0.911, respectively.
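
The 68.75% figure is the relative reduction between the two execution times listed in Table 7. A minimal Python sketch of that arithmetic follows; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


# Execution times from Table 7, in minutes.
baseline_rf_min = 1.6   # baseline framework using RF
proposed_min = 0.5      # PSO+SOM+RF framework

print(f"{relative_reduction(baseline_rf_min, proposed_min):.2f}%")
# prints: 68.75%
```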

The proposed approach is also compared with recent deep learning-based works on trademark recognition and retrieval by Iandola et al. [10] and Bao et al. [4]. As depicted in Table 8, the proposed method attains 96.8% on the precision measure and 93.12% accuracy, higher than the values reported for the two existing methods (mean average precision of 73.5%–84.2% and a highest accuracy of 89.6%).

6 Conclusion

The proposed trademark retrieval system proved robust to geometric transformations when tested with various combinations of translation, rotation, and scaling applied to the query image. Its results are significant in terms of retrieval accuracy and execution time because of the integration of the PSO and SOM strategies with the RF technique at the preprocessing stage. PSO helped to narrow the search space, while SOM clustered the optimized database feature set into similar groups; as a result, the total execution time required by the system after the implementation of RF was reduced significantly. As future work, a variant of PSO that can guarantee the optimization could be used in place of standard PSO.

Bibliography

[1] A. Alaei and M. Delalandre, A complete logo detection/recognition system for document images, in: Proceedings of Eleventh IAPR International Workshop on Document Analysis Systems (DAS), pp. 324–328, 2014. doi:10.1109/DAS.2014.79

[2] Q. Bai, Analysis of particle swarm optimization algorithm, Comput. Inform. Sci. 3 (2010), 180–184. doi:10.5539/cis.v3n1p180

[3] M. Bagheri, Q. Gao and S. Escalera, Logo recognition based on the Dempster-Shafer fusion of multiple classifiers, in: Advances in Artificial Intelligence, Lecture Notes in Computer Science 7884, pp. 1–12, Springer, 2013. doi:10.1007/978-3-642-38457-8_1

[4] Y. Bao, H. Li, X. Fan, R. Liu and Q. Jia, Region-based CNN for logo detection, in: ICIMCS, August 19–21, 2016, Xian, China, ACM, ISBN 978-1-4503-4850-8/16/08. doi:10.1145/3007669.3007728

[5] S. Bianco, M. Buzzelli, D. Mazzini and R. Schettini, Logo recognition using CNN features, in: Image Analysis and Processing ICIAP 2015, pp. 438–448, Springer, 2015. doi:10.1007/978-3-319-23234-8_41

[6] C. Eggert, A. Winschel and R. Lienhart, On the benefit of synthetic data for company logo detection, in: Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, pp. 1283–1286, ACM, 2015. doi:10.1145/2733373.2806407

[7] S. C. Hoi, X. Wu, H. Liu, Y. Wu, H. Wang, H. Xue and Q. Wu, Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks, arXiv preprint arXiv:1511.02462, 2015.

[8] X. Hou and H. Shi, Logo recognition using textual and visual search, Int. J. Multimed. Appl. 4 (2012), 51–60. doi:10.5121/ijma.2012.4504

[9] M. Hussain and J. P. Eakins, Component-based visual clustering using the self-organizing map, Neural Netw. 20 (2007), 260–273. doi:10.1016/j.neunet.2006.10.004

[10] F. N. Iandola, A. Shen, P. Gao and K. Keutzer, DeepLogo: Hitting logo recognition with the deep neural network hammer, arXiv preprint arXiv:1510.02131, 2015.

[11] K. Kameyama, N. Oka and K. Toraichi, Optimal parameter selection in image similarity evaluation algorithms using particle swarm optimization evolutionary computation, in: CEC 16-21, July 2006, ISBN: 0-7803-9487-9.

[12] T. Kohonen, Self-organizing Maps, 2nd ed., Springer-Verlag, New York, 1997. doi:10.1007/978-3-642-97966-8

[13] J. Laaksonen, M. Koskela, S. Laakso and E. Oja, PicSOM-content-based image retrieval with self-organizing maps, Pattern Recogn. Lett. 21 (2000), 1199–1207. doi:10.1016/S0167-8655(00)00082-9

[14] J. Laaksonen, M. Koskela, S. Laakso and E. Oja, Self-organising maps as a relevance feedback technique in content-based image retrieval, Pattern Anal. Appl. 4 (2001), 140–152. doi:10.1007/PL00014575

[15] F. Long, H. Zhang and D. D. Feng, Multimedia information retrieval and management, chapter 1, in: Fundamentals of Content-Based Image Retrieval, pp. 1–26, Springer, Berlin, 2003. doi:10.1007/978-3-662-05300-3_1

[16] L. Ma, L. Lin and M. Gen, A PSO-SVM approach for image retrieval and clustering, in: 41st International Conference on Computers & Industrial Engineering, Los Angeles, California, USA, October 23–25, 2011, ISBN: 978-1-62748-683-5.

[17] M. Okayama, N. Oka and K. Kameyama, Relevance optimization in image database using feature space preference mapping and particle swarm optimization, in: ICONIP, Part II, LNCS, 49850, pp. 608–617, Springer-Verlag, Berlin, 2008. doi:10.1007/978-3-540-69162-4_63

[18] L. Pinjarkar, M. Sharma and K. Mehta, Comparative evaluation of image retrieval algorithms using relevance feedback and its applications, Int. J. Comput. Appl. 48 (2012), 12–16. doi:10.5120/7447-0448

[19] J. Revaud, M. Douze and C. Schmid, Correlation-based burstiness for logo retrieval, in: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968, ACM, 2012. doi:10.1145/2393347.2396358

[20] S. Romberg and R. Lienhart, Bundle min-hashing for logo recognition, in: Proceedings of the 3rd ACM Conference on International Conference on Multimedia Retrieval, pp. 113–120, ACM, 2013. doi:10.1145/2461466.2461486

[21] S. Romberg, L. G. Pueyo, R. Lienhart and R. Van Zwol, Scalable logo recognition in real-world images, in: Proceedings of the 1st ACM International Conference on Multimedia Retrieval, pp. 25–32, ACM, 2011. doi:10.1145/1991996.1992021

[22] M. Rusiñol, D. Aldavert, D. Karatzas, R. Toledo and J. Lladós, in: Proceeding ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval, pp. 1–12, Dublin, Ireland, 2011.

[23] J. H. Su, W. J. Huang, P. S. Yu and V. S. Tseng, Efficient relevance feedback for content-based image retrieval by mining user navigation patterns, IEEE Trans. Knowl. Data Eng. 23 (2011), 360–372. doi:10.1109/TKDE.2010.124

[24] Z. Su, H. Zhang, S. Li and S. Ma, Relevance feedback in content-based image retrieval: Bayesian framework, feature subspaces, and progressive learning, IEEE Trans. Image Process. 12 (2003), 924–937. doi:10.1109/TIP.2003.815254

[25] P. N. Suganthan, Shape indexing using self-organizing maps, IEEE Trans. Neural Netw. 13 (2002), 835–840. doi:10.1109/TNN.2002.1021884

[26] J. Vesanto, J. Himberg, E. Alhoniemi and J. Parhankangas, Self-organizing map in Matlab: the SOM toolbox, in: Proceedings of the MATLAB DSP Conference, November 16–17, Espoo, Finland, pp. 35–40, 1999.

[27] Z. Wang and K. Hong, A novel approach for trademark image retrieval by combining global features and local features, J. Comput. Inform. Syst. 8 (2012), 1633–1640.

[28] C. H. Wei, Y. Li, W. Y. Chau and C. T. Li, Trademark image retrieval using synthetic features for describing global shape and interior structure, Pattern Recogn. 42 (2009), 386–394. doi:10.1016/j.patcog.2008.08.019

[29] B. Xue, M. Zhang and W. N. Browne, Particle swarm optimization for feature selection in classification: a multi-objective approach, IEEE Trans. Cybern. 43 (2013), 1656–1671. doi:10.1109/TSMCB.2012.2227469

Received: 2017-02-01

Published Online: 2017-07-20

Published in Print: 2018-01-26

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
