1 Introduction

The volume of data stored in, downloaded from and shared via the Internet is increasing rapidly. As reported in [17] the storage capacity of the Internet “is doubling in size every two years”. This is chiefly due to the growing number of devices connected to the Internet, including mobile phones, computers and tablets as well as other types of smart devices, “from minuscule chips to mammoth machines that use wireless technology to talk to each other (and to us)” [16]. This hard to imagine source of big data, mainly represented by images and video sequences (the most rapidly-growing types of media), provides incredible but real opportunities for building heterogeneous intelligence systems [10]. The first issues which need to be addressed are content-oriented data analysis, enrichment, retrieval and recommendation. A new content discovery platform, known as the IMCOP system, which uses these and other technologies, is presented in this paper.

The IMCOP system is the result of an international collaboration within the framework of the second joint Polish-Israeli R&D project titled "Intelligent Multimedia System for Web and IPTV Archiving. Digital Analysis and Documentation of Multimedia Content". According to initial assumptions, the capabilities of the IMCOP system should include multimedia data aggregation, analysis and processing. The implementation of these processes must be extremely flexible to fulfill the requirements of different IMCOP applications. According to customer needs, the IMCOP system should be able to perform different kinds of processing and content analysis of multimedia data aggregated using various Internet data sources. Content analysis should enrich the data – extend the metadata list – to confirm the relationship between the data and the subject matter and find its content-related connections with other data. Thanks to this flexibility, the IMCOP system should be able to address customer demands regarding relevant content and ways of presenting it to their users.

The system presented in this paper meets all these initial assumptions and expectations of the IMCOP project. For example, various types of aggregated data comprising text, still images, film footage and video sequences are considered. Mechanisms for extensive analysis of all types of aggregated data, including detection and extraction of various features and different classification approaches, were also used. A range of descriptive metadata is extracted in this way to enrich the aggregated data and give the foundation for finding connections. In addition, a flexible and efficient representation of data, known as Complex Multimedia Objects (CMO), maintaining the metadata and the content-related connections, is proposed.

In addition to these main objectives, the IMCOP project has the following secondary goals:

  • to ensure that the IMCOP system is platform-independent and capable of incorporating external services to improve its efficiency and increase its intelligent facilities, for example by removing duplicate images from the database [9],

  • to guarantee scalability despite vast numbers of multimedia objects processed,

  • to ensure that the system complies with copyright law and that the processed objects and their content are protected against copying, reproduction, modification and other forms of authors' rights violation.

It was also agreed between the project partners that the IMCOP system should comply with the Data Enrichment and Engagement Platform (DEEP), which was largely developed by the Israeli partner as part of their project activities. According to [21], the DEEP platform is “a revolutionary new solution (which) resolves the complexities of content discovery, recommendation, usability and engagement all at once, and consumers already know how to use it”. All these values can be verified through the DEEP Magazines application which is its final product [11].

The DEEP Magazines app shows how an advanced and professionally-made end-user application of the IMCOP system could work and look. However, it should be stressed that the IMCOP platform’s capabilities are significantly broader than those of the DEEP platform. The DEEP platform, in line with the DEEP Magazines app requirements, is designed to collect, analyze and select images and short descriptive information (e.g. news) related to “the hottest stories about celebs, actors, movies and TV shows” [23]. In contrast, the IMCOP system can be categorized as a comprehensive and versatile content discovery and delivery platform which is capable of addressing topics of any kind. The IMCOP platform also has other functionalities. It can be applied, for instance, as a multimedia indexing (labeling) application, such as Imagga (http://imagga.com/), or as a reverse image search tool, such as TinEye (https://www.tineye.com/).

The remaining part of this paper is organized as follows. The next section presents background materials and a literature review within the framework of content discovery platforms. The overall architecture with an insight into various categories of the IMCOP web services and the concept of Complex Multimedia Objects are introduced in the sub-sections of Section 3. Section 4 reports on the IMCOP system performance. Final conclusions and potential future improvements are presented in Section 5.

2 Background and literature review

According to the Wikipedia definition [22], a content discovery platform “is an implemented software recommendation platform which uses recommender system tools”. As defined, in turn, in [24], a content-based recommender system is a system “that recommend an item to a user based upon a description of the item and a profile of the user’s interests”. In general, there are three main approaches to recommendation system design: collaborative filtering [29], content-based filtering [4] and hybrid, where the two former approaches are combined [27]. In fact, the definitions cover a range of solutions with different goals, domains and approaches using computer science techniques such as data mining, information retrieval and filtering, machine learning, artificial intelligence and so on.
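The content-based approach defined above can be illustrated with a minimal sketch. It assumes that item descriptions and the user profile are plain keyword sets and that relevance is measured with Jaccard overlap; the function names and sample data are hypothetical and are not taken from any of the cited systems.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two keyword sets, in the range [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_by_content(profile: set, items: dict, top_n: int = 2) -> list:
    """Rank items by how well their descriptions match the user's interest profile."""
    scored = [(jaccard(profile, keywords), name) for name, keywords in items.items()]
    # Highest score first; ties are broken alphabetically for a deterministic order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]


# Hypothetical item descriptions and user profile (illustrative data only).
items = {
    "article_a": {"image", "retrieval", "features"},
    "article_b": {"speech", "recognition"},
    "article_c": {"image", "watermark"},
}
profile = {"image", "retrieval"}
print(recommend_by_content(profile, items))
```

A hybrid recommender would combine such content scores with scores derived from other users' behavior.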

Google Scholar, Pubnet, CiteSeer and Web of Science are the leading scientific and academic literature search and recommender engines. The platforms work in two key ways. In Google Scholar, papers related to the user’s research interests (notified as Scholar Updates) are found through a statistical analysis according to “what your work is about, the citation graph between articles, the fact that interests can change over time, and the authors you work with and cite” [12]. The other way of finding relevant articles in Google Scholar is setting user alerts. Despite noticeable differences, both methods are content-based filtering approaches which use a record of the researcher’s authored papers and citations. This approach is effective for researchers who have published many papers; however, it provides poor results for others, such as graduate students.

In other types of recommender engines related to academic content such drawbacks do not occur. One such tool is PubChase. “PubChase suggests articles from PubMed on the basis of a user’s publishing record, but it also learns from the articles that the user has read and stored in his or her online library […] it adds another machine-learning technique: comparing this library with other people’s collections, with the logic that people with common research interests might benefit from each others’ preferences” [18].

Content discovery and personal recommendation have also become crucial to the global pay-TV industry due to the proliferation of choice offered by the television of today, including video-on-demand, personalized video recording, streaming services and web-based content. In addition, “the global pay-TV market is projected to grow from more than 900 million subscribers in 2014 to 1.21 billion by 2022” [19]. Such a major and interactive marketplace is an important challenge for operators, who are forced to look for new ways of satisfying customers and holding their attention for longer.

Platforms where “a blend of sophisticated content discovery algorithms, personalized recommendations are offered to drive additional purchases,” as in COMPASS from Viaccess-Orca, are appropriate solutions within this context. As reported in [20], collaborative filtering, external rating and operator’s promotions are the main algorithms driving the COMPASS engines. Additionally, an algorithm known as related content, where “similar item recommendations based on content metadata (actors, directors, years, countries, etc.) and keywords” are proposed, is also essential. The Kannuu content discovery platform (http://www.kannuu.com/) is another example of a successful and innovative integration of various recommendation layers. In the Kannuu system, similar title recommendations are achieved using a proprietary metadata analysis which aims to find connections between titles.

As in the COMPASS engine, the metadata analysis performed in the Kannuu system comprises (for example) genre, cast, director, date and expanded keyword lists. A similar approach, based on collaborative filtering where profiling and behavioral data are used, is also applied as the social recommendations layer. This is the key difference between the Kannuu and COMPASS systems, as the social recommendations layer is integrated with social networking sites. This means Kannuu users can provide and receive recommendations from people in their social media circles. In turn, algorithms based on ratings (in particular personal ones) and streaming history are the only key algorithms in Netflix – one of the most successful providers of streaming media and video on demand in the world. Since personal ratings and past viewing habits are insufficient when used for a single user, Netflix combines ratings of all its users with similar tastes.
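The idea of combining the ratings of users with similar tastes can be sketched as user-based collaborative filtering. The example below is only an illustration under stated assumptions: cosine similarity over sparse rating vectors and a similarity-weighted average are common textbook choices, not the actual algorithms of Netflix, Kannuu or COMPASS, and all names and ratings are hypothetical.

```python
import math
from typing import Optional


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse rating vectors (title -> rating)."""
    common = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_rating(user: str, title: str, ratings: dict) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for `title`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or title not in other_ratings:
            continue
        weight = cosine(ratings[user], other_ratings)
        num += weight * other_ratings[title]
        den += weight
    return num / den if den > 0 else None


# Hypothetical ratings on a 1-5 scale (illustrative data only).
ratings = {
    "ann": {"t1": 5, "t2": 1},
    "bob": {"t1": 5, "t2": 1, "t3": 4},
    "eve": {"t1": 1, "t2": 5, "t3": 1},
}
print(predict_rating("ann", "t3", ratings))
```

Because the taste of "ann" is closer to that of "bob", the prediction for "t3" is pulled towards the rating given by "bob" rather than towards the plain average.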

Personal ratings are a type of personal data collected by TV operators. However, personal data transferred to providers must be limited and reasonably selected to prevent eavesdropping such as was found in Samsung [15]. Therefore, platforms where algorithms from various levels are combined, as in the COMPASS or Kannuu recommender engines, are more efficient (their recommendations are more relevant) and more secure.

Another market where content discovery plays an important role is online publishing, mainly of blogs, podcasts, websites, etc. Outbrain, Taboola and Google AdSense are the leading platforms here. They recommend content via online media, from the largest and most widely respected such as “People” (in the case of Outbrain) and “USA Today” (in the case of Taboola), to those with limited audiences such as “BuildEazy” (in the case of AdSense). Content discovery algorithms applied in these cases are complex and sophisticated. For example, according to Outbrain, its recommender engine is based on more than 50 algorithms which are run in parallel to determine a set of candidate recommendations. These algorithms are categorized into four main groups: contextual, behavioral, personal and popular [14]. Contextual algorithms aim to find relationships (identify connections) between recommended content and that found on a target website. The Solr search engine [13] is used at this stage. The aim of behavioral algorithms is to learn the statistical behaviors of users. This can be achieved by simply marking the most visited or most rated documents on a site or by applying collaborative filtering methods. The latter approach lets users access other content “liked” by people. In turn, personal algorithms learn user properties and history. Personalization is applied in Outbrain using cookies. Recommendations of each type, given independently by behavioral, contextual and personal algorithms, are then processed (using machine learning techniques) to select the most relevant ones.

3 IMCOP platform architecture

As stated in Section 1, the IMCOP system is a capable content discovery platform. As such, it can use user metadata to discover customer-relevant content which can be delivered to websites, mobile devices, set-top boxes, etc. In this domain, IMCOP aspires in part to operate similarly to Outbrain and Taboola – the largest content discovery startups in Israel. IMCOP also uses specialized and sophisticated algorithms to select the most relevant content. Although IMCOP does not currently apply algorithms which could be categorized as behavioral and personal, it uses an assortment of intelligent automated multimedia content analysis algorithms from a range of computer vision techniques. As such, IMCOP goes beyond typical content discovery platforms and also serves as a data enrichment platform.

The IMCOP platform is a distributed system based on Service-Oriented Architecture (SOA). IMCOP services are RESTful web services, implemented according to the Representational State Transfer (REST) architectural style. As such, IMCOP services are self-contained applications with their own REST-based interface. In addition, they are fully independent from operating systems and platforms on which they are implemented and run. This means they are also scalable, fast and modifiable. All IMCOP services have been developed in Java according to the original MESCore library, which provides API, programming specifications and code examples within SDK for developers. As the MESCore library is open, IMCOP services can be freely added, edited or improved by third-party developers. This also implies that IMCOP’s capabilities can be accessed and extended by external companies and institutions interested in a partnership with the IMCOP team. The main categories of IMCOP’s services are as follows:

  • Metadata Enhancement Services (MES) – specialized services, mainly in multimedia data analysis and enrichment (there are also other types of MES driven services such as management and connection),

  • Data Aggregation Services (DAS) – mainly used for web crawlers which extract and collect data from the web as well as exchanging data with the IMCOP database, known as the Data Repository (DR).

Regardless of the category, the IMCOP services can be run in heterogeneous environments, e.g. MS Windows and Linux (32-bit or 64-bit) using virtual machine applications. In other words, there is no need to unify the IMCOP software components. This makes them easy to implement and integrate with the rest of the system. According to the MESCore library, IMCOP services are in fact wrapper functions which call other specialized applications or processes. These applications or processes can be written and compiled, independently of each other, on any platform, e.g. Java, .NET, native C++, Python, etc., and then used directly in target web applications. The overall architecture of the IMCOP system is depicted in Fig. 1.

Fig. 1 The overall IMCOP platform architecture

3.1 Metadata enhancement services

There are many different kinds of Metadata Enhancement Services (MES) in the IMCOP system. The majority focus on content discovery and metadata enhancement. MES services of this kind perform selected text and signal (image and video) processing operations. Text processing operations mainly include detection and localization of areas where text features are present; they also include semantically-organized and dictionary-driven text recognition. The list of image, frame and video processing applications is as follows:

  • image transforms for detecting, extracting and calculating descriptors for various types of local features, e.g. SIFT, SURF [2], MSER, Piecewise-linear [1], CEDD [7], CLD, EHD and SCD [5], FCTH [8],

  • algorithms for detecting and recognizing (e.g. using local feature descriptors listed above) different kinds of objects, content and scenes, including faces, bodies, nudity, dress color, sky, images with the Bokeh effect, landscapes or pictures with man-made structures (buildings, monuments, etc.), logos and visual watermarks, etc.,

  • procedures for estimating similarities between images or their selected regions of interest [9] and evaluating selected image and video quality metrics, including noise, blur, blockiness, slicing, etc. (acquired from [6, 25]),

  • compression of images using selected compression schemes [28],

  • algorithms designated to analyze and classify faces according to various traits, e.g. profile, presence of red eye, smile and facial hair (to identify unshaven faces), etc.,

  • text, speech and face recognition processes applied mainly in order to index film footage and video sequences.

Descriptors, labels and other values returned by the above applications during data analysis enrich the data. They constitute descriptive metadata added to other descriptive information about the data, such as keywords and URIs, stored in data representation objects (CMO); in fact, a common scenario is to gather only the URIs of the processed data instead of the data itself. For clarification, it should be noted that whereas metadata is added automatically during data analysis by MES services, keywords and URIs are affixed by DAS services during data aggregation. Aside from automatic data aggregation provided by DAS services, there are also other methods of entering data into the IMCOP system. For example, data can be entered (individually or in groups) using the GUI of the IMCOP system. The GUI and selected MES services of the IMCOP system can be accessed at https://imcop.pl/. An example view of the IMCOP GUI while sample data is being entered and the results (labels) provided by selected IMCOP services after it has been processed are depicted in Figs. 2 and 3, respectively.

Fig. 2 The IMCOP GUI and adding a sample picture to the list of processed objects

Fig. 3 Labels given by selected services to the sample image from Fig. 2

Other categories of highly specialized MES services are also incorporated in the IMCOP system. Some, known as Management Services (MS), control the activities of the other services and manage data-interchange processes inside the system, including those involving the Data Repository (DR). Watermark Retrieval and Embedding Services (WRES) are activated in order to mark the processed data with hidden messages, known as IMCOP signatures, and to protect the data against forbidden use, manipulation and sharing with unauthorized end-users.

Connection Services (CS) constitute the final but perhaps most important category of IMCOP services. Their aim is to identify relationships (connections) between the processed data. In the case of images, for instance, connections are identified in two ways:

  • by matching keywords, URIs and labels given by MES services (the Solr search engine [13] is used at this stage),

  • by analyzing numerical descriptors of selected image features to find the list of stored objects which are similar to the processed image.
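The two ways of identifying connections listed above can be sketched as follows. This is a simplified illustration under stated assumptions: a plain set intersection stands in for the Solr-based keyword matching, Euclidean distance with a fixed threshold stands in for the descriptor comparison, and the object layout (`id`, `labels`, `descriptor`), function names and sample values are hypothetical.

```python
import math


def keyword_connections(query: dict, stored: list, min_shared: int = 1) -> list:
    """IDs of stored objects sharing at least `min_shared` labels with the query."""
    q = set(query["labels"])
    return [o["id"] for o in stored if len(q & set(o["labels"])) >= min_shared]


def descriptor_connections(query: dict, stored: list, max_dist: float) -> list:
    """IDs of stored objects whose descriptor lies within `max_dist` of the query's,
    ordered from the most to the least similar."""
    hits = []
    for o in stored:
        d = math.dist(query["descriptor"], o["descriptor"])
        if d <= max_dist:
            hits.append((d, o["id"]))
    return [obj_id for _, obj_id in sorted(hits)]


# Hypothetical stored objects and query (illustrative data only).
stored = [
    {"id": "img-1", "labels": ["face", "smile"], "descriptor": [0.1, 0.9, 0.0]},
    {"id": "img-2", "labels": ["landscape", "sky"], "descriptor": [0.8, 0.1, 0.3]},
    {"id": "img-3", "labels": ["sky", "building"], "descriptor": [0.7, 0.2, 0.3]},
]
query = {"id": "img-q", "labels": ["sky"], "descriptor": [0.78, 0.12, 0.3]}

print(keyword_connections(query, stored))
print(descriptor_connections(query, stored, max_dist=0.2))
```

In this toy data set both methods connect the query to the same two objects; in practice the two lists can differ, which is why both kinds of evidence are useful.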

From the end-user perspective, the IMCOP system can also be seen as a provider of services in the cloud. However, unlike in standard cloud computing models, IMCOP does not run client applications. Instead, end-user requests activate IMCOP services to prepare recommendations in terms of the desired multimedia content. Other details of IMCOP’s components and their functionalities can be found in preceding articles, e.g. in [3].

3.2 Data aggregation services

There are, in general, different data sources to which particular DAS services can be addressed. Their selection has to meet end-user requirements in terms of multimedia forms and data content relevance. DEEP-like end-users, for example, need to aggregate and process textual information, images and video sequences related to celebrities, movies and actors. Thus, sources selected for data aggregation in this case should include, for example, selected multimedia data hosting websites (e.g. Getty Images), community-curated knowledge bases and encyclopedias (e.g. Wikipedia), news providers (e.g. BBC), social networking services (e.g. Twitter), etc. The IMCOP platform currently incorporates DAS services for all the above data sources except Getty Images, which operates as a commercial photo agency. In addition, the list of IMCOP’s DAS services includes Flickr, Foursquare (https://foursquare.com/), Allocine (http://www.allocine.fr/) and the New York Times.

As DAS services have to be developed with regard to APIs, which differ from data source to data source, there is no single common model for implementing them. They also have to be implemented and configured separately because of the data source authorization requirements, data-interchange protocols and formats (e.g. XML-REST, JSON, PHP), license conditions, etc. At the end of the data aggregation process the CMO objects, which refer to data representation objects, are instantiated. With regard to the IMCOP terminology, the DAS service creates a separate and self-contained CMO object for every single aggregated data point (image, text object, etc.), which is known as a Multimedia Object (MO).

3.3 Complex multimedia objects

As stated in Section 1, CMO objects are dedicated to represent multimedia data in the IMCOP system. CMO objects are content type independent, which means that all data forms processed in the IMCOP system have the same flexible and general XML representation. After instantiation, CMO objects are exchanged between IMCOP MES services according to different schedules. MES activities planned in these schedules depend on end-user requirements and the type of processed data. The general scheme of CMO object processing is illustrated in Fig. 4.

Fig. 4 General scheme of CMO object instantiation and processing

The definition of CMO derives from the MPEG-7 multimedia content description standard. Therefore, descriptive metadata such as topic, name, age, date of birth (e.g. in the case of an actor), keywords, brief text, etc., are registered according to MPEG-7 Description Schemes [26]. SIFT, Shape Context, MSER, Piecewise-linear or any other feature descriptors, extracted by dedicated MES services, are stored according to MPEG-7 Descriptor specification. The CMO definition extends the MPEG-7 standard in some respects. The most significant extension refers to connections between data and the way in which pointers to these connections are stored in CMO objects.

As illustrated in Fig. 4, each CMO object present in the IMCOP system has its own Universally Unique Identifier (UUID). UUID identifiers make it possible to distinguish between particular CMO objects, regardless of IMCOP distributed architecture and despite the lack of central coordination. After instantiation by DAS services, CMO objects are passed to MES services where they are processed. As a result, MES driven metadata is added to their properties. Next (or in the meantime – these processes can take place simultaneously) UUID identifiers of objects recognized as related are appended to the list which stands for the list of connected objects.
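The life cycle described above (instantiation with a UUID, enrichment with metadata, appending UUIDs of related objects) can be sketched with a simplified data structure. This is a hypothetical stand-in, not the actual CMO definition: the class, field and XML element names are invented for illustration and do not follow the MPEG-7 schema.

```python
import uuid
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class CMO:
    """Hypothetical, simplified stand-in for a Complex Multimedia Object."""
    uri: str
    keywords: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    connections: list = field(default_factory=list)
    # Generated locally, so no central coordination is needed.
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def enrich(self, name: str, value: str) -> None:
        """Add a label produced by an analysis service."""
        self.metadata[name] = value

    def connect(self, other: "CMO") -> None:
        """Append the other object's UUID to the list of connected objects (once)."""
        if other.object_id != self.object_id and other.object_id not in self.connections:
            self.connections.append(other.object_id)

    def to_xml(self) -> str:
        """Serialize to an illustrative XML layout (element names are assumptions)."""
        root = ET.Element("CMO", attrib={"uuid": self.object_id})
        ET.SubElement(root, "URI").text = self.uri
        kws = ET.SubElement(root, "Keywords")
        for kw in self.keywords:
            ET.SubElement(kws, "Keyword").text = kw
        meta = ET.SubElement(root, "Metadata")
        for name, value in self.metadata.items():
            ET.SubElement(meta, "Label", attrib={"name": name}).text = value
        conns = ET.SubElement(root, "Connections")
        for ref in self.connections:
            ET.SubElement(conns, "Ref").text = ref
        return ET.tostring(root, encoding="unicode")


# Hypothetical usage: an aggregation step creates the objects, analysis enriches them.
photo = CMO(uri="https://example.org/photo.jpg", keywords=["actor"])
news = CMO(uri="https://example.org/news.html", keywords=["actor", "premiere"])
photo.enrich("face", "present")
photo.connect(news)
print(photo.to_xml())
```

Storing only the UUIDs of related objects keeps each object self-contained while still allowing the connections to be followed across distributed services.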

4 Performance analysis

The IMCOP system needs to be capable of serving a large number of clients. As each client may require many multimedia objects of different content forms, scalability was a major challenge facing IMCOP designers and developers. Although some of the algorithms incorporated by IMCOP services, e.g. those responsible for text and object detection and recognition, are computationally highly expensive, the IMCOP system’s ability to replicate services and to apply concurrent and parallel computing ensures that the IMCOP objectives can be put into practice.

A number of load tests were conducted to verify the above. During these tests, the IMCOP system was subjected to peaks in activity reflecting the likely demands of IMCOP users. A heavy concurrent load on the system was simulated using test plans executed by JMeter applications run from the outside of the IMCOP network, as illustrated in Fig. 5.

Fig. 5 Configuration for generating a heavy concurrent load on the IMCOP system

Test plans executed on slave JMeter instances were also diversified as they implemented a range of scenarios involving various types and numbers of IMCOP services. This imitated the usual system load during DEEP magazine preparation.

Selected results of such tests, related to the running time of the processes, are presented in Table 1. The tests were performed in accordance with four different test plans (TP1÷TP4) which were executed iteratively for a growing number of concurrent user requests (N). For clarity of presentation, Table 1 reports changes in the average duration of IMCOP responses (∆t) to the executed test plans per single user request rather than the directly measured durations (t). Changes in average durations were calculated as follows:

$$ \Delta t = \frac{t_i - t_{i-1}}{N_i - N_{i-1}} \quad \text{for } i = 2, 3, \dots, 8 \tag{1} $$

Table 1 Changes in average durations of IMCOP responses to executed test plans (TP1÷TP4) per single user request

The last column of Table 1 shows the averages of the ∆t values obtained for each iteration. A plot of a trend line (of an exponential regression type) showing the relationship between the averages \( \overline{\Delta t} \) and the growing number of concurrent user requests N is depicted in Fig. 6.

Fig. 6 Trend line of the averages of changes ∆t versus the number of concurrent user requests N
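The calculation in Eq. (1) and the per-iteration averaging can be reproduced with a few lines of code. The sketch below uses made-up durations and load levels purely for illustration (four load levels and two test plans instead of the eight levels and four plans used in the tests); none of the numbers are measured IMCOP results, and the function names are hypothetical.

```python
def delta_t(t: list, n: list) -> list:
    """Eq. (1): change in average response time per additional concurrent request."""
    if len(t) != len(n):
        raise ValueError("t and n must have the same length")
    return [(t[i] - t[i - 1]) / (n[i] - n[i - 1]) for i in range(1, len(t))]


def mean_per_iteration(per_plan: list) -> list:
    """Average the delta-t values of all test plans for each iteration."""
    return [sum(column) / len(column) for column in zip(*per_plan)]


# Illustrative values only: concurrent requests and average durations in seconds.
n = [10, 20, 40, 80]
t_plan_a = [1.0, 1.8, 3.0, 5.0]
t_plan_b = [2.0, 3.2, 5.2, 8.4]

changes = [delta_t(t_plan_a, n), delta_t(t_plan_b, n)]
averages = mean_per_iteration(changes)
print([round(value, 3) for value in averages])
```

A decreasing sequence of averages, as in this toy data, corresponds to the downward trend line discussed in the text.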

It is clear that the greater the number of concurrent user requests N, the smaller the changes in average duration of IMCOP responses. This is mainly due to concurrent computing and parallel processing utilized in the IMCOP platform. The majority of IMCOP tasks are performed concurrently by particular IMCOP services. For example, image analysis (results shown in Fig. 3) is performed simultaneously by all the MES services. In turn, MES services evaluating selected image quality metrics, detecting nudity and recognizing text (if present in an image), etc., were implemented using parallel processing.
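The concurrent analysis of a single image by several services can be sketched as follows. In this illustration a thread pool stands in for concurrent calls to remote REST services, and the two service functions are hypothetical placeholders that only simulate work; they are not IMCOP services.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def quality_service(image: bytes) -> dict:
    """Placeholder for an image quality service."""
    time.sleep(0.05)  # simulate an expensive computation or a remote call
    return {"size_bytes": len(image)}


def text_service(image: bytes) -> dict:
    """Placeholder for a text detection service."""
    time.sleep(0.05)
    return {"has_text_marker": b"TEXT" in image}


def run_services_concurrently(image: bytes, services: dict) -> dict:
    """Submit the same image to every service at once and gather the labels."""
    with ThreadPoolExecutor(max_workers=max(1, len(services))) as pool:
        futures = {name: pool.submit(fn, image) for name, fn in services.items()}
        return {name: future.result() for name, future in futures.items()}


sample = b"fake-image-bytes-with-TEXT"
results = run_services_concurrently(
    sample, {"quality": quality_service, "text": text_service}
)
print(results)
```

Because the calls overlap, the total waiting time is close to that of the slowest service rather than the sum of all service times.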

There is an additional reason why the changes in average duration of IMCOP responses are smaller when the number of concurrent user requests continues to grow. This is because of load balancing which improves the distribution of processes carried out in the IMCOP system across multiple replications of MES services. The feature of the IMCOP system replicating particular services when an overload occurs and the resulting significant improvement in system performance are illustrated in Fig. 7. The plot depicted in Fig. 7 shows the relationship between the averages \( \overline{\Delta t} \) and the growing number of concurrent user requests N, as discussed above, although related to video indexing in the case of the test plan TP5.

Fig. 7 Averages of changes ∆t versus the number of concurrent user requests N in test plan TP5

Three of the MES services of the IMCOP system are dedicated to automatic content-based video indexing tasks. Described briefly in Section 3.1, they index audio-video sequences and film footage with regard to:

  • speechrv – speech transcripts obtained using speech recognition techniques,

  • textdrv – text transcripts obtained using text detection methods and recognized using optical character recognition,

  • facerv – actors distinguished using face detection and classification methods.

Algorithms used by these services are more complex and thus more time consuming than those used in test plans TP1÷TP4. To protect the IMCOP system against overload, which may occur when complex processes consuming vast amounts of computational resources are executed, an automatic mechanism of service replication was built into the system. This mechanism is able to multiply instances of particular services and run them in IMCOP cloud and third-party machines.
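A replication mechanism of this kind can be sketched in a few lines. The policy below (route each request to the least-loaded instance and start a new instance when all existing ones have reached a load threshold, up to a fixed cap) is an assumption made for illustration; the actual IMCOP replication policy is not specified here, and the class and parameter names are hypothetical.

```python
class ReplicatedService:
    """Hypothetical pool of identical service instances with threshold-based replication."""

    def __init__(self, name: str, max_load_per_instance: int, max_instances: int = 4):
        self.name = name
        self.max_load = max_load_per_instance
        self.max_instances = max_instances
        self.loads = [0]  # pending requests per instance; start with one instance

    def dispatch(self) -> int:
        """Route one request to the least-loaded instance, replicating first if all are full."""
        idx = min(range(len(self.loads)), key=self.loads.__getitem__)
        if self.loads[idx] >= self.max_load and len(self.loads) < self.max_instances:
            self.loads.append(0)  # replicate: start a fresh instance
            idx = len(self.loads) - 1
        self.loads[idx] += 1
        return idx

    def complete(self, idx: int) -> None:
        """Mark one request on instance `idx` as finished."""
        self.loads[idx] -= 1


# Eight simultaneous requests to a face recognition service, two per instance at most.
service = ReplicatedService("facerv", max_load_per_instance=2)
assigned = [service.dispatch() for _ in range(8)]
print(assigned, service.loads)
```

Once the cap is reached, further requests queue on the existing instances, which is the point at which response times start to grow again.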

Fig. 7 shows how the averages of changes ∆t obtained under TP5 test conditions vary with the growing number of service instances. It is clear that system performance increases significantly when the number of instances, instantiated as a result of the replication mechanism, increases to four per service.

5 Summary and conclusions

The IMCOP platform is a service-oriented architecture with a vast number of specialized web services. The distinct functions of IMCOP services, which aggregate, analyze and enrich the processed data, make the IMCOP platform flexible and able to meet customer needs concerning different subjects of demanded content and ways of presenting content to end-users. IMCOP’s openness to third-party services and the scalability provided by its SOA-driven architecture (using mechanisms of service replication and concurrent and parallel computing) mean that the system’s capabilities are not restricted to a fixed set of functions. The ability of the IMCOP platform to process different multimedia formats (text, still images, audio-video sequences, footage) ensures diversity of information sources and gives the foundation for rich presentation layers of IMCOP end-apps. As such, the IMCOP system outperforms other content discovery platforms in terms of universality and versatility.

The IMCOP system shares certain features with other content discovery platforms, such as searching for connections (as in the Kannuu system), a related content function (as in the COMPASS platform) and the Solr engine (as in Outbrain). However, in contrast to these and other platforms which have limited functionality, the IMCOP platform addresses a range of goals and serves different categories of customers. For example, the content discovery and delivery engine of the IMCOP platform can be used to produce DEEP-like magazines whose subject matter can extend beyond actors, movies and celebrities. Such automatically generated magazines can cover subjects such as cultural events in a given city. Instead of using web-based portals, users can use a DEEP-like mobile app powered by the IMCOP platform. This enables them to find the latest theatre shows and learn about the shows, directors, actors and so on. In this instance, the city is the IMCOP customer while users are the end-users of the app.

The concept of Complex Multimedia Objects (CMO) is another significant difference between the IMCOP system and other content discovery platforms. CMO objects, with their Universally Unique Identifiers (UUIDs), extend the MPEG-7 standard to hold descriptive and descriptor metadata and connection information, and allow the IMCOP system to exchange data with other systems. To the best of our knowledge, the content discovery platforms described in Section 2 do not offer this capability.

There are certain drawbacks of the current implementation of the IMCOP system which need to be eliminated. Our current efforts aim to improve the accuracy of particular MES services. In addition, other types of MES and DAS services are required to extend and diversify IMCOP capabilities. Given the openness of the IMCOP system, we hope to incorporate new services in collaboration with partners.