Early identification of emerging technologies: A machine learning approach using multiple patent indicators

https://doi.org/10.1016/j.techfore.2017.10.002

Highlights

  • Proposing a machine learning approach to identifying emerging technologies at early stages

  • Defining 18 input and 3 output variables from the United States Patent and Trademark Office database

  • Employing feed-forward multilayer neural networks to capture nonlinear relationships between input and output variables

  • Developing two quantitative indicators to identify trends of a technology's emergingness

Abstract

Patent citation analysis is considered a useful tool for identifying emerging technologies. However, the outcomes of previous methods are likely to reveal no more than current key technologies, since they can only be applied at later stages of technology development owing to the time required for patents to be cited (or to fail to be cited). This study proposes a machine learning approach to identifying emerging technologies at early stages using multiple patent indicators that can be defined immediately after the relevant patents are issued. First, a total of 18 input and 3 output indicators are extracted from the United States Patent and Trademark Office database. Second, a feed-forward multilayer neural network is employed to capture the complex nonlinear relationships between input and output indicators in a time period of interest. Finally, two quantitative indicators are developed to identify trends of a technology's emergingness over time. Based on this, we also provide practical guidelines for implementing the proposed approach. The case of pharmaceutical technology shows that our approach can facilitate responsive technology forecasting and planning.

Introduction

Emerging technologies are of great interest to a wide range of stakeholders in both industry and government who aim to set up investment-related strategies (Rotolo et al., 2015). The existing literature has shown that patent citation information is useful for measuring the economic value of a technology (Lerner, 1994, Narin et al., 1987). Accordingly, many methods, such as cluster analysis, association rule mining, and conjoint analysis, have been employed to identify emerging technologies using patent citation information. However, the outcomes of previous studies are not forward-looking, because most have been limited to ex post evaluation, which measures past performance, impacts, or consequences (Lee et al., 2016). The value of predictive analysis for identifying emerging technologies has seldom been addressed.

Arguably, the most scientific approaches to identifying emerging technologies use curve fitting techniques (Daim et al., 2006, Shin et al., 2013) and stochastic models (Jang et al., 2017, Lee et al., 2011, Lee et al., 2012, Lee et al., 2016, Lee et al., 2017), which project future trends of a technology by estimating the future citation counts of the relevant patents as a quantitative proxy. Curve fitting techniques fit growth curves to time-series patent citation data using least squares estimation or least absolute deviation and extrapolate those curves beyond the range of the data, whereas stochastic models estimate the probability distributions of future patent citations by analysing fluctuations observed in historical data. However, the outcomes of these methods are likely to reveal no more than current key technologies, since they can only be applied at later stages of technology development owing to the time required for patents to be cited (or to fail to be cited) (Haupt et al., 2007). Notably, the time lag between citing and cited patents has been found to be 4 to 5 years on average (Verspagen and De Loo, 1999), and the most recent patents naturally have less chance of being cited by other patents (Karki, 1997). Moreover, these methods have been criticised for relying on assumptions about pre-determined growth curves and probability distributions (Jang et al., 2017, Lee et al., 2011, Lee et al., 2012, Lee et al., 2016, Lee et al., 2017, Shin et al., 2013), which are difficult to identify at early stages of technology development and vary across technologies. Hence, curve fitting techniques and stochastic models are of little practical assistance in identifying emerging technologies, especially when a technology is at an early stage and no historical data are available (Jang et al., 2017).
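To make the curve-fitting idea concrete, the following is a minimal Python sketch of least-squares fitting of a logistic growth curve to cumulative citation counts, followed by extrapolation. It is an illustration only, not the procedure of the cited studies: the logistic form, the grid of candidate ceilings, the function names, and the data are all assumptions. Note that the ceiling `L` has to be assumed or searched for, which is exactly the kind of pre-determined growth-curve assumption criticised above.

```python
import numpy as np


def logistic(t, L, k, t0):
    """Logistic growth curve with ceiling L, growth rate k and midpoint t0."""
    return L / (1.0 + np.exp(-k * (t - t0)))


def fit_logistic(t, y, ceilings):
    """Least-squares logistic fit, searching over a grid of candidate ceilings.

    For a fixed ceiling L the curve is linear after the logit transform
    ln(y / (L - y)) = k*t - k*t0, so k and t0 follow from an ordinary
    least-squares line fit. The ceiling whose fitted curve has the smallest
    squared error on the original scale is returned as (L, k, t0).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for L in ceilings:
        if L <= y.max():
            continue  # the logit transform needs every observation below L
        z = np.log(y / (L - y))
        slope, intercept = np.polyfit(t, z, 1)  # highest power first
        k, t0 = slope, -intercept / slope
        sse = float(np.sum((logistic(t, L, k, t0) - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(L), float(k), float(t0))
    if best is None:
        raise ValueError("every candidate ceiling is <= max(y)")
    return best[1:]


# Hypothetical cumulative citation counts for years 0..8, generated from a
# known curve so the fit can be checked; real data would be noisier.
years = np.arange(9)
cum_citations = logistic(years, 200.0, 0.8, 6.0)
L, k, t0 = fit_logistic(years, cum_citations, np.arange(150.0, 401.0, 10.0))
print(f"L={L:.0f}, k={k:.2f}, t0={t0:.2f}")  # recovers roughly L=200, k=0.80, t0=6.00
print(f"extrapolated to year 12: {logistic(12, L, k, t0):.1f}")
```

With only a few noisy early observations, many different ceilings tend to fit almost equally well, which is one reason such extrapolations are fragile at early stages of technology development.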

As a remedy, we propose a machine learning approach to identifying emerging technologies at early stages using multiple patent indicators that can be defined immediately after the relevant patents are issued. The economics and innovation literature has presented a wide range of patent indicators, such as patent family and originality, that may be indicative of the future citation count of patents and, by extension, of the relevant technology's economic value (Lerner, 1994, Narin et al., 1987). The central tenet of this research is that analysis of those patent indicators can provide evidence of a patent's future value and, in turn, of the relevant technology's future value. To this end, first, a total of 18 input and 3 output indicators are extracted from the United States Patent and Trademark Office (USPTO) database. Second, a feed-forward multilayer neural network, a supervised machine learning technique inspired by attempts to model the neural structure of the human brain, is employed to capture the complex nonlinear relationships between input and output indicators in a time period of interest. The primary advantage of this method for identifying emerging technologies is its ability to infer a function from observations (Buscema et al., 2017). This matters because there is no theoretical understanding of the relationships between those patent indicators, and the complexity and nonlinearity of innovation processes make it impractical to specify a functional form in advance (Chen et al., 2012). Finally, two quantitative indicators are developed to identify trends of a technology's emergingness over time. Based on this, we also provide practical guidelines for implementing our approach in terms of the choice of machine learning models and model updating.
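The network described above can be sketched in plain NumPy. This is a minimal, hypothetical illustration rather than the authors' configuration: the single tanh hidden layer of 10 units, the learning rate, the number of epochs, and the framing of the three outputs as binary labels (for instance, whether a patent falls in a high-value group for each output indicator) are assumptions, and the data are synthetic.

```python
import numpy as np


def sigmoid(x):
    # Clipping avoids overflow warnings in np.exp for very large logits.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))


class FeedForwardNet:
    """One-hidden-layer feed-forward network trained by full-batch backprop.

    Hypothetical setup: n_in input indicators -> n_out binary labels, with a
    tanh hidden layer and one sigmoid output per label.
    """

    def __init__(self, n_in=18, n_hidden=10, n_out=3, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)  # hidden activations
        P = sigmoid(H @ self.W2 + self.b2)  # one probability per output
        return H, P

    def fit(self, X, Y, lr=0.5, epochs=2000):
        n = X.shape[0]
        for _ in range(epochs):
            H, P = self.forward(X)
            dZ2 = (P - Y) / n  # d(cross-entropy averaged over patents)/d(output logits)
            dH = (dZ2 @ self.W2.T) * (1.0 - H ** 2)  # back through tanh, before W2 changes
            self.W2 -= lr * (H.T @ dZ2)
            self.b2 -= lr * dZ2.sum(axis=0)
            self.W1 -= lr * (X.T @ dH)
            self.b1 -= lr * dH.sum(axis=0)
        return self

    def predict(self, X, threshold=0.5):
        return (self.forward(X)[1] >= threshold).astype(int)


# Synthetic stand-in for 500 patents with 18 standardised indicators and
# three hypothetical binary output labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 18))
Y = np.column_stack([
    X[:, 0] + X[:, 1] > 0,               # roughly linear rule
    (X[:, 2] > 0.5) | (X[:, 3] > 0.5),   # nonlinear OR rule
    X[:, 4] ** 2 + X[:, 5] ** 2 > 1.4,   # nonlinear ring rule
]).astype(float)

net = FeedForwardNet().fit(X, Y)
train_acc = (net.predict(X) == Y).mean(axis=0)
print("training accuracy per output:", np.round(train_acc, 3))
```

Full-batch gradient descent keeps the sketch short; in practice a library implementation with regularisation and a held-out validation set would be the more usual choice.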

We applied the proposed approach to support Korean small and medium-sized high-tech companies in technology forecasting at the request of the Korea Institute of Science and Technology Information (KISTI). We adopted the USPTO database for this research, since it contains the most representative data for analysing international technology (Lee et al., 2013). Our experience showed that the proposed approach can find emerging technologies at early stages using only the limited set of patent indicators that can be defined and extracted immediately after the relevant patents are issued. Our method also enabled systematic and continuous monitoring of emerging technologies, yielding high potential benefits at relatively low cost. Moreover, the results of our case study revealed a way to improve the proposed approach, which we expect to be a useful complementary tool to support experts' decision making on emerging technologies, especially for small and medium-sized high-tech companies. We believe that the systematic process and quantitative outcomes our approach offers can facilitate responsive technology forecasting and planning.

This paper is organised as follows. Section 2 presents the background to our research, and Section 3 explains the research framework and methodology, which are then illustrated by a case study on pharmaceutical technology in Section 4. Section 5 provides guidelines for implementing our approach. Finally, Section 6 offers our conclusions.

Section snippets

Definitions and characteristics of emerging technologies

Although emerging technologies have been the subject of many previous studies, there is no consensus as to what qualifies a technology as emerging (Rotolo et al., 2015). As Table 1 reports, the definitions and concepts of emerging technologies presented in a number of studies overlap but, at the same time, point to different characteristics. For instance, Day and Schoemaker (2000) defined an emerging technology as a science-based innovation that has the potential to create a new industry or

Methodology

Fig. 1 shows the overall process of the proposed approach. Given the complexities involved, the proposed approach is designed to be executed in four discrete steps: data collection and pre-processing; defining and extracting patent indicators; assessing the value of patents; and identifying trends of a technology's emergingness.
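As an illustration of the second step, the sketch below computes a few indicators that are available as soon as a patent is issued. The originality index follows one common definition in the patent literature, one minus the sum of squared shares of a patent's backward citations across technology classes; whether the study computes it exactly this way is an assumption, as are the hypothetical `PatentRecord` fields, the class codes in the example, and the choice to set originality to 0 for patents without backward citations.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


def originality(cited_classes: List[str]) -> Optional[float]:
    """1 minus the sum of squared shares of backward citations per class.

    Returns 0.0 when every cited patent is in the same class and values
    closer to 1 as citations spread over more classes; None if the patent
    cites nothing, since the index is undefined in that case.
    """
    n = len(cited_classes)
    if n == 0:
        return None
    counts = Counter(cited_classes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


@dataclass
class PatentRecord:
    """Hypothetical minimal record; fields are illustrative, not the study's schema."""
    patent_id: str
    n_claims: int
    family_size: int
    cited_classes: List[str] = field(default_factory=list)


def indicator_vector(p: PatentRecord) -> List[float]:
    """A few indicators that are known at issue time (originality 0 if no citations)."""
    orig = originality(p.cited_classes)
    return [float(p.n_claims), float(p.family_size), 0.0 if orig is None else orig]


rec = PatentRecord("US-HYPOTHETICAL-1", n_claims=20, family_size=5,
                   cited_classes=["A61K", "A61K", "C07D", "A61P"])
print(indicator_vector(rec))  # [20.0, 5.0, 0.625]
```

Each such vector would form one row of the input matrix for the value-assessment step; the study itself uses 18 input indicators, not the three shown here.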

Overview

We conducted a case study of pharmaceutical technology for three reasons. First, a patent normally corresponds to a product in the pharmaceutical industry, so the technological value of a patent is directly related to its commercial value (Chen and Chang, 2010). Second, patent management activities such as valuation and protection are especially important in this industry compared with other industries, since the manufacturing process is relatively easy to replicate and can be copied with a

Guidelines for implementation of our approach

Newly developed methods should be carefully deployed in practice, and many issues must be considered for practical implementation. First, we employed classification models to assess the value of patents, since predicting the exact future citation count of a patent is not the focus of our analysis. However, the value of patents could also be assessed using regression models, evaluated with performance metrics such as mean absolute error (MAE) and mean absolute percentage error (MAPE). Second, although
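If the regression route mentioned above were taken, the two error metrics could be computed as below. This is a minimal NumPy sketch with made-up citation counts; the function names are illustrative. Because many patents receive no citations at all, MAPE is undefined for zero actual values, so the sketch raises an error in that case rather than silently dropping those patents.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target (e.g. citations)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, which is common for citation
    counts, so this sketch refuses such input instead of dropping patents.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


# Made-up actual vs. predicted future citation counts for three patents.
actual = [10, 4, 20]
predicted = [8, 5, 26]
print(mae(actual, predicted))   # 3.0
print(mape(actual, predicted))  # approximately 25.0
```

MAE is expressed in citations and so is easy to interpret across patents, whereas MAPE weights an error of one citation far more heavily for a rarely cited patent than for a highly cited one.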

Conclusions

This study has proposed a machine learning approach for identifying emerging technologies at early stages using multiple patent indicators that can be defined immediately after the relevant patents are issued. The central tenet of the proposed approach is that patent indicators, such as patent family and originality, can provide evidence of a patent's future value and, in turn, of the relevant technology's future value. To this end, a total of 18 input and 3 output indicators were extracted from

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIP) (No. 2017R1C1B2011434) and by the Future Strategic Fund (No. 1.140010.01) of Ulsan National Institute of Science and Technology (UNIST).


References (70)

  • Y.S. Chen et al.

    Nonlinear influence on R&D project performance

    Technol. Forecast. Soc. Chang.

    (2012)
  • P. Criscuolo et al.

    Does it matter where patent citations come from? Inventor vs. examiner citations in European patents

    Res. Policy

    (2008)
  • T.U. Daim et al.

    Forecasting emerging technologies: use of bibliometrics and patent analysis

    Technol. Forecast. Soc. Chang.

    (2006)
  • H. Ernst

    Patent information for strategic technology management

    World Patent Inf.

    (2003)
  • R. Genuer et al.

    Variable selection using random forests

    Pattern Recogn. Lett.

    (2010)
  • M. Gevrey et al.

    Review and comparison of methods to study the contribution of variables in artificial neural network models

    Ecol. Model.

    (2003)
  • D. Guellec et al.

    Applications, grants and the value of patent

    Econ. Lett.

    (2000)
  • D. Harhoff et al.

    Citations, family size, opposition and the value of patent rights

    Res. Policy

    (2003)
  • R. Haupt et al.

    Patent indicators for the technology life cycle development

    Res. Policy

    (2007)
  • H.J. Jang et al.

    Hawkes process-based technology impact analysis

    J. Inf. Secur.

    (2017)
  • G.H. Jeong et al.

    A qualitative cross-impact approach to find the key technology

    Technol. Forecast. Soc. Chang.

    (1997)
  • Y. Jeong et al.

    Forecasting technology substitution based on hazard function

    Technol. Forecast. Soc. Chang.

    (2016)
  • J. Joung et al.

    Monitoring emerging technologies for technology planning using technical keyword based analysis from patent data

    Technol. Forecast. Soc. Chang.

    (2017)
  • Y. Ju et al.

    Patent-based QFD framework development for identification of emerging technologies and related business models: a case of robot technology in Korea

    Technol. Forecast. Soc. Chang.

    (2015)
  • M.M.S. Karki

    Patent citation analysis: a policy analysis tool

    World Patent Inf.

    (1997)
  • H. Kim et al.

    Concentric diversification based on technological capabilities: link analysis of products and technologies

    Technol. Forecast. Soc. Chang.

    (2017)
  • S. Lee et al.

    Business planning based on technological capabilities: patent analysis for technology-driven roadmapping

    Technol. Forecast. Soc. Chang.

    (2009)
  • H. Lee et al.

    Technology clustering based on evolutionary patterns: the case of information and communications technologies

    Technol. Forecast. Soc. Chang.

    (2011)
  • C. Lee et al.

    A stochastic patent citation analysis approach to assessing future technological impacts

    Technol. Forecast. Soc. Chang.

    (2012)
  • C. Lee et al.

    Novelty-focused patent mapping for technology opportunity analysis

    Technol. Forecast. Soc. Chang.

    (2015)
  • C. Lee et al.

    Stochastic technology life cycle analysis using multiple patent indicators

    Technol. Forecast. Soc. Chang.

    (2016)
  • Z. Ma et al.

    Patent application and technological collaboration in inventive activities: 1980–2005

    Technovation

    (2008)
  • M. Meyer

    Are patenting scientists the better scholars?: an exploratory comparison of inventor-authors with their non-inventing peers in nano-science and technology

    Res. Policy

    (2006)
  • F. Narin et al.

    Patents as indicators of corporate technological strength

    Res. Policy

    (1987)
  • J.D. Olden et al.

    An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data

    Ecol. Model.

    (2004)

Changyong Lee is an associate professor of the School of Management Engineering at Ulsan National Institute of Science and Technology (UNIST). He holds a BS in computer science and industrial engineering from Korea Advanced Institute of Science and Technology, and a PhD in industrial engineering from Seoul National University. His research interests lie in the areas of applied data mining and machine learning techniques, future-oriented technology analysis, robust technology planning, intellectual property management, and service science.

Ohjin Kwon is a director of the Centre for Future Information R&D at the Korea Institute of Science and Technology Information. He obtained a BS and MS in computer science at Kwangwoon University, and a PhD in computer science at the University of Seoul. His research areas are information systems, technology intelligence, and patent analysis.

Myeongjung Kim is a PhD student in the School of Management Engineering at UNIST. He holds a BS in business administration from UNIST. His research interests include applied data mining and machine learning, technology intelligence, and intellectual property management.

Daeil Kwon is an assistant professor of system design and control engineering at UNIST. He received his PhD in mechanical engineering from the University of Maryland, and his BS in mechanical engineering from Pohang University of Science and Technology. His research interests include prognostics and health management of electronics, reliability modelling, and use condition characterisation.
