1 Introduction

Nanotechnology is an emerging and dynamic field. Analysts argue that it is likely to have a horizontal impact across an entire range of industries and major implications for human health, the environment, sustainability, and national security (Roco and Bainbridge 2001; Lux Research 2007). To assess the perceived potential value of the emerging technology and promote its development, various governments have prioritized nanotechnology in their national agendas for science and technology development. This trend has led to an increase in investments in nanotechnology R&D, a rapidly growing body of scientific publications and patent applications, and greater attention to the development of the field by the policy community, industry, and the general public. Social scientists have contributed to the development of nanotechnology through an increasing number of studies on the characteristics of the newly established field, the dynamics of worldwide R&D activities, and the economic and societal implications of the technology. Through these studies, social scientists aim to help policy makers and the general public understand how nanoscience and nanotechnology are developed, how scientific and technological advances are diffused, and what policies and actions should be taken to promote positive economic and societal impacts while preventing potential negative consequences. Although a variety of research methodologies are employed, including interviews, social network techniques, bibliometric analyses, and economic analyses, a majority of these studies have relied on nanotechnology publications and patents as the data source.

Publications and patents are good representations of the results of scientific and technological endeavors. Scientists communicate their findings through publications. Publications are not the only component of knowledge generation and dissemination, but they are certainly an important one. Similarly, patents are not just legal documents which give a temporary monopoly to an inventor in exchange for disclosing the information of the invention; they are also regarded as a paper trail of technology advancement. Over 50,000 nanotechnology articles have been published annually worldwide in recent years, and more than 2,500 patents are filed at major patent offices such as the European Patent Office (Footnote 1). This wealth of data is desirable for rigorous quantitative analysis, because statistically significant results are attainable. In addition, publications and patents are intrinsically rich information sources which, through the analysis of author and inventor names, institutional and assignees’ addresses, subject classification and patent class, references and citations, enable the measurement of important aspects of scientific and technological activities. Because of the opportunities offered by these rich texts, bibliometrics has become a popular tool for the study of nanotechnology development from a social science perspective. However, some characteristics of publication and patent data could prove to be serious limitations for use in further analysis. For example, the relative importance of journal articles, books and proceedings is not the same across all scientific disciplines. Likewise, not all inventions meet the patentability criteria set by the patent offices (Footnote 2).

In this paper, we contribute to the growing literature in this field by conducting a comprehensive review of more than 120 social science studies on nanoscience and nanotechnology, 90% of which analyze publications and patents in nanotechnology. We discuss in depth four debates emerging from these studies and reveal the contrasting views surrounding them, namely: whether nanotechnology is an interdisciplinary field, whether nanoscience and nanotechnology are closely interlinked, whether nanotechnology development is path dependent, and who is winning the global nanorace.

In addition to reviewing these four debates, we provide a thorough analysis of the bibliometric search strategies used in the different studies to harvest nanotechnology publications and patents. These strategies include lexical queries, evolutionary lexical queries, citation analysis, and the use of core journal sets to find nanotechnology articles. We use these strategies to obtain sets of nanotechnology publications published in 2006, for later comparison, from the Web of Science’s Science Citation Index and evaluate the search results on the following aspects: the size of dataset, distribution of the articles across subject areas, countries and institutions and source journals. We found that most of the lexical queries that are compared produce very similar ranking tables when looking at the top subject areas and journals, and the most prolific countries and institutions. The reason that these four strategies produce broadly similar results is that they share a core set of keywords which by and large form the search strategy of Glanzel et al. (2003), and through these keywords a batch of the common publications are harvested. Furthermore, we conclude that the search strategy which identifies nanotechnology publications based on only a relatively small number of core journals would not produce robust analytical results for benchmarking an emerging field such as nanotechnology.

The rest of this paper is organized as follows. Section 2 reviews four debates in the social science literature pertaining to nanotechnology development. Section 3 compares the various search strategies used to identify nanotechnology publications and patents. Section 4 concludes the paper.

2 Four debates in social science studies analyzing nanotechnology publications and patents

The social science studies reviewed in this paper research the processes by which nanoscience is conducted and nanotechnology is developed. However, the topics are not only related to scientific discovery, invention and technological development, but also cover knowledge creation, changes in patterns of scientific research and shifts in commercial investment. From the studies we reviewed, we have identified four intellectual debates, each of which has drawn considerable attention from the scholarly community and is represented by its own group of publications. In this section, we discuss the central issues of the four debates, reveal the different views exchanged in the debates and investigate the methodological reasons contributing to these diverse arguments.

2.1 Is nanotechnology an interdisciplinary field?

Science and technology policy makers have high hopes for interdisciplinary research in emerging fields such as nanotechnology, under the assumption that interdisciplinary research will bring breakthroughs that solve existing scientific and technological problems, thus enhancing human welfare. This rhetoric, that nanotechnology attracts researchers from different disciplines and that the resulting interdisciplinary collaborations will generate new ways of thinking and catalyze revolutionary new science, is well received by policy makers and by the scientific community (Rafols and Meyer 2007). However, some critics are skeptical of the motivation for promoting interdisciplinary research. Weingart (2000) points to the fact that the emphasis on interdisciplinary research is driven by political goals and therefore in need of legitimacy. In order to win support from policy makers, scientists have to invent problem definitions and label their original research projects in accordance with political rhetoric which boasts interdisciplinary research. Given this skeptical view of interdisciplinary research, one would want to inquire whether the “nano” prefix appears more often in titles of papers, journals, conferences, research projects and programs simply because the research in traditional disciplines has to be relabeled, or whether indeed a new and interdisciplinary field is emerging.

An early bibliometric study by Meyer and Persson (1998) examined nanoscience and technology papers published during the 1991–1996 period in line with a journal classification developed by Hicks and Katz (1996) and found that 1,299 out of 5,416 papers were published in interdisciplinary journals. The number of nanotechnology papers published in these journals is second only to the number of papers published in the field of physical sciences. They then claimed that the share of interdisciplinary publications in nanotechnology is exceptionally high. However, their conclusion is challenged by Schummer (2004a) on two fronts. First, Meyer and Persson’s sample only includes the “nano-title-papers”, but excludes papers with keywords such as “fullerene”, “quantum wires” and “quantum dots”, which are indisputably nanotechnology papers as well. Second, if a physics paper is published in an interdisciplinary journal or a multidisciplinary journal such as Science or Nature, it does not necessarily mean that the paper itself is an interdisciplinary or multidisciplinary study.

Schummer (2004a) conducted a co-author analysis of over 600 papers published in 2002–2003 in eight “nanotechnology journals”. He first collected information about the departmental affiliations of the papers’ authors and assigned a discipline to each of them. Schummer then calculated a multidisciplinarity index, which is the number of disciplines involved in nanotechnology, related by authorship, in at least 5% of the total number of the papers. He also calculated an interdisciplinarity index, which is defined as the number of papers written by co-authors from at least two disciplines divided by the total number of the papers. The eight nanotechnology journals are compared to a typical disciplinary journal, The Journal of the American Chemical Society (JACS), on the basis of these two indices. The results show that the authors of 63.5% of the papers in the eight nanotechnology journals are from a single discipline. The interdisciplinarity index of the papers from the eight nanotechnology journals is only a little higher than that of JACS. A similar result is reached by comparing the multidisciplinarity indices. Moreover, the eight nanotechnology journals can be classified into groups within classical disciplinary patterns such as “nano-physics journals”, “nano-chemistry journals” and “nano-materials science journals” based on the two indices. Schummer thus concluded that there is no one field of nanotechnology, but several different fields of “nano-physics”, “nano-chemistry”, “nano-electrical engineering”, etc. which are quite unrelated to each other. Leydesdorff’s (2006) study also demonstrated that nanotechnology journals have yet to form a core set of literature in which the journals cite each other actively. Instead, the articles in these journals provide a large number of references to journals in more established fields, supposedly with the purpose of legitimizing the nanotechnology studies.
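The two indices described above can be illustrated with a minimal sketch. It assumes that each paper is represented simply as a list of the discipline labels assigned to its authors; the function names, the data layout and the sample data are hypothetical and are not taken from Schummer (2004a).

```python
from collections import Counter

def multidisciplinarity_index(papers, threshold=0.05):
    """Count the disciplines that appear, via authorship, in at least
    `threshold` (5% by default) of all papers."""
    total = len(papers)
    if total == 0:
        return 0
    counts = Counter()
    for disciplines in papers:
        # A discipline is counted once per paper, however many authors share it.
        counts.update(set(disciplines))
    return sum(1 for c in counts.values() if c / total >= threshold)

def interdisciplinarity_index(papers):
    """Share of papers whose co-authors come from at least two disciplines."""
    total = len(papers)
    if total == 0:
        return 0.0
    mixed = sum(1 for disciplines in papers if len(set(disciplines)) >= 2)
    return mixed / total

# Hypothetical sample: one inner list of author disciplines per paper.
sample_papers = [
    ["physics", "physics"],
    ["physics", "chemistry"],
    ["chemistry"],
    ["materials science", "chemistry", "physics"],
]
print(multidisciplinarity_index(sample_papers))
print(interdisciplinarity_index(sample_papers))
```

Under this reading, a journal can score high on the first index (many disciplines are present somewhere in the journal) while scoring low on the second (few individual papers mix disciplines), which is the pattern the comparison with JACS is meant to detect.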

Using a methodology which connects nanotechnology papers based on the similarity of their references, Bassecoulard, Lelu and Zitt (2007) classified the nanotechnology papers into thematic clusters. They found only a moderate degree of multidisciplinarity of the articles at the thematic cluster level. In fact, only two disciplines, physics and chemistry, are involved in those multidisciplinary papers. Meyer (2007) conducted a cluster analysis of the Swedish nanotechnology patents granted by the USPTO between 1992 and 2001 according to the International Patent Classification (IPC). The patents are grouped together if their IPC subclasses are identical or similar. Several distinct patent groups, such as nano-bio, nano-layers and nano-optics, emerge in the visualization of the analytical results, which suggests that nanotechnology-related patents are rather heterogeneous and can be divided into different technology fields. Meyer (2007) also summarized the anecdotal evidence from case studies of firms engaged in nanotechnology. This evidence indicated that most nanotechnology firms specialize in only one particular nano-scale technology; that is, nano-scale technologies have not converged into a single technology field, but are general purpose technologies applied in different industrial sectors.

Rafols and Meyer (2007) contributed to the debate over the interdisciplinarity of nanotechnology by studying five research projects in the field of molecular motors, a specialization within the field of bionanotechnology. They argued that there is a high degree of interdisciplinarity in the cognitive aspects of research, such as the use of references and instrumentalities, but a low degree of interdisciplinarity in the social dimension, for instance, the affiliation and background of the researchers. On this basis, they argue that the interdisciplinarity of nanotechnology is reflected by the bibliometric analyses of citations and references, and thus should be considered more as an epistemic characteristic. Implicitly, Rafols and Meyer’s (2007) analysis explains why studies such as the one by Schummer (2004a), which focus on the social dimension of the field using indicators such as author affiliations, would conclude exactly the opposite.

In summary, the diversity of the arguments in the debate on the interdisciplinarity of nanotechnology results to a great extent from the lack of consensus on the definition of interdisciplinarity. As Rinia (2007) reviewed, measurement of interdisciplinarity can be undertaken at the levels of discipline, department, research group, scientists, journal, and citation. The results may differ according to the methodologies chosen. We argue that this is an example of the methodological difficulties related to the bibliometric approach, and the reason why Schummer’s (2004a) study based on the affiliation of the authors would reach a different conclusion from Meyer and Persson’s (1998) study based on journal classification. Nevertheless, a burgeoning body of literature, as reviewed above, shows that nanotechnology is not a single homogeneous science or technology field, but a variety of nano-scale technologies spanning various traditional disciplines.

2.2 Are nanoscience and nanotechnology closely interlinked?

The interaction between science and technology in the emerging nanotechnology field is studied intensively in the literature. Questions are asked about whether nanoscience and nanotechnology are closely interlinked so that one can declare the advent of “techno-science” in the field, and whether scientific research and technological development complement each other so that scholars who undertake both activities can achieve better performance. In the studies we reviewed, publications are considered the outcome of scientific research, and as such represent science. Patents, by contrast, are regarded as the output of technological development. Furthermore, publication and patent data are linked together, by matching the authors of publications with the inventors filing patent applications or by matching publications with the patents’ non-patent literature (NPL) references.

By matching nanotechnology-related USPTO patents granted between 1976 and 1999 with nanotechnology Science Citation Index (SCI) publications in the period 1991–1996 through patent citation, Meyer (2000a, 2001a) finds that 181 nanotechnology patents cited only 275 nanotechnology papers out of more than 5,000 SCI papers. Meyer then concludes that nanoscience and nanotechnology are separate activities that are related to each other in a mediated manner. Focusing on patent citations to general academic articles rather than merely to nanotechnology publications, Hu et al. (2007) and Igami and Okazaki (2007), however, argue that nanoscience research provides an increasingly important foundation for nanotechnology innovation. Hu et al. (2007) demonstrate that about 60% of around 50,000 nanotechnology USPTO patents granted between 1976 and 2004 have an average of approximately 18 academic citations. The number of academic citations per patent and the number of academic article citations per journal and year for the most cited journals have increased significantly in the observation period. Igami and Okazaki (2007) revealed that the ratio of the non-patent literature (NPL) citations to the total number of citations (including non-patent literature citations and the references to other patents) in EPO nanotechnology patent applications is twice as high as the ratio of NPL citations in the total EPO applications. They therefore contended that scientific research is likely to have a crucial influence on the development of nanotechnology. The difference between these viewpoints is essentially caused by the different ways in which science is measured. Hu et al. (2007) and Igami and Okazaki (2007) adopt a broad definition of science by regarding all NPL as a representation of scientific research. In contrast, Meyer (2000a, 2001a) defines science more narrowly by only counting nanotechnology publications. In fact, Verbeek et al.’s (2003) exercise confirms that nanotechnology publications accounted for only a tiny share, below 0.25%, of the total NPL citations of both the USPTO and the EPO patents.
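The NPL-based indicator discussed above can be written down as a short sketch. It assumes that, for a set of patent applications, one only knows the number of NPL citations and the number of citations to other patents; the function names are hypothetical, and the counts in the usage lines are invented for illustration and are not the figures reported by Igami and Okazaki (2007).

```python
def npl_share(npl_citations, patent_citations):
    """Ratio of non-patent literature (NPL) citations to all citations,
    where all citations = NPL citations + citations to other patents."""
    total = npl_citations + patent_citations
    if total == 0:
        raise ValueError("no citations recorded")
    return npl_citations / total

def relative_npl_intensity(field_counts, baseline_counts):
    """NPL share of a field's patents divided by the NPL share of a
    baseline set (for example, all applications at the same office)."""
    return npl_share(*field_counts) / npl_share(*baseline_counts)

# Hypothetical counts, given as (NPL citations, patent citations).
nano_applications = (40, 60)
all_applications = (20, 80)
print(npl_share(*nano_applications))
print(relative_npl_intensity(nano_applications, all_applications))
```

A relative intensity of about 2 would correspond to the "twice as high" finding. Restricting the numerator to citations of nanotechnology publications only, as in the narrower definition, would yield a much smaller share from the same patents.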

No matter what conclusions they reach, the above bibliometric studies all use patent citations as a proxy for the linkage between science and technology. However, the bibliometric approach is challenged by Meyer’s (2000b) study of 10 nanotechnology patents based on interviews with the inventors. He found that in only one of the ten cases could one draw a meaningful link between a patent and a particular publication that had stimulated the invention. Meyer contended that a patent citation link in the field of nanotechnology does not necessarily indicate science-dependence of technology, but should be interpreted as an indication of the multifaceted interplay between science and technology.

Investigating the science and technology relationship in the field of nanotechnology from a different angle, Huang et al. (2005), Meyer (2006a, b) and Bonaccorsi and Thoma (2007) studied the performance of patenting scientists and specifically whether they would perform better than their peers who do not apply for patents and whether the patents whose inventors are scientists would have a higher impact than other patents. The underlying assumption is that if nanoscience and nanotechnology complement each other, scientific research and technological development, when undertaken simultaneously, would exert a positive impact on the performance of scientists.

Huang et al. (2005) matched the principal investigators of the awards of the US National Science Foundation (NSF) in the field of nanotechnology with the names of inventors of USPTO nanotechnology patents. They identified 307 principal investigators-inventors who are associated with 760 nanotechnology patents and 628 awards. They found that the patents of the NSF-funded researchers have higher impact factors than those associated with other private and publicly funded groups. The number of cites per NSF-funded inventor is about 10, as compared to 2 for all inventors of USPTO nanotechnology patents. Meyer (2006a, b) collected publication data of nanotechnology research from the SCI-Expanded and patent data from the USPTO for Belgium, the United Kingdom, and Germany, both of which cover the period from 1992 to 2001. He matched the inventors and the authors based on the surnames and initials of the inventors. Meyer found that patenting scientists outperformed their solely publishing, non-inventing peers in terms of publication counts and citation frequency. Bonaccorsi and Thoma (2007) harvested their records from the SCI and the Social Science Citation Index (SSCI) databases, and matched the authors of nanotechnology publications between 1988 and 2001, and the inventors of nanotechnology-related USPTO patents after 1971. They found that the quality of the patents whose inventors have no scientific publication is lower than that of the patents that have at least one inventor who has published scientific papers. Moreover, the inventors who co-apply for patents with academic researchers account for 87% of the top 1% of most productive inventors and 77% of the top 5%. Based on these findings, they contended that complementarity, in terms of having at least one academic collaborator in applying for patents, has a positive impact on inventive performance.
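The author-inventor matching used in these studies can be sketched as follows. This is a minimal illustration of matching on surname and initials, assuming that names arrive as "Surname, Given names" strings; the function names and sample names are hypothetical, and the real studies may have applied additional cleaning or manual checks that are not described here.

```python
def name_key(full_name):
    """Reduce 'Surname, Given names' to a (surname, initials) key,
    so that 'Smith, John A.' and 'Smith, J. A.' share the same key."""
    surname, _, given = full_name.partition(",")
    parts = given.replace(".", " ").replace("-", " ").split()
    initials = "".join(part[0] for part in parts)
    return surname.strip().lower(), initials.lower()

def match_inventors_to_authors(inventors, authors):
    """Map each inventor name to the author names sharing its key."""
    authors_by_key = {}
    for author in authors:
        authors_by_key.setdefault(name_key(author), []).append(author)
    matches = {}
    for inventor in inventors:
        key = name_key(inventor)
        if key in authors_by_key:
            matches[inventor] = authors_by_key[key]
    return matches

# Hypothetical names for illustration only.
inventors = ["Smith, John A.", "Lee, K."]
authors = ["Smith, J. A.", "Smith, Jane", "Lee, Kim"]
print(match_inventors_to_authors(inventors, authors))
```

Matching on surname and initials alone cannot distinguish different people who share the same key, so the number of patenting scientists found this way is best treated as an upper bound unless affiliations or other fields are used to disambiguate.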

In summary, inconsistent conclusions with regard to whether nanoscience and nanotechnology are closely linked are reached due to the differences in defining science-technology linkages. If one regards the sporadic NPL references in nanotechnology patents to nanotechnology publications as the science-technology linkage, science and technology look only loosely connected. However, if all academic (NPL) citations in nanotechnology patents, which increased rapidly in the past two decades, are taken into account, science seems to play an important role in the development of nanotechnology. The lack of a consensus on what should be measured, and the use of different pieces of bibliometric data contained in publication and patent records, can convey confusing messages in bibliometric analysis. Nevertheless, the studies reviewed in this section show consistent evidence that scientists engaged in both scientific research and technology development in the field of nanotechnology achieved a better performance, which indicates a positive impact of the interaction between science and technology.

2.3 Is nanotechnology development path dependent?

As we concluded in Section 2.1 of this paper, nanotechnology is not a single science field but involves knowledge gained from various traditional fields including physics, chemistry, materials science, life science and electrical engineering. Similarly, nanotechnology can be applied in diverse industrial sectors and areas such as textiles, aerospace, automobiles, energy, the environment, and information and communications technology. Given the broad fields that nanotechnology involves and the wide areas where its application occurs, one may speculate that nanotechnology R&D and innovation activities would be observed in a wide range of institutions and companies in existing or new industries. The opposite case would be that nanotechnology development only emerges in a small number of clusters and regions where strong technological capabilities and active scientific research activities, as prerequisites of nanotechnology development, are already present. These two contrasting views have been debated in the literature.

In an analogy between nanotechnology and biotechnology, Darby and Zucker (2003) found that U.S. firms become involved in nanotechnology wherever and whenever scientists publish breakthrough academic articles, similar to the case of biotechnology. A regional high average education level is an important determinant of the entry of nanotechnology companies, but the historical levels of venture-capital activity in the region are not. In a different study, Zucker and Darby (2005) argued that nanotechnology is developed from basic science breakthroughs including important instrument inventions such as the scanning probe microscope and the atomic force microscope. Therefore, much of the knowledge remained tacit in nanotechnology as it did in biotechnology in the early stages of this field’s development. In these early stages knowledge was best transmitted by someone trained by one of the scientific discoverers. Darby and Zucker in these two studies argue that commercialization of nanotechnology is dependent on the past scientific research and discoveries in the same US regions. To further confirm the hypothesis of path dependence, Zucker et al. (2007) reveal that regional growth of new knowledge in nanotechnology, as measured by publication and patent counts, has been positively affected by both the size of existing regional stocks of recorded knowledge in all scientific fields and the extent to which tacit knowledge in all fields flows among the institutions of different organizational types. The level of federal funding has a strong influence on the numbers of both publications and patents.

Contributing to the debate, Shapira and Youtie (2008) argued that a variety of factors may affect the level of nanotechnology R&D and commercialization, including not only path-dependent stocks of knowledge, but also firm capabilities, finance and other resources, capital investment in facilities, institutional strategies and linkages, and human capital availability. They hypothesized two contrasting scenarios. In the first, nanotechnology R&D converges on the regions that have large government laboratories or dominant research universities with accumulated advantages. In the second, nanotechnology diverges and spreads not only to the regions that are equipped with strong R&D capabilities, but also to others which manage to attract leading scientists and develop high-level scholarly networks. Shapira and Youtie investigated the top 30 US metropolitan areas that led in nanotechnology research activity during the 1990–2006 time frame. They found that there is considerable nanotechnology R&D and innovation activity in locations recognized as prominent in previous rounds of technological development, including Boston and Silicon Valley. Yet there is also evidence of divergent activities in other areas with high concentrations of human capital and development of effective networks.

From the perspective of a firm’s knowledge base, Avenel et al. (2007) investigated whether a firm needs to develop new labs, hire new R&D staff, form new alliances or even invest in a new location to organize nanotechnology R&D, or whether it can integrate nanotechnology into its existing R&D projects. Avenel et al. found that small firms rely more on their existing capabilities to organize their nanotechnology R&D activities. In contrast, large firms expand their knowledge of nanotechnology by building up new capabilities. In other words, nanotechnology R&D in existing small firms is more path dependent than similar activities in large firms. In a related comparative study of nanotechnology and biotechnology firms, Rothaermel and Thursby (2007) argue that an incumbent firm’s ability to exploit new methods of invention initially depends on its access to tacit knowledge with regard to the employment of the new methods. Over time, however, as firms learn and/or the knowledge becomes codified in routine procedures or commercially available equipment, inventive output, which is measured by the number of USPTO patents, becomes more dependent on traditional R&D investments. They found that patenting in biotechnology is explained not only by a firm’s knowledge stock in terms of past biotech patenting, but also by knowledge gained from R&D alliances. By contrast, technological change related to nanotechnology would be embodied in physical capital investment, such as the purchase of expensive equipment, so the incumbent nanotechnology firms depend on R&D investment to exploit the invention. In this sense, the development of nanotechnology in incumbent firms is more dependent on traditional R&D investment than that of biotechnology, particularly in the early stage of development of the technologies.

To summarize, the evidence presented in the debate shows that considerable nanotechnology innovation activity indeed did take place in locations which were recognized as frontrunners in the previous rounds of new technological development, such as Boston and Silicon Valley. Yet there are also divergent activities in other regions where high-level human capital and effective networks are developed. With regard to knowledge accumulation and technological development at the firm level, studies demonstrate that small firms rely on their existing capabilities to organize the nanotechnology R&D activities, while large firms expand their knowledge-base through the building up of new capabilities. Moreover, compared to biotechnology, the development of nanotechnology in incumbent firms is more dependent on traditional R&D investment, particularly in the early stage of technological development.

2.4 Who is winning the global nanorace?

As a result of the perceived potential of nanotechnology, promoting its development has become an important element of science and technology policy in many countries and transnational organizations such as the European Union and Organization for Economic Co-operation and Development. According to Hullmann (2006b), a global race on nanotechnology R&D funding, scientific publications and patents has started, but the progress of different countries varies. Scholars and policy makers are therefore keen on mapping the worldwide R&D of nanotechnology and benchmarking the strengths and weaknesses of various countries, regions and trade blocs. Examples of such efforts are reports prepared by the European Commission (2003), Warris (2004) for the Australian Academy of Science, Holtum (2005) for the British Engineering and Physical Sciences Research Council, and research articles by Dunn and Whatmore (2002), Compano and Hullmann (2002), Heinze (2004), Santo et al. (2006), Zhou and Leydesdorff (2006), and Miyazaki and Islam (2007). Which country is winning the global nanotechnology race is an intensively examined, and hotly debated, topic in these benchmarking exercises.

In terms of R&D funding of nanotechnology, Hullmann (2007) demonstrates that the US, as a country, provided the largest share of public funding in the world in 2004 and 2005. Taking the EU as a whole, the European Commission is the largest individual funding organization worldwide. However, the EU Member States together accounted for a much larger share of the public expenditure in Europe than the European Commission. Among the Member States, Germany was ranked at the top, followed by France and the UK. Regarding the Asian countries, Japan was ranked second in the world, trailing behind the US. South Korea’s funding is comparable to that of the top European countries. China was ranked 8th behind the abovementioned top countries. However, if the funding of individual countries is measured at purchasing power parity instead of the nominal exchange rate, China’s rank would be elevated.

With regard to scientific publications, Kostoff and his co-authors (Kostoff et al. 2006a, b, 2007a, b, c, d), in a series of papers, analyzed the records of Science Citation Index (SCI), Social Science Citation Index (SSCI) and Engineering Compendex publications and found that although the growth of nanotechnology scientific articles is a worldwide phenomenon, the most rapid growth in publications during the past decade has occurred in the East Asian nations, notably China and South Korea. While the United States remains the leader in the production of aggregate nanotechnology research articles, China has achieved parity or has even taken the lead in some selected nanotechnology sub-areas. Following Kostoff and his collaborators’ work, Leydesdorff and Wagner (2009) inquired whether the US is losing ground in nanotechnology research. They analyzed the SCI publications, and in particular ten core journals which they selected from the field of nanotechnology, and argued that the US is still performing better than any other country in terms of highly cited papers and the ratio of citations to publications. They recognized, however, that China has indeed become the second largest nation after the US in both the number of papers published and the number of citations. Echoing Kostoff’s conclusions, Youtie, Shapira and Porter (2008) revealed that the decline of the relative share of the US and the EU27 in the world’s nanotechnology publications does not result from a decrease in the absolute number of their nanotechnology publications, but from the fact that Asian countries such as China have increased their publications rapidly, hence taking a larger share of the total. Although China has made impressive strides in producing a large number of nanotechnology papers, the impact of Chinese papers remains modest (Youtie et al. 2008; Guan and Ma 2007).

There is no disputing that the US is the most active country in the world in nanotechnology patenting, measured by patent applications at either the US Patent and Trademark Office (USPTO) or the European Patent Office (EPO) (Hullmann 2006b; Huang et al. 2003). Li et al. (2007b) observed that the US filed more nanotechnology patents in both patent offices than any other country in almost every year from 1978 to 2004. They also found that the top 20 countries ranked by nanotechnology patent applications in the USPTO and in the EPO are very similar. However, a longitudinal analysis of the USPTO patents by Wong et al. (2007) indicated that patent applications from South Korea, Australia and Taiwan have increased rather rapidly since 2000. China’s progress in nanotechnology patenting was not as impressive as its performance in producing scientific publications: measured by patent applications in the USPTO and EPO, the country ranked around 20th place (Li et al. 2007b; Huang et al. 2003).

In summary, the benchmarking literature invariably points to the leading position of the US. The traditional science and technology powerhouses such as the European countries and Japan are not falling behind in the global race, whether measured by scientific publications or by patents. What is striking, however, is the entry of new players such as South Korea, Taiwan and China. Nevertheless, it would be premature to say that China or other newcomers would seriously challenge the lead of the triad countries (the US, the EU and Japan) in the near future, as China, for example, still lags behind in some key aspects such as patent applications and publication quality.

3 The search strategies used to harvest nanotechnology publications and patents

Researchers who study the development of nanoscience and nanotechnology through the analysis of publication and patent data are confronted with a fundamental question: Which publications or patents fall within the field of nanotechnology? It is notoriously difficult to define the boundary of a multidisciplinary and emerging field such as nanotechnology and harvest the relevant publications and patents. Moreover, the difference between various existing definitions of nanotechnology further exacerbates the problem (Bawa 2007).

In the literature reviewed in this paper, the search for nanotechnology patents is carried out through two basic methods. The first is to search the patent databases using a combined set of keywords, much the same as the methodology used to identify nanotechnology publications in publication databases. In fact, this method, termed “lexical query”, is the most popular search methodology used in the literature to harvest publication records. The second method is to use the nanotechnology patent classes recently established by the USPTO (Class 977), EPO (Y01 N class) and Japanese Patent Office (JPO) (ZNM class).Footnote 3 Although a search method based on patent classes, or on any other pre-indexed system for that matter, is perfectly acceptable, it depends on classifications rather than on keyword definition and is therefore not of interest to us in the following comparison. Hence, in this section we concentrate on the following four methodologies used to search for nanotechnology articles in publication databases: lexical query, evolutionary lexical query, citation analysis and harvesting publications in core journals.

3.1 Lexical query

In conducting lexical queries, Tolles (2001), Meyer et al. (2001), and Dunn and Whatmore (2002) used nano*Footnote 4 as their elementary search string. Glanzel et al. (2003) and Noyons et al. (2003) both used nanotechnology-related keywords to build their search strategies. Porter et al. (2008) implemented a modular search in which they combined nano* and nanotechnology-related keywords. Irrelevant records, that is, those retrieved through keywords that are not in fact related to nanotechnology, such as NaNO3, nanoliter, and nanoplankton, have to be excluded. The exclusion is done either by “cleaning” irrelevant records from the search outcome, as Porter et al. (2008) did, or by embedding the exclusion terms in the combined keyword search string, headed by the Boolean “NOT”, as done by Glanzel et al. (2003) and Noyons et al. (2003). Nanotechnology scientists are usually invited to provide expert assistance in this process of keyword selection and exclusion.
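
The exclusion step can be illustrated with a minimal Python sketch. It is not the query of any of the cited studies: the function name `matches_lexical_query` and the three-item exclusion list (taken from the examples in the text) are assumptions made for illustration, and the sketch follows the "cleaning" logic, keeping a record only if at least one of its nano-prefixed terms is not on the exclusion list.

```python
import re

# Exclusion prefixes: only the three examples named in the text (lower-cased).
# A real exclusion list would be much longer and curated with expert input.
EXCLUDE_PREFIXES = ("nano3", "nanoliter", "nanoplankton")

# Any word starting with "nano", case-insensitive (so "NaNO3" also matches).
NANO_TERM = re.compile(r"\bnano\w*", re.IGNORECASE)


def matches_lexical_query(text, exclude_prefixes=EXCLUDE_PREFIXES):
    """Return True if text has a nano-prefixed term that is not excluded."""
    hits = (m.group(0).lower() for m in NANO_TERM.finditer(text))
    return any(not hit.startswith(exclude_prefixes) for hit in hits)


titles = [
    "Synthesis of gold nanoparticles",
    "Decomposition of NaNO3 in solution",
    "Nanoplankton near nanostructured surfaces",
]
kept = [t for t in titles if matches_lexical_query(t)]
```

In this toy list the second title is dropped because its only nano-prefixed term is an excluded one, while the third is kept because "nanostructured" is not excluded. A Boolean "NOT" embedded in the query string would behave differently: it would drop the third title as well.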

The fast expansion of the nanotechnology field poses challenges to the lexical query approach. Mogoutov and Kahane (2007) claimed that the core of related keywords will grow even more rapidly than the entire database of nanotechnology publications. Early bibliometric analyses by, for instance, Braun et al. (1997) and Tolles (2001), which harvested publications through nano-prefixed keywords, or merely the simple term “nano*”, suffered from the omission of biotechnology-related publications whose keywords were less likely to contain the prefix “nano”. Another criticism of the lexical query approach is its subjectivity when experts are used to define the keyword set. Search outcomes will then inevitably be biased toward the fields in which the consulted experts are more knowledgeable.

3.2 Evolutionary lexical query

The evolutionary lexical query differs from the lexical query primarily in its automatic and iterative way of obtaining search keywords, which minimizes the input of experts. Using the evolutionary lexical query approach, scholars first retrieve a core set of nanotechnology publications. In the Nanobank Project, Zucker et al. (2007) obtained core nanotechnology publications from the weekly Virtual Journal of Nanoscale Science & Technology, which includes the latest research articles appearing in a variety of source publications in the field. Mogoutov and Kahane (2007) retrieved core publications through a simple nano prefix search strategy. In the second step, scholars harvest a set of keywords from the core publications and rank the keywords by their level of relevance to the field, based on the frequencies with which the keywords or combined keywords appear in the core publications. Zucker et al. (2007) and Kostoff et al. (2006a, b) used these expanded keyword sets to harvest additional publications and repeated the process until the publications converged on a relatively consistent set of keywords that changed only slightly between iterations. Unlike Zucker et al. (2007) and Kostoff et al. (2006a, b), Mogoutov and Kahane (2007) did not adopt a multiple-stage iterative process but involved experts in verifying and modifying the expanded keyword set.

Minimization of expert intervention, and thus subjectivity, is a significant advantage of the evolutionary lexical query approach over the standard lexical query approach. However, the selection of keywords in the evolutionary lexical query approach, based on the probability of relevance of the keywords, is still determined by researchers, and it needs to be validated by experts.
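
The iterative logic of this approach can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure of any cited study: each publication is reduced to a set of keywords, relevance is approximated by the share of harvested publications that contain a keyword, and the names `harvest`, `expand_keywords` and the threshold `min_share` are hypothetical.

```python
from collections import Counter


def harvest(corpus, keywords):
    """Return indices of publications sharing at least one keyword."""
    return {i for i, doc in enumerate(corpus) if doc & keywords}


def expand_keywords(corpus, seed_keywords, min_share=0.5, max_rounds=10):
    """Grow the keyword set until it no longer changes between rounds."""
    keywords = set(seed_keywords)
    for _ in range(max_rounds):
        core = harvest(corpus, keywords)
        if not core:
            break
        # Count in how many harvested publications each keyword appears.
        counts = Counter(kw for i in core for kw in corpus[i])
        frequent = {kw for kw, n in counts.items() if n / len(core) >= min_share}
        if frequent <= keywords:  # converged: no new keyword qualifies
            break
        keywords |= frequent
    return keywords, harvest(corpus, keywords)


# Toy corpus (invented): each publication is a set of keywords.
corpus = [
    {"nanotube", "carbon"},
    {"nanotube", "fullerene"},
    {"fullerene", "carbon"},
    {"quantum dot", "fullerene"},
    {"protein", "enzyme"},
]
keywords, records = expand_keywords(corpus, {"nanotube"})
```

The threshold `min_share` is exactly the kind of researcher-chosen parameter discussed above: lowering it admits more keywords and more publications, so the choice still calls for expert validation.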

3.3 Citation analysis

To retrieve nanotechnology publications, Zitt and Bassecoulard (2006) demonstrated a hybrid lexical-citation approach. In the first step, they harvested a set of “seed” nanotechnology publications by using a search strategy largely identical to the one used by Noyons et al. (2003). In the second step, they identified a set of “core” literature cited by the seed literature. In the third step, they found a final set of nanotechnology literature that cited the “core” literature. They controlled the selection of the core literature and the final set of nanotechnology literature using finely tuned threshold parameters that strike a balance between the specificity and the coverage of the publications. In the jargon of information science, they managed the trade-off between the exclusion of relevant publications (i.e., the recall problem known as “silence”) and the inclusion of irrelevant publications (i.e., the precision problem known as “noise”). By carefully choosing the parameters, Zitt and Bassecoulard obtained a final set of literature containing 178,000 publications, 56,000 more than the seed literature. In the seed literature, the publications on material sciences, applied physics, condensed matter physics, and physical chemistry rank in that descending order by their share of the total. In the final literature, the publications on these four subfields are also prominent, but the order of their shares is reversed.

Unlike lexical query, which is more subjective, citation analysis depends very little on expert intervention. However, subjectivity has not been fully removed from the process because the size of the final literature set is still determined by the parameters chosen by the researchers. While the final literature set would be larger and its coverage more comprehensive, it would also contain more “noise”. Another difficulty in implementing the methodology is that it necessitates setting up a citation linkage between all the papers in the database. According to Mogoutov and Kahane (2007), no more than a dozen institutions in the world would have access to the full Web of Science database to use the pre-built citation links. Although in itself a mere logistical problem, it is all too real in the cash-strapped world of social science research.
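
The three steps and the role of the thresholds can be illustrated with a toy Python sketch. It is a strongly simplified reading of the approach, not Zitt and Bassecoulard's actual implementation: the function `citation_expand`, the two integer thresholds and the citation data are all hypothetical, and real thresholds would be tuned on relative rather than absolute counts.

```python
from collections import Counter


def citation_expand(references, seed, min_seed_citations=2, min_core_refs=2):
    """references maps each paper id to the set of ids it cites."""
    # Step 2: "core" items are those cited by enough seed papers.
    cited_by_seed = Counter(ref for p in seed for ref in references.get(p, ()))
    core = {ref for ref, n in cited_by_seed.items() if n >= min_seed_citations}
    # Step 3: the final set is every paper citing enough core items.
    final = {p for p, refs in references.items() if len(refs & core) >= min_core_refs}
    return core, final


# Toy citation data (invented for illustration).
references = {
    "s1": {"c1", "c2", "x1"},
    "s2": {"c1", "c2"},
    "s3": {"c1", "x2"},
    "p1": {"c1", "c2"},
    "p2": {"c1", "x3"},
    "p3": {"x1", "x2"},
}
seed = {"s1", "s2", "s3"}  # step 1: assumed to come from a lexical query

core, strict = citation_expand(references, seed, min_core_refs=2)
_, loose = citation_expand(references, seed, min_core_refs=1)
```

With the stricter threshold the final set is smaller and more specific; relaxing it to one core reference enlarges the set and, in a real database, would admit more “noise”. The sketch also makes the logistical requirement visible: the `references` mapping must be available for every paper in the database.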

3.4 Publications in the core nanotechnology journals

Unlike most researchers, who identify nanotechnology publications through lexical queries or citation analysis, Leydesdorff and his co-authors use journals as the unit of analysis and extract articles from a set of core journals. Zhou and Leydesdorff (2006) distinguished a set of three core nanotechnology journals and a group of 85 journals related to the field. Based on the concept of “betweenness centrality”, proposed by Leydesdorff (2007) as an indicator for measuring the interdisciplinarity of scientific journals, Leydesdorff and Zhou (2007) identified ten core journals on nanotechnology.

Compared to using lexical queries and citation analysis, collecting nanotechnology publications from a limited number of field-specific journals is relatively straightforward. Our criticism of this method is that the publications in these core journals cover only a small part of the entire corpus of nanoscience and nanotechnology literature. As we will see in Sect. 3.5 below, the widely used lexical query strategies each identify more than 500 journals that publish nanotechnology articles. The total number of publications harvested by the lexical query strategies is 5–10 times greater than the number of publications in the 10 core journals proposed by Leydesdorff and Zhou (2007). Moreover, as the technology is emerging and evolving, the set of journals that publish nanotechnology-related articles is also changing. An analysis based on a very limited number of core journals would not provide robust results (a detailed analysis is given in Sect. 3.5). Arguably, to draw a more comprehensive picture and precisely characterize the dynamics of the emerging field, one needs to resort to more complex search strategies.

3.5 Comparative analysis of search strategies

Table 1 summarizes the characteristics, strengths, and weaknesses of lexical queries, evolutionary lexical queries, citation analysis, and the search strategy using a set of core journals from the field. Given that a variety of search methodologies are employed in the literature, and that diverse conclusions are derived from the datasets constructed by these different strategies, one might inquire whether the choice of strategy affects the robustness of the analytical results. For example, would the positions of the countries in the global nanorace change if different search strategies were used to build the nanotechnology publication and patent datasets? In addition, enormous energy and effort have been invested in developing new search strategies in recent years. Almost every individual or research group tends to develop its own search queries. Are these additional efforts worthwhile, in the sense that new strategies measure the field more precisely than the old ones and also give rise to materially different conclusions?

Table 1 Strengths and weaknesses of different search strategies

Few studies have investigated these issues, except for the ones by Porter et al. (2008) and Mogoutov and Kahane (2007). Porter et al. compare the search outcomes of their strategy with those of citation analysis (Zitt and Bassecoulard 2006) and lexical query strategies (Noyons et al. 2003; Kostoff et al. 2006c). They find that the search outcomes of Zitt and Bassecoulard and of Kostoff et al. resemble theirs in terms of total number of records, share of the publications with certain keywords, country ranking and selected topical areas, authors and source journals, although some minor differences exist. The records retrieved by Noyons et al., however, contain a higher percentage of biotechnology publications than those of Porter et al. Mogoutov and Kahane (2007) compare their strategy with those of Zitt and Bassecoulard (2006) and Noyons et al. (2003) by examining the 2005 publications. Mogoutov and Kahane find that their strategy extracts twice as many articles as Noyons et al.’s strategy and 50% more than Zitt and Bassecoulard’s strategy (only its lexical query part) is able to extract. The field distribution of the articles retrieved by Noyons et al. is found to be biased toward bio-medicine as well.

In this section, we undertake a large-scale comparative study to examine the following six strategies. Unlike Porter et al. (2008) and Mogoutov and Kahane (2007), our objective is not to verify the search outcome of our own strategy, but to examine the possible influence of the different strategies on the constructed nanotechnology publication datasets and, accordingly, on the results of analyses based on these datasets.

  1. Glanzel et al. (2003) (from now on called glanzel)

  2. Leydesdorff and Zhou (2007) (from now on called leydesdorff)

  3. Mogoutov and Kahane (2007) (from now on called mogoutov)

  4. NANO* (from now on called nano*Footnote 5)

  5. Noyons et al. (2003) (from now on called noyons)

  6. Porter et al. (2008) (from now on called porter)

Out of the myriad of lexical queries, we select the above six for the following reasons. First, we aim to have a broad representation of the different categories of search strategies. nano*, glanzel, noyons and porter are standard lexical query strategies. By contrast, mogoutov falls into the category of the evolutionary lexical query, and leydesdorff is an alternative strategy based on a selection of core journals. Unfortunately, we are not able to replicate a citation analysis such as that of Zitt and Bassecoulard (2006) because of the aforementioned problem of access to the complete Web of Science database with the pre-built citation links. Second, we would also like to compare strategies with diverse levels of sophistication, measured by the length of the keyword lists. nano* is the simplest and most straightforward search strategy, providing a benchmark dataset. glanzel, mogoutov, noyons and porter are more sophisticated. In fact, nano* is part of the keyword set in glanzel, noyons and porter. Third, because the development of search strategies usually involves experts, experts may bring bias to the strategies. We are thus interested in comparing strategies that involved different groups of experts. The glanzel and noyons reports are two major studies sponsored by the European Commission. porter, developed in the United States, probably involved different nanoscientists from those employed on the EU projects. The compared studies have all looked at nanotechnology as a whole, and not at niche subjects or nanotechnology sub-fields specifically. They have also endeavored, either as part of the study or as its main aim, to map the worldwide nanotechnology landscape.

In June and July 2008, we applied nano*, glanzel, noyons, porter, mogoutov, and leydesdorff to the ISI/SCI-E database (ISI Web of Knowledge [v.4.2] and [v.4.3]), specifically to the topic field for the publication year 2006, all languages, and articles only. For nano*, this resulted in the search: “TS = nano* AND PY = 2006, All languages, Article, SCI-Expanded”. We extracted 39,889 articles by simply searching “nano*”. We were able to extract 46,177 articles by using glanzel, Footnote 6 47,002 articles using noyons, 57,900 using porter, Footnote 7 86,751 using mogoutov, and 9,027 articles published in the ten core nanotechnology journals defined by leydesdorff (Table 2).

Table 2 Search outcomes by different strategies (Science Citation Index Expanded, 2006)

We compare the records retrieved by the different strategies in the following aspects: the size of the dataset, the distribution of the articles across subject areas, the country and institution rankings, the source journals and the overlap between the datasets. porter and mogoutov cover significantly more publications than glanzel and noyons. The nanotechnology publication dataset established by porter is 25% larger than the one established by glanzel. mogoutov extracts the most records: 88% more than glanzel. This finding is consistent with Mogoutov and Kahane’s (2007) assessment of their own strategy. Moreover, since glanzel, noyons, porter, and mogoutov include nano* as well as other keywords such as “fullerene”, “quantum wires” and “quantum dots” that are undisputedly considered nanotechnology terms, the datasets retrieved by these four strategies are naturally larger than the one obtained by the relatively simplistic nano* strategy. The publications harvested from the ten core journals as defined by leydesdorff form the smallest batch, amounting to only 20% of the size of the glanzel dataset.
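
The size comparisons quoted above follow directly from the record counts reported for the six strategies. As a quick check, the short Python sketch below recomputes them; the dictionary and the helper `relative_to` are illustrative names, while the counts are the ones given in the text.

```python
# Record counts for publication year 2006, as reported in the text.
counts = {
    "nano*": 39889,
    "glanzel": 46177,
    "noyons": 47002,
    "porter": 57900,
    "mogoutov": 86751,
    "leydesdorff": 9027,
}


def relative_to(baseline, counts):
    """Size of every dataset as a multiple of the baseline dataset."""
    return {name: n / counts[baseline] for name, n in counts.items()}


ratios = relative_to("glanzel", counts)
porter_excess = round((ratios["porter"] - 1) * 100)      # percent larger than glanzel
mogoutov_excess = round((ratios["mogoutov"] - 1) * 100)  # percent larger than glanzel
leydesdorff_share = round(ratios["leydesdorff"] * 100)   # percent of glanzel's size
```

Rounded to whole percentages, these three values reproduce the 25%, 88% and 20% figures stated above.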

As seen in Table 2, glanzel, noyons, porter, and mogoutov produce identical rankings of the top seven subject areas in which most of the articles are published and of the top ten most prolific countries. The four strategies also render similar rankings of the most prolific institutions; the top three institutions identified by the four strategies are indeed identical. In terms of the top ten journals that publish the most nanotechnology articles (Table 3), glanzel and noyons give exactly the same list, with slight differences in the ranks of some journals. Only one journal in porter’s list and three in mogoutov’s list are not included in the list of glanzel and noyons. The reason that these four strategies produce broadly similar results is that they share a core set of keywords, which largely constitutes the search strategy of glanzel, and through these keywords a batch of common publications is harvested. In order to study how many publications are shared by the different strategies, we obtain a unique article set for each one. A unique article set includes the articles retrieved by only one strategy and by none of the others. As seen in Table 4, only 0.6% of the glanzel dataset is not shared with the others. In other words, 99.4% of the glanzel dataset is included in the other strategies’ datasets. About 96.4% of the noyons dataset, 88.3% of the porter dataset and 61.8% of the mogoutov dataset are shared with the other strategies.Footnote 8 Given that the datasets constructed by glanzel, noyons and porter overlap considerably, it is natural to obtain similar results from them. Although 38.2% of the mogoutov dataset is unique, the unique part produces results similar to those of the common part (see Table 4), so that the overall results based on mogoutov do not differ greatly either. This is why benchmarking the performance of individual countries in the global nanorace leads to rather consistent conclusions no matter which search strategy is employed, as reviewed in Sect. 2.2.
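
The notion of a unique article set corresponds to a plain set difference, which the following Python sketch makes explicit. The function names and the toy record identifiers are hypothetical; the actual comparison operates on the full record sets behind Table 4.

```python
def unique_records(datasets):
    """For each strategy, the records retrieved by no other strategy."""
    unique = {}
    for name, records in datasets.items():
        others = set().union(*(r for n, r in datasets.items() if n != name))
        unique[name] = records - others
    return unique


def unique_share(datasets):
    """Fraction of each strategy's dataset that is unique to it."""
    unique = unique_records(datasets)
    return {name: len(unique[name]) / len(records)
            for name, records in datasets.items() if records}


# Toy record identifiers (invented), one set per hypothetical strategy.
datasets = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {4, 6, 7, 8, 9},
}
unique = unique_records(datasets)
shares = unique_share(datasets)
```

The share of a dataset held in common with the other strategies is one minus its unique share, which is how a 0.6% unique share corresponds to 99.4% in common.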

Table 3 Ranking of the top 10 journals by different strategies in terms of publishing most of nanotechnology articles (Science Citation Index Expanded, 2006)
Table 4 The unique records extracted by different search strategies (Science Citation Index Expanded, 2006)

mogoutov, a more recently developed strategy, harvests nearly twice as many articles as glanzel does and is more sophisticated than glanzel as well. The additional insights gained from benchmark analyses based on mogoutov rather than glanzel are nevertheless marginal. The overlap of the queries and the similarity of the strategies are the very reason for this finding. However, we should be cautious not to generalize the findings or to disregard future search strategies, which may contribute to understanding newly emerging sub-fields of nanotechnology through the addition of new search terms. Continuous efforts to monitor the evolution of nanoscience and nanotechnology are recommended, even though in some sub-fields the technologies might develop at a slower pace than popularly assumed, which would result in relatively static keyword sets and stable analytical results based on these keywords (Ahmad and Al-Thubaity 2003).

leydesdorff not only harvests the fewest records, but also produces a ranking of the subject areas significantly different from those produced by glanzel, noyons, porter, and mogoutov. Because Thomson ISI can assign multiple subject categories to a journal, and glanzel, noyons, porter, and mogoutov all cover more than 500 journals, it is understandable that the subject area ranking for the publications in only 10 journals will be very different from a subject ranking based on more than 500 journals. Further examination of the journals covered by the different strategies shows that six of the top ten journals identified by leydesdorff are not selected by the other strategies (Table 3). In addition, 35% of the publications in these ten core journals do not overlap with the publications extracted by the other strategies (Table 4). In the country and institution rankings, leydesdorff differs noticeably from glanzel, noyons, porter, and mogoutov in that its search outcome is biased towards the United States and American institutions. Given that the ten core journals are relatively more frequently cited journals which publish better articles, the strength of scientific research in the United States is reflected in the distinct visibility of American scientists and institutions in these top journals. This is consistent with the finding of Leydesdorff and Wagner’s (2009) study, which also uses the ten core journals, as reviewed in Sect. 2.2.

Based on Table 3, we choose the top ten journals identified by glanzel and noyons (marked in bold text), which are also the journals most frequently selected by the other strategies, to form an experimental search strategy and compare it with leydesdorff. This experimental search strategy identifies 24,859 articles for 2006, forming a dataset more than two and a half times as large as that of leydesdorff. The rankings of the subject areas, countries and institutions produced by this strategy are, however, significantly different from those of leydesdorff, and from those of glanzel, noyons, porter, and mogoutov as well (see Table 2). We thus argue that studies that rely on a very limited number of core journals to benchmark the nanotechnology field would not produce robust results, simply because nanotechnology-related articles are published in a much wider range of journals and the different criteria used for selecting these core journals dramatically affect the analytical results.

The comparative analysis of the search outcome of leydesdorff reveals that delineating the nanotechnology field by a relatively small number of core journals would miss a significant number of relevant articles. Accordingly, the analytical results based on the core journal publications would not be consistent with those based on complete lexical queries, which cover a broad range of journals. Although harvesting the publications from a fixed set of core journals is the most straightforward strategy to implement and involves the least data-cleaning effort, the methodology does not satisfy the need to define a sound boundary for an emerging field such as nanotechnology. It would be more appropriate for studying a long-established research field with entrenched, stable research outlets.

4 Conclusion

In this paper we undertake a comprehensive review of more than 120 social science studies on nanoscience and nanotechnology, most of which are based on analyses of nanotechnology publications and patents. We discuss in depth four intellectual debates emerging from these studies, namely whether nanotechnology is an interdisciplinary field, whether nanoscience and nanotechnology are closely interlinked, whether nanotechnology development is path dependent and who is winning the global nanorace. We also probe how these different studies can reach rather different conclusions on the same topic.

In the debates on the interdisciplinarity of nanotechnology and the relationship between nanoscience and nanotechnology, the contradictory arguments to a great extent result from the diverse methodologies used to study the issues. Schummer’s (2004a) examination of author affiliations led to the argument that nanotechnology is not an interdisciplinary field, contradicting Meyer and Persson’s (1998) view based on journal classification. Similarly, if one regards only the sporadic NPL references in nanotechnology patents to nanoscience publications as the measure of science-technology linkage, one would conclude that nanoscience and nanotechnology are loosely connected. However, if all academic citations of nanotechnology patents, which increased rapidly in the past two decades, are taken into account, science seems to lay an important foundation for the development of nanotechnology. These are examples of the methodological limitations of the bibliometric approach in studying an emerging field such as nanotechnology.

The evidence presented in the debate on whether nanotechnology development is path dependent shows that nanotechnology innovative activity takes place not only in regions which performed well in previous rounds of technological development, but also in other regions where high-level human capital has more recently been carefully nurtured and where effective research networks have been successfully developed. With regard to the path dependence of technological development and knowledge accumulation within firms, the studies demonstrate that small firms rely on their existing capabilities to organize nanotechnology R&D activities, while larger firms expand their knowledge by building up new capabilities. Examinations of the performance of countries in the global nanorace invariably point to the lead of the US in this field and the sustained advantage of the traditional science and technology frontrunners such as Japan and a number of European countries. Although the entry of new players such as South Korea, Taiwan and China is phenomenal, it would be premature to say they could seriously challenge the lead of the triad countries (the US, the EU and Japan) in the near future.

In addition to reviewing the four intellectual debates, we conduct a comparative analysis of the methodologies used in the reviewed literature to harvest publications, which include lexical queries, evolutionary lexical queries, citation analysis, and the use of core journals. We find that most of the lexical queries that we compare (glanzel, noyons, porter, and mogoutov) produce very similar ranking tables for the top nanotechnology subject areas, the top journals, and the most prolific countries and institutions. The reason that these four strategies produce broadly similar results is that they share a core set of keywords, which largely forms the search strategy of glanzel, and through these keywords a batch of common publications is harvested. mogoutov harvests nearly twice as many articles as glanzel does and is more sophisticated than glanzel as well, yet the additional insights gained from benchmark analyses based on mogoutov rather than glanzel are marginal. Only leydesdorff differs significantly from the lexical query strategies in terms of the ranking of the top ten subject areas, countries, and institutions. We conclude, however, that delineating the nanotechnology field by a relatively small number of core journals, as leydesdorff does, would miss a significant part of the articles in the field. Accordingly, the analytical results based on the core journal publications would be inconsistent with the results reached through the various lexical queries.

However, we would not want to discourage the development of innovative search strategies, such as further refinements of the modular lexical query, which may reveal additional insights into newly emerging sub-fields of nanotechnology, for instance through the mining of subject-specific or new terminology. It could also be that nanotechnology is developing at a slower pace than popularly assumed, resulting in relatively static keyword sets and thus stable analytical results. Although it may seem paradoxical, from this point of view continuous efforts should still be recommended to monitor technological development through the regular updating of keyword sets, and accordingly of the search strategies, in order to discover any possible changes in the direction of the field.