Published September 27, 2017 | Version v1
Dataset Open

Dissimilarity between scientific fields

  • 1. Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
  • 2. School of Mathematics, The University of Sydney

Description

Datasets and supporting material used in the manuscript:

"Using text analysis to quantify the similarity and evolution of scientific disciplines", by L. Dias, M. Gerlach, J. Scharloth and E. G. Altmann, available at https://arxiv.org/abs/1706.08671

There are four types of information:

1. Classification

One file (classification.csv)

Provides the classification of scientific fields into domains, disciplines, and specialties, according to the ISI-Web-of-Science/OECD classification.

2. Divergences

Seven ".csv" files (D_level_dimension.csv)

Each file contains the divergence between pairs of scientific fields, as discussed in the manuscript (e.g., Fig. 1). The files correspond to combinations of three dimensions (experts, citations, and language) and three levels of classification of scientific fields (domains, disciplines, and specialties).

The first row and the first column of each file indicate the number of each scientific field; see the file "classification.csv" for details.
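As a hedged sketch, the matrix layout described above (field numbers in the first row and first column, divergences in the remaining cells) can be read into a lookup table with Python's standard csv module. The comma delimiter, the presence of a label cell in the top-left corner, the function name `load_divergence_matrix`, and the toy values are all assumptions for illustration, not details taken from the actual files.

```python
import csv
import io


def load_divergence_matrix(source):
    """Read a D_level_dimension.csv-style matrix into {(code_i, code_j): D}.

    Assumptions: comma-separated values; the first row and first column hold
    field numbers; the top-left corner cell is a label and is ignored.
    """
    reader = csv.reader(source)
    header = next(reader)
    col_codes = [cell.strip() for cell in header[1:]]
    matrix = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        row_code = row[0].strip()
        for col_code, value in zip(col_codes, row[1:]):
            matrix[(row_code, col_code)] = float(value)
    return matrix


# Toy input with made-up values, only to illustrate the layout.
example = io.StringIO(",1,2\n1,0.0,0.35\n2,0.35,0.0\n")
D = load_divergence_matrix(example)
print(D[("1", "2")])  # 0.35
```

For a real file, pass an open handle such as `open(path, newline="")`, where `path` is one of the D_level_dimension.csv files; the keys are the field numbers as strings, which can then be looked up in classification.csv.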

3. Temporal evolution

One file (D_over_time.csv)

The language divergence D_i,j between two disciplines i and j, computed for each year y from 1991 to 2014. The first two columns give the codes of disciplines i and j; see the file classification.csv described in point 1 above. The first row indicates the year, and the entries of the table are D_i,j. The entry "nan" indicates that in that year the corpora of disciplines i and j were not large enough to compute D_i,j (fewer than 20,000 types); see Materials and Methods of the paper. The results in this table were used in Fig. 4 of the paper.
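A minimal sketch of reading this table into one time series per discipline pair, skipping the "nan" years. It assumes comma-separated values and that the first two header cells are labels for the code columns; the function name `load_divergence_over_time` and the toy values are hypothetical.

```python
import csv
import io
import math


def load_divergence_over_time(source):
    """Read a D_over_time.csv-style table into {(i, j): {year: D_ij}}.

    Assumptions: comma-separated values; the first two header cells label the
    discipline-code columns and the remaining header cells are years. Cells
    equal to "nan" (corpora too small to compute D_ij) are left out.
    """
    reader = csv.reader(source)
    header = next(reader)
    years = [int(float(cell)) for cell in header[2:]]
    series = {}
    for row in reader:
        if len(row) < 2:  # skip blank or malformed lines
            continue
        i, j = row[0].strip(), row[1].strip()
        values = {}
        for year, cell in zip(years, row[2:]):
            d = float(cell)  # float() parses the "nan" marker as NaN
            if not math.isnan(d):
                values[year] = d
        series[(i, j)] = values
    return series


# Toy input with made-up values: discipline pair (1, 2), one missing year.
example = io.StringIO("i,j,1991,1992,1993\n1,2,0.41,nan,0.38\n")
series = load_divergence_over_time(example)
print(series[("1", "2")])  # {1991: 0.41, 1993: 0.38}
```

Dropping the missing years, rather than storing NaN, keeps each series directly plottable; keep the NaN values instead if the gaps themselves need to be shown.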


4. List of words

The list of contractions was obtained from the Wikipedia List of English Contractions (http://en.wikipedia.org/wiki/Wikipedia:List_of_English_contractions).

The list of stop words was constructed by merging the lists found in NLTK (http://www.nltk.org/), Gensim (http://radimrehurek.com/gensim/index.html), Mallet (http://mallet.cs.umass.edu/) and the Python machine learning toolkit scikit-learn (http://scikit-learn.org).


List of Contractions:

"she'll": 'she will', "shouldn't've": 'should not have', "she'll've": 'she will have', "don't": 'do not', "should've": 'should have', "won't": 'will not', "who'll've": 'who will have', "he's": 'he is', "when's": 'when is', "we've": 'we have', "he'd": 'he had', "ma'am": 'madam', "y'all're": 'you all are', "he'd've": 'he would have', "how'd'y": 'how do you', "shan't've": 'shall not have', "haven't": 'have not', "who's": 'who is', 'gonna': 'going to', "they'd": 'they would', "oughtn't": 'ought not', "you've": 'you have', "she'd've": 'she would have', "we'll": 'we will', "mayn't": 'may not', "they've": 'they have', "mustn't've": 'must not have', "could've": 'could have', "what've": 'what have', "mustn't": 'must not', "isn't": 'is not', "that'd've": 'that would have', "i'll": 'i will', "why's": 'why is', "you'd": 'you would', "couldn't've": 'could not have', "they'll've": 'they will have', "we'd": 'we would', "y'all'd": 'you all would', "he'll've": 'he will have', "shan't": 'shall not', "y'all'd've": 'you all would have', "there'd": 'there would', "needn't": 'need not', "where'd": 'where did', "hadn't've": 'had not have', "wouldn't've": 'would not have', "there's": 'there is', "shouldn't": 'should not', "they'll": 'they will', "needn't've": 'need not have', "mightn't": 'might not', "you're": 'you are', "so've": 'so have', "what'll": 'what will', "mightn't've": 'might not have', "hadn't": 'had not', "aren't": 'are not', "where's": 'where is', "wouldn't": 'would not', "i'd": 'i would', "weren't": 'were not', "would've": 'would have', "i'm": 'i am', "it'll": 'it will', "we'd've": 'we would have', "can't": 'cannot', "y'all": 'you all', "couldn't": 'could not', "how'll": 'how will', "doesn't": 'does not', "when've": 'when have', "how's": 'how is', "it's": 'it is', "y'all've": 'you all have', "how'd": 'how did', "we're": 'we are', "it'd": 'it would', "what're": 'what are', "i've": 'i have', "oughtn't've": 'ought not have', "what's": 'what is', "ain't": 'am not', "who'll": 'who will', "i'd've": 'i would have', "must've": 'must have', "they're": 'they are', "you'd've": 'you would have', "wasn't": 'was not', "it'll've": 'it will have', "hasn't": 'has not', "won't've": 'will not have', "so's": 'so is', "you'll've": 'you will have', "there'd've": 'there would have', "i'll've": 'i will have', "didn't": 'did not', "where've": 'where have', "they'd've": 'they would have', "why've": 'why have', "it'd've": 'it would have', "who've": 'who have', "sha'n't": 'shall not', "to've": 'to have', "o'clock": 'of the clock', "let's": 'let us', "what'll've": 'what will have', "might've": 'might have', "he'll": 'he will', "that'd": 'that would', 'wanna': 'want to', "we'll've": 'we will have', "she'd": 'she would', "can't've": 'cannot have', "you'll": 'you will', "will've": 'will have', "she's": 'she is', "that's": 'that is'
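A minimal sketch of how a mapping like this can be applied to raw text, assuming straight apostrophes and lowercased input (all listed keys are lowercase). Only a handful of entries from the list are copied in, and the function name `expand_contractions` and the regex approach are illustrative choices, not necessarily the preprocessing used in the manuscript. Because the list contains overlapping keys such as "y'all" and "y'all're", longer keys are tried first.

```python
import re

# A few entries copied from the list above; in practice the full mapping is used.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "it's": "it is",
    "we've": "we have",
    "y'all": "you all",
    "y'all're": "you all are",
}


def expand_contractions(text, mapping=CONTRACTIONS):
    """Lowercase the text and replace each listed contraction by its expansion."""
    # Sort longest first so that "y'all're" is matched before its prefix "y'all".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], text.lower())


print(expand_contractions("Don't worry, y'all're fine and it's late"))
# do not worry, you all are fine and it is late
```

Text that uses typographic apostrophes (’) would need to be normalized to straight apostrophes before this lookup can match.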

List of Stopwords:

a, about, above, after, afterward, afterwards, again, against, all, almost, along, already, also, although, always, am, among, amongst, an, and, another, any, anybody, anyhow, anyone, anything, anyway, anyways, anywhere, are, around, as, aside, at, be, became, because, become, becomes, becoming, been, before, beforehand, behind, being, below, beside, besides, between, beyond, both, but, by, can, "cant", cannot, could, "couldnt", did, "didnt", do, does, "doesnt", doing, "dont", down, downwards, due, each, eg, either, else, elsewhere, enough, etc, even, ever, every, everybody, everyone, everything, everywhere, ex, except, for, former, formerly, find, found, from, further, furthermore, get, gets, getting, go, goes, going, gone, got, gotten, had, has, "hasnt", have, having, he, hence, her, here, hereafter, hereby, herein, hereupon, hers, herself, him, himself, his, hither, hitherto, how, however, i, ie, if, ii, iii, in, indeed, insofar, instead, into, inward, is, it, its, itself, iv, just, less, may, maybe, me, meanwhile, might, mine, more, moreover, most, mostly, must, my, myself, neither, nevertheless, new, no, non, none, nonetheless, nor, not, now, nowhere, obviously, occurs, of, off, often, on, only, onto, or, other, others, otherwise, our, ours, ourselves, out, over, own, perhaps, put, quite, rather, respectively, same, several, shall, she, should, show, showed, shown, shows, similar, since, so, some, somehow, someone, something, sometime, sometimes, somewhere, still, such, than, that, "thats", the, their, theirs, them, themselves, then, thence, thenceforth, there, "theres", thereafter, thereby, therefore, therein, theres, thereupon, these, they, this, thorough, thoroughly, those, though, through, throughout, thru, thus, to, together, too, toward, towards, under, until, unto, up, upon, upwards, us, use, used, using, various, very, was, we, well, were, what, whatever, when, whence, whenever, where, whereafter, whereas, whereby, wherein, whereupon, wherever, whether, which, while, whither, who, whoever, whole, whom, whose, why, will, with, within, without, would, yet, you, your, yours, yourself, yourselves
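A hedged sketch of turning the list above into a set and filtering tokens with it. Only an excerpt of the list is copied in, kept in its original comma-separated form where some entries are wrapped in double quotes. The simple letters-and-apostrophes tokenizer and the names `parse_stopwords` and `remove_stopwords` are assumptions for illustration, not the tokenization used in the paper. Parsing into a set also collapses the entry theres, which appears in the list both quoted and unquoted.

```python
import re

# Excerpt of the stop-word list above, in its original comma-separated form,
# where some entries are wrapped in double quotes.
RAW_STOPWORDS = 'a, about, "cant", cannot, "dont", in, is, of, the, "theres", theres, using, we'


def parse_stopwords(raw):
    """Split on commas, trim whitespace and double quotes, and return a set."""
    return {w.strip().strip('"') for w in raw.split(",") if w.strip()}


STOPWORDS = parse_stopwords(RAW_STOPWORDS)


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase, tokenize on letters and apostrophes, and drop stop words.

    The tokenizer is a simplifying assumption, not the one used in the paper.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stopwords]


print(remove_stopwords("We quantify the similarity of fields using text"))
# ['quantify', 'similarity', 'fields', 'text']
```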

Notes

Results were obtained from articles indexed in the ISI Web of Science (Thomson Reuters, http://isiknowledge.com/). We thank M. Palzenberger and the Max Planck Digital Library for providing access to the data.

Files (1.6 MB)

MD5 checksum                              Size
md5:dcfc2b8d23b2199227806b76a76226b2      20.2 kB
md5:4427ff9f30b0e365ad911b7c74e6ace5      18.5 kB
md5:9575f8aee2a87c8abceef766c977b8db      3.3 kB
md5:80d9c1d55fc6956945bb9d3535b118a5      18.5 kB
md5:c32eaa19ad5b98e5403b35af73d58ef3      955 Bytes
md5:22fa63c41323126a80d745f6479e5952      117.6 kB
md5:144ed1c9feb5835501230d8c53e0d85b      621.3 kB
md5:f7551221a9918052c1046c9812a81359      103.5 kB
md5:a1d3d22b2eeb3717841b8a27fda30ce9      652.3 kB

Additional details

References

  • "Using text analysis to quantify the similarity and evolution of scientific disciplines", by L. Dias, M. Gerlach, J. Scharloth and E. G. Altmann, available at https://arxiv.org/abs/1706.08671