Dissimilarity between scientific fields
Creators
- 1. Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
- 2. School of Mathematics, The University of Sydney
Description
Datasets and supporting material used in the manuscript
"Using text analysis to quantify the similarity and evolution of scientific disciplines", by L. Dias, M. Gerlach, J. Scharloth and E. G. Altmann, available at https://arxiv.org/abs/1706.08671
There are four types of information:
1. Classification
One file (classification.csv)
Provides the classification of scientific fields in domains, disciplines, and specialties, according to the ISI-Web-of-Science/OECD classification.
2. Divergences
Seven ".csv" files named D_level_dimension.csv
Each file contains the divergence between pairs of scientific fields, as discussed in the manuscript (e.g., Fig. 1). The files correspond to combinations of three dimensions (experts, citations, and language) and three levels of classification of scientific fields (domains, disciplines, and specialties).
The first row and the first column in each file indicate the number of the scientific field; see the file "classification.csv" for details.
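The layout described above (field numbers in the first row and first column, divergences in the remaining cells) can be read with a few lines of Python. The following is only a minimal sketch: it assumes comma-separated values and an ignorable top-left corner cell, neither of which is stated in this record. The function name `load_divergence_matrix` and the sample values are hypothetical and are not taken from the dataset.

```python
import csv
import io


def load_divergence_matrix(csv_text):
    """Parse a matrix whose first row and first column hold field numbers.

    Returns a nested dict: matrix[row_field][column_field] -> divergence.
    Field numbers are kept as strings so they can be matched against
    classification.csv without assuming a numeric type.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Skip the corner cell; the rest of the first row labels the columns.
    column_fields = [cell.strip() for cell in rows[0][1:]]
    matrix = {}
    for row in rows[1:]:
        if not row:
            continue  # tolerate blank lines
        row_field = row[0].strip()
        matrix[row_field] = {
            column_field: float(value)
            for column_field, value in zip(column_fields, row[1:])
        }
    return matrix


# Hypothetical sample in the assumed layout (NOT real values from the dataset).
SAMPLE = """,1,2,3
1,0.0,0.2,0.5
2,0.2,0.0,0.4
3,0.5,0.4,0.0
"""

divergences = load_divergence_matrix(SAMPLE)
print(divergences["1"]["3"])
```

To use it on a downloaded file, pass the file contents, for example `load_divergence_matrix(open(path).read())`, where `path` points to one of the D_level_dimension.csv files.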
3. Temporal evolution
One file (D_over_time.csv)
The language divergence between two disciplines, D_i,j, computed for different years (y in [1991-2014]). The first two columns indicate the codes of the disciplines i and j; see the file classification.csv mentioned in point 1 above. The first row indicates the year. The entries of the table are D_i,j. The entry "nan" indicates that in that year the corpora of disciplines i and j were not long enough for the computation of D_i,j (fewer than 20,000 types); see Materials and Methods of the paper. The results in this table were used in Fig. 4 of the paper.
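As an illustration of the table layout just described, the sketch below turns the file into one time series per pair of disciplines and drops the "nan" entries. It assumes comma-separated values, that the first two cells of the header row are labels, and that years are written as plain numbers; these details are assumptions, not statements from this record. The function name `load_divergence_over_time` and the sample values are hypothetical.

```python
import csv
import io
import math


def load_divergence_over_time(csv_text):
    """Return {(i, j): {year: D_ij}} and skip years whose entry is "nan"."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Header row: two label cells (assumed), then one cell per year.
    years = [int(float(cell)) for cell in rows[0][2:]]
    series = {}
    for row in rows[1:]:
        if not row:
            continue  # tolerate blank lines
        pair = (row[0].strip(), row[1].strip())
        # float("nan") yields NaN, so missing years can be filtered out.
        values = [float(cell) for cell in row[2:]]
        series[pair] = {
            year: value
            for year, value in zip(years, values)
            if not math.isnan(value)
        }
    return series


# Hypothetical sample in the assumed layout (NOT real values from the dataset).
SAMPLE = """i,j,1991,1992,1993
1,2,nan,0.31,0.30
1,3,0.52,nan,0.50
"""

series = load_divergence_over_time(SAMPLE)
print(sorted(series[("1", "2")]))
```

Dropping the "nan" years rather than storing them keeps each series restricted to the years in which D_i,j could actually be computed; whether that is the right choice depends on the analysis.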
4. List of words
The list of contractions was obtained from the Wikipedia List of English Contractions (http://en.wikipedia.org/wiki/Wikipedia:List_of_English_contractions).
The list of stop words was constructed by merging the lists found in NLTK (http://www.nltk.org/), Gensim (http://radimrehurek.com/gensim/index.html), Mallet (http://mallet.cs.umass.edu/) and the Python Machine Learning Toolkit (http://scikit-learn.org).
List of Contractions:
"she'll": 'she will', "shouldn't've": 'should not have', "she'll've": 'she will have', "don't": 'do not', "should've": 'should have', "won't": 'will not', "who'll've": 'who will have', "he's": 'he is', "when's": 'when is', "we've": 'we have', "he'd": 'he had', "ma'am": 'madam', "y'all're": 'you all are', "he'd've": 'he would have', "how'd'y": 'how do you', "shan't've": 'shall not have', "haven't": 'have not', "who's": 'who is', 'gonna': 'going to', "they'd": 'they would', "oughtn't": 'ought not', "you've": 'you have', "she'd've": 'she would have', "we'll": 'we will', "mayn't": 'may not', "they've": 'they have', "mustn't've": 'must not have', "could've": 'could have', "what've": 'what have', "mustn't": 'must not', "isn't": 'is not', "that'd've": 'that would have', "i'll": 'i will', "why's": 'why is', "you'd": 'you would', "couldn't've": 'could not have', "they'll've": 'they will have', "we'd": 'we would', "y'all'd": 'you all would', "he'll've": 'he will have', "shan't": 'shall not', "y'all'd've": 'you all would have', "there'd": 'there would', "needn't": 'need not', "where'd": 'where did', "hadn't've": 'had not have', "wouldn't've": 'would not have', "there's": 'there is', "shouldn't": 'should not', "they'll": 'they will', "needn't've": 'need not have', "mightn't": 'might not', "you're": 'you are', "so've": 'so have', "what'll": 'what will', "mightn't've": 'might not have', "hadn't": 'had not', "aren't": 'are not', "where's": 'where is', "wouldn't": 'would not', "i'd": 'i would', "weren't": 'were not', "would've": 'would have', "i'm": 'i am', "it'll": 'it will', "we'd've": 'we would have', "can't": 'cannot', "y'all": 'you all', "couldn't": 'could not', "how'll": 'how will', "doesn't": 'does not', "when've": 'when have', "how's": 'how is', "it's": 'it is', "y'all've": 'you all have', "how'd": 'how did', "we're": 'we are', "it'd": 'it would', "what're": 'what are', "i've": 'i have', "oughtn't've": 'ought not have', "what's": 'what is', "ain't": 'am not', "who'll": 
'who will', "i'd've": 'i would have', "must've": 'must have', "they're": 'they are', "you'd've": 'you would have', "wasn't": 'was not', "it'll've": 'it will have', "hasn't": 'has not', "won't've": 'will not have', "so's": 'so is', "you'll've": 'you will have', "there'd've": 'there would have', "i'll've": 'i will have', "didn't": 'did not', "where've": 'where have', "they'd've": 'they would have', "why've": 'why have', "it'd've": 'it would have', "who've": 'who have', "sha'n't": 'shall not', "to've": 'to have', "o'clock": 'of the clock', "let's": 'let us', "what'll've": 'what will have', "might've": 'might have', "he'll": 'he will', "that'd": 'that would', 'wanna': 'want to', "we'll've": 'we will have', "she'd": 'she would', "can't've": 'cannot have', "you'll": 'you will', "will've": 'will have', "she's": 'she is', "that's": 'that is'
List of Stopwords:
a, about, above, after, afterward, afterwards, again, against, all, almost, along, already, also, although, always, am, among, amongst, an, and, another, any, anybody, anyhow, anyone, anything, anyway, anyways, anywhere, are, around, as, aside, at, be, became, because, become, becomes, becoming, been, before, beforehand, behind, being, below, beside, besides, between, beyond, both, but, by, can, "cant", cannot, could, "couldnt", did, "didnt", do, does, "doesnt", doing, "dont", down, downwards, due, each, eg, either, else, elsewhere, enough, etc, even, ever, every, everybody, everyone, everything, everywhere, ex, except, for, former, formerly, find, found, from, further, furthermore, get, gets, getting, go, goes, going, gone, got, gotten, had, has, "hasnt", have, having, he, hence, her, here, hereafter, hereby, herein, hereupon, hers, herself, him, himself, his, hither, hitherto, how, however, i, ie, if, ii, iii, in, indeed, insofar, instead, into, inward, is, it, its, itself, iv, just, less, may, maybe, me, meanwhile, might, mine, more, moreover, most, mostly, must, my, myself, neither, nevertheless, new, no, non, none, nonetheless, nor, not, now, nowhere, obviously, occurs, of, off, often, on, only, onto, or, other, others, otherwise, our, ours, ourselves, out, over, own, perhaps, put, quite, rather, respectively, same, several, shall, she, should, show, showed, shown, shows, similar, since, so, some, somehow, someone, something, sometime, sometimes, somewhere, still, such, than, that, "thats", the, their, theirs, them, themselves, then, thence, thenceforth, there, "theres", thereafter, thereby, therefore, therein, theres, thereupon, these, they, this, thorough, thoroughly, those, though, through, throughout, thru, thus, to, together, too, toward, towards, under, until, unto, up, upon, upwards, us, use, used, using, various, very, was, we, well, were, what, whatever, when, whence, whenever, where, whereafter, whereas, whereby, wherein, whereupon, wherever, 
whether, which, while, whither, who, whoever, whole, whom, whose, why, will, with, within, without, would, yet, you, your, yours, yourself, yourselves
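One plausible way to apply the two lists is to expand contractions first and then discard stop words. The sketch below illustrates that idea on small excerpts of the lists above. The order of the steps, the lowercase regular-expression tokenizer, and the function name `preprocess` are assumptions made for illustration; the exact preprocessing pipeline is described in the paper, not in this record.

```python
import re

# Small excerpts of the two lists above; the full lists would be used in practice.
CONTRACTIONS = {
    "don't": "do not",
    "it's": "it is",
    "we've": "we have",
    "can't": "cannot",
}
STOPWORDS = {
    "a", "and", "cannot", "do", "have", "is", "it", "not", "of", "that", "the", "we",
}


def preprocess(text):
    """Lowercase, tokenize, expand contractions, then remove stop words."""
    # Assumed tokenizer: runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    expanded = []
    for token in tokens:
        # A contraction may expand into several words, so split the expansion.
        expanded.extend(CONTRACTIONS.get(token, token).split())
    return [token for token in expanded if token not in STOPWORDS]


print(preprocess("It's clear that we don't measure the divergence of fields"))
```

Expanding before filtering matters here: the stop word list contains apostrophe-free forms such as "dont" but not "don't", so a contraction left unexpanded would pass through the filter.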
Files
(1.6 MB)
MD5 checksum | Size |
---|---|
md5:dcfc2b8d23b2199227806b76a76226b2 | 20.2 kB |
md5:4427ff9f30b0e365ad911b7c74e6ace5 | 18.5 kB |
md5:9575f8aee2a87c8abceef766c977b8db | 3.3 kB |
md5:80d9c1d55fc6956945bb9d3535b118a5 | 18.5 kB |
md5:c32eaa19ad5b98e5403b35af73d58ef3 | 955 Bytes |
md5:22fa63c41323126a80d745f6479e5952 | 117.6 kB |
md5:144ed1c9feb5835501230d8c53e0d85b | 621.3 kB |
md5:f7551221a9918052c1046c9812a81359 | 103.5 kB |
md5:a1d3d22b2eeb3717841b8a27fda30ce9 | 652.3 kB |
Additional details
References
- "Using text analysis to quantify the similarity and evolution of scientific disciplines", by L. Dias, M. Gerlach, J. Scharloth and E. G. Altmann, available at https://arxiv.org/abs/1706.08671