A preliminary text classification of the precursory accelerating seismicity corpus: inference on some theoretical trends in earthquake predictability research from 1988 to 2018

  • Original Article
  • Journal of Seismology

Abstract

Text analytics based on supervised machine learning has shown great promise in a multitude of domains but has yet to be applied to seismology. We describe some common classifiers (Naïve Bayes, k-Nearest Neighbors, Support Vector Machines, and Random Forests) as well as the standard steps of supervised learning (training, validation of model parameter adjustments, and testing). To illustrate text classification on a seismological corpus, we use a hundred articles related to the topic of precursory accelerating seismicity, spanning from 1988 to 2010. This corpus was labelled by Mignan [Tectonophysics, 2011] according to whether the precursor is explained by critical processes (i.e., cascade triggering) or by other processes (such as a signature of main fault loading). We investigate how the classification process can be automated to help analyze larger corpora and thereby better understand trends in earthquake predictability research. We find that the Naïve Bayes model performs best, in agreement with the machine learning literature for the case of small datasets, with cross-validation accuracies showing the model’s predictive ability for both a binary classification (“critical process” or not) and a multiclass classification (“non-critical process,” “agnostic,” “critical process assumed,” “critical process demonstrated”). Prediction on a dozen articles published since 2011 shows, however, weak generalization, which can be explained, in part, by the empirical variance of the small training set. This preliminary study demonstrates the potential of supervised learning to reveal textual patterns in the seismological literature. Manual labelling remains essential but is made transparent by an investigation of Naïve Bayes keyword posterior probabilities.
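To make the workflow summarized in the abstract more concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with leave-one-out cross-validation and a per-keyword posterior probability. It is not the article's implementation: the six-document toy corpus, the labels, the whitespace tokenizer, the Laplace smoothing value, and all function names (train_nb, predict_nb, loo_accuracy, keyword_posterior) are assumptions made purely for illustration, and only the Python standard library is used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (NOT the labelled corpus of the article).
DOCS = [
    "critical point cascade triggering power law",
    "self organized criticality cascade of small events",
    "critical phenomena triggering power law acceleration",
    "fault loading stress accumulation silent slip",
    "stress accumulation on the main fault loading",
    "main fault stress shadow loading model",
]
LABELS = ["critical"] * 3 + ["non-critical"] * 3


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing (alpha)."""
    vocab = {w for d in docs for w in tokenize(d)}
    word_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        word_counts[y].update(tokenize(d))
    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for y, n_docs in Counter(labels).items():
        model["log_prior"][y] = math.log(n_docs / len(docs))
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        model["log_lik"][y] = {
            w: math.log((word_counts[y][w] + alpha) / total) for w in vocab
        }
    return model


def predict_nb(model, doc):
    """Return the class with the highest log-posterior; unseen words are skipped."""
    scores = {}
    for y, log_prior in model["log_prior"].items():
        scores[y] = log_prior + sum(
            model["log_lik"][y][w] for w in tokenize(doc) if w in model["vocab"]
        )
    return max(scores, key=scores.get)


def loo_accuracy(docs, labels):
    """Leave-one-out cross-validation accuracy."""
    hits = 0
    for i in range(len(docs)):
        model = train_nb(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict_nb(model, docs[i]) == labels[i]
    return hits / len(docs)


def keyword_posterior(model, word):
    """P(class | word) for one in-vocabulary keyword, via Bayes' rule."""
    joint = {
        y: math.exp(model["log_prior"][y] + model["log_lik"][y][word])
        for y in model["log_prior"]
    }
    norm = sum(joint.values())
    return {y: p / norm for y, p in joint.items()}


if __name__ == "__main__":
    model = train_nb(DOCS, LABELS)
    print(predict_nb(model, "cascade triggering near a critical point"))
    print(loo_accuracy(DOCS, LABELS))
    print(keyword_posterior(model, "cascade"))
```

On a real corpus the tokenizer would normally be replaced by proper preprocessing (stop-word removal, stemming), and the keyword posteriors could be ranked to inspect which terms drive each label, in the spirit of the transparency argument made in the abstract.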

References

  • Adamaki AK, Roberts RG (2017) Precursory activity before larger events in Greece revealed by aggregated seismicity data. Pure Appl Geophys 174:1331–1343. https://doi.org/10.1007/s00024-017-1465-6

  • Aggarwal CC (2018) Machine learning for text. Springer Nature, 493 pp. https://doi.org/10.1007/978-3-319-73531-3

  • Bak P, Tang C (1989) Earthquakes as a self-organized critical phenomenon. J Geophys Res 94:15,635–15,637

  • Bennett KP, Campbell C (2000) Support vector machines: hype or hallelujah? SIGKDD Explor 2:1–13

  • Benoit K (2018) Quantitative analysis of textual data, package 'quanteda', available at https://cran.r-project.org/web/packages/quanteda/ (last accessed August 2018)

  • Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  • Bouchon M, Marsan D (2015) Reply to 'Artificial seismic acceleration'. Nat Geosci 8:83

  • Bouchon M, Durand V, Marsan D, Karabulut H, Schmittbuhl J (2013) The long precursory phase of most large interplate earthquakes. Nat Geosci 6:299–302

  • Breiman L (2001) Random forests. Mach Learn 45:5–32

  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Chapman & Hall/CRC, Taylor & Francis Group 358 pp

  • Bufe CG, Varnes DJ (1993) Predictive modeling of the seismic cycle of the greater San Francisco Bay region. J Geophys Res 98:9,871–9,883

  • Christou EV, Karakaisis G, Scordilis E (2016) Time dependent seismicity along the western coast of Canada. Res Geophys 5:5730

  • Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297

  • Cover TM, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory IT-13:21–27

  • De Santis A, Cianchini G, Di Giovambattista R (2015) Accelerating moment release revisited: examples of application to Italian seismic sequences. Tectonophysics 639:82–98. https://doi.org/10.1016/j.tecto.2014.11.015

  • Domingos P, Pazzani M (1997) On the optimality of the simple Bayesian classifier under zero-one loss. Mach Learn 29:103–130

  • Felzer KR, Page MT, Michael AJ (2015) Artificial seismic acceleration. Nat Geosci 8:82–83

  • Forman G (2008) BNS feature scaling: an improved representation over TF-IDF for SVM text classification, ACM 17th Conf. Info. and Knowl. Management 263-270

  • Freund Y, Schapire RE (1999) A short introduction to boosting. J Japanese Soc AI 14:771–780

  • Geller RJ (1997) Earthquake prediction: a critical review. Geophys J Int 131:425–450

  • Glez-Peña D, Lourenço A, Lopez-Fernandez H, Reboiro-Jato M, Fdez-Riverola F (2013) Web scraping technologies in an API world. Brief Bioinform 15:788–797

  • Grimmer J, Stewart BM (2013) Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Polit Anal 21:267–297. https://doi.org/10.1093/pan/mps028

  • Grün B, Hornik K (2017) Topic models, package 'topicmodels', available at https://cran.r-project.org/web/packages/topicmodels/ (last accessed August 2018)

  • Guilhem A, Bürgmann R, Freed AM, Tabrez Ali S (2013) Testing the accelerating moment release (AMR) hypothesis in areas of high stress. Geophys J Int 195:785–798. https://doi.org/10.1093/gji/ggt298

  • Hardebeck JL, Felzer KR, Michael AJ (2008) Improved tests reveal that the accelerating moment release hypothesis is statistically insignificant. J Geophys Res 113:B08310. https://doi.org/10.1029/2007JB005410

  • Hechenbichler K, Schliep KP (2004) Weighted k-nearest-neighbor techniques and ordinal classification. Discussion paper 399, SFB 386, Ludwig-Maximilians University, Munich

  • Hough S (2010) Predicting the unpredictable: the tumultuous science of earthquake prediction. Princeton University Press 272 pp

  • Huang H, Meng L (2018) Slow unlocking processes preceding the 2015 Mw 8.4 Illapel, Chile, earthquake. Geophys Res Lett 45:3914–3922. https://doi.org/10.1029/2018GL077060

  • Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31:651–666. https://doi.org/10.1016/j.patrec.2009.09.011

  • Jiang C, Wu Z (2012) Insights into the long-to-intermediate-term pre-shock accelerating moment release (AMR) from the March 11, 2011, off the Pacific coast of Tohoku, Japan, M9 earthquake. Earth Planets Space 64:765–769

  • Jiang C, Wu Z (2013) Intermediate-term medium-range precursory accelerating seismicity prior to the 12 May 2008, Wenchuan earthquake. Pure Appl Geophys 170:209–219. https://doi.org/10.1007/s00024-011-0413-0

  • Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. Mach Learn ECML-98:137–142

  • Karakaisis GF, Papazachos CB, Scordilis EM (2013) Recent reliable observations and improved tests on synthetic catalogs with spatiotemporal clustering verify precursory decelerating-accelerating seismicity. J Seismol 17:1063–1072. https://doi.org/10.1007/s10950-013-9372-5

  • Karatzoglou A, Smola A, Hornik K, Zeileis A (2004) kernlab - an S4 package for kernel methods in R. J Stat Softw 11:1–20

  • Kazemian J, Hatami MR (2017) Temporal variations of seismic parameters in Tehran region. Pure Appl Geophys 174:3841–3852. https://doi.org/10.1007/s00024-017-1549-3

  • Kharde VA, Sonawane SS (2016) Sentiment analysis of Twitter data: a survey of techniques. Int J Comput Appl 139:5–15

  • King GCP (1983) The accommodation of large strains in the upper lithosphere of the earth and other solids by self-similar fault systems: the geometrical origin of b-value. Pure Appl Geophys 121:761–815

  • Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI'95 Proceed 14th Int Joint Conf AI 2:1137–1143

  • Kuhn T (1970) The structure of scientific revolutions, enlarged. In: International encyclopedia of unified science, 2nd edn. The University of Chicago Press 210 pp

  • Lagios E, Papadimitriou P, Novali F, Sakkas V, Fumagalli A, Vlachou K, Del Conte S (2012) Combined seismicity pattern analysis, DGPS and PSInSAR studies in the broader area of Cephalonia (Greece). Tectonophysics 524-525:43–58. https://doi.org/10.1016/j.tecto.2011.12.015

  • Liaw A, Wiener M (2018) Breiman and Cutler's random forests for classification and regression, package 'randomForest', available at https://cran.r-project.org/web/packages/randomForest/ (last accessed August 2018)

  • Mignan A (2011) Retrospective on the accelerating seismic release (ASR) hypothesis: controversy and new horizons. Tectonophysics 505:1–16. https://doi.org/10.1016/j.tecto.2011.03.010

  • Mignan A (2012) Seismicity precursors to large earthquakes unified in a stress accumulation framework. Geophys Res Lett 39:L21308. https://doi.org/10.1029/2012GL053946

  • Mignan A (2014) The debate on the prognostic value of earthquake foreshocks: a meta-analysis. Sci Rep 4:4099. https://doi.org/10.1038/srep04099

  • Mignan A (2015) Modeling aftershocks as a stretched exponential relaxation. Geophys Res Lett 42:9726–9732. https://doi.org/10.1002/2015GL066232

  • Mignan A, King GCP, Bowman D (2007) A mathematical formulation of accelerating moment release based on the stress accumulation model. J Geophys Res 112:B07308. https://doi.org/10.1029/2006JB004671

  • Mouselimis L (2018) Kernel k nearest neighbors, package 'KernelKnn', available at https://cran.r-project.org/web/packages/KernelKnn/ (last accessed August 2018)

  • Ng AY, Jordan MI (2001) On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. Adv Neural Inf Proces Syst 14:605–610

  • Ng S-K, Wong M (1999) Toward routine automatic pathway discovery from on-line scientific text abstracts. Genome Inform 10:104–112

  • Ogata Y (1988) Statistical models for earthquake occurrences and residual analysis for point processes. J Am Stat Assoc 83:9–27

  • Papadopoulos GA (1988) Long-term accelerating foreshock activity may indicate the occurrence time of a strong shock in the Western Hellenic Arc. Tectonophysics 152:179–192

  • Papazachos BC, Karakaisis GF, Papazachos CB, Scordilis EM (2007) Evaluation of the results for an intermediate-term prediction of the 8 January 2006 Mw 6.9 Cythera earthquake in the southwestern Aegean. Bull Seismol Soc Am 97:347–352. https://doi.org/10.1785/0120060075

  • Pearce D, Rantala V (1983) New foundations for metascience. Synthese 56:1–26

  • Pliakis D, Papakostas T, Vallianatos F (2012) A first principles approach to understand the physics of precursory accelerating seismicity. Ann Geophys 55:165–170. https://doi.org/10.4401/ag-5363

  • Rokach L (2010) Ensemble-based classifiers. Artif Intell Rev 33:1–39. https://doi.org/10.1007/s10462-009-9124-7

  • Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536

  • Salton G, McGill M (eds) (1983) Introduction to modern information retrieval. McGraw-Hill

  • Sammis CG, Sornette D (2002) Positive feedback, memory, and the predictability of earthquakes. PNAS 99:2501–2508. https://doi.org/10.1073/pnas.012580999

  • Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34:1–47

  • Seif S, Mignan A, Zechar JD, Werner MJ, Wiemer S (2017) Estimating ETAS: the effects of truncation, missing data, and model assumptions. J Geophys Res Solid Earth 122:449–469. https://doi.org/10.1002/2016JB012809

  • Seif S, Zechar JD, Mignan A, Nandan S, Wiemer S (2018) Foreshocks and their potential deviation from general seismicity. Bull Seismol Soc Am 109:1–18. https://doi.org/10.1785/0120170188

  • Sornette D (2000) Critical phenomena in natural sciences, chaos, fractal, self-organization and disorder: concepts and tools. Springer 434 pp

  • Steinwart I, Christmann A (2008) Support vector machines, information science and statistics. Springer 601 pp

  • Tsytsarau M, Palpanas T (2012) Survey on mining subjective data on the web. Data Min Knowl Disc 24:478–514. https://doi.org/10.1007/s10618-011-0238-6

  • Welbers K, Van Atteveldt W, Benoit K (2017) Text analysis in R. Commun Methods Meas 11:245–265. https://doi.org/10.1080/19312458.2017.1387238

Acknowledgments

I thank Pablo Nieto and Marco Broccardo for discussions on the topic of text classification, as well as reviewer Riccardo Zaccarelli for his valuable comments.

Data and resources

All the corpus articles are available on journal websites. The corpus meta-data and labelling are provided in the supplementary material to this article.

Author information

Corresponding author

Correspondence to A. Mignan.

Electronic supplementary material

  • ESM 1 (DOCX 24 kb)
  • ESM 2 (JSON 171 kb)
  • ESM 3 (JSON 5 kb)
  • ESM 4 (JSON 5 kb)

About this article

Cite this article

Mignan, A. A preliminary text classification of the precursory accelerating seismicity corpus: inference on some theoretical trends in earthquake predictability research from 1988 to 2018. J Seismol 23, 771–785 (2019). https://doi.org/10.1007/s10950-019-09833-2

  • DOI: https://doi.org/10.1007/s10950-019-09833-2
