Classification of Protein Crystallization Images using EfficientNet with Data Augmentation

Authors:
David William Edwards II, Troy University, USA
Imren Dinc, Troy University, USA

CSBio2020: CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics, November 2020, Pages 54–60
https://doi.org/10.1145/3429210.3429220

Published: 20 November 2020

ABSTRACT

In this paper, we applied EfficientNet, a scalable deep convolutional neural network, with a custom data augmentation stage to MARCO, a public protein crystallization image dataset. The MARCO dataset contains 493,214 protein crystallization images collected from several well-known institutions. In our experiments, EfficientNet exceeded the accuracies reported in previous studies, reaching an overall accuracy of 96.71% on the testing data and 91.33% on the validation data. EfficientNet also achieved 97.23% crystal detection accuracy on the testing data, which is a significant improvement over existing studies.
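To make the term "scalable" concrete, the following is a minimal sketch of the compound scaling rule described by Tan and Le (2019), in which network depth, width, and input resolution grow together under a single coefficient. The base values 1.2, 1.1, and 1.15 are those reported in the EfficientNet paper; the function name `compound_scale` is hypothetical, and the abstract does not state which EfficientNet variant or scaling setting was used for MARCO, so this illustrates the general idea only. Released EfficientNet variants use rounded, hand-adjusted multipliers that differ from these raw powers.

```python
# Base coefficients reported by Tan and Le (2019) for EfficientNet:
# ALPHA scales depth, BETA scales width, GAMMA scales input resolution.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi: float) -> dict:
    """Return scaling multipliers for a compound coefficient phi.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # Compute cost grows roughly with depth * width^2 * resolution^2,
    # so each unit increase in phi roughly doubles the FLOPs.
    flops = depth * width ** 2 * resolution ** 2
    return {
        "depth": depth,
        "width": width,
        "resolution": resolution,
        "flops": flops,
    }


if __name__ == "__main__":
    for phi in range(4):
        multipliers = compound_scale(phi)
        print(phi, {k: round(v, 3) for k, v in multipliers.items()})
```

Because the base values satisfy the approximate constraint that depth times width squared times resolution squared is close to 2, a larger coefficient trades a predictable increase in compute for a larger model.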

Index Terms
Computing methodologies > Machine learning > Machine learning approaches

Published in

CSBio2020: CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics
November 2020
110 pages
ISBN: 9781450388238
DOI: 10.1145/3429210

Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
convolutional neural networks
deep learning
efficientnet
protein crystallization
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall acceptance rate: 23 of 37 submissions, 62%