ABSTRACT
In this paper, we apply EfficientNet, a scalable deep convolutional neural network, with a custom data augmentation stage to MARCO, a public protein crystallization image dataset. The MARCO dataset contains 493,214 protein crystallization images collected from several well-known institutions. In our experiments, EfficientNet exceeded the accuracies reported in previous studies, reaching an overall accuracy of 96.71% on the testing data and 91.33% on the validation data. EfficientNet also achieved 97.23% crystal detection accuracy on the testing data, which is a significant improvement over existing studies.
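The two kinds of figures quoted above, an overall accuracy and a crystal detection accuracy, can be illustrated with a minimal sketch. It assumes that crystal detection accuracy means the fraction of true crystal images that are predicted as crystals (per-class recall); the abstract does not define the term, so this reading is an assumption. The function names and the toy labels are hypothetical, and the four class names follow the MARCO labelling scheme.

```python
# Illustrative only: toy labels, not MARCO data or the paper's evaluation code.

# The four MARCO outcome classes (order here is arbitrary).
CLASSES = ("Clear", "Crystals", "Other", "Precipitate")


def overall_accuracy(y_true, y_pred):
    """Fraction of images whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def class_accuracy(y_true, y_pred, label):
    """Fraction of images truly labelled `label` that were predicted as `label`.

    Assumption: this per-class recall is what "crystal detection accuracy"
    refers to when label == "Crystals".
    """
    total = sum(t == label for t in y_true)
    if total == 0:
        raise ValueError(f"no images with true label {label!r}")
    hits = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    return hits / total


# Hypothetical toy example: six images, one crystal image misclassified.
y_true = ["Crystals", "Crystals", "Clear", "Precipitate", "Other", "Crystals"]
y_pred = ["Crystals", "Precipitate", "Clear", "Precipitate", "Other", "Crystals"]

print(f"overall accuracy: {overall_accuracy(y_true, y_pred):.4f}")
print(f"crystal accuracy: {class_accuracy(y_true, y_pred, 'Crystals'):.4f}")
```

On the toy labels, five of six images are classified correctly overall, while two of the three crystal images are detected, which shows why the two figures can differ on the same data.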
Classification of Protein Crystallization Images using EfficientNet with Data Augmentation