
Research on digital media animation control technology based on recurrent neural network using speech technology

Original article · International Journal of System Assurance Engineering and Management

Abstract

A vivid and lifelike virtual speaker can attract the user's attention. Constructing such a speaker requires not only an attractive static appearance but also mouth movements, facial expressions, and body movements that are genuinely synchronized with the voice. A virtual speaker is a technology in which a computer generates an animated facial image that can speak, allowing special effects such as image editing and beautification to be added to the broadcast screen. This paper proposes a speech-driven facial animation synthesis method based on a deep bidirectional long short-term memory recurrent neural network (BLSTM-RNN). The network is trained on a speaker's audio-visual dual-modal information; the active appearance model (AAM) is used to model the face image, and the AAM parameters serve as the network output. On this basis, the influence of network structure and of different input speech features on the quality of the synthesized animation is studied. Experimental results on the LIPS2008 standard evaluation corpus show that networks containing a BLSTM layer clearly outperform feed-forward networks, and that the three-layer BLSTM–forward–BLSTM structure with 256 nodes (BFB256) performs best. Combining FBank features with fundamental frequency and energy further improves the synthesis quality. The main aim of this paper is thus to study speech-driven facial animation synthesis based on a deep BLSTM-RNN and to compare the synthesis quality of different neural network structures and speech features.
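The BFB256 structure described above can be sketched as a sequence model that maps per-frame speech features to AAM parameters. The following is a minimal, hypothetical PyTorch sketch: the input dimension (40-d FBank plus fundamental frequency and energy) and the AAM output dimension are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class BFB256(nn.Module):
    """Hypothetical sketch of the BLSTM-forward-BLSTM (BFB256) structure:
    a bidirectional LSTM, a feed-forward layer, a second bidirectional LSTM,
    and a linear projection to AAM parameters. Dimensions are assumed."""

    def __init__(self, in_dim=42, aam_dim=47):
        super().__init__()
        # Bidirectional LSTM with 256 units per direction -> 512-d output
        self.blstm1 = nn.LSTM(in_dim, 256, bidirectional=True, batch_first=True)
        self.ff = nn.Linear(512, 256)       # forward (feed-forward) layer
        self.blstm2 = nn.LSTM(256, 256, bidirectional=True, batch_first=True)
        self.out = nn.Linear(512, aam_dim)  # regress AAM model parameters

    def forward(self, x):
        h, _ = self.blstm1(x)
        h = torch.tanh(self.ff(h))
        h, _ = self.blstm2(h)
        return self.out(h)

model = BFB256()
# One utterance of 100 frames; each frame a 42-d vector
# (assumed: 40-d FBank + fundamental frequency + energy)
frames = torch.randn(1, 100, 42)
aam = model(frames)
print(aam.shape)  # torch.Size([1, 100, 47])
```

In training, the AAM parameters extracted from the video track would serve as regression targets for each audio frame, so the network learns the audio-to-visual mapping end to end.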



Acknowledgements

The research is supported by postdoc fellowship granted by the Institute of Computer Technologies and Information Security, Southern Federal University, project No PD/20-03-KT.

Author information

Correspondence to Mohammad Shabaz.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest. All ethical requirements, including those concerning human or animal participation, have been met; no consent was applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, H., Sharma, A. & Shabaz, M. Research on digital media animation control technology based on recurrent neural network using speech technology. Int J Syst Assur Eng Manag 13 (Suppl 1), 564–575 (2022). https://doi.org/10.1007/s13198-021-01540-x

