DOI: 10.1145/3472749.3474742
Research Article
Open Access

Voice and Touch Based Error-tolerant Multimodal Text Editing and Correction for Smartphones

Published: 12 October 2021

ABSTRACT

Editing operations such as cut, copy, and paste, as well as correcting errors in typed text, are often tedious and challenging to perform on smartphones. In this paper, we present VT, a voice- and touch-based multimodal text editing and correction method for smartphones. To edit text with VT, the user glides over a text fragment with a finger and dictates a command such as "bold" to change the fragment's format, or taps inside a text area and speaks a command such as "highlight this paragraph". To correct text, the user taps near the erroneous text fragment and dictates the new content for substitution or insertion. VT combines touch and voice input with linguistic context, such as a language model and phrase similarity, to infer the user's editing intention, allowing it to handle ambiguous and noisy input signals. This is a major advantage over existing error-correction methods (e.g., iOS Voice Control), which require precise cursor control or text selection. Our evaluation shows that VT significantly improves the efficiency of text editing and text correction on smartphones over both a touch-only method and iOS Voice Control. In our user studies, VT reduced text editing time by 30.80% and text correction time by 29.97% relative to the touch-only method, and reduced text editing time by 30.81% and text correction time by 47.96% relative to iOS Voice Control.
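For illustration only, the Python sketch below shows one way the fusion described above could be scored: candidate target fragments are ranked by combining a Gaussian touch-proximity term with string similarity between the dictated phrase and each candidate. Every function name, weight, and formula here is an expository assumption rather than the paper's actual algorithm, and the language-model term mentioned in the abstract is omitted for brevity.

```python
# Illustrative sketch only: rank which on-screen text fragment a noisy tap
# plus a dictated phrase most likely targets. The spatial model, similarity
# measure, and weights are assumptions, not VT's published algorithm; a real
# system would also include a language-model score, omitted here.
import math
from difflib import SequenceMatcher


def spatial_score(tap_x, tap_y, frag_x, frag_y, sigma=60.0):
    """Likelihood that an imprecise tap at (tap_x, tap_y) targets a fragment
    centered at (frag_x, frag_y); a Gaussian over pixel distance (assumed)."""
    d2 = (tap_x - frag_x) ** 2 + (tap_y - frag_y) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))


def phrase_similarity(spoken, fragment):
    """Cheap stand-in for a phrase-similarity component: corrections tend to
    resemble the erroneous text they replace (e.g., 'their' vs. 'there')."""
    return SequenceMatcher(None, spoken.lower(), fragment.lower()).ratio()


def rank_candidates(tap, spoken, candidates, w_touch=0.5, w_sim=0.5):
    """Rank (fragment, center) candidates by a weighted combination of
    touch proximity and similarity to the dictated replacement phrase."""
    scored = []
    for fragment, (cx, cy) in candidates:
        score = (w_touch * spatial_score(tap[0], tap[1], cx, cy)
                 + w_sim * phrase_similarity(spoken, fragment))
        scored.append((score, fragment))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Three words near the tap; the user taps close to "there" and says "their".
    candidates = [("there", (120, 300)), ("planet", (190, 300)), ("the", (250, 300))]
    print(rank_candidates((130, 305), "their", candidates))
```

Under these assumptions, "there" wins because it is both closest to the tap and most similar to the dictated word, which is the intuition behind tolerating an imprecise tap instead of demanding exact cursor placement.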
