ABSTRACT
Editing operations such as cut, copy, paste, and correcting errors in typed text are often tedious and challenging to perform on smartphones. In this paper, we present VT, a voice- and touch-based multimodal text editing and correction method for smartphones. To edit text with VT, the user glides over a text fragment with a finger and dictates a command, such as "bold" to change the fragment's format, or taps inside a text area and speaks a command such as "highlight this paragraph". To correct text, the user taps near the erroneous text fragment and dictates the new content for substitution or insertion. VT combines touch and voice input with linguistic context, such as a language model and phrase similarity, to infer the user's editing intention, which allows it to handle ambiguity and noisy input signals. This is a major advantage over existing error correction methods (e.g., iOS's Voice Control), which require precise cursor control or text selection. Our evaluation shows that VT significantly improves the efficiency of text editing and correction on smartphones over both a touch-only method and iOS's Voice Control. In our user studies, VT reduced text editing time by 30.80% and text correction time by 29.97% compared with the touch-only method, and reduced text editing time by 30.81% and text correction time by 47.96% compared with iOS's Voice Control.
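The core idea of fusing an imprecise tap with the dictated correction can be illustrated with a minimal sketch. This is not the paper's actual decoder; it assumes a hypothetical setup where each on-screen word has a known x-coordinate, and it scores candidates by combining a Gaussian touch-likelihood with a normalized edit-distance similarity to the spoken replacement:

```python
import math

def levenshtein(a, b):
    # Classic dynamic-programming edit distance between two strings.
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[n]

def pick_target_word(words, positions, tap_x, spoken_word, sigma=40.0):
    """Score each typed word by touch proximity and string similarity
    to the dictated replacement; return the most likely edit target.
    `positions` holds each word's x-coordinate in pixels (assumed known)."""
    best, best_score = None, -math.inf
    for word, x in zip(words, positions):
        # Likelihood of the (noisy) tap landing near this word.
        p_touch = math.exp(-((tap_x - x) ** 2) / (2 * sigma ** 2))
        # Similarity: 1 minus normalized edit distance to the spoken word.
        dist = levenshtein(word.lower(), spoken_word.lower())
        p_sim = 1.0 - dist / max(len(word), len(spoken_word))
        score = p_touch * (0.5 + 0.5 * p_sim)  # blend the two signals
        if score > best_score:
            best, best_score = word, score
    return best

words = ["I", "love", "the", "whether", "today"]
positions = [10, 40, 90, 150, 230]
# A tap near x=155 plus the dictation "weather" selects "whether",
# even though the tap alone is ambiguous between neighboring words.
print(pick_target_word(words, positions, tap_x=155, spoken_word="weather"))
# → whether
```

Because the touch signal and the phrase-similarity signal are multiplied, neither has to be precise on its own: an off-target tap is rescued by the similarity term, and a similar-sounding word elsewhere on screen is rejected by the touch term.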
- Voice and Touch Based Error-tolerant Multimodal Text Editing and Correction for Smartphones