This week, the 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022) is being held in Incheon, South Korea, representing one of the world’s most extensive conferences on research and technology of spoken language understanding and processing. Over 2,000 experts in speech-related research fields gather to take part in oral presentations and poster sessions and to collaborate with streamed events across the globe.
We are excited to be a Diamond Sponsor of INTERSPEECH 2022, where we will be showcasing nearly 50 research publications and supporting a number of workshops, special sessions and tutorials. We welcome in-person attendees to drop by the Google booth to meet our researchers and participate in Q&As and demonstrations of some of our latest speech technologies, which help to improve accessibility and provide convenience in communication for billions of users. In addition, online attendees are encouraged to visit our virtual booth in GatherTown, where you can get up-to-date information on research and opportunities at Google. You can also learn more about the Google research being presented at INTERSPEECH 2022 below (Google affiliations in bold).
Organizing Committee
Industry Liaisons include: Bhuvana Ramabhadran
Area Chairs include: John Hershey, Heiga Zen, Shrikanth Narayanan, Bastiaan Kleijn
ISCA Fellows
Include: Tara Sainath, Heiga Zen
Publications
Production Federated Keyword Spotting via Distillation, Filtering, and Joint Federated-Centralized Training
Andrew Hard, Kurt Partridge, Neng Chen, Sean Augenstein, Aishanee Shah, Hyun Jin Park, Alex Park, Sara Ng, Jessica Nguyen, Ignacio Lopez Moreno, Rajiv Mathews, Françoise Beaufays
Leveraging Unsupervised and Weakly-Supervised Data to Improve Direct Speech-to-Speech Translation
Ye Jia, Yifan Ding, Ankur Bapna, Colin Cherry, Yu Zhang, Alexis Conneau, Nobu Morioka
Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition
W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor Strohman, Shankar Kumar
UserLibri: A Dataset for ASR Personalization Using Only Text
Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey
SNRi Target Training for Joint Speech Enhancement and Recognition
Yuma Koizumi, Shigeki Karita, Arun Narayanan, Sankaran Panchapagesan, Michiel Bacchiani
Turn-Taking Prediction for Natural Conversational Speech
Shuo-Yiin Chang, Bo Li, Tara Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He
Streaming Intended Query Detection Using E2E Modeling for Continued Conversation
Shuo-Yiin Chang, Guru Prakash, Zelin Wu, Tara Sainath, Bo Li, Qiao Liang, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman
Improving Distortion Robustness of Self-Supervised Speech Processing Tasks with Domain Adaptation
Kuan Po Huang, Yu-Kuan Fu, Yu Zhang, Hung-yi Lee
XLS-R: Self-Supervised Cross-Lingual Speech Representation Learning at Scale
Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
Extracting Targeted Training Data from ASR Models, and How to Mitigate It
Ehsan Amid, Om Thakkar, Arun Narayanan, Rajiv Mathews, Françoise Beaufays
Detecting Unintended Memorization in Language-Model-Fused ASR
W. Ronny Huang, Steve Chien, Om Thakkar, Rajiv Mathews
AVATAR: Unconstrained Audiovisual Speech Recognition
Valentin Gabeur, Paul Hongsuck Seo, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid
End-to-End Multi-talker Audio-Visual ASR Using an Active Speaker Attention Module
Richard Rose, Olivier Siohan
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-person Video
Dmitriy Serdyuk, Otavio Braga, Olivier Siohan
Unsupervised Data Selection via Discrete Speech Representation for ASR
Zhiyun Lu, Yongqiang Wang, Yu Zhang, Wei Han, Zhehuai Chen, Parisa Haghani
Non-parallel Voice Conversion for ASR Augmentation
Gary Wang, Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Jesse Emond, Yinghui Huang, Pedro J. Moreno
Ultra-Low-Bitrate Speech Coding with Pre-trained Transformers
Ali Siahkoohi, Michael Chinen, Tom Denton, W. Bastiaan Kleijn, Jan Skoglund
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification
Chao Zhang, Bo Li, Tara Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-Yiin Chang, Parisa Haghani
Improving Deliberation by Text-Only and Semi-supervised Training
Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR
W. Ronny Huang, Shuo-yiin Chang, David Rybach, Rohit Prabhavalkar, Tara N. Sainath, Cyril Allauzen, Cal Peyser, Zhiyun Lu
CycleGAN-Based Unpaired Speech Dereverberation
Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson
TRILLsson: Distilled Universal Paralinguistic Speech Representations (see blog post)
Joel Shor, Subhashini Venugopalan
Learning Neural Audio Features Without Supervision
Sarthak Yadav, Neil Zeghidour
SpeechPainter: Text-Conditioned Speech Inpainting
Zalan Borsos, Matthew Sharifi, Marco Tagliasacchi
SpecGrad: Diffusion Probabilistic Model-Based Neural Vocoder with Adaptive Noise Spectral Shaping
Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani
Distance-Based Sound Separation
Katharine Patterson, Kevin Wilson, Scott Wisdom, John R. Hershey
Analysis of Self-Attention Head Diversity for Conformer-Based Automatic Speech Recognition
Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno
Improving Rare Word Recognition with LM-Aware MWER Training
Weiran Wang, Tongzhou Chen, Tara Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach
MAESTRO: Matched Speech Text Representations Through Modality Matching
Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno, Ankur Bapna, Heiga Zen
Pseudo Label is Better Than Human Label
Dongseong Hwang, Khe Chai Sim, Zhouyuan Huo, Trevor Strohman
On the Optimal Interpolation Weights for Hybrid Autoregressive Transducer Model
Ehsan Variani, Michael Riley, David Rybach, Cyril Allauzen, Tongzhou Chen, Bhuvana Ramabhadran
Streaming Align-Refine for Non-autoregressive Deliberation
Weiran Wang, Ke Hu, Tara Sainath
Federated Pruning: Improving Neural Network Efficiency with Federated Learning
Rongmei Lin*, Yonghui Xiao, Tien-Ju Yang, Ding Zhao, Li Xiong, Giovanni Motta, Françoise Beaufays
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes
Shaojin Ding, Weiran Wang, Ding Zhao, Tara N Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman
4-Bit Conformer with Native Quantization Aware Training for Speech Recognition
Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov
Visually-Aware Acoustic Event Detection Using Heterogeneous Graphs
Amir Shirian, Krishna Somandepalli, Victor Sanchez, Tanaya Guha
A Conformer-Based Waveform-Domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy
Sankaran Panchapagesan, Arun Narayanan, Turaj Zakizadeh Shabestary, Shuai Shao, Nathan Howard, Alex Park, James Walker, Alexander Gruenstein
Reducing Domain Mismatch in Self-Supervised Speech Pre-training
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang, Nicolás Serrano
On-the-Fly ASR Corrections with Audio Exemplars
Golan Pundak, Tsendsuren Munkhdalai, Khe Chai Sim
A Language Agnostic Multilingual Streaming On-Device ASR System
Bo Li, Tara Sainath, Ruoming Pang*, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani
XTREME-S: Evaluating Cross-Lingual Speech Representations
Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson
Towards Disentangled Speech Representations
Cal Peyser, Ronny Huang, Andrew Rosenberg, Tara Sainath, Michael Picheny, Kyunghyun Cho
Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition
Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O’Malley, Ian McGraw
A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation
Tom O’Malley, Arun Narayanan, Quan Wang
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks
Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alex Petelin, Jonathan Shen*, Vincent Wan, Yu Zhang, Yonghui Wu, Robert Clark
A Scalable Model Specialization Framework for Training and Inference Using Submodels and Its Application to Speech Model Personalization
Fadi Biadsy, Youzheng Chen, Xia Zhang, Oleg Rybakov, Andrew Rosenberg, Pedro Moreno
Text-Driven Separation of Arbitrary Sounds
Kevin Kilgour, Beat Gfeller, Qingqing Huang, Aren Jansen, Scott Wisdom, Marco Tagliasacchi
Workshops, Tutorials & Special Sessions
The VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22)
Organizers include: Arsha Nagrani
Self-Supervised Representation Learning for Speech Processing
Organizers include: Tara Sainath
Learning from Weak Labels
Organizers include: Ankit Shah
RNN Transducers for Named Entity Recognition with Constraints on Alignment for Understanding Medical Conversations
Authors: Hagen Soltau, Izhak Shafran, Mingqiu Wang, Laurent El Shafey
Listening with Googlears: Low-Latency Neural Multiframe Beamforming and Equalization for Hearing Aids
Authors: Samuel Yang, Scott Wisdom, Chet Gnegy, Richard F. Lyon, Sagar Savla
Using Rater and System Metadata to Explain Variance in the VoiceMOS Challenge 2022 Dataset
Authors: Michael Chinen, Jan Skoglund, Chandan K. A. Reddy, Alessandro Ragano, Andrew Hines
Incremental Layer-Wise Self-Supervised Learning for Efficient Unsupervised Speech Domain Adaptation On Device
Authors: Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays
Trustworthy Speech Processing
Organizers include: Shrikanth Narayanan
*Work done while at Google.