META Workshop

Date/Time: 16:10-18:00, April 17, Wednesday
Location: Auditorium, 3F

Event Title: Meta industry workshop – open problems in Speech, Audio, Video and Signal Processing

Abstract: Meta has been working on multiple problems in the general area of signal processing and AI, and more specifically in Speech, Generative AI, Real-time Communications, Neural Interfaces and Video. This workshop will provide an opportunity for the broader research community to learn more about these problems and some of the proposed solutions, and to interact with Meta’s researchers on ways to work collaboratively.

Talks & Speakers:

  • Title: Audio/Visual processing at Meta – quality and efficiency challenges at scale
  • Speaker: Ioannis Katsavounidis, Research Scientist, Video Infrastructure, Meta
    • Bio: Dr. Ioannis Katsavounidis is part of the Video Infrastructure team, leading technical efforts in improving video quality and quality of experience across all video products at Meta. Before joining Meta, he spent 3.5 years at Netflix, contributing to the development and popularization of VMAF, Netflix’s open-source video quality metric, as well as inventing the Dynamic Optimizer, a shot-based perceptual video quality optimization framework that brought significant bitrate savings across the whole video streaming spectrum. He was a professor for 8 years at the University of Thessaly’s Electrical and Computer Engineering Department in Greece, teaching video compression, signal processing and information theory. He was one of the cofounders of Cidana, a mobile multimedia software company in Shanghai, China. In the early 2000s he was the director of software for advanced video codecs at InterVideo, makers of the popular software DVD player WinDVD, and he has also worked for 4 years in high-energy experimental physics in Italy. He is one of the co-chairs of the statistical analysis methods (SAM) and no-reference metrics (NORM) groups at the Video Quality Experts Group (VQEG). He is actively involved in the Alliance for Open Media (AOMedia) as co-chair of the software implementation working group (SWIG). He has over 150 publications, including 50 patents. His research interests lie in video coding, quality of experience, adaptive streaming, and energy-efficient hardware/software multimedia processing.
  • Title: A peek into Meta’s audio processing stack for real-time communication
  • Speaker: Sriram Srinivasan, Director of Engineering, Remote Presence Audio, Meta
    • Bio: Sriram Srinivasan leads the audio teams working on technologies for next-gen real-time audio communication across Meta’s family of apps, such as Instagram, WhatsApp, Messenger and Facebook. He led the research, development and launch of Meta’s new echo removal solution, Beryl, and a new low-bitrate audio codec, MLow, which now powers billions of calls across Meta’s family of apps. Prior to Meta, he was a Principal Group Engineering Manager at Microsoft, leading the teams working on audio technologies for Microsoft Teams, Skype and Azure Communication Services, where he co-organized the first Deep Noise Suppression and Deep AEC challenges. He has 15+ years of real-time audio signal processing and ML experience in areas such as echo cancellation, noise suppression, low-bitrate codecs (Satin, MLow), spatial audio and network resilience algorithms. He holds a PhD in audio signal processing, 25+ granted US patents, and over 50 peer-reviewed publications.
  • Title: Augmented Hearing Research @ Meta: technology, challenges, and opportunities
  • Speaker: Vladimir Tourbabin, Research Product Architect, Reality Labs Research Audio, Meta
    • Bio: Vladimir Tourbabin holds a PhD in multichannel audio signal processing. Since 2017, he has been with Reality Labs Research at Meta (formerly Facebook Reality Labs Research), working on research and advanced development of audio signal processing technologies for augmented and virtual reality applications. In particular, his research interests focus on microphone array processing for speech enhancement and spatial audio reproduction. Prior to that, Dr. Tourbabin served as an advanced development engineer at General Motors’ Advanced Technical Center, working on microphone array processing solutions for speech recognition. Dr. Tourbabin contributes actively to the academic community; he holds 20 granted US patents and has over 50 research publications.
  • Title: Building the keyboard for augmented reality using noninvasive neuromotor signals and speech technologies
  • Speaker: Michael Mandel, Research Scientist, AR Inputs & Interactions, Meta
    • Bio: Michael I Mandel is a Research Scientist in Reality Labs at Meta Platforms, Inc., building text interactions for neural interfaces using machine learning and signal processing. He earned his BSc in Computer Science from the Massachusetts Institute of Technology and his MS and PhD with distinction in Electrical Engineering from Columbia University as a Fu Foundation Presidential Scholar. He was an FQRNT Postdoctoral Research Fellow in the Machine Learning laboratory (LISA/MILA) at the Université de Montréal, an Algorithm Developer at Audience, Inc., a company that has shipped over 500 million noise suppression chips for cell phones, a Research Scientist in Computer Science and Engineering at the Ohio State University, and an Associate Professor of Computer and Information Science at Brooklyn College and the CUNY Graduate Center. His work has been supported by the National Science Foundation, including via a CAREER award, the Alfred P. Sloan Foundation, and Google, Inc.
  • Title: Speech at Meta: Powering Wearable Devices, Content Understanding and Generative AI
  • Speakers: Qing He, Research Science Manager, AI Speech, Meta & Christian Fuegen, Research Science Manager, AI Speech, Meta
    • Bio: Qing He is a Research Science Manager at Meta. She joined Meta (formerly Facebook) as a research scientist in 2017. At Meta, Qing’s primary focus has been on speech synthesis and re-synthesis technologies, encompassing areas including text-to-speech, voice cloning, speech enhancement, voice conversion, speech editing, and speech processing. These technologies power a wide range of Meta products, including AR/VR and apps such as Instagram and Facebook. Prior to joining Meta, Qing received her doctorate degree in EECS from MIT in 2016, focusing on low-power speech technologies.
    • Bio: Christian Fuegen is a Senior Research Scientist Manager at Meta. He joined Meta (formerly Facebook) as a research scientist in 2013 through the acquisition of Mobile Technologies, where he worked from 2007 until 2013, first as a research scientist and later as director of research, and was one of the core developers of “Jibbigo”, an on-device speech-to-speech translator for mobile devices. Christian received a doctorate from the University of Karlsruhe (TH) in 2008 for his work on simultaneous translation systems for lectures and speeches, in which he developed core components for a first-ever simultaneous lecture translation system, including innovations in speech recognition, segmentation, and adaptation, and work on real-time and latency requirements for simultaneous speech translation systems. At Meta, his team focuses on the understanding of complex (egocentric) acoustic scenes across multiple speakers and languages using audio and other modalities, with the goal of advancing voice interfaces, including speech technologies for Ray-Ban Meta, Oculus VR headsets, Augmented Reality, and the Metaverse, and video understanding, including transcription, captioning, and content understanding.