Dr. Seong-Whan Lee is a Distinguished University Professor at Korea University, where he is the head of the Institute of Artificial Intelligence. He received the B.S. degree in computer science and statistics from Seoul National University, Seoul, Korea, in 1984, and the M.S. and Ph.D. degrees in computer science from Korea Advanced Institute of Science and Technology in 1986 and 1989, respectively. In March 1995, he joined the faculty of the Department of Computer Science and Engineering at Korea University, Seoul, Korea, where he is now a tenured full professor. A Fellow of the IAPR (1998), the Korean Academy of Science and Technology (2009), and the IEEE (2010), he has served several professional societies as chairman or governing board member. His research interests include pattern recognition, artificial intelligence, and neural engineering. He has more than 600 publications in international journals and conference proceedings, and has authored 10 books.
Brain-To-Speech technology is an interdisciplinary fusion of artificial intelligence, brain-computer interfaces, and speech synthesis. It refers to a brain signal-mediated communication method that converts the brain activity of silent or imagined speech into audible speech. Brain-To-Speech technology directly connects neural activity to the means of human linguistic communication, which may greatly enhance the naturalness of communication using brain signals.
With recent discoveries about the neural features of imagined speech and the development of speech synthesis technologies, direct translation of brain signals into speech has shown significant promise. This talk introduces current Brain-To-Speech technology and the possibility of speech synthesis from non-invasive brain signals, which may ultimately facilitate silent communication via brain signals.
Bhaskar D. Rao is a pioneer in the theory and use of sparsity in signal processing applications. Since co-authoring the first paper on the seminal FOCUSS algorithm in 1992, he has been driving the field of sparsity forward, including co-organizing the first special session on sparsity at ICASSP 1998, entitled "SPEC-DSP: Signal Processing with Sparseness Constraint".
He received his B.Tech. degree in Electronics and Electrical Communication Engineering from the Indian Institute of Technology, Kharagpur, India, in 1979 and his M.S. and Ph.D. degrees from the University of Southern California, Los Angeles, in 1981 and 1983, respectively. He has been teaching and conducting research at the University of California San Diego, La Jolla, since 1983, where he is currently a Professor Emeritus and Distinguished Professor of the Graduate Division in the Electrical and Computer Engineering department. He also held the Ericsson Endowed Chair in Wireless Access Networks and was a Distinguished Professor until 2023, and he served as the Director of the Center for Wireless Communications (2008-2011).
Professor Rao’s research interests are in the areas of digital signal processing, estimation theory, and optimization theory, with applications to digital communications, speech signal processing, and biomedical signal processing. His work has received several paper awards, including the 2012 Signal Processing Society (SPS) best paper award for the paper “An Empirical Bayesian Strategy for Solving the Simultaneous Sparse Approximation Problem,” with David P. Wipf, and the Stephen O. Rice Prize paper award in the field of communication systems for the paper “Network Duality for Multiuser MIMO Beamforming Networks and Applications,” with B. Song and R. L. Cruz.
Professor Rao was elected a Fellow of the IEEE in 2000 for his contributions to the statistical analysis of subspace algorithms for harmonic retrieval and received the IEEE Signal Processing Society Technical Achievement Award in 2016. He has been a member of the Statistical Signal and Array Processing Technical Committee, the Signal Processing Theory and Methods Technical Committee, and the Communications Technical Committee of the IEEE Signal Processing Society, as well as the SPS Fellow Evaluation Committee (2023-2024), and was the chair of the Machine Learning for Signal Processing Technical Committee (2019-2020).
In this talk, I will discuss the evolution of signal processing algorithms over the years and the connections between modern approaches and classical approaches. Modern algorithms have better performance, at the expense of additional complexity, and are often hard to connect to classical techniques. In fact, these modern algorithms are connected to the classical algorithms, and understanding these connections can help us gain perspective and insight into the newer algorithms. I will discuss insights in two specific cases, one in sparse signal recovery and the other in deep learning.
The first problem we will discuss is the source localization problem in array processing, where the Minimum Power Distortionless Response (MPDR) algorithm, also known as the Minimum Variance Distortionless Response (MVDR) algorithm, is a widely used approach. Recently, sparse signal recovery (SSR) algorithms have been developed to solve the linear inverse problem associated with the source localization problem. The sparse Bayesian learning (SBL) algorithm is one such algorithm, and the EM-SBL algorithm will be used as the basis for this discussion. The iterative approach used in the EM algorithm will be given an MPDR beamformer interpretation, making the algorithm more transparent. This interpretation will also help in understanding the new attributes that emerge from the SBL algorithm. Similar insights can also be obtained for the other SSR algorithms.
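For readers unfamiliar with the classical side of this comparison, the following is a minimal sketch of the textbook MPDR/MVDR beamformer, not the EM-SBL algorithm discussed in the talk. It assumes a uniform linear array with half-wavelength spacing, a single narrowband source, and an analytically constructed covariance matrix; the function names, the steering-vector sign convention, and all numeric values are illustrative assumptions rather than anything taken from the talk.

```python
import numpy as np

def steering_vector(n_sensors, theta_rad, spacing=0.5):
    """Steering vector of a uniform linear array (assumed geometry).

    spacing is the sensor spacing in wavelengths (0.5 is an assumption).
    """
    n = np.arange(n_sensors)
    return np.exp(-2j * np.pi * spacing * n * np.sin(theta_rad))

def mpdr_weights(R, a):
    """Standard MPDR weights: w = R^{-1} a / (a^H R^{-1} a)."""
    Rinv_a = np.linalg.solve(R, a)
    # np.vdot conjugates its first argument, giving a^H R^{-1} a.
    return Rinv_a / np.vdot(a, Rinv_a)

def mpdr_spectrum(R, thetas_rad, spacing=0.5):
    """MPDR output power 1 / (a^H R^{-1} a) on a grid of look directions."""
    n_sensors = R.shape[0]
    power = np.empty(len(thetas_rad))
    for i, theta in enumerate(thetas_rad):
        a = steering_vector(n_sensors, theta, spacing)
        power[i] = 1.0 / np.real(np.vdot(a, np.linalg.solve(R, a)))
    return power

# Illustrative scenario: one unit-power source at 20 degrees, noise power 0.1.
n_sensors = 8
source_deg = 20
a_src = steering_vector(n_sensors, np.deg2rad(source_deg))
R = 1.0 * np.outer(a_src, a_src.conj()) + 0.1 * np.eye(n_sensors)

w = mpdr_weights(R, a_src)
grid_deg = np.arange(-90, 91)  # 1-degree scan grid
spectrum = mpdr_spectrum(R, np.deg2rad(grid_deg))
peak_deg = int(grid_deg[np.argmax(spectrum)])

print("response in look direction:", np.vdot(w, a_src))
print("spectrum peak (degrees):", peak_deg)
```

The weights pass the look direction with unit gain (the distortionless constraint) while minimizing total output power, and scanning the look direction over a grid yields a spatial spectrum whose peak indicates the source direction.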
A second problem is how to estimate one random vector given observations of another random vector. Linear estimation techniques are widely used in many signal processing applications, and they have been extended to nonlinear estimation by applying linear estimation to data augmented with handcrafted nonlinear features. With the advent of deep neural networks (DNNs), nonlinear estimation techniques have become attractive. We will delve into ResNEst, a variant of ResNet, to understand the nonlinear feature learning process and the linear estimator that is constructed from these features. This insight will allow linear estimators to be replaced with DNNs with the assurance of better estimation performance.
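The classical baseline mentioned here, linear estimation on data augmented with handcrafted nonlinear features, can be sketched as follows. This is not ResNEst; it only illustrates why adding features to a least-squares estimator cannot increase its training error, since the purely linear estimator remains a special case of the augmented one. The synthetic data, the choice of squared features, and the helper names are all assumptions made for illustration.

```python
import numpy as np

def fit_linear(features, targets):
    """Least-squares weights W minimizing ||features @ W - targets||^2."""
    W, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return W

def mse(features, targets, W):
    """Mean squared error of the linear estimator features @ W."""
    return float(np.mean((features @ W - targets) ** 2))

# Synthetic data (assumed): y depends quadratically on x[:, 0].
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
y = x[:, :1] ** 2 + 0.5 * x[:, 1:] + 0.05 * rng.normal(size=(500, 1))

ones = np.ones((x.shape[0], 1))
lin_feats = np.hstack([ones, x])          # affine estimator on raw data
aug_feats = np.hstack([ones, x, x ** 2])  # plus handcrafted squared features

W_lin = fit_linear(lin_feats, y)
W_aug = fit_linear(aug_feats, y)
mse_lin = mse(lin_feats, y, W_lin)
mse_aug = mse(aug_feats, y, W_aug)

print("training MSE, linear features:   ", mse_lin)
print("training MSE, augmented features:", mse_aug)
```

Because the augmented feature set contains the linear one, the augmented fit can always reproduce the linear solution by zeroing the extra weights. This nesting argument applies to training error with fixed handcrafted features; the guarantee discussed in the talk concerns learned features.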
Dr. Daniel D. Lee is the Tisch University Professor in Electrical and Computer Engineering at Cornell Tech and recently served as Global Head of AI for Samsung Research.
He received his B.A. summa cum laude in Physics from Harvard University and his Ph.D. in Condensed Matter Physics from the Massachusetts Institute of Technology. He was also a researcher at Bell Labs in the Theoretical Physics and Biological Computation departments.
He is a Fellow of the IEEE and AAAI and has received the NSF CAREER award and the Lindback award for distinguished teaching. He was also a fellow of the Hebrew University Institute of Advanced Studies in Jerusalem and an affiliate of the Korea Advanced Institute of Science and Technology, and he organized the US-Japan National Academy of Engineering Frontiers of Engineering symposium and the Neural Information Processing Systems (NeurIPS) conference.
His group focuses on understanding general computational principles in biological systems and on applying that knowledge to build autonomous systems.
The advent of deep neural networks has brought significant advancements in the development and deployment of novel AI technologies. Recent large-scale neural network architectures have demonstrated remarkable performance in object classification, scene understanding, language processing, and multimodal generative AI.
How can we understand how the representations of input signals are transformed within deep neural networks? I will explain how statistical insights can be gained by analyzing the high-dimensional geometrical structure of these representations as they are reformatted by neural network hierarchies of basic perceptron units.
Jitendra Malik is the Arthur J. Chick Professor of EECS at UC Berkeley and Research Scientist Director at FAIR, Meta Inc. His group has conducted research on many different topics in computer vision, computer graphics, machine learning, and robotics, resulting in concepts such as anisotropic diffusion, high dynamic range imaging, normalized cuts, and R-CNN.
His publications have received numerous best paper awards, including five test-of-time awards: the Longuet-Higgins Prize for papers published at CVPR (twice) and the Helmholtz Prize for papers published at ICCV (three times). He received the 2016 ACM/AAAI Allen Newell Award, the 2018 IJCAI Award for Research Excellence in AI, and the 2019 IEEE Computer Society’s Computer Pioneer Award for his “leading role in developing Computer Vision into a thriving discipline through pioneering research, leadership, and mentorship”. He is a member of the National Academy of Sciences and the National Academy of Engineering, and a Fellow of the American Academy of Arts and Sciences.
Humans are social animals. Perhaps this is why we so enjoy watching movies, TV shows and YouTube videos, all of which show people in action. A central problem for artificial intelligence therefore is to develop techniques for analyzing and understanding human behavior from images and video.
I will present some recent results from our research group towards this grand challenge. We have developed highly accurate techniques for reconstructing 3D meshes of human bodies from single images using transformer neural networks. Given video input, we link these reconstructions over time by 3D tracking, thus producing "Humans in 4D" (3D in space + 1D in time). As a fun application, we can use this capability to transfer the 3D motion of one person to another, e.g., to generate a video of you performing Michael Jackson's moonwalk or Michelle Kwan's skating routine.
The ability to do 4D reconstruction of hands is a source of imitation learning for robotics, and we show examples of reconstructing human-object interactions. In addition to 4D reconstruction, we are also now able to recognize actions by attaching semantic labels such as "standing", "running", or "jumping". However, long-range video understanding, such as the ability to follow characters' activities and understand movie plots over periods of minutes and hours, is still quite a challenge, and even the largest vision-language models struggle on such tasks. There has been substantial progress, but much remains to be done.