In real-world scenarios, a soundfield is almost always produced by multiple desired and undesired sound sources. In many practical applications, such as teleconferencing, studio/newsroom recording, audio surveillance, active noise cancellation (ANC), dereverberation, and noise suppression, it is desirable, and often necessary, to create an isolated sound zone that allows uninterrupted sound recording in complex acoustic scenarios. In many cases, we need to go even further and separate individual sources from a sound mixture.
Source separation is a prerequisite for numerous acoustic signal processing tasks, such as automatic speech recognition, smart home applications, mixing/demixing in the music industry, telecommunication, and auditory scene analysis. Historically, research in this field has focused mainly on beamforming combined with time/frequency-domain filtering to achieve such separation. In our work, we approach the problem from a different perspective by using spatial basis functions, which offer intuitive and convenient ways to analyze, predict, and modify the behavior of a spatial soundfield according to desired characteristics.
Over the last decade, spherical harmonics, or higher-order ambisonics, have gained considerable attention in the audio industry as an attractive tool in various areas of spatial acoustics, such as virtual/augmented reality, digital entertainment, and ANC technology. We used spherical harmonic decomposition to develop a modal coherence model that dissects a complex acoustic scene into its primary components. We also proposed a novel multi-source direction of arrival (DOA) estimation technique based on a convolutional neural network (CNN) that learns the modal coherence patterns of an incident soundfield from measured spherical harmonic coefficients.
In this talk, I will address three spatial acoustic problems and present solutions to them: (1) acoustic separation of a spatial zone, (2) power spectral density (PSD) estimation and source separation in noisy and reverberant environments, and (3) multi-source DOA estimation from the modal coherence model of a mixed soundfield using machine learning.
Abdullah Fahim received his B.Sc.(Hons.) degree in electrical and electronic engineering from Bangladesh University of Engineering and Technology, Dhaka, Bangladesh, in 2007.
From 2007 to 2015, he worked on various projects with Ericsson and Nokia SN. He is currently pursuing his Ph.D. degree in spatial acoustic signal processing at the Research School of Electrical, Energy and Materials Engineering, Australian National University (ANU). During his Ph.D., he completed a six-month internship at Apple in California, USA, in 2018, where he worked with the spatial audio team of the interactive media group.
His research interests include spatial audio and multichannel processing techniques, especially soundfield capture, analysis, and separation.