In this paper, we present a low-latency scheme for real-time blind source separation (BSS) based on online auxiliary-function-based independent vector analysis (AuxIVA). In many real-time audio applications, especially hearing aids, low latency is highly desirable. Conventional frequency-domain BSS methods suffer from a delay caused by frame analysis. To reduce the delay, we implement separation filters as multiple FIR filters in the time domain, which are converted from demixing matrices estimated by online AuxIVA in the frequency domain.
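
The conversion from frequency-domain demixing matrices to time-domain FIR filters can be illustrated with a minimal NumPy sketch. It assumes one demixing matrix per non-negative frequency bin, obtains each filter by an inverse real FFT, and makes it causal with a half-length circular shift. The function names `demixing_to_fir` and `apply_fir`, the array layout, and the shift are illustrative assumptions, not the scheme described in the paper, which may shorten or design the filters differently to reach its latency target.

```python
import numpy as np

def demixing_to_fir(W, n_fft):
    """Convert demixing matrices to FIR taps (hypothetical helper).

    W: complex array of shape (n_fft // 2 + 1, n_src, n_mic),
       one demixing matrix per non-negative frequency bin.
    Returns real taps of shape (n_src, n_mic, n_fft).
    """
    # Inverse real FFT along the frequency axis: one impulse response
    # per (source, microphone) pair.
    h = np.fft.irfft(W, n=n_fft, axis=0)
    # Circular shift so that the non-causal half becomes causal.
    # This shift itself adds n_fft // 2 samples of delay.
    h = np.roll(h, n_fft // 2, axis=0)
    return np.transpose(h, (1, 2, 0))

def apply_fir(h, x):
    """Filter a multichannel signal x of shape (n_mic, T) with taps h."""
    n_src, n_mic, n_taps = h.shape
    y = np.zeros((n_src, x.shape[1] + n_taps - 1))
    for s in range(n_src):
        for m in range(n_mic):
            # Each output is the sum of the filtered microphone signals.
            y[s] += np.convolve(h[s, m], x[m])
    return y
```

With identity demixing matrices in every bin, each diagonal filter reduces to a single unit tap at index `n_fft // 2`, so the output is the input delayed by that many samples.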

We propose a new sparse coding technique based on the power mean of phase-invariant cosine distances. Our approach is a generalization of sparse filtering and K-hyperlines clustering. It offers a better sparsity enforcer than the L1/L2 norm ratio that is typically used in sparse filtering. At the same time, the proposed approach scales better than its clustering counterparts for high-dimensional input. Our algorithm fully exploits the prior information obtained by preprocessing the observed data with whitening via an efficient row-wise decoupling scheme.
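
The quantities named above can be made concrete with a small NumPy sketch. The L1/L2 norm ratio and the power mean follow their standard definitions. The distance `1 - |cos|^2` is only one plausible reading of a "phase-invariant cosine distance" and is an assumption here, as are the function names and the small `eps` added for numerical safety; the paper's exact cost function may differ.

```python
import numpy as np

def l1_l2_ratio(v):
    """L1/L2 norm ratio: 1 for a one-hot vector, sqrt(n) for a flat one."""
    v = np.asarray(v)
    return np.sum(np.abs(v)) / np.linalg.norm(v)

def phase_invariant_cosine_distance(x, D):
    """Assumed distance 1 - |cos|^2 between x and each column of D.

    x: vector of shape (n,), real or complex.
    D: matrix of shape (n, K) whose columns are atoms (hyperline directions).
    Scaling x by -1 or any unit-modulus phase leaves the result unchanged.
    """
    num = np.abs(D.conj().T @ x)
    den = np.linalg.norm(D, axis=0) * np.linalg.norm(x)
    return 1.0 - (num / den) ** 2

def power_mean(d, p, eps=1e-12):
    """Power mean of non-negative values; approaches min(d) as p -> -inf."""
    d = np.asarray(d, dtype=float) + eps  # eps avoids 0 ** negative power
    return np.mean(d ** p) ** (1.0 / p)
```

A strongly negative exponent makes the power mean behave like a minimum over the atoms, which resembles a hard cluster assignment, while `p = 1` gives the plain average of the distances.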

Phase-aware signal processing has received increasing interest in many speech applications. The success of phase-aware processing depends strongly on the robustness of the clean spectral phase estimates obtained from a noisy observation. In this paper, we propose a novel harmonic phase estimator that relies on the phase invariance property and exploits relations between harmonics through the phase structure. We present speech quality results achieved in speech enhancement to justify the effectiveness of the proposed phase estimator.

This paper deals with the separation of music into individual instrument tracks, which is known to be a challenging problem. We describe two different deep neural network architectures for this task, a feed-forward and a recurrent one, and show that each of them yields state-of-the-art results on the SiSEC DSD100 dataset. For the recurrent network, we use data augmentation during training and show that even simple separation networks are prone to overfitting if no data augmentation is used.
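
The abstract does not say which augmentations are used. As a purely illustrative sketch, the following shows two generic ones for separation training data: a random gain per source and a random left/right channel swap, followed by remixing. The function name `augment_mixture`, the gain range, and the swap probability are hypothetical choices, not the authors' recipe.

```python
import numpy as np

def augment_mixture(sources, rng, gain_range=(0.25, 1.25)):
    """Build a new training mixture from isolated sources (hypothetical).

    sources: array of shape (n_src, n_ch, T) with the isolated stems.
    rng: a numpy.random.Generator.
    Returns (mixture, augmented_sources); the mixture is the sum of the
    augmented sources, so inputs and targets stay consistent.
    """
    sources = np.asarray(sources, dtype=float)
    n_src, n_ch, _ = sources.shape
    out = np.empty_like(sources)
    # Assumed augmentation 1: independent random gain for each source.
    gains = rng.uniform(gain_range[0], gain_range[1], size=n_src)
    for i in range(n_src):
        s = sources[i] * gains[i]
        # Assumed augmentation 2: swap stereo channels with probability 0.5.
        if n_ch == 2 and rng.random() < 0.5:
            s = s[::-1]
        out[i] = s
    return out.sum(axis=0), out
```

Calling the function anew for every training example yields a different mixture each time, which is the usual way such augmentation counteracts overfitting on a small set of songs.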

Low-latency sound source separation requires that each incoming frame of audio data be processed with very low delay and output as soon as possible. For practical purposes involving human listeners, an algorithmic delay of 20 ms is the upper limit that remains comfortable to the listener. In this paper, we propose a low-latency (algorithmic delay ≤ 20 ms) deep neural network (DNN) based source separation method.
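
To relate the 20 ms budget to frame sizes, the short sketch below assumes that the algorithmic delay of frame-based processing equals the frame length plus any look-ahead, and that computation time is excluded. The function names and the example sample rates are assumptions for illustration; the paper's own frame configuration is not given in the abstract.

```python
def algorithmic_delay_ms(frame_len, sample_rate, lookahead=0):
    """Delay in ms from buffering one frame plus optional look-ahead samples."""
    return 1000.0 * (frame_len + lookahead) / sample_rate

def max_frame_len(budget_ms, sample_rate):
    """Largest frame length in samples that fits the delay budget."""
    return int(budget_ms * sample_rate // 1000)
```

Under these assumptions, a 20 ms budget allows frames of at most 320 samples at 16 kHz and 882 samples at 44.1 kHz.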