Separating multiple audio sources is a difficult task. Previous work mostly estimated a mask for each source in the time-frequency domain.
This paper formulates source separation as estimating the mixture weight vectors of multiple sources directly from the waveform.
Time-domain Audio Separation Network (TasNet) tries to find w, the relative contribution of each basis signal to the mixture, where B holds N basis signals and has shape N x L.
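To make the shapes concrete, here is a minimal NumPy sketch of a mixture segment written as a weighted sum of basis signals. The values of N and L, the random basis, and the nonnegative weights are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 4, 8  # number of basis signals and segment length (illustrative values)

B = rng.standard_normal((N, L))  # N basis signals, each of length L
w = rng.random((1, N))           # mixture weights, one per basis signal (assumed nonnegative)

# Mixture segment as a weighted combination of the basis signals, shape (1, L)
x = w @ B

# The same thing written as an explicit weighted sum over the basis signals
x_sum = sum(w[0, n] * B[n] for n in range(N))
```

Each entry of w says how much the corresponding row of B contributes to the segment x.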
The encoder finds w for B by applying a 1-D gated convolution layer.
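A rough sketch of what such a gated layer could look like, assuming the gate is a sigmoid branch multiplied elementwise with a ReLU branch, and that each segment has the same length L as the kernels (so the convolution reduces to a matrix product). The names U, V, and gated_encoder are hypothetical, and the random matrices stand in for learned parameters.

```python
import numpy as np

def gated_encoder(segments, U, V):
    """Map waveform segments of shape (K, L) to weights of shape (K, N).

    U and V are (N, L) kernel matrices (hypothetical names).
    Assumed gating: ReLU branch multiplied by a sigmoid branch.
    """
    relu_branch = np.maximum(segments @ U.T, 0.0)       # (K, N), nonnegative
    gate = 1.0 / (1.0 + np.exp(-(segments @ V.T)))      # (K, N), values in (0, 1)
    return relu_branch * gate

rng = np.random.default_rng(0)
K, L, N = 3, 8, 4  # segments, segment length, basis signals (illustrative values)
segments = rng.standard_normal((K, L))
U = rng.standard_normal((N, L))
V = rng.standard_normal((N, L))

weights = gated_encoder(segments, U, V)  # one weight vector w per segment
```

Under this assumption the output weights are always nonnegative, because a ReLU output is multiplied by a gate between 0 and 1.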
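The separation network uses LSTM and fully connected (FC) layers to generate a mask m for each source. With w and m found above, the weight vector d of each source can be obtained, and the decoder reconstructs the source waveforms from it. The scale-invariant source-to-noise ratio (SI-SNR) is used as the loss.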
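For reference, SI-SNR can be sketched in NumPy as below. This follows the commonly used definition (project the estimate onto the reference, then compare the projected energy with the residual energy); the zero-mean step and the small eps constant are assumptions for numerical convenience, and the function name is hypothetical.

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant source-to-noise ratio in dB for two 1-D signals."""
    # Remove the mean so the measure ignores DC offsets (assumed preprocessing)
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Projection of the estimate onto the reference signal
    s_target = (estimate @ reference) * reference / (reference @ reference + eps)
    # Everything in the estimate that is not explained by the reference
    e_noise = estimate - s_target

    return 10.0 * np.log10((s_target @ s_target) / (e_noise @ e_noise + eps))

rng = np.random.default_rng(0)
reference = rng.standard_normal(1000)
noise = rng.standard_normal(1000)

clean_score = si_snr(reference + 0.1 * noise, reference)   # mild noise
noisy_score = si_snr(reference + 1.0 * noise, reference)   # heavy noise
scaled_score = si_snr(2.0 * (reference + 0.1 * noise), reference)  # rescaled estimate
```

Rescaling the estimate leaves the score essentially unchanged, which is the scale-invariant property. For training, the negative of this value would be minimized.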
TasNet not only showed comparable performance on the WSJ0-2mix dataset, but was also shown to learn its own basis signals.
Luo, Yi, and Nima Mesgarani. “Tasnet: time-domain audio separation network for real-time, single-channel speech separation.” 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018.