A Framework for Music-Speech Segregation using Music Fingerprinting and Acoustic Echo Cancellation Principle

F. Hussain, H. A. Habib, M. J. Khan

Abstract


Background interference creates voice intelligibility problems for the listener. This research work considers background music as interference for smartphone communication in areas with loud background music. This paper proposes a novel framework for segregating background music from human speech using music fingerprinting and acoustic echo cancellation. First, the background music is identified by searching a music database with its fingerprint. The identified music is then registered (time-aligned) with the recording and segregated using acoustic echo cancellation, with the identified track acting as the reference signal. The proposed approach yields better-quality music-speech segregation than existing algorithms. The work is novel in that it segregates the background music completely, whereas existing approaches succeed only at segregating single instruments.
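
As a minimal sketch of the segregation stage, the snippet below assumes the background track has already been retrieved by fingerprint lookup and roughly time-aligned with the microphone recording; the function name and parameter values are illustrative assumptions, not the authors' implementation. Following the acoustic echo cancellation principle, the identified music plays the role of the far-end reference of a normalized LMS (NLMS) adaptive filter, and the filter residual is the speech estimate.

import numpy as np

def nlms_music_cancel(mic, music_ref, taps=256, mu=0.5, eps=1e-8):
    # Cancel the identified background music from the microphone signal
    # with an NLMS adaptive filter, as in acoustic echo cancellation.
    # 'taps', 'mu', and the prior alignment of music_ref are assumptions.
    w = np.zeros(taps)                     # adaptive filter coefficients
    speech = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = music_ref[n - taps:n][::-1]    # latest reference samples, newest first
        y = w @ x                          # estimated music component at sample n
        e = mic[n] - y                     # residual after music removal
        w += mu * e * x / (x @ x + eps)    # normalized LMS coefficient update
        speech[n] = e                      # residual serves as the speech estimate
    return speech

In practice, the fingerprint stage would return the matched track and a time offset; applying that offset to music_ref before the cancellation loop runs plausibly corresponds to the registration step described in the abstract.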

