Information retrieval from the marine soundscape using machine learning-based source separation
Tzu-Hao Lin 1, Tomonari Akamatsu 2, Yu Tsao 3, Katsunori Fujikura1
1 Department of Marine Biodiversity Research, Japan Agency for Marine-Earth Science and Technology, Japan
2 National Research Institute of Fisheries Science, Japan Fisheries Research and Education Agency, Japan
3 Research Center for Information Technology Innovation, Academia Sinica, Taiwan
In remote sensing of the marine ecosystem, visual information retrieval is limited by the low visibility of the ocean environment. In recent years, the marine soundscape has been considered an acoustic sensing platform for the marine ecosystem. By listening to environmental sounds, biological sounds, and human-made noise, it is possible to acoustically identify various geophysical events, soniferous marine animals, and anthropogenic activities. However, sound detection and classification remain challenging due to the lack of an underwater audio recognition database and the simultaneous interference of multiple sound sources. To facilitate the analysis of the marine soundscape, we have employed information retrieval techniques based on non-negative matrix factorization (NMF) to separate sound sources with unique spectral-temporal patterns in an unsupervised manner. NMF is a self-learning algorithm that decomposes an input matrix into a spectral feature matrix and a temporal encoding matrix. Therefore, we can stack two or more layers of NMF to learn the spectral-temporal modulation of k sound sources without any training database. In this presentation, we demonstrate the application of NMF to the separation of simultaneous sound sources appearing on a long-term spectrogram. In a shallow-water soundscape, the relative change of a fish chorus can be effectively quantified even during periods with strong mooring noise. In a deep-sea soundscape, cetacean vocalizations, an unknown biological chorus, environmental sounds, and systematic noises can be efficiently separated. In addition, the features learned during blind source separation can be used as prior information for supervised source separation. The self-adaptation mechanism during iterative learning helps search for similar sound sources in other acoustic datasets containing unknown noise types.
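The core decomposition described above — an input spectrogram factorized into a spectral feature matrix W and a temporal encoding matrix H — can be sketched with the classic multiplicative update rule. This is a minimal illustration of the technique, not the authors' implementation; the iteration count and random initialization are assumptions.

```python
import numpy as np

def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Factorize a non-negative spectrogram V (freq x time) into a
    spectral feature matrix W (freq x k) and a temporal encoding
    matrix H (k x time), using multiplicative updates that
    monotonically reduce the squared error ||V - W @ H||^2."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update temporal encodings
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update spectral features
    return W, H
```

Each column of W is the learned spectrum of one source, and the corresponding row of H traces when that source is active — which is what allows a long-term spectrogram to be split into k concurrent sources.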
Our results suggest that NMF-based source separation can facilitate the analysis of soundscape variability and the establishment of an underwater audio recognition database. It will therefore be feasible to investigate the acoustic interactions among geophysical events, soniferous marine animals, and anthropogenic activities from long-duration underwater recordings.
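The abstract notes that features learned during blind separation can serve as prior information for supervised separation. One common way to realize this — a sketch under the assumption of a plain fixed-basis NMF, not necessarily the authors' self-adaptation scheme — is to freeze a pre-trained spectral matrix W and update only the encodings H for a new recording:

```python
import numpy as np

def encode_with_fixed_basis(V, W, n_iter=500, eps=1e-9, seed=0):
    """Given a pre-trained spectral basis W (freq x k), estimate the
    temporal encodings H (k x time) of a new spectrogram V while
    keeping W fixed -- the 'supervised' reuse of learned features."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # only H is adapted
    return H
```

With W fixed, the problem is convex in H, so the activation of, e.g., a previously learned fish-chorus basis can be read off a new dataset directly from the corresponding row of H.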
Improving acoustic monitoring of biodiversity using deep learning-based source separation algorithms
Mao-Ning Tuanmu1, Tzu-Hao Lin2, Joe Chun-Chia Huang1, Yu Tsao3, Chia-Yun Lee1
1Biodiversity Research Center, Academia Sinica, Taiwan
2Department of Marine Biodiversity Research, Japan Agency for Marine-Earth Science and Technology, Japan
3Research Center for Information Technology Innovation, Academia Sinica, Taiwan
Passive acoustic monitoring of the environment has been suggested as an effective tool for investigating the dynamics of biodiversity across spatial and temporal scales. Recent developments in automatic recorders have allowed environmental acoustic data to be collected unattended over long durations. However, one of the major challenges for acoustic monitoring is identifying sounds of target taxa in recordings that usually contain undesired signals from non-target sources. In addition, high variation in the characteristics of target sounds, the co-occurrence of sounds from multiple target taxa, and a lack of reference data make it even more difficult to separate acoustic signals from different sources. To overcome these issues, we developed an unsupervised source separation algorithm based on multi-layer (deep) non-negative matrix factorization (NMF). Using reference echolocation calls of 13 bat species, we evaluated the performance of the multi-layer NMF in separating species-specific calls. Results showed that the multi-layer NMF, especially when pre-trained with reference calls, outperformed the conventional supervised single-layer NMF. We also evaluated the performance of the multi-layer NMF in identifying different types of bat calls in recordings collected in the field, and found its call-type identification comparable to that of human observers. These results suggest that the proposed multi-layer NMF approach can effectively separate acoustic signals of different taxa from long-duration field recordings in an unsupervised manner. The approach can thus improve the applicability of passive acoustic monitoring as a tool for investigating the responses of biodiversity to a changing environment.
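The multi-layer (deep) NMF described above can be sketched as successive factorizations, where each layer's encoding matrix becomes the input of the next layer. The layer sizes, update rule, and iteration counts below are illustrative assumptions, not the parameters used in the study:

```python
import numpy as np

def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Single NMF layer: V (m x n) ~= W (m x k) @ H (k x n),
    fitted with standard multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def multilayer_nmf(V, layer_sizes, n_iter=500):
    """Stack NMF layers: V ~= W1 @ W2 @ ... @ H_last. Each layer
    re-factorizes the previous layer's encoding matrix, so deeper
    layers capture coarser spectral-temporal structure."""
    bases, H = [], V
    for k in layer_sizes:
        W, H = nmf(H, k, n_iter=n_iter)
        bases.append(W)
    return bases, H
```

Pre-training with reference calls, as the abstract describes, would amount to initializing (or fixing) the spectral matrices of the early layers with basis vectors learned from the reference recordings instead of random values.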