Publication date: Available online 20 July 2017
Source: Computer Speech & Language
Author(s): Alexey Sholokhov, Md Sahidullah, Tomi Kinnunen
We propose a simple speech activity detector (SAD) based on recording-specific Gaussian mixture modeling (GMM) of speech and non-speech frames. We extend the conventional expectation-maximization (EM) algorithm for GMM training using semi-supervised learning. This provides a way to incorporate unlabeled data into the SAD training process, leading to more accurate statistical models by exploiting the structure of the data distribution. The approach fits naturally into off-line applications that may require partial human assistance, or applications that involve processing large quantities of audio data, such as text-independent speaker verification, speaker diarization or audio surveillance. Unlike supervised SADs, the proposed SAD does not require any off-line training data. Rather, it uses initial labels produced for a tiny fraction of a given audio recording with the help of another, simpler SAD (or a human operator). We analyze the proposed SAD with respect to the covariance type, the initialization method for the speech and non-speech classes, the amount of labeled data required for initialization, and the speech features. In experiments with a stand-alone SAD system, we observe increased accuracy on the challenging dataset from the recent NIST OpenSAD evaluation. Our extensive automatic speaker verification (ASV) experiments, including text-independent experiments with NIST 2010 speaker recognition evaluation (SRE) data and text-dependent experiments with the RSR2015 and RedDots corpora, show benefits of the new approach for long speech segments containing non-stationary noise. For the shorter data conditions in the text-dependent experiments, however, simpler unsupervised SADs perform better. Further, we study the impact of SAD misses and false alarms on ASV performance on the NIST 2010 SRE data.
By deriving an empirical cost function from the two SAD errors, we observe that the ASV error rate reaches its minimum around the same SAD operating point irrespective of SAD method and signal-to-noise ratio (SNR). The optimum ASV performance occurs approximately at an SAD operating region where falsely included non-speech is considered 4 to 5 times more costly than missed speech. Importantly, the proposed semi-supervised SAD is less dependent on the SAD decision threshold than the other contrastive SAD methods.
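To make the semi-supervised EM idea in the abstract more concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes a single diagonal-covariance Gaussian per class (rather than a full GMM per class), clamps the responsibilities of the few labeled frames to their labels, and lets EM update only the unlabeled frames. The function name `semi_supervised_em`, the label encoding (1 = speech, 0 = non-speech, -1 = unlabeled) and the variance floor are all illustrative assumptions.

```python
import numpy as np

def semi_supervised_em(X, labels, n_iter=20, var_floor=1e-3):
    """Illustrative two-class semi-supervised EM (assumed design, not the paper's code).

    X      : (n_frames, n_dims) feature matrix for one recording.
    labels : (n_frames,) ints; 1 = speech, 0 = non-speech, -1 = unlabeled.
    Returns the per-frame posterior probability of speech.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n = X.shape[0]
    labeled = labels >= 0

    # Responsibilities: column 0 = non-speech, column 1 = speech.
    # Unlabeled frames start at zero, so the first M-step uses labeled frames only.
    resp = np.zeros((n, 2))
    resp[labeled] = np.eye(2)[labels[labeled]]

    for _ in range(n_iter):
        # M-step: class weights, means and diagonal variances from soft counts.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = (resp.T @ X**2) / nk[:, None] - means**2
        variances = np.maximum(variances, var_floor)

        # E-step: per-class log-likelihoods plus log priors, computed in the log domain.
        diff = X[:, None, :] - means[None, :, :]
        log_prob = -0.5 * (
            np.log(2.0 * np.pi * variances).sum(axis=1)[None, :]
            + (diff**2 / variances[None, :, :]).sum(axis=2)
        ) + np.log(weights)[None, :]
        log_prob -= log_prob.max(axis=1, keepdims=True)
        post = np.exp(log_prob)
        post /= post.sum(axis=1, keepdims=True)

        # Only unlabeled frames are updated; labeled frames stay clamped.
        resp[~labeled] = post[~labeled]

    return resp[:, 1]
```

In use, the initial labels would come from a simpler SAD or a human operator for a small part of the recording, and the returned posteriors would be thresholded to obtain speech/non-speech decisions. The threshold itself is a separate design choice that this sketch does not address.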