Article

Automatic speech recognizers for Mexican Spanish and its open resources

Mena, Carlos Daniel Hernández; Ruiz, Ivan V. Meza; Camacho, José Abel Herrera

Instituto de Ciencias Aplicadas y Tecnología, UNAM; published in the Journal of Applied Research and Technology and harvested from Revistas UNAM

Content provenance

Entity or department

Instituto de Ciencias Aplicadas y Tecnología, UNAM

Journal

Journal of Applied Research and Technology

Repository

Revistas UNAM

Contact

Revistas UNAM. Dirección General de Publicaciones y Fomento Editorial, UNAM, at revistas@unam.mx

Citation

Mena, Carlos Daniel Hernández, et al. (2017). Automatic speech recognizers for Mexican Spanish and its open resources. Journal of Applied Research and Technology; Vol. 15, No. 3. Retrieved from https://repositorio.unam.mx/contenidos/4110119

Resource description

Author(s)

Mena, Carlos Daniel Hernández; Ruiz, Ivan V. Meza; Camacho, José Abel Herrera

Type

Research article

Field of knowledge

Engineering

Title

Automatic speech recognizers for Mexican Spanish and its open resources

Date

2019-06-07

Abstract

The development of automatic speech recognition systems relies on the availability of distinct language resources such as speech recordings, pronunciation dictionaries, and language models. These resources are scarce for the Mexican Spanish dialect. In this work, we present a revision of the CIEMPIESS corpus, a resource for spontaneous speech recognition in the Mexican Spanish of Central Mexico. It consists of 17 h of segmented and transcribed recordings, a phonetic dictionary composed of 53,169 unique words, and a language model composed of 1,505,491 words extracted from 2,489 university newsletters. We also evaluate the CIEMPIESS corpus using three well-known, state-of-the-art speech recognition engines, with satisfactory results. These resources are open for research and development in the field. Additionally, we present the methodology and the tools used to facilitate the creation of these resources, which can be easily adapted to other variants of Spanish, or even to other languages.

Subject

Automatic speech recognition; Mexican Spanish; Language resources; Language model; Acoustic model

Language

eng

ISSN

Electronic ISSN: 2448-6736; ISSN: 1665-6423
