NVIDIA releases 1 million hours of speech data set to help develop speech AI and also releases actual trained transcription AI model

NVIDIA has released Granary , a dataset containing approximately 1 million hours of audio. Granary includes 25 European languages, including Croatian and Estonian, and will contribute to the multilingualization of voice AI.
NVIDIA Releases Open Dataset, Models for Multilingual Speech AI | NVIDIA Blog
https://blogs.nvidia.com/blog/speech-ai-dataset-models/
While many voice AI models useful for transcription and machine translation have emerged, most of them are targeted at major languages such as English, and support for languages with fewer speakers tends to lag behind. This is due to the fact that there are fewer datasets for languages with fewer speakers, which are essential for AI training, making it a challenge to make voice AI multilingual.
Granary, released by NVIDIA, is a dataset containing speech in 25 European languages, including Croatian, Estonian, and Maltese. Granary contains a total of 1 million hours of speech data, of which 650,000 hours are optimized for speech recognition and 350,000 hours for speech translation. The dataset is available at the following link:
nvidia/Granary · Datasets at Hugging Face
https://huggingface.co/datasets/nvidia/Granary

Creating an audio dataset requires a lot of work: labeling recorded data with accurate transcriptions and other labels. NVIDIA is working to solve this problem by developing a pipeline that can label audio data without human intervention. Details of the pipeline developed by NVIDIA are available at the following link.
NeMo-speech-data-processor/dataset_configs/multilingual/granary at main · NVIDIA/NeMo-speech-data-processor · GitHub
https://github.com/NVIDIA/NeMo-speech-data-processor/tree/main/dataset_configs/multilingual/granary

NVIDIA also released the high-quality transcription model ' NVIDIA Canary-1b-v2 ' and the real-time transcription model ' NVIDIA Parakeet-tdt-0.6b-v3 ,' both trained using Granary. NVIDIA Canary-1b-v2 achieved the highest score in the multilingual performance category of the ' Open ASR Leaderboard ,' which measures the performance of transcription models.
nvidia/canary-1b-v2 · Hugging Face
https://huggingface.co/nvidia/canary-1b-v2

nvidia/parakeet-tdt-0.6b-v3 · Hugging Face
https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3

Related Posts:
in Software, Posted by log1o_hf