YFACC

Yorùbá Flickr Audio Caption Corpus

The Yorùbá Flickr Audio Caption Corpus (YFACC) extends the Flickr8k image-text dataset to Yorùbá by adding three components:

Yorùbá translations of 6k of the captions.
Corresponding spoken recordings of these translations, obtained from a single speaker.
Temporal alignments of 67 Yorùbá keywords for a subset of 500 of the captions.

The dataset is described in the following paper. Please cite the paper if you use the data:

K. Olaleye, D. Oneață, and H. Kamper, “A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding,” accepted to SLT, 2023. [arXiv]

YFACC (6.8 GB): yfacc_v6.tar.gz
MD5 checksum: 7e086f4424246e3dfc742abba488c429
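
After downloading, it can be worth confirming that the archive matches the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `verify_archive` are illustrative and not part of any YFACC tooling, and the sketch assumes the archive was saved as `yfacc_v6.tar.gz` in the current working directory.

```python
import hashlib
from pathlib import Path

# Checksum as published on this page.
EXPECTED_MD5 = "7e086f4424246e3dfc742abba488c429"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that the 6.8 GB archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.strip().lower()


# Assumed local download location; adjust to wherever the file was saved.
archive = Path("yfacc_v6.tar.gz")
if archive.exists():
    print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
```

A mismatch usually indicates an incomplete or corrupted download, in which case downloading the archive again is the simplest remedy.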

© 2022 Stellenbosch University
This data is released under a Creative Commons Attribution-ShareAlike license (CC BY-SA 4.0).