The Yorùbá Flickr Audio Caption Corpus (YFACC) extends the Flickr8k image-text dataset to Yorùbá with three components:
- Yorùbá translations of 6k of the captions.
- Corresponding spoken recordings of these translations, obtained from a single speaker.
- Temporal alignments of 67 Yorùbá keywords for a subset of 500 of the captions.
The dataset is described in the following paper. Please cite the paper if you use the data:
- K. Olaleye, D. Oneață, and H. Kamper, “A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding,” accepted to SLT, 2023. [arXiv]
Download
YFACC (6.8 GB):
yfacc_v6.tar.gz
MD5 checksum: 7e086f4424246e3dfc742abba488c429
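After downloading, it is worth checking the archive against the MD5 checksum listed above before unpacking it. The following is a minimal Python sketch, not an official tool shipped with the dataset. It assumes the archive was saved as yfacc_v6.tar.gz in the current directory; the helper names (md5_of_file, verify_and_extract) and the output directory yfacc are hypothetical choices made for illustration.

```python
import hashlib
import tarfile
from pathlib import Path

# Checksum and filename as listed in the Download section.
EXPECTED_MD5 = "7e086f4424246e3dfc742abba488c429"
ARCHIVE = Path("yfacc_v6.tar.gz")  # assumption: saved in the current directory


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that the multi-gigabyte archive is never held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, expected_md5, destination):
    """Extract a .tar.gz archive only if its MD5 digest matches."""
    actual = md5_of_file(archive)
    if actual != expected_md5.lower():
        raise ValueError(
            f"MD5 mismatch for {archive}: expected {expected_md5}, got {actual}"
        )
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(destination)


if __name__ == "__main__":
    if ARCHIVE.exists():
        # "yfacc" is a hypothetical output directory.
        verify_and_extract(ARCHIVE, EXPECTED_MD5, Path("yfacc"))
        print("Checksum OK; archive extracted to ./yfacc")
    else:
        print(f"{ARCHIVE} not found; download it first.")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy. MD5 is used here only because it is the checksum published with the dataset; it guards against transfer errors, not deliberate tampering.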
License
© 2022 Stellenbosch University
This data is released under a Creative Commons Attribution-ShareAlike license (CC BY-SA 4.0).