Publications
arXiv preprints
- Unsupervised word discovery: Boundary detection with clustering vs. dynamic programming
S. Malan, B. van Niekerk, and H. Kamper. arXiv preprint arXiv:2409.14486, 2024. [code1, code2]
- Improved visually prompted keyword localisation in real low-resource settings
L. Nortje, D. Oneață, and H. Kamper. arXiv preprint arXiv:2409.06013, 2024. [code]
Journal articles
- Visually grounded speech models have a mutual exclusivity bias
L. Nortje, D. Oneață, Y. Matusevych, and H. Kamper. Transactions of the Association for Computational Linguistics, vol. 12, pp. 755-770, 2024. [arXiv, code]
- Visually grounded few-shot word learning in low-resource settings
L. Nortje, D. Oneață, and H. Kamper. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 32, pp. 2544-2554, 2024. [arXiv, code]
- Disentanglement in a GAN for unconditional speech synthesis
M. Baas and H. Kamper. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 32, pp. 1324-1335, 2024. [arXiv, code]
- Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings
C. Jacobs and H. Kamper. IEEE Signal Processing Letters, vol. 31, pp. 311-315, 2024. [arXiv]
- Rhythm modeling for voice conversion
B. van Niekerk, M-A. Carbonneau, and H. Kamper. IEEE Signal Processing Letters, vol. 30, pp. 1297-1301, 2023. [arXiv, code, samples]
- Infant phonetic learning as perceptual space learning: A crosslinguistic evaluation of computational models
Y. Matusevych, H. Kamper, T. Schatz, N. H. Feldman, and S. Goldwater. Cognitive Science, vol. 47, 2023. [arXiv]
- Word segmentation on discovered phone units with dynamic programming and self-supervised scoring
H. Kamper. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 31, pp. 684-694, 2023. [arXiv, code]
- Semi-supervised machine learning for livestock threat classification using GPS data
U. J. de Swardt and H. Kamper. IEEE Access, vol. 11, pp. 27749-27758, 2023.
- Voice conversion for stuttered speech, instruments, unseen languages and textually described voices
M. Baas and H. Kamper. Communications in Computer and Information Science, vol. 1976, pp. 136-150, 2023. [arXiv, code]
- Keyword localisation in untranscribed speech using visually grounded speech models
K. Olaleye, D. Oneață, and H. Kamper. IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1454-1466, 2022. [arXiv, code]
- Feature learning for efficient ASR-free keyword spotting in low-resource languages
E. van der Westhuizen, H. Kamper, R. Menon, J. Quinn, and T. R. Niesler. Computer Speech and Language, vol. 71, 2022. [arXiv]
- TransFusion: Transcribing speech with multinomial diffusion
M. Baas, K. Eloff, and H. Kamper. Communications in Computer and Information Science, vol. 1734, pp. 231-245, 2022. [arXiv, preprint, code]
- Improved acoustic word embeddings for zero-resource languages using multilingual transfer
H. Kamper, Y. Matusevych, and S. Goldwater. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 29, pp. 1107-1118, 2021. [arXiv, code]
- Multilingual and unsupervised subword modeling for zero-resource languages
E. Hermann, H. Kamper, and S. Goldwater. Computer Speech and Language, vol. 65, 2021. [arXiv, preprint]
- BINet: A binary inpainting network for deep patch-based image compression
A. Nortje, W. Brink, H. A. Engelbrecht, and H. Kamper. Signal Processing: Image Communication, vol. 92, 2021. [arXiv]
- StarGAN-ZSVC: Towards zero-shot voice conversion in low-resource contexts
M. Baas and H. Kamper. Communications in Computer and Information Science, vol. 1342, pp. 69-84, 2020. [arXiv, preprint, code]
- Unsupervised feature learning for speech using correspondence and Siamese networks
P-J. Last, H. A. Engelbrecht, and H. Kamper. IEEE Signal Processing Letters, vol. 27, pp. 421-425, 2020. [arXiv, preprint]
- On the expected behaviour of noise regularised deep neural networks as Gaussian processes
A. Pretorius, H. Kamper, and S. Kroon. Pattern Recognition Letters, vol. 138, pp. 75-81, 2020. [arXiv, preprint]
- If dropout limits trainable depth, does critical initialisation still matter? A large-scale statistical analysis on ReLU networks
A. Pretorius, E. van Biljon, B. van Niekerk, R. Eloff, M. Reynard, S. James, B. Rosman, H. Kamper, and S. Kroon. Pattern Recognition Letters, vol. 138, pp. 95-105, 2020. [arXiv, preprint]
- Semantic speech retrieval with a visually grounded model of untranscribed speech
H. Kamper, G. Shakhnarovich, and K. Livescu. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 27, no. 1, pp. 89-98, 2019. [arXiv, preprint, code, data]
- Teaching for the future
W. Brink, H. Kamper, S. Kroon, U. Paquet, and H. Touchette. Synapse, vol. 3, pp. 41-42, 2019.
- A segmental framework for fully-unsupervised large-vocabulary speech recognition
H. Kamper, A. Jansen, and S. Goldwater. Computer Speech and Language, vol. 46, pp. 154-174, 2017. [ISCA best paper published in CSL 2016–2020] [arXiv, preprint, code]
- Unsupervised word segmentation and lexicon discovery using acoustic word embeddings
H. Kamper, A. Jansen, and S. Goldwater. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 24, no. 4, pp. 669-679, 2016. [arXiv, preprint]
- Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system
H. Kamper, F. de Wet, T. Hain, and T. R. Niesler. Computer Speech and Language, vol. 28, no. 6, pp. 1255-1268, 2014. [preprint]
- The impact of accent identification errors on speech recognition of South African English
H. Kamper and T. R. Niesler. South African Journal of Science, vol. 110, no. 1, 2014. [preprint]
- Multi-accent acoustic modelling of South African English
H. Kamper, F. J. Muamba Mukanya, and T. R. Niesler. Speech Communication, vol. 54, no. 6, pp. 801-813, 2012. [preprint]
Conference papers
2024
- Spoken-term discovery using discrete speech units
B. van Niekerk, J. Zaïdi, M-A. Carbonneau, and H. Kamper. In Proceedings of Interspeech, 2024. [arXiv, code]
- Translating speech with just images
D. Oneață and H. Kamper. In Proceedings of Interspeech, 2024. [arXiv, code]
- Revisiting speech segmentation and lexicon learning with better features
H. Kamper and B. van Niekerk. In arXiv, 2024. [arXiv]
2023
- Voice conversion with just nearest neighbors
M. Baas, B. van Niekerk, and H. Kamper. In Proceedings of Interspeech, 2023. [arXiv, code, samples]
- Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili
C. Jacobs, N. C. Rakotonirina, E. A. Chimoto, B. A. Bassett, and H. Kamper. In Proceedings of Interspeech, 2023. [arXiv]
- Visually grounded few-shot word acquisition with fewer shots
L. Nortje, B. van Niekerk, and H. Kamper. In Proceedings of Interspeech, 2023. [arXiv]
- Mitigating catastrophic forgetting for few-shot spoken word classification through meta-learning
R. van der Merwe and H. Kamper. In Proceedings of Interspeech, 2023. [arXiv]
- YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding
K. Olaleye, D. Oneață, and H. Kamper. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), 2023. [arXiv, data]
- Towards visually prompted keyword localisation for zero-resource spoken languages
L. Nortje and H. Kamper. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), 2023. [arXiv]
- GAN you hear me? Reclaiming unconditional speech synthesis from diffusion models
M. Baas and H. Kamper. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), 2023. [arXiv, code]
- Using machine learning to understand assessment practices of capstone projects in engineering
H. Kamper, C. Niehaus, and K. Wolff. In Proceedings of the World Engineering Education Forum (WEEF), 2023.
2022
- A temporal extension of latent Dirichlet allocation for unsupervised acoustic unit discovery
W. van der Merwe, H. Kamper, and J. A. du Preez. In Proceedings of Interspeech, 2022. [arXiv]
- Voice conversion can improve ASR in very low-resource settings
M. Baas and H. Kamper. In Proceedings of Interspeech, 2022. [arXiv]
- A comparison of discrete and soft speech units for improved voice conversion
B. van Niekerk, M-A. Carbonneau, J. Zaïdi, M. Baas, H. Seuté, and H. Kamper. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022. [arXiv]
- How machine learning can aid South African farmers’ security: Unsupervised livestock trajectory embeddings
U. J. de Swardt and H. Kamper. In Proceedings of the Southern African Conference on AI Research (SACAIR), 2022.
2021
- Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks
H. Kamper and B. van Niekerk. In Proceedings of Interspeech, 2021. [arXiv, code, video]
- Analyzing speaker information in self-supervised models to improve zero-resource speech processing
B. van Niekerk, L. Nortje, M. Baas, and H. Kamper. In Proceedings of Interspeech, 2021. [arXiv]
- Attention-based keyword localisation in speech using visual grounding
K. Olaleye and H. Kamper. In Proceedings of Interspeech, 2021. [arXiv]
- Multilingual transfer of acoustic word embeddings improves when training on languages related to the target zero-resource language
C. Jacobs and H. Kamper. In Proceedings of Interspeech, 2021. [best student paper nominee] [arXiv, code]
- Direct multimodal few-shot learning of speech and images
L. Nortje and H. Kamper. In Proceedings of Interspeech, 2021. [arXiv, code]
- A phonetic model of non-native spoken word processing
Y. Matusevych, H. Kamper, T. Schatz, N. H. Feldman, and S. Goldwater. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), 2021. [honourable mention] [arXiv]
- Acoustic word embeddings for zero-resource languages using self-supervised contrastive learning and multilingual adaptation
C. Jacobs, Y. Matusevych, and H. Kamper. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), 2021. [arXiv, code]
- A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings
L. van Staden and H. Kamper. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), 2021. [arXiv]
- Towards learning to speak and hear through multi-agent communication over a continuous acoustic channel
K. Eloff, O. Räsänen, H. A. Engelbrecht, A. Pretorius, and H. Kamper. In arXiv, 2021. [arXiv]
2020
- Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge
B. van Niekerk, L. Nortje, and H. Kamper. In Proceedings of Interspeech, 2020. [co-winners of Interspeech challenge] [arXiv, slides]
- Unsupervised vs. transfer learning for multimodal one-shot matching of speech and images
L. Nortje and H. Kamper. In Proceedings of Interspeech, 2020. [arXiv, slides]
- Evaluating computational models of infant phonetic learning across languages
Y. Matusevych, T. Schatz, H. Kamper, N. H. Feldman, and S. Goldwater. In Proceedings of CogSci, 2020. [arXiv, video]
- Multilingual acoustic word embedding models for processing zero-resource languages
H. Kamper, Y. Matusevych, and S. Goldwater. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020. [arXiv, slides, code, video]
- Cross-lingual topic prediction for speech using translations
S. Bansal, H. Kamper, A. Lopez, and S. Goldwater. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020. [arXiv, slides]
- Training neural networks for plant estimation, control and disturbance rejection
H. Kotzé, H. Kamper, and H. W. Jordaan. In Proceedings of the International Federation of Automatic Control (IFAC), 2020.
- Participatory research for low-resourced machine translation: A case study in African languages
W. Nekoto and many others. In Findings of the Association for Computational Linguistics: EMNLP (Findings of EMNLP), 2020. [arXiv]
- Towards localisation of keywords in speech using weak supervision
K. Olaleye, B. van Niekerk, and H. Kamper. In NeurIPS Workshop on Self-Supervised Learning for Speech and Audio Processing (NeurIPS-SAS), 2020. [arXiv, slides]
- A correspondence variational autoencoder for unsupervised acoustic word embeddings
P. Peng, H. Kamper, and K. Livescu. In NeurIPS Workshop on Self-Supervised Learning for Speech and Audio Processing (NeurIPS-SAS), 2020. [arXiv, slides]
- Analyzing autoencoder-based acoustic word embeddings
Y. Matusevych, H. Kamper, and S. Goldwater. In ICLR Workshop on Bridging AI and Cognitive Science (BAICS), 2020. [arXiv, slides]
- Masakhane – Machine translation for Africa
I. Orife and many others. In ICLR AfricaNLP Workshop, 2020. [arXiv, video]
- Improving unsupervised acoustic word embeddings using speaker and gender information
L. van Staden and H. Kamper. In Proceedings of the Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), 2020. [slides]
- Towards improving human arithmetic learning using machine learning
T. Hall and H. Kamper. In Proceedings of the Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), 2020. [poster]
- Combining primitive DQNs for improved reinforcement learning in Minecraft
M. Reynard, H. Kamper, B. Rosman, and H. A. Engelbrecht. In Proceedings of the Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), 2020. [poster]
2019
- Unsupervised acoustic unit discovery for speech synthesis using discrete latent-variable neural networks
R. Eloff, A. Nortje, B. van Niekerk, A. Govender, L. Nortje, A. Pretorius, E. van Biljon, E. van der Westhuizen, L. van Staden, and H. Kamper. In Proceedings of Interspeech, 2019. [arXiv, slides]
- On the contributions of visual and textual supervision in low-resource semantic speech retrieval
A. Pasad, B. Shi, H. Kamper, and K. Livescu. In Proceedings of Interspeech, 2019. [arXiv, poster]
- Feature exploration for almost zero-resource ASR-free keyword spotting using a multilingual bottleneck extractor and correspondence autoencoders
R. Menon, H. Kamper, E. van der Westhuizen, J. Quinn, and T. R. Niesler. In Proceedings of Interspeech, 2019. [arXiv, poster]
- Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models
H. Kamper. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019. [arXiv, poster, code]
- Semantic query-by-example speech search using visual grounding
H. Kamper, A. Anastassiou, and K. Livescu. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019. [arXiv, poster, code]
- Multimodal one-shot learning of speech and images
R. Eloff, H. A. Engelbrecht, and H. Kamper. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019. [arXiv, poster, code]
- Pre-training on high-resource speech recognition improves low-resource speech-to-text translation
S. Bansal, H. Kamper, K. Livescu, A. Lopez, and S. Goldwater. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2019. [arXiv, slides, code]
2018
- Critical initialisation for deep signal propagation in noisy rectifier neural networks
A. Pretorius, E. van Biljon, S. Kroon, and H. Kamper. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2018. [arXiv, poster, code, video]
- Visually grounded cross-lingual keyword spotting in speech
H. Kamper and M. Roth. In Proceedings of the Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU), 2018. [arXiv, slides]
- ASR-free CNN-DTW keyword spotting using multilingual bottleneck features for almost zero-resource languages
R. Menon, H. Kamper, E. Yilmaz, J. Quinn, and T. R. Niesler. In Proceedings of the Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU), 2018. [arXiv, slides]
- Fast ASR-free and almost zero-resource keyword spotting using DTW and CNNs for humanitarian monitoring
R. Menon, H. Kamper, J. Quinn, and T. R. Niesler. In Proceedings of Interspeech, 2018. [arXiv, poster]
- Low-resource speech-to-text translation
S. Bansal, H. Kamper, K. Livescu, A. Lopez, and S. Goldwater. In Proceedings of Interspeech, 2018. [arXiv, poster]
- Learning dynamics of linear denoising autoencoders
A. Pretorius, S. Kroon, and H. Kamper. In Proceedings of the International Conference on Machine Learning (ICML), 2018. [arXiv, slides, poster, code]
- Phoneme based embedded segmental K-means for unsupervised term discovery
S. Bhati, H. Kamper, and K. S. R. Murty. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018. [poster]
2017
- An embedded segmental K-means model for unsupervised segmentation and clustering of speech
H. Kamper, K. Livescu, and S. Goldwater. In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2017. [best paper nominee] [arXiv, poster, code]
- Visually grounded learning of keyword prediction from untranscribed speech
H. Kamper, S. Settle, G. Shakhnarovich, and K. Livescu. In Proceedings of Interspeech, 2017. [arXiv, slides, code]
- Query-by-example search with discriminative neural acoustic word embeddings
S. Settle, K. Levin, H. Kamper, and K. Livescu. In Proceedings of Interspeech, 2017. [arXiv, poster]
- Towards speech-to-text translation without speech recognition
S. Bansal, H. Kamper, A. Lopez, and S. Goldwater. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), 2017. [arXiv, slides]
- Weakly supervised spoken term discovery using cross-lingual side information
S. Bansal, H. Kamper, S. Goldwater, and A. Lopez. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017. [arXiv, poster]
2016
- Deep convolutional acoustic word embeddings using word-pair side information
H. Kamper, W. Wang, and K. Livescu. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016. [arXiv, slides, code]
2015
- Fully unsupervised small-vocabulary speech recognition using a segmental Bayesian model
H. Kamper, A. Jansen, and S. Goldwater. In Proceedings of Interspeech, 2015. [poster]
- A comparison of neural network methods for unsupervised representation learning on the Zero Resource Speech Challenge
D. Renshaw, H. Kamper, A. Jansen, and S. Goldwater. In Proceedings of Interspeech, 2015. [slides]
- Unsupervised neural network based feature extraction using weak top-down constraints
H. Kamper, M. Elsner, A. Jansen, and S. Goldwater. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015. [slides, code]
2014
- Unsupervised lexical clustering of speech segments using fixed-dimensional acoustic embeddings
H. Kamper, A. Jansen, S. King, and S. Goldwater. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), 2014. [best poster presentation award] [poster, code]
2012
- Resource development and experiments in automatic South African broadcast news transcription
H. Kamper, F. de Wet, T. Hain, and T. R. Niesler. In Proceedings of the Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU), 2012. [slides]
- Optimisation of acoustic models for a target accent using decision-tree state clustering
H. Kamper and T. R. Niesler. In Proceedings of the Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), 2012. [best paper award] [slides]
Pre-2012
- Multi-accent speech recognition of Afrikaans, Black and White varieties of South African English
H. Kamper and T. R. Niesler. In Proceedings of Interspeech, 2011. [poster]
- Accent reclassification and speech recognition of Afrikaans, Black and White South African English
H. Kamper and T. R. Niesler. In Proceedings of the Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), 2011. [slides]
- Acoustic modelling of English-accented and Afrikaans-accented South African English
H. Kamper, F. J. Muamba Mukanya, and T. R. Niesler. In Proceedings of the Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), 2010. [slides]
- Characterisation and simulation of telephone channels using the TIMIT and NTIMIT databases
H. Kamper and T. R. Niesler. In Proceedings of the Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), 2009. [slides]
Other publications
- Unsupervised neural and Bayesian models for zero-resource speech processing
H. Kamper. PhD dissertation, University of Edinburgh, UK, 2016. [arXiv]
- Speech recognition of South African English accents
H. Kamper. Master’s thesis, Stellenbosch University, South Africa, 2012.