Abstract: With the emergence of audio-language models, constructing large-scale paired audio-language datasets has become essential yet challenging for model development, primarily due to the ...
Abstract: LLM-based visualization models rely primarily on material seen during pre-training. However, these models may hallucinate when dealing with knowledge not found in the ...
This repository provides the code for "Improving Query-by-Vocal Imitation with Contrastive Learning and Audio Pretraining", presented at DCASE 2024. The paper addresses the challenge of audio ...