Abstract: The integration of electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) can facilitate the advancement of brain-computer interfaces (BCIs). However, existing ...
Abstract: Neural vocoders often struggle with aliasing in latent feature spaces, caused by time-domain nonlinear operations and resampling layers. Aliasing folds high-frequency components into the low ...
Diffusion Speech is a diffusion-based text-to-speech model. Our speech synthesis pipeline is quite simple. We use a diffusion transformer model (DiT) to predict the duration of each phoneme. Then we ...
This repository contains the implementation of (MQGAN) for audio synthesis. The project is structured to facilitate the entire workflow from data preparation to model deployment.