Ara-BEST-RQ-600M-14k
Ara-BEST-RQ-600M-14k is a 600M-parameter self-supervised speech representation model for Arabic and Arabic dialects. It is part of the Ara-BEST-RQ family introduced in Ara-Best-RQ: Multi Dialectal Arabic SSL.
This model was pretrained on the combined Ara-BEST-RQ dataset: 13,723h 08m 43s of speech that merges the crawled Ara-BEST-RQ data with other publicly available datasets.
- Paper: Ara-Best-RQ: Multi Dialectal Arabic SSL
- Dataset: Elyadata/Ara-Best-RQ_dataset
- Implementation: elyadata/AraBEST-RQ
Model Details
Model Description
Ara-BEST-RQ is a family of Arabic-focused self-supervised learning (SSL) speech models based on the BEST-RQ framework. The models are designed to learn speech representations that transfer well to Arabic speech processing tasks, including automatic speech recognition (ASR) and dialect identification (DID).
This checkpoint corresponds to the 600M variant pretrained on the combined 14k-hour dataset.
- Model type: Self-supervised speech representation model
- Architecture: Conformer-based BEST-RQ encoder
- Parameters: ~600M (611.6M)
- Training data: combined Ara-BEST-RQ dataset
- Languages: Arabic, including multiple dialects
- Primary use: Speech representation learning / downstream fine-tuning
Architecture
The 600M Ara-BEST-RQ model uses:
- 24 Conformer encoder layers
- Model dimension: 1024
- 8 attention heads
- Feed-forward dimension: 4096
- GELU activations
- Layer normalization before attention
- Relative positional multi-head attention
- Convolutional front-end with two blocks
- Random projection quantizer with 4096 codebook entries of dimension 16
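The random-projection quantizer named above can be sketched in a few lines. This is an illustrative NumPy sketch of the BEST-RQ targeting mechanism, not the released implementation; the codebook size (4096) and code dimension (16) follow this card, while the input feature dimension (80) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, CODE_DIM, FEAT_DIM = 4096, 16, 80  # FEAT_DIM is an assumed input size

# In BEST-RQ both the projection and the codebook are frozen random tensors:
# they are never trained, only used to produce discrete targets.
projection = rng.normal(size=(FEAT_DIM, CODE_DIM))
codebook = rng.normal(size=(VOCAB, CODE_DIM))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)  # unit-norm codes

def quantize(frames: np.ndarray) -> np.ndarray:
    """Map (T, FEAT_DIM) feature frames to (T,) discrete target ids."""
    proj = frames @ projection                                # (T, CODE_DIM)
    proj /= np.linalg.norm(proj, axis=1, keepdims=True)       # unit-norm projections
    # Nearest codebook entry per frame by Euclidean distance.
    dists = np.linalg.norm(proj[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)

targets = quantize(rng.normal(size=(50, FEAT_DIM)))
print(targets.shape)
```

During pretraining, spans of the input are masked and the Conformer encoder is trained to predict these frozen quantizer ids at the masked positions.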
Training Data
The model was pretrained on the combined Ara-BEST-RQ dataset: 13,723h 08m 43s of speech data. The combined set includes the crawled Ara-BEST-RQ data together with other publicly available datasets described in the paper.
The released dataset on Hugging Face provides metadata only: YouTube video identifiers and audio segment boundaries. No audio or video files are distributed as part of the dataset.
Dataset link: Elyadata/Ara-Best-RQ_dataset
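Since the released dataset contains only video identifiers and segment boundaries, users must fetch and segment the audio themselves. The sketch below shows how segment durations could be accumulated from such metadata; the field names (`video_id`, `start_s`, `end_s`) are illustrative assumptions, not the dataset's actual schema.

```python
from datetime import timedelta

# Hypothetical metadata rows: one YouTube video id per row, plus the
# start/end boundaries (in seconds) of each speech segment.
rows = [
    {"video_id": "abc123", "start_s": 12.5, "end_s": 47.0},
    {"video_id": "abc123", "start_s": 50.0, "end_s": 93.25},
    {"video_id": "xyz789", "start_s": 0.0, "end_s": 30.0},
]

# Total speech duration is the sum of segment lengths across all rows.
total = timedelta(seconds=sum(r["end_s"] - r["start_s"] for r in rows))
print(total)
```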
Pretraining
The paper reports the following pretraining losses after 300k updates for this model:
| Training set | Train loss | Validation loss |
|---|---|---|
| Combined | 3.57 | 3.40 |
Evaluation
The paper evaluates Ara-BEST-RQ models on automatic speech recognition and dialect identification tasks. The following results are reported for the Ara-BEST-RQ-600M-14k model.
Automatic Speech Recognition
Word error rate (WER, %, lower is better) on ASR benchmarks:
| Dataset | WER |
|---|---|
| Common Voice 19.0 Arabic | 18.59 |
| MGB-3 | 28.78 |
| MGB-5 | 54.54 |
| TARIC-SLU | 21.14 |
| Average | 30.76 |
Dialect Identification
Results on ADI-20:
| Split | Accuracy | Weighted F1 |
|---|---|---|
| Validation | 94.66 | 94.71 |
| Test | 92.05 | 92.07 |
Usage
This is a self-supervised pretrained model intended to be used as a speech encoder or as an initialization checkpoint for downstream fine-tuning.
For training and fine-tuning recipes, please refer to the official implementation:
```bash
git clone https://github.com/elyadata/AraBEST-RQ
cd AraBEST-RQ
```
You can download the checkpoint from Hugging Face using:
```python
from huggingface_hub import snapshot_download

model_dir = snapshot_download("Elyadata/AraBEST-RQ-600M-14k")
print(model_dir)
```
Please refer to the repository configuration and SpeechBrain recipes for the correct model-loading interface.
Fine-tuning with SpeechBrain
To fine-tune this pretrained Ara-BEST-RQ checkpoint in a SpeechBrain recipe, adapt the pretrainer section of your YAML configuration so that it loads both the pretrained model checkpoint and the corresponding normalizer.
Example:
```yaml
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    collect_in: !ref <save_folder>
    loadables:
        pt_model: !ref <pt_model>
        normalize: !ref <normalize>
    paths:
        pt_model: !ref <pt_model_path>/model.ckpt
        normalize: !ref <pt_model_path>/normalizer.ckpt
```
In your downstream recipe, make sure that:
- `<pt_model>` points to the Ara-BEST-RQ pretrained model object used in your training graph.
- `<normalize>` points to the normalization module used by the recipe.
- `<pt_model_path>` points to the local directory containing `model.ckpt` and `normalizer.ckpt`.
- `<save_folder>` is the experiment directory where SpeechBrain should collect and manage pretrained components.
This setup allows SpeechBrain to initialize the downstream model from the Ara-BEST-RQ SSL checkpoint before fine-tuning on task-specific data.
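For concreteness, the referenced placeholders could be defined elsewhere in the same YAML file. The values below are illustrative assumptions about a local experiment layout, not paths shipped with the model:

```yaml
# Illustrative values only -- adapt to your own experiment layout.
pt_model_path: /path/to/AraBEST-RQ-600M-14k   # local snapshot_download directory
save_folder: results/arabest_rq_finetune/save
```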
Citation
If you use this model, please cite the Ara-BEST-RQ paper:
```bibtex
@misc{elleuch2026arabestrqmultidialectalarabic,
      title={Ara-Best-RQ: Multi Dialectal Arabic SSL},
      author={Haroun Elleuch and Ryan Whetten and Salima Mdhaffar and Yannick Estève and Fethi Bougares},
      year={2026},
      eprint={2603.21900},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.21900},
}
```