---
inference: false
datasets:
- answerdotai/MMARCO-japanese-32-scored-triplets
- unicamp-dl/mmarco
language:
- ja
pipeline_tag: sentence-similarity
tags:
- ColBERT
base_model:
- cl-tohoku/bert-base-japanese-v3
- bclavie/JaColBERT
license: mit
library_name: RAGatouille
---

Model weights for the JaColBERTv2.4 checkpoint, the version of JaColBERTv2.5 before post-training. It uses an entirely overhauled training recipe and was trained on just 40% of the data used for JaColBERTv2.

This model largely outperforms all previous approaches, including JaColBERTv2 and multilingual models such as BGE-M3, on every dataset evaluated.

This page will be updated with full details and the model report in the next few days.

```
@misc{clavié2024jacolbertv25optimisingmultivectorretrievers,
  title={JaColBERTv2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources},
  author={Benjamin Clavié},
  year={2024},
  eprint={2407.20750},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2407.20750},
}
```