answerdotai/JaColBERTv2.4

	---
	inference: false
	datasets:
	- answerdotai/MMARCO-japanese-32-scored-triplets
	- unicamp-dl/mmarco
	language:
	- ja
	pipeline_tag: sentence-similarity
	tags:
	- ColBERT
	base_model:
	- cl-tohoku/bert-base-japanese-v3
	- bclavie/JaColBERT
	license: mit
	library_name: RAGatouille
	---

	Model weights for the JaColBERTv2.4 checkpoint, which is the version of JaColBERTv2.5 before post-training. It uses an entirely overhauled training recipe and is trained on just 40% of the data used for JaColBERTv2.

	This model largely outperforms all previous approaches, including JaColBERTv2 and multilingual models such as BGE-M3, on all datasets.

	This page will be updated with the full details and the model report in the next few days.
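Until the full details are published, here is a minimal usage sketch. It assumes the checkpoint loads through RAGatouille (the `library_name` declared in the metadata) via `RAGPretrainedModel.from_pretrained` and is used for index-free reranking with `rerank`. The helper `rerank_passages`, the constant names, and the sample Japanese texts are hypothetical and only for illustration; this has not been confirmed by the model authors as the recommended workflow.

```python
# Minimal sketch (assumption: the checkpoint is loadable with RAGatouille,
# as suggested by `library_name: RAGatouille` in the card metadata).

MODEL_NAME = "answerdotai/JaColBERTv2.4"

# Hypothetical sample data, used only to illustrate the call.
SAMPLE_QUERY = "日本で一番高い山は何ですか？"
SAMPLE_PASSAGES = [
    "富士山は日本で最も高い山で、標高は3776メートルです。",
    "東京は日本の首都です。",
    "琵琶湖は日本最大の湖です。",
]


def rerank_passages(query, passages, k=3):
    """Rerank `passages` for `query` without building an index.

    Hypothetical helper: downloads the model weights on first use.
    """
    # Imported lazily so this module can be imported even when
    # RAGatouille is not installed.
    from ragatouille import RAGPretrainedModel

    model = RAGPretrainedModel.from_pretrained(MODEL_NAME)
    # Never request more results than there are passages.
    k = min(k, len(passages))
    # Returns a list of dicts describing the ranked passages.
    return model.rerank(query=query, documents=passages, k=k)
```

Calling `rerank_passages(SAMPLE_QUERY, SAMPLE_PASSAGES)` should return the passages ordered by relevance to the query. For larger collections, building an index with RAGatouille's `index` and querying it with `search` is likely the better fit.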

	```
	@misc{clavié2024jacolbertv25optimisingmultivectorretrievers,
	title={JaColBERTv2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources},
	author={Benjamin Clavié},
	year={2024},
	eprint={2407.20750},
	archivePrefix={arXiv},
	primaryClass={cs.IR},
	url={https://arxiv.org/abs/2407.20750},
	}
	```