Our collection of large-scale datasets for fine-tuning embedding-based text retrieval and re-ranking models