Add CodeSearchNet MRR benchmark results
README.md
@@ -87,6 +87,22 @@ Trained on a single NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory).
 - Stage 1: ~130 min (391 steps)
 - Stage 2: ~37 min (117 steps)

+## Benchmark Results
+
+### CodeSearchNet MRR (500 queries per language, 500 candidates)
+
+| Language | GTE-ModernColBERT (base) | **Reason-Code-ModernColBERT (ours)** | Δ |
+|------------|:---:|:---:|:---:|
+| Python | 0.991 | 0.989 | -0.002 |
+| Java | 0.829 | **0.866** | +0.037 |
+| JavaScript | 0.802 | **0.839** | +0.037 |
+| PHP | 0.841 | **0.862** | +0.021 |
+| Go | 0.879 | **0.887** | +0.008 |
+| Ruby | 0.773 | **0.831** | +0.058 |
+| **Average** | 0.853 | **0.879** | **+0.026** |
+
+Improves on the base model in 5 of 6 languages. The largest gains are in Ruby (+0.058 MRR), Java (+0.037), and JavaScript (+0.037), which suggests these languages benefited most from the reasoning-enhanced training data. Python is near ceiling at 0.99 for both models, so the -0.002 difference there is small.
+
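The commit does not show the evaluation script, so the following is only a minimal sketch of how an MRR figure like those in the table is conventionally computed. It assumes each query has exactly one correct snippet among its candidates (500 in the setup above) and that the model assigns one relevance score per candidate. The function names `rank_of_target` and `mean_reciprocal_rank` are hypothetical and are not taken from the repository.

```python
from typing import Sequence


def rank_of_target(scores: Sequence[float], target_index: int) -> int:
    """Return the 1-based rank of the correct candidate.

    Assumption: higher score means more relevant. Ties are resolved
    optimistically (only strictly higher scores push the target down).
    """
    target_score = scores[target_index]
    return 1 + sum(1 for s in scores if s > target_score)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries; ranks are 1-based."""
    if not ranks:
        raise ValueError("need at least one query")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy example with 3 queries and 4 candidates each (illustrative data only).
toy_scores = [
    [0.9, 0.1, 0.3, 0.2],  # correct candidate at index 0 -> rank 1
    [0.4, 0.8, 0.6, 0.1],  # correct candidate at index 2 -> rank 2
    [0.7, 0.6, 0.5, 0.2],  # correct candidate at index 3 -> rank 4
]
toy_targets = [0, 2, 3]
toy_ranks = [rank_of_target(s, t) for s, t in zip(toy_scores, toy_targets)]
toy_mrr = mean_reciprocal_rank(toy_ranks)  # (1 + 1/2 + 1/4) / 3
```

Under this definition an MRR of 0.99 means the correct snippet is ranked first for nearly every query, which is why little headroom remains for Python.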
 ## Usage

```python