ctrltokyo committed · Commit b9a34e5 · verified · 1 Parent(s): 18981cb

Add CodeSearchNet MRR benchmark results

Files changed (1)
  1. README.md +16 -0
README.md CHANGED
@@ -87,6 +87,22 @@ Trained on a single NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory).
  - Stage 1: ~130 min (391 steps)
  - Stage 2: ~37 min (117 steps)
 
+ ## Benchmark Results
+
+ ### CodeSearchNet MRR (500 queries per language, 500 candidates)
+
+ | Language | GTE-ModernColBERT (base) | **Reason-Code-ModernColBERT (ours)** | Δ |
+ |------------|:---:|:---:|:---:|
+ | Python | 0.991 | 0.989 | -0.002 |
+ | Java | 0.829 | **0.866** | +0.037 |
+ | JavaScript | 0.802 | **0.839** | +0.037 |
+ | PHP | 0.841 | **0.862** | +0.021 |
+ | Go | 0.879 | **0.887** | +0.008 |
+ | Ruby | 0.773 | **0.831** | +0.058 |
+ | **Average** | 0.853 | **0.879** | **+0.026** |
+
+ Improves on the base model in 5 of 6 languages. The largest gains are in Ruby (+0.058), Java (+0.037), and JavaScript (+0.037), the languages that benefited most from reasoning-enhanced training data. Python is near-ceiling at 0.99 for both models (Δ -0.002).
+
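For readers unfamiliar with the metric in the table above: MRR (mean reciprocal rank) averages 1/rank of the correct candidate over all queries. The snippet below is a minimal sketch of that calculation, not the evaluation script used for this commit. It assumes each query has exactly one correct candidate and that query `i` matches candidate `i`; the function names and the toy score matrix are hypothetical.

```python
from typing import Sequence


def reciprocal_rank(scores: Sequence[float], correct_index: int) -> float:
    """Return 1/rank of the correct candidate, where rank 1 is the highest score."""
    target = scores[correct_index]
    # Rank = 1 + number of candidates scoring strictly higher than the correct one.
    # Ties are resolved in favour of the correct candidate (an assumption).
    rank = 1 + sum(1 for s in scores if s > target)
    return 1.0 / rank


def mean_reciprocal_rank(score_matrix: Sequence[Sequence[float]]) -> float:
    """MRR over all queries, assuming query i's correct candidate is candidate i."""
    if not score_matrix:
        raise ValueError("need at least one query")
    total = sum(reciprocal_rank(row, i) for i, row in enumerate(score_matrix))
    return total / len(score_matrix)


# Toy example: 3 queries x 3 candidates (the table reports 500 queries and
# 500 candidates per language; the exact pairing scheme is assumed here).
toy_scores = [
    [0.9, 0.1, 0.3],  # correct candidate 0 is ranked 1st -> 1/1
    [0.8, 0.5, 0.2],  # correct candidate 1 is ranked 2nd -> 1/2
    [0.7, 0.6, 0.1],  # correct candidate 2 is ranked 3rd -> 1/3
]
print(round(mean_reciprocal_rank(toy_scores), 4))  # 0.6111
```

An MRR of 0.99, as reported for Python, means the correct candidate is ranked first for nearly every query, which leaves little room for further improvement.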
  ## Usage
 
  ```python