kk014/mistral-7b-docstring

@@ -1,66 +1,123 @@
 ---
 license: apache-2.0
-library_name: peft
 tags:
-- trl
-- sft
-- generated_from_trainer
 base_model: mistralai/Mistral-7B-v0.1
-model-index:
-- name: mistral-7b-docstring
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 # mistral-7b-docstring
-This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.9943
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 0.0002
-- train_batch_size: 2
-- eval_batch_size: 2
-- seed: 42
-- gradient_accumulation_steps: 8
-- total_train_batch_size: 16
-- optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.03
-- num_epochs: 1
-- mixed_precision_training: Native AMP
-### Training results
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| 8.0055        | 0.4   | 200  | 1.0115          |
-| 7.9363        | 0.8   | 400  | 0.9943          |
-### Framework versions
-- PEFT 0.11.1
-- Transformers 4.46.0
-- Pytorch 2.10.0+cu128
-- Datasets 2.19.0
-- Tokenizers 0.20.3

 ---
+language: en
 license: apache-2.0
 tags:
+  - code
+  - python
+  - docstring
+  - mistral
+  - qlora
+  - peft
+  - code-generation
 base_model: mistralai/Mistral-7B-v0.1
+datasets:
+  - code_search_net
 ---
 # mistral-7b-docstring
+Mistral 7B fine-tuned with QLoRA to generate Python docstrings, trained on CodeSearchNet.
+It scores higher than Llama 3.3 70B, a model 10x larger, on both ROUGE-L and BERTScore F1 for NumPy-style docstring generation.
+## Evaluation results
+Evaluated on 100 held-out Python functions from CodeSearchNet (never seen during training).
+| Model | ROUGE-L | BERTScore F1 |
+|---|---|---|
+| **Mistral 7B fine-tuned (this model)** | **0.2033** | **0.7739** |
+| Llama 3.3 70B via Groq | 0.1715 | 0.7594 |
+| Mistral 7B base (no fine-tuning) | 0.1102 | 0.7118 |
+The fine-tuned 7B model beats Llama 3.3 70B on ROUGE-L (+18.5% relative) and BERTScore F1 (+1.9% relative) while being 10x smaller and running at a fraction of the inference cost.
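The relative gains quoted above follow directly from the table. A minimal sketch of the arithmetic, using only the scores reported in this card (the dictionary keys and the `relative_gain` helper are illustrative names, not part of any evaluation script):

```python
# Scores copied from the evaluation table in this card.
scores = {
    "finetuned": {"rouge_l": 0.2033, "bertscore_f1": 0.7739},
    "llama_70b": {"rouge_l": 0.1715, "bertscore_f1": 0.7594},
    "base": {"rouge_l": 0.1102, "bertscore_f1": 0.7118},
}


def relative_gain(new: float, old: float) -> float:
    """Return the improvement of `new` over `old` as a percentage of `old`."""
    return (new - old) / old * 100


for baseline in ("llama_70b", "base"):
    for metric in ("rouge_l", "bertscore_f1"):
        gain = relative_gain(scores["finetuned"][metric], scores[baseline][metric])
        print(f"fine-tuned vs {baseline}, {metric}: {gain:+.1f}%")
```

The percentages are relative to the baseline score, not absolute differences: the absolute ROUGE-L gap to Llama 3.3 70B is 0.0318 points.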
+## How to use
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+from peft import PeftModel
+import torch
+BASE_MODEL = "mistralai/Mistral-7B-v0.1"
+# Load in 4-bit for efficient inference
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.float16,
+)
+tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
+base_model = AutoModelForCausalLM.from_pretrained(
+    BASE_MODEL,
+    quantization_config=bnb_config,
+    device_map="auto",
+)
+model = PeftModel.from_pretrained(base_model, "kk014/mistral-7b-docstring")
+model.eval()
+# Generate a docstring
+function_code = """
+def calculate_bmi(weight_kg, height_m):
+    return weight_kg / (height_m ** 2)
+""".strip()
+prompt = (
+    "You are a Python documentation expert. "
+    "Write a clear, concise NumPy-style docstring for the following Python function.\n\n"
+    f"### Function:\n{function_code}\n\n"
+    "### Docstring:"
+)
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=150,
+        temperature=0.1,
+        do_sample=True,
+        pad_token_id=tokenizer.eos_token_id,
+    )
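+# Decode only the newly generated tokens so the prompt is never part of the output
+new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+docstring = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()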
+print(docstring)
+```
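When documenting many functions, the prompt template from the example above can be wrapped in a small helper. This is only a sketch: `build_prompt` and `extract_docstring` are hypothetical names, and cutting the completion at the next `###` header is an assumption about how repeated output looks, not documented behaviour of the model.

```python
def build_prompt(function_code: str) -> str:
    """Wrap a function's source in the instruction template used in this card."""
    return (
        "You are a Python documentation expert. "
        "Write a clear, concise NumPy-style docstring for the following Python function.\n\n"
        f"### Function:\n{function_code.strip()}\n\n"
        "### Docstring:"
    )


def extract_docstring(completion: str) -> str:
    """Trim a raw completion (the text generated after the prompt).

    Assumption: a new "###" header marks the start of repeated or unrelated
    content, so everything from that point on is discarded.
    """
    return completion.split("###", 1)[0].strip()


prompt = build_prompt("def add(a, b):\n    return a + b\n")
print(prompt)
```

Keeping the template in one place helps ensure inference prompts match the format shown in the usage example.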
+## Training details
+| Parameter | Value |
+|---|---|
+| Base model | mistralai/Mistral-7B-v0.1 |
+| Dataset | CodeSearchNet (Python split) |
+| Training samples | 8,000 |
+| Method | QLoRA (4-bit NF4 quantisation) |
+| LoRA rank | 16 |
+| LoRA alpha | 32 |
+| Epochs | 1 |
+| Batch size | 2 (effective 16 with 8 gradient accumulation steps) |
+| Learning rate | 2e-4 |
+| Hardware | Kaggle T4 x2 (free tier) |
+| Training time | ~4 hours |
+| Framework | HuggingFace PEFT + TRL |
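The table values imply a few derived quantities. A rough sketch of that arithmetic, assuming a single data-parallel replica (so the effective batch size comes only from gradient accumulation) and the standard LoRA scaling factor `alpha / r`:

```python
import math

# Values taken from the training details table.
train_samples = 8_000
per_step_batch = 2
effective_batch = 16
epochs = 1
lora_rank = 16
lora_alpha = 32

# Assumption: one data-parallel replica, so accumulation alone closes the gap.
grad_accum_steps = effective_batch // per_step_batch

# Optimizer updates per epoch (one update per effective batch).
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_rank

print(grad_accum_steps)  # 8
print(total_steps)       # 500
print(lora_scaling)      # 2.0
```

These figures agree with the earlier auto-generated card in this diff, which lists `gradient_accumulation_steps: 8` and reports step 400 at epoch 0.8.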
+## Limitations
+- Trained specifically on NumPy-style docstrings, so output may not follow Google or Sphinx conventions
+- Works best on standalone functions under ~50 lines
+- May repeat examples in the generated output at very low temperatures
+- Evaluated only on the CodeSearchNet Python split, so performance on other codebases may vary
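Given the size guideline above, it may help to pre-select functions before sending them to the model. The sketch below uses the standard-library `ast` module; the helper name `short_top_level_functions` and the hard cutoff of 50 lines are assumptions made for illustration (the card only says "~50 lines").

```python
import ast

MAX_LINES = 50  # assumed cutoff for the "~50 lines" guideline


def short_top_level_functions(source: str, max_lines: int = MAX_LINES) -> list[str]:
    """Return names of module-level functions spanning at most `max_lines` lines."""
    tree = ast.parse(source)
    names = []
    for node in tree.body:  # module level only, so methods inside classes are skipped
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length <= max_lines:
                names.append(node.name)
    return names


example = "def calculate_bmi(weight_kg, height_m):\n    return weight_kg / (height_m ** 2)\n"
print(short_top_level_functions(example))
```

Only module-level definitions are considered here, as a stand-in for "standalone functions"; nested functions and class methods are ignored.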
+## Citation
+If you use this model, please cite the original QLoRA paper:
+```bibtex
+@article{dettmers2023qlora,
+  title={QLoRA: Efficient Finetuning of Quantized LLMs},
+  author={Dettmers, Tim and others},
+  journal={arXiv preprint arXiv:2305.14314},
+  year={2023}
+}
+```