# CodeLlama-7b Text-to-SQL (QLoRA Fine-tune)
A CodeLlama-7b model fine-tuned on the Spider dataset using QLoRA (4-bit quantization + LoRA adapters) for the task of converting natural language questions into SQL queries.
## Model Details

### Model Description
This model is a parameter-efficient fine-tune of CodeLlama-7b using QLoRA on the Spider Text-to-SQL benchmark dataset. It takes a database schema (as DDL statements) and a natural language question as input, and generates the corresponding SQL query.
- Developed by: Yash Shukla
- Model type: Causal Language Model (CodeLlama-7b + LoRA adapters)
- Language(s) (NLP): English
- License: Llama 2 Community License
- Finetuned from model: codellama/CodeLlama-7b-hf
### Model Sources

- Repository: https://huggingface.co/yash-164/codellama-7b-text2sql/tree/main
- Base Model: https://huggingface.co/codellama/CodeLlama-7b-hf
- Dataset: https://huggingface.co/datasets/spider
## Uses

### Direct Use
This model is intended for converting natural language questions into SQL queries given a database schema. It is suitable for:
- Building natural language database interfaces
- SQL query auto-completion tools
- Educational tools for learning SQL
### Downstream Use
The LoRA adapter can be merged into the base model or used directly with PEFT for further fine-tuning on domain-specific SQL dialects or private schemas.
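Merging the adapter into the base model might look like the sketch below, using PEFT's `merge_and_unload`. The output directory name is illustrative, and the adapter id is the same placeholder used in the quickstart; this requires downloading the full base model weights.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, attach the adapter, then fold the LoRA weights
# into the base weights so the result runs without PEFT installed.
base = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
model = PeftModel.from_pretrained(base, "your-username/codellama-7b-text2sql")
merged = model.merge_and_unload()

# Save a standalone checkpoint for deployment or further fine-tuning.
merged.save_pretrained("codellama-7b-text2sql-merged")
```

The merged checkpoint behaves like a plain `AutoModelForCausalLM` and can be quantized or served with any standard tooling.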
### Out-of-Scope Use
- Not suitable for production databases without human review of generated queries
- Not tested on non-English questions
- Not designed for NoSQL or non-relational query languages
## Bias, Risks, and Limitations
- The model is trained only on Spider, which covers ~200 databases; it may struggle with unseen schema patterns
- Generated SQL is not guaranteed to be syntactically or semantically correct
- The model may hallucinate column or table names not present in the provided schema
- All column types are inferred as TEXT/INTEGER/REAL; nuanced type handling may be incorrect
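One lightweight guardrail against hallucinated identifiers is to cross-check every name in the generated SQL against the schema DDL before accepting the query. The helper names below are illustrative (not part of this repo), and the regex-based parse is deliberately crude:

```python
import re

def schema_identifiers(ddl: str) -> set[str]:
    """Collect table and column names from simple CREATE TABLE statements."""
    ids = set()
    for table, cols in re.findall(r"CREATE TABLE\s+(\w+)\s*\(([^)]*)\)", ddl, re.I):
        ids.add(table.lower())
        for col in cols.split(","):
            ids.add(col.strip().split()[0].lower())
    return ids

def unknown_identifiers(sql: str, ddl: str) -> set[str]:
    """Return identifiers in sql that are neither SQL keywords nor in the schema."""
    keywords = {
        "select", "from", "where", "join", "on", "and", "or", "order", "by",
        "group", "having", "as", "desc", "asc", "limit", "distinct", "count",
        "sum", "avg", "min", "max", "inner", "left", "not", "in", "like",
    }
    tokens = set(re.findall(r"[A-Za-z_]\w*", sql.lower()))
    return tokens - schema_identifiers(ddl) - keywords
```

A non-empty result flags names the model invented, so the query can be rejected or regenerated rather than executed.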
### Recommendations
Always validate generated SQL against the actual database before execution. Do not run generated queries directly on production systems without review.
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

BASE_MODEL = "codellama/CodeLlama-7b-hf"
ADAPTER = "your-username/codellama-7b-text2sql"  # this repo

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER)

schema = (
    "CREATE TABLE employees (id INTEGER, name TEXT, salary REAL, dept_id INTEGER);\n"
    "CREATE TABLE departments (id INTEGER, dept_name TEXT);"
)
question = "List all employees in the Engineering department ordered by salary descending."

prompt = (
    f"<s>[INST] You are an expert SQL assistant. "
    f"Given the database schema below, write a SQL query that answers the question.\n\n"
    f"### Schema:\n{schema}\n\n"
    f"### Question:\n{question} [/INST]\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding; temperature has no effect when do_sample=False
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
sql = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(sql)
```
## Training Details

### Training Data

- Dataset: Spider, a large-scale human-labeled Text-to-SQL dataset
- Train split: ~7,000 examples
- Validation split: ~1,034 examples
- Filtering: Examples exceeding 1024 tokens after prompt formatting were excluded
### Training Procedure

#### Preprocessing
Each example was formatted using an Alpaca-style instruction template with the database DDL schema and natural language question as input, and the SQL query as output. SQL queries were normalized by collapsing whitespace and stripping trailing semicolons.
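The preprocessing described above might be sketched as follows. The normalization rules (collapse whitespace, strip trailing semicolons) come from this card; the template text is reused from the inference prompt in the Get Started section and the function names are illustrative:

```python
import re

def normalize_sql(sql: str) -> str:
    """Collapse runs of whitespace and strip any trailing semicolon."""
    return re.sub(r"\s+", " ", sql).strip().rstrip(";").rstrip()

def format_example(schema: str, question: str, sql: str) -> str:
    """Render one training example as an instruction prompt plus target SQL."""
    return (
        "<s>[INST] You are an expert SQL assistant. "
        "Given the database schema below, write a SQL query that answers the question.\n\n"
        f"### Schema:\n{schema}\n\n"
        f"### Question:\n{question} [/INST]\n"
        f"{normalize_sql(sql)}</s>"
    )
```

Examples whose formatted text exceeded 1024 tokens were then dropped, as noted under Training Data.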
#### Training Hyperparameters
- Training regime: fp16 mixed precision
- Quantization: 4-bit NF4 (bitsandbytes)
- LoRA rank (r): 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: q_proj, v_proj
- Epochs: 3
- Batch size: 4 per device + 4 gradient accumulation steps (effective batch = 16)
- Learning rate: 2e-4 with cosine schedule
- Warmup ratio: 0.03
- Optimizer: paged_adamw_32bit
- Max sequence length: 1024
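Reconstructed as a PEFT/Transformers configuration, the settings above might look like this sketch. Only the values listed in this section and under Evaluation come from the card; `output_dir` and flags such as `load_best_model_at_end` are assumptions:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter configuration matching the listed hyperparameters
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Training arguments matching the listed regime (Transformers 4.40.0 API)
training_args = TrainingArguments(
    output_dir="codellama-7b-text2sql",   # assumption
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,        # effective batch = 16 on 2 GPUs x 4 x 2? see note
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="paged_adamw_32bit",
    fp16=True,
    evaluation_strategy="steps",
    eval_steps=200,
    load_best_model_at_end=True,          # assumption, per "best checkpoint" note below
)
```

The 4-bit NF4 quantization would be supplied separately via a `BitsAndBytesConfig` when loading the base model.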
#### Speeds, Sizes, Times
- Hardware: 2x NVIDIA Tesla T4 (Kaggle)
- Training time: ~2.5 hours for 3 epochs
- Adapter size: ~84 MB
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data
Spider validation split (~1,034 examples across unseen databases).
#### Metrics

- Eval loss monitored during training via evaluation_strategy="steps" every 200 steps
- Best checkpoint selected by lowest eval loss
### Results
| Metric | Value |
|---|---|
| Eval Loss | 0.1601 |
#### Summary
The model learns to ground SQL generation in the provided schema, producing syntactically valid queries for common SQL patterns including SELECT, WHERE, JOIN, GROUP BY, and ORDER BY.
## Environmental Impact
- Hardware Type: NVIDIA Tesla T4 x2
- Hours used: ~2.5
- Cloud Provider: Kaggle (Google Cloud)
- Compute Region: us-central1
- Carbon Emitted: ~0.2 kg CO2eq (estimated via ML Impact Calculator)
## Technical Specifications

### Model Architecture and Objective
- Base: CodeLlama-7b (decoder-only transformer, 7B parameters)
- Adapter: LoRA applied to q_proj and v_proj attention projections
- Objective: Causal language modeling (next-token prediction) on formatted SQL instruction examples
### Compute Infrastructure

#### Hardware
2x NVIDIA Tesla T4 (16 GB VRAM each) on Kaggle free tier
#### Software
- Python 3.12
- PyTorch 2.2.0
- Transformers 4.40.0
- PEFT 0.10.0
- TRL 0.9.6
- bitsandbytes 0.43.1
- Accelerate 0.29.3
## Citation

BibTeX:

```bibtex
@inproceedings{yu2018spider,
  title     = {Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
  author    = {Yu, Tao and Zhang, Rui and Yang, Kai and Yasunaga, Michihiro and Wang, Dongxu and Li, Zifan and Ma, James and Li, Irene and Yao, Qingning and Roman, Shanelle and others},
  booktitle = {EMNLP},
  year      = {2018}
}
```
## Model Card Authors

Yash Shukla