Price tag extractor model
This model was developed to extract information from price tag images, as part of the Open Prices project. A detailed report of the model development can be found here.
The most important extracted fields are:
- the type of the price tag (`CATEGORY` or `PRODUCT`)
- the product name
- all prices present on the price tag
- the currency used
- the barcode (if `type=PRODUCT`)
- the category (if `type=CATEGORY`)
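As an illustration, an extraction result for a `PRODUCT` price tag might look like the JSON below. Note that the field names here are hypothetical, chosen to mirror the list above; the authoritative JSON schema is published in the dataset's config.json.

```python
import json

# Hypothetical example of an extraction result for a PRODUCT price tag.
# Field names are illustrative only; the authoritative JSON schema is
# shipped in the price-tag-extraction dataset's config.json.
raw_output = """
{
  "type": "PRODUCT",
  "product_name": "Organic bananas",
  "prices": [1.99],
  "currency": "EUR",
  "barcode": "3017620425035"
}
"""

result = json.loads(raw_output)
print(result["type"], result["prices"][0])
```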
This model is a fine-tuned version of unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit.
Only the LoRA weights are saved here. To use the model, please load the base model first, and then apply the LoRA weights. This can be efficiently done with vLLM.
Version v1.1 of the price-tag-extraction dataset was used for training.
Training procedure
The values of the most important hyperparameters used during training are the following:
- `lora_r`: 32
- `lora_alpha`: 32
- `epochs`: 1
- `batch_size`: 32 (8 per GPU * 4 gradient accumulation steps)
- `learning_rate`: 2e-4
- `weight_decay`: 0.01
- `warmup_ratio`: 0.05
LoRA adapters were added to the attention and MLP layers of the language model, but not to the vision layers. The model was trained on a single RTX 4090 24GB GPU for about 24 hours.
This training script was used for training.
Metrics
The model was evaluated using the llm-evals framework on the price-tag-extraction benchmark v2.0:
- price: 500/523 (95.60% accuracy)
- barcode: 360/408 (88.24% accuracy)
- uncertain_barcode_or_product_name: 491/528 (92.99% accuracy)
- category: 34/59 (57.63% accuracy)
For reference, the original model without fine-tuning obtained the following scores on the same benchmark:
- price: 466/523 (89.10% accuracy)
- barcode: 345/408 (84.56% accuracy)
- uncertain_barcode_or_product_name: 497/528 (94.13% accuracy)
- category: 19/66 (28.79% accuracy)
Fine-tuning the model on a custom dataset significantly improved the extraction accuracy.
Serving
To serve with vLLM, you can use the following command:
```shell
OMP_NUM_THREADS=1 vllm serve --model Qwen/Qwen3-VL-8B-Instruct \
    --limit-mm-per-prompt.video 0 \
    --max-model-len 8192 \
    --mm-processor-cache-gb 0 \
    --enable-lora \
    --lora-modules price_tag_extractor=openfoodfacts/price-tag-extractor \
    --max-lora-rank 32
```
The following script can be used to send requests to the server (e.g. `uv run chat.py https://prices.openfoodfacts.org/img/price-tags/000/143/000143456.webp`):
```python
# /// script
# dependencies = [
#     "openai==2.15.0",
#     "typer",
#     "requests",
# ]
# ///
import json
import time
from typing import Annotated

import requests
import typer
from openai import OpenAI


def main(
    image_url: Annotated[
        str, typer.Argument(help="URL of the price tag image to extract from")
    ],
    base_url: Annotated[
        str, typer.Argument(help="Base URL for OpenAI-compatible server")
    ] = "http://localhost:8000/v1",
    api_key: Annotated[str, typer.Option(help="API key for authentication")] = "",
    model_name: Annotated[
        str,
        typer.Option(
            help="Model name to use, here the name of the registered LoRA adapter."
        ),
    ] = "price_tag_extractor",
):
    typer.echo(f"Extracting price tag information from image '{image_url}'", err=True)
    client = OpenAI(base_url=base_url, api_key=api_key)

    # Fetch the instructions and JSON schema published with the dataset.
    typer.echo("Fetching generation configuration...", err=True)
    config = requests.get(
        "https://huggingface.co/datasets/openfoodfacts/price-tag-extraction/resolve/v1.1/config.json",
    ).json()
    json_schema = config["json_schema"]
    instructions = config["instructions"]
    json_schema_str = json.dumps(json_schema)
    full_instructions = (
        f"{instructions}\n\nResponse must be formatted as JSON, "
        f"and follow this JSON schema:\n{json_schema_str}"
    )

    typer.echo("Sending request to model...", err=True)
    start_time = time.monotonic()
    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": full_instructions},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    )
    end_time = time.monotonic()
    typer.echo(f"Request completed in {end_time - start_time:.2f} seconds", err=True)
    typer.echo(response.choices[0].message.content)


if __name__ == "__main__":
    typer.run(main)
```
Framework versions
- TRL: 0.24.0
- Transformers: 4.57.3
- Pytorch: 2.9.0
- Datasets: 4.3.0
- Tokenizers: 0.22.2
Acknowledgements
This project was funded by the NLnet Foundation as part of the NGI0 Commons Funds. Many thanks to them for their support!