Price tag extractor model
This model was developed to extract information from price tag images, as part of the Open Prices project. A detailed report of the model development can be found here.
The most important extracted fields are:
- the type of the price tag (`CATEGORY` or `PRODUCT`)
- the product name
- all prices present on the price tag
- the currency used
- the barcode (if `type=PRODUCT`)
- the category (if `type=CATEGORY`)
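As an illustration, an extraction result for a `PRODUCT` price tag might look like the JSON below. Note that the field names here are hypothetical, chosen to mirror the list above; the authoritative JSON schema is published in the dataset's config.json.

```python
import json

# Hypothetical example of an extraction result for a PRODUCT price tag.
# Field names are illustrative only; the authoritative JSON schema is
# shipped in the price-tag-extraction dataset's config.json.
raw_output = """
{
  "type": "PRODUCT",
  "product_name": "Organic bananas",
  "prices": [1.99],
  "currency": "EUR",
  "barcode": "3017620425035"
}
"""

result = json.loads(raw_output)
print(result["type"], result["prices"][0])
```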
This model is a fine-tuned version of unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit.
Only the LoRA weights are saved here. To use the model, please load the base model first, and then apply the LoRA weights. This can be efficiently done with vLLM.
Version v1.1 of the price-tag-extraction dataset was used for training.
Training procedure
The values of the most important hyperparameters used during training are the following:
- `lora_r`: 32
- `lora_alpha`: 32
- `epochs`: 1
- `batch_size`: 32 (8 per GPU * 4 gradient accumulation steps)
- `learning_rate`: 2e-4
- `weight_decay`: 0.01
- `warmup_ratio`: 0.05
LoRA adapters were added to the attention and MLP layers of the language model, but not to the vision layers. The model was trained on a single RTX 4090 24GB GPU for about 24 hours.
This training script was used for training.
Metrics
The model was evaluated using the llm-evals framework on the price-tag-extraction benchmark v2.0:
- price: 500/523 (95.60% accuracy)
- barcode: 360/408 (88.24% accuracy)
- uncertain_barcode_or_product_name: 491/528 (92.99% accuracy)
- category: 34/59 (57.63% accuracy)
For reference, the original model without fine-tuning obtained the following scores on the same benchmark:
- price: 466/523 (89.10% accuracy)
- barcode: 345/408 (84.56% accuracy)
- uncertain_barcode_or_product_name: 497/528 (94.13% accuracy)
- category: 19/66 (28.79% accuracy)
Fine-tuning the model on a custom dataset significantly improved the extraction accuracy.
Serving
To serve with vLLM, you can use the following command:
```shell
OMP_NUM_THREADS=1 vllm serve --model Qwen/Qwen3-VL-8B-Instruct \
    --limit-mm-per-prompt.video 0 \
    --max-model-len 8192 \
    --mm-processor-cache-gb 0 \
    --enable-lora \
    --lora-modules price_tag_extractor=openfoodfacts/price-tag-extractor \
    --max-lora-rank 32
```
The following script can be used to send requests to the server (e.g. `uv run chat.py https://prices.openfoodfacts.org/img/price-tags/000/143/000143456.webp`):
```python
# /// script
# dependencies = [
#     "openai==2.15.0",
#     "typer",
#     "requests",
# ]
# ///
import json
import time
from typing import Annotated

import requests
import typer
from openai import OpenAI


def main(
    image_url: Annotated[
        str, typer.Argument(help="URL of the price tag image to extract from")
    ],
    base_url: Annotated[
        str, typer.Argument(help="Base URL for OpenAI-compatible server")
    ] = "http://localhost:8000/v1",
    api_key: Annotated[str, typer.Option(help="API key for authentication")] = "",
    model_name: Annotated[
        str,
        typer.Option(
            help="Model name to use, here the name of the registered LoRA adapter."
        ),
    ] = "price_tag_extractor",
):
    typer.echo(f"Extracting price tag information from image '{image_url}'", err=True)
    client = OpenAI(base_url=base_url, api_key=api_key)

    # Fetch the instructions and JSON schema published with the dataset.
    typer.echo("Fetching generation configuration...", err=True)
    config = requests.get(
        "https://huggingface.co/datasets/openfoodfacts/price-tag-extraction/resolve/v1.1/config.json",
    ).json()
    json_schema = config["json_schema"]
    instructions = config["instructions"]
    json_schema_str = json.dumps(json_schema)
    full_instructions = (
        f"{instructions}\n\nResponse must be formatted as JSON, "
        f"and follow this JSON schema:\n{json_schema_str}"
    )

    typer.echo("Sending request to model...", err=True)
    start_time = time.monotonic()
    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": full_instructions},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    )
    end_time = time.monotonic()
    typer.echo(f"Request completed in {end_time - start_time:.2f} seconds", err=True)
    typer.echo(response.choices[0].message.content)


if __name__ == "__main__":
    typer.run(main)
```
Framework versions
- TRL: 0.24.0
- Transformers: 4.57.3
- Pytorch: 2.9.0
- Datasets: 4.3.0
- Tokenizers: 0.22.2
Acknowledgements
This project was funded by the NLnet Foundation as part of the NGI0 Commons Funds. Many thanks to them for their support!