Image-to-Image
Diffusers
Safetensors
QwenImageEditPlusPipeline

Model Overview

Description:

Qwen-Image-Edit-NVPCB-OVSL2SL transforms synthetic solder-light printed-circuit-board (PCB) component crops, produced in NVIDIA Omniverse, into the photographic solder-light style captured at NVIDIA PCB inspection stations, so that downstream PCB inspection models trained on real solder-light photographs can be augmented with Omniverse-generated synthetic data. The release is an NVIDIA fine-tuned version of the Qwen-Image-Edit image-to-image diffusion pipeline (diffusion transformer, Qwen2.5-VL text encoder, Qwen-Image VAE, tokenizer, image processor, and scheduler configuration), specialized for the Omniverse → NVPCB solder-light style transfer. Qwen-Image-Edit-NVPCB-OVSL2SL v1.0.0 was developed by NVIDIA as part of the NVPCB inspection-data harmonization pipeline. This model is ready for commercial use.

License/Terms of Use:

Governing Terms: Use of this model is governed by the NVIDIA Open Model Agreement. Additional Information: Apache License, Version 2.0.

Deployment Geography:

Global

Use Case:

NVIDIA engineers and researchers building PCB inspection / automated optical inspection (AOI) systems that need to be augmented with Omniverse-generated synthetic data. The model converts Omniverse-rendered solder-light PCB component crops into the photographic solder-light style produced by NVIDIA's physical inspection stations, closing the sim-to-real style gap so that inspection models trained on real photographs can be evaluated or augmented with synthetic Omniverse data. This model is not intended to be the primary inspection decision-maker; it is a sim-to-real data-translation step. Inspection pass/fail decisions must come from a downstream inspection model with human review.

Release Date:

GitHub 06/02/2026 via https://github.com/NVIDIA/paidf-augmentation

Reference(s):

Model Architecture:

Architecture Type: Transformer (diffusion transformer with cross-modal conditioning)

Network Architecture: This release is a self-contained HuggingFace diffusers pipeline directory. All of the following components are redistributed as part of the release artifact:

  • transformer/ (NVIDIA fine-tuned, redistributed): the upstream Qwen-Image-Edit flow-matching image-to-image diffusion transformer, fine-tuned by NVIDIA on the attention and feed-forward projections of QwenImageTransformerBlock (to_q, to_k, to_v, to_out, the cross-attention add_{q,k,v}_proj / to_add_out, and img_mlp / txt_mlp).
  • text_encoder/ (redistributed unmodified): Qwen2.5-VL; attends over both the input image and the instruction prompt.
  • vae/ (redistributed unmodified): Qwen-Image VAE.
  • tokenizer/, processor/, scheduler/, and model_index.json (redistributed unmodified): Qwen-Image-Edit tokenizer, image processor, scheduler configuration, and pipeline entry point.
  • Fine-tuning methodology: NVIDIA fine-tuned the transformer using LoRA (rank 16, ~1.7 × 10^8 parameters introduced during training); the resulting weights were then merged back into the transformer for release, so the released artifact is a standalone diffusers pipeline that requires no separate adapter file at inference time.
  • The released pipeline directory can be loaded directly with diffusers.QwenImageEditPipeline.from_pretrained(...).
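The merge step in the fine-tuning bullet above can be illustrated on a toy matrix. This is a minimal sketch, assuming the standard LoRA formulation W' = W + (alpha / r) · B · A; the `alpha` value, the matrix sizes, and all function names are illustrative assumptions, not details of the released training code.

```python
# Toy illustration of merging a LoRA update into a base weight (pure Python).
# Assumption: standard LoRA, W_merged = W + (alpha / rank) * B @ A.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha):
    """Return W + (alpha / rank) * B @ A, with A of shape (rank x in) and B of shape (out x rank)."""
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def apply(w, x):
    """Apply a weight matrix W (out x in) to a vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


# Hypothetical 2x3 base weight with a rank-1 adapter.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, -0.5, 1.0]]   # rank x in
B = [[2.0], [4.0]]       # out x rank
x = [1.0, 2.0, 3.0]

merged = merge_lora(W, A, B, alpha=1.0)

# Separate-adapter path: base output plus the scaled adapter output (scale = alpha / rank = 1).
separate = [b + a for b, a in zip(apply(W, x), apply(B, apply(A, x)))]
merged_out = apply(merged, x)
```

Because the merged matrix gives the same output as the base weight plus the adapter path, a merged checkpoint needs no adapter file at inference time, which matches the description of the release above.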

This model was developed based on Qwen-Image-Edit.

Number of model parameters: Approximately 2.0 × 10^10 (20B) total parameters in the released checkpoint. NVIDIA fine-tuning trained ~1.7 × 10^8 (170M) LoRA parameters, which were merged into the diffusion transformer; the remaining weights come from the upstream Qwen-Image-Edit pipeline (transformer + Qwen2.5-VL text encoder + Qwen-Image VAE) and are redistributed unmodified.
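For scale, the two parameter counts quoted above can be turned into a fine-tuned share and a rough weight footprint. This is a back-of-the-envelope sketch; it assumes every weight is stored in bf16 (2 bytes each), which the card states for inference but does not confirm for each component on disk.

```python
# Back-of-the-envelope figures derived from the parameter counts quoted above.
TOTAL_PARAMS = 2.0e10    # ~20B parameters in the released checkpoint
LORA_PARAMS = 1.7e8      # ~170M parameters attributed to NVIDIA fine-tuning
BYTES_PER_BF16 = 2       # assumption: all weights stored as bf16

lora_fraction = LORA_PARAMS / TOTAL_PARAMS
weights_gb = TOTAL_PARAMS * BYTES_PER_BF16 / 1e9   # decimal gigabytes

print(f"fine-tuned share of parameters: {lora_fraction:.2%}")
print(f"approximate bf16 weight footprint: {weights_gb:.0f} GB")
```

Under these assumptions the fine-tuned share is under 1% of the checkpoint, and the weights alone occupy roughly 40 GB before activations are counted.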

Cumulative Compute: 0.6 GPU-hour total on a single NVIDIA H100 SXM (0.5 GPU-hour for the 1500-step fine-tuning run + ~5 GPU-minutes for the latent/embedding cache build).

Estimated Energy and Emissions for Model Training: ~0.4 kWh and ~0.15 kgCO2e total. Methodology: GPU energy = 0.6 GPU-hour × 0.7 kW (H100 SXM rated TDP) × 0.6 average utilization (typical for LoRA fine-tuning, which is not consistently GPU-bound) ≈ 0.25 kWh; multiplied by an assumed datacenter PUE of 1.5 to account for cooling and facility overhead ≈ 0.38 kWh; multiplied by 0.4 kgCO2e/kWh (U.S. national-grid average) ≈ 0.15 kgCO2e. Estimates use rated TDP rather than measured wall-power and are therefore conservative upper bounds; actual emissions depend on the specific datacenter's PUE and regional grid carbon intensity at training time.
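The estimate can be reproduced directly from the stated inputs. The sketch below only multiplies the factors named in the methodology; the variable names are illustrative. Without intermediate rounding, it comes to about 0.25 kWh of GPU energy, about 0.38 kWh at the facility level, and about 0.15 kgCO2e.

```python
# Reproduce the energy and emissions estimate from the factors stated above.
GPU_HOURS = 0.6              # total GPU time on one H100 SXM
TDP_KW = 0.7                 # rated TDP of the H100 SXM
UTILIZATION = 0.6            # assumed average utilization
PUE = 1.5                    # assumed datacenter power usage effectiveness
GRID_KGCO2E_PER_KWH = 0.4    # assumed grid carbon intensity

gpu_energy_kwh = GPU_HOURS * TDP_KW * UTILIZATION
facility_energy_kwh = gpu_energy_kwh * PUE
emissions_kgco2e = facility_energy_kwh * GRID_KGCO2E_PER_KWH

print(f"GPU energy:      {gpu_energy_kwh:.3f} kWh")
print(f"Facility energy: {facility_energy_kwh:.3f} kWh")
print(f"Emissions:       {emissions_kgco2e:.3f} kgCO2e")
```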

Input(s):

Input Type(s): Image, Text

Input Format(s):

  • Image: PNG / JPG, Red, Green, Blue (RGB)
  • Text: UTF-8 instruction prompt (English)

Input Parameters:

  • Image: Two-Dimensional (2D)
  • Text: One-Dimensional (1D)

Other Properties Related to Input: Fine-tuned at target area 262,144 pixels (~512×512); other resolutions are accepted by the underlying diffusers pipeline but NVIDIA fine-tuning was not performed at them, so style fidelity may degrade. Input must be a single Omniverse-rendered PCB component crop on an approximately black background, similar to the synthetic solder-light style the model was fine-tuned on. The accompanying instruction prompt is a fixed English sentence and is not user-configurable. The prompt is:

"Render this PCB component crop as a real NVPCB inspection-line solder-light photograph: dark photographic board surface with bright orange and blue specular highlights on the solder pads, sharp realistic textures."

The only input a user provides is the PCB image to be processed; the model performs an image-constrained relighting edit of that image, not free-text text-to-image generation. This is a deliberate guardrail: the model was fine-tuned on this single instruction only, so the prompt is locked and cannot be a vector for misuse.
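A serving layer could enforce these input constraints along the following lines. This is a minimal sketch: the helper names `target_size` and `build_request` are hypothetical, and rounding each side to a multiple of 32 is an assumption of the sketch, not a documented requirement of this release.

```python
import math

# The fixed instruction prompt, copied from the card.
FIXED_PROMPT = (
    "Render this PCB component crop as a real NVPCB inspection-line "
    "solder-light photograph: dark photographic board surface with bright "
    "orange and blue specular highlights on the solder pads, sharp realistic "
    "textures."
)
TARGET_AREA = 262_144  # pixels, i.e. 512 * 512


def target_size(width, height, multiple=32):
    """Pick a size near TARGET_AREA that keeps the input aspect ratio.

    Rounding each side to a multiple of 32 is an assumption of this sketch.
    """
    ratio = width / height
    new_w = math.sqrt(TARGET_AREA * ratio)
    new_h = new_w / ratio
    return (round(new_w / multiple) * multiple, round(new_h / multiple) * multiple)


def build_request(image, user_prompt=None):
    """Build pipeline keyword arguments; any user-supplied prompt is ignored."""
    return {"image": image, "prompt": FIXED_PROMPT}
```

A square crop maps to 512×512, while a 2:1 crop maps to a wider size with roughly the same pixel count. Dropping `user_prompt` rather than raising an error is a design choice of the sketch; a stricter service might reject such requests instead.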


Output(s):

Output Type(s): Image

Output Format(s): PNG; Red, Green, Blue (RGB)

Output Parameters: Two-Dimensional (2D)

Other Properties Related to Output: Output resolution matches the input target area used during caching (~512×512). The output preserves component identity and board layout while transferring the rendering style from Omniverse-synthetic solder-light to NVPCB photographic solder-light.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:

Runtime Engine(s):

  • PyTorch (via HuggingFace diffusers QwenImageEditPipeline)

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere (A100)
  • NVIDIA Hopper (H100)
  • NVIDIA Lovelace (RTX 40-series)

Supported Operating System(s):

  • Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Model Version(s):

v1.0.0: qwen-image-edit-nvpcb-OVSL2SL (NVIDIA fine-tune of the upstream Qwen-Image-Edit pipeline, 1500 training steps). The released artifact is a self-contained HuggingFace diffusers pipeline directory containing transformer/, text_encoder/, vae/, tokenizer/, processor/, scheduler/, and model_index.json.

Training, Testing, and Evaluation Datasets:

Dataset Overview

  • Total Number of Datasets: 1 (paired Omniverse-synthetic / NVPCB-photographic solder-light component crops; NVIDIA-internal)
  • Total Size: 228 paired component crops (228 Omniverse-rendered synthetic solder-light inputs + 228 NVPCB photographic solder-light targets), ~512×512 each
  • Dataset partition: Training ~95%, Validation ~5% (held out by filename stem; there is no separate test split, and evaluation is a qualitative side-by-side comparison plus CLIP-style embedding distance against held-out targets)
  • Time period for data collection: H1 2026 (January – June 2026)
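Holding out by filename stem keeps each synthetic input and its photographic target in the same partition. The card does not describe how the split was implemented; the following is one possible sketch that hashes the stem, with the function name and the example paths being hypothetical.

```python
import hashlib
from pathlib import Path


def split_for(path, val_fraction=0.05):
    """Assign a file to 'train' or 'val' based only on its filename stem."""
    stem = Path(path).stem
    digest = hashlib.sha256(stem.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "val" if bucket < val_fraction else "train"


# Both halves of a pair share a stem, so they always land in the same split.
pair = ("synthetic/crop_0001.png", "photo/crop_0001.jpg")
same_split = split_for(pair[0]) == split_for(pair[1])
```

A hash-based split is deterministic across runs and machines, but it yields only approximately 5% validation data on a set as small as 228 pairs; an exact split would need an explicit list of held-out stems.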

Training Dataset:

Data Modality:

  • Image
  • Text (fixed instruction prompt)

Image Training Data Size:

  • [Less than a Million Images]

Text Training Data Size:

  • [Less than a Billion Tokens]

Data Collection Method by dataset:

  • Hybrid: Synthetic, Human (Omniverse synthetic rendering for the input side + manually-collected photography at NVIDIA inspection stations for the target side)

Labeling Method by dataset:

  • Not Applicable: paired-image translation is supervised by the target image itself; no per-image label is required beyond the fixed instruction prompt.

Properties: 228 paired component crops with Omniverse-rendered synthetic solder-light as input and NVPCB photographic solder-light as target. Modality: image + a fixed English instruction prompt. Content nature: synthetic renders and photographs of inanimate PCBs (NVIDIA-internal). No personal data, no copyright-protected web content, no machine-generated text/speech. No human subjects are depicted.

Testing Dataset:

Data Collection Method by dataset:

  • [Not Applicable]

Labeling Method by dataset:

  • [Not Applicable]

Properties: Not Applicable. The model is qualitatively evaluated on the held-out validation split (~5%) via side-by-side HTML reports rather than against a separate test split.

Evaluation Dataset:

Data Collection Method by dataset:

  • Hybrid: Synthetic, Human (Omniverse synthetic rendering + manually-collected photography)

Labeling Method by dataset:

  • Not Applicable

Properties: Held-out validation pairs (~5%) from the same paired set, used to produce qualitative side-by-side comparison HTML reports and to compute CLIP-style embedding distance between generated and ground-truth target images.
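The card does not specify which distance is computed on the embeddings. A common choice for CLIP-style embeddings is cosine distance, sketched below on plain Python lists; how the embeddings are extracted (which encoder, which preprocessing) is outside this sketch, and the function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("embedding vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embedding vectors must be non-zero")
    return 1.0 - dot / (norm_u * norm_v)


def mean_distance(generated, targets):
    """Average the distance over held-out (generated, target) embedding pairs."""
    distances = [cosine_distance(g, t) for g, t in zip(generated, targets)]
    return sum(distances) / len(distances)
```

With this definition, 0 means the generated and target embeddings point in the same direction and larger values mean a larger style or content gap. With only about 5% of 228 pairs held out, the mean is computed over roughly a dozen pairs and should be read alongside the side-by-side reports.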

Inference:

Acceleration Engine: PyTorch (native, bf16 weights) via diffusers.QwenImageEditPipeline.from_pretrained(...), loaded directly from the released pipeline directory.
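Loading and running the released directory might look like the sketch below. It assumes recent `torch` and `diffusers` installations and uses `DiffusionPipeline.from_pretrained`, which instantiates whatever pipeline class `model_index.json` names. The `true_cfg_scale`, `negative_prompt`, and `num_inference_steps` values are illustrative assumptions rather than recommended settings for this release, and the paths are placeholders.

```python
# The fixed instruction prompt, copied from the card.
FIXED_PROMPT = (
    "Render this PCB component crop as a real NVPCB inspection-line "
    "solder-light photograph: dark photographic board surface with bright "
    "orange and blue specular highlights on the solder pads, sharp realistic "
    "textures."
)


def load_pipeline(model_dir, device="cuda"):
    """Load the released pipeline directory in bf16 (requires torch and diffusers)."""
    import torch
    from diffusers import DiffusionPipeline

    # DiffusionPipeline reads model_index.json and builds the class named there.
    pipe = DiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.bfloat16)
    return pipe.to(device)


def translate(pipe, image, generator=None, true_cfg_scale=4.0, num_inference_steps=50):
    """Run one crop through the pipeline with the fixed prompt; return the first output image."""
    result = pipe(
        image=image,
        prompt=FIXED_PROMPT,
        negative_prompt=" ",              # assumption: blank negative prompt for true CFG
        true_cfg_scale=true_cfg_scale,    # assumption: illustrative guidance scale
        num_inference_steps=num_inference_steps,
        generator=generator,              # pass a seeded torch.Generator for repeatability
    )
    return result.images[0]


# Example usage (paths are placeholders):
#   from PIL import Image
#   pipe = load_pipeline("path/to/released/pipeline")
#   out = translate(pipe, Image.open("crop.png").convert("RGB"))
#   out.save("crop_solder_light.png")
```

Keeping the prompt inside `translate` rather than exposing it as an argument mirrors the card's statement that the prompt is not user-configurable.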

Test Hardware:

  • NVIDIA H100
  • NVIDIA A100

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please make sure you have proper rights and permissions for all input image content used as PCB component crops. PCB boards are inanimate objects, but users should still verify that any incidentally captured personally identifiable content (e.g., visible serial numbers, hand-written part identifiers) is handled in accordance with applicable privacy laws prior to use.

Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment.

For more detailed information on ethical considerations for this model, please see the Bias, Explainability, Privacy, and Safety & Security subcards alongside this overview in this directory: bias.md, explainability.md, safety.md, privacy.md.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

Bias

Field Response
Participation considerations from adversely impacted groups protected classes in model design and testing: Not Applicable. The model operates on synthetic renders and photographs of inanimate printed circuit boards; no human subjects are depicted in the training, validation, or evaluation data, and no human-attribute label is used as a supervision signal.
Measures taken to mitigate against unwanted bias: Training data was screened to ensure it contains only PCB component crops and no incidental images of people, identifying documents, or workspaces. The fixed instruction prompt is reviewed and describes a lighting/style transformation only; it contains no demographic or human descriptors.
Bias Metric (If Measured): Not Applicable.

Explainability

Field Response
Intended Task/Domain: Sim-to-real paired image-to-image translation for printed-circuit-board (PCB) inspection. The model converts Omniverse-rendered synthetic solder-light component crops into the photographic solder-light style captured at NVIDIA PCB inspection stations, so that inspection models trained on real photographs can be augmented with synthetic Omniverse data.
Model Type: Diffusion transformer (LoRA fine-tune of Qwen-Image-Edit) with a Qwen2.5-VL text encoder that conditions on the input image as well as the instruction prompt.
Intended Users: NVIDIA engineers and researchers building PCB inspection / AOI systems that need to be augmented with Omniverse-generated synthetic data.
Output: Image (RGB, PNG). The output is a re-rendered version of the input Omniverse PCB crop in the NVPCB photographic solder-light style; component identity and board layout are preserved.
Describe how the model works: The input Omniverse-rendered PCB crop is encoded into latents by the Qwen-Image VAE. In parallel, the fixed instruction prompt is encoded by Qwen2.5-VL conditioned on the input image. The released diffusion transformer (the upstream Qwen-Image-Edit transformer, fine-tuned by NVIDIA on its attention and feed-forward projections) iteratively denoises a noise tensor toward a latent that, when decoded by the VAE, reproduces the same scene in the target NVPCB photographic solder-light style described in the prompt. Flow matching with Qwen-Image's resolution-aware time-shift schedule is used during training; classifier-free guidance (CFG) is used at inference.
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: Not Applicable.
Technical Limitations & Mitigation: (1) Fine-tuning was performed at one resolution (target area ~512×512); large deviations may degrade style fidelity. (2) Generalization outside the Omniverse-synthetic distribution it was trained on (e.g., real photographs as input, synthetic renders with non-black backgrounds, extreme close-ups or wide shots of full boards) is not guaranteed. (3) The model is a sim-to-real data-translation step and is not itself a defect-detection model; defect-detection decisions must be made by a downstream inspection model. (4) Style strength is fixed at release time and cannot be tuned at inference; if a different style strength is required, a new fine-tuned checkpoint must be released. Mitigation: (a) the instruction prompt is fixed and not user-configurable (see Input properties in the overview), so the model is constrained to the trained relighting task and is not exposed as a general-purpose image generator, and the prompt cannot be a vector for misuse; (b) sampling controls (true-CFG, seed) may be tuned at the serving layer to address artifacts when they appear; (c) the model must not be used as the sole signal for any inspection pass/fail decision; defect-detection decisions must come from a downstream inspection model with human review.
Verified to have met prescribed NVIDIA quality standards: Yes
Performance Metrics: Qualitative side-by-side comparison of generated vs. held-out target images; CLIP-style embedding distance between generated and ground-truth target on held-out pairs; visual inspection of component-identity preservation.
Potential Known Risks: The model may hallucinate solder-pad highlights or component textures not present on the underlying board, especially on out-of-distribution inputs; it should be used as a data augmentation aid, and downstream inspection models trained on these images should be validated against real photographs before deployment in production QA lines.

This model can generate synthetic images and may produce content that is offensive, unsafe, misleading, indecent, or unsuitable for a target deployment. Users should implement robust safety guardrails (including content filtering, abuse monitoring, and access controls) to reduce the risk of harmful outputs. Users are responsible for ensuring that their use of the model complies with all applicable laws and regulations, and for regularly reviewing and updating their guardrails as risks evolve.
Licensing: Governing Terms: Use of this model is governed by the NVIDIA Open Model Agreement. Additional Information: Apache License, Version 2.0.
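
The flow-matching training and CFG inference named under "Describe how the model works" can be written out as follows. This is a sketch under one common convention, not the confirmed training objective of this release: $x_0$ is the clean target latent, $\epsilon$ is Gaussian noise, $c$ is the image-and-prompt conditioning, $\varnothing$ is the empty or negative condition, and $s$ is the guidance scale. The training code may use a different sign or time direction, and the resolution-aware time shift applied to $t$ is omitted.

```latex
\begin{align}
  x_t &= (1 - t)\,x_0 + t\,\epsilon,
      \qquad \epsilon \sim \mathcal{N}(0, I),\quad t \in [0, 1] \\
  v^{\star} &= \epsilon - x_0 \\
  \mathcal{L}(\theta) &= \mathbb{E}_{x_0,\,\epsilon,\,t}
      \bigl\lVert v_\theta(x_t, t, c) - v^{\star} \bigr\rVert_2^2 \\
  \hat{v} &= v_\theta(x_t, t, \varnothing)
      + s\,\bigl( v_\theta(x_t, t, c) - v_\theta(x_t, t, \varnothing) \bigr)
\end{align}
```

The first three lines describe training: the transformer regresses the velocity that carries a noisy latent back toward the target. The last line is the guided velocity used at each sampling step.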

Privacy

Field Response
Generatable or reverse engineerable personal data? No.
Personal data used to create this model? No.
Was consent obtained for any personal data used? Not Applicable.
How often is dataset reviewed? Before Release. The PCB image set is reviewed at the time of capture / render by the team that produced it and again before being used for training.
Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? No.
Is there provenance for all datasets used in training? Yes.
Does data labeling (annotation, metadata) comply with privacy laws? Yes.
Is data compliant with data subject requests for data correction or removal, if such a request was made? Not Applicable.
Applicable Privacy Policy: https://www.nvidia.com/en-us/about-nvidia/privacy-policy/

Safety

Field Response
Model Application Field(s): Industrial / Machinery and Robotics, specifically printed-circuit-board (PCB) inspection sim-to-real data augmentation for automated optical inspection (AOI) workflows.
Describe the life critical impact (if present): Not Applicable.
Use Case Restrictions: Governing Terms: Use of this model is governed by the NVIDIA Open Model Agreement. Additional Information: Apache License, Version 2.0.
Model and dataset restrictions: The principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to.