Overview
This model implements a novel approach to multi-reference video generation using Multiple Subject Reference (MSR). Instead of introducing additional encoder branches or fusion modules, we transform multiple static reference images into a pseudo-video sequence that shares the same representation space as the target video.
Usage
This LoRA requires the ComfyUI-Licon-MSR plugin for ComfyUI. A sample workflow is included in the model files for easy testing and experimentation.
Key Features
Multi-Reference Visual Memory
- Token-level reference preservation: Multiple reference images are encoded as video latents, preserving fine-grained visual information at token level rather than compressing into a single embedding
- Native self-attention retrieval: The target video tokens directly access reference tokens through the model's existing self-attention mechanism, so no new architectural components are needed
- In-context conditioning: References serve as "visual memory" within the main token sequence, not as external conditioning inputs
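The in-context idea above can be illustrated with a toy, pure-Python sketch. It is not the model's implementation: the real model operates on video latents with learned projections, whereas this example assumes identity projections, hand-written 2-D tokens, and hypothetical helper names (`build_sequence`, `self_attention`). It only shows that once reference tokens sit in the same sequence as the target tokens, ordinary self-attention lets each target token read from them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_sequence(reference_tokens, target_tokens):
    """Place all reference tokens ahead of the target tokens in one sequence.

    reference_tokens: list of per-image token lists (hypothetical layout).
    Returns the joint sequence and the number of reference tokens in it.
    """
    sequence = [tok for image in reference_tokens for tok in image]
    num_ref = len(sequence)
    sequence.extend(target_tokens)
    return sequence, num_ref


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (toy only)."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        row = softmax(scores)
        mixed = [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        weights.append(row)
        outputs.append(mixed)
    return outputs, weights


# Two reference images with two tokens each, plus two target-video tokens.
references = [
    [[1.0, 0.0], [0.9, 0.1]],  # tokens from reference image A
    [[0.0, 1.0], [0.1, 0.9]],  # tokens from reference image B
]
target = [[0.8, 0.2], [0.2, 0.8]]

sequence, num_ref = build_sequence(references, target)
outputs, weights = self_attention(sequence)

# Share of each target token's attention that lands on reference tokens.
reference_mass = [sum(row[:num_ref]) for row in weights[num_ref:]]
print(reference_mass)
```

In this toy setup the first target token is closer to reference A's tokens, so it weights them more heavily than reference B's. That is a small-scale picture of references acting as "visual memory" inside the main sequence.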
Flexible Reference Composition
- 2 to 5 reference images: Accepts between two and five reference inputs; composition becomes more complex as references are added
- Complementary semantic roles: Each reference image can carry different information:
  - Subject identity
  - Object/prop details
  - Scene/background
  - Local textures
  - Multiple viewpoints
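As a rough illustration of the constraints above, the sketch below validates a reference set before it is turned into a pseudo-video frame list. The `Reference` class, the role identifiers, and `build_pseudo_video` are hypothetical names invented for this example and are not part of the ComfyUI-Licon-MSR plugin. Preserving the input order of the references is also an assumption.

```python
from dataclasses import dataclass

# Role labels taken from the list above; the identifiers are hypothetical.
ROLES = {"subject", "object", "scene", "texture", "viewpoint"}
MIN_REFERENCES, MAX_REFERENCES = 2, 5


@dataclass(frozen=True)
class Reference:
    path: str  # image file path (placeholder values below)
    role: str  # one of ROLES


def build_pseudo_video(references):
    """Validate a reference set and return its image paths as ordered frames.

    Keeping the input order is an assumption, not a documented requirement.
    """
    if not MIN_REFERENCES <= len(references) <= MAX_REFERENCES:
        raise ValueError(
            f"expected {MIN_REFERENCES} to {MAX_REFERENCES} references, "
            f"got {len(references)}"
        )
    for ref in references:
        if ref.role not in ROLES:
            raise ValueError(f"unknown role: {ref.role!r}")
    return [ref.path for ref in references]


frames = build_pseudo_video([
    Reference("hero.png", "subject"),
    Reference("umbrella.png", "object"),
    Reference("street.png", "scene"),
])
print(frames)
```

Checking the count up front makes an out-of-range reference set fail early rather than partway through generation.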
What It Can Do
Identity Preservation Across References
Generate videos where multiple reference identities are simultaneously preserved:
- Multiple characters from different reference images
- Character + object combinations
- Object + scene compositions
Relation-Based Composition
Beyond mere identity preservation, the model can compose references based on textual relation descriptions:
- Action interactions (handing, picking up, pushing)
- Spatial relationships (left-right, foreground-background)
- Temporal event structures (start → process → result)
Cross-Reference Attribute Selection
The model learns to selectively retrieve attributes from different references:
- Face from reference A, clothing from reference B
- Object identity from one reference, pose/position from another
- Background elements from scene references
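One way to keep such a prompt organized is to give each reference a short description and then state the relation or attribute selection separately. The model card does not specify a prompt format, so the "Reference N:" labels and the `compose_prompt` helper below are purely illustrative assumptions.

```python
def compose_prompt(reference_descriptions, relation):
    """Join one short description per reference with a relation sentence.

    The "Reference N:" labeling is a made-up convention for this sketch.
    """
    if not reference_descriptions:
        raise ValueError("at least one reference description is required")
    parts = [
        f"Reference {i}: {text.strip()}."
        for i, text in enumerate(reference_descriptions, start=1)
    ]
    parts.append(relation.strip())
    return " ".join(parts)


prompt = compose_prompt(
    [
        "a woman with short red hair",
        "a yellow raincoat",
        "a rainy street at night",
    ],
    "The woman from reference 1 wears the raincoat from reference 2 "
    "and walks from the background to the foreground of the street in reference 3.",
)
print(prompt)
```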
Usage Tips (V1 Version)
- Prompt description: Describe the reference images concisely but accurately. Both over-description and under-description degrade consistency
- High-motion scenes: 50 fps is recommended for smooth, coherent motion
- Generation reliability: Accurate results typically take 2-3 sampling runs
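The frame-rate and retry tips can be sketched as a small helper. `generate` and `accept` stand in for a real generation call and a consistency check and are hypothetical, as are the seed values. The default frame rate of 25 is an arbitrary example, not a documented setting.

```python
HIGH_MOTION_FPS = 50  # frame rate recommended above for high-motion scenes


def choose_fps(high_motion, default_fps):
    """Use 50 fps for high-motion scenes, else the caller's usual frame rate."""
    return HIGH_MOTION_FPS if high_motion else default_fps


def sample_until_accepted(generate, accept, seeds):
    """Try each seed in turn and return (seed, result) for the first accepted run.

    `generate` and `accept` are caller-supplied callables (hypothetical here);
    returns (None, None) if no run is accepted.
    """
    for seed in seeds:
        result = generate(seed)
        if accept(result):
            return seed, result
    return None, None


# Stand-ins for a real generation call and a manual or automated check.
def fake_generate(seed):
    return {"seed": seed, "consistent": seed % 3 == 0}


def fake_accept(result):
    return result["consistent"]


fps = choose_fps(high_motion=True, default_fps=25)  # 25 is an arbitrary example
seed, video = sample_until_accepted(fake_generate, fake_accept, seeds=[1, 2, 3])
print(fps, seed)
```

Budgeting three seeds per shot matches the 2-3 runs suggested above. In practice the acceptance check is likely to be a visual inspection.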
Results Showcase
V1 Version
Model tree
- Base model: Lightricks/LTX-2.3