howlin
/

SVG


	---
	language: en
	license: mit
	tags:
	- diffusion
	- autoencoder
	- feature-space
	- svg
	references:
	- https://arxiv.org/abs/2510.15301
	---

	# SVG: Latent Diffusion Model without Variational Autoencoder

	## Model Description

	SVG is a latent diffusion model framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based latent diffusion models.

	Key features:

	- Replaces the low-dimensional VAE latent space with a high-dimensional semantic feature space.
	- Includes a lightweight residual encoder for refining fine-grained details.
	- Enables strong generation and perception performance.
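
As a rough illustration of the design described above, the toy sketch below builds a per-patch latent from two branches: a frozen semantic encoder and a small residual encoder. Everything here is a hypothetical stand-in, not the SVG implementation: the encoders are fixed random linear maps, the dimensions are made up, and joining the two branches by channel-wise concatenation is an assumption that this card does not state. See the GitHub repository for the actual code.

```python
import random


def make_linear(in_dim, out_dim, seed):
    """Return a fixed random linear map, a toy stand-in for a trained encoder."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def apply(vec):
        # Plain matrix-vector product: one output value per weight row.
        return [sum(w * x for w, x in zip(row, vec)) for row in weights]

    return apply


# All sizes below are hypothetical and chosen only to keep the example small.
PATCH_DIM = 12     # flattened pixel values per image patch
SEMANTIC_DIM = 8   # width of the semantic feature branch
RESIDUAL_DIM = 2   # width of the lightweight residual branch

# Stand-in for a frozen self-supervised vision model (e.g., DINOv3).
semantic_encoder = make_linear(PATCH_DIM, SEMANTIC_DIM, seed=0)
# Stand-in for the lightweight residual encoder that adds fine-grained detail.
residual_encoder = make_linear(PATCH_DIM, RESIDUAL_DIM, seed=1)


def encode_to_latent(patches):
    """Map each patch to [semantic features | residual features] (assumed concatenation)."""
    return [semantic_encoder(p) + residual_encoder(p) for p in patches]


rng = random.Random(42)
patches = [[rng.random() for _ in range(PATCH_DIM)] for _ in range(4)]
latent = encode_to_latent(patches)

# 4 patch tokens, each SEMANTIC_DIM + RESIDUAL_DIM channels wide.
print(len(latent), len(latent[0]))  # prints "4 10"
```

In this sketch a diffusion model would be trained on `latent` instead of on VAE codes, and the semantic part of each token stays identical to what the frozen encoder produces.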


	## How to Use

	For code and instructions, see the GitHub repository:

	[https://github.com/shiml20/SVG](https://github.com/shiml20/SVG)


	Official project page:

	[https://howlin-wang.github.io/svg/](https://howlin-wang.github.io/svg/)

	arXiv paper:

	[https://arxiv.org/abs/2510.15301](https://arxiv.org/abs/2510.15301)