
---
license: apache-2.0
language:
- en
tags:
- diffusion
- text-to-image
- latent-diffusion
- pytorch
pipeline_tag: text-to-image
---

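# SykoDiffusion V1.0

SykoDiffusion V1.0 is my first latent diffusion model. It generates images from text prompts: a CLIP text encoder conditions a U-Net that denoises in the latent space of a VAE, and the VAE then decodes the result into an image.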

## Model Details

| Property | Value |
|---|---|
| Parameters | ~100M |
| Architecture | Latent diffusion (U-Net) |
| Training data | CC3M subset (~100k images) |
| Training steps | 20,000 |
| Resolution | 256×256 |
| Hardware | 2× NVIDIA T4 |
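The 256×256 resolution in the table fixes the latent size used in the usage example. Below is a minimal sketch of that arithmetic, assuming the `stabilityai/sd-vae-ft-mse` VAE's spatial downsampling factor of 8 and its 4 latent channels; `latent_shape` is a hypothetical helper for illustration, not part of `diffusers`.

```python
def latent_shape(height=256, width=256, batch_size=1,
                 latent_channels=4, vae_scale_factor=8):
    """Return the (B, C, H, W) latent shape for a given image size.

    Assumes a Stable Diffusion style VAE that downsamples each spatial
    side by `vae_scale_factor` and encodes `latent_channels` channels.
    """
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("height and width must be multiples of vae_scale_factor")
    return (batch_size, latent_channels,
            height // vae_scale_factor, width // vae_scale_factor)


print(latent_shape())  # shape for the 256×256 training resolution
```

For the default 256×256 this gives `(1, 4, 32, 32)`, the same shape passed to `torch.randn` in `generate` below.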

## Usage

```python
import torch
from diffusers import UNet2DConditionModel, AutoencoderKL, DDIMScheduler
from transformers import CLIPTextModel, CLIPTokenizer
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# float16 is only reliable on GPU; fall back to float32 on CPU
dtype = torch.float16 if device == "cuda" else torch.float32

unet = UNet2DConditionModel.from_pretrained("SykoSLM/SykoDiffusion-V1.0").to(device=device, dtype=dtype)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to(device=device, dtype=dtype)
clip = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device=device, dtype=dtype)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
scheduler = DDIMScheduler(num_train_timesteps=1000, beta_start=0.00085,
                          beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False)


@torch.no_grad()
def generate(prompt, steps=30, cfg=7.5, seed=42):
    torch.manual_seed(seed)

    # Encode the prompt and an empty "negative" prompt with CLIP
    tokens = tokenizer(prompt, padding="max_length", truncation=True,
                       max_length=77, return_tensors="pt").to(device)
    text_emb = clip(**tokens).last_hidden_state
    neg_tokens = tokenizer("", padding="max_length", truncation=True,
                           max_length=77, return_tensors="pt").to(device)
    neg_emb = clip(**neg_tokens).last_hidden_state
    emb = torch.cat([neg_emb, text_emb])

    # 4×32×32 latent, decoded by the VAE into a 256×256 image
    latents = torch.randn(1, 4, 32, 32, device=device, dtype=dtype)
    scheduler.set_timesteps(steps)
    for t in scheduler.timesteps:
        # Run the unconditional and conditional branches in one batch
        pred = unet(torch.cat([latents] * 2), t, encoder_hidden_states=emb).sample
        neg_p, text_p = pred.chunk(2)
        # Classifier-free guidance
        pred = neg_p + cfg * (text_p - neg_p)
        latents = scheduler.step(pred, t, latents).prev_sample

    # Undo the VAE latent scaling, decode, and map [-1, 1] to [0, 255]
    image = vae.decode(latents / vae.config.scaling_factor).sample
    image = (image.clamp(-1, 1) + 1) / 2
    image = (image[0].permute(1, 2, 0).cpu().float().numpy() * 255).astype("uint8")
    return Image.fromarray(image)


img = generate("a cat sitting on a chair")
img.save("output.png")
```
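The `DDIMScheduler` above uses a `scaled_linear` beta schedule and, with its default `eta=0`, a deterministic update. The following is a minimal pure-Python sketch of both for a single scalar latent element, assuming the standard DDIM formulation with an epsilon (noise) prediction, which is the scheduler's default `prediction_type`. It is an illustration, not the `diffusers` implementation, and the function names are hypothetical.

```python
import math


def scaled_linear_betas(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012):
    # "scaled_linear": interpolate linearly between sqrt(beta_start) and
    # sqrt(beta_end), then square each value
    start, end = math.sqrt(beta_start), math.sqrt(beta_end)
    n = num_train_timesteps
    return [(start + (end - start) * i / (n - 1)) ** 2 for i in range(n)]


def alphas_cumprod(betas):
    # abar_t = product over s <= t of (1 - beta_s)
    out, acc = [], 1.0
    for b in betas:
        acc *= 1.0 - b
        out.append(acc)
    return out


def ddim_step(x_t, eps, abar_t, abar_prev):
    # Deterministic DDIM update (eta = 0) for one element:
    # estimate the clean sample, then re-noise it to the previous timestep
    x0_pred = (x_t - math.sqrt(1.0 - abar_t) * eps) / math.sqrt(abar_t)
    return math.sqrt(abar_prev) * x0_pred + math.sqrt(1.0 - abar_prev) * eps
```

If `eps` is exactly the noise that produced `x_t`, calling `ddim_step` with `abar_prev = 1.0` recovers the clean sample; in `generate`, the guided U-Net output plays the role of `eps`.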

## Notes

- This model is an experimental first version; generation quality may be limited.
- For best results, try `cfg` values between 5 and 10.
- English prompts are recommended.
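The `cfg` value is the classifier-free guidance scale applied inside `generate`. A minimal sketch of that combination as a standalone function; `classifier_free_guidance` is a hypothetical name, and it works on anything supporting `+`, `-` and `*` (Python floats, NumPy arrays, or torch tensors).

```python
def classifier_free_guidance(uncond_pred, cond_pred, scale):
    # Start from the unconditional (empty-prompt) prediction and move
    # `scale` times along the direction toward the prompt-conditioned one
    return uncond_pred + scale * (cond_pred - uncond_pred)


print(classifier_free_guidance(1.0, 3.0, 7.5))  # scalar example
```

With `scale = 1.0` the result equals the conditional prediction and with `scale = 0.0` the unconditional one; values above 1, such as the suggested 5 to 10, extrapolate past the conditional prediction.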

## Developer

[@SykoSLM](https://huggingface.co/SykoSLM)