---
license: apache-2.0
language:
- en
tags:
- diffusion
- text-to-image
- latent-diffusion
- pytorch
pipeline_tag: text-to-image
---
# SykoDiffusion V1.0

My first latent diffusion model. It generates images from text using a CLIP text encoder and a VAE.
## Model Details

| Property | Value |
|---|---|
| Parameters | ~100M |
| Architecture | Latent Diffusion (U-Net) |
| Training data | CC3M (~100k images) |
| Training steps | 20,000 |
| Resolution | 256×256 |
| Hardware | 2× NVIDIA T4 |
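
The 256×256 resolution determines the size of the latent tensor the U-Net denoises. The sketch below shows that arithmetic. It assumes the VAE shrinks each side by a factor of 8 and has 4 latent channels, which matches the `(1, 4, 32, 32)` latents used with `stabilityai/sd-vae-ft-mse` in the usage code. `latent_shape` is a hypothetical helper, not part of any library.

```python
def latent_shape(height, width, batch=1, channels=4, factor=8):
    """Return the (batch, channels, h, w) latent shape for an image size.

    Assumption: the VAE downsamples each side by `factor` (8 here) and
    uses `channels` latent channels (4 here).
    """
    if height % factor or width % factor:
        raise ValueError("height and width must be multiples of the VAE factor")
    return (batch, channels, height // factor, width // factor)


print(latent_shape(256, 256))  # (1, 4, 32, 32)
```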
## Usage

```python
import torch
from diffusers import UNet2DConditionModel, AutoencoderKL, DDIMScheduler
from transformers import CLIPTextModel, CLIPTokenizer
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# fp16 is poorly supported on CPU, so fall back to fp32 there.
dtype = torch.float16 if device == "cuda" else torch.float32

# Load the U-Net from this repo; the VAE and CLIP text encoder come from their own repos.
unet = UNet2DConditionModel.from_pretrained("SykoSLM/SykoDiffusion-V1.0").to(device=device, dtype=dtype)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to(device=device, dtype=dtype)
clip = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device=device, dtype=dtype)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
)

@torch.no_grad()
def generate(prompt, steps=30, cfg=7.5, seed=42):
    torch.manual_seed(seed)

    # Encode the prompt and an empty prompt (used as the unconditional branch).
    tokens = tokenizer(prompt, padding="max_length", truncation=True, max_length=77, return_tensors="pt").to(device)
    text_emb = clip(**tokens).last_hidden_state
    neg_tokens = tokenizer("", padding="max_length", truncation=True, max_length=77, return_tensors="pt").to(device)
    neg_emb = clip(**neg_tokens).last_hidden_state
    emb = torch.cat([neg_emb, text_emb])

    # Start from Gaussian noise in latent space (32×32 latents decode to 256×256 pixels).
    latents = torch.randn(1, 4, 32, 32, device=device, dtype=dtype)
    scheduler.set_timesteps(steps)
    for t in scheduler.timesteps:
        # One batched forward pass covers both branches, then classifier-free guidance.
        pred = unet(torch.cat([latents] * 2), t, encoder_hidden_states=emb).sample
        neg_p, text_p = pred.chunk(2)
        pred = neg_p + cfg * (text_p - neg_p)
        latents = scheduler.step(pred, t, latents).prev_sample

    # Decode latents to pixels and map [-1, 1] to uint8 in [0, 255].
    image = vae.decode(latents / vae.config.scaling_factor).sample
    image = (image.clamp(-1, 1) + 1) / 2
    image = (image[0].permute(1, 2, 0).cpu().float().numpy() * 255).astype("uint8")
    return Image.fromarray(image)

img = generate("a cat sitting on a chair")
img.save("output.png")
```
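
The guidance step inside the sampling loop, `neg_p + cfg * (text_p - neg_p)`, is classifier-free guidance. The sketch below shows its effect on toy lists in plain Python. The name `apply_cfg` and the sample numbers are illustrative only and do not come from the model.

```python
def apply_cfg(uncond, cond, cfg):
    """Blend unconditional and text-conditioned predictions element-wise."""
    return [u + cfg * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy unconditional prediction
cond = [1.0, 3.0]    # toy text-conditioned prediction

print(apply_cfg(uncond, cond, 0.0))  # [0.0, 1.0] -> ignores the prompt
print(apply_cfg(uncond, cond, 1.0))  # [1.0, 3.0] -> plain conditional prediction
print(apply_cfg(uncond, cond, 2.0))  # [2.0, 5.0] -> pushed past the conditional one
```

Values of `cfg` above 1 extrapolate away from the unconditional prediction, which is why higher settings tend to follow the prompt more strongly.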
## Notes

- This model is an experimental first version; generation quality may be limited.
- For best results, try `cfg` values between 5 and 10.
- English prompts are recommended.
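
One way to explore the suggested `cfg` range is to render the same prompt and seed at several guidance values and compare the results. The sketch below assumes a callable with the same signature as `generate(prompt, steps, cfg, seed)` from the usage section. `cfg_sweep` and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in so the example runs without the model.

```python
def cfg_sweep(generate_fn, prompt, cfgs=(5.0, 7.5, 10.0), seed=42):
    """Call generate_fn once per cfg value, keeping the seed fixed.

    Assumption: generate_fn accepts (prompt, cfg=..., seed=...).
    Returns a dict mapping each cfg value to its result.
    """
    return {cfg: generate_fn(prompt, cfg=cfg, seed=seed) for cfg in cfgs}


# Stand-in for the real generate(); swap in the model-backed function.
def fake_generate(prompt, steps=30, cfg=7.5, seed=42):
    return f"{prompt} | cfg={cfg} | seed={seed}"


results = cfg_sweep(fake_generate, "a cat sitting on a chair")
for cfg, out in results.items():
    print(cfg, out)
```

With the real `generate`, each value in `results` would be a PIL image that can be saved, for example with `results[cfg].save(f"cfg_{cfg}.png")`.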
## Developer

[@SykoSLM](https://huggingface.co/SykoSLM)