ProgramerSalar / causal_vae_checkpoint


---
license: apache-2.0
tags:
- vae
- video-generation
- education
- fine-tuning
- pytorch
---

# 🎓 Causal VAE Fine-Tuning Experiments (Indian Math Curriculum)

Developing the "Imagination Engine" for [Zulense](https://huggingface.co/zulense).

This repository contains experimental checkpoints for a Causal VAE (Variational Autoencoder) fine-tuned specifically on Indian educational content (NCERT Math).

The goal of these experiments is to adapt standard video-generation VAEs to better reconstruct "blackboard-style" line art, diagrams, and text-heavy educational videos, content on which general-purpose models often produce artifacts.
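In video-generation models, the "Causal" in Causal VAE usually refers to temporal causality: convolutions along the time axis are padded only on the past side, so the encoding of frame t never depends on later frames. The minimal PyTorch sketch below illustrates that general technique; the `CausalConv3d` class, the first-frame replicate padding, and the kernel size are illustrative assumptions, not the architecture definition these checkpoints were trained with.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalConv3d(nn.Module):
    """Hypothetical sketch: a 3D convolution that is causal along the time axis.

    Input shape: (batch, channels, time, height, width).
    Time is padded only on the past side (by repeating the first frame),
    so output frame t depends only on input frames <= t.
    """

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.time_pad = kernel_size - 1      # all temporal padding goes to the past
        self.space_pad = kernel_size // 2    # symmetric "same" padding in H and W
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size, padding=0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Prepend time_pad copies of the first frame.
        first = x[:, :, :1].repeat(1, 1, self.time_pad, 1, 1)
        x = torch.cat([first, x], dim=2)
        # Zero-pad spatially; F.pad order is (W_left, W_right, H_top, H_bottom).
        p = self.space_pad
        x = F.pad(x, (p, p, p, p))
        return self.conv(x)


if __name__ == "__main__":
    layer = CausalConv3d(3, 8)
    clip = torch.randn(1, 3, 5, 16, 16)     # 5 frames of 16x16 RGB
    out = layer(clip)
    print(out.shape)                         # torch.Size([1, 8, 5, 16, 16])

    # Causality check: changing the last frame must not affect earlier outputs.
    clip2 = clip.clone()
    clip2[:, :, -1] += 1.0
    print(torch.allclose(layer(clip2)[:, :, :-1], out[:, :, :-1]))  # True
```

Replicating the first frame (rather than zero-padding) is one common choice; it lets a single image be encoded as a one-frame clip without the layer seeing artificial black frames.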

## 📂 Checkpoint Manifest

We are releasing two checkpoints, each representing a different stage of our training curriculum.

### 1. `FineTune_2_checkpoint.pth` (Recommended)
* Target Domain: Class 5 Numeracy & Foundation
* Status: ✅ Improved Stability
* Experiment Notes: This run focused on simpler, foundational concepts (Class 5 curriculum) to stabilize the loss.
* Improvements: Significantly lower `kl_divergence` and reconstruction loss than the V1 baseline.
* Use Case: Better at handling basic shapes and the slower temporal motion typical of primary-school teaching videos.

### 2. `checkpoint-0.pth` (Legacy / Research Artifact)
* Target Domain: Class 8 Geometry & Algebra
* Status: ⚠️ Unstable / High Loss
* Experiment Notes: This was our initial attempt at modeling complex Class 8 geometry.
* Known Issues: The model struggled with high-frequency details (text, grid lines), resulting in higher `vae_loss` and unstable KL divergence.
* Why we kept it: Retained for comparative analysis, to illustrate the jump in visual complexity between primary and middle-school material.
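The notes above refer to a reconstruction loss, `kl_divergence`, and `vae_loss`. In a standard VAE objective, the total loss is the reconstruction term plus a weighted KL term. The sketch below assumes a diagonal Gaussian posterior, a standard normal prior, and an MSE reconstruction term; the function name and the `kl_weight` value are hypothetical, and the exact definition of `vae_loss` in these training runs is not stated here.

```python
import torch
import torch.nn.functional as F


def vae_loss_terms(x, x_recon, mu, logvar, kl_weight=1e-6):
    """Return (reconstruction loss, KL divergence, total loss) for one batch.

    Assumes a diagonal Gaussian posterior q(z|x) = N(mu, exp(logvar)) and a
    standard normal prior p(z) = N(0, I). kl_weight is a hypothetical value.
    """
    # Reconstruction term: mean squared error between input and reconstruction.
    recon_loss = F.mse_loss(x_recon, x)
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), averaged over latent elements.
    kl_divergence = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Total objective: reconstruction plus the weighted KL regularizer.
    vae_loss = recon_loss + kl_weight * kl_divergence
    return recon_loss, kl_divergence, vae_loss


if __name__ == "__main__":
    x = torch.rand(2, 3, 4, 8, 8)       # (batch, channels, frames, H, W)
    mu = torch.ones(2, 4, 1, 2, 2)      # toy latent mean
    logvar = torch.zeros_like(mu)       # unit variance
    recon, kl, total = vae_loss_terms(x, x.clone(), mu, logvar)
    print(recon.item(), kl.item())      # 0.0 0.5
```

Production video VAEs often add perceptual (e.g. LPIPS) and adversarial terms on top of these two.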

## 🔬 Technical Context

Standard video VAEs are optimized for photorealism. Our experiments suggest that, for educational video synthesis:
1. Text Preservation: Standard VAEs struggle to reconstruct the sharp text found in math explanations.
2. Curriculum Learning: Fine-tuning on simpler visual concepts (Class 5) before complex ones (Class 8) yields better convergence.
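Curriculum learning here means fine-tuning the same weights on an easier data stage before a harder one. A minimal sketch of that ordering in plain PyTorch follows; the `train_with_curriculum` function, the AdamW optimizer, the hyperparameters, and the toy datasets are assumptions for illustration, not the training setup behind these checkpoints.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def train_with_curriculum(model, stages, loss_fn, epochs_per_stage=1, lr=1e-4, batch_size=4):
    """Hypothetical sketch: fine-tune one model on stages ordered easiest to hardest.

    stages:  list of (name, dataset) pairs, e.g. [("class5", ds5), ("class8", ds8)];
             each dataset yields 1-tuples (x,), as TensorDataset does.
    loss_fn: callable(model, batch) -> scalar loss tensor.
    Returns the last batch loss seen in each stage.
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    history = []
    for name, dataset in stages:
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
        last_loss = float("nan")
        for _ in range(epochs_per_stage):
            for (batch,) in loader:
                optimizer.zero_grad()
                loss = loss_fn(model, batch)
                loss.backward()
                optimizer.step()
                last_loss = loss.item()
        # Weights (and optimizer state) carry over into the next, harder stage.
        history.append((name, last_loss))
    return history


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy stand-ins for "easy" and "hard" data and a tiny linear autoencoder.
    easy = TensorDataset(torch.randn(16, 8) * 0.1)
    hard = TensorDataset(torch.randn(16, 8))
    model = torch.nn.Linear(8, 8)
    mse = lambda m, x: torch.nn.functional.mse_loss(m(x), x)
    print(train_with_curriculum(model, [("class5", easy), ("class8", hard)], mse))
```

Sharing one optimizer across stages is a design choice; re-creating it (or lowering the learning rate) at each stage boundary is an equally reasonable alternative.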

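## 💻 Usage (PyTorch)

```python
import torch

# Load the Causal VAE checkpoint (the stable Class 5 checkpoint is recommended)
checkpoint_path = "FineTune_2_checkpoint.pth"
checkpoint = torch.load(checkpoint_path, map_location="cpu")

# Assumption: some training scripts wrap the weights in a dict (e.g. under a
# "model" key); fall back to the loaded object if no such wrapper exists.
if isinstance(checkpoint, dict) and "model" in checkpoint:
    state_dict = checkpoint["model"]
else:
    state_dict = checkpoint
print(f"Loaded checkpoint: {checkpoint_path}")

# Note: applying these weights requires the matching Causal VAE architecture
# definition, e.g. model.load_state_dict(state_dict)
```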