---
license: apache-2.0
tags:
- vae
- video-generation
- education
- fine-tuning
- pytorch
---

# Causal VAE Fine-Tuning Experiments (Indian Math Curriculum)

**Developing the "Imagination Engine" for [Zulense](https://huggingface.co/zulense)**

This repository contains experimental checkpoints for a **Causal VAE (Variational Autoencoder)** fine-tuned on Indian educational content (NCERT Math).

The goal of these experiments is to adapt standard video-generation VAEs so that they better reconstruct blackboard-style line art, diagrams, and text-heavy educational videos, which general-purpose models often reconstruct with visible artifacts.

## Checkpoint Manifest

We are releasing two checkpoints that represent different stages of our training curriculum.

### 1. `FineTune_2_checkpoint.pth` (Recommended)
* **Target Domain:** **Class 5 Numeracy & Foundation**
* **Status:** ✅ **Improved Stability**
* **Experiment Notes:**
    * This run focused on simpler, foundational concepts (Class 5 curriculum) to stabilize the loss.
    * **Improvements:** Significantly reduced `kl_divergence` and reconstruction loss compared to the V1 baseline.
    * **Use Case:** Better at handling the basic shapes and slower temporal movements typical of primary-school teaching.

### 2. `checkpoint-0.pth` (Legacy / Research Artifact)
* **Target Domain:** **Class 8 Geometry & Algebra**
* **Status:** ⚠️ **Unstable / High Loss**
* **Experiment Notes:**
    * This was our initial attempt at modeling complex Class 8 geometry.
    * **Known Issues:** The model struggled with high-frequency details (text and grid lines), resulting in higher `vae_loss` and unstable KL divergence.
    * **Why we kept it:** Retained for comparative analysis, to show the jump in visual complexity between primary and middle school content.
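
The manifest refers to `kl_divergence`, reconstruction loss, and `vae_loss` without defining them. As a rough sketch only, assuming the conventional VAE objective (a reconstruction term plus a weighted KL term against a standard normal prior), the relationship could look like the following. The function names, the choice of mean squared error, and the `kl_weight` value are illustrative assumptions, not taken from the training code, and plain Python is used instead of tensors to keep the arithmetic visible.

```python
import math


def kl_divergence(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def reconstruction_loss(target, recon):
    """Mean squared error between target and reconstructed pixel values."""
    return sum((t - r) ** 2 for t, r in zip(target, recon)) / len(target)


def vae_loss(target, recon, mu, logvar, kl_weight=1e-6):
    # kl_weight is a hypothetical value chosen for illustration only;
    # the weighting used in these experiments is not stated in this card.
    return reconstruction_loss(target, recon) + kl_weight * kl_divergence(mu, logvar)
```

Under this reading, a lower `vae_loss` can come from sharper reconstructions, a posterior closer to the prior, or both, so it may help to log the two terms separately when comparing checkpoints.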

## Technical Context

Standard video VAEs are optimized for photorealism. Our experiments suggest that for **educational video synthesis**:
1. **Text Preservation:** Standard VAEs struggle to reconstruct the sharp text found in math explanations.
2. **Curriculum Learning:** Fine-tuning on simpler visual concepts (Class 5) before complex ones (Class 8) yields better convergence.
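
The curriculum-learning point above amounts to ordering the fine-tuning stages from visually simple to visually complex. A minimal sketch of such a schedule follows; the `Stage` class, the `curriculum_schedule` helper, the difficulty ranks, and the epoch counts are all hypothetical, and the actual training step is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str        # label for the data subset used in this stage
    difficulty: int  # lower means visually simpler content
    epochs: int      # hypothetical number of epochs for this stage


def curriculum_schedule(stages):
    """Yield (stage_name, epoch_index) pairs, simplest stage first."""
    for stage in sorted(stages, key=lambda s: s.difficulty):
        for epoch in range(stage.epochs):
            yield stage.name, epoch


# Stages may be declared in any order; the schedule sorts them by difficulty.
stages = [
    Stage("class8_geometry_algebra", difficulty=2, epochs=2),
    Stage("class5_numeracy", difficulty=1, epochs=1),
]
schedule = list(curriculum_schedule(stages))
```

A training loop would iterate over `schedule` and run one epoch of fine-tuning on the named subset at each step, carrying the weights forward from one stage to the next.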

## Usage (PyTorch)

```python
import torch

# Load the Causal VAE checkpoint onto the CPU
checkpoint_path = "FineTune_2_checkpoint.pth"  # the stable Class 5 checkpoint
state_dict = torch.load(checkpoint_path, map_location="cpu")

print(f"Loaded checkpoint: {checkpoint_path}")
# Note: applying these weights requires the matching Causal VAE architecture
# definition; instantiate that model, then call model.load_state_dict(state_dict).