---
language: en
license: apache-2.0
tags:
- vision
- image-to-code
- cad
- cadquery
- vision-encoder-decoder
- vit
- gpt2
datasets:
- CADCODER/GenCAD-Code
metrics:
- rouge
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
  example_title: Example CAD Image
---

# VIT-CodeGPT CAD Code Generator

This model generates CadQuery Python code from images of 3D CAD objects. It pairs a Vision Transformer (ViT) encoder with a CodeGPT decoder in a vision-encoder-decoder architecture.

## Model Details

- **Architecture**: Vision Encoder-Decoder (ViT + CodeGPT)
- **Encoder**: google/vit-base-patch16-224
- **Decoder**: microsoft/CodeGPT-small-py
- **Task**: Image-to-Code Generation (CAD)
- **Dataset**: CADCODER/GenCAD-Code
- **Training Samples**: 10,000 (8,500 train / 1,500 val)
- **Training Time**: ~4 hours 12 minutes

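The encoder checkpoint name states its input geometry: 224×224 images split into 16×16 patches. As a rough sketch, the length of the encoder sequence that the decoder cross-attends to can be worked out as follows, assuming the standard ViT layout with one prepended `[CLS]` token (the variable names are illustrative):

```python
# Values read from the checkpoint name "google/vit-base-patch16-224"
image_size = 224
patch_size = 16

patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2

# Assumption: standard ViT, which prepends a single [CLS] token
encoder_seq_len = num_patches + 1

print(patches_per_side, num_patches, encoder_seq_len)
```
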
## Training Configuration

- **Batch Size**: 4 (effective: 16 with gradient accumulation)
- **Learning Rate**: 3e-5
- **Epochs**: 3
- **Max Length**: 256 tokens
- **Optimizer**: AdamW with learning-rate warmup
- **Mixed Precision**: FP16

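The effective batch size of 16 comes from accumulating gradients over several micro-batches of 4. The arithmetic can be sketched as below, assuming a single device and that the last partial batch of each epoch still triggers an optimizer update. Neither detail is stated on this card, and training frameworks differ in how they round, so treat the step counts as approximate.

```python
import math

# Values stated in this model card
per_device_batch_size = 4
effective_batch_size = 16
train_samples = 8_500
epochs = 3

# Assumption: one device, so gradient accumulation alone bridges 4 -> 16
num_devices = 1
grad_accum_steps = effective_batch_size // (per_device_batch_size * num_devices)

# Assumption: the final, smaller batch of each epoch still counts as an update
updates_per_epoch = math.ceil(train_samples / effective_batch_size)
total_updates = updates_per_epoch * epochs

print(grad_accum_steps, updates_per_epoch, total_updates)
```
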
## Performance

Final training metrics:

- **ROUGE-1**: 0.0944
- **ROUGE-2**: 0.0040
- **ROUGE-L**: 0.0863

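To help interpret these scores, here is a minimal sketch of ROUGE-N F1 as clipped n-gram overlap between generated and reference code. It assumes whitespace tokenization and no stemming; the ROUGE implementation and tokenizer behind the numbers above are not stated on this card, so this sketch will not reproduce them exactly. `rouge_n` is an illustrative helper, not a library function.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """F1 over clipped n-gram overlap, using whitespace tokens (a simplification)."""
    def ngrams(text: str) -> Counter:
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps each n-gram at the minimum of its two counts
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference/candidate pair that differs in one argument
reference = 'result = cq.Workplane("XY").box(10, 10, 10)'
candidate = 'result = cq.Workplane("XY").box(10, 20, 10)'
print(rouge_n(candidate, reference, n=1))
print(rouge_n(candidate, reference, n=2))
```

Scores range from 0 (no shared n-grams) to 1 (identical n-gram counts).
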
## Usage

```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
from PIL import Image
import torch

# Load the fine-tuned model, plus the image processor and tokenizer of its base checkpoints
model = VisionEncoderDecoderModel.from_pretrained("Thehunter99/vit-codegpt-cadcoder")
image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
tokenizer = AutoTokenizer.from_pretrained("microsoft/CodeGPT-small-py")

# Load the image and convert it to RGB (ViT expects 3 channels; PNGs are often RGBA)
image = Image.open("path/to/your/cad_image.png").convert("RGB")
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values

# Generate CAD code with beam search
with torch.no_grad():
    generated_ids = model.generate(
        pixel_values,
        max_length=256,
        num_beams=4,
        early_stopping=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode the best beam back into source text
generated_code = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_code)
```

## Example Output

Input: Image of a 3D cube

Output:

```python
import cadquery as cq

# Create a simple cube
result = cq.Workplane("XY").box(10, 10, 10)
```

## Training Data

The model was trained on the CADCODER/GenCAD-Code dataset, which pairs images of 3D CAD objects with their corresponding CadQuery Python code.

## Limitations

- Output is limited to CadQuery syntax
- Performs best on geometric shapes similar to those in the training data
- May struggle with very complex or unusual CAD designs
- Maximum output length: 256 tokens

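Because generation stops at 256 tokens, longer programs may be cut off mid-statement. One way to catch this before running anything is to check that the output parses as Python. The sketch below uses only the standard library; `is_valid_python` is a hypothetical helper, and a successful parse says nothing about whether the resulting geometry is correct.

```python
import ast


def is_valid_python(code: str) -> bool:
    """Return True if the string parses as Python source, without executing it."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        return False
    return True


# Hypothetical outputs: one complete, one cut off inside a call
complete = 'import cadquery as cq\nresult = cq.Workplane("XY").box(10, 10, 10)\n'
truncated = 'import cadquery as cq\nresult = cq.Workplane("XY").box(10, 10'

print(is_valid_python(complete))
print(is_valid_python(truncated))
```
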
## Citation

If you use this model, please cite:

```bibtex
@misc{vit-codegpt-cadcoder,
  title={VIT-CodeGPT CAD Code Generator},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Thehunter99/vit-codegpt-cadcoder}
}
```