What settings and hardware did you use?

#5
by TeeZee - opened

Hi,

I'm trying to replicate the abliteration of gpt-oss-120b using 2x B200 (180 GB VRAM each), and it fails with a CUDA OOM error while reloading the model every single time, even with a batch size of 2.
What hardware and batch size did you use?

The multi-GPU changes haven't been released yet. You'd want to clone the GitHub repo and use it instead of installing the package with pip. In theory, 2x B200 should be enough, I think.

I used 4x B200 with a batch size of 128.

Any plans to release? Is there a PR I could look into?

Well, I'm not the creator of Heretic, but as far as I know they plan to release v1.1.0 soon. In the meantime, it's just a matter of cloning the GitHub repository and installing its dependencies.
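For anyone following along, a source install would look roughly like this. This is a sketch: the repository path `p-e-w/heretic` is my assumption of the upstream repo, so double-check it before cloning.

```shell
# Clone the upstream repository (path assumed; verify before use)
git clone https://github.com/p-e-w/heretic.git
cd heretic

# Editable install, so any local multi-GPU fixes take effect
# without reinstalling the package
pip install -e .
```

After this, the `heretic` command on your PATH runs the checked-out code rather than the last pip release.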

Sure, thanks. I forked it myself to fix errors that occur when the model is split across multiple GPUs. I thought you had also made some fixes before running it on 4 GPUs, and that maybe you had pushed a PR that hasn't been merged yet. It seems that for gpt-oss-120b, 4x B200 is the sweet spot.

Finally, this worked for me:
- 4x B200 on Vast.ai
- my own fork with some fixes: https://github.com/teezeerc/heretic-multigpu
- `heretic --model openai/gpt-oss-120b --batch-size 64`
- 2.5 h total time

result: https://huggingface.co/TeeZee/gpt-oss-120b-heretic-v1

TeeZee changed discussion status to closed
