What settings and hardware did you use?

#5
by TeeZee - opened

Hi,

I'm trying to replicate the abliteration of gpt-oss-120b using 2x B200 (180 GB VRAM each), and it fails with a CUDA OOM error while reloading the model every single time, even with a batch size of 2.
What hardware and batch size did you use?

The multi-GPU changes haven't been released yet. You'd want to clone the GitHub repo and use it instead of installing the package with pip. In theory, 2x B200 should be enough, I think.

I used 4x B200 with a batch size of 128.

Any plans to release? Is there a PR I could look into?

Well, I'm not the creator of Heretic, but as far as I know they plan to release v1.1.0 soon. In the meantime, it's just a matter of cloning the GitHub repository and installing its dependencies.
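For anyone following along, a source install would look roughly like this. This is a sketch: the repository path `p-e-w/heretic` is my assumption of the upstream repo, so double-check it before cloning.

```shell
# Clone the upstream repository (path assumed; verify before use)
git clone https://github.com/p-e-w/heretic.git
cd heretic

# Editable install, so any local multi-GPU fixes take effect
# without reinstalling the package
pip install -e .
```

After this, the `heretic` command on your PATH runs the checked-out code rather than the last pip release.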

Sure, thanks. I forked it myself to fix errors that occur when the model is split across multiple GPUs. I thought you had also made some fixes before running it on 4 GPUs, and that maybe you had pushed a PR that hasn't been merged yet. It seems that for gpt-oss-120b, 4x B200 is the sweet spot.

Finally, this worked for me:
- 4x B200 on Vast.ai
- my own fork with some fixes: https://github.com/teezeerc/heretic-multigpu
- `heretic --model openai/gpt-oss-120b --batch-size 64`
- 2.5 h total time

result: https://huggingface.co/TeeZee/gpt-oss-120b-heretic-v1

TeeZee changed discussion status to closed
