YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Architecture & Training Configuration:

Base Model Configuration: This variant is built upon the Llama2-7B configuration, ensuring a robust foundation that aligns with the latest advancements in model architecture.
Sequence Length Adaptation: Originally processed data for a sequence length of 2048 was detokenized and re-encoded to fit a sequence length of 4096. This step follows the preprocessing strategy of Megatron-LM, enhancing our model's capacity to understand and generate more complex sequences.
Batch Size & Token Management: We adopted a batch size capable of managing 4 million tokens, tailored to accommodate the increased sequence length and ensure efficient data processing.
Integration of GQA Technologies: To boost training efficiency, our configuration includes the integration of Gradient Quantization and Aggregation technologies. With 32 attention heads and a group size of 4, this feature significantly enhances the model's learning and processing capabilities.

Safetensors

Model size

6B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including m-a-p/Amber-Reproduce-79.69B