Supertron-embedding-300M: High-Efficiency Semantic Representation Model
Model Description
Supertron-embedding-300M is a high-performance, compact embedding model fine-tuned from google/embeddinggemma-300m. It is designed to provide state-of-the-art semantic representations for Retrieval-Augmented Generation (RAG), semantic search, and document clustering, while keeping a computational footprint low enough for production environments.
- Developed by: Surpem
- Model Type: Sentence Transformer
- Architecture: Gemma-based Dense Transformer
- Base Model: google/embeddinggemma-300m
- License: Apache 2.0
- Language: English (en)
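
As a quick illustration of the document-clustering use case mentioned above, the sketch below groups a few example documents with scikit-learn's KMeans on top of the model's embeddings. The sample documents, the cluster count, and the use of scikit-learn are illustrative assumptions, not part of this model's documented workflow.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Load the embedding model (same ID as elsewhere in this card).
model = SentenceTransformer("Surpem/Supertron-embedding-300M")

# Hypothetical mini-corpus; replace with your own documents.
documents = [
    "The central bank raised interest rates by 25 basis points.",
    "Quarterly earnings beat analyst expectations.",
    "The new vaccine showed strong efficacy in trials.",
    "Researchers reported promising results for the drug candidate.",
]

# Encode the documents and cluster them into an assumed number of groups.
embeddings = model.encode(documents)
labels = KMeans(n_clusters=2, random_state=0).fit_predict(embeddings)

for doc, label in zip(documents, labels):
    print(label, doc)
```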
Results
Supertron-embedding-300M demonstrates competitive performance across the Massive Text Embedding Benchmark (MTEB). It is particularly effective on Semantic Textual Similarity (STS) tasks, where it outperforms many larger models.
| Task Category | Task Name | Metric | Score |
|---|---|---|---|
| Semantic Similarity | STSBenchmark | cos_sim_spearman | 87.10 |
| Semantic Similarity | STS12 | cos_sim_spearman | 80.18 |
| Semantic Similarity | BIOSSES | cos_sim_spearman | 82.98 |
| Retrieval | NFCorpus | NDCG@10 | 37.07 |
| Classification | AmazonCounterfactual | Accuracy | 83.34 |
| Clustering | TwentyNewsgroups | V-Measure | 50.01 |
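
The cos_sim_spearman scores above are Spearman correlations between the model's cosine similarities and human-annotated similarity scores. Below is a minimal sketch of that computation, using scipy and a few invented gold scores purely for illustration; real benchmarks such as STSBenchmark provide thousands of annotated pairs.

```python
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Surpem/Supertron-embedding-300M")

# Toy STS-style pairs with made-up gold scores on a 0-5 scale.
pairs = [
    ("A man is playing a guitar.", "A person plays a guitar.", 4.8),
    ("A man is playing a guitar.", "A chef is cooking pasta.", 0.4),
    ("Kids are playing in the park.", "Children play outside.", 4.2),
]

emb1 = model.encode([p[0] for p in pairs])
emb2 = model.encode([p[1] for p in pairs])

# The diagonal of the similarity matrix gives the per-pair cosine similarity.
pred = model.similarity(emb1, emb2).diagonal().tolist()
gold = [p[2] for p in pairs]

corr, _ = spearmanr(pred, gold)
print(f"Spearman correlation: {corr:.4f}")
```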
Get Started
This model can be easily integrated using the sentence-transformers library.
```python
from sentence_transformers import SentenceTransformer

model_id = "Surpem/Supertron-embedding-300M"

# Load the model
model = SentenceTransformer(model_id)

# Define target text
sentences = [
    "The financial results exceeded market expectations.",
    "The company reported better than expected quarterly earnings."
]

# Compute embeddings
embeddings = model.encode(sentences)

# Calculate cosine similarity
similarity = model.similarity(embeddings[0], embeddings[1])
print(f"Semantic Similarity: {similarity.item():.4f}")
```
Training Procedure
Hyperparameters
- Precision: bfloat16
- Max Sequence Length: 256 tokens
- Optimizer: AdamW
- Batch Size: 256
- Learning Rate: 2e-5
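
As a rough guide, these hyperparameters map onto the sentence-transformers v3 training API as sketched below. The dataset, loss function, and epoch count are placeholders, since the card does not specify them; this is not the actual training script.

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Start from the documented base model and sequence length.
model = SentenceTransformer("google/embeddinggemma-300m")
model.max_seq_length = 256  # Max Sequence Length: 256 tokens

args = SentenceTransformerTrainingArguments(
    output_dir="supertron-embedding-300m",
    per_device_train_batch_size=256,  # Batch Size: 256
    learning_rate=2e-5,               # Learning Rate: 2e-5
    optim="adamw_torch",              # Optimizer: AdamW
    bf16=True,                        # Precision: bfloat16
    num_train_epochs=1,               # placeholder: not stated in the card
)

# Placeholder (anchor, positive) pair dataset and loss; the actual training
# data and objective are not documented in this card.
train_dataset = load_dataset("sentence-transformers/all-nli", "pair", split="train")
loss = MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=loss
)
trainer.train()
```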
Citation
```bibtex
@misc{surpem2026supertron,
  title={Supertron-embedding-300M: High-Efficiency Semantic Representation Model},
  author={Surpem},
  year={2026},
  url={https://huggingface.co/Surpem/Supertron-embedding-300M},
}
```