ComparEdge SaaS Product Matcher
Semantic search model for SaaS product recommendation. Fine-tuned on 331 product descriptions from ComparEdge, a live SaaS comparison platform covering 28 categories.
Given a natural-language query, the model embeds it and ranks the SaaS tools in the ComparEdge database by cosine similarity, so the most relevant products come first.
Quick Start
```python
import json

import numpy as np
import torch
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("ComparEdge/saas-product-matcher")

# Load pre-computed embeddings (331 products, no re-encoding needed)
emb_path = hf_hub_download("ComparEdge/saas-product-matcher", "product_embeddings.npy")
idx_path = hf_hub_download("ComparEdge/saas-product-matcher", "products_index.json")
embeddings = np.load(emb_path)
with open(idx_path) as f:
    products = json.load(f)

query = "I need a CRM for a small startup"
q_emb = model.encode(query, normalize_embeddings=True)

# Cosine similarity between the query and every product embedding
scores = cos_sim(torch.tensor(q_emb), torch.tensor(embeddings))[0]
top_idx = scores.argsort(descending=True)[:5]

# Convert tensor indices to plain ints before indexing the metadata
for idx in top_idx.tolist():
    p = products[idx]
    print(f"{p['name']} ({p['category']}): {scores[idx]:.3f}")
    print(f"  → https://comparedge.com/tools/{p['slug']}")
```
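Because the stored embeddings are described as normalized and the query is encoded with `normalize_embeddings=True`, cosine similarity reduces to a plain dot product. The sketch below shows the ranking step with NumPy only; `top_k` is a hypothetical helper, not part of the repository, and the toy vectors stand in for the real 384-dimensional embeddings.

```python
import numpy as np

def top_k(query_emb: np.ndarray, product_embs: np.ndarray, k: int = 5):
    """Return (row_index, score) pairs for the k most similar products.

    Assumes every vector is already unit length, so the dot product
    equals the cosine similarity.
    """
    scores = product_embs @ query_emb
    k = min(k, len(scores))
    # argpartition selects the k best rows without sorting everything;
    # only those k rows are then sorted by descending score.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return [(int(i), float(scores[i])) for i in top]

# Toy data: 3 "products" in 2 dimensions, all unit length.
toy_products = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
toy_query = np.array([1.0, 0.0])
results = top_k(toy_query, toy_products, k=2)
print(results)
```

With the real files, `product_embs` would be the array loaded from `product_embeddings.npy`, and each returned row index would point into the list loaded from `products_index.json`.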
Repository Files
| File | Description |
|---|---|
| product_embeddings.npy | Pre-computed normalized embeddings for all 331 products (shape: 331×384) |
| products_index.json | Metadata index: slug, name, category, description |
| example_search.py | Standalone CLI search script |
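Before searching, it can be worth checking that the two data files agree with each other. The following is a minimal sketch, assuming the index is a list of dicts stored in the same order as the embedding rows (the Quick Start indexing implies this, but the card does not state it). `check_alignment` is a hypothetical helper.

```python
import numpy as np

# Metadata fields listed in the table above.
REQUIRED_KEYS = {"slug", "name", "category", "description"}

def check_alignment(embeddings: np.ndarray, products: list, atol: float = 1e-3):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if embeddings.ndim != 2:
        return [f"expected a 2-D array, got {embeddings.ndim}-D"]
    if embeddings.shape[0] != len(products):
        problems.append(
            f"{embeddings.shape[0]} embedding rows vs {len(products)} index entries"
        )
    # Normalized embeddings should all have an L2 norm of about 1.
    norms = np.linalg.norm(embeddings, axis=1)
    if not np.allclose(norms, 1.0, atol=atol):
        problems.append("some embedding rows are not unit length")
    for i, entry in enumerate(products):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {i} is missing keys: {sorted(missing)}")
    return problems

# Toy example: two unit vectors and two complete metadata entries.
toy_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_index = [
    {"slug": "a", "name": "A", "category": "crm", "description": "..."},
    {"slug": "b", "name": "B", "category": "vpn", "description": "..."},
]
print(check_alignment(toy_embs, toy_index))
```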
Training
Base model: sentence-transformers/all-MiniLM-L6-v2 (384-dim, 22M params)
Loss: MultipleNegativesRankingLoss. For each (query, product) pair, the products paired with the other queries in the same batch serve as negatives (in-batch negatives), which works well for retrieval tasks without manual negative mining.
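To make the in-batch idea concrete, here is a NumPy sketch of the quantity this kind of loss computes: a softmax cross-entropy over scaled cosine similarities, where the matching product for query `i` sits at position `i`. It is an illustration rather than the library's implementation, and the `scale` value of 20 is an assumption.

```python
import numpy as np

def mnr_loss(query_embs: np.ndarray, product_embs: np.ndarray, scale: float = 20.0) -> float:
    """In-batch ranking loss: row i of each array forms a positive pair."""
    # Normalize so the matrix product below gives cosine similarities.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    p = product_embs / np.linalg.norm(product_embs, axis=1, keepdims=True)
    logits = scale * (q @ p.T)  # shape: (batch, batch)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct "class" for query i is product i (the diagonal).
    return float(-np.mean(np.diag(log_probs)))

aligned = np.eye(3)
loss_good = mnr_loss(aligned, aligned)                      # pairs match
loss_bad = mnr_loss(aligned, np.roll(aligned, 1, axis=0))   # pairs shuffled
print(loss_good, loss_bad)
```

The loss is close to zero when each query is most similar to its own product and grows when another product in the batch scores higher.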
Training data: ~4,000 (query, product) pairs generated from 331 SaaS products across 28 categories:
- Natural-language queries from 16 templates per product ("best X tool", "cheap X software", etc.)
- Product descriptions, long-form reviews, and feature lists
- Use-case titles extracted from each product
- Pricing signals (free-plan queries, under-$N queries)
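The template-based part of this data generation can be sketched as follows. Only the first two templates come from the list above; the other two, the `make_pairs` helper, and the way the positive text is assembled are hypothetical, since the card does not publish the actual generation script.

```python
TEMPLATES = [
    "best {category} tool",              # quoted in the model card
    "cheap {category} software",         # quoted in the model card
    "{category} app with a free plan",   # hypothetical
    "alternative to {name}",             # hypothetical
]

def make_pairs(product: dict) -> list:
    """Build (query, positive_text) training pairs for one product."""
    # Category slugs such as "project-management" read better with spaces.
    category = product["category"].replace("-", " ")
    positive = f"{product['name']}: {product['description']}"
    return [
        (template.format(category=category, name=product["name"]), positive)
        for template in TEMPLATES
    ]

# Placeholder product; the description is invented for illustration.
example = {
    "name": "Asana",
    "category": "project-management",
    "description": "Task and project tracking for teams",
}
pairs = make_pairs(example)
for query, positive in pairs:
    print(f"{query!r} -> {positive!r}")
```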
Coverage: 28 SaaS categories from comparedge.com:
| Category | Example products |
|---|---|
| project-management | Notion, Asana, Linear, ClickUp |
| crm | HubSpot, Pipedrive, Salesforce |
| email-marketing | Mailchimp, ActiveCampaign, Brevo |
| video-conferencing | Zoom, Google Meet, Whereby |
| ai-writing | Jasper, Copy.ai, Writesonic |
| design-tools | Figma, Canva, Adobe XD |
| password-managers | 1Password, Bitwarden, Dashlane |
| vpn | NordVPN, ExpressVPN, Surfshark |
| … 20 more | … |
Performance
Evaluated on held-out queries not seen during training:
| Metric | Score |
|---|---|
| Top-1 accuracy | ~78% |
| Top-5 accuracy | ~94% |
| Mean Reciprocal Rank | 0.85 |
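For reference, the three metrics can be computed from the rank at which the correct product appears for each query. This sketch assumes exactly one correct product per query and 1-based ranks. The card does not describe the evaluation script, so `retrieval_metrics` and the sample ranks are illustrative only.

```python
def retrieval_metrics(ranks: list, k: int = 5) -> dict:
    """Compute top-1 accuracy, top-k accuracy, and MRR from 1-based ranks."""
    n = len(ranks)
    return {
        "top1": sum(r == 1 for r in ranks) / n,
        f"top{k}": sum(r <= k for r in ranks) / n,
        # Reciprocal rank rewards correct answers near the top of the list.
        "mrr": sum(1.0 / r for r in ranks) / n,
    }

# Four made-up queries whose correct product was ranked 1st, 1st, 2nd, 7th.
metrics = retrieval_metrics([1, 1, 2, 7])
print(metrics)
```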
Links
- ComparEdge: Live SaaS comparison and product discovery platform
- Dataset: Raw product data on Hugging Face
- API: REST API for programmatic access