ComparEdge SaaS Product Matcher

Semantic search model for SaaS product recommendation. Fine-tuned on descriptions of 331 products from ComparEdge, a live SaaS comparison platform covering 28 categories.

Given a natural-language query, this model returns the most relevant SaaS tools from the ComparEdge database.

Quick Start

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
from huggingface_hub import hf_hub_download
import json

import numpy as np
import torch

model = SentenceTransformer("ComparEdge/saas-product-matcher")

# Load pre-computed embeddings (331 products, no re-encoding needed)
emb_path = hf_hub_download("ComparEdge/saas-product-matcher", "product_embeddings.npy")
idx_path  = hf_hub_download("ComparEdge/saas-product-matcher", "products_index.json")

embeddings = np.load(emb_path)
with open(idx_path) as f:
    products = json.load(f)

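query  = "I need a CRM for a small startup"
q_emb  = model.encode(query, normalize_embeddings=True)
# Cast to float32 so both tensors share a dtype regardless of how the .npy file was saved
scores = cos_sim(torch.tensor(q_emb), torch.tensor(embeddings, dtype=torch.float32))[0]
top_idx = scores.argsort(descending=True)[:5].tolist()  # plain Python ints for list indexing

for idx in top_idx:
    p = products[idx]
    print(f"{p['name']} ({p['category']}): {scores[idx].item():.3f}")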
    print(f"  → https://comparedge.com/tools/{p['slug']}")
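Because the stored embeddings are normalized and the query is encoded with normalize_embeddings=True, cosine similarity reduces to a dot product, so the ranking step can also be done in plain NumPy. The following is a minimal sketch, not part of the repository: `top_k` is a hypothetical helper name, and the random unit vectors only stand in for the real embeddings.

```python
import numpy as np

def top_k(query_emb: np.ndarray, embeddings: np.ndarray, k: int = 5):
    """Return (indices, scores) of the k rows most similar to query_emb.

    Assumes both inputs are L2-normalized, so the dot product
    equals the cosine similarity.
    """
    scores = embeddings @ query_emb      # shape: (n_products,)
    k = min(k, scores.shape[0])
    order = np.argsort(-scores)[:k]      # highest score first
    return order, scores[order]

# Toy demonstration: random unit vectors stand in for product embeddings.
rng = np.random.default_rng(0)
toy = rng.normal(size=(10, 384)).astype(np.float32)
toy /= np.linalg.norm(toy, axis=1, keepdims=True)

# Querying with row 3 itself should rank row 3 first.
idx, sc = top_k(toy[3], toy, k=3)
print(idx[0], round(float(sc[0]), 3))
```

With the real files, `embeddings` from the snippet above can be passed in directly and the returned indices used to look up entries in `products`.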

Repository Files

File	Description
`product_embeddings.npy`	Pre-computed normalized embeddings for all 331 products (shape: 331×384)
`products_index.json`	Metadata index: slug, name, category, description
`example_search.py`	Standalone CLI search script
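Before searching, it can be worth confirming that the two data files line up. The sketch below assumes that row i of the embedding matrix corresponds to entry i of the index and that the index is a list of dicts with the four fields named in the table; `check_index` and `REQUIRED_KEYS` are hypothetical names, not part of the repository.

```python
import numpy as np

# Field names taken from the table above.
REQUIRED_KEYS = {"slug", "name", "category", "description"}

def check_index(embeddings, products, dim=384, atol=1e-3):
    """Return a list of problems; an empty list means the files look consistent."""
    problems = []
    if embeddings.ndim != 2 or embeddings.shape[1] != dim:
        problems.append(f"unexpected embedding shape {embeddings.shape}")
    if embeddings.shape[0] != len(products):
        problems.append("row count does not match the number of index entries")
    norms = np.linalg.norm(embeddings, axis=-1)
    if not np.allclose(norms, 1.0, atol=atol):
        problems.append("some rows are not unit length")
    for i, entry in enumerate(products):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {i} is missing {sorted(missing)}")
    return problems

# Toy data standing in for the real files.
toy_emb = np.eye(3, 384, dtype=np.float32)
toy_products = [
    {"slug": f"tool-{i}", "name": f"Tool {i}", "category": "crm", "description": "placeholder"}
    for i in range(3)
]
print(check_index(toy_emb, toy_products))          # consistent toy data
print(check_index(toy_emb * 2, toy_products[:2]))  # deliberately broken
```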

Training

Base model: sentence-transformers/all-MiniLM-L6-v2 (384-dim, 22M params)

Loss: MultipleNegativesRankingLoss. For each (query, product) pair, every other product in the batch serves as a negative (in-batch negatives, not mined hard negatives), which works well for retrieval tasks without manual negative mining.
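To make the in-batch idea concrete, here is a NumPy sketch of the objective: a softmax cross-entropy over scaled cosine similarities, where the correct product for query i sits in column i. It illustrates the technique and is not the sentence-transformers implementation or the training code used for this model; `mnr_loss` is a hypothetical name, and the scale of 20 is an assumption meant to mirror the library's usual default.

```python
import numpy as np

def mnr_loss(query_embs, product_embs, scale=20.0):
    """In-batch softmax loss: row i's positive is column i, all other columns are negatives."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    p = product_embs / np.linalg.norm(product_embs, axis=1, keepdims=True)
    logits = scale * (q @ p.T)                       # (batch, batch) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))       # cross-entropy against the diagonal

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))

# Correctly paired batch: each product embedding is close to its query.
aligned = mnr_loss(q, q + 0.01 * rng.normal(size=(8, 16)))
# Mismatched batch: every query is paired with a different query's product.
shuffled = mnr_loss(q, np.roll(q, 1, axis=0))
print(aligned < shuffled)
```

The loss is low when each query is most similar to its own product and high when another product in the batch scores better, which is what pushes matching pairs together during fine-tuning.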

Training data: ~4,000 (query, product) pairs generated from 331 SaaS products across 28 categories:

- Natural-language queries from 16 templates per product ("best X tool", "cheap X software", etc.)
- Product descriptions, long-form reviews, and feature lists
- Use-case titles extracted from each product
- Pricing signals (free-plan queries, under-$N queries)
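The template-based part of this pipeline could look roughly like the sketch below. Only the first two templates are quoted in the list above. The other two templates, the price points, the way the positive text is built, and the example product are hypothetical, since the actual generation script is not shown here.

```python
TEMPLATES = [
    "best {category} tool",                          # quoted in the list above
    "cheap {category} software",                     # quoted in the list above
    "{category} app with a free plan",               # hypothetical free-plan query
    "{category} software under ${price} per month",  # hypothetical under-$N query
]

def make_pairs(product, price_points=(10, 50)):
    """Build (query, positive_text) pairs for one product record."""
    readable = product["category"].replace("-", " ")
    # Assumption: the positive side is the product name plus its description.
    positive = f"{product['name']}: {product['description']}"
    pairs = []
    for template in TEMPLATES:
        if "{price}" in template:
            for price in price_points:
                pairs.append((template.format(category=readable, price=price), positive))
        else:
            pairs.append((template.format(category=readable), positive))
    return pairs

# Hypothetical record using the field names from products_index.json.
example = {
    "name": "ExampleCRM",
    "category": "crm",
    "description": "Lightweight CRM for small teams.",
}
pairs = make_pairs(example)
for query, positive in pairs:
    print(query, "->", positive)
```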

Coverage: 28 SaaS categories from comparedge.com:

Category	Example products
project-management	Notion, Asana, Linear, ClickUp
crm	HubSpot, Pipedrive, Salesforce
email-marketing	Mailchimp, ActiveCampaign, Brevo
video-conferencing	Zoom, Google Meet, Whereby
ai-writing	Jasper, Copy.ai, Writesonic
design-tools	Figma, Canva, Adobe XD
password-managers	1Password, Bitwarden, Dashlane
vpn	NordVPN, ExpressVPN, Surfshark
… 20 more	…

Performance

Evaluated on held-out queries not seen during training:

Metric	Score
Top-1 accuracy	~78%
Top-5 accuracy	~94%
Mean Reciprocal Rank	0.85
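For reference, the three metrics can be computed from a query-by-product score matrix as sketched below. This assumes each held-out query has exactly one correct product, which may not match the evaluation protocol actually used; `retrieval_metrics` is a hypothetical helper and the scores are toy values.

```python
import numpy as np

def retrieval_metrics(scores, gold, k=5):
    """scores: (n_queries, n_products); gold: index of the correct product per query."""
    scores = np.asarray(scores, dtype=float)
    gold = np.asarray(gold)
    gold_scores = scores[np.arange(len(gold)), gold][:, None]
    # Rank of the correct product = 1 + number of products scored strictly higher.
    ranks = 1 + (scores > gold_scores).sum(axis=1)
    return {
        "top1": float(np.mean(ranks == 1)),
        f"top{k}": float(np.mean(ranks <= k)),
        "mrr": float(np.mean(1.0 / ranks)),
    }

# Toy example: three queries scored against three products.
toy_scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.5],
    [0.6, 0.7, 0.1],
]
metrics = retrieval_metrics(toy_scores, gold=[0, 2, 0], k=2)
print(metrics)
```

Top-k accuracy counts a query as a hit when the correct product appears among the first k results, while Mean Reciprocal Rank averages 1/rank of the correct product over all queries.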
