ComparEdge SaaS Product Matcher

Semantic search model for SaaS product recommendation. Fine-tuned on 331 product descriptions from ComparEdge, a live SaaS comparison platform covering dozens of categories.

Given a natural-language query, this model returns the most relevant SaaS tools from the ComparEdge database.

Quick Start

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
from huggingface_hub import hf_hub_download
import numpy as np, json, torch

model = SentenceTransformer("ComparEdge/saas-product-matcher")

# Load pre-computed embeddings (331 products, no re-encoding needed)
emb_path = hf_hub_download("ComparEdge/saas-product-matcher", "product_embeddings.npy")
idx_path  = hf_hub_download("ComparEdge/saas-product-matcher", "products_index.json")

embeddings = np.load(emb_path)
with open(idx_path) as f:
    products = json.load(f)

query  = "I need a CRM for a small startup"
q_emb  = model.encode(query, normalize_embeddings=True)
scores = cos_sim(torch.tensor(q_emb), torch.tensor(embeddings))[0]
top_idx = scores.argsort(descending=True)[:5]

for idx in top_idx:
    p = products[idx]
    print(f"{p['name']} ({p['category']}): {scores[idx]:.3f}")
    print(f"  → https://comparedge.com/tools/{p['slug']}")

Repository Files

| File | Description |
| --- | --- |
| product_embeddings.npy | Pre-computed normalized embeddings for all 331 products (shape: 331×384) |
| products_index.json | Metadata index: slug, name, category, description |
| example_search.py | Standalone CLI search script |
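
Because the stored embeddings are described as normalized, cosine similarity against a normalized query reduces to a plain dot product, so a search can run in NumPy alone. The sketch below illustrates this with small random stand-in vectors. `top_k_products` is a hypothetical helper, not part of the repository; the real arrays would come from `product_embeddings.npy` and `products_index.json`.

```python
import numpy as np

def top_k_products(query_emb, embeddings, k=5):
    """Return (index, score) pairs for the k rows most similar to query_emb.

    Assumes query_emb and every row of embeddings are L2-normalized,
    so the dot product equals cosine similarity.
    """
    scores = embeddings @ query_emb            # shape: (n_products,)
    order = np.argsort(-scores)[:k]            # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]

# Stand-in data with the same width as the real file (384), not real embeddings.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(10, 384)).astype(np.float32)
fake_embeddings /= np.linalg.norm(fake_embeddings, axis=1, keepdims=True)

# Row 3 is used as the query, so row 3 should rank first with a score near 1.0.
results = top_k_products(fake_embeddings[3], fake_embeddings, k=3)
print(results[0])
```

The returned indices can be used to look up entries in the list loaded from `products_index.json`, assuming the index and the embedding rows share the same order.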

Training

Base model: sentence-transformers/all-MiniLM-L6-v2 (384-dim, 22M params)

Loss: MultipleNegativesRankingLoss, which treats every other item in the batch as a negative (in-batch negatives). This works well for retrieval tasks because it needs only positive pairs and no manual negative mining.
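
As a rough illustration of how in-batch negatives work, the sketch below computes this kind of loss in NumPy for a batch of (query, product) embeddings: each query is scored against every product in the batch, and cross-entropy pushes the matching product (the diagonal) above the rest. The function name `in_batch_ranking_loss` and the toy vectors are assumptions for illustration. The scale of 20 mirrors the sentence-transformers default. The actual training code is not shown in this card.

```python
import numpy as np

def in_batch_ranking_loss(query_embs, product_embs, scale=20.0):
    """Cross-entropy over scaled cosine similarities with in-batch negatives.

    Row i of query_embs is paired with row i of product_embs; every other
    row of product_embs acts as a negative for query i.
    """
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    p = product_embs / np.linalg.norm(product_embs, axis=1, keepdims=True)
    logits = scale * (q @ p.T)                       # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # stabilise the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))       # correct pairs sit on the diagonal

# Toy batch: three orthogonal "product" vectors.
toy_products = np.eye(3)
# Queries that match their own product give a loss near zero.
aligned_loss = in_batch_ranking_loss(toy_products, toy_products)
# Queries paired with the wrong product give a large loss.
shuffled_loss = in_batch_ranking_loss(toy_products, np.roll(toy_products, 1, axis=0))
print(aligned_loss, shuffled_loss)
```

Larger batches supply more negatives per query, which is why batch size tends to matter for this family of losses.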

Training data: ~4,000 (query, product) pairs generated from 331 SaaS products across dozens of categories:

  • Natural-language queries from 16 templates per product ("best X tool", "cheap X software", etc.)
  • Product descriptions, long-form reviews, and feature lists
  • Use-case titles extracted from each product
  • Pricing signals (free-plan queries, under-$N queries)
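
The template-based pair generation described above can be sketched as follows. The third template string, the sample product record, and the `make_pairs` helper are all hypothetical; the actual 16 templates are not listed in this card.

```python
# Hypothetical templates: only the first two are quoted in the model card.
TEMPLATES = [
    "best {category} tool",
    "cheap {category} software",
    "{category} app with a free plan",   # assumed pricing-style template
]

def make_pairs(product, templates=TEMPLATES):
    """Build (query, product_text) training pairs for one product record."""
    category = product["category"].replace("-", " ")
    text = f"{product['name']}: {product['description']}"
    return [(t.format(category=category), text) for t in templates]

# Illustrative record using the index fields (slug, name, category, description).
sample = {
    "slug": "example-crm",
    "name": "ExampleCRM",
    "category": "crm",
    "description": "A lightweight CRM for small teams.",
}

pairs = make_pairs(sample)
for query, text in pairs:
    print(f"{query!r} -> {text!r}")
```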

Coverage: 28 SaaS categories from comparedge.com:

| Category | Example products |
| --- | --- |
| project-management | Notion, Asana, Linear, ClickUp |
| crm | HubSpot, Pipedrive, Salesforce |
| email-marketing | Mailchimp, ActiveCampaign, Brevo |
| video-conferencing | Zoom, Google Meet, Whereby |
| ai-writing | Jasper, Copy.ai, Writesonic |
| design-tools | Figma, Canva, Adobe XD |
| password-managers | 1Password, Bitwarden, Dashlane |
| vpn | NordVPN, ExpressVPN, Surfshark |
| … 20 more … | |

Performance

Evaluated on held-out queries not seen during training:

| Metric | Score |
| --- | --- |
| Top-1 accuracy | ~78% |
| Top-5 accuracy | ~94% |
| Mean Reciprocal Rank | 0.85 |
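
For reference, top-k accuracy and Mean Reciprocal Rank can be computed from ranked results as in the sketch below. The `evaluate` helper and the toy rankings are assumptions for illustration and do not reproduce the evaluation set or the scores reported above.

```python
def evaluate(ranked_lists, gold, k=5):
    """Compute top-1 accuracy, top-k accuracy, and Mean Reciprocal Rank.

    ranked_lists[i] is the list of product slugs returned for query i,
    best first; gold[i] is the single correct slug for that query.
    """
    top1 = topk = rr_sum = 0.0
    for ranked, target in zip(ranked_lists, gold):
        if target in ranked:
            rank = ranked.index(target) + 1   # 1-based position of the correct product
            rr_sum += 1.0 / rank
            top1 += rank == 1
            topk += rank <= k
    n = len(gold)
    return {"top1": top1 / n, f"top{k}": topk / n, "mrr": rr_sum / n}

# Toy rankings for three queries (made-up slugs, not real results).
ranked_lists = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
gold = ["a", "a", "x"]   # hit at rank 1, hit at rank 2, miss
metrics = evaluate(ranked_lists, gold, k=2)
print(metrics)
```

A query whose correct product never appears in the ranked list contributes zero to all three metrics in this sketch.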

Links

  • 🌐 ComparEdge: Live SaaS comparison and product discovery platform
  • 📊 Dataset: Raw product data on HuggingFace
  • 🔗 API: REST API for programmatic access