---
title: UnsupervisedCustumerPrediction
emoji: 🧩
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Streamlit app that predicts cluster labels from uploaded CSV
license: mit
---
# 🧩 Clustering Predictor (KMeans / GMM)
This Space predicts cluster labels for uploaded tabular data using a saved pipeline:
- **StandardScaler**
- **PCA (95% explained variance)**
- A clustering model (**KMeans** or **Gaussian Mixture Model**)
## ✅ What this app does
- Accepts an uploaded CSV file
- Checks that all required feature columns are present
- Applies the saved **scaler + PCA**
- Outputs a **Predicted** cluster label for each row
- Lets you download the predictions as a CSV
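
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the function name `predict_clusters` is hypothetical, and the output column name `Predicted` is assumed from the description above. The `scaler`, `pca`, and `model` arguments are expected to be already-fitted objects exposing `transform` / `predict`, as scikit-learn estimators do.

```python
import pandas as pd


def predict_clusters(df, feature_names, scaler, pca, model):
    """Hypothetical helper: scale, reduce, and cluster the rows of df."""
    # Keep only the training features, in the order they were saved.
    X = df[list(feature_names)]
    # Apply the saved preprocessing steps in the same order as at training time.
    X_scaled = scaler.transform(X)
    X_reduced = pca.transform(X_scaled)
    # Assign each row to a cluster (KMeans and GaussianMixture both offer predict).
    labels = model.predict(X_reduced)
    # Assumption: the output column is named "Predicted".
    return pd.DataFrame({"Predicted": labels}, index=df.index)
```

Extra columns in the uploaded file are ignored by this sketch because only the names in `feature_names` are selected.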
## 📦 Required files (must be in the repo root)
Place these files next to `app.py`:
- `feature_names.pkl`
- `scaler.pkl`
- `pca.pkl`
- `kmeans_model_k9.pkl` *(optional, if you want KMeans)*
- `gmm_model_k9.pkl` *(optional, if you want GMM)*
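
One way to load these files is sketched below. It assumes they were written with Python's `pickle` module; if they were saved with `joblib` instead, `joblib.load` would be the matching call. The helper name `load_artifacts` and the `KMeans` / `GMM` dictionary keys are illustrative and not taken from `app.py`.

```python
import pickle
from pathlib import Path

REQUIRED = ("feature_names.pkl", "scaler.pkl", "pca.pkl")
OPTIONAL = {"KMeans": "kmeans_model_k9.pkl", "GMM": "gmm_model_k9.pkl"}


def load_artifacts(root="."):
    """Hypothetical loader: return (required objects, available models)."""
    root = Path(root)
    # Fail early with a clear message if a required file is absent.
    missing = [name for name in REQUIRED if not (root / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing required files: {missing}")

    def _load(name):
        # Assumption: the files are plain pickles.
        with open(root / name, "rb") as fh:
            return pickle.load(fh)

    # Keys are the file stems: "feature_names", "scaler", "pca".
    required = {Path(name).stem: _load(name) for name in REQUIRED}
    # Only the model files that are actually present are loaded.
    models = {
        label: _load(name)
        for label, name in OPTIONAL.items()
        if (root / name).exists()
    }
    return required, models
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.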
## 🧾 Input format
Your CSV must include all feature columns stored in `feature_names.pkl`.
Optional:
- You may include an `id` or `Id` column. If present, it is included in the output as `Id`.
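
The column check and the optional ID handling could look like the sketch below. The helper names `missing_features` and `extract_id` are hypothetical, and how `app.py` reports missing columns is not specified here.

```python
import pandas as pd


def missing_features(df, feature_names):
    """Return the required feature columns that are absent from df."""
    return [col for col in feature_names if col not in df.columns]


def extract_id(df):
    """Return the ID column renamed to 'Id', or None if there is none."""
    for name in ("Id", "id"):
        if name in df.columns:
            return df[name].rename("Id")
    return None
```

For example, a file with columns `id` and `a` checked against the features `["a", "b"]` would report `["b"]` as missing, and its `id` column would be carried over as `Id`.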
## ▶️ Run locally
```bash
pip install -r requirements.txt
streamlit run app.py
```
## 📝 Notes
- This is an unsupervised project, so cluster quality is evaluated on Kaggle using the leaderboard score.
- Visual separation in 2D does not always reflect the Kaggle metric.