| # Milestone Summaries |
|
|
| This document summarizes the six project milestones and traces the evolution of the Hopcroft Skill Classification system from requirements engineering through production monitoring. |
|
|
| --- |
|
|
| ## Milestone 1: Requirements Engineering |
|
|
| **Objective:** Define the problem space, stakeholders, and success criteria using the Machine Learning Canvas framework. |
|
|
| ### Key Deliverables |
|
|
| | Component | Description | |
| |-----------|-------------| |
| | **Prediction Task** | Multi-label classification of 217 technical skills from GitHub issue/PR text | |
| | **Stakeholders** | Project managers, team leads, developers | |
| | **Data Source** | SkillScope DB with 7,245 merged PRs from 11 Java repositories | |
| | **Success Metrics** | Micro-F1 score improvement over baseline, precision/recall balance | |
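For reference, micro-F1 pools true positives, false positives, and false negatives across all labels before computing precision and recall. The snippet below is a minimal sketch in plain Python; the function name `micro_f1` and the toy label matrices are illustrative assumptions, not the project's evaluation code.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 for binary multi-label indicator matrices."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    if tp == 0:
        # No correct positive prediction: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 3 samples, 4 labels.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 0, 1]]
print(round(micro_f1(y_true, y_pred), 3))  # tp=5, fp=1, fn=1 -> 0.833
```

Because counts are pooled, frequent labels dominate the score, which is worth keeping in mind given the label imbalance across 217 classes.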
|
|
| ### ML Canvas Framework |
|
|
| The complete ML Canvas is documented in [ML Canvas.md](./ML%20Canvas.md), covering: |
|
|
| - **Value Proposition**: Automated task assignment optimization |
| - **Decisions**: Resource allocation for issue resolution |
| - **Data Collection**: Automated labeling via API call detection |
| - **Impact Simulation**: Outperform SkillScope RF + TF-IDF baseline |
| - **Monitoring**: Continuous evaluation with drift detection |
|
|
| ### Identified Risks & Mitigations |
|
|
| | Risk | Mitigation Strategy | |
| |------|---------------------| |
| | Label imbalance (217 classes) | SMOTE, MLSMOTE, ADASYN oversampling | |
| | Text noise (URLs, HTML, code) | Custom preprocessing pipeline | |
| | Multi-label complexity | MultiOutputClassifier with stratified splits | |
|
|
| --- |
|
|
| ## Milestone 2: Data Management & Experiment Tracking |
|
|
| **Objective:** Establish end-to-end infrastructure for reproducible ML experiments. |
|
|
| ### Data Pipeline |
|
|
| ``` |
| data/raw/            →  dataset.py     →  data/processed/ |
| (SkillScope SQLite)     (HuggingFace)     (Clean CSV) |
|                                                 ↓ |
|                                           features.py |
|                                                 ↓ |
|                                           data/processed/ |
|                                           (TF-IDF/Embeddings) |
| ``` |
|
|
| ### Key Components |
|
|
| 1. **Data Management** |
|    - DVC setup with DagsHub remote storage |
|    - Git-ignored data and model directories |
|    - Version-controlled `.dvc` files for reproducibility |
|
|
| 2. **Data Ingestion** |
|    - `dataset.py`: Downloads SkillScope from Hugging Face |
|    - Extracts SQLite database with cleanup |
|
|
| 3. **Feature Engineering** |
|    - `features.py`: Text cleaning pipeline |
|    - URL/HTML/Markdown removal |
|    - Normalization and Porter stemming |
|    - TF-IDF vectorization (uni+bi-grams) |
|    - Sentence embedding generation |
|
|
| 4. **Configuration** |
|    - `config.py`: Centralized paths, hyperparameters, MLflow URI |
|
|
| 5. **Experiment Tracking** |
|    - MLflow with DagsHub remote |
|    - Logged metrics: precision, recall, F1-score |
|    - Artifact storage: models, vectorizers, scalers |
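The cleaning steps listed under Feature Engineering (URL/HTML/Markdown removal and normalization) can be sketched roughly as follows. This is a standard-library illustration, not the contents of `features.py`: the function name `clean_text`, the regular expressions, and the order of the steps are all assumptions. Porter stemming is left out because it needs an external stemmer such as NLTK's `PorterStemmer`.

```python
import html
import re

# Illustrative patterns; the real pipeline may use different rules.
MD_LINK_RE = re.compile(r"!?\[([^\]]*)\]\([^)]*\)")   # [text](target) or ![alt](src)
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_text(text: str) -> str:
    """Strip links, URLs, HTML tags and punctuation, then normalize whitespace."""
    text = MD_LINK_RE.sub(r"\1", text)   # keep the link text, drop the target
    text = URL_RE.sub(" ", text)         # drop bare URLs
    text = HTML_TAG_RE.sub(" ", text)    # drop HTML tags
    text = html.unescape(text)           # decode entities such as &amp;
    text = text.lower()
    text = NON_ALNUM_RE.sub(" ", text)   # removes Markdown markers and punctuation
    return " ".join(text.split())


sample = (
    "Fix **NPE** in <b>parser</b>, see [docs](https://example.com/x) "
    "and https://github.com/a/b/pull/1 &amp; more!"
)
print(clean_text(sample))  # fix npe in parser see docs and more
```

Markdown links are rewritten before bare URLs are removed so that the visible link text survives while the target is discarded.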
|
|
| ### Training Actions |
|
|
| | Action | Description | |
| |--------|-------------| |
| | `baseline` | Random Forest with TF-IDF | |
| | `mlsmote` | Multi-label SMOTE oversampling | |
| | `ros` | Random Oversampling | |
| | `adasyn-pca` | ADASYN + PCA dimensionality reduction | |
| | `lightgbm` | LightGBM classifier | |
|
|
| --- |
|
|
| ## Milestone 3: Quality Assurance |
|
|
| **Objective:** Implement a comprehensive testing and validation framework for data quality and model robustness. |
|
|
| ### Data Cleaning Pipeline |
|
|
| | Metric | Before | After | Resolution | |
| |--------|--------|-------|------------| |
| | Total Samples | 7,154 | 6,673 | -481 duplicates | |
| | Duplicates | 481 | 0 | Exact match removal | |
| | Label Conflicts | 640 | 0 | Majority voting | |
| | Data Leakage | Present | 0 | Train/test separation | |
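One way to picture the duplicate removal and majority-voting steps in the table is the sketch below. It assumes that a "label conflict" means identical text carrying different label vectors, and it resolves ties to 0; both points, along with the function name `resolve_duplicates` and the sample data, are assumptions rather than the project's actual rules.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, List[int]]


def resolve_duplicates(samples: List[Sample]) -> List[Sample]:
    """Collapse samples with identical text into one row, voting per label."""
    groups: Dict[str, List[List[int]]] = defaultdict(list)
    for text, labels in samples:
        groups[text].append(labels)

    resolved: List[Sample] = []
    for text, label_rows in groups.items():
        n = len(label_rows)
        # Keep a label only when strictly more than half of the copies carry it
        # (tie-break toward 0 is an assumption).
        voted = [1 if sum(col) * 2 > n else 0 for col in zip(*label_rows)]
        resolved.append((text, voted))
    return resolved


# Hypothetical data: three conflicting copies of one issue plus one unique issue.
samples = [
    ("fix null pointer in parser", [1, 0, 0]),
    ("fix null pointer in parser", [1, 1, 0]),
    ("fix null pointer in parser", [1, 1, 1]),
    ("add retry to http client", [0, 0, 1]),
]
for text, labels in resolve_duplicates(samples):
    print(text, labels)
```

Running the deduplication before the train/test split also helps with the leakage row, since identical texts can no longer land on both sides of the split.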
|
|
| ### Validation Frameworks |
|
|
| #### Great Expectations (10 Tests) |
|
|
| | Test | Purpose | Status | |
| |------|---------|--------| |
| | Database Schema | Validate SQLite structure | ✅ Pass | |
| | TF-IDF Matrix | No NaN/Inf, sparsity checks | ✅ Pass | |
| | Binary Labels | Values in {0,1} | ✅ Pass | |
| | Feature-Label Alignment | Row count consistency | ✅ Pass | |
| | Label Distribution | Min 5 occurrences per label | ✅ Pass | |
| | SMOTE Compatibility | Min 10 non-zero features | ✅ Pass | |
| | Multi-Output Format | >50% multi-label samples | ✅ Pass | |
| | Duplicate Detection | No duplicate features | ✅ Pass | |
| | Train-Test Separation | Zero intersection | ✅ Pass | |
| | Label Consistency | Same features → same labels | ✅ Pass | |
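To make a few of these expectations concrete, the sketch below restates four of them (binary labels, feature-label alignment, minimum label support, train-test separation) as plain Python predicates. It deliberately does not use the Great Expectations API; the function names and the toy data are hypothetical and only illustrate what each check asserts.

```python
from typing import Hashable, Sequence


def check_binary_labels(labels: Sequence[Sequence[int]]) -> bool:
    """Every label value must be 0 or 1."""
    return all(v in (0, 1) for row in labels for v in row)


def check_alignment(features: Sequence, labels: Sequence) -> bool:
    """Feature rows and label rows must have the same count."""
    return len(features) == len(labels)


def check_min_label_support(labels: Sequence[Sequence[int]], min_count: int = 5) -> bool:
    """Each label column must occur at least min_count times."""
    return all(sum(col) >= min_count for col in zip(*labels))


def check_train_test_separation(train: Sequence[Hashable], test: Sequence[Hashable]) -> bool:
    """No sample may appear in both the training and the test split."""
    return set(train).isdisjoint(test)


# Hypothetical toy data; raw texts stand in for feature rows.
features = ["fix sql bug", "add http retry", "thread pool leak"]
labels = [[1, 0], [0, 1], [1, 1]]

print(check_binary_labels(labels))
print(check_alignment(features, labels))
print(check_min_label_support(labels, min_count=2))
print(check_train_test_separation(features[:2], features[2:]))
```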
|
|
| #### Deepchecks (24 Checks) |
|
|
| - **Data Integrity Suite**: 92% score (12 checks) |
| - **Train-Test Validation Suite**: 100% score (12 checks) |
| - **Overall Status**: Production-ready (96% combined) |
|
|
| #### Behavioral Testing (36 Tests) |
|
|
| | Category | Tests | Description | |
| |----------|-------|-------------| |
| | Invariance | 9 | Typo, case, punctuation robustness | |
| | Directional | 10 | Keyword addition effects | |
| | Minimum Functionality | 17 | Basic skill predictions | |
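The three categories can be illustrated with one test each. The sketch below uses a toy keyword matcher, `predict_skills`, as a hypothetical stand-in for the trained model, and the skill names are invented for the example; the real suite would call the project's predictor instead.

```python
import re
from typing import Set

# Hypothetical stand-in for the trained model: keyword prefix -> skill name.
KEYWORD_SKILLS = {"sql": "Databases", "thread": "Concurrency", "http": "Networking"}


def predict_skills(text: str) -> Set[str]:
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {
        skill
        for keyword, skill in KEYWORD_SKILLS.items()
        if any(token.startswith(keyword) for token in tokens)
    }


def test_invariance_case_and_punctuation() -> None:
    # Invariance: changing case or punctuation must not change the prediction.
    assert predict_skills("Fix SQL query timeout") == predict_skills("fix sql query timeout!!!")


def test_directional_keyword_addition() -> None:
    # Directional: adding a skill keyword may add skills but must not remove any.
    base = "Fix SQL query timeout"
    extended = base + " caused by thread contention"
    assert predict_skills(base) <= predict_skills(extended)
    assert "Concurrency" in predict_skills(extended)


def test_minimum_functionality() -> None:
    # Minimum functionality: an obvious case must be predicted correctly.
    assert "Networking" in predict_skills("HTTP client returns 500")
```

Written this way, the functions are collected automatically by pytest, which the CI pipeline already runs.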
|
|
| ### Code Quality |
|
|
| - **Ruff Analysis**: 28 minor issues (100% fixable) |
| - **Standards**: PEP 8 compliant, Black compatible |
|
|
| Full details: [testing_and_validation.md](./testing_and_validation.md) |
|
|
| --- |
|
|
| ## Milestone 4: API Development |
|
|
| **Objective:** Implement a production-ready REST API for skill prediction with MLflow integration. |
|
|
| ### Endpoints |
|
|
| | Method | Endpoint | Description | |
| |--------|----------|-------------| |
| | `POST` | `/predict` | Single issue prediction | |
| | `POST` | `/predict/batch` | Batch predictions (max 100) | |
| | `GET` | `/predictions/{run_id}` | Retrieve by MLflow Run ID | |
| | `GET` | `/predictions` | List recent predictions | |
| | `GET` | `/health` | Service health check | |
| | `GET` | `/metrics` | Prometheus metrics | |
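Since `/predict/batch` accepts at most 100 items, a client with more issues has to split them across several requests. The sketch below shows one way to do that with the standard library. The helper names and the payload field names (`issues`, `issue_text`) are assumptions; the actual request schema is the one published under `/docs`.

```python
import json
from typing import Iterator, List

MAX_BATCH_SIZE = 100  # limit stated for POST /predict/batch


def chunk_issues(texts: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` issue texts."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def build_batch_payload(batch: List[str]) -> str:
    """Serialize one batch; the field names here are hypothetical."""
    return json.dumps({"issues": [{"issue_text": text} for text in batch]})


texts = [f"issue {i}" for i in range(250)]
batches = list(chunk_issues(texts))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each string returned by `build_batch_payload` would then be sent as the body of one `POST /predict/batch` request.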
|
|
| ### Features |
|
|
| - **FastAPI Framework**: Async request handling, auto-generated OpenAPI docs |
| - **MLflow Integration**: All predictions logged with metadata |
| - **Pydantic Validation**: Request/response schema enforcement |
| - **Prometheus Metrics**: Request counters, latency histograms, gauges |
|
|
| ### Documentation Access |
|
|
| - Swagger UI: `/docs` |
| - ReDoc: `/redoc` |
| - OpenAPI JSON: `/openapi.json` |
|
|
| --- |
|
|
| ## Milestone 5: Deployment & Containerization |
|
|
| **Objective:** Implement containerized deployment with a CI/CD pipeline for production delivery. |
|
|
| ### Docker Architecture |
|
|
| ``` |
| docker/docker-compose.yml |
| ├── hopcroft-api (FastAPI Backend) |
| │   ├── Port: 8080 |
| │   ├── Health Check: /health |
| │   └── Volumes: source code, logs |
| │ |
| ├── hopcroft-gui (Streamlit Frontend) |
| │   ├── Port: 8501 |
| │   └── Depends on: hopcroft-api |
| │ |
| └── hopcroft-net (Bridge Network) |
| ``` |
|
|
| ### Hugging Face Spaces Deployment |
|
|
| | Component | Configuration | |
| |-----------|---------------| |
| | SDK | Docker | |
| | Port | 7860 | |
| | Startup Script | `docker/scripts/start_space.sh` | |
| | Secrets | `DAGSHUB_USERNAME`, `DAGSHUB_TOKEN` | |
|
|
| **Startup Flow:** |
| 1. Configure DVC with secrets |
| 2. Pull models from DagsHub |
| 3. Start FastAPI (port 8000) |
| 4. Start Streamlit (port 8501) |
| 5. Start Nginx reverse proxy (port 7860) |
|
|
| ### CI/CD Pipeline (GitHub Actions) |
|
|
| ```yaml |
| Triggers: push/PR to main, feature/* |
| Jobs: |
|   1. unit-tests |
|      - Ruff linting |
|      - Pytest unit tests |
|      - HTML report generation |
| |
|   2. build-image (requires unit-tests) |
|      - DVC model pull |
|      - Docker image build |
| ``` |
|
|
| --- |
|
|
| ## Milestone 6: Monitoring & Observability |
|
|
| **Objective:** Implement comprehensive monitoring infrastructure with drift detection. |
|
|
| ### Prometheus Metrics |
|
|
| | Metric | Type | Description | |
| |--------|------|-------------| |
| | `hopcroft_requests_total` | Counter | Total requests by method/endpoint | |
| | `hopcroft_request_duration_seconds` | Histogram | Request latency distribution | |
| | `hopcroft_in_progress_requests` | Gauge | Currently processing requests | |
| | `hopcroft_prediction_processing_seconds` | Summary | Model inference time | |
|
|
| ### Grafana Dashboards |
|
|
| - **Request Rate**: Real-time requests per second |
| - **Request Latency (p50, p95)**: Response time percentiles |
| - **In-Progress Requests**: Currently processing requests |
| - **Error Rate (5xx)**: Failed request percentage |
| - **Model Prediction Time**: Inference latency |
| - **Requests by Endpoint**: Traffic distribution |
|
|
| ### Data Drift Detection |
|
|
| | Component | Details | |
| |-----------|---------| |
| | Algorithm | Kolmogorov-Smirnov Two-Sample Test | |
| | Baseline | 1000 samples from training data | |
| | Threshold | p-value < 0.05 (Bonferroni corrected) | |
| | Metrics | `drift_detected`, `drift_p_value`, `drift_distance` | |
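The drift check in the table can be sketched as a per-feature two-sample Kolmogorov-Smirnov test with a Bonferroni-adjusted threshold. The version below uses only the standard library and an asymptotic p-value approximation; the project may well rely on a library routine such as `scipy.stats.ks_2samp` instead. The function names, the feature names, and the reading of `drift_distance` as the KS statistic are assumptions.

```python
import math
import random
from typing import Dict, List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a_sorted[i], b_sorted[j])
        while i < n and a_sorted[i] <= x:
            i += 1
        while j < m and b_sorted[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_p_value(d: float, n: int, m: int) -> float:
    """Asymptotic two-sided p-value from the Kolmogorov distribution series."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        return 1.0  # the series converges slowly here; the p-value is ~1
    total = 0.0
    for k in range(1, 101):
        term = (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return max(0.0, min(1.0, 2.0 * total))


def detect_drift(
    baseline: Dict[str, List[float]],
    current: Dict[str, List[float]],
    alpha: float = 0.05,
) -> Dict[str, Dict[str, float]]:
    corrected_alpha = alpha / len(baseline)  # Bonferroni: one test per feature
    report = {}
    for name, reference in baseline.items():
        d = ks_statistic(reference, current[name])
        p = ks_p_value(d, len(reference), len(current[name]))
        report[name] = {
            "drift_detected": p < corrected_alpha,
            "drift_p_value": p,
            "drift_distance": d,  # assumed to be the KS statistic
        }
    return report


# Hypothetical data: 1000 baseline samples per feature, one feature shifted.
rng = random.Random(0)
baseline = {name: [rng.gauss(0, 1) for _ in range(1000)] for name in ("feat_a", "feat_b")}
current = {
    "feat_a": list(baseline["feat_a"]),              # unchanged distribution
    "feat_b": [rng.gauss(3, 1) for _ in range(200)],  # mean shifted by 3
}
report = detect_drift(baseline, current)
print({name: result["drift_detected"] for name, result in report.items()})
```

Dividing the threshold by the number of features keeps the overall false-alarm rate near 5% when many features are tested at once, at the cost of lower sensitivity to small shifts.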
|
|
| ### Alerting Rules |
|
|
| | Alert | Condition | |
| |-------|-----------| |
| | `ServiceDown` | Target unreachable for 5m | |
| | `HighErrorRate` | 5xx rate > 10% for 5m | |
| | `SlowRequests` | P95 latency > 2s | |
|
|
| ### Load Testing (Locust) |
|
|
| | Task | Weight | Endpoint | |
| |------|--------|----------| |
| | Single Prediction | 60% | `POST /predict` | |
| | Batch Prediction | 20% | `POST /predict/batch` | |
| | Monitoring | 20% | `GET /health`, `/predictions` | |
|
|
| ### HF Spaces Monitoring Access |
|
|
| Both Prometheus and Grafana are available on the production deployment: |
|
|
| | Service | URL | |
| |---------|-----| |
| | Prometheus | https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/ | |
| | Grafana | https://dacrow13-hopcroft-skill-classification.hf.space/grafana/ | |
|
|
| ### Uptime Monitoring (Better Stack) |
|
|
| - External monitoring from multiple locations |
| - Email notifications on failures |
| - Tracked endpoints: `/health`, `/openapi.json`, `/docs` |
|
|