This notebook provides the final, demo-ready stage of the StreamSense pipeline — showcasing how a trained Machine Learning model can predict whether a Netflix title is likely to become a hit, based purely on metadata.
It also includes supporting visual insights, feature importance analysis, and reusable Delta tables for dashboarding.
Demonstrate the StreamSense Random Forest model through:
- Interactive What-If predictions
- Hit-rate analytics (category, rating, release year)
- Feature importance visualisation
- Model loading + inference pipeline using MLflow
This notebook represents the final part of the project workflow:
- Data ingestion
- Feature engineering
- Model training & MLflow tracking
- Hit predictor demo ← this notebook
Loads netflix_clean (Delta table) created earlier in the pipeline and previews schema + sample rows.
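A minimal sketch of this step, assuming a PySpark session and a Delta table registered as `netflix_clean` (the exact catalog/schema path may differ in your workspace):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Load the cleaned Netflix metadata produced earlier in the pipeline
netflix_df = spark.read.table("netflix_clean")

# Preview schema and a few sample rows
netflix_df.printSchema()
netflix_df.show(5, truncate=False)
```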
Retrieves the most recent run from the StreamSense experiment and loads the stored model via:
runs:/<latest_run_id>/model
Ensures reproducibility and traceability.
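A sketch of the retrieval logic, assuming the experiment is registered under the name "/StreamSense" and the Random Forest pipeline was logged with MLflow's sklearn flavor under the artifact path `model` (both names are assumptions):

```python
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()
experiment = client.get_experiment_by_name("/StreamSense")

# Fetch the most recent run in the experiment
runs = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["attributes.start_time DESC"],
    max_results=1,
)
latest_run_id = runs[0].info.run_id

# Load the model logged under the "model" artifact path of that run
model = mlflow.sklearn.load_model(f"runs:/{latest_run_id}/model")
```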
Allows users to test hypothetical titles by specifying:
- Category
- Rating
- Release year
- Duration
- Movie vs TV
- Country
Returns:
- Predicted hit probability
- Predicted class (HIT / NON-HIT)
Perfect for demos, dashboards, or UI integration.
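A minimal What-If helper along these lines, assuming the loaded model is an sklearn pipeline that accepts a pandas DataFrame; the column names used here are illustrative, not confirmed by the notebook:

```python
import pandas as pd

def predict_hit(category, rating, release_year, duration, is_movie, country):
    # Assemble a single hypothetical title as a one-row DataFrame
    row = pd.DataFrame([{
        "category": category,
        "rating": rating,
        "release_year": release_year,
        "duration": duration,
        "type": "Movie" if is_movie else "TV Show",
        "country": country,
    }])
    proba = model.predict_proba(row)[0][1]      # probability of the HIT class
    label = "HIT" if proba >= 0.5 else "NON-HIT"
    return proba, label
```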
Breaks down hit-rate patterns across:
- Category (Movie vs TV Show)
- Rating (e.g. TV-MA, PG, TV-Y7…)
- Release year (hit rates over time)
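A sketch of these breakdowns, assuming `netflix_df` carries an `is_hit` label plus `type`, `rating`, and `release_year` columns:

```python
from pyspark.sql import functions as F

hit_by_category = (
    netflix_df.groupBy("type")
    .agg(F.avg(F.col("is_hit").cast("double")).alias("hit_rate"),
         F.count("*").alias("n_titles"))
)

hit_by_rating = (
    netflix_df.groupBy("rating")
    .agg(F.avg(F.col("is_hit").cast("double")).alias("hit_rate"))
    .orderBy(F.desc("hit_rate"))
)

hit_by_year = (
    netflix_df.groupBy("release_year")
    .agg(F.avg(F.col("is_hit").cast("double")).alias("hit_rate"))
    .orderBy("release_year")
)
```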
Extracts encoded categorical features from the model's preprocessing pipeline and ranks all signals by their contribution to prediction.
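One way to do this, assuming an sklearn Pipeline with a ColumnTransformer step named "preprocessor" (containing the categorical encoder) and the Random Forest under a step named "classifier"; both step names are assumptions:

```python
import pandas as pd

preprocessor = model.named_steps["preprocessor"]
rf = model.named_steps["classifier"]

# Expanded feature names after one-hot encoding of categorical columns
feature_names = preprocessor.get_feature_names_out()

# Rank every encoded signal by its contribution to the prediction
importance = (
    pd.DataFrame({"feature": feature_names,
                  "importance": rf.feature_importances_})
    .sort_values("importance", ascending=False)
)
print(importance.head(15))
```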
Includes three demo-ready examples:
- A modern, mature-rated movie
- An older children’s TV show
- A recent family film
These illustrate how metadata changes alter prediction confidence.
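A usage sketch of the three scenarios with the hypothetical `predict_hit` helper shown earlier; all field values are illustrative only:

```python
scenarios = [
    ("Modern mature-rated movie", dict(category="Dramas", rating="TV-MA",
                                       release_year=2021, duration=110,
                                       is_movie=True, country="United States")),
    ("Older children's TV show",  dict(category="Kids' TV", rating="TV-Y7",
                                       release_year=2005, duration=2,
                                       is_movie=False, country="United States")),
    ("Recent family film",        dict(category="Children & Family Movies",
                                       rating="PG", release_year=2022,
                                       duration=95, is_movie=True,
                                       country="United Kingdom")),
]

for name, features in scenarios:
    proba, label = predict_hit(**features)
    print(f"{name}: {label} (p={proba:.2f})")
```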
Stores the following as Delta tables for downstream visualisation (see the write sketch below):
- streamsense_hit_by_category
- streamsense_hit_by_rating
- streamsense_hit_by_year
These feed perfectly into:
- Databricks SQL dashboards
- Power BI
- Streamlit
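A minimal sketch of persisting the aggregates computed above; the table names match those listed earlier, while the write mode and managed-table location are assumptions:

```python
(hit_by_category.write.format("delta").mode("overwrite")
    .saveAsTable("streamsense_hit_by_category"))

(hit_by_rating.write.format("delta").mode("overwrite")
    .saveAsTable("streamsense_hit_by_rating"))

(hit_by_year.write.format("delta").mode("overwrite")
    .saveAsTable("streamsense_hit_by_year"))
```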
This notebook delivers:
- Fully working What-If predictor
- All key hit-rate analytics (with PNG exports)
- End-to-end MLflow model retrieval
- Feature importance ranking
- Clean dashboard-ready aggregates
Potential next steps:
- Build a Streamlit UI around the What-If predictor
- Replace the heuristic is_hit label with one derived from IMDb or TMDb ratings
- Enrich model inputs with text embeddings (descriptions, cast, director)
- Add confidence intervals, SHAP explainability, or scenario comparison tools



