- Role: SDE-3 building and operating production-grade cloud-native platforms on OpenShift (OCP) and Rancher.
- GenAI/LLM: Delivered LLM-powered applications to production (RAG, observability, guardrails, cost/latency tuning).
- Stack: Python (Django/Flask), Docker, Kubernetes, ML (TensorFlow/PyTorch), AWS.
- Focus: Reliability, performance, and developer experience at scale.
- Cloud-Native
- Cluster and app lifecycle on OCP/Rancher, GitOps, secure CI/CD, and observability.
- Containerized microservices with clear SLOs and production readiness.
- LLM in Production
- Building retrieval-augmented systems, prompt and embedding pipelines, and monitoring.
- Ensuring safety, evaluation, and rollback strategies for GenAI features.
- cloakprompt-cli: CLI to redact secrets before sending data to AI models (LLM safety, prompt security).
- tf-inference-devops: TensorFlow image classification inference pipeline with Flask + Docker.
- scb-deployments: Helm deployment manifests for cloud-native workloads.
- Diabetic_retinopathy: Flask + ML app to detect diabetic retinopathy (starred).
- Real-Fake-Image-Classification: CIFAKE dataset, classify real vs AI-generated images.
More here: All repositories
- Email: [email protected]
- LinkedIn: https://www.linkedin.com/in/kushagratandon124/




