A Professional Machine Learning Engineer designs, builds, and productionizes ML models to solve business challenges using Google Cloud technologies and knowledge of proven ML models and techniques. The ML Engineer is proficient in all aspects of model architecture, data pipeline interaction, and metrics interpretation and needs familiarity with application development, infrastructure management, data engineering, and security.
The Professional Machine Learning Engineer exam assesses your ability to:
- Frame ML problems
- Architect ML solutions
- Prepare and process data
- Develop ML models
- Automate & orchestrate ML pipelines
- Monitor, optimize, and maintain ML solutions
This repository contains resources and tips to help you prepare for the exam.
The topics below follow the same structure as the official Google Professional Machine Learning Engineer Certification Exam Guide.
Frame ML problems:
- Defining business problems
- Identifying non-ML solutions
- Defining output use
- Managing incorrect results
- Identifying data sources
- Defining the problem type (classification, regression, clustering, etc.)
- Defining the outcome of model predictions
- Defining the input (features) and predicted output format
- Success metrics
- Key results
- Determining when a model is deemed unsuccessful
- Assessing and communicating business impact
- Assessing ML solution readiness
- Assessing data readiness
- Aligning with Google AI principles and practices (e.g. different biases)
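To make "determining when a model is deemed unsuccessful" concrete, one common approach is to require the model to beat a trivial baseline by a margin. A minimal sketch (the majority-class baseline and the 0.05 margin are illustrative assumptions, not part of the exam guide):

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def is_model_successful(model_accuracy, labels, margin=0.05):
    """Deem the model unsuccessful if it fails to beat the
    majority-class baseline by at least `margin`."""
    baseline = majority_baseline_accuracy(labels)
    return model_accuracy >= baseline + margin

# Imbalanced toy labels: always predicting 0 already scores 0.75,
# so a model at 0.78 accuracy adds little value.
labels = [0, 0, 0, 1, 0, 1, 0, 0]
print(is_model_successful(0.78, labels))  # False
print(is_model_successful(0.85, labels))  # True
```

This kind of check ties the success criterion to the business value over the status quo rather than to an absolute accuracy number.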
Architect ML solutions:
- Optimizing data use and storage
- Data connections
- Automating data preparation and model training/deployment
- SDLC best practices
- A variety of component types: data collection, data management
- Exploration/analysis
- Feature engineering
- Logging/management
- Automation
- Monitoring
- Serving
- Selecting quotas and compute/accelerators for components
- Building secure ML systems
- Privacy implications of data usage
- Identifying potential regulatory issues
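"Selecting quotas and compute/accelerators for components" usually comes down to a decision rule driven by workload characteristics. A hypothetical sketch of such a rule (the thresholds are invented for illustration; the machine-type and accelerator names are standard Google Cloud identifiers, but verify against current Vertex AI documentation before relying on them):

```python
def pick_training_hardware(dataset_gb, uses_deep_learning):
    """Hypothetical decision rule for choosing training hardware.
    Thresholds are illustrative only, not a Google recommendation."""
    if not uses_deep_learning:
        # Classical ML (e.g. tree ensembles) rarely benefits from GPUs.
        return {"machine_type": "n1-standard-8", "accelerator": None}
    if dataset_gb < 10:
        # Small deep-learning jobs: a single entry-level GPU suffices.
        return {"machine_type": "n1-standard-8",
                "accelerator": "NVIDIA_TESLA_T4"}
    # Larger jobs: more host memory plus a faster accelerator.
    return {"machine_type": "n1-highmem-16",
            "accelerator": "NVIDIA_TESLA_V100"}

print(pick_training_hardware(5, uses_deep_learning=False))
print(pick_training_hardware(50, uses_deep_learning=True))
```

The point is that accelerator choice is a design decision recorded in configuration, not hard-coded into training code.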
Prepare and process data:
- Ingestion of various file types and sources (e.g. CSV, JSON, image, Parquet, databases, Hadoop/Spark)
- Database migration
- Streaming data (e.g. from IoT devices)
- Visualization
- Statistical fundamentals at scale
- Evaluating data quality and feasibility
- Batching and streaming data pipelines at scale
- Data privacy and compliance
- Monitoring/changing deployed pipelines
- Data validation
- Handling missing data
- Handling outliers
- Managing large samples (TFRecords)
- Transformations (TensorFlow Transform)
- Data leakage and augmentation
- Encoding structured data types
- Feature selection
- Class imbalance
- Feature crosses
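Two of the data-preparation topics above, handling missing data and handling outliers, can be sketched with standard techniques: median imputation and Tukey's IQR fences. A minimal stdlib-only sketch (real pipelines would typically do this in TensorFlow Transform or Dataflow so the same logic runs at training and serving time):

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def clip_outliers(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]

raw = [3.0, None, 4.0, 5.0, 100.0, 4.5]
filled = impute_median(raw)    # None replaced by the observed median
clean = clip_outliers(filled)  # 100.0 is clipped down to the upper fence
```

Computing the imputation value and fences on the training split only, then reusing them at serving time, is also how you avoid the data leakage listed above.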
Develop ML models:
- Choice of framework and model
- Modeling techniques given interpretability requirements
- Transfer learning
- Model generalization
- Overfitting
- Productionizing
- Training a model as a job in different environments
- Tracking metrics during training
- Retraining/redeployment evaluation
- Unit tests for model training and serving
- Model performance against baselines, simpler models, and across the time dimension
- Model explainability on Cloud AI Platform
- Distributed training
- Hardware accelerators
- Scalable model analysis (e.g. Cloud Storage output files, Dataflow, BigQuery, Google Data Studio)
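Overfitting and metric tracking during training meet in early stopping: halt training once validation loss stops improving. A minimal, framework-free sketch of the logic (TensorFlow and other frameworks ship their own early-stopping callbacks; this just shows the idea, with an assumed patience of 3 epochs):

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch index at which training should stop: the first
    epoch after validation loss has failed to improve for `patience`
    consecutive epochs. Returns None if training never triggers a stop."""
    best = float("inf")
    epochs_since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    return None

# Validation loss bottoms out at epoch 2, then degrades: stop at epoch 5.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.75]
print(early_stopping(losses))  # 5
```

The same tracked metrics also feed the retraining/redeployment evaluation listed above: you compare the new run's best validation score against the currently deployed model before promoting it.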
Automate & orchestrate ML pipelines:
- Identifying components, parameters, triggers, and compute needs
- Orchestration framework
- Hybrid or multi-cloud strategies
- Decoupling components with Cloud Build
- Constructing and testing parameterized pipeline definitions in the SDK
- Tuning compute performance
- Performing data validation
- Storing data and generated artifacts
- Model binary options
- Google Cloud serving options
- Testing for target performance
- Setting up triggers and pipeline schedules
- Organizing and tracking experiments and pipeline runs
- Hooking into model and dataset versioning
- Model/dataset lineage
- Hooking models into an existing CI/CD deployment system
- A/B and canary testing
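A key mechanic behind A/B and canary testing is splitting serving traffic deterministically, so each user consistently sees the same model variant. A minimal sketch (the 10% canary fraction is an assumed example; managed platforms such as Vertex AI endpoints expose traffic splitting as configuration instead):

```python
import hashlib

def route_to_canary(user_id, canary_fraction=0.1):
    """Deterministically route ~`canary_fraction` of users to the
    canary model. Hashing the user id (rather than picking randomly
    per request) keeps each user on the same variant across requests."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a uniform value in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < canary_fraction

# The same user always gets the same routing decision:
assert route_to_canary("user-42") == route_to_canary("user-42")
```

Sticky, hash-based routing also makes the downstream metric comparison between variants valid, since users are not mixed across arms.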
Monitor, optimize, and maintain ML solutions:
- Performance and business quality of ML model predictions
- Logging strategies
- Establishing continuous evaluation metrics
- Permission issues (IAM)
- Common training and serving errors (TensorFlow)
- ML system failures and biases
- Optimizing and simplifying the input pipeline for training
- Simplification techniques
- Identifying an appropriate retraining policy
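Continuous evaluation metrics and the retraining policy connect naturally: one simple policy retrains when a rolling average of a live metric falls below a threshold. A minimal sketch (the threshold and window size are assumed values; real policies often also watch for data drift, not just metric decay):

```python
from collections import deque

class RetrainingTrigger:
    """Flag retraining when the rolling mean of a live evaluation
    metric drops below a fixed threshold. The window and threshold
    here are illustrative, not recommended defaults."""

    def __init__(self, threshold=0.9, window=5):
        self.threshold = threshold
        self.recent = deque(maxlen=window)

    def record(self, metric):
        """Record one evaluation result; return True if retraining
        should be triggered based on the rolling mean."""
        self.recent.append(metric)
        rolling_mean = sum(self.recent) / len(self.recent)
        return rolling_mean < self.threshold

trigger = RetrainingTrigger(threshold=0.9, window=3)
for accuracy in [0.95, 0.94, 0.93, 0.85, 0.82]:
    needs_retrain = trigger.record(accuracy)
print(needs_retrain)  # True: the last three scores average below 0.9
```

Using a rolling window rather than a single observation avoids retraining on one-off noise while still reacting to sustained degradation.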