Introduction to Data Engineering
Understand the role of a data engineer
Discuss benefits of doing data engineering in the cloud
Discuss challenges of data engineering practice and how building data pipelines in the cloud helps address them
Review the purpose of a data lake versus a data warehouse, and when to use each
Building a Data Lake
Building a Data Warehouse
Discuss requirements of a modern warehouse
Understand why BigQuery is the scalable data warehousing solution on Google Cloud
Understand core concepts of BigQuery and review options for loading data into BigQuery
Introduction to Building Batch Data Pipelines
Review different methods of loading data into your data lakes and warehouses: EL (extract and load), ELT (extract, load, and transform), and ETL (extract, transform, and load)
Discuss data quality considerations and when to use ETL instead of EL or ELT
Executing Spark on Dataproc
Review the parts of the Hadoop ecosystem
Learn how to lift and shift your existing Hadoop workloads to the cloud using Dataproc
Understand the considerations around using Cloud Storage instead of HDFS
Learn how to optimize Dataproc jobs
Serverless Data Processing with Dataflow
Understand how to decide between Dataflow and Dataproc for processing data pipelines
Understand the features that customers value in Dataflow
Discuss core concepts in Dataflow
Review the use of Dataflow templates and Dataflow SQL
Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
Discuss how to manage your data pipelines with Cloud Data Fusion and Cloud Composer
Understand Data Fusion’s visual design capabilities
Learn how Cloud Composer can help orchestrate work across multiple Google Cloud services
Introduction to Processing Streaming Data
Describe the Pub/Sub service
Understand how Pub/Sub works
Gain hands-on Pub/Sub experience with a lab that simulates real-time streaming sensor data
Dataflow Streaming Features
Learn how to perform ad hoc analysis on streaming data using BigQuery and dashboards
Understand why Cloud Bigtable is a low-latency solution
Describe how to architect for Bigtable and how to ingest data into Bigtable
Highlight performance considerations for the relevant services
Advanced BigQuery Functionality and Performance
Introduction to Analytics and AI
Understand how machine learning (ML) can add value to your data
Understand the relationship between AI, ML, and deep learning
Identify ML options on Google Cloud
Prebuilt ML Model APIs for Unstructured Data
Big Data Analytics with Notebooks
Production ML Pipelines
Custom Model Building with SQL in BigQuery ML
Custom Model Building with AutoML
This class is intended for developers who are responsible for:
Extracting, loading, transforming, cleaning, and validating data.
Designing pipelines and architectures for data processing.
Integrating analytics and machine learning capabilities into data pipelines.
Querying datasets, visualizing query results, and creating reports.