Module 1: Introduction to Data Engineering
Explore the role of a data engineer.
Analyze data engineering challenges.
Introduction to BigQuery.
Compare data lakes and data warehouses.
Manage data access and governance.
Build production-ready pipelines.
Module 2: Building a Data Lake
Introduction to data lakes.
Storage and ETL options in Google Cloud.
Building a data lake with Cloud Storage.
Cloud Storage security.
Using Cloud SQL as a relational data lake.
Module 3: Building a Data Warehouse
Introduction to modern data warehouses.
BigQuery fundamentals and data loading.
Optimization with partitioning and clustering.
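Partitioning lets BigQuery skip entire blocks of data when a query filters on the partitioning column. The following pure-Python toy (not BigQuery itself; the table data and the `query_by_date` helper are illustrative) shows the idea behind partition pruning: a date filter touches only one partition instead of the whole table.

```python
from collections import defaultdict

# Toy table: rows grouped into partitions keyed by event date.
# BigQuery prunes similarly: a filter on the partitioning column
# limits the scan to matching partitions only.
partitions = defaultdict(list)
rows = [
    {"event_date": "2024-01-01", "user": "a", "amount": 10},
    {"event_date": "2024-01-01", "user": "b", "amount": 25},
    {"event_date": "2024-01-02", "user": "a", "amount": 5},
]
for row in rows:
    partitions[row["event_date"]].append(row)

def query_by_date(date):
    """Scan only the partition for `date` instead of the full table."""
    scanned = partitions.get(date, [])
    return sum(r["amount"] for r in scanned), len(scanned)

total, rows_scanned = query_by_date("2024-01-01")  # scans 2 of 3 rows
```

Clustering works within each partition the same way in spirit: rows are physically sorted by the clustering columns so matching rows sit together and less data is read.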
Module 4: Building Batch Data Pipelines
Differences between EL, ELT, and ETL.
Data quality considerations.
Data loading methods for data lakes and data warehouses.
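The EL/ELT/ETL distinction is purely about where the transform step runs relative to the load. A minimal sketch (toy data and function names are illustrative): in ETL the pipeline cleans data before loading; in ELT the raw data is loaded first and transformed inside the warehouse, e.g. with SQL.

```python
def extract():
    # Raw source records, some with unclean fields.
    return [{"name": " Ada "}, {"name": "grace"}]

def transform(records):
    # Cleaning step: trim whitespace and normalize case.
    return [{"name": r["name"].strip().title()} for r in records]

warehouse = []

def load(records):
    warehouse.extend(records)

# ETL: transform in the pipeline, then load.
warehouse.clear()
load(transform(extract()))
etl_result = list(warehouse)

# ELT: load raw first (the EL part), then transform
# inside the warehouse (the T part, e.g. via SQL).
warehouse.clear()
load(extract())
warehouse[:] = transform(warehouse)
elt_result = list(warehouse)
```

Plain EL is the degenerate case where the source is already clean and no transform runs at all.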
Module 5: Running Spark on Dataproc
Hadoop ecosystem overview.
Migrating Hadoop workloads to Dataproc.
Using Cloud Storage instead of HDFS.
Optimizing Dataproc jobs.
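Spark and the wider Hadoop ecosystem share the map-shuffle-reduce model. As a conceptual sketch in pure Python (not actual Spark code; the input lines are illustrative), here is word count expressed as a map phase producing (word, 1) pairs and a reduce phase summing by key, roughly what Spark's `reduceByKey` does across partitions:

```python
from collections import Counter

lines = ["spark on dataproc", "spark reads cloud storage"]

# Map: split each line into (word, 1) pairs.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle + reduce: group by key and sum the counts.
counts = Counter()
for word, n in mapped:
    counts[word] += n
```

On Dataproc the same job reads its input from Cloud Storage rather than HDFS, which is why storage can be decoupled from the cluster's lifetime.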
Module 6: Serverless Data Processing with Dataflow
Introduction to Dataflow.
Building Dataflow pipelines.
Using Dataflow templates and SQL.
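A Dataflow (Apache Beam) pipeline is a chain of transforms applied to collections of elements. The sketch below mimics that shape with plain generator functions standing in for Beam's ParDo, Filter, and CombineGlobally stages; the stage names and sample events are illustrative, not Beam API calls.

```python
# Each stage consumes and yields elements, like transforms in a
# Beam pipeline; chaining them forms the pipeline graph.
def read(events):
    yield from events

def parse(records):
    for rec in records:
        user, amount = rec.split(",")
        yield {"user": user, "amount": int(amount)}

def valid(records):
    # Drop malformed / non-positive records (a ParDo-style filter).
    for rec in records:
        if rec["amount"] > 0:
            yield rec

def total_amount(records):
    # Global combine: collapse the collection to one value.
    return sum(r["amount"] for r in records)

events = ["alice,30", "bob,-5", "alice,12"]
result = total_amount(valid(parse(read(events))))
```

In real Beam the same graph runs unchanged on the DirectRunner locally or on the Dataflow service, which is the portability the module builds on.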
Module 7: Managing Pipelines with Cloud Data Fusion and Cloud Composer
Module 8: Introduction to Streaming Data Processing
Module 9: Serverless Messaging with Pub/Sub
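Pub/Sub decouples publishers from subscribers: a message published to a topic is delivered once to each subscription, and each subscription tracks its own backlog. A minimal in-memory stand-in (the `Topic` class and subscription names are illustrative, not the Pub/Sub client API) shows the fan-out behavior:

```python
from collections import deque

class Topic:
    """Toy stand-in for a Pub/Sub topic: one queue per
    subscription, so every subscription gets its own copy."""
    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name):
        self.subscriptions[name] = deque()
        return self.subscriptions[name]

    def publish(self, message):
        # Fan-out delivery: append to every subscription's queue.
        for queue in self.subscriptions.values():
            queue.append(message)

topic = Topic()
analytics = topic.subscribe("analytics")
archive = topic.subscribe("archive")
topic.publish(b"order-created")

received = (analytics.popleft(), archive.popleft())
```

The real service adds durability, at-least-once delivery, and acknowledgements on top of this basic topic/subscription shape.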
Module 10: Streaming Features in Dataflow
Processing streaming data with Dataflow.
Handling late data with watermarks, triggers, and accumulation.
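The late-data problem comes from the gap between event time (when something happened) and processing time (when it arrives). The toy below illustrates, without reproducing Dataflow's actual model, a watermark as an estimate of event-time progress: once the watermark passes a window's end, further arrivals for that window count as late. Window size and events are illustrative.

```python
WINDOW = 10  # fixed 10-second event-time windows

def window_of(ts):
    start = (ts // WINDOW) * WINDOW
    return (start, start + WINDOW)

on_time, late = {}, []
watermark = 0

events = [(3, "a"), (8, "b"), (14, "c"), (5, "d")]  # (event_time, value)
for ts, value in events:
    # Watermark: estimate of event-time progress (here, max seen so far).
    watermark = max(watermark, ts)
    win = window_of(ts)
    if win[1] <= watermark:
        # Window already closed when this event arrived.
        late.append((win, value))
    else:
        on_time.setdefault(win, []).append(value)
```

Event "d" (event time 5) arrives after "c" pushed the watermark to 14, so its window [0, 10) has closed and it is flagged late; triggers and accumulation modes then decide whether and how such data still updates results.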
Module 11: Streaming in BigQuery and Bigtable
Module 12: Advanced BigQuery Features
Module 13: Introduction to Analytics and Artificial Intelligence
Module 14: ML APIs for Unstructured Data
Module 15: Big Data Analytics with Notebooks
Module 16: Production ML Pipelines
Module 17: Building Models with SQL in BigQuery ML
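BigQuery ML trains models directly from SQL, e.g. `CREATE MODEL ... OPTIONS(model_type='linear_reg')` fits a linear regression over a query result. As a hedged illustration of what that model type computes, here is the closed-form ordinary-least-squares fit for a single feature in pure Python (the data points are made up):

```python
# Single-feature OLS: the same fit BigQuery ML's linear_reg
# produces for one input column, computed in closed form.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x
```

The point of BigQuery ML is that this training and the subsequent `ML.PREDICT` both run where the data already lives, with no data movement or separate ML infrastructure.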
Module 18: Building Models with AutoML
Introduction to AutoML and its applications.
Using AutoML Vision, NLP, and Tables.
This class is intended for developers who are responsible for:
Extracting, loading, transforming, cleaning, and validating data.
Designing pipelines and architectures for data processing.
Integrating analytics and machine learning capabilities into data pipelines.
Querying datasets, visualizing query results, and creating reports.
Prerequisites:
Basic proficiency with a common query language such as SQL.
Experience with data modeling and ETL (extract, transform, load) activities.
Experience developing applications in a common programming language such as Python.
Familiarity with machine learning and/or statistics.