Google Cloud

Data Engineering on Google Cloud

Get hands-on experience designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning. It covers structured, unstructured, and streaming data.

28 hours · Virtual

Design and build data processing systems on Google Cloud. Process batch and streaming data by implementing autoscaling data pipelines on Dataflow. Derive business insights from extremely large datasets using BigQuery. Leverage unstructured data using Spark and ML APIs on Dataproc. Enable instant insights from streaming data. Understand ML APIs and BigQuery ML, and learn to use AutoML to create powerful models without coding.

Module 1: Introduction to Data Engineering
- Explore the role of a data engineer.
- Analyze data engineering challenges.
- Introduction to BigQuery.
- Compare data lakes and data warehouses.
- Manage data access and governance.
- Build production-ready pipelines.

Module 2: Building a Data Lake
- Introduction to data lakes.
- Storage and ETL options in Google Cloud.
- Building a data lake with Cloud Storage.
- Cloud Storage security.
- Using Cloud SQL as a relational data lake.

Module 3: Building a Data Warehouse
- Introduction to modern data warehouses.
- BigQuery fundamentals and data loading.
- Optimization with partitioning and clustering.

Module 4: Building Batch Data Pipelines
- Differences between EL, ELT, and ETL.
- Data quality considerations.
- Data loading methods for data lakes and data warehouses.

Module 5: Running Spark on Dataproc
- Hadoop ecosystem overview.
- Migrating Hadoop workloads to Dataproc.
- Using Cloud Storage instead of HDFS.
- Optimizing Dataproc jobs.

Module 6: Serverless Data Processing with Dataflow
- Introduction to Dataflow.
- Building Dataflow pipelines.
- Using Dataflow templates and SQL.

Module 7: Managing Pipelines with Cloud Data Fusion and Cloud Composer
- Visually building pipelines with Data Fusion.
- Orchestrating workflows with Cloud Composer.

Module 8: Introduction to Streaming Data Processing
- Explanation and challenges of streaming data processing.
- Google Cloud tools to address these challenges.

Module 9: Serverless Messaging with Pub/Sub
- Introduction to Pub/Sub.
- Publishing and subscribing to Pub/Sub.
- Simulating real-time sensor data.

Module 10: Streaming Features in Dataflow
- Processing streaming data with Dataflow.
- Handling late data with watermarks, triggers, and accumulation.

Module 11: Streaming in BigQuery and Bigtable
- Streaming ingestion and analytics in BigQuery.
- Using Bigtable for low-latency storage.

Module 12: Advanced BigQuery Features
- Using advanced analytical functions.
- Optimizing query performance.

Module 13: Introduction to Analytics and Artificial Intelligence
- AI and ML concepts.
- Options for ML models in Google Cloud.

Module 14: ML APIs for Unstructured Data
- Challenges of unstructured data.
- Using ML APIs to enrich data.

Module 15: Big Data Analytics with Notebooks
- Using notebooks for ML prototyping.
- Running BigQuery commands from notebooks.

Module 16: Production ML Pipelines
- Options for building custom ML models.
- Using tools like Vertex AI and AI Hub.

Module 17: Building Models with SQL in BigQuery ML
- Creating ML models with SQL in BigQuery.
- Building regression and recommendation models.

Module 18: Building Models with AutoML
- Introduction to AutoML and its applications.
- Using AutoML Vision, NLP, and Tables.

This class is intended for developers who are responsible for:
- Extracting, loading, transforming, cleaning, and validating data.
- Designing pipelines and architectures for data processing.
- Integrating analytics and machine learning capabilities into data pipelines.
- Querying datasets, visualizing query results, and creating reports.

Prerequisites:
- Basic proficiency with a common query language such as SQL.
- Experience with data modeling and ETL (extract, transform, load) activities.
- Experience developing applications using a common programming language such as Python.
- Familiarity with machine learning and/or statistics.

Upcoming Sessions

Contact us for upcoming dates

There are currently no upcoming sessions scheduled for this course.
