
Data Integration with Cloud Data Fusion
This 2-day course introduces learners to Google Cloud's data integration capability using Cloud Data Fusion. In this course, we discuss the challenges of data integration and the need for a data integration platform (middleware). We then discuss how Cloud Data Fusion can help to effectively integrate data from a variety of sources and formats and generate insights. We take a look at Cloud Data Fusion's main components and how they work, how to process batch and real-time streaming data with visual pipeline design, how metadata and data lineage are tracked, and how to deploy data pipelines on various execution engines.
- Identify the need for data integration
- Understand the capabilities Cloud Data Fusion provides as a data integration platform
- Identify use cases for possible implementation with Cloud Data Fusion
- List the core components of Cloud Data Fusion
- Design and execute batch and real-time data processing pipelines
- Work with Wrangler to build data transformations
- Use connectors to integrate data from various sources and formats
- Configure the execution environment; monitor and troubleshoot pipeline execution
- Understand the relationship between metadata and data lineage
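The Wrangler objective above centers on declarative, step-by-step data preparation: a recipe of small transformations applied in order. A minimal Python sketch of that idea follows; it is illustrative only, and the step names and column names are invented, not actual Wrangler directive syntax or any Cloud Data Fusion API.

```python
import csv
import io

# Illustrative only: a "recipe" of ordered transformation steps applied to
# records, mimicking the declarative style of directive-based data preparation.

def parse_as_csv(rows, field):
    # Split the raw CSV text in `field` into name/age columns (names assumed).
    for row in rows:
        name, age = next(csv.reader(io.StringIO(row[field])))
        row["name"], row["age"] = name, age
    return rows

def set_type_int(rows, field):
    # Convert a string column to an integer column.
    for row in rows:
        row[field] = int(row[field])
    return rows

def drop(rows, field):
    # Remove a column that is no longer needed.
    for row in rows:
        row.pop(field, None)
    return rows

# The recipe is data: each entry names a step and the column it operates on.
recipe = [
    (parse_as_csv, "body"),
    (set_type_int, "age"),
    (drop, "body"),
]

records = [{"body": "Alice,34"}, {"body": "Bob,29"}]
for step, field in recipe:
    records = step(records, field)

print(records)  # [{'name': 'Alice', 'age': 34}, {'name': 'Bob', 'age': 29}]
```

Keeping the recipe as data (rather than inline code) is what makes such transformations easy to build interactively, inspect, and replay over new batches.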
Module 1: Introduction to data integration and Cloud Data Fusion
- Understand the need for data integration
- List the situations where data integration can help businesses
- List the available data integration platforms and tools
- Identify the challenges of data integration
- Understand the use of Cloud Data Fusion as a data integration platform
- Create a Cloud Data Fusion instance
- Become familiar with the core framework and major components of Cloud Data Fusion

Module 2: Building pipelines
- Understand the Cloud Data Fusion architecture
- Define what a data pipeline is
- Understand the DAG representation of a data pipeline
- Learn to use Pipeline Studio and its components
- Design a simple pipeline using Pipeline Studio
- Deploy and execute a pipeline

Module 3: Designing complex pipelines
- Perform branching, merging, and join operations
- Execute pipelines with runtime arguments using macros
- Work with error handlers
- Execute pre- and post-pipeline steps with the help of actions and notifications
- Schedule pipelines for execution
- Import and export existing pipelines

Module 4: Pipeline execution environment
- Understand the composition of an execution environment
- Configure your pipeline's execution environment, logging, and metrics
- Understand concepts such as compute profiles and provisioners
- Create a compute profile
- Create pipeline alerts
- Monitor a pipeline under execution

Module 5: Building transformations and preparing data with Wrangler
- Understand the use of Wrangler and its main components
- Transform data using the Wrangler UI
- Transform data using directives/CLI methods
- Create and use user-defined directives

Module 6: Connectors and streaming pipelines
- Connectors
- DLP
- Reference architecture for streaming applications
- Building streaming pipelines

Module 7: Metadata and data lineage
- List the types of metadata
- Differentiate between business, technical, and operational metadata
- Understand what data lineage is
- Understand the importance of maintaining data lineage
- Differentiate between metadata and data lineage
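Modules 2 and 3 describe pipelines as DAGs whose stages can branch and merge, and whose execution order follows the graph's dependencies. The sketch below illustrates that idea in plain Python; the stage names, edges, and transforms are invented for the example and are not Cloud Data Fusion code.

```python
from graphlib import TopologicalSorter

# Illustrative DAG: a source branches into two transforms, which merge
# into a sink. All stage names and logic are invented for this sketch.
stages = {
    "source": lambda inputs: [1, 2, 3],
    "double": lambda inputs: [x * 2 for x in inputs["source"]],
    "square": lambda inputs: [x * x for x in inputs["source"]],
    "sink":   lambda inputs: inputs["double"] + inputs["square"],
}

# Each stage maps to the set of upstream stages it depends on.
edges = {
    "source": set(),
    "double": {"source"},
    "square": {"source"},
    "sink": {"double", "square"},
}

# Run stages in dependency order: every stage sees its parents' outputs.
results = {}
for stage in TopologicalSorter(edges).static_order():
    results[stage] = stages[stage]({p: results[p] for p in edges[stage]})

print(results["sink"])  # [2, 4, 6, 1, 4, 9]
```

The topological sort is what guarantees a branch's parents run before the merge stage, which is the same ordering constraint a pipeline engine enforces when scheduling DAG stages.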
This course is primarily intended for the following participants: Data Engineers and Data Analysts.


