Cloudera

DANA-262: Analyzing with Cloudera Data Warehouse

This course teaches you to apply traditional data analytics and business intelligence skills to big data. It presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.

28 hours · Virtual

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the ecosystem, learning how to:
- Use Apache Hive and Apache Impala to access data through queries
- Identify distinctions between Hive and Impala, such as differences in syntax, data formats, and supported features
- Write and execute queries that use functions, aggregate functions, and subqueries
- Use joins and unions to combine datasets
- Create, modify, and delete tables, views, and databases
- Load data into tables and store query results
- Select file formats and develop partitioning schemes for better performance
- Use analytic and windowing functions to gain insight into their data
- Store and query complex or nested data structures
- Process and analyze semi-structured and unstructured data
- Optimize and extend the capabilities of Hive and Impala
- Determine whether Hive, Impala, an RDBMS, or a mix of these is the best choice for a given task
- Utilize the benefits of CDP Public Cloud Data Warehouse

Foundations for Big Data Analytics
- Big Data Analytics Overview
- Data Storage: HDFS
- Distributed Data Processing: YARN, MapReduce, and Spark
- Data Processing and Analysis: Hive and Impala
- Database Integration: Sqoop
- Other Data Tools
- Exercise Scenario Explanation

Introduction to Hive and Impala
- What Is Hive?
- What Is Impala?
- Why Use Hive and Impala?
- Schema and Data Storage
- Comparing Hive to Traditional Databases
- Use Cases

Querying with Hive and Impala
- Databases and Tables
- Basic Hive and Impala Query Language Syntax
- Data Types
- Using Hue to Execute Queries
- Using Beeline (Hive’s Shell)
- Using the Impala Shell

Common Operators and Built-in Functions
- Operators
- Scalar Functions
- Aggregate Functions

Data Management
- Simplifying Queries with Views
- Storing Query Results

Data Storage and Performance
- Partitioning Tables
- Loading Data into Partitioned Tables
- When to Use Partitioning
- Choosing a File Format
- Using Avro and Parquet File Formats

Working with Multiple Datasets
- UNION and Joins
- Handling NULL Values in Joins
- Advanced Joins

Analytic Functions and Windowing
- Using Common Analytic Functions
- Other Analytic Functions
- Sliding Windows

Complex Data
- Complex Data with Hive
- Complex Data with Impala

Analyzing Text
- Using Regular Expressions with Hive and Impala
- Processing Text Data with SerDes in Hive
- Sentiment Analysis and n-grams in Hive

Apache Hive Optimization
- Understanding Query Performance
- Cost-Based Optimization and Statistics
- Bucketing
- ORC File Optimizations

Apache Impala Optimization
- How Impala Executes Queries
- Improving Impala Performance

Extending Hive and Impala
- User-Defined Functions
- Parameterized Queries

Choosing the Best Tool for the Job
- Comparing MapReduce, Hive, Impala and Relational Databases
- Which to Choose?

CDP Public Cloud Data Warehouse
- Data Warehouse Overview
- Auto-Scaling
- Managing Virtual Warehouses
- Querying Data Using CLI and Third-Party Integration

Appendix: Apache Kudu
- What Is Kudu?
- Kudu Tables
- Using Impala with Kudu

This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Some knowledge of SQL is assumed, as is basic Linux command-line familiarity.

Upcoming Sessions

09/05/2026 – 15/05/2026
English · Online
€2,970.00