
ARCH-492: Architecting Cloudera Edge to AI
Architecting Cloudera from Edge to AI is a four-day learning event that addresses advanced big data architecture topics for building edge-to-AI applications, spanning streaming, operational data processing, analytics, and machine learning. The workshop brings technical contributors together in a group setting to design and architect solutions to a challenging business problem. It first addresses big data architecture problems in general, then applies them to the design of a challenging system.
Throughout the highly interactive workshop, participants apply concepts to real-world examples, resulting in detailed, synergistic discussions. The format enables participants to learn techniques for architecting big data systems not only from Cloudera's experience but also from the experiences of fellow participants. More specifically, the workshop addresses advanced big data architecture topics including data formats, transformation, transactions, real-time, batch, and machine learning processing, scalability, fault tolerance, security, and privacy, minimizing the risk of an unsound architecture and poor technology selection.
Introduction
  - Team activity: Team Introductions

Technology Review
  - Data Hubs
  - Important architecture trends
  - Data Lifecycle
  - Data Flow & Streaming
    - Spark Streaming, Flink, Kafka Streams/Connect
    - Comparing streaming solutions
  - Data Engineering
    - Spark
    - HDFS, Ozone
    - YARN, Kubernetes, YuniKorn
  - Data Warehouse
    - Hive, Impala, DataViz
    - Real-time data warehouse architectures
    - Comparing databases and storage engines
  - Operational Database
    - HBase, Phoenix, Kudu, Solr
  - Cloudera AI
    - Machine Learning
  - Cloudera Observability
  - Replication Manager

Workshop Application Use Cases
  - Oz Metropolitan
  - Architectural questions
  - Team activity: Review Metroz Use Cases and Logical Architecture

Application Vertical Slice
  - Definition
  - Minimizing risk of an unsound architecture
  - Selecting a vertical slice
  - Team activity: Metroz Vertical Slice

Application Processing
  - Real-time and near-real-time processing
  - Batch processing
  - Data access patterns
  - Delivery and processing guarantees
  - Data consistency and ACID transactions
  - Stream processing guarantees
  - Machine learning pipelines
  - Team activity: Metroz Processing

Application Data
  - Three V's of Big Data
  - Data lifecycle
  - Data formats
  - Transforming data
  - Team activity: Metroz Data Requirements

Scalable Applications
  - Scale up, scale out, scale to X
  - Determining if an application will scale
  - Poll: scalable airport terminal designs
  - Spark scalability and parallel processing
  - Scalable storage engines: HDFS, Ozone, Kafka, and Kudu
  - Team activity: Scaling Metroz

Fault-Tolerant Distributed Systems
  - Principles
  - Transparency
  - Hardware vs. software redundancy
  - Tolerating disasters
  - Stateless functional fault tolerance
  - Stateful fault tolerance
  - Replication and group consistency
  - Application tolerance for failures
  - Team activity: Failures in Metroz

Security and Privacy
  - Principles
  - Security architecture
  - Knox security architecture
  - Ranger security architecture
  - Setting security policies with Ranger
  - Threat analysis
  - Team activity: Securing Metroz

Deployment
  - Cluster sizing and evolution
  - On-premises vs. cloud
  - Edge computing
  - Cloudera on Cloud architecture
  - Introduction to containers and Kubernetes
  - Team activity: Deploying Metroz

Software Architecture
  - Architecture artifacts
  - Team activity: Metroz Physical Architecture

Machine Learning and AI
  - Introduction to ML and AI in big data applications
  - Architect's role in ML- and AI-driven projects
  - High-level view of machine learning (ML) and artificial intelligence (AI)
  - Big data and ML/AI in public cloud vs. private cloud
  - Common challenges in ML/AI architectures
  - Best practices for architecting ML and AI in big data
  - Emerging trends

AI Studios
  - Learn about AI Studios
  - Explain core features of RAG Studio
  - Explain core features of Agent Studio
  - Build and deploy context-aware chatbots
  - AI agent tools

Potential Cloudera Solutions
  - Review of Uber and Lyft big data platforms
  - Review of Metroz CDP solution architectures

Wrap Up
Participants should primarily be architects, developer team leads, big data developers, data engineers, senior analysts, DevOps admins, and machine learning developers who work on big data or streaming applications and have an interest in how to design and develop such applications on Cloudera. To gain the most from the workshop, participants should have working knowledge of popular big data and streaming technologies such as HDFS, Spark, Kafka, Hive/Impala, data formats, and relational database management systems. Detailed API-level knowledge is not needed; there are no programming activities, and the focus is on architecture design. Participants will be divided into small groups to discuss the problems, develop solutions, and present their solutions.



