Talend Hands On

Building an Agile Data Lake on Cloudera HDFS/HIVE using Talend & Spark

In this ‘Hands-On’ session we will open the doors to Talend Studio and walk you through building an Agile Data Lake on the Talend Big Data Platform with Cloudera Hadoop. You will create and execute a job that generates, extracts, and pushes data to HDFS. From there we will walk through the jobs that load the data into a Raw Data Vault and then de-normalize it into a Business Vault.

We will also take a brief tour of some important new features being developed at Talend: Data Streams, Data Catalog, and the new Component SDK.

Resources to view before attending:

Please take the time to register and view the recent Expert Session: An Introduction to the Agile Data Lake

Additional reading you may find interesting:

Requirements for your Laptop:

  • You will need a laptop with the Chrome browser installed
  • You will need MySQL Workbench 6.3.10 installed
  • You will need a WiFi connection
  • You will need 30 GB of free disk space
  • You will need to download and install Talend Studio
    • Link will be provided soon (eta: May 7th)
    • Installing close to the session date will make the most of your time-limited evaluation license

What you will learn:

  • Talend’s Next Generation Architecture
  • Introduction of new Talend Features
    • Data Streams
    • Data Catalog
    • Component SDK
  • How to Architect an Agile Data Lake
  • The Talend/Cloudera Cloud Solution Architecture
  • How to open and use a Talend Project and the objects within it
  • How to construct and execute a Talend job to populate the Persistent Staging Area (PSA)
  • How to streamline PSA to Raw Data Vault loads
  • How to create de-normalized data in the Business Vault using Spark

Goals of the session:

  • Learn about the Talend architecture used in an Agile Data Lake
  • Open & access the Talend Studio Agile Data Lake Project
  • Open, modify, & execute Talend Jobs
  • Use Cloudera Manager & Hue to inspect the Raw Data Vault (RDV) & Business Vault (BV)

Hands-On
Location: Pinnacle Room
Date: May 14, 2018
Time: 9:00 am - 12:00 pm
Dale Anderson