The purpose of this presentation is to examine the Data Vault 2.0 (DV 2.0) architecture in light of big data architectures, including the Lambda Architecture for data processing and other streaming technologies used in big data. It examines what streaming is and how it relates to data warehousing, and introduces the motivations for using streaming to populate a data warehouse in general and a data vault specifically. In this presentation, the audience will learn:
* the various big data approaches to, and tools for, streaming data or near-real-time data processing
* challenges associated with managing distributed stream environments
* pros and cons of different architectures in use today (Lambda vs. Kappa)
* how change data capture (CDC) fits into stream processing
* what challenges and opportunities arise when hooking up streaming platforms to load data vault structures
* one possible use case, via an example implementation of data vault using Kafka, Kafka Connect, Streaming and SQL access
The audience should be familiar with data vault concepts and have a general understanding of big data implementations on the Hadoop platform.