We’ve added another new session to our upcoming WWDVC conference! This pre-recorded session from Scott Ambler is a “must hear” to prepare you for his Monday business presentation and the follow-on conversation he will lead during the technical discussions. Here’s a sneak peek at what Scott will be talking about:
Techniques for Improving Data Quality: The Key to Machine Learning
One of the fundamental challenges for machine learning (ML) teams is data quality, or, more accurately, the lack of it. Your ML solution is only as good as the data that you train it on, and therein lies the rub: Is your data of sufficient quality to train a trustworthy system? If not, can you improve your data so that it is? You need a collection of data quality “best practices”, but what is “best” depends on the context of the problem that you face. Which of the myriad strategies are the best ones for you?
This presentation compares over a dozen traditional and agile data quality techniques on five factors: timeliness of action, level of automation, directness, timeliness of benefit, and difficulty to implement. The data quality techniques explored include: data cleansing, automated regression testing, data guidance, synthetic training data, database refactoring, data stewards, manual regression testing, data transformation, data masking, data labeling, and more. When you understand what data quality techniques are available to you, and understand the context in which they’re applicable, you will be able to identify the collection of data quality techniques that are best for you.
You WON’T want to miss this year’s conference. If you haven’t registered yet, please do so today – just click the link below.
See you at #WWDVC – the best and ONLY authoritative #datavault conference in the world.