The Future of Spark and RapidMiner

The focus of this month’s event will be Spark integration, as both RapidMiner and Hortonworks have been creating new ways to get more out of your big data sets.

6:45 PM – Networking, Food

7:00 PM – Continuation of Spark Summit East 2017 – If you are running Apache Spark in cloud environments, object stores such as Amazon S3 or Azure WASB are a core part of your system. What you can’t do is treat them like “just another filesystem”: do that and things will, eventually, go horribly wrong.

This talk looks at where Spark and Amazon S3 integration is going, especially recent work in the Hadoop core on consistency and zero-rename commit of work, and details how this relates to Spark.

Speaker BIO – Steve Loughran works at Hortonworks on leading-edge Hadoop applications, most recently on high-performance Amazon S3 storage support in Hadoop and Spark, as well as long-lived YARN services. He’s the author of Ant in Action, a member of the Apache Software Foundation, and has been a committer on the Hadoop core since 2009.

Speaker BIO – Yuanyuan (YY) Huang is a resident data scientist for RapidMiner. She received her PhD from Iowa University in Biomathematics, Bioinformatics and Computational Biology. She has previously written on the Simulation for Yeast Cooperation in 2D. Currently, YY is working on projects in text mining, predictive maintenance, fraud detection, customer prediction, and web analytics.

*The RapidMiner office is located at 10 Milk Street, with an alternative entrance at 294 Washington Street, next door to the Old South Meeting House building.