Category: Hadoop

The University of Illinois’ Coordinated Science Laboratory held its student-run conference last month. This video features Dr. Andy Feng, VP Architecture at Yahoo!, who leads the architecture and design of big data and machine learning initiatives. “In this talk, we illustrate Yahoo use cases and datasets, and explain the evolution of big-data technology stack.” Watch Video

Video (33:41) with Databricks co-founder and CTO Matei Zaharia presenting the changes in Apache Spark 2.0 and the general availability (GA) of Databricks Community Edition at Spark Summit 2016. Afterwards, Michael Armbrust demos some new features found in Spark 2.0 on Databricks Community Edition. Go To Video

Open source software tools have become all the rage, especially around big data, and that is a GOOD thing. Open source allows many players to work off the same code base to build more add-on tools, and it’s cheap and easy for the masses to get set up and use them. Hadoop, R, Cassandra, MongoDB, Neo4j and HBase are among the most popular, but there are many more.

I have accumulated 3 lists that are very popular. Please let me know if you see things missing and I’ll attempt to create one large master list and post it on the site. Read More…

Apache Spark’s popularity as part of big data analytics solutions is exploding. Spark is an open-source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark promises performance up to 100 times faster than Hadoop MapReduce for certain applications…and that’s why you should care!

Spark’s in-memory cluster computing is very well suited to machine learning algorithms. These videos will give you a nice introduction to Spark, how it’s being used in business and why you should care. Watch Spark Videos…

DAY 2 | Spark Summit East 2016 took place last month in NYC. Here is the Day 2 Keynotes video. It begins with Reynold Xin – Chief Architect at Databricks – presenting “Real-Time and Spark.”
It is followed by two other very good presentations, titled “Leveraging Spark, AWS, And Graph Analytics to Better Protect Customers” and “Data Profiling and Pipeline Processing with Spark – A Journey” (58min). Enjoy!

Spark Summit East 2016 took place last week in NYC. Here is the Day 1 Keynotes video. It begins with Matei Zaharia – MIT professor, Databricks co-founder and Creator of Spark – discussing the upcoming release of Spark 2.0.

It is followed by four other very knowledgeable speakers discussing subjects like ‘Democratizing Spark,’ ‘Enterprise Spark’ and ‘Spark as an Analytics OS’ (1Hr:12min). Enjoy! Watch Video

Booz Allen Hamilton has created a wonderful 114-page FREE ebook (pdf) called “The Field Guide to Data Science”. I recommend you give it a look for the latest on what is going on in the space. READ MORE