In my previous blog, “Talend & Apache Spark: A Technical Primer,” I walked you through how Talend Spark jobs equate to Spark Submit. In this blog post, I want to continue comparing Talend Spark configurations with Apache Spark Submit. First, we are going to look at how you can map the options in the Apache Spark Configuration tab of a Talend Spark Job to what you can pass as a...
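As a rough sketch of the kind of mapping the post goes on to describe, the settings exposed in Talend's Spark Configuration tab generally correspond to standard spark-submit flags. The flags below are real spark-submit options; the class name, jar path, and resource values are illustrative placeholders, not taken from the post:

```shell
# Hypothetical spark-submit invocation; each flag corresponds to a field
# you would otherwise set in the Talend Spark Configuration tab.
spark-submit \
  --master yarn \                      # cluster manager / Spark mode
  --deploy-mode cluster \              # where the driver runs
  --driver-memory 2g \                 # driver resources
  --executor-memory 4g \               # per-executor memory
  --executor-cores 2 \                 # cores per executor
  --num-executors 4 \                  # executor count
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --class com.example.MyJob \          # placeholder main class
  my-job.jar                           # placeholder application jar
```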

A few years ago, Starbucks’ director of analytics and business intelligence, Joe LaCugna, said the Seattle coffee giant once struggled to make sense of the data pouring in from its loyalty card holders, who at the time numbered over 13 million and comprised 36 percent of all Starbucks’ transactions. The same was true of the coffee conglomerate’s social media data—it has mountains of it, but still can’t quite figure out what to do with it, according to Mr. LaCug...

Until a few months ago, it was thought that the issue of net neutrality had been definitively settled by the ruling of the Federal Communications Commission (FCC) in 2015; however, that all changed with the new Trump administration and statements by the new FCC chairman -...

Many businesses today are scrutinizing their operations to figure out how to join the digital transformation revolution. They understand that to become more competitive and customer-centric, they need processes that are flexible, integrated, insightful and scalable. They understand that harnessing data and infusing business processes with it is the key to success. Unfortunately, poor data practices,...

By now, almost every organization is aware of how GDPR will change individuals’ rights to own their own data when it comes into effect on 25 May 2018. One key principle GDPR establishes is the principle of accountability. It is up to the organization that needs personal data for a given purpose, referred to as the data controller under GDPR, to ensure enforcement of the privacy principles not only within its walls but also across suppliers with whom it might shar...

When it comes to solutions for the big data sector, there is a clear split between the legacy and next-generation approaches to software development. Legacy vendors in this space generally have their own large internal development organizations, dedicated to building proprietary, bespoke software. It’s an approach that has worked well over the years. However, the big data market has always moved at lightning speed, and it’s had a strong element of open source fro...

Today, almost everyone has big data, machine learning and cloud at the top of their IT “to-do” list. The importance of these technologies can’t be overemphasized, as all three are opening up innovation, uncovering opportunities and optimizing businesses. Machine learning isn’t a brand-new concept; simple machine learning algorithms actually date ...

We all know that enterprise data needs change constantly, and recently that change has come at an increasing pace. Companies that were once processing all their big data on-prem have suddenly moved into the cloud. Frameworks we used to know and love suddenly become obsolete. However, an interesting debate that still rages on is how to get data processed faster. There are generally two heralded ways of processing data today: batch processing and stream processing...
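The batch/stream distinction above can be sketched with a minimal Python example (not from the post): both approaches compute the same aggregate, but batch waits for the complete, bounded dataset, while streaming consumes records one at a time and emits an updated result as each one arrives.

```python
def batch_total(records):
    """Batch: process the complete, bounded dataset in one pass."""
    return sum(records)

def stream_totals(records):
    """Stream: consume records as they arrive, yielding a running total."""
    total = 0
    for r in records:
        total += r
        yield total  # intermediate result available before the data ends

if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(batch_total(data))           # one answer after all data is seen
    print(list(stream_totals(data)))   # an answer after every record
```

The trade-off mirrors the debate in the post: batch gives a single authoritative answer over a finished dataset, while streaming gives lower-latency, incremental answers over data that may never end.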

DevOps is a set of practices that automates the processes between software development and IT teams so they can build, test, and release software more quickly and reliably. The concept of DevOps is founded on building a culture of collaboration between IT and business teams, which have historically functioned in relative silos. The promised benefits include increased trust, faster software releases, and the ability to solve critical issues quickly. That said, implement...