At KDnuggets, we try to keep our finger on the pulse of main events and developments in industry, academia, and technology. We also do our best to look forward to key trends on the horizon.

In this post we present predictions from those in industry, which do not follow a prescribed question but which do address what to look out for in different sectors in the upcoming year. Quotes are organized alphabetically by the name of the company on whose behalf they were submitted, and we have reserved the right to edit submissions (by extracting excerpts) for both content and length.

As the predictions come from across industry and reach into its many niche sectors, there is no general consensus and there are no overarching themes herein, which makes intuitive sense. You will, however, read about the impact of GPUs, the future of the data lake, both NoSQL and SQL, IoT, machine learning, deep learning, and much more.

In 2017, NoSQL’s coming of age will be marked by a shift to workload-focused data strategies, meaning executives will answer questions about their business processes by examining the data workloads, use cases and end results they’re looking for. This mindset is in contrast to prior years when many decisions were driven from the bottom up by a technology-first approach, where executives would initiate projects by asking what types of tools best serve their purposes. This shift has been instigated by data technology, such as NoSQL databases, becoming increasingly accessible.

In 2017, organizations will stop letting data lakes be their proverbial ball and chain. Centralized data stores still have a place in initiatives of the future: How else can you compare current data with historical data to identify trends and patterns? Yet relying solely on a centralized data strategy will ensure data weighs you down. Rather than a data lake-focused approach, organizations will begin to shift the bulk of their investments to implementing solutions that enable data to be utilized where it’s generated and where business processes occur - at the edge. In years to come, this shift will be understood as especially prescient, as edge analytics and distributed strategies become increasingly important parts of deriving value from data.

In 2017, the reports of Big Data’s death will be greatly exaggerated, as will the hype around IoT and AI. In reality, all of these disciplines focus on data capture, curation, analysis and modeling. The importance of that suite of activities won’t go away unless all businesses cease operation.

1. Hadoop distribution vendors will have crossed the chasm: unstructured data in Hadoop is a reality. But since the open source problem has not been addressed, they aren’t making much money. As a result, many of these vendors will be acquired by bigger players. Bigger ISV Hadoop vendors may also band together to create larger entities in hopes of capitalizing on economies of scale.

2. Data preparation will become more of a feature than a market as big data analytics continues to evolve in both product offerings and market share. As such, there may be consolidation in the marketplace as companies start to acquire product offerings in this area, as well as customer lists, from small, niche vendors.

3. Artificial intelligence, machine learning, and advanced analytics will become more complex as people start to realize the true potential of these disciplines. All three areas require an excellent understanding of big data and big data analytics, and they may eventually evolve into a master discipline of Analytics, or maybe we will coin a new term for it in the near future.

4. By the end of 2017, the idea of deep learning will have matured and true use cases will emerge. For example, Google uses it to look at faces and then determine if the face is happy, sad, etc. There are also existing use cases in which police are using it to compare a “baseline” facial structure to “real time” facial expressions to determine intoxication, duress or other potentially adverse activities.

Trend #1: Real Change is Coming to Real-time Intelligence in 2017 with GPUs

Graphics Processing Units (GPUs) are capable of delivering up to 100 times better performance than even the most advanced in-memory databases that use CPUs alone. The reason is their massively parallel processing, with some GPUs containing over 4,000 cores, compared to the 16-32 cores typical in today’s most powerful CPUs. The small, efficient cores are also better suited to performing similar, repeated instructions in parallel, making GPUs ideal for accelerating the compute-intensive workloads required for analyzing large streaming data sets in real time.
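
To make the idea of "similar, repeated instructions in parallel" concrete, here is a minimal sketch of the data-parallel pattern in plain Python. It is not GPU code and is not taken from any vendor's product: the function names (`scale_and_square`, `parallel_map`), the worker count, and the sample readings are all hypothetical, and Python threads are used only to show how one small operation can be applied independently to separate chunks of data. A GPU applies the same pattern across thousands of cores rather than a handful of workers.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_and_square(x: float) -> float:
    """One small instruction, applied identically to every element (hypothetical)."""
    return (x * 0.5) ** 2


def chunked(values, n_chunks):
    """Split values into at most n_chunks contiguous slices."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def process_chunk(chunk):
    """Each worker runs the same instruction over its own slice."""
    return [scale_and_square(x) for x in chunk]


def parallel_map(values, n_workers=4):
    """Apply scale_and_square to every value, one chunk per worker."""
    chunks = chunked(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves chunk order, so results line up with the inputs.
        results = pool.map(process_chunk, chunks)
    return [y for part in results for y in part]


# Hypothetical sample data standing in for a stream of sensor readings.
readings = [float(i) for i in range(10)]
transformed = parallel_map(readings)
```

Because no element depends on any other, the work can be divided among as many workers as the hardware offers, which is the property that lets this kind of workload scale with core count.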

Trend #2: The Cloud will get “turbo-charged” performance with GPUs

Amazon has already begun deploying GPUs, and Microsoft and Google have announced plans. These cloud service providers are all deploying GPUs for the same reason: to gain a competitive advantage. Given the dramatic improvements in performance offered by GPUs, other cloud service providers can also be expected to begin deploying GPUs in 2017.

Certain enhancements in security and availability that are expected in 2017 will build on the foundation of the GPU’s proven performance and scalability to make their use enterprise-class. For security, support for user authentication and for role- and group-based authorization will make GPU acceleration suitable for applications that must comply with security regulations, including those requiring personal privacy protections. For availability, data replication with automatic failover capabilities will make GPU-accelerated databases sufficiently reliable for even the most mission-critical of applications.

Per Moore’s law, CPUs are always getting faster and cheaper. Of late, databases have been following the same pattern.

In 2013, Amazon changed the game when they introduced Redshift, a massively parallel processing database that allowed companies to store and analyze all their data for a reasonable price. Since then, however, companies that saw products like Redshift as datastores with effectively limitless capacity have hit a wall. They have hundreds of terabytes or even petabytes of data and are stuck between paying more for the speed they had become accustomed to, or waiting five minutes for a query to return.

Enter (or reenter) Moore’s law. Redshift has become the industry standard for cloud MPP databases, and we don’t see that changing anytime soon. With that said, our prediction for 2017 is that on-demand MPP databases like Google BigQuery and Snowflake will see a huge uptick in popularity. On-demand databases charge pennies for storage, allowing companies to store data without worrying about cost. When users want to run queries or pull data, the service spins up the hardware it needs and gets the job done in seconds. These databases are fast and scalable, and we expect to see a lot of companies using them in 2017.
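
The difference between the two billing models can be sketched with some simple arithmetic. Every number below is a made-up placeholder chosen for easy math, not actual pricing from Amazon, Google, or Snowflake, and the function names are hypothetical. The sketch assumes a provisioned cluster bills for every node-hour whether or not queries run, while an on-demand service bills for stored data plus data scanned by queries.

```python
def provisioned_monthly_cost(node_hourly_rate, nodes, hours_in_month=730):
    """Always-on cluster: pay for every node for every hour of the month."""
    return node_hourly_rate * nodes * hours_in_month


def on_demand_monthly_cost(storage_tb, storage_rate_per_tb,
                           tb_scanned, scan_rate_per_tb):
    """On-demand service: pay for storage plus the data queries actually scan."""
    return storage_tb * storage_rate_per_tb + tb_scanned * scan_rate_per_tb


# Placeholder inputs only; substitute real rates from a vendor's price list.
provisioned = provisioned_monthly_cost(node_hourly_rate=1.0, nodes=10)
on_demand = on_demand_monthly_cost(
    storage_tb=100, storage_rate_per_tb=20.0,
    tb_scanned=200, scan_rate_per_tb=5.0,
)

print(f"provisioned: {provisioned:.2f}  on-demand: {on_demand:.2f}")
```

Which model comes out cheaper depends entirely on the inputs: a team that scans large volumes around the clock may find a provisioned cluster more economical, while a team with bursty or occasional queries tends to benefit from paying per query.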

2) SQL will have another extraordinary year

The innovations we're seeing are blowing our minds. BigQuery has created a product that is essentially infinitely scalable, the original goal of Hadoop, AND practical for analytics, the original goal of relational databases.

SQL engines for Hadoop have continued to gain traction. Products like SparkSQL and Presto are popping up in enterprises and as cloud services because they allow companies to leverage their existing Hadoop clusters and cloud storage for speedy analytics. What’s not to love?

In 2017, I see data science for the masses fueling growth in traditional businesses. The workflows for deep learning, prescriptive analytics, and big data will become much easier and more accessible, making it more cost-effective for businesses to train existing domain experts than to hire elusive data scientists. For example, transfer learning will mitigate the need for large training sets, and NVIDIA GPU instances on Amazon EC2 will make it easy for anyone to get started with deep learning in minutes.

A focus for 2017 will also be on intelligently integrating analytics across heterogeneous systems, driven by the need for real-time decision making and sensor data from IoT systems. This means data processing and predictive algorithms will need to work smartly across IT systems, IoT aggregators, hybrid clouds, on-board sensors, and in complex embedded systems.