For a system to be “open for business,” system administrators must be able to efficiently manage and operate it. That requires a comprehensive dataflow and operations strategy. This track provides best practices for deploying and operating data lakes, streaming systems, and the extended Apache data ecosystem on premises and in the cloud. Sessions cover the full deployment lifecycle including installation, configuration, initial production deployment, upgrading, patching, loading, moving, backup, and recovery.

You’ll discover how to get started and how to operate your cluster. Speakers will show how to set up and manage high-availability configurations and how DevOps practices can help speed solutions into production. They’ll explain how to manage data across the edge, the data center, and the cloud. And they’ll offer cutting-edge best practices for large-scale deployments.

This 2-day course is designed for ‘Data Stewards’ or ‘Data Flow Managers’ who are looking to automate the flow of data between systems.

TARGET AUDIENCE

Data Engineers, Integration Engineers, and Architects who are looking to automate data flow between systems.

PREREQUISITES

It is recommended that participants have some experience with Linux and a basic understanding of DataFlow tools. Students will need to bring their Wi-Fi-enabled laptop pre-loaded with the Chrome or Firefox browser in order to complete hands-on labs.

Students must have experience with at least one programming or scripting language, such as Python; knowledge of statistics and/or mathematics; and a basic understanding of big data and Hadoop principles.

This 2-day training course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Apache Pig and Apache Hive, and to develop applications on Apache Spark. Topics include an essential understanding of HDP and its capabilities; Hadoop, YARN, HDFS, and MapReduce/Tez; data ingestion; using Pig and Hive to perform data analytics on Big Data; and an introduction to Spark Core, Spark SQL, Apache Zeppelin, and additional Spark features.

TARGET AUDIENCE

Developers and data engineers who need to understand and develop applications on HDP

PREREQUISITES

Students should be familiar with programming principles and have experience in software development. SQL and light scripting knowledge is also helpful. No prior Hadoop knowledge is required.

This 2-day course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Spark. The focus will be on utilizing the Spark API from Python.

TARGET AUDIENCE

Developers, Architects, and Admins who would like to learn more about developing data applications in Spark, how it will affect their environment, and ways to optimize applications.

PREREQUISITES

No previous Hadoop knowledge is required, though it will be useful. Basic knowledge of Python or Scala is required. Previous exposure to SQL is helpful, but not required. Students will need to bring their Wi-Fi-enabled laptop pre-loaded with the Chrome or Firefox browser in order to complete hands-on labs.

This 1-day course details the business value for, and provides a technical overview of, Apache Hadoop. It includes high-level information about concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course serves as an optional primer for those who plan to attend a hands-on, instructor-led course.

No previous Hadoop or programming knowledge is required. Students are encouraged to bring their Wi-Fi-enabled laptop pre-loaded with the Hortonworks Sandbox should they want to duplicate demonstrations on their own machine.

This 2-day course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Spark. The focus will be on utilizing the Spark API from Scala.

TARGET AUDIENCE

Developers, Architects, and Admins who would like to learn more about developing data applications in Spark, how it will affect their environment, and ways to optimize applications.

PREREQUISITES

No previous Hadoop knowledge is required, though it will be useful. Basic knowledge of Python or Scala is required. Previous exposure to SQL is helpful, but not required. Students will need to bring their Wi-Fi-enabled laptop pre-loaded with the Chrome or Firefox browser in order to complete hands-on labs.

This 2-day course is designed for administrators who will be managing the Hortonworks Data Platform (HDP) 2.5 with Ambari. It covers installation, configuration, and other typical cluster management tasks.

TARGET AUDIENCE

IT administrators and operators responsible for installing, configuring, and supporting an HDP 2.5 deployment in a Linux environment using Ambari.

PREREQUISITES

No previous Hadoop knowledge is required, though it will be useful. Attendees should be familiar with data center operations and Linux system administration. Students will need to bring their Wi-Fi-enabled laptop pre-loaded with the Chrome or Firefox browser in order to complete hands-on labs.

This 2-day course is designed for system administrators and operators who need to manage secure HDP clusters. They will learn how to implement Kerberos, Apache Ranger, Apache Ambari, Apache Knox, SPNEGO, and other security concepts and tools to secure HDP clusters.

TARGET AUDIENCE

Systems administrators, operators, and security engineers who need to understand how to implement HDP security.

This 2-day course is designed for developers who need to create real-time applications to ingest and process streaming data sources using the Hortonworks Data Platform (HDP). Specific technologies covered include Apache Hadoop, Apache Kafka, Apache Storm and Trident, Apache Spark, and Apache HBase. The highlight of the course is the custom workshop-style labs that allow participants to build complete streaming applications with Storm and Spark Streaming.

TARGET AUDIENCE

Developers and data engineers who need to understand and develop real-time and streaming applications on HDP.

PREREQUISITES

Students should be familiar with programming principles and have experience in software development. Java programming experience is required. SQL and light scripting knowledge is also helpful. No prior Hadoop knowledge is required.

Sponsors

Hortonworks is a leading innovator at creating, distributing and supporting enterprise‐ready open data platforms. Our mission is to manage the world’s data. We have a single‐minded focus on driving innovation in open source communities such as Apache Hadoop, NiFi, and Spark. Our open Connected Data Platforms power Modern Data Applications that deliver actionable intelligence from all data: data‐in‐motion and data‐at‐rest. Along with our 1600+ partners, we provide the expertise, training and services that allow our customers to unlock the transformational value of data across any line of business. We are Powering the Future of Data.

Yahoo is a guide focused on informing, connecting, and entertaining our users. By creating highly personalized experiences for our users, we keep people connected to what matters most to them, across devices and around the world. In turn, we create value for advertisers by connecting them with the audiences that build their businesses. Yahoo is headquartered in Sunnyvale, California, and has offices located throughout the Americas, Asia Pacific (APAC) and the Europe, Middle East and Africa (EMEA) regions.

Microsoft believes anyone should be able to get insights from Big Data. So, we bring the power of the cloud to Big Data making it easier than ever to work with all data types. With Microsoft data solutions, everyone can bring Big Data business insights to life through advanced analytics and stunning visualizations – all powered by our enterprise-grade, flexible, and open cloud.

Hewlett Packard Enterprise is an industry leading technology company that enables customers to go further, faster. With the industry’s most comprehensive portfolio, spanning the cloud to the data center to workplace applications, our technology and services help customers around the world make IT more efficient, more productive and more secure.

IBM is a globally integrated technology and consulting company headquartered in Armonk, New York. With operations in more than 170 countries, IBM attracts and retains some of the world’s most talented people to help solve technology problems and provide an edge for businesses, governments and non-profits. Innovation is at the core of IBM’s strategy. The company has reinvented itself through multiple technology eras and economic cycles, creating differentiating value for its clients. Today, as the IT industry is fundamentally changing at an unprecedented pace, IBM is much more than a “hardware, software, services” company. IBM is now emerging as a cognitive solutions and cloud platform company. Cognitive solutions powered by analytics and the cloud are the key to clients’ digital transformation. This transformation requires breakthroughs at every level of the enterprise IT foundation, from processors and computer design to storage, applications and analytics tools, networking and the integration layer. IBM solutions are built with open technologies and designed for mission-critical applications, offering a comprehensive platform for cognitive workloads.

The Oracle Cloud delivers hundreds of SaaS applications and enterprise-class PaaS and IaaS services to customers in more than 195 countries and territories while processing 55 billion transactions a day.

Dell EMC, a part of Dell Inc., enables organizations to modernize, automate and transform their data center using industry-leading converged infrastructure, servers, storage and data protection technologies. This provides a trusted foundation for businesses to transform IT, through the creation of a hybrid cloud, and transform their business through the creation of cloud-native applications and big data solutions. Dell EMC services its customers – including 98 percent of the Fortune 500 – with the industry’s broadest, most innovative infrastructure portfolio from edge to core to cloud.

Teradata empowers companies to achieve high-impact business outcomes through analytics. With a powerful combination of Industry expertise and leading hybrid cloud technologies for data warehousing and big data analytics, Teradata unleashes the potential of great companies. Partnering with top companies around the world, Teradata helps improve customer experience, mitigate risk, drive product innovation, achieve operational excellence, transform finance, and optimize assets. Teradata is recognized by media and industry analysts as a future-focused company for its technological excellence, sustainability, ethics, and business value.

Pentaho, a Hitachi Group Company, is a leading data integration and business analytics company with an enterprise-class, open source-based platform for diverse big data deployments. Pentaho’s unified data integration and analytics platform is comprehensive, completely embeddable and delivers governed data to power any analytics in any environment. Pentaho has over 15,000 product deployments and over 1,500 commercial customers including ABN-AMRO Clearing, BT, Caterpillar Marine Asset Intelligence, EMC, Halliburton, and NASDAQ.

BMC is a global leader in innovative software solutions that enable businesses to transform into digital enterprises for the ultimate competitive advantage. Our Digital Enterprise Management solutions are designed to make digital business fast, seamless, and optimized from mainframe to mobile to cloud and beyond. BMC – Bring IT to Life. BMC digital IT transforms 82% of the Fortune 500®.

Impetus is a provider of innovative Big Data solutions and services. We empower enterprises in the financial services, healthcare, digital media, travel, entertainment and manufacturing industries to gain big business impact from Big Data. We are experts in the complete Big Data ecosystem, including Hadoop, Cassandra, NoSQL, MPP systems, real-time and predictive analytics, machine learning, visualization, cloud computing and enterprise mobility. For more information, visit bigdata.impetus.com. Connect with us on LinkedIn and Twitter at @impetustech.

Kyvos Insights is the world’s fastest BI on Big Data platform, delivering ad hoc analysis with instant response times at massive scale. Kyvos unlocks the power of data lakes with its patent-pending technologies, serving big data insights interactively to analysts using their favorite BI tools. By creating a BI Consumption Layer, Kyvos makes big data available and secure for all users throughout the enterprise. Kyvos partners with industry leaders of BI, Cloud and Hadoop technologies.

Syncsort, the global leader in Big Iron to Big Data software, organizes data everywhere, to keep the world working – the same data that powers machine learning, AI and predictive analytics. We use our decades of experience so that more than 7,000 customers, including 84 of the Fortune 100, can optimize traditional data systems and deliver mission-critical data to next-generation analytic environments, quickly extracting value from their data anytime, anywhere. Our products provide a simple way to optimize, assure, integrate, and advance data, helping to solve for the present and prepare for the future. Learn more at syncsort.com.

AtScale makes BI work on Big Data. With AtScale, business users get interactive and multi-dimensional analysis capabilities, directly on Big Data, at maximum speed, using the tools they already know, own and love – from Microsoft Excel to Tableau Software to QlikView. Built by Big Data veterans from Yahoo!, Google and Oracle, AtScale is already enabling the BI on Big Data revolution at major corporations across healthcare, telecommunications, retail and online industries.

Accenture is a leading global professional services company, providing services and solutions in strategy, consulting, digital, technology and operations. Combining unmatched experience and specialized skills across 40+ industries and all business functions, Accenture works at the intersection of business and technology to help clients improve performance and create sustainable value for stakeholders. With 442,000+ people serving clients in 120+ countries, Accenture drives innovation to improve the way the world works and lives.

Arcadia Data provides the first native visual analytics software running within modern data platforms for the scale, performance, and security users need to glean real-time business insights in the era of big data and IoT. Arcadia Enterprise is purpose-built to analyze large volumes of data without moving it, filling the gap between self-service BI and advanced analytics. Arcadia Enterprise is deployed by some of the world’s leading brands, including Procter & Gamble, HPE, RBC, Kaiser, and Neustar.

Cask makes building and running big data solutions on-premise or in the cloud easy with Cask Data Application Platform (CDAP), the first unified integration platform for big data. CDAP reduces the time to production for data lakes and data applications by 80%, empowering the business to make better decisions faster. Cask customers and partners include AT&T, Cloudera, Ericsson, Lotame, Microsoft, Salesforce, and Tableau, among others.

Datameer empowers organizations to embark on a data journey that answers a wide range of new, deeper business questions to increase business agility and responsiveness. Datameer’s modern BI platform offers agile analytics on an enterprise-grade infrastructure that can rapidly answer these questions and operationalize the results across the business.

DataTorrent empowers customers to make their decisions matter. Whether your data comes from machines, people or automated systems, we enable you to build and deploy production applications easily and rapidly while extracting relevant insights to make real-time decisions. With the DataTorrent RTS Platform for streaming data and App Factory, you’ll simplify data integration, enrichment, and analytics—spurring your business to act quickly and with immediate impact.

Infoworks is the only solution providing complete functionality in a single platform, from data ingestion and data synchronization to the building of data models and cubes. Scale your data warehousing and analytics on Hadoop not by adding armies of people, but by using advanced machine intelligence.

Inspur is a leading datacenter and cloud computing solutions provider, ranked by Gartner among the top five server manufacturers in the world. Inspur can provide total solutions at the IaaS, PaaS, and SaaS levels with high-end servers, mass storage systems, a cloud operating system, and information security technology.

Leading organizations worldwide count on NetApp for software, systems and services to manage and store data. We help customers capitalize on the value of their data in the hybrid cloud through our Data Fabric strategy, data management expertise, portfolio and ecosystem.

Pepperdata is the Big Data performance company. Leading Enterprise companies use Pepperdata products and services to manage and improve the performance of Hadoop and Spark. The Pepperdata product suite enables customers to troubleshoot performance problems in production, increase cluster utilization, and enforce policies to support multi-tenancy. Pepperdata products and services work with customer Big Data systems both on-premise and in the cloud.

Cloudera delivers the modern platform for machine learning and advanced analytics. The world’s leading organizations trust Cloudera to help solve their most challenging business problems with Cloudera Enterprise, the fastest, easiest and most secure data platform built on Apache Hadoop and the latest open source technologies.

Alation’s enterprise collaborative data platform empowers employees inside of data-driven enterprises to find, understand, and use the right data for better, faster business decisions. Alation combines the power of machine learning with human insight to automatically capture information about what the data describes, where the data comes from, who’s using it and how it’s used.

BlueData is transforming Big Data infrastructure – enabling Big-Data-as-a-Service either on-premises, in the cloud, or in a hybrid architecture. The BlueData EPIC software platform leverages Docker containers to make it easier, faster, and more cost-effective to deploy Hadoop, Spark, and other Big Data tools. You can spin up virtual Hadoop or Spark clusters within minutes – providing data scientists with on-demand access to the applications, data, and infrastructure they need. Learn more at www.bluedata.com

Jethro accelerates Interactive Business Intelligence (BI) on big data. Customers use Jethro to serve thousands of concurrent users interactively analyzing tens of billions of rows with sub-second response time. Jethro customers do not have to re-engineer the underlying data or make any changes to their front end screens. Jethro customers enjoy EDW (Enterprise Data Warehouse) functionality and performance at Hadoop scale and cost.

Kognitio is a pioneer in the development of scale-out, in-memory software for big data analytics. It provides an ultra-fast, high concurrency SQL layer allowing modern data visualization tools to maintain interactive performance. Kognitio is fully integrated with YARN on Hadoop or can be installed on standalone hardware infrastructure.

Talend is a leader in cloud and big data integration software that helps companies make data a strategic asset that provides the data agility required for companies to rapidly adopt the latest technology innovations and scale to meet the constantly evolving demands of modern business.

Dataguise gives data-driven enterprises a simple, powerful solution for global sensitive data governance. We empower them with the ability to detect, protect, and monitor sensitive data in real time across all their data repositories, both on premises and in the cloud.

We Access, Integrate and Deliver Data 10x Faster and 10x Cheaper than Any Other Middleware Solution. For more than 10 years our customers have experienced how effective this 10x factor has been for data integration projects through Enterprise Data Lakes, Corporate Data Access Layers, Logical Data Warehouses, Shared Data Services Layers, Open Data Portals, Single Unified Views of Business Entities, and others. Additionally, we help to decouple front-end applications from back-end systems, and facilitate data provisioning, system migrations, and the reusability of data services for different enterprise uses.

DriveScale brings the benefits of hyperscale computing, originally developed by companies like Google and Amazon for their own use, to enterprise data centers running Hadoop and other big data workloads. Our “Software Composable Infrastructure” technology transforms rigid data centers into flexible and responsive scale-out deployments.

In 2007, two ex-Bank of America colleagues – Partha Sen and Mike Upchurch – formed Fuzzy Logix. With a combined passion for solving problems with quantitative methods, data mining and pattern recognition, and foresight into how businesses would increasingly collect information and need to achieve actionable insight from this data, they created a business that transformed data analytics. By performing the analytics directly where the data resides and eliminating the need to move it, in-database analytics was created.

We are the American Supercomputer Company. For technology powered visionaries with a passion for challenging the status quo, PSSC Labs is the answer for hand-crafted HPC and Big Data computing solutions that deliver relentless performance with the absolute lowest total cost of ownership. For over 25 years we have dedicated ourselves to delivering the absolute highest quality computing solutions to the world’s most demanding organizations. While other companies talk a good game, we actually deliver.

SynerScope has developed a patented (eco)system for combining and analyzing all kinds of data – numerical, text, video/voice, IoT, structured and unstructured, real-time and historical – for large to massive data sets. It is the first system able to easily match several kinds of data, creating valuable, decisive information.

TamGroup was founded upon the belief that doing what is best for our clients should be our highest priority. We believe it is this philosophy that has contributed to our success for over fifteen years. By using a consultative approach we empower our clients and provide them with a competitive advantage and effective IT solutions.

BlueTalon keeps enterprises in control of their data by allowing them to give users access to all the data they need, but not a byte more. The BlueTalon Policy Engine delivers the most fine-grained data protection, data filtering and data masking capabilities and ensures consistency of controls across multiple platforms, including Hadoop, Spark, SQL-based, and Big Data environments — even when deployed in the cloud.

Join us for DataWorks Summit Singapore – one amazing day of learning and discovery where developers and businesses come together to explore what’s next in open source big data technology. Early bird registration is open now through September 7, so be sure to get your ticket now! http://bit.ly/2MezTp4

Leading enterprises are using advanced analytics, data science, and artificial intelligence to transform the way they deliver customer and product experiences at scale. Discover how they’re doing it at the world’s premier big data event for everything data—DataWorks Summit. Early bird registration for Singapore is open now through September 7, so be sure to get your ticket now! http://bit.ly/2MezTp4
