Jana's Data Warehousing Story: Then vs. Now

Jana's mission is to bring internet access to over a billion people in emerging markets via mobile applications. Already driving more than 3.8 billion MB of app usage, Jana needed a scalable and cost-effective solution to process and analyze that data.

Snowflake and AWS are helping Jana keep up with the demands of processing and analyzing that rapidly growing stream of data. Using Amazon S3 and the Snowflake Elastic Data Warehouse, Jana processes and analyzes app usage data in a high-performance, scalable way without the cost and complexity of other solutions.

Join us to learn:

- How Jana made the transition from MySQL to a cloud data warehouse
- The data pipeline that Jana designed to move data from source to analysts
- The benefits Jana realized as a result of moving to a cloud infrastructure and data warehouse

Who should attend?

Data scientists, analysts, and anyone who needs to understand how to make critical data rapidly available - without capital expenditures.

HDFS on Kubernetes: Lessons Learned is a webinar presentation intended for software engineers, developers, and technical leads who develop Spark applications and are interested in running Spark on Kubernetes. Pepperdata has been exploring Kubernetes as a potential Big Data platform with several other companies as part of a joint open source project.

There has been a flood of publicity around big data, data processing, and the role of predictive analytics in businesses of the future.
As business operators, how do we get access to these valuable business insights, even when there is no data analyst around to walk us through the results?

- Should your software emulate a data scientist?
- Learn about the power of data visualizations.
- Learn about creating value from disparate data sets.

Fraud detection is a classic adversarial analytics challenge: as soon as an automated system successfully learns to stop one scheme, fraudsters move on to attack another way. Each scheme requires looking for different signals (i.e., features) to catch; is relatively rare (one in millions for finance or e-commerce); and may take months to investigate a single case (in healthcare or tax, for example), making quality training data scarce.
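To make the rarity point concrete, here is a minimal sketch of the base-rate arithmetic. The prevalence, detection rate, and false-alarm rate are hypothetical values chosen for illustration, not figures from the talk, and `alert_precision` is an illustrative helper name.

```python
from fractions import Fraction


def alert_precision(prevalence, recall, false_positive_rate):
    """Share of alerts that are real fraud, computed with Bayes' rule."""
    true_alerts = prevalence * recall
    false_alerts = (1 - prevalence) * false_positive_rate
    return true_alerts / (true_alerts + false_alerts)


# Hypothetical inputs: one fraud case per million transactions, a detector
# that catches 99% of fraud and wrongly flags 0.1% of legitimate traffic.
prevalence = Fraction(1, 1_000_000)
recall = Fraction(99, 100)
false_positive_rate = Fraction(1, 1000)

precision = alert_precision(prevalence, recall, false_positive_rate)
print(f"share of alerts that are real fraud: {float(precision):.4%}")
```

Under these assumed rates, only about one alert in a thousand is a real case, which is one reason confirmed, labeled examples accumulate so slowly.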

This talk will cover a code walk-through and the key lessons learned while building such real-world software systems over the past few years. We'll look for fraud signals in public email datasets, using IPython and popular open-source libraries (scikit-learn, statsmodels, NLTK, etc.) for data science and Apache Spark as the compute engine for scalable parallel processing.

David will iteratively build a machine-learned hybrid model, combining features from different data sources and algorithmic approaches, to catch diverse aspects of suspect behavior:

Apache Spark is used to run these models at scale – in batch mode for model training and with Spark Streaming for production use. We’ll discuss the data model, computation, and feedback workflows, as well as some tools and libraries built on top of the open-source components to enable faster experimentation, optimization, and productization of the models.
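As a rough illustration of what combining signals from different sources and approaches can look like, the sketch below blends a rule-based text signal with a statistical volume signal. It is not the speaker's model: the keyword list, the weights, and the function names are all hypothetical, and a hand-weighted blend stands in for the machine-learned combination the talk describes.

```python
from statistics import mean, pstdev

# Hypothetical keyword list, for illustration only.
SUSPECT_WORDS = {"urgent", "wire", "password"}


def keyword_score(text):
    """Rule-based signal: share of the suspect keywords present in the message."""
    words = set(text.lower().split())
    return len(words & SUSPECT_WORDS) / len(SUSPECT_WORDS)


def volume_score(count, history):
    """Statistical signal: z-score of today's send volume, clipped into [0, 1]."""
    spread = pstdev(history)
    if spread == 0:
        return 0.0
    z = (count - mean(history)) / spread
    return min(max(z / 3.0, 0.0), 1.0)


def hybrid_score(text, count, history, weights=(0.5, 0.5)):
    """Weighted blend of both signals; the weights are illustrative, not tuned."""
    w_text, w_volume = weights
    return w_text * keyword_score(text) + w_volume * volume_score(count, history)


history = [10, 12, 11, 9, 13]  # hypothetical daily message counts for one sender
print(hybrid_score("urgent wire transfer needed", 40, history))
print(hybrid_score("lunch on friday", 11, history))
```

In a learned version, the per-signal scores would become input features and the weights would be fitted from labeled cases rather than set by hand.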

Ram D. Sriram, Chief of the Software and Systems Division, IT Lab at the National Institute of Standards and Technology

In this talk, Ram will provide a unified framework for the Internet of Things, Cyber-Physical Systems, and Smart Networked Systems and Societies, and then discuss the role of ontologies for interoperability.

The Internet, which has spanned several networks in a wide variety of domains, is having a significant impact on every aspect of our lives. These networks are currently being extended to have significant sensing capabilities, with the evolution of the Internet of Things (IoT). With additional control, we are entering the era of Cyber-Physical Systems (CPS). In the near future, the networks will go beyond physically linked computers to include multimodal information from biological, cognitive, semantic, and social networks.

This paradigm shift will involve symbiotic networks of people (social networks), smart devices, and smartphones or mobile personal computing and communication devices that will form smart net-centric systems and societies (SNSS), or the Internet of Everything (IoE). These devices, and the network, will be constantly sensing, monitoring, interpreting, and controlling the environment.

A key technical challenge for realizing SNSS/IoE is that the network consists of things (both devices & humans) which are heterogeneous, yet need to be interoperable. In other words, devices and people need to interoperate in a seamless manner. This requires the development of standard terminologies (or ontologies) which capture the meaning and relations of objects and events. Creating and testing such terminologies will aid in effective recognition and reaction in a network-centric situation awareness environment.

Before joining the Software and Systems Division (his current position), Ram was the leader of the Design and Process group in the Manufacturing Systems Integration Division, Manufacturing Engineering Lab, where he conducted research on standards for interoperability of computer-aided design systems.

Good applications of machine learning and AI can be difficult to pull off. Join Brian Lange, Partner and Data Scientist at data science firm Datascope, as he discusses a variety of ways machine learning and AI can fail (from technical to human factors) so that you can avoid repeating them yourself.

If a volcano erupts in Iceland, why is Hong Kong your first supply chain casualty? And how do you figure out the most efficient route for bike share replacements?

In this presentation, Chief Data Scientist Dmitri Adler will walk you through some of the most successful use cases of supply-chain management, the best practices for evaluating your supply chain, and how you can implement these strategies in your business.

Continuous streams of data are generated in every industry from sensors, IoT devices, business transactions, social media, network devices, clickstream logs, and more. Within these streams of data lie insights that are waiting to be unlocked.

This session, with several live demonstrations, will detail the build-out of an end-to-end solution for the Internet of Things to transform data into insight, prediction, and action using cloud services. These cloud services enable you to quickly and easily build solutions to unlock insights, predict future trends, and take actions in near real time.

Samartha (Sam) Chandrashekar is a Program Manager at Microsoft. He works on cloud services to enable machine learning and advanced analytics on streaming data.

If a database is filled automatically, but it's not analyzed, can it make an impact? And how do you combine disparate data sources to give you a real-time look at your environment?

Chief Executive Officer Merav Yuravlivker discusses how companies are missing out on some of their biggest profits (and how some companies are making billions) by aggregating disparate data sources. You'll learn about data sources available to you, how you can start automating this data collection, and the many insights that are at your fingertips.

Ray Rashid is a Senior Business Intelligence Consultant at Unilytics, specializing in ETL, data warehousing, data optimization, and data visualization. He has expertise in the financial, manufacturing and pharmaceutical industries.

We are all aware of the challenges enterprises face with growing data and siloed data stores. Businesses cannot make reliable decisions with untrusted data. On top of that, they lack access to all the data within and outside their enterprise that they need to stay ahead of the competition and make key decisions for their business.

This session will take a deep dive into the healthcare challenges businesses face today, as well as how to build a Modern Data Architecture using emerging technologies such as Hadoop, Spark, NoSQL datastores, and MPP data stores, together with scalable and cost-effective cloud solutions such as AWS, Azure, and BigStep.

Past infrastructures provided compute, storage, and network, enabling static enterprise deployments that changed every few years. This talk will analyze the consequences of a world where production SAP and Spark clusters, including data, can be provisioned in minutes with the push of a button.

What does it mean for the IT architecture of an enterprise? How do you stay in control in a super-agile world?

Whether you're just starting out or are a seasoned solution architect, developer, or data scientist, there are key mistakes that you have probably made in the past, may be making now, or will most likely make in the future. In fact, these same mistakes are most likely impacting your company's overall success with its analytics program.

Join us for our upcoming webinar, 3 Critical Data Preparation Mistakes and How to Avoid Them, as we discuss three of the most critical, fundamental pitfalls and more!

• Importance of early and effective business partner engagement
• Importance of business context to governance
• Importance of change and learning to your development methodology

The basics of data cleaning are remarkably simple, yet few take the time to get organized from the start.

If you want to get the most out of your data, you're going to need to treat it with respect. By getting prepared and following a few simple rules, you can make your data cleaning processes simple, fast, and effective.

The Practical Data Cleaning webinar is a thorough introduction to the basics of data cleaning and takes you through:

Gartner predicts that “analytics will be pervasive … for decisions and actions across the business.” That sounds like analytics nirvana, with instant access for any analysis you want to do: in other words, self-service BI. Is this a dream or reality?

Join this webinar to find out how clouds like AWS or Azure are moving the industry closer to this nirvana today through simple assembly of cloud services combined with the appropriate consumption model for these services.

We will demonstrate how easy it is to provision your high-end SAP HANA database right next to your BI Analytics tier.

In the cloud computing era, data growth is exponential. Every day, billions of photos are shared and large amounts of new data are created in multiple formats. Within this cloud of data, the relevant data with real monetary value is small. To extract the valuable data, big data analytics frameworks like Spark are used. Spark can run on top of a variety of file systems and databases. To accelerate Spark by 10-1000x, customers are creating solutions such as log file accelerators, storage layer accelerators, MLlib (one of the Spark libraries) accelerators, and SQL accelerators.

FPGAs (Field Programmable Gate Arrays) are the ideal fit for these types of accelerators, where the workloads are constantly changing. For example, they can accelerate different algorithms on different data based on end users and the time of day, while keeping the same hardware.

This webinar will describe the role of FPGAs in Spark accelerators and give Spark accelerator use cases.

Lesley-Anne Wilson, Group Product Rollout & Support Engineer, Digicel Group

Many studies have examined the benefits of Predictive Analytics for customer engagement and for changing customer behaviour. The less romanticized side, however, is the benefit to IT operations, because it is sometimes difficult to turn the focus from direct revenue gains to the more indirect revenue gains that can come from optimization and proactive issue resolution.

I will be speaking, from an application operations engineer's perspective, on the benefits to the business of using Predictive Analytics to optimize applications.

I will summarize the stages of analytics maturity that lead an organization from traditional reporting (descriptive analytics: hindsight), through predictive analytics (foresight), and into prescriptive analytics (insight). The benefits of big data (especially high-variety data) will be demonstrated with simple examples that can be applied to significant use cases.

The goal of data science in this case is to discover predictive power and prescriptive power from your data collections, in order to achieve optimal decisions and outcomes.

NoSQL databases like Cassandra and Couchbase are quickly becoming key components of the modern IT infrastructure. But this modernization creates new challenges, especially for storage in the broad sense. In-memory databases perform well when there is enough memory available. However, when data sets get too large and they need to access storage, application performance degrades dramatically. Moreover, even if enough memory is available, persistent client requests can bring the servers to their knees.

Join Storage Switzerland and Plexistor to learn:

1. What Cassandra and Couchbase are
2. Why organizations are adopting them
3. What storage challenges they create
4. How organizations attempt to work around these challenges
5. How to design a solution to these challenges instead of a workaround

Watch this webinar to learn about Big-Data-as-a-Service from experts at Dell and BlueData.

Enterprises have been using both Big Data and Cloud Computing technologies for years. Until recently, the two have not been combined.

Now the agility and efficiency benefits of self-service elastic infrastructure are being extended to big data initiatives – whether on-premises or in the public cloud.

In this webinar, you’ll learn about:

- The benefits of Big-Data-as-a-Service – including agility, cost savings, and separation of compute from storage
- Innovations that enable an on-demand cloud operating model for on-premises Hadoop and Spark deployments
- The use of container technology to deliver equivalent performance to bare-metal for Big Data workloads
- Tradeoffs, requirements, and key considerations for Big-Data-as-a-Service in the enterprise