Avoid Re-Inventing the Wheel When Seeking Big Data Bliss

About 2/3 of all organizations are investing in Big Data and Hadoop but less than 10% have production applications. Given the broad acknowledgement that Big Data is a huge competitive differentiator, it’s likely that the companies that have not yet deployed Big Data applications are struggling to get their arms around this new technology.

Join this webinar to hear Michael Cote of 451 Group and Joe Goldberg from BMC Software discuss the state of this exciting technology and the challenges companies like yours are facing. They will also show how enterprise management solutions can accelerate Big Data development and implementation projects, for example by building Hadoop environments through self-service provisioning and by using enterprise workload automation to reduce the scripting required to create fully operational applications.

This talk tells the story of the implementation and optimization of a sparse logistic regression algorithm in Spark. I would like to share the lessons I learned and the steps I had to take to improve the speed of execution and convergence of my initial naive implementation. The aim isn’t to convince the audience that logistic regression is great and my implementation is awesome; rather, the talk will give details about how it works under the hood, along with general tips for implementing an iterative parallel machine learning algorithm in Spark.

The talk is structured as a sequence of “lessons learned” that are shown in the form of code examples building on the initial naive implementation. The performance impact of each “lesson” on execution time and speed of convergence is measured on benchmark datasets.

You will see how to formulate logistic regression in a parallel setting, how to avoid data shuffles, when to use a custom partitioner, how to use the ‘aggregate’ and ‘treeAggregate’ functions, how momentum can accelerate the convergence of gradient descent, and much more. I will assume a basic understanding of machine learning and some prior knowledge of Spark. The code examples are written in Scala, and the code will be made available for each step in the walkthrough.
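To make the parallel formulation above more concrete, here is a minimal sketch in plain Python. It is not the speaker's Scala code and does not use Spark. It only imitates the shape of an `aggregate`-style computation: a per-partition fold (`seq_op`) followed by a merge of partial results (`comb_op`), with a momentum term in the update. The toy data, the function names and the hyperparameters are all assumptions made for illustration.

```python
import math
from functools import reduce


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(features, weights):
    # Sparse dot product: `features` maps feature index -> value.
    return sigmoid(sum(weights[i] * v for i, v in features.items()))


def seq_op(acc, example, weights):
    # Fold one (sparse_features, label) example into a partition-local gradient.
    features, label = example
    error = predict(features, weights) - label
    for i, v in features.items():
        acc[i] += error * v
    return acc


def comb_op(a, b):
    # Merge two partial gradients coming from different partitions.
    return [x + y for x, y in zip(a, b)]


def aggregate_gradient(partitions, weights):
    # Each partition is folded independently; only the small gradient
    # vectors are combined, so no training example has to move.
    partials = []
    for part in partitions:
        acc = [0.0] * len(weights)
        for example in part:
            acc = seq_op(acc, example, weights)
        partials.append(acc)
    return reduce(comb_op, partials, [0.0] * len(weights))


def train(partitions, dim, steps=200, lr=0.5, momentum=0.9):
    n = sum(len(p) for p in partitions)
    weights = [0.0] * dim
    velocity = [0.0] * dim
    for _ in range(steps):
        grad = aggregate_gradient(partitions, weights)
        # Momentum: keep a decaying running direction instead of the raw gradient.
        velocity = [momentum * v - lr * g / n for v, g in zip(velocity, grad)]
        weights = [w + v for w, v in zip(weights, velocity)]
    return weights


# Hypothetical toy data: index 0 is a bias term, index 1 marks positive
# examples and index 2 marks negative ones. Two lists stand in for two partitions.
partitions = [
    [({0: 1.0, 1: 1.0}, 1), ({0: 1.0, 2: 1.0}, 0)],
    [({0: 1.0, 1: 1.0}, 1), ({0: 1.0, 2: 1.0}, 0)],
]
weights = train(partitions, dim=3)
positive = predict({0: 1.0, 1: 1.0}, weights)
negative = predict({0: 1.0, 2: 1.0}, weights)
print(f"positive example: {positive:.3f}, negative example: {negative:.3f}")
```

In Spark the per-partition folds would run on the executors and only the merged gradient would travel to the driver. A tree-shaped merge, as the name `treeAggregate` suggests, would combine partial results in several levels rather than all at once.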

Lorand is a data scientist working on risk management and fraud prevention for the payment processing system of Zalando, the leading fashion platform in Europe. Previously, Lorand developed highly scalable low-latency machine learning algorithms for real-time bidding in online advertising.

You can do a lot with a Raspberry and ASF projects, from a tiny object
connected to the internet to a small server application. The presentation
will explain and demo the following:

- Raspberry as small server and captive portal using httpd/tomcat.
- Raspberry as an IoT sensor collecting data and sending it to ActiveMQ.
- Raspberry as a Modbus supervisor controlling an Industruino
(Industrial Arduino) and connected to ActiveMQ.

The 10x growth of transaction volumes, 50x growth in data volumes and drive for real-time visibility and responsiveness over the last decade have pushed traditional technologies, including databases, beyond their limits. Your choices are either to buy expensive hardware to accelerate the wrong architecture, or to do what other companies have started to do and invest in technologies being used for modern hybrid transactional analytical applications (HTAP).

Learn some of the current best practices in building HTAP applications, and the differences between two of the more common technologies companies use: Apache® Cassandra™ and Apache® Ignite™. This session will cover:

- The requirements for real-time, high volume HTAP applications
- Architectural best practices, including how in-memory computing fits in and has eliminated tradeoffs between consistency, speed and scale
- A detailed comparison of Apache Ignite and GridGain® for HTAP applications

About the speaker: Denis Magda is the Director of Product Management at GridGain Systems, and Vice President of the Apache Ignite PMC. He is an expert in distributed systems and platforms who actively contributes to Apache Ignite and helps companies and individuals deploy it for mission-critical applications. You can be sure to come across Denis at conferences, workshops and other events, sharing his knowledge about use cases, best practices, and implementation tips and tricks on how to build efficient applications with in-memory data grids, distributed databases and in-memory computing platforms including Apache Ignite and GridGain.

Before joining GridGain and becoming a part of the Apache Ignite community, Denis worked for Oracle, where he led the Java ME Embedded Porting Team, helping bring Java to IoT.

Attend this session to learn how to easily share state in-memory across multiple Spark jobs, either within the same application or between different Spark applications, using an implementation of the Spark RDD abstraction provided in Apache Ignite. During the talk, attendees will learn in detail how IgniteRDD – an implementation of native Spark RDD and DataFrame APIs – shares the state of the RDD across other Spark jobs, applications and workers. Examples will show how IgniteRDD, with its advanced in-memory indexing capabilities, allows execution of SQL queries many times faster than native Spark RDDs or DataFrames.

Akmal Chaudhri has over 25 years' experience in IT and has previously held roles as a developer, consultant, product strategist and technical trainer. He has worked for several blue-chip companies such as Reuters and IBM, and also the Big Data startups Hortonworks (Hadoop) and DataStax (Cassandra NoSQL Database). He holds a BSc (1st Class Hons.) in Computing and Information Systems, an MSc in Business Systems Analysis and Design and a PhD in Computer Science. He is a Member of the British Computer Society (MBCS) and a Chartered IT Professional (CITP).

When monitoring an increasing number of machines, the infrastructure and tools need to be rethought. A new tool, ExDeMon, for detecting anomalies and raising actions, has been developed to perform well on this growing infrastructure. Considerations from its development and implementation will be shared.

Daniel has been working at CERN for more than 3 years as a Big Data developer, implementing different tools for monitoring the computing infrastructure in the organisation.

As data analytics becomes more embedded within organizations as an enterprise business practice, the methods and principles of agile processes must also be employed.

Agile includes DataOps, which refers to the tight coupling of data science model-building and model deployment. Agile can also refer to the rapid integration of new data sets into your big data environment for "zero-day" discovery, insights, and actionable intelligence.

The Data Lake is an advantageous approach to implementing an agile data environment, primarily because of its focus on "schema-on-read", thereby skipping the laborious, time-consuming, and fragile process of database modeling, refactoring, and re-indexing every time a new data set is ingested.
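As a rough illustration of "schema-on-read", the sketch below keeps raw records exactly as they arrive and applies a schema only when a reader asks for the data. It uses plain Python and the standard `json` module. The event fields, `PAYMENTS_SCHEMA` and `read_with_schema` are hypothetical names invented for this example, not part of any particular data lake product.

```python
import json

# Raw events are stored as produced; nothing validates them on write.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.90", "ts": "2018-01-05"}',
    '{"user": "b2", "amount": 5, "country": "DE"}',  # a new field shows up
    '{"user": "c3"}',                                # a field is missing
]

# The reader chooses the schema: field name -> (converter, default value).
PAYMENTS_SCHEMA = {
    "user": (str, None),
    "amount": (float, 0.0),
}


def read_with_schema(raw_lines, schema):
    # Parse each raw line and project it onto the reader's schema.
    # Unknown fields are ignored and missing fields get the default.
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else default
            for field, (convert, default) in schema.items()
        }


rows = list(read_with_schema(RAW_EVENTS, PAYMENTS_SCHEMA))
total = sum(row["amount"] for row in rows)
print(rows)
print(f"total amount: {total:.2f}")
```

Ingesting the second and third events required no table change. A different consumer could read the same raw lines with another schema, for example one that includes `country`.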

With new technologies such as Hive LLAP or Spark SQL, do you still need a data warehouse or can you just put everything in a data lake and report off of that? No! In the presentation, James will discuss why you still need a relational data warehouse and how to use a data lake and an RDBMS data warehouse to get the best of both worlds.

James will go into detail on the characteristics of a data lake and its benefits and why you still need data governance tasks in a data lake. He'll also discuss using Hadoop as the data lake, data virtualization, and the need for OLAP in a big data solution, and he will put it all together by showing common big data architectures.

Watson is a computer system capable of answering questions posed in natural language. Watson was named after IBM's first CEO, Thomas J. Watson. The computer system was specifically developed to answer questions on the quiz show Jeopardy! (where it beat its human competitors) and was then used in commercial applications, the first of which was helping with lung cancer treatment.

NetApp is now using IBM Watson in Elio, a virtual support assistant that responds to queries in natural language. Elio is built using Watson’s cognitive computing capabilities. These enable Elio to analyze unstructured data by using natural language processing to understand grammar and context, understand complex questions, and evaluate all possible meanings to determine what is being asked. Elio then reasons and identifies the best answers to questions with help from experts who monitor the quality of answers and continue to train Elio on more subjects.

Elio and Watson represent an innovative and novel use of large quantities of unstructured data to help solve problems, on average, four times faster than traditional methods. Join us at this webcast, where we’ll discuss:

Chris Trimper will show you how to leverage Splunk for storing and analyzing performance data from LoadRunner. This allows for easy trending and possible correlation with other application metrics stored in Splunk. We will also look at building self-service dashboards showing application performance metrics.

In the enterprise, block storage typically handles the most critical applications such as database, ERP, product development, and tier-1 virtualization. The dominant connectivity option for this has long been Fibre Channel SAN (FC-SAN), but recently many customers and block storage vendors have turned to iSCSI instead. FC-SAN is known for its reliability, lossless nature, 2x FC speed bumps, and carefully tested interoperability between vendors. iSCSI is known for running on ubiquitous Ethernet networks, 10x Ethernet speed bumps, and supporting commodity networking hardware from many vendors.

As the storage world moves to more flash and other non-volatile memory, more cloud, and more virtualization (or more containers), it’s time to revisit one of the great IT debates: Should you deploy Fibre Channel or iSCSI? Attend this SNIA Ethernet Storage Forum webcast to learn:
•Will Fibre Channel or iSCSI deliver faster performance? Does it depend on the workload?
•How is the wire speed race going between FC and iSCSI? Does anyone actually run iSCSI on 100GbE? When will 128Gb Fibre Channel arrive?
•Do Linux, Windows, or hypervisors have a preference?
•Is one really easier [to install/manage] than the other, or are they just different?
•How does the new NVMe over Fabrics protocol affect this debate?

Join SNIA experts as they compare FC vs. iSCSI and argue in an energetic yet friendly way about their differences and the merits of each.

After you watch the webcast, check out the Q&A blog: http://sniaesfblog.org/?p=680

IT is a key player in the digital and cognitive transformation of business processes, delivering solutions for improved business value with analytics. This session will explain, step by step, the journey to secure production while adopting new analytics technologies leveraging mainframe core business assets.

Unlocking the data’s true value is a challenge, but there is a range of tools and techniques that can help. This live discussion will focus on the data analytics landscape, compliance considerations, and opportunities for improving data utility in 2018 and beyond.

In a recent survey of enterprise hybrid cloud users, the Evaluator Group saw that nearly 60% of respondents indicated that lack of interoperability is a significant technology-related issue that they must overcome in order to move forward. In fact, lack of interoperability was chosen above public cloud security and network security as a significant inhibitor. This webcast looks at enterprise hybrid cloud objectives and barriers, with a focus on cloud interoperability within the storage domain and the SNIA’s Cloud Storage Initiative to promote interoperability and portability of data stored in the cloud.

Researchers generate huge amounts of valuable unstructured data and articles from research every day. The potential for this information is huge: cancer and pharmaceutical breakthroughs, advances in technology and cultural research that can improve the world we live in.

This webinar discusses how text mining and Machine Learning can be used to make connections across this broad range of files and help drive innovation and research. We discuss using Kubernetes microservices to analyse the data and then applying Machine Learning and graph databases to simplify the reuse of the data.

Public, private and hybrid cloud are nothing new, but protecting sensitive data stored on these servers is still of the utmost concern. The NSA is no exception.

It was recently made public that the contents of a highly sensitive hard drive belonging to the NSA (National Security Agency) were compromised. The virtual disk containing the sensitive data came from an Army Intelligence project and was left on a public AWS (Amazon Web Services) storage server without password protection.

At least 5 other leaks of NSA-related data have occurred in recent years, not to mention the significant number of breaches and hacks we’ve experienced lately, including Yahoo!, Equifax, WannaCry, Petya, and more.

The culprit in this case? Unprotected storage buckets. They have played a part in multiple other recent exposures, and concern is on the rise. When it comes to storing data on public cloud servers like AWS, Azure, Google Cloud, Rackspace and more, what are the key responsibilities of Storage Architects and Engineers, CIOs and CTOs to avoid these types of data leaks?

Tune in with Chris Vickery, Director of Cyber Risk Research at UpGuard and the one who discovered the leak, along with George Crump, Chief Steward, Storage Switzerland, David Linthicum, Cloud Computing Visionary, Author & Speaker, Charles Goldberg, Sr. Director of Product Marketing, Thales e-Security, and Mark Carlson, Co-Chair, SNIA Technical Council & Cloud Storage Initiative, for a live panel discussion on this ever-important topic.

The new business reality of GDPR and how you use customer data is inexorably approaching. If you work in the EU, or do business with anyone there, you must deal with this regulation.

With data protection, there are really only two options: protecting data through ever-greater centralization and security, or turning the customer data paradigm on its head and decentralizing the data.

We have a new model: give your customers full control over their data, gain their trust, and lower your costs with the open-source Pillar Business Wallet. Join our conversation Thursday, 30th of November.

The data economy and digital technologies are deeply transforming almost all areas of our lives. Among the most heavily transformed are insurance and healthcare, where a number of really interesting developments may redefine the way we take care of ourselves and the way we consume and use insurance.

This webinar will offer a fascinating exploration of the opportunities, risks and new models supporting the digital transformation in insurance and healthcare. Topics range from harnessing the power of data to better help mental health patients, carers and medical personnel with their treatments, to assessing the risk of developing a broad range of illnesses and engaging with users to propose personalised healthy life plans, to using big data and analytics to track down and prepare for epidemics, to using data to better insure cars and drivers, and finally to using social media data to help insurers better engage with customers.

We’re all accustomed to transferring money from one bank account to another; a debit to the payer becomes a credit to the payee. But that model uses a specific set of sophisticated techniques to accomplish what appears to be a simple transaction. We’re also aware of how today we can order goods online, or reserve an airline seat over the Internet. Or even simpler, we can update a photograph on Facebook. Can these applications use the same models, or are new techniques required?

One of the more important concepts in storage is the notion of transactions, which are used in databases, financials, and other mission critical workloads. However, in the age of cloud and distributed systems, we need to update our thinking about what constitutes a transaction. We need to understand how new theories and techniques allow us to undertake transactional work in the face of unreliable and physically dispersed systems. It’s a topic full of interesting concepts (and lots of acronyms!). In this webcast, we’ll provide a brief tour of traditional transactional systems and their use of storage, we’ll explain new application techniques and transaction models, and we’ll discuss what storage systems need to look like to support these new advances.

And yes, we’ll explain all the acronyms and nomenclature too.

You will learn:

•A brief history of transactional systems from banking to Facebook
•How the Internet and distributed systems have changed how we view transactions
•An explanation of the terminology, from ACID to CAP and beyond
•How applications, networks & particularly storage have changed to meet these demands
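
As a small illustration of the traditional, single-system end of this spectrum, the sketch below shows the atomicity part of ACID using Python's standard `sqlite3` module. The `accounts` table, the names and the balances are hypothetical. SQLite is chosen here only because it runs in memory without setup, and a distributed system needs considerably more machinery than this.

```python
import sqlite3


def transfer(conn, payer, payee, amount):
    # Credit the payee, then debit the payer, as one atomic unit.
    # Returns True if the transaction committed, False if it was rolled back.
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, payee),
            )
            # If the payer cannot cover the amount, the CHECK constraint
            # fails here and the credit above is undone by the rollback.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, payer),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()

ok_small = transfer(conn, "alice", "bob", 30)   # affordable transfer
after_small = balances(conn)
ok_large = transfer(conn, "alice", "bob", 500)  # would overdraw alice
after_large = balances(conn)
print(ok_small, after_small)
print(ok_large, after_large)
```

The second transfer fails after the payee has already been credited inside the transaction, yet the stored balances are unchanged afterwards. Either both updates take effect or neither does.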

Public cloud deployments have become irresistible in terms of flexibility, low barriers to entry, security, and developer friendliness. But the sheer inertia of traditional data lakes makes them difficult to transition to the cloud. In this talk we'll look at examples of how leading companies have made the transition using open source technologies and hybrid strategies.

Instead of following a "lift and shift" strategy for moving data lake workloads to the cloud, teams should weigh new considerations unique to the cloud alongside traditional approaches to compute (e.g., GPU, FPGA), storage (object store vs. file store), integrations, and security.

Viewers will take away techniques they can immediately apply to their own projects.