The Download: Tech Talks by the HPCC Systems Community, Episode 9

Join us November 16 for another episode of The Download: HPCC Systems Community Tech Talks!

This series of workshops is specifically designed for the community by the community with the goal to share knowledge, spark innovation, and further build and link the relationships within our HPCC Systems community.

Featured speakers and topics include:

Robert Pelley, Architect for UK and Ireland, LexisNexis Risk Solutions - Integrating REDIS with HPCC Systems in high volume UK infrastructure.
REDIS (REmote DIctionary Server), an in-memory caching technology, will be integrated into the LexisNexis application stack to provide an additional cache at the entry point of the application infrastructure. This will serve as a cache of Product Responses, whereas existing caches will continue to serve Vendor Responses. The aim of the REDIS front-end Product Response cache is to improve system throughput and response times.
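
As a rough illustration of the layering described above, here is a minimal Python sketch. It is not the LexisNexis implementation: plain dictionaries stand in for REDIS and for the existing vendor caches, and the names `TwoTierCache`, `call_vendor`, and `product_response` are hypothetical.

```python
from typing import Callable, Dict


class TwoTierCache:
    """Front cache of product responses layered over a cache of vendor responses."""

    def __init__(self, call_vendor: Callable[[str], str]) -> None:
        # Assumption: a dict stands in for the REDIS front-end cache.
        self.product_cache: Dict[str, str] = {}
        # Assumption: a dict stands in for an existing vendor response cache.
        self.vendor_cache: Dict[str, str] = {}
        self.call_vendor = call_vendor
        self.vendor_calls = 0  # counts real (uncached) vendor requests

    def _vendor_response(self, key: str) -> str:
        # Only call the vendor when the vendor cache has no entry.
        if key not in self.vendor_cache:
            self.vendor_calls += 1
            self.vendor_cache[key] = self.call_vendor(key)
        return self.vendor_cache[key]

    def product_response(self, key: str) -> str:
        # Entry point: a hit here skips all downstream work.
        if key in self.product_cache:
            return self.product_cache[key]
        # Miss: build the product response from the vendor response, then store it.
        response = "product(" + self._vendor_response(key) + ")"
        self.product_cache[key] = response
        return response
```

In this sketch a repeated request for the same key is answered from the front cache without touching the vendor layer, which is where the throughput and response-time gain would come from.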

Bob Foreman, Senior Software Engineer, LexisNexis Risk Solutions – ECL Tips: The Bright Green Data Generation Machine (DataGen)
Bob will talk about the ECL code generator DataGen, share best practices and tips for using it to generate random data, and walk through a short demo.

Join us as we continue this series of webinars specifically designed for the community by the community with the goal to share knowledge, spark innovation, and further build and link the relationships within our HPCC Systems community.

Featured speakers include:

Adwait Joshi, CEO, DataSeers - HPCC Systems - An IoT use case for Payments
Traditionally, we have all used Thor for data processing and ROXIE indexes for data pulls. Think about using ROXIE for data ingest and Thor directly pulling data into the back-end repository. This talk will explain how DataSeers designed a real-time transaction monitoring system using HPCC Systems, Kafka, ElasticSearch and MySQL, pushing the envelope for a typical use case. Learn the roadblocks we encountered, how we worked around them, and how we hardened the system to be truly disaster resistant with all open source technologies.

Yanrui Ma, Software Architect, LexisNexis Risk Solutions - Dynamic ESDL Has Become More Dynamic In 7.0
In this talk, Yanrui will cover some of the major changes to Dynamic ESDL in 7.0, with a focus on the mechanisms and enhancements that have made it even more dynamic. He’ll give a demo of creating a DESDL service with the improved “esdl” command line to show you how easy and quick it can be. He’ll also go over DESDL-related ECL Watch changes in 7.0, and some of the upcoming DESDL features.

Bob Foreman, Senior Software Engineer, HPCC Systems, LexisNexis Risk Solutions - ECL Tip: All About the ECL SET
This month’s ECL Tip spotlights the ECL SET definition, value type, and other supported functions that use it. Several code examples and best practices will be demonstrated.

In this session, we are highlighting some of the rock stars of the HPCC Systems Community. Today's session is 5 Questions with Richard Chapman.

Richard has been with LexisNexis Risk Solutions for more than 25 years. He is the VP of Research and Development and the leader of the HPCC Systems development team. Richard wrote the code to create the HPCC Systems query cluster, also known as ROXIE, which stands for Richard’s Online XML Inquiry Engine. He was one of the original designers of ECL, which was created as a data-centric programming language for easily expressing problems involving large quantities of data.

Join us as we continue this series of webinars specifically designed for the community by the community with the goal to share knowledge, spark innovation, and further build and link the relationships within our HPCC Systems community.
Featured speakers and topics include:

•Rob Mansfield, Senior Data Scientist, Proagrica - Dapper - A bundle to make your ECL neater
Have you ever written a long PROJECT for a simple column rename and thought, this should be easier? What about nicely named output statements? Yeah, they bother me too. Oh, and DEDUP(SORT(DISTINCT()))? There is a better way! Learn how Dapper can help!

•Bob Foreman, Senior Software Engineer, HPCC Systems, LexisNexis Risk Solutions - ECL Tip: The Seven Faces (Forms) of Dr. LOOP (Function)
The LOOP function has always been a powerful, yet tough ECL function to understand and use. Bob will review and examine the upcoming major changes to this documentation and showcase new examples.

In this session, we are highlighting some of the rock stars of the HPCC Systems Community. Today's session is 5 Questions with Lili Xu.

Lili is in the final stages of completing her PhD in Computer Science. She has worked in the DICE lab, directed by Dr. Apon, in the School of Computing at Clemson University.

Lili has completed three internships with the HPCC Systems team, working on machine learning applications. Her research areas are machine learning, natural language processing and high performance computing. We are pleased that Lili has joined the team as a LexisNexis employee.

In this session, we are highlighting some of the rock stars of the HPCC Systems Community. Today's session is 5 Questions with Amy Apon, Ph.D.

Dr. Apon maintains an active research program at Clemson. Areas of research interest include cloud computing, performance modeling and analysis of parallel and distributed systems, data-intensive computing, emerging parallel architectures, and the impact of high performance computing on research competitiveness. Her research is currently supported by the National Science Foundation, the Department of Education, BMW, HPCC Systems, LexisNexis, Elsevier Scopus, RELX Group, and Amazon.

Jayashree Ukkinagatti, Rashtreeya Vidyalaya College of Engineering, India
Set up Automatic Builds for the continuous integration of ECL queries stored in Git using Jenkins

Software developers often work in isolated teams. If they need to integrate their changes with a different code base, waiting days to do so can create many merge conflicts, make bugs harder to fix, or lead to duplicated effort. In this presentation, Jayashree will describe how to set up automatic builds that integrate ECL queries stored in Git using Jenkins deployment pipeline techniques, triggered when a pull request adds or changes ECL queries stored in Git.

Nicole Navarro, New College of Florida
Measuring the geo-social distribution of Opioid Prescriptions

Drug overdose was the leading cause of accidental death in the US in 2015, and the number of drug overdoses involving opioids in 2016 was 42,249 – an increase of 18% per year since 2014. In this talk, Nicole will explain how she utilized the open source HPCC Systems capabilities around knowledge engineering to create data features and interactive visualizations. These were designed to allow research into Drug Socialization across social groups and geographical regions with a focus on opioid prescription rates.

In this session, we are highlighting some of the rock stars of the HPCC Systems Community. Today's session is 5 Questions with David Dasher.

David Dasher is the Chief Technology Officer and Founder of CPL Online, the leading provider of e-Learning and digital services to the UK’s hospitality sector, which since 2018 has been part of CGA Group.

With over 25 years’ experience within the IT sector, he has worked extensively in the UK’s corporate sector developing database, marketing, and management solutions. Under David’s leadership, CPL Online has established itself as a market leader and enjoyed several years of strong year on year growth.

Farah Al Shanik, Clemson University - Equivalence Terms for Text Search Bundle
Text Search Bundle (TSB) is an open source project for searching XML text documents. It contains many subtasks, one being equivalence terms, which can be considered strong synonyms for TSB. There are several kinds of term equivalence: initialisms, abbreviations, synonyms, and similarity based on context. We used HPCC Systems to develop a text search tool that uses the Moby thesaurus to return a set of synonyms and the word2vec algorithm to return similar words. We then built a dataset of state names and their abbreviations to return the set of related documents, and improved initialism handling so that TSB finds strings with or without punctuation.

Soukaina Filali, Georgia State University - Fraud Detection on Transactional Data using a Time Series Mining Approach
The project consists of distinguishing fraudulent pre-paid cards from non-fraudulent ones using patterns mined from their historical bank transaction data. There are numerous types of card programs, each of which comes with a different fraud risk level. Every fraud category has representative patterns that a human manually monitors on a daily basis. The goal here is to combine features engineered by domain experts with time series shapelet mining techniques to provide an automated fraud detection solution, which can potentially help in early fraud detection.

Lili Xu, Clemson University & Gus Reyna, LexisNexis - Using HPCC Systems ML to Map Thousands of Public Records Data Descriptions to Standard Codes
Incorporating public records data into business processes is a challenge, given that states describe similar events in disparate ways and that standards giving those descriptions a consistent meaning must be found. This session tells the story of how HPCC Systems ML addressed the problem of mapping thousands of disparate public record data descriptions to a corresponding set of standard codes.

In this session, we are highlighting some of the rock stars of the HPCC Systems Community. Today's session is 5 Questions with Itauma Itauma.

Itauma Itauma is a doctoral candidate at Keiser University and a computer science instructor at Wayne State University. His interests lie in learning analytics and utilizing HPCC Systems for educational research. He has an undergraduate degree in Electrical Engineering from the University of Ilorin and two master's degrees: a Master of Science in Computer Engineering from Istanbul Technical University, majoring in human-robot interaction, and a Master of Science in Computer Science from Wayne State University, where his thesis was based on leveraging HPCC Systems for Big Data analytics.

Robert will cover what he implemented during his summer internship. Combining HPCC Systems and Google’s TensorFlow, Robert created a parallel stochastic gradient descent algorithm to provide a basis for future deep neural network research and to enhance the distributed neural network training capabilities of HPCC Systems.

Aramis Tanelus, programmer and senior at American Heritage High School where he is the lead programmer for the Advanced Robotics Team - Developing HPCC Systems Data Ingestion APIs for Common Robotic Sensors.

Aramis’s project will make it easy for anyone in robotics around the world to ingest data from common robotic sensors into an HPCC Systems platform for use in data analysis. Aramis will be speaking about his work on the autonomous agricultural robot and on implementing new packages for the Robot Operating System to interface with HPCC Systems for big data analysis.

The built-in "Message Passing" library in HPCC Systems is designed to handle these communications among dissimilar components and perform non-trivial communication patterns among them. Saminda will explore how this library currently operates and how we can introduce a different implementation such as an existing popular library called MPI.

Join us as we continue this series of webinars specifically designed for the community by the community with the goal to share knowledge, spark innovation and further build and link the relationships within our HPCC Systems community. This episode will feature three speakers on the following topics:

Jingqing Zhang, Imperial College London
Deep Sequence Learning and Text Classification

Bob Foreman, LexisNexis Risk Solutions
ECL Summer Code Camp Review
On May 16th, five HPCC Systems Ambassadors along with Flavio Villanustre met with eight iRISE2 members for a two-hour ECL Code Camp. The event was a great success, and I thought I’d share with the community what we did and some of the ECL ideas that came out of it. Tips from Data Ingestion to ECL to Data Evaluation will be included in this segment.

Join Anirudh Shah, Founder & CEO, 3LOQ Labs, and Flavio Villanustre, VP Technology, HPCC Systems, to learn how 3LOQ is solving the problem of customer churn with open source big data and machine learning technology. 3LOQ addresses this challenge by deploying proprietary machine learning algorithms to analyze billions of data points and map out dynamic feature recommendations to reinforce repeated usage of a product. The end result? Reduced churn with high customer engagement for businesses.

3LOQ recently partnered with a leading Indian banking institution to increase adoption of their digital channels. The project yielded impressive results for the client, including a:

· 45% reduction in customer churn
· 145% increase in digital banking transactions
· 75% increase in users who made four or more transactions per month

In this webcast, Flavio will give an overview of one of the key tech tools that contributes to 3LOQ's success, the completely free, open source HPCC Systems big data platform. Anirudh will share how 3LOQ Labs leverages this platform to:

• Analyze four terabytes of data combined with built-in analytics libraries to create personalized recommendations
• Utilize efficient coding in an implicitly parallel platform that allows prototypes to be developed and iterated quickly
• Enable horizontal scaling on commodity hardware, with the flexibility to deploy both on premises and in the cloud

Join us as we continue this series of webinars specifically designed for the community by the community with the goal to share knowledge, spark innovation and further build and link the relationships within our HPCC Systems community. This episode will feature three speakers on the following topics:

Tai Donovan, Robotics Director, American Heritage School - High School Autonomous Agricultural Project
A group of 5-6 students are working on an autonomous agricultural project with the goal of providing time sensitive data to the owner-operator/farmer/grower of a production farm. Tai will discuss their challenges and how he is using HPCC Systems.

Lorraine Chapman, Consulting Business Analyst, LexisNexis Risk Solutions - Meet Our Summer Interns
By the end of 2018, ten students will have completed projects as part of the HPCC Systems intern program. Find out about these students, including where and what they are studying, the projects they will be working on and the intern experience we provide to help them feel part of the team. Lorraine will also speak about how you can get involved with the program by being a mentor, or contributing a project idea for a new feature or enhancement to the HPCC Systems platform and/or Machine Learning Library.

Richard Taylor, Chief Trainer, HPCC Systems, LexisNexis Risk Solutions – Current/Longest Event Sequence by Month
Richard will discuss processing event dates to discover for each event within a given time frame: the current number of sequential months the event occurred, and the longest contiguous month-by-month sequence. This topic is based on questions from one of our Statistical Modelers (new to ECL) regarding how to approach the problem in a non-procedural manner. The example code will make use of the GROUP and HAVING functions.
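
To make the two quantities concrete, here is a minimal Python sketch of the calculation for a single event. It is a procedural illustration only, not the ECL approach from the talk (which uses GROUP and HAVING). The function name `month_runs` is hypothetical, and the sketch assumes that the "current" sequence means the run of consecutive months ending at the most recent month in which the event occurred.

```python
from datetime import date
from typing import Iterable, Tuple


def month_runs(event_dates: Iterable[date]) -> Tuple[int, int]:
    """Return (current_run, longest_run), both measured in months."""
    # Collapse dates to distinct month indexes so that several events in
    # one month count once and December is followed directly by January.
    months = sorted({d.year * 12 + d.month - 1 for d in event_dates})
    if not months:
        return (0, 0)

    longest = run = 1
    for prev, cur in zip(months, months[1:]):
        # Extend the run when months are adjacent, otherwise start over.
        run = run + 1 if cur == prev + 1 else 1
        longest = max(longest, run)

    # After the loop, run is the streak that ends at the latest month seen.
    return (run, longest)


dates = [
    date(2018, 1, 5), date(2018, 1, 20), date(2018, 2, 1),
    date(2018, 3, 9), date(2018, 5, 2), date(2018, 6, 30),
]
current, longest = month_runs(dates)
```

With these sample dates the event occurs in January through March and again in May and June, so the longest sequence is three months and the current one is two.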

Join us as we continue this series of webinars specifically designed for the community by the community with the goal to share knowledge, spark innovation and further build and link the relationships within our HPCC Systems community.

Over 4,000 U.S. workers die on the job every year. While new wearable technologies are aggressively entering consumer applications, industrial safety equipment has not seen a fundamental innovation in the last decade.

Join us to learn how Guardhat CTO Anupam Sengupta and Guardhat use open source big data technology to address this issue with its “smart hard hat ecosystem”, an industrial wearable that uses IoT and wireless communications systems to protect and empower industrial workers.

In this webcast, Flavio will give an overview of the completely free, open source HPCC Systems big data platform.

Anupam will share how Guardhat leveraged this platform to:
• Allow real-time complex event processing of vast amounts of streaming data.
• Enable horizontal scaling on commodity hardware, with the flexibility to deploy both on premises and in the cloud.
• Support big data analytics including the ability to analyze, identify, and predict trends.
• Enable rapid green-field development.

Join us as we continue this series of webinars specifically designed for the community by the community with the goal to share knowledge, spark innovation and further build and link the relationships within our HPCC Systems community.

Episode 10 kicks off our Tech Talks for 2018 and includes 15-minute talks featuring speakers from the community:

The HPCC Systems Machine Learning Library contains a number of powerful tools, but it is important to use them properly. Chris will discuss how to ask the right questions by taking a step backwards from the methods themselves and examining the requirements defined by the applications.

The HPCC Systems platform provides everything you need to easily create production grade web services to deliver your query data. Rodrigo will discuss the tools and frameworks provided by the HPCC Systems platform and walk through the end-to-end creation of a sample web service.

Bringing heterogeneous data into a homogenous data warehouse environment is one of the most daunting aspects of any big data implementation.

Even though Apache Spark and HPCC Systems Thor can be thought of as complementary, there is interest in comparing their performance with data analytics-related benchmarks, specifically transformation, cleaning, normalization, and aggregation. Join us to hear how HPCC Systems Thor's performance compares to Apache Spark utilizing standard benchmarking methodologies.

Learn how these benchmarks and HPCC Systems can help you establish new baselines that:
•Improve the speed and accuracy of the transformation, cleaning, normalization, and aggregation processes
•Enable efficient use of developer resources and development budgets
•Facilitate the use of standard hardware, operating systems, and protocols

HPCC Systems is an open source Big Data analytics solution for businesses of all sizes, allowing them to improve critical time to results and decisions. Subscribe to our channel to keep informed of the latest HPCC Systems events.