Who This Book Is For

This book shows functional developers and analysts how to apply their existing knowledge of Haskell to high-quality data analysis. A good understanding of data sets and functional programming is assumed.

Table of Contents

Chapter 1: The Hunt for Data

Introduction

Harnessing data from various sources

Accumulating text data from a file path

Catching I/O code faults

Keeping and representing data from a CSV file

Examining a JSON file with the aeson package

Reading an XML file using the HXT package

Capturing table rows from an HTML page

Understanding how to perform HTTP GET requests

Learning how to perform HTTP POST requests

Traversing online directories for data

Using MongoDB queries in Haskell

Reading from a remote MongoDB server

Exploring data from a SQLite database

Chapter 2: Integrity and Inspection

Introduction

Trimming excess whitespace

Ignoring punctuation and specific characters

Coping with unexpected or missing input

Validating records by matching regular expressions

Lexing and parsing an e-mail address

Deduplication of nonconflicting data items

Deduplication of conflicting data items

Implementing a frequency table using Data.List

Implementing a frequency table using Data.MultiSet

Computing the Manhattan distance

Computing the Euclidean distance

Comparing scaled data using the Pearson correlation coefficient

Comparing sparse data using cosine similarity

Chapter 3: The Science of Words

Introduction

Displaying a number in another base

Reading a number from another base

Searching for a substring using Data.ByteString

Searching a string using the Boyer-Moore-Horspool algorithm

Searching a string using the Rabin-Karp algorithm

Splitting a string on lines, words, or arbitrary tokens

Finding the longest common subsequence

Computing a phonetic code

Computing the edit distance

Computing the Jaro-Winkler distance between two strings

Finding strings within one-edit distance

Fixing spelling mistakes

Chapter 4: Data Hashing

Introduction

Hashing a primitive data type

Hashing a custom data type

Running popular cryptographic hash functions

Running a cryptographic checksum on a file

Performing fast comparisons between data types

Using a high-performance hash table

Using Google's CityHash hash functions for strings

Computing a Geohash for location coordinates

Using a Bloom filter to remove unique items

Running MurmurHash, a simple but speedy hashing algorithm

Measuring image similarity with perceptual hashes

Chapter 5: The Dance with Trees

Introduction

Defining a binary tree data type

Defining a rose tree (multiway tree) data type

Traversing a tree depth-first

Traversing a tree breadth-first

Implementing a Foldable instance for a tree

Calculating the height of a tree

Implementing a binary search tree data structure

Verifying the order property of a binary search tree

Using a self-balancing tree

Implementing a min-heap data structure

Encoding a string using a Huffman tree

Decoding a Huffman code

Chapter 6: Graph Fundamentals

Introduction

Representing a graph from a list of edges

Representing a graph from an adjacency list

Conducting a topological sort on a graph

Traversing a graph depth-first

Traversing a graph breadth-first

Visualizing a graph using Graphviz

Using Directed Acyclic Word Graphs

Working with hexagonal and square grid networks

Finding maximal cliques in a graph

Determining whether any two graphs are isomorphic

Chapter 7: Statistics and Analysis

Introduction

Calculating a moving average

Calculating a moving median

Approximating a linear regression

Approximating a quadratic regression

Obtaining the covariance matrix from samples

Finding all unique pairings in a list

Using the Pearson correlation coefficient

Evaluating a Bayesian network

Creating a data structure for playing cards

Using a Markov chain to generate text

Creating n-grams from a list

Creating a neural network perceptron

Chapter 8: Clustering and Classification

Introduction

Implementing the k-means clustering algorithm

Implementing hierarchical clustering

Using a hierarchical clustering library

Finding the number of clusters

Clustering words by their lexemes

Classifying the parts of speech of words

Identifying key words in a corpus of text

Training a parts-of-speech tagger

Implementing a decision tree classifier

Implementing a k-Nearest Neighbors classifier

Visualizing points using Graphics.EasyPlot

Chapter 9: Parallel and Concurrent Design

Introduction

Using the Haskell Runtime System options

Evaluating a procedure in parallel

Controlling parallel algorithms in sequence

Forking I/O actions for concurrency

Communicating with a forked I/O action

Killing forked threads

Parallelizing pure functions using the Par monad

Mapping over a list in parallel

Accessing tuple elements in parallel

Implementing MapReduce to count word frequencies

Manipulating images in parallel using Repa

Benchmarking runtime performance in Haskell

Using the criterion package to measure performance

Benchmarking runtime performance in the terminal

Chapter 10: Real-time Data

Introduction

Streaming Twitter for real-time sentiment analysis

Reading IRC chat room messages

Responding to IRC messages

Polling a web server for latest updates

Detecting real-time file directory changes

Communicating in real time through sockets

Detecting faces and eyes through a camera stream

Streaming camera frames for template matching

Chapter 11: Visualizing Data

Introduction

Plotting a line chart using Google's Chart API

Plotting a pie chart using Google's Chart API

Plotting bar graphs using Google's Chart API

Displaying a line graph using gnuplot

Displaying a scatter plot of two-dimensional points

Interacting with points in a three-dimensional space

Visualizing a graph network

Customizing the looks of a graph network diagram

Rendering a bar graph in JavaScript using D3.js

Rendering a scatter plot in JavaScript using D3.js

Diagramming a path from a list of vectors

Chapter 12: Exporting and Presenting

Introduction

Exporting data to a CSV file

Exporting data as JSON

Using SQLite to store data

Saving data to a MongoDB database

Presenting results in an HTML web page

Creating a LaTeX table to display results

Personalizing messages using a text template

Exporting matrix values to a file

What You Will Learn

Obtain and analyze raw data from various sources including text files, CSV files, databases, and websites

Implement practical tree and graph algorithms on various datasets

Apply statistical methods such as moving average and linear regression to understand patterns

Fiddle with parallel and concurrent code to speed up and simplify time-consuming algorithms

Find clusters in data using some of the most popular machine learning algorithms

Manage results by visualizing or exporting data
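
To give a flavor of the statistical methods mentioned above, here is a minimal sketch of a moving average and a least-squares linear fit in plain Haskell. It is an illustration only, not code taken from the book; the names `movingAverage` and `linearFit` are hypothetical and chosen for this example.

```haskell
import Data.List (tails)

-- | Simple moving average over a sliding window of size n.
-- Produces one average per full window; the result is empty when
-- n <= 0 or the input has fewer than n elements.
-- (Hypothetical helper, not the book's implementation.)
movingAverage :: Int -> [Double] -> [Double]
movingAverage n xs
  | n <= 0    = []
  | otherwise = [ sum w / fromIntegral n
                | w <- map (take n) (tails xs)
                , length w == n ]

-- | Ordinary least-squares fit of y = slope * x + intercept.
-- Returns Nothing when there are fewer than two points or when all
-- x values coincide (the slope is then undefined).
-- (Hypothetical helper, not the book's implementation.)
linearFit :: [(Double, Double)] -> Maybe (Double, Double)
linearFit pts
  | n < 2 || denom == 0 = Nothing
  | otherwise           = Just (slope, intercept)
  where
    n         = fromIntegral (length pts) :: Double
    sx        = sum (map fst pts)
    sy        = sum (map snd pts)
    sxy       = sum [ x * y | (x, y) <- pts ]
    sxx       = sum [ x * x | (x, _) <- pts ]
    denom     = n * sxx - sx * sx
    slope     = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
```

For instance, `movingAverage 2 [1, 2, 3, 4]` averages the windows `[1,2]`, `[2,3]`, and `[3,4]`, while `linearFit [(0,1), (1,3), (2,5)]` recovers the line through those three collinear points.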

In Detail

This book takes you through all the steps involved in data analysis. It brings Haskell and data modeling together through carefully chosen examples that feature some of the most popular machine learning techniques.

You will begin with how to obtain and clean data from various sources. You will then learn how to use various data structures such as trees and graphs. The meat of data analysis occurs in the topics involving statistical techniques, parallelism, concurrency, and machine learning algorithms, along with various examples of visualizing and exporting results. By the end of the book, you will be empowered with techniques to maximize your potential when using Haskell for data analysis.

Authors

Nishant Shukla

Nishant Shukla is a computer scientist with a passion for mathematics. Throughout the years, he has worked for a handful of start-ups and large corporations including WillowTree Apps, Microsoft, Facebook, and Foursquare.

He first stepped into the world of Haskell as an excuse to better understand category theory, but he eventually found himself immersed in the language. His semester-long introductory Haskell course in the engineering school at the University of Virginia (http://shuklan.com/haskell) has been accessed by individuals from over 154 countries around the world, gathering over 45,000 unique visitors.

Besides Haskell, he is a proponent of a decentralized Internet and open source software. His academic research in the fields of machine learning, neural networks, and computer vision aims to supply a fundamental contribution to the world of computing.
