Ggvis

* ggvis is used “… more for data exploration than data presentation. … ggvis makes many more assumptions about what you’re trying to do: this allows it to be much more concise, at some cost of generality.”
* “Ggvis provides a tree like structure allowing properties and data to be specified once and inherited by children.”
* ggvis vs ggplot2
* Range selector for ggvis
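
As a rough illustration of the inheritance idea quoted above, the sketch below declares the data and the x/y properties once and lets two layers reuse them. The built-in mtcars dataset and the slider bounds are my own assumptions, chosen only for demonstration.

```r
library(ggvis)

# Data and the x/y properties are declared once in ggvis();
# both layers below inherit them, so neither repeats ~wt or ~mpg.
p <- mtcars %>%
  ggvis(x = ~wt, y = ~mpg) %>%
  # The point size is bound to a slider (illustrative bounds).
  layer_points(size := input_slider(10, 300, value = 50, label = "Point size")) %>%
  layer_smooths()

# Printing p in an interactive session opens the plot together with the slider.
```

The plot is only built here, not displayed, so the snippet can also be sourced in a non-interactive script.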

2. Choosing Tools for Interactivity

Shiny

Shiny turns your R session into a web server and lets you interact with your data through a browser. See the “Shiny” cheatsheet (also above).
Shiny is fine to start with; later you may want to extend it with widgets or other components that fit your needs.
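
A minimal sketch of what such an app can look like, assuming the built-in mtcars dataset and input/output names ("n", "rows") that I made up for this example:

```r
library(shiny)

# UI: one slider and one table, rendered in the browser.
ui <- fluidPage(
  sliderInput("n", "Number of rows to show",
              min = 1, max = nrow(mtcars), value = 5),
  tableOutput("rows")
)

# Server: re-renders the table whenever the slider value changes.
server <- function(input, output) {
  output$rows <- renderTable(head(mtcars, input$n))
}

app <- shinyApp(ui = ui, server = server)

# In an interactive session, runApp(app) starts the local web server.
```

The app object is only constructed here; nothing is served until it is run.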

Dashboard Theory

The essence in one quote: “The key to usability is the association between appropriate controllers and the individual meters. In a car, the controllers are the steering wheel, the gas pedal, the brake pedal, the ignition switch, and the gearshift, primarily. Generally, there are one or two controllers associated with each meter and the action of each controller is usually proportional to the metric that appears on the meter (e.g. Gas pedal and brake pedal control speed; gas pedal and gear shift control RPM, etc.). There are more controllers on a plane, but the same relationships hold between controllers and meters, at least for older planes.”

4. Managing Your Workflow

A workflow automates the repetitive operations you perform on your data. If you generate so much data that it turns into a hard-to-use pile, as happened in my case, plan ahead and look at the tools that suit your needs. I am still a long way from organizing every aspect of the project into a coherent system, but my preliminary survey of available software suggests that DAWN (see below) is the most flexible, although it also demands the most programming skill. Other tools, such as RapidMiner or Weka, can be used with the R programming environment almost out of the box.

Magrittr (R package)

This R package provides “forward-piping” operators such as %>% (see the “Data Wrangling” cheatsheet above). From the package description: “The magrittr package offers a set of operators which promote semantics that will improve your code by structuring sequences of data operations left-to-right (as opposed to from the inside and out), avoiding nested function calls, minimizing the need for local variables and function definitions, and making it easy to add steps anywhere in the sequence of operations.”
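
To make the left-to-right idea concrete, here is a small sketch comparing a nested call with its piped equivalent. The vector x is an arbitrary example of mine, not taken from the package documentation.

```r
library(magrittr)

x <- c(1, 4, 9, 16)

# Nested form: has to be read from the inside out.
nested <- round(mean(sqrt(x)), 1)

# Piped form: the same steps, read left to right.
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)
```

Adding another step to the piped version means appending one more `%>%` stage rather than wrapping the whole expression in a further call.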

Other Datamining Software (commercial and open source)

5. Data Mining/Analytics Workflow Theory

6. Useful Quotes from R-Bloggers, Mostly

An Introduction to Statistical Learning with Applications in R (free pdf)

http://www-bcf.usc.edu/~gareth/ISL/

“This book provides an introduction to statistical learning methods. It is aimed for upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.”

Elements of Statistical Learning (free pdf)

http://statweb.stanford.edu/~tibs/ElemStatLearn/download.html

“The go-to bible for this data scientist and many others is The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Each of the authors is an expert in machine learning / prediction, and in some cases invented the techniques we turn to today to make sense of big data: ensemble learning methods, penalized regression, additive models and nonparametric smoothing, and much much more.”

Why you should learn R first for data science

Data wrangling: “It’s often said that 80% of the work in data science is data manipulation. … R has some of the best data management tools you’ll find. The dplyr package in R makes data manipulation easy. … When you “chain” the basic dplyr together, you can dramatically simplify your data manipulation workflow.”
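
A short sketch of what such chaining might look like in practice. The mtcars dataset, the hp threshold, and the column names in the summary are my own choices for illustration.

```r
library(dplyr)

# Each step takes the result of the previous one:
# filter rows, group them, summarise each group, then sort.
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n()) %>%
  arrange(desc(mean_mpg))

summary_by_cyl
```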

Data visualization: “ggplot2 is one of the best data visualization tools around, as of 2015. What’s great about ggplot2 is that as you learn the syntax, you also learn how to think about data visualization. … there is a deep structure to all statistical visualizations. There is a highly structured framework for thinking about and creating all data visualizations. ggplot2 is based on that framework. By learning ggplot2, you will learn how to think about visualizing data.”
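
One way to see that framework at work is to build a plot piece by piece: data, aesthetic mappings, then layers. The sketch below assumes the built-in mtcars dataset, and the axis labels are mine.

```r
library(ggplot2)

# Data and aesthetic mappings are declared first; each "+" adds a layer
# or a labelling component on top of them.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# Printing p in an interactive session draws the plot.
```

Swapping geom_point() for another geom changes the visual form while the data and mappings stay the same, which is the structure the quote refers to.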

Machine learning: “While … most beginning data science students should wait to learn machine learning (it is much more important to learn data exploration first), machine learning is an important skill. When data exploration stops yielding insight, you need stronger tools … [and] R has some of the best tools and resources.”