Tag Archives: featured

If you are here, I assume you are banging your head against the wall trying to figure out Redux for a React project. If you are looking for a quick start for a React project that has Redux already set up, then this is a good boilerplate: http://mikechabot.github.io/react-boilerplate/.

But you are probably still a little confused about how the Redux part of that works…or at least I was. Even the demos on the official Redux site add a lot of complications to illustrate more features, so it takes a while to learn the skeleton structure of the library.

For my example, I’m going to assume a few things:

a working knowledge of React.

familiarity with the single source of truth behind the Flux model in React

a node.js / webpack build environment

familiarity with ES6 JavaScript syntax

The example is going to be an on/off switch and a “light bulb” that also has an on/off state. Here is the code and the deployed example. So there will be one action with two values. Dead. Simple. [If you need more complicated examples, please refer to some of the more involved tutorials. My example is only meant to show the relationship between the components and the parts of Redux.]

I’m going to divide this tutorial into two parts: 1.) the Redux parts and 2.) the connection into React components.

Redux

Let’s first look at the three parts of Redux:

Store — keeps the single source of truth in its state and dispatches actions

Action — the thing that’s dispatched to a reducer

Reducer — a function that takes the previous state and the dispatched action and returns a new state

Here is the diagram I use to help me think about what’s happening.

The one thing I want to draw your attention to is the dispatch() and action part. The action itself isn’t really doing anything. It’s just a JavaScript object. It is the dispatch() method within the store that actuates the entire process, not the action.

So let’s look at the example’s code.

Action

JavaScript

    export default function flipSwitch(value) {
      return {
        type: 'FLIP_SWITCH',
        value
      }
    }

There is nothing special going on here. We are not importing any dependencies. This is just a function that returns a JavaScript object like { type: 'FLIP_SWITCH', value: 'off' }. The type property is used by the reducer to determine what type of action is being dispatched. In our simple example the value will be either 'on' or 'off'.

Reducer

JavaScript

    const lightSwitch = (state = 'off', action) => {
      if (action.type === 'FLIP_SWITCH') {
        state = action.value
      }
      return state
    }

    export default lightSwitch

Once again there are no dependencies here, so there is nothing special going on here either! The reducer is just a function that takes a state and an action as parameters and then does something with those to create a new state. In our case we test the action’s type property to make sure it’s 'FLIP_SWITCH', and if that’s true we set the state to the action’s value. [Either 'on' or 'off'.]

The reducer will return a state to the store.

Store

JavaScript

    import { createStore } from 'redux'
    import reducer from './reducer'

    const initialState = 'off'

    export default createStore(
      reducer,
      initialState
    )

Alright now we have some fancy dependencies. We use the createStore() from the redux package to create our store. It takes the reducer and the initial state as parameters. Here I set the initial state to 'off' and import the reducer we just made. This is how the store is aware of the reducer.
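To make the store, action and reducer relationship concrete without any dependencies, here is a minimal sketch of the cycle in plain JavaScript. The createTinyStore function is a hypothetical, simplified stand-in for createStore() and is not how the redux package is actually implemented; it only mirrors the idea that dispatch() hands the action to the reducer and keeps whatever state comes back.

```javascript
// Simplified, hypothetical stand-in for Redux's createStore(); NOT the library's code.
function createTinyStore(reducer, initialState) {
  let state = initialState
  const listeners = []
  return {
    getState: () => state,
    // dispatch() drives the cycle: it hands the action to the reducer
    // and stores whatever state the reducer returns
    dispatch: (action) => {
      state = reducer(state, action)
      listeners.forEach((listener) => listener())
      return action
    },
    subscribe: (listener) => {
      listeners.push(listener)
    }
  }
}

// Same action creator and reducer logic as in the tutorial
const flipSwitch = (value) => ({ type: 'FLIP_SWITCH', value })

const lightSwitch = (state = 'off', action) => {
  if (action.type === 'FLIP_SWITCH') {
    return action.value
  }
  return state
}

const store = createTinyStore(lightSwitch, 'off')
store.subscribe(() => console.log('state is now', store.getState()))

store.dispatch(flipSwitch('on'))
store.dispatch(flipSwitch('off'))
```

Notice that the action objects never do anything on their own. Nothing changes until dispatch() is called with one.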

State
A word about state. In our example we are using a string as the state, but this could be a JS object instead. Most other examples will use a JS object as the state.
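As a rough sketch of what an object state could look like, here is a hypothetical variation of the light switch reducer. The power and flips fields and the lightSwitchObject name are made up for illustration and are not part of the example app.

```javascript
// Hypothetical variation: the state is an object instead of a string
const initialState = { power: 'off', flips: 0 }

const lightSwitchObject = (state = initialState, action) => {
  if (action.type === 'FLIP_SWITCH') {
    // return a new object rather than mutating the previous state
    return { ...state, power: action.value, flips: state.flips + 1 }
  }
  return state
}

// Passing undefined as the state makes the default initialState kick in
const next = lightSwitchObject(undefined, { type: 'FLIP_SWITCH', value: 'on' })
console.log(next)
```

With a state shaped like this, a mapping function would read state.power instead of using the state directly.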

React

I don’t want to get into React components and all of that stuff. The two components I made for this example are pretty simple and use props and an event listener / handler. The details of the components don’t matter all too much. The interaction of Redux and React comes from the connection and the mapping functions.

This is the entire app.jsx code. I’ll go through the important parts, not necessarily in order.

connect()

JavaScript

    const ConnectedApp = connect(
      mapStateToProps,
      mapDispatchToProps
    )(App)

Here, App is a React component. What we are doing is connecting the dispatch and state from Redux to our App component. The mapStateToProps and mapDispatchToProps are both functions.

mapStateToProps

JavaScript

    const mapStateToProps = (state) => {
      return {
        power: state
      }
    }

mapStateToProps takes the state from the store and passes it into the connected component’s props. In this case we take the state, which is a string, and pass it into App’s power prop. It returns a JavaScript object that is merged into the component’s props.

mapDispatchToProps

JavaScript

    const mapDispatchToProps = (dispatch, ownProps) => {
      return {
        onChange: (value) => {
          dispatch(flipSwitch(value))
        }
      }
    }

So we can get the state into the component, but how do we actuate change in the store? With dispatch! What we are doing here is passing a function to the component’s props that runs the dispatch() method inside of it. The function takes the on/off value and passes it into a flipSwitch action, which is dispatched to the reducer, which then updates the store’s state, which, because of the mapStateToProps function, updates the component’s power prop.
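The round trip described above can be traced in plain JavaScript without React at all. This is only a sketch: the buildProps helper, the bare dispatch function and the '@@INIT' action type are hypothetical stand-ins for what connect() and the store do, not the react-redux implementation.

```javascript
// Hypothetical stand-in for the wiring that connect() provides; not react-redux itself.
const flipSwitch = (value) => ({ type: 'FLIP_SWITCH', value })

const lightSwitch = (state = 'off', action) =>
  action.type === 'FLIP_SWITCH' ? action.value : state

const mapStateToProps = (state) => ({ power: state })

const mapDispatchToProps = (dispatch) => ({
  onChange: (value) => {
    dispatch(flipSwitch(value))
  }
})

// A bare-bones store: just the current state and a dispatch function.
// '@@INIT' is a made-up action type used only to get the default state.
let state = lightSwitch(undefined, { type: '@@INIT' })
const dispatch = (action) => {
  state = lightSwitch(state, action)
}

// Props are rebuilt from the current state, as they would be on a re-render
const buildProps = () => ({
  ...mapStateToProps(state),
  ...mapDispatchToProps(dispatch)
})

const before = buildProps()
before.onChange('on') // what the switch component would call when flipped
const after = buildProps()

console.log(before.power, '->', after.power)
```

The component never touches the store directly. It only sees a power prop and an onChange function.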

Provider

JavaScript

    ReactDOM.render(
      <Provider store={store}>
        <ConnectedApp />
      </Provider>,
      document.getElementById('light')
    )

This isn’t that interesting, but it’s necessary. To make it all work we place our ConnectedApp component inside a Provider component which deals with the store.

App

JavaScript

    class App extends React.Component {
      render() {
        return (
          <div>
            <LightBulb power={this.props.power} />
            <LightSwitch onChange={this.props.onChange} power={this.props.power} />
          </div>
        )
      }
    }

Let’s finally look at the App component. This is what’s connected to the store and the mapStateToProps and mapDispatchToProps functions. You can see the power props are passed to the LightBulb and LightSwitch components. These are just props since the mapStateToProps function is handling all of that for us.

Now the trickier part is taking the onChange function from mapDispatchToProps and placing it in the LightSwitch so it can run the function. This bit of code, onChange={this.props.onChange}, accomplishes that. When the LightSwitch changes, it passes the on/off value into the function we defined in mapDispatchToProps, which dispatches our action (with the value) to the store.

Hopefully this tutorial helped you understand how the basic setup of Redux works. Making a more complicated app will require more advanced concepts that you can find elsewhere on the internet, such as combined reducers, async thunks, more complicated states, etc.

There are three doors. And hidden behind them are two goats and a car. Your objective is to win the car. Here’s what you do:

Pick a door.

The host opens one of the doors you didn’t pick that has a goat behind it.

Now there are just two doors to choose from.

Do you stay with your original choice or switch to the other door?

What’s the probability you get the car if you stay?

What’s the probability you get the car if you switch?

It’s not a 50/50 choice. I won’t digress into the math behind it, but instead let you play with the simulator below. The game will tally up how many times you win and lose based on your choice.

What’s going on here? Marilyn vos Savant wrote the solution to this game in 1990. You can read vos Savant’s explanations and some of the ignorant responses. But in short, because the door that’s opened is not opened randomly, the host gives you additional information about the set of doors you didn’t choose. Effectively, if you switch, you are selecting all the other doors. If you choose to stay, you are selecting just one door.

In her answer, she suggests:

Here’s a good way to visualize what happened. Suppose there are a million doors, and you pick door #1. Then the host, who knows what’s behind the doors and will always avoid the one with the prize, opens them all except door #777,777. You’d switch to that door pretty fast, wouldn’t you?

To illustrate that in the simulation, you can increase the number of doors in the simulator. It becomes pretty clear that switching is the correct choice.
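If you would rather tally the results in code, here is a small sketch of the game. It is not the code behind the simulator on this page. The playGame and winRate names are made up, and the sketch assumes that with more than three doors the host opens every unpicked door except one, as in vos Savant’s million-door description.

```javascript
// Plays one game with `doors` doors; returns true if the player wins the car.
function playGame(doors, switchDoor) {
  const car = Math.floor(Math.random() * doors)
  const pick = Math.floor(Math.random() * doors)
  // The host opens every other door except one and never reveals the car.
  // So the one door left closed hides the car unless the first pick already has it.
  if (switchDoor) {
    return pick !== car
  }
  return pick === car
}

// Fraction of games won over many plays with the same strategy
function winRate(doors, switchDoor, games = 100000) {
  let wins = 0
  for (let i = 0; i < games; i++) {
    if (playGame(doors, switchDoor)) wins++
  }
  return wins / games
}

console.log('3 doors, stay:    ', winRate(3, false))
console.log('3 doors, switch:  ', winRate(3, true))
console.log('100 doors, switch:', winRate(100, true))
```

With three doors the tallies should settle near 1/3 for staying and 2/3 for switching, and switching gets closer to a sure thing as the number of doors grows.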

For a project I was working on, I needed a quick, simple solution to make a dynamic table based on data sent back from an AJAX call. I used jQuery to build and manipulate the table HTML, since it was quick to use jQuery and it’s already in my project.

After considering a few different ways to approach this, I decided different arrays would be the easiest way to handle the data. The data looks like this:

JavaScript

    var data = {
      k: ['Name', 'Occupation', 'Salary', 'Roommate'],
      v: [['Chandler', 'IT Procurement Manager', '$120,000', 'Joey'],
          ['Joey', 'Out-of-work Actor', '$50,000', 'Chandler'],
          ['Monica', 'Chef', '$80,000', 'Rachel'],
          ['Rachel', 'Assistant Buyer', '$70,000', 'Monica'],
          ['Ross', 'Dinosaurs', '$100,000', 'No Roommate']]
    }

It’s a JavaScript object with two keys: one for the header that I abbreviated k and the main data values which have the key of v. The header is just an array of strings, while the values are an array of arrays. I specifically designed this code to work within these parameters, so there could be more checks built in, but the data source is rather rigid.

To make the Table class, I defined the attributes:

JavaScript

    function Table() {
      //sets attributes
      this.header = [];
      this.data = [[]];
      this.tableClass = ''
    }

Using this prototype code is a little bit of overkill, but it can be reused and extended. I plan on having the application update with new data and possibly other features. Creating a prototype makes that a little bit easier and cleaner.

I have three setter methods, which allow the Table object to have its attributes and its data set.

JavaScript

    Table.prototype.setHeader = function (keys) {
      //sets header data
      this.header = keys
      return this
    }

    Table.prototype.setData = function (data) {
      //sets the main data
      this.data = data
      return this
    }

    Table.prototype.setTableClass = function (tableClass) {
      //sets the table class name
      this.tableClass = tableClass
      return this
    }

All the methods I’ve written have return this in them. That allows method chaining, which makes the implementation of the code a lot simpler. The meat of the code is in the build method.

JavaScript

    Table.prototype.build = function (container) {
      //default selector
      container = container || '.table-container'

      //creates table
      var table = $('<table></table>').addClass(this.tableClass)
      var tr = $('<tr></tr>') //creates row
      var th = $('<th></th>') //creates table header cells
      var td = $('<td></td>') //creates table cells
      var header = tr.clone() //creates header row

      //fills header row
      this.header.forEach(function (d) {
        header.append(th.clone().text(d))
      })

      //attaches header row
      table.append($('<thead></thead>').append(header))

      //creates table body
      var tbody = $('<tbody></tbody>')

      //fills out the table body
      this.data.forEach(function (d) {
        var row = tr.clone() //creates a row
        d.forEach(function (e, j) {
          row.append(td.clone().text(e)) //fills in the row
        })
        tbody.append(row) //puts row on the tbody
      })

      $(container).append(table.append(tbody)) //puts entire table in the container
      return this
    }

I’ve annotated most of the code, but basically this creates jQuery objects for each part of the table structure: the table (table), a row (tr), a header cell (th) and a normal table cell (td). The clone() method is necessary so that jQuery creates another HTML element. Otherwise it will keep on removing, modifying and appending the same element.

Using the prototype we just created is rather easy; we did the hard part already. We use the new keyword to instantiate a new object. This allows us to create many different independent Table objects which can be manipulated individually within the application.

JavaScript

    //creates new table object
    var table = new Table()

    //sets table data and builds it
    table
      .setHeader(data.k)
      .setData(data.v)
      .setTableClass('sean')
      .build()

Above is the short snippet of code which uses method chaining. This saves us from writing a separate line of code for each method, each of which would look like table.setData(). I used setHeader() to set the array (data.k) which populates the table’s header. The setData() method sets the array of arrays (data.v) as the source of the data for the rest of the table.

Finally, the build() method uses the data we just set to actually run the code that manipulates the HTML, and this is what you see in your web browser.

Before using it on a web page, there has to be some HTML. (And some CSS so that the table looks decent.) The most important part is that the div container has the class of "table-container". The Table class by default looks for that class to append the table to. You can customize that by passing a jQuery selection string as a parameter to the table.build([jQuery selection string]) method.

D3 visualizations work by manipulating elements in the browser window. This short tutorial will demonstrate the very basics of that. This is also a working, simple demonstration of the interplay of HTML, CSS and JavaScript from the introduction page in this D3 tutorial set.

For the sake of making this simple, everything will come from one HTML document, which can be found in my GitHub. This will contain the HTML and JavaScript. You can [and should] separate the JavaScript into its own file on bigger projects.

In this small project, we will start with a simple div container with the class of “container”.

HTML

    <div class="container"></div>

Right now this doesn’t do anything, so it’s not worth showing. But if D3 code is added to create a blue box, it might look a little more interesting. [Provided you find blue boxes interesting.]

Blue Block

HTML

    <script type="text/javascript">
      //creates the blue block
      d3.selectAll('.container')
        .append('div') //adds another div as a child of the container
        .attr('class', 'new-block') //gives the new div the class of new-block
        .style('width', '300px') //sets width
        .style('height', '100px') //sets height
        .style('background-color', '#0066cc') //sets background color
        .style('color', 'white') //sets font color
        .text('Blue block') //sets the text that the div will contain
    </script>

Above is a simple example of a basic D3 procedure. It can be helpful to think about this command as having two parts: getting a DOM [browser] element to manipulate and giving instructions for those manipulations. Here is what the code is doing:

The select statement [d3.selectAll()] code is finding every instance of the class of “container”. [There is only one element on the page.]

The append() method adds another div as a child element inside of the container div.

The attr() method gives the new div a class of “new-block”.

The style() method gives the new div several style properties

The text() method puts the text “Blue block” into the div

The style() and attr() methods can be a little confusing since they do similar things, but attr() will place attributes in the HTML tag, while style() writes in-line CSS for the element, which will override any CSS stylesheet you load in the header. These are just a few of the different methods you can use for a D3 DOM selection object, and you can find more in the D3 API reference.

Blue Block with Functions

Creating the blue block was a blast, but adding some functionality might make this a little more useful. Let’s make the block change color on a click and then remove it on a double click.

HTML

    <script type="text/javascript">
      //creates the first blue block
      d3.selectAll('.container')
        .append('div')
        .attr('class', 'new-block')
        .style('width', '300px')
        .style('height', '100px')
        .style('background-color', '#0066cc')
        .style('color', 'white')
        .style('cursor', 'pointer')
        .text('Blue block')

      //creates the second blue block
      d3.selectAll('.container')
        .append('div')
        .attr('class', 'new-block')
        .style('width', '300px')
        .style('height', '100px')
        .style('background-color', '#0066cc')
        .style('color', 'white')
        .style('cursor', 'pointer')
        .text('Blue block')

      //click
      d3.selectAll('.new-block').on('click', function () {
        //turns block to orange
        d3.select(this)
          .style('background-color', '#FF9821')
      })

      //double click
      d3.selectAll('.new-block').on('dblclick', function () {
        d3.select(this)
          .remove()
      })
    </script>

The first part of the script that creates the blue block is the same, except I’ve doubled the code to have two blue blocks and I added new code that interacts with the blue blocks. The on() method allows you to attach an event listener to elements rendered in the browser. These will wait until a certain event happens. In the example it executes a function to turn the box orange when the blue box is clicked. You can put many different instructions in here and aren’t limited to manipulating the element that is being clicked. Below is an illustration that I find useful for visualizing how event listeners are attached. I will devote another post to the details of D3 event listeners.

You might notice the d3.select(this) in the code. this is a fun keyword in JavaScript syntax which deals with scope, and the this in the code refers to the specific DOM element which was clicked or double clicked. If you click on the left block, only that block turns orange. You could change the code to replace this with '.new-block', and then clicking one block would change both blocks to orange.

Having the d3.select(this) code blocks in the event listener function makes it so they are only executed when the event happens. The block’s background color is changed to orange [#FF9821] when it’s clicked. The remove() method deletes any DOM elements within the selection including the children elements. This comes in handy when you need to rebuild or update a data visualization.


Data visualization is important, really important. I can’t be more blunt than that. We are able to process much more information, and faster, by seeing a visual representation than by looking at a table or database or interacting with a spreadsheet. I will be writing a series of posts that explore some of the foundations D3 is built on, along with how to create engaging data visualizations using it.

D3 is a powerful tool that allows you to create interactive data visualizations for the web. Understanding how D3 works starts with understanding how modern web pages are designed.

If you have found this page, you probably have at least some knowledge of how to make a modern website: HTML, CSS, JavaScript, responsive design, etc. D3 uses basic elements from these components of web design to create the visualizations. This is by no means the only way to create interactive visualizations, but it is an effective way to produce them.

Before jumping into D3 nuts and bolts, let’s look at what each of these components does. [If you already know this stuff, feel free to skip ahead…once I get the other posts built out.]

In the most simplistic terms, HTML provides the structure of the webpage, CSS provides the styling and formatting, and JavaScript provides the functionality of the site. The browser brings these three components together and interprets them into something the end user (you) can understand and use. Sometimes one component can accomplish what the other does, but if you stick to this generalization you’ll be in good shape.

To produce a professional-looking, fully-functional D3 data visualization you will need to understand, write and manipulate all three components.

HTML

The most vivid memories I have of HTML are from the websites of the late 90s: Geocities, Angelfire, etc. HTML provides instructions on how browsers should interpret information; it organizes the information. Everything you see on a webpage has corresponding HTML code.

If you look at the source HTML or inspect one of this site’s pages, you’ll see some of the structure. When HTML renders in the browser, these elements are referred to as DOM elements. DOM stands for Document Object Model, which is the structure of a webpage.

Looking at the DOM tree, you can see many of the div containers that provide structure for how the site is laid out. The p tags contain each paragraph in the content of my posts. h1, h2 and h3 are subheadings I’ve made to make the post more organized. You will also notice some attributes, especially class, which have many uses for CSS, JavaScript and D3. Classes in particular are used to identify what function a DOM element plays in JavaScript or how to style it in CSS.

CSS

A house without painted walls or decorations is pretty boring. The same thing happens with bare bones HTML. You can organize the information, but it won’t be in an appealing format.

Most sites have style sheets (CSS) which set margins, colors, display options, etc. Style sheets have a specific syntax which identifies HTML elements by type, class or id. This identification and selection concept is used extensively in D3.

Above is some CSS from this site. It contains formatting instructions for elements of the class “page-links”. It includes instructions for the font size, margins, height, width and to make the text all uppercase. The advantage of CSS is that it keeps formatting away from the structure of the HTML allowing you to format many elements at once. For example if you wanted to change the color of every link, you could easily do that by modifying the CSS.

There is an alternative to using CSS style sheets and that’s by using inline style definitions.

HTML

    <p style="text-align: center;">

Inline styles use the same markup as the CSS in the style sheets. Inline styles:

control only the element they are in

OVERRIDE any CSS styles [unless those styles are marked !important]

The code above overrides the normal paragraph style, which aligns text left. Using inline styles is generally bad for web design, but it’s important to understand how they work since D3 manipulates inline styles often.

JavaScript

JavaScript breathes life into your web page. It is certainly not the only way to make your website interactive or build programming into it, but it is widely used and supported in the popular browsers. D3 is a JavaScript library, so you will inevitably have to write JavaScript to use it.

For D3 visualization, JavaScript will be used to

Manage and manipulate data for the visualization

Create DOM elements

Manipulate DOM elements

Destroy DOM elements

Attach data to DOM elements

JavaScript will be used to insert elements onto the page, and it will also be used to change the colors and styles of those elements. You might be able to see how this could be useful. For example, JavaScript could map data points to an element’s position for a scatter plot or to an element’s height or width for a bar chart.

I bolded the last function D3 does, attaching data to elements, because it’s so critical to D3. This allows you to attach a data point beyond x, y data to allow for rich visualization.

Above is data attached to a D3 visualization I made for FanGraphs. This is a simple example, but I was able to attach data detailing the team’s name, id, league, ERA and FIP. Using the attached data I was able to create the graph and tooltips. More complex designs can take advantage of the robust data structure D3 provides.

[Next]

I’ll look at how to set up a basic project by organizing data, files and code.

I’ve been in contact with the team over at Stattleship. They have a cool API that allows you to get various stats for basketball, football and hockey. I used data from that API to create the following data visualization for their blog. The visualization shows the offensive and special teams yards gained by each team remaining in the playoffs. The yardage is totaled for the entire season as well as the one playoff game each team played. I’ve displayed the points from offensive TDs and special teams scoring, and that score is color coded by wins and losses. A black background is a win, and a white background is a loss.

The backwards K is normally used to denote a called third strike in a strikeout. It’s typically written on a scorecard. I’ve been looking for the backwards K so I can denote the strikeout looking on Twitter, and I finally found it:

ꓘ

(for unsupported browsers — Chrome)

The easiest way to use this character is to copy and paste the backwards K from above and save it in a note or something you can copy and paste from routinely. This character actually comes from Apple’s implementation of the Unicode block for the artificial, Latinized version of the Lisu alphabet. This alphabet contains an upside-down, turned K which looks similar enough to a backwards K that I think it passes on Twitter.

If you don’t see the backwards K in the block above, your computer or mobile device probably isn’t using a font that supports that specific character. It’s supported on Macs and iPhones (as well as the Edge browser in Windows 10).

Data Manipulation: Subsetting

Making a subset of a data frame in R is one of the most basic and necessary data manipulation techniques you can use in R. If you are brand new to data analysis, a data frame is the most common data storage object in R and subsets are a collection of rows from that data frame based on certain criteria.

Data Frame (columns V1 to V7)

Rows: Row1, Row2, Row3, Row4, Row5, Row6

Subset (columns V1 to V7)

Rows: Row2, Row5, Row6

The Data

For this example, I’m using data from FanGraphs. You can get the exact data set here, and it’s provided in my GitHub. This data set has player names, teams, seasons and stats. We are able to create a subset based on any one or more of these variables.

The Code

I’m going to show four different ways to subset data frames: using a boolean vector, using the which() function, using the subset() function and using the filter() function from the dplyr package. All of these are different ways to do the same thing. The dplyr package is fast and easy to code, especially when you have to loop an operation or run something repeatedly, so it is my recommended subsetting method. Let’s start with that.

dplyr

The filter() function requires the dplyr package to be loaded in your R environment. Loading it masks the filter() function from the default stats package. You don’t need to worry about that, but R does tell you about it when you load the package.

    #Finds players not in the NL East and who have more than 30 home runs.
    data.sub.5 <- filter(data, !(Team %in% NL.East), HR > 30)

The filter() function is rather simple to use. The example above shows how you specify the data frame you want to use and create true/false expressions, which filter() uses to find which rows it should keep. The output of the function is saved into a separate variable, so we can reuse the original data frame for other subsets. I put a few other examples in the code to demonstrate how it works.

Built-in Functions

R

    #method 1 -- using a T/F vector
    data.sub.1 <- data[data$Team == 'Marlins', ]

    #method 2 -- which()
    data.sub.2 <- data[which(data$Team == 'Marlins'), ]

    #method 3 -- subset()
    data.sub.3 <- subset(data, subset = (Team == 'Marlins'))

    #other comparison functions
    data.sub.4 <- data[data$HR > 30, ] #greater than
    data.sub.5 <- data[data$HR < 30, ] #less than
    data.sub.6 <- data[data$AVG > .320 & data$PA > 600, ] #dual requirements using AND (&)
    data.sub.8 <- data[data$HR > 40 | data$SB > 30, ] #dual requirements using OR (|)
    data.sub.9 <- data[data$Team %in% c('Marlins', 'Nationals', 'Mets', 'Braves', 'Phillies'), ] #finds values in a vector
    data.sub.10 <- data[data$Team != '- - -', ] #removes players who played for two teams

If you don’t want to use the dplyr package, you are able to accomplish the same thing using the basic functionality of R. #method 1 uses a boolean vector to select rows for the subset. #method 2 uses the which() function, which returns the indices of the TRUE values in a boolean vector. Both of these techniques use the original data frame and the row index to create a subset.

The subset() function works much like the filter() function, except the syntax is slightly different and you don’t have to download a separate package.

Efficiency

While subset works in a similar fashion, it doesn’t perform the same way. While some data manipulation might only happen once or a few times throughout a project, many projects require constant subsetting and possibly from a loop. So while the gains might seem insignificant for one run, multiply that difference and it adds up quickly.

I timed how long it would take to run the same [complex] subset of a 500,000 row data frame using the four different techniques.

Time to Subset 500,000 Rows

    Subset Method     Elapsed Time (sec)
    boolean vector    0.87
    which()           0.33
    subset()          0.81
    dplyr filter()    0.21

The dplyr filter() function was by far the quickest, which is why I prefer to use it.

The full code I used to write up this tutorial is available on my GitHub.

Before I get too far: I don’t actually analyze taco emojis. At least not yet. I do, however, give you the tools to start parsing them from tweets, text or anything you can get into Python.

This past month Apple released iOS 9.1 and their latest OS X update, 10.11.1 El Capitan. That update included a bunch of new emojis. I had previously made a quick primer on how to handle emoji analysis in Python. Then, when Apple released an update to their emojis to include diversity modifiers, I updated my small Python class for emoji counting to include the newest emojis. I also looked at what is actually happening with the unicode when diversity modifier patches are used.

With this latest update, Apple and the Unicode Consortium didn’t really introduce any new concepts, but I did update the Python class to include the newest emojis. In my GitHub the data folder includes a text file with all the emojis delimited by ‘\n’. The class uses this file to find any emojis in a unicode string which has been passed to the add_emoji_count() method.

Building off of the diversity emoji update, I added a skin_tone_dict property to the EmojiDict class. This property returns a dictionary with the number of unique human emojis per tweet and their skin tones. This property will not catch multiple human emojis if they are written in the same execution of the add_emoji_count() method.

Python

    import socialmediaparse as smp #loads the package

    counter = smp.EmojiDict() #initializes the EmojiDict class

    #goes through list of unicode objects calling the add_emoji_count method for each string
    #the method keeps track of the emoji count in the attributes of the instance
    for unicode_string in collection:
        counter.add_emoji_count(unicode_string)

    #output of the instance
    print(counter.dict_total) #dict of the absolute total count of the emojis in corpus
    print(counter.dict) #dict of the count of strings with the emoji in corpus
    print(counter.baskets) #list of lists, emoji in each string. one list for each string.
    print(counter.skin_tones_dict) #dictionary of unique human emojis aggregated by the counter.

Above is an example of how to use the new attribute. It is a dictionary so you can work that into your analysis however you like. I will eventually create better methods and outputs to make this feature more robust and useful.

R, a statistical programming language and environment, is becoming more popular as organizations, governments and businesses have increased their use of data science. To provide a quick bootcamp on the basics of R, I’ve assembled some of the most basic processes to give a new user a short introduction to the language.

This post assumes that you have already installed R and have it running correctly on your computer. I recommend getting RStudio to write and execute your code. It will make your life much easier.

Getting Started

First, R is an interactive programming environment, which means you are able to send commands to its interpreter to tell it what to do.

There are two basic methods to send commands to R. The first is by using the console, which is like old-school command line computing. The second method is more typically used by R coders, and that’s to write a script. An R script isn’t fancy. At its core, it’s a text document that contains a collection of R commands. When the code is executed, it is treated like a collection of individual commands being fed one by one into the R interpreter. This differs from how many other programming languages work, such as compiled languages where the whole program is translated before it runs.

Basics

Comments are probably the best place to start, especially because my code is chock-full of them. A comment is text in your script or command that R ignores: it’s not executed and has no bearing on how your code functions. In R, a comment starts with a # and runs to the end of the line.

R

# comments start with #-signs
9 - 3  # basic math (this doesn't save the result in a variable)

One of the most basic things you could use R for is a calculator. For instance, if we run the code 9-3, R will display a 6 as the result of that code. All of this is rather straightforward. The only operator you might not be familiar with if you are new to coding is the modulus operator (%%), which yields the remainder when you divide the first number by the second. This gets used often when dealing with data. For example, you can get a 0 for an even number and a 1 for an odd number if you apply the modulus operator to your variable with the number 2.

R

# basic operations
# each yields a numeric value
1 + 2   # addition
3 - 2   # subtraction
3 * 2   # multiplication
4 / 5   # division
3 %% 2  # modulus (remainder operator)
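
As a small sketch of the even/odd trick described above, the snippet below applies the modulus operator to a whole vector at once. The values and the variable names (values, remainders, is.odd) are made up for illustration.

```r
# Hypothetical values, chosen only for illustration
values <- c(4, 7, 10, 13)

remainders <- values %% 2   # remainder after dividing each value by 2
is.odd <- remainders == 1   # TRUE where the value is odd

print(remainders)  # 0 1 0 1
print(is.odd)      # FALSE TRUE FALSE TRUE
```

Because %% works element by element, the same line handles a single number or an entire column of data.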

Beyond the basic math and numeric operations you can do, R has several fundamental data types. NULL and NA represent empty objects and missing data, respectively. These two aren’t the same: NA fills a position in a vector or data frame, while NULL takes up no position at all. The details are best left for another entry.

R

# basic data types
NULL      # empty value
NA        # missing value
9100      # numeric value
'abcdef'  # string
TRUE      # boolean
T         # equivalent shorthand for TRUE
FALSE
F         # equivalent shorthand for FALSE
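
To make the NA versus NULL distinction a little more concrete, here is a minimal sketch. The vectors are invented for illustration; the point is only that NA holds a slot in a vector while NULL does not.

```r
# Hypothetical vectors for illustration
with.na <- c(1, NA, 3)      # NA keeps its position
with.null <- c(1, NULL, 3)  # NULL is dropped entirely

length(with.na)    # 3
length(with.null)  # 2

is.na(with.na)               # FALSE TRUE FALSE
mean(with.na)                # NA, because one value is missing
mean(with.na, na.rm = TRUE)  # 2, after removing the missing value
```

Many summary functions accept na.rm = TRUE, which is usually the first thing to try when a calculation unexpectedly returns NA.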

Numeric values can have mathematical operations performed on them. Strings are essentially non-numeric values. You can’t add strings together or find the average of a string. In any type of data analysis, you’ll typically have some string data. It can be used to classify entries categorically, such as male/female or Mac/Windows/Linux. R can treat these as factors, its data type for categorical values.
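
As a rough illustration of how strings become categories, the sketch below converts a character vector into a factor with factor(). The sample values and the names os and os.factor are hypothetical.

```r
# Hypothetical categorical data
os <- c('Mac', 'Windows', 'Linux', 'Mac')

os.factor <- factor(os)  # convert the strings into a factor

levels(os.factor)  # the distinct categories: "Linux" "Mac" "Windows"
table(os.factor)   # how many entries fall into each category
```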

Finally, boolean values (TRUE or FALSE) are binary logical values. They work like the logic operations you might have learned in a math or logic class, with AND (&&) and OR (||) operators. These can be used in conditional statements. For element-by-element comparisons across a vector, as in data manipulation operations such as subsetting, R uses the single-character forms & and |.

R

# logical operators
T && F  # and
F || T  # or
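
The sketch below contrasts the two families of logical operators: && and || are meant for single TRUE/FALSE values (as in an if statement), while & and | compare vectors element by element, which is what subsetting needs. The ages vector is made up for illustration.

```r
# Hypothetical ages for illustration
ages <- c(19, 24, 31, 40)

# Element-by-element AND over the whole vector
in.range <- (ages > 20) & (ages < 35)
in.range        # FALSE TRUE TRUE FALSE
ages[in.range]  # 24 31

# Single-value AND, typical inside a conditional statement
if (length(ages) > 0 && ages[1] < 21) {
  print('the first age is under 21')
}
```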

Now that we’ve covered the basic operations and data types, let’s look at how to store values: variables. Assigning a value to a variable is rather easy. You can use an equals sign or the traditional R notation, an arrow (<-).

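R

x <- 1  # basic assignment
x = 1   # same result using an equals sign

##### Acceptable Variables
x <- 1
X <- 1
X1 <- 1
X.1 <- 1
X_1 <- 1
########################

#### UNACCEPTABLE Variables (commented out because this code will not work)
# 1X <- 1   # a name cannot start with a digit
# X-1 <- 1  # the hyphen is read as subtraction
# X,1 <- 1  # a comma is not allowed in a name
#######################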

Variable names must begin with a letter (or a period that isn’t followed by a digit), and they are case-sensitive. Periods are acceptable faux separators in variable names, but that doesn’t translate to other programming languages like Python or JavaScript, so that might factor into how you establish naming conventions.

I’ve mentioned vectors a few times already. They are an important data structure within R. A vector is an ordered collection of data. Vectors are typically thought of as numeric, but character (string) vectors are often used in R too. The c() function creates a vector. A vector holds a single type of data: boolean, numeric or character. If you mix types, R will coerce the values into a common type. You can assign your vectors to variables. In fact, you can store just about anything in R in a variable.

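R

x.vector <- c()                  # c() combines values into a vector (empty here)
x.vector <- c(1, 2, 3, 4, 5, 6)  # creates a vector (typically numeric)
x.vector <- c(T, 1)              # mixed types are coerced: T becomes 1
x.list <- list('A', 12, 'b')     # creates a list (not used for numeric operations)
mean(c(1, 3, 2))                 # a vector can be passed straight to a function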
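
To show what forcing values into another type looks like in practice, here is a small sketch using class() to inspect the result. The variable names are hypothetical.

```r
# Mixing a boolean with a number: TRUE is coerced to 1
mixed.num <- c(TRUE, 1)
class(mixed.num)  # "numeric"

# Mixing a number with a string: everything becomes character
mixed.chr <- c(1, 'a')
class(mixed.chr)  # "character"

# All three together end up as strings
mixed.all <- c(TRUE, 1, 'a')
mixed.all  # "TRUE" "1" "a"
```

Once a vector has been coerced to character, math functions such as mean() will no longer work on it, so it is worth checking class() when results look odd.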

Lists are created with the list() function. They are used more for storage and organization than for calculation, and unlike vectors they can hold items of different types. For example, you could store the mean, median and range for a set of data in a list. A vector would house the data used to calculate said summary stats. Lists are useful when you begin to write bigger programs and need to shuffle a lot of things around.
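
A minimal sketch of that idea follows: the raw data lives in a vector, and the summary statistics are collected in a named list. The numbers and the names scores and summary.stats are made up for illustration.

```r
# Hypothetical data held in a vector
scores <- c(4, 8, 15, 16, 23, 42)

# A named list that stores several summary statistics together
summary.stats <- list(
  mean = mean(scores),
  median = median(scores),
  range = range(scores)  # a two-element vector stored inside the list
)

summary.stats$mean    # 18
summary.stats$median  # 15.5
summary.stats$range   # 4 42
```

Note that the range entry is itself a vector, which is something a plain vector could not hold alongside the single mean and median values.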

The basic statistical functions are listed below. All of these require a vector to operate on.

R

# basic stats
x.vector <- c(10, 11, 12, 12, 10, 11, 20, 9)  # puts your data into a vector
mean(x.vector)    # mean of the vector
median(x.vector)  # median of the vector
max(x.vector)     # maximum of the vector
min(x.vector)     # minimum of the vector
range(x.vector)   # yields a vector with the minimum and maximum
sd(x.vector)      # standard deviation
var(x.vector)     # variance

Handling Data

Above we discussed some of the building blocks of basic analysis in R. Beyond introductory statistics classes, R isn’t very useful unless you can import data. There are many ways to do this, since data exists in many different formats. A .csv file is one of the most basic and most widely compatible formats for moving data between different analytical tools.

Before loading a data file into R, it’s a good idea to set your working directory to the folder where the data file is stored.

R

# load in data
setwd('**folder path**')  # sets your working directory
# the path is specific to each computer

Next you can use the read.csv() function to ingest a .csv file into R. Called on its own, it won’t save the data in a variable; it just brings the file in as a data frame and shows it to you.

R

read.csv('data_bryant_kobe.csv')  # reads the data into R
# does not save it into a variable
data <- read.csv('data_bryant_kobe.csv')  # reads the data and saves it
# into a variable called 'data'
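
If you don’t have the data file handy, the sketch below shows the same read.csv() workflow on a tiny file it writes to a temporary location first. The file contents are invented placeholder numbers, not real season stats; only the column names Age, MP and PTS echo the ones used later in this post.

```r
# Write a small made-up .csv to a temporary file (values are placeholders)
csv.path <- tempfile(fileext = '.csv')
writeLines(c('Age,MP,PTS',
             '20,1000,400',
             '23,2000,900',
             '26,3000,1500',
             '29,2500,1200'), csv.path)

sample.data <- read.csv(csv.path)  # read the file and save it in a variable

head(sample.data, 2)  # first two rows
str(sample.data)      # column names and types
nrow(sample.data)     # 4 rows

unlink(csv.path)      # clean up the temporary file
```

head() and str() are a quick way to confirm that the file was read the way you expected before doing any analysis.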

Data frames are the primary data structure you’ll encounter in R. Data frames are like tables in Excel or SQL in that they are rectangular and have a rigid schema. However, at a data frame’s core is a collection of equal-length vectors.
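
To illustrate the point about a data frame being built from vectors, the sketch below assembles one by hand with data.frame(). The vectors and their values are hypothetical.

```r
# Two hypothetical equal-length vectors
age <- c(22, 24, 26)
pts <- c(10, 20, 30)

# Each vector becomes one column of the data frame
players <- data.frame(Age = age, PTS = pts)

dim(players)  # 3 rows, 2 columns
players$Age   # the original vector comes back out: 22 24 26
```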

If you assign the data frame output of the read.csv() function to a variable, you can pass the data frame around to different data manipulation functions or modeling functions. One of the most basic ways to manipulate the data is to access different values within the data frame. Below are several examples of how to get to values, rows or columns in a data frame.

The basic concept is that data frames can be accessed by row and column number: [row, column]. An entire row or column can be accessed by omitting the dimension you aren’t trying to retrieve. You can also retrieve individual fields (variables) by using the $ sign and the variable name. This is the method I use most often. It requires you to know and use the names of the variables, which can make your code easier to read.

R

# accessing values
row <- 3
column <- 2
data$Age               # returns the Age variable column
data[row, column]      # individual value
data[data$Age <= 25, ] # returns every row where Age is 25 or under
data[, column]         # returns entire column as a vector
data$Age[3]            # returns the third value of the Age column
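
Here is the same set of access patterns on a tiny made-up data frame, so the results are easy to check by eye. The data frame name toy and its values are hypothetical.

```r
# A small hypothetical data frame
toy <- data.frame(Age = c(22, 24, 26, 28),
                  PTS = c(10, 20, 30, 40))

toy[3, 2]            # single value: row 3, column 2 -> 30
toy[3, ]             # the whole third row, as a one-row data frame
toy[, 2]             # the whole second column as a vector: 10 20 30 40
toy$Age[3]           # third value of the Age column -> 26
toy[toy$Age <= 25, ] # every row where Age is 25 or under (rows 1 and 2)
```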

Because rows can be selected with a logical condition, you can create a subset of data by filtering your data set.

R

# creating a quick subset
data.U25 <- data[data$Age < 25, ]   # creates an under-25 set
data.O25 <- data[data$Age >= 25, ]  # creates a 25-and-older set

The code above creates two new data frames, which separate Kobe Bryant’s season stats into an under-25 data set and a 25-and-older data set.
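
A quick sanity check after splitting a data set like this is to confirm that the two subsets add back up to the original. The sketch below does that on a hypothetical data frame; the names toy, toy.U25 and toy.O25 and all values are made up.

```r
# Hypothetical data frame with an Age column
toy <- data.frame(Age = c(19, 22, 25, 28, 31),
                  PTS = c(5, 10, 15, 20, 25))

toy.U25 <- toy[toy$Age < 25, ]   # under 25
toy.O25 <- toy[toy$Age >= 25, ]  # 25 and older

nrow(toy.U25)  # 2
nrow(toy.O25)  # 3

# Every row should land in exactly one subset
nrow(toy.U25) + nrow(toy.O25) == nrow(toy)  # TRUE
```

Using < in one condition and >= in the other is what guarantees that no row is dropped or counted twice.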

Relationships Between Variables

Correlation is often used to summarize the linear relationship between two variables. Getting the correlation in R is simple. Use the cor() function with two equal-length vectors. R uses the corresponding elements in each vector to get a Pearson correlation coefficient.

R

# correlation between different variables within the subset
cor(data.U25$MP, data.U25$PTS)
cor(data.O25$MP, data.O25$PTS)
cor(data$MP, data$PTS)  # correlation with two related vectors from data set
cor(data$MP, data$FTpct)
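
To show what cor() is doing with the corresponding elements, this sketch computes the Pearson coefficient by hand and compares it with the built-in result. The minutes and points vectors are invented for illustration.

```r
# Hypothetical paired observations
minutes <- c(10, 20, 30, 40, 50)
points <- c(8, 13, 19, 22, 29)

r.builtin <- cor(minutes, points)  # Pearson correlation by default

# The same coefficient from its definition
x.dev <- minutes - mean(minutes)  # deviations from the mean
y.dev <- points - mean(points)
r.manual <- sum(x.dev * y.dev) / sqrt(sum(x.dev^2) * sum(y.dev^2))

all.equal(r.builtin, r.manual)  # TRUE
r.builtin                       # close to 1: a strong positive linear relationship
```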

A simple linear model can be made by using the lm() function. The linear model function requires two things: a formula and a data frame. The formula uses a tilde (~) instead of an equal sign, with the response variable on the left and the predictors on the right. The formula represents the variables you would use in your standard ordinary least squares regression. The data parameter is the data frame that contains all the data.

R

# create a basic linear model
linear.model <- lm(PTS ~ MP, data = data.U25)        # one predictor
linear.model <- lm(PTS ~ MP + Age, data = data.U25)  # two predictors

The summary() function takes the linear model object and displays information about the coefficients of your linear model.
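
As a minimal end-to-end sketch, the code below fits a one-predictor model on a small made-up data frame, prints the summary, and pulls out the coefficients. The data frame toy and its values are hypothetical; only the column names MP and PTS follow the ones used above.

```r
# Hypothetical data: minutes played (MP) and points (PTS)
toy <- data.frame(MP = c(10, 20, 30, 40, 50),
                  PTS = c(8, 13, 19, 22, 29))

toy.model <- lm(PTS ~ MP, data = toy)  # ordinary least squares fit

summary(toy.model)  # coefficient table, standard errors, R-squared
coef(toy.model)     # just the estimates: intercept 2.9, slope 0.51

# Use the fitted model to predict points for a new MP value
predict(toy.model, newdata = data.frame(MP = 60))  # 2.9 + 0.51 * 60 = 33.5
```

The slope here reads as the expected change in PTS for each additional unit of MP, which is the usual way to interpret a coefficient from summary().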