I decided to try my hand at building a “working” BB8 as well. I started in January with the goal of being ready for PortCon (Portland, ME) in June.

The Sphere

There are three primary methods for constructing the sphere.

Purchase a pre-made plastic sphere (two halves)

This can be expensive. There is also the issue of assembling the sphere.

3D print various panels and then assemble them to form a sphere.

Reading what others have said about this process, it’s not simple. The size and complexity of the panels make this difficult. Besides being expensive, it’s hard on the printer: a number of people report having to repair or replace their printers.

Construct a sphere from a material such as fiberglass.

This started out as the method most people used. An early DIY project made it seem much simpler than it really is. The method involves covering a ball (beach or yoga) with a paper/canvas mache mixture. The BB8 community decided that the body is about 20 cm in diameter. The ball in the DIY project is not that big. As it turns out, finding a beach ball in Maine in January is impossible. So it was off to Amazon.

All three balls are listed as 20 cm. Hmmmm…

First attempt with paper and canvas, following the DIY project.

Clearly this was not going to work. I decided to use fiberglass instead of canvas. I also found a 20 cm ball at a party store.

The Head

The Drive Train (part 1)

June, PortCon, Portland, Maine

Despite the drive train issues, BB8 still spoke and the light worked. So it was off to the con.

It’s back to the drawing board with the drive system…

The new drive mechanism.

I started over with new motors, frame and servos. So far it’s looking a lot better.

The Idea

Being the dad of a teenage daughter means I listen to a lot of current music: Lady Gaga, Taylor Swift. Recently it is all about One Direction. As “” recently said, “One Direction owns the internet in 2015.” Sometimes I hear “this is a sad song” or “this is a happy one”. What could I learn about their music using Neo4j? Could one derive any sort of sentiment from the lyrics? Could I get my daughter interested in this? Only one way to find out…

How to start

The first step was to learn more about the group. There are currently four members, but for most of their albums there were five: Harry Styles, Niall, Liam, Zayn and Louis. They have released five albums: Four, Take Me Home, Up All Night, Midnight Memories and Made in the A.M. With the help of my daughter we found a site that had the lyrics to all of the songs. What I found was that while some of the song files contained information about who was singing each section, many did not. I was hoping that the sentiment analysis could be aided by knowing the singer. Maybe Harry always sings sad break-up songs (he did date Taylor Swift). Since this information isn’t consistent I couldn’t count on it.

Song sentiment?

I felt it was important to be able to track lyrics by location in the song, by row and column. This way one could query “what words appear most often at the start (0,0) of a song?” or “how often do certain word combinations (“I” and “you”) appear on the same line?” This last question could be useful in better understanding sentiment.
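A minimal sketch of the row/column idea in plain Python, before anything goes into Neo4j. The function names index_lyrics and lines_with_both and the sample text are made up for illustration; this is not the loader I actually used.

```python
from collections import Counter

def index_lyrics(text):
    """Return a list of (word, row, col) tuples for every word in a song.

    row is the zero-based line number and col the zero-based word position
    on that line. Words are lower-cased and edge punctuation is stripped.
    """
    positions = []
    row = 0
    for line in text.splitlines():
        words = [w.strip(".,!?\"'()").lower() for w in line.split()]
        words = [w for w in words if w]
        if not words:
            continue  # blank lines do not advance the row counter
        for col, word in enumerate(words):
            positions.append((word, row, col))
        row += 1
    return positions

def lines_with_both(positions, first, second):
    """Return the sorted rows on which both words appear."""
    rows_first = {r for w, r, _ in positions if w == first}
    rows_second = {r for w, r, _ in positions if w == second}
    return sorted(rows_first & rows_second)

# Made-up sample text, not real lyrics.
sample = "I see you\nYou are here\nI am here, and you are too"
positions = index_lyrics(sample)

# Which words open the song at position (0, 0)?
opening_words = Counter(w for w, r, c in positions if (r, c) == (0, 0))

# On which lines do "i" and "you" appear together?
both = lines_with_both(positions, "i", "you")
```

Keeping the position on every word is what makes the same-line question cheap to answer: it becomes a set intersection on row numbers.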

Tools

Tools: Python, py2neo, R and RNeo4j.

The Model

The first step was to organize the songs into files by album. Once this was done it was simple to get Python to read in a list of albums, song titles, and lyrics (words). The graph…

I decided that a Group node would refer to a band or singer. A group is made up of members, and members are artists. For bands this is fine. I made the choice to treat single acts the same way, so Lady Gaga or Taylor Swift would be considered a group, member and artist.

Nodes

Group

Member

Artist

Album

Song

Lyrics

Relations

Album BY Group

Lyric IN Song

Song ON Album

Member ISA_ARTIST Artist

Group HAS_MEMBER Member

Graph

For the gist I restricted the data to one song per album and reduced the lyrics by two thirds. Even with this there are still 581 lyric nodes. There are 232 unique words. The difference is due to words being repeated but in different locations. The word “you” is found 28 times in the five songs.

Show distinct lyrics in the song “If I Could Fly”

Query 4

Show all lyrics in Act My Age.

Show all artists and members for the group

Show all songs on all of the albums. For the gist there is only one song per album.

Show all albums and members for the group

Show all of the lyrics for the song “Kiss You”. There are some connections from lyrics to other songs. This is because those lyrics are used in the same location. The lyric “Baby” is used in “Kiss You” and “What Makes You Beautiful” in the same row and column.

A query to find songs where the words “I” and “you” are on the same line. The query works well from Python, since I can filter out return values of 0. This type of search will be helpful when looking for phrases, i.e. words on the same line.

Sentiment and R

Below is a bar chart of the top ten most common lyrics. “I” and “you” are popular.

Sentiment

The last thing to consider is sentiment. Using the simple process of matching positive and negative words, I’d like to see if one can make a determination of sentiment. I could not find a word list built for songs, so I elected to use the AFINN list. Following examples from Jeffrey Breen and Andy Bromberg I was able to get some results. I didn’t divide the songs into training and test sets; instead I picked two songs and processed them. My daughter suggested that “Best Song Ever” would be happy and “If I Could Fly” would be sad.

This returned a list of lyrics. Next I counted the number of lyrics that matched a positive or negative word in the AFINN list. I classified the words as “reg” (score 1-3) or “very” (score 4-5) for both positive and negative.
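The counting step can be sketched in Python as well (I did it in R). The SCORES dictionary is a hypothetical stand-in for the AFINN list with made-up values; the real list assigns each word an integer score from -5 to 5. The name bucket_counts is also just for illustration.

```python
def bucket_counts(words, scores):
    """Count matched words into reg (|score| 1-3) and very (|score| 4-5) buckets."""
    counts = {
        "positive": {"reg": 0, "very": 0},
        "negative": {"reg": 0, "very": 0},
    }
    for word in words:
        score = scores.get(word.lower(), 0)
        if score == 0:
            continue  # word is not in the list, or is neutral
        polarity = "positive" if score > 0 else "negative"
        strength = "very" if abs(score) >= 4 else "reg"
        counts[polarity][strength] += 1
    return counts

# Hypothetical stand-in for the AFINN list; these scores are made up.
SCORES = {"love": 3, "best": 3, "amazing": 4, "lost": -3, "cry": -1, "hate": -4}

line = "I used to love you and now I cry".split()
result = bucket_counts(line, SCORES)
```

Here result holds one “reg” positive match (love) and one “reg” negative match (cry), which is the same two-by-two shape as the tables for the two songs.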

I then used the R functions naiveBayes() and predict(). The method is very simple, but the results do suggest that “Best Song Ever” is “happier” than “If I Could Fly”. It would be good to get One Direction’s opinion on this.

“Best Song Ever”

          reg  very
positive  10   3
negative  3    0

“If I Could Fly”

          reg  very
positive  1    0
negative  4    0

One thing I noticed is that simple word matching isn’t sufficient. For movie reviews or emails this may work. Songs are more complex.

Example: a happy song might have the line “I love you” while a sad song might have the line “I used to love you”. Both contain the positive word “love”, but the second line could be viewed as sad, love lost. This is where querying lyrics on the same line could help. It’s more complex than matching positive and negative words.

Conclusion

This was fun and I got a little father-daughter time in as well. I’d like to pursue this to see what can be done by considering phrases and connected words.

Like a lot of people I grew up with video games. But these were quite different from what we have today. Space Invaders, Lunar Lander, Missile Command and Asteroids look like cave drawings when compared to what is available today. I have experimented with tools like LightWave and Maya, but their costs are prohibitive and they are not really suited for amateur game developers. Unity 3D, on the other hand, is ideally suited for those just getting started with game development. In addition, it can easily support more complex professional games. Their recent announcement of free support for mobile applications means it’s time for me to make the leap.

A modern game typically requires a lot of people, mainly artists, to create scenes and characters. I can use tools such as Blender, but I am not nearly proficient enough to build the images as well as create the game. I need a game where I can leverage existing artwork and just focus on the mechanics of the game and learning Unity.

What I need is a 2D side scrolling space game. I decided on trying to replicate the Lunar Lander game.

It won’t be an exact match but instead more updated and something that fits with the Unity model. Look around in the Apple and Google app stores and you can find a number of these games. Some are 2D while others are 3D and much more realistic. I am not trying to be the next “Flappy Bird” so I don’t expect to compete with other games. It’s all about the learning.

Unity 3D

A lot can be done with Unity right out of the box. Anything that requires reacting to a user (player) is going to bring up the need for custom code. There are two choices for this, C# and JavaScript. A lot of the tutorials and examples are in JavaScript, so I’ll stick with that.

The Game

The point of the game is to land the ship on the surface before you run out of fuel and crash. In the earlier games the ship would rotate as well as translate. Correcting the rotation makes the game much more difficult to play. For this version I’ll stick with simple translation left, right, up and down. Of course there needs to be a surface to land on. A simple flat surface is boring. Adding some sort of obstacles will make it a bit more challenging.

Things to consider:

The ship

Obstacles

Landing

Movement

Gravity

Fuel

Crashing

Player controls

Scoring

Sound

The Ship

Unity can import models from many tools such as Blender and 3ds Max. For a mobile game the model cannot be too complex: the more detailed the model, the poorer the game performance. I found a reasonably sized lunar lander model from NASA that is free to use.

Obstacles

In the original game the surface changed from flat to mountains. I decided to add rocks to a flat surface. In order to make things a bit more complex I added the rocks at random locations and sizes.

Landing

The rocks provide obstacles to avoid but there needs to be a ‘safe’ landing place. These are marked in green so the player can see them. Since the rocks are randomly placed, the landing places need to be adjusted as well. The process is to place a landing spot and then place the rocks. The code has to make sure the rocks are not covering the landing place and that there is enough room for the lander.

Startup code to build the scene:

Declare the rocks and landing pads

var rocks: Transform[];
var landingPads: Transform[];

Find the game object tagged GUI so that we can determine the player’s level. The landing pads are adjusted differently once the player is beyond level one.

Create 1000 rocks. Each rock is generated at a random x location. The height of each rock (y direction) is also random. The game is 2D but I am using Unity in 3D mode, so the rocks are created in a 3D field. At some point I may change the game to be more 3D. Each rock is checked to make sure that it doesn’t overlap a landing pad. I didn’t want the code to get stuck in the overlap check, so after 10 tries I give up.
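The placement logic, sketched in Python rather than Unity script so it can be run on its own. The field width, the size ranges and the pad extents are assumptions for illustration, as are the names overlaps and place_rocks; only the 1000 rocks and the 10 tries come from the game.

```python
import random

def overlaps(x, width, pads):
    """True if the interval [x, x + width] intersects any (start, end) pad."""
    return any(x < end and x + width > start for start, end in pads)

def place_rocks(count, field_width, pads, max_tries=10, seed=None):
    """Place up to `count` rocks at random x positions, avoiding landing pads.

    Each rock gets at most `max_tries` attempts. A rock that still overlaps
    a pad after that is dropped, so the loop can never hang.
    """
    rng = random.Random(seed)
    rocks = []
    for _ in range(count):
        for _ in range(max_tries):
            width = rng.uniform(0.5, 3.0)   # assumed size range
            height = rng.uniform(0.5, 4.0)  # random extent in the y direction
            x = rng.uniform(0.0, field_width - width)
            if not overlaps(x, width, pads):
                rocks.append((x, width, height))
                break
    return rocks

pads = [(100.0, 110.0), (400.0, 410.0)]  # hypothetical pad extents
rocks = place_rocks(1000, 500.0, pads, seed=1)
```

Giving up after a fixed number of tries means the scene may end up with slightly fewer rocks than requested, which is harmless here and much better than a startup that never finishes.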

A lot of values are hardcoded simply for expedience. Good software practice would be to use variables or constants.

Movement

Since the game has more than one or two controls it requires the addition of buttons. Keyboard controls are not an option and multi-touch is complicated. I need to control the main engine (up), the left and right thrusters, and a pause button.

An audio file is played when the engine is on. While the engine button is pressed the emitter is set to true.

// If the emitter is not running, play the engine sound
// and start the emitter, then move the ship up.
if (engineThruster.emit == false)
{
    GetComponent.<AudioSource>().PlayOneShot(engineSound);
    engineThruster.emit = true;
}
moveShip_up();

The assumption is that the planet has gravity. I have left gravity at Unity’s default setting.

Fuel

Fuel usage is adjusted whenever the engine is running. In Unity’s FixedUpdate() function the fuel is adjusted:

fuelMeterCurentValue -=fuelLossRate*Time.deltaTime;

Multiplying by Time.deltaTime scales the fuel usage by the time elapsed since the previous step, so fuelLossRate is expressed per second rather than per FixedUpdate() call. It is standard practice in Unity to do this when changing a value in the fixed update call.
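A small Python sketch of why the multiplication matters. The starting fuel, the loss rate and the clamp at zero are assumptions for illustration; 0.02 seconds is Unity’s default fixed timestep. Holding the engine on for four seconds burns the same amount of fuel whether the physics step is 0.02 or 0.01 seconds.

```python
def burn_fuel(fuel, loss_rate, dt, engine_on):
    """Return the fuel left after one physics step of length dt seconds."""
    if engine_on:
        fuel -= loss_rate * dt
    return max(fuel, 0.0)  # clamp so the gauge never goes negative

def simulate(seconds, dt, fuel=100.0, loss_rate=5.0):
    """Hold the engine on for `seconds`, stepping the clock by dt each time."""
    steps = round(seconds / dt)
    for _ in range(steps):
        fuel = burn_fuel(fuel, loss_rate, dt, engine_on=True)
    return fuel

coarse = simulate(4.0, 0.02)  # 50 steps per second
fine = simulate(4.0, 0.01)    # 100 steps per second
```

Both runs end with about 80 units of fuel. Without the dt factor the finer step would burn twice as much.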

Crashing

There are two ways to fail a landing. One is to land on rocks. The other is to land too fast. A vertical velocity indicator turns red when the ship is descending too fast. When the ship touches the landing pad the velocity is checked. The function OnCollisionEnter() is called when two objects touch. In this case it will be the ship and either a landing pad or a rock. Setting Time.timeScale to zero stops the game play. The GUI.guiMode is set to either win or lose. This causes the correct screen to be displayed and the score to be adjusted.
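The win/lose decision made inside the collision handler boils down to two checks. Here it is as a Python sketch; the tag names and the speed threshold are assumptions, since the real values live in the Unity project.

```python
MAX_LANDING_SPEED = 2.0  # assumed threshold, not the value used in the game

def landing_outcome(hit_tag, vertical_velocity, max_speed=MAX_LANDING_SPEED):
    """Classify a collision as 'win' or 'lose'.

    hit_tag is the tag of the object the ship touched ('pad' or 'rock').
    vertical_velocity is negative while the ship is descending.
    """
    if hit_tag != "pad":
        return "lose"  # touched a rock
    if abs(vertical_velocity) > max_speed:
        return "lose"  # on the pad, but too fast
    return "win"
```

For example, landing_outcome("pad", -1.0) is a win, while the same pad hit at -5.0 or any contact with a rock is a loss.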

Player controls

Since this is a mobile game there need to be buttons for the player. A single touch would work if it only had to run the lander engine. Left and right translations are harder. A touch to the left of the lander could move it left, and the same for the right. Since the lander moves, it could move under the touch point and cause the movement to change. Buttons just seem easier.

Unfortunately Unity’s UI is not straightforward. The placement and operation of a button is pretty simple. Buttons are GUITexture components. Getting the position and sizing correct for different size devices is a challenge. There is talk that future versions of Unity will have better UI tools.

Scoring

Scoring is pretty straightforward. Land successfully and you get a point and proceed to the next level. Crash and you have to repeat the level. At each level the landing spots get harder to find. As the level increases I need to increase the fuel (or lower the rate at which it is used).

Sound

Sound is handled from an AudioSource component.

GetComponent.<AudioSource>().PlayOneShot(engineSound);

This plays the sound once. As long as the button is held down the sound will be played over and over. Playing the sound in a loop is possible for something like background music. For sounds like the engine or thrusters I need short bursts of sound.

Screen Shots

The ship approaching a landing pad. The vertical velocity is in white and positive. This indicates that the ship is moving up at a rate within the range for landing.

Since the landing pads are randomly placed I found it hard to locate them and not run out of fuel. I added an overhead view in the upper right corner to guide the player towards a landing pad.

The left corner shows the fuel and velocity levels.

The ship over the rocks. The vertical velocity is in red and negative. This indicates that the ship is moving down at a rate too high to land.

Google Play

I decided to put the game on Google Play just to see how the process works.

Update: I see one person has complained that at a high level you just crash into the rocks. It could be that this is a fuel issue: the landing pads are too far away for the fuel usage rate.

The task is to query the system each day and store the results. The goal is to have sufficient data to run through a Hadoop/Spark process and perform text analysis. Many of the postings, especially from agencies, are duplicates. I want to see how well one could match job postings using Spark machine learning clustering.

Since the API also returns lat and lon, there is the opportunity to do some spatial analysis.

The API for the job site lets you filter by keyword, state, city, date and employer or employer/agency.
You can also limit the data returned each time. Using ‘-’ with a keyword will ignore listings that include that word.
At this stage I want to ignore jobs from agencies and contract jobs. This is because many agencies post the same job and many ignore the location, i.e. they post jobs in MA that are located in CA.

For the second part of this experiment I will change this to pick up all jobs and try to use classification to identify similar jobs.

I define several lists:
1. The states to check
2. The language keywords to use
3. Skill set keywords

Converting each result into a result object is probably overkill; I could simply skip this step and go right to the database. But I really like the compactness of the object, and it makes the code look cleaner.

for each language set
    for each state
        query()
        convert the data to a result object
        get the URL of the main posting
        get the page
        use BeautifulSoup to parse the HTML
        get the content section of the page
        store the result in the Neo4j database

add to Neo4j
    use graph.merge_one() to create state, city and language nodes

    create a new Job node (jobkey)
        the job key comes from the API and has a unique constraint to avoid adding the same job again
    set Job properties (lat, lon, url, snippet.content, posted date, poll data)
    create relationships
        Relationship(job, "IN_STATE", state)
        Relationship(job, "IN_CITY", city)
        Relationship(job, "LANGUAGE", lang)
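The storage logic can be tried out without a database. The sketch below is a tiny in-memory stand-in, not py2neo: the JobGraph class, its methods and the sample listing are all made up for illustration. It only mimics two behaviours from the outline above, merging shared nodes and rejecting a jobkey that has already been stored.

```python
class JobGraph:
    """Tiny in-memory stand-in for the graph, used only to show the logic."""

    def __init__(self):
        self.nodes = {}             # (label, key) -> properties dict
        self.relationships = set()  # (job, type, target) triples

    def merge_node(self, label, key):
        """Return the node id, creating the node only if it is missing."""
        self.nodes.setdefault((label, key), {})
        return (label, key)

    def add_job(self, listing):
        """Store one listing; return False if its jobkey was already seen."""
        job = ("Job", listing["jobkey"])
        if job in self.nodes:
            return False  # mimics the unique constraint on jobkey
        self.nodes[job] = {k: listing[k] for k in ("lat", "lon", "url")}
        for rel, label, field in (("IN_STATE", "State", "state"),
                                  ("IN_CITY", "City", "city"),
                                  ("LANGUAGE", "Language", "language")):
            target = self.merge_node(label, listing[field])
            self.relationships.add((job, rel, target))
        return True

# Made-up listing for illustration only.
graph = JobGraph()
listing = {"jobkey": "abc123", "state": "NH", "city": "Nashua",
           "language": "java", "lat": 42.77, "lon": -71.47,
           "url": "https://example.com/job/abc123"}
first = graph.add_job(listing)
second = graph.add_job(listing)  # same jobkey again, so it is skipped
```

After both calls the graph holds four nodes (one job, one state, one city, one language) and three relationships, which is why polling the same listing every day does not inflate the counts.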

That is the code. After a few false starts I have been able to get it to run and gather about 16k listings.

Results

Below are some of the query results. I used Cypher’s RegEx to match on part of the job title.

One of the goals is to run Spark’s machine learning library against the data. As a first test I will count the words in the job titles. To check that the process is working, I counted the words in job titles for New Hampshire. Now I have something to compare to after the Spark processing.
Below is a sample of the word counts for all jobs polled in New Hampshire.

word            count
analyst         14
application     10
applications    7
architect       15
architecture    4
associate       3
automation      5
business        2
chief           2
cloud           4
commercial      3
communications  4
computer        2
consultant      4
database        4
designer        3
developer       59
development     15
devices         3
devops          3
diem            3
electronic      2
embedded        5
engineer        83
engineering     2
engineers       2
integration     4
java            31
junior          3
linux           3
management      4
manager         8
mobile          3
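A baseline count like this takes only a few lines of Python. This is a sketch with made-up titles; the function name count_title_words and the choice to keep only runs of letters are assumptions, not necessarily how the numbers above were produced.

```python
import re
from collections import Counter

def count_title_words(titles):
    """Count lower-cased alphabetic words across a list of job titles."""
    counts = Counter()
    for title in titles:
        counts.update(re.findall(r"[a-z]+", title.lower()))
    return counts

# Made-up titles for illustration only.
titles = ["Senior Java Developer", "Java Engineer", "DevOps Engineer (Linux)"]
counts = count_title_words(titles)
```

Keeping only letters means tokens such as C# or C++ collapse to a bare “c”, so a real run would need a tokenizer that treats language names with more care.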

I have Hadoop and Spark running. I need to get mazerunner installed and run a few tests. Then the fun begins…

Anyone who has worked in a Scrum/Agile environment understands the pain involved with task estimation. Some of the methods include shirt sizes (S, M, L), the Fibonacci sequence (1, 2, 3, 5, 8, …), powers of 2 (1, 2, 4, 8) and even poker. Then there is the process of dealing with disparate estimates. One person gives an estimate of 2 and another suggests it’s 13. After some discussion it’s agreed that the task is an 8. At the end of the sprint maybe it turns out that the task really was a 5. It would be useful, and interesting, to determine how well people do in their estimation. Is person A always underestimating? Is person B mostly spot on?

This seems like a good candidate for Machine Learning, supervised learning to be more specific. I am not sure how many teams capture information from the estimation process but they should.

Basic information such as:

The original estimates for each team member

The final agreed upon estimates

The actual size of the task once completed

The data might look like this:

Task  TM1  TM2  TM3  TM4  TM5  TM6  TM7  TM8  TM9  Actual
1     1    8    1    13   5    8    2    5    13   8
2     3    8    5    8    8    5    3    1    8    5
3     2    5    5    5    5    2    1    8    1    3
4     8    5    6    3    1    2    2    13   5    5
5     3    5    5    8    8    8    8    13   13   13
6     1    3    5    1    1    1    1    2    5    2
7     1    3    5    1    1    5    8    5    3    2
8     5    3    5    3    2    1    1    3    2    1
9     8    8    6    5    8    8    13   3    5    5
10    2    5    5    8    8    8    8    8    8    13

The ‘training’ data consists of ten tasks, the estimates from each of the nine team members, and the actual value the task turned out to be. I chose the Fibonacci sequence as the method for estimates. Another piece of information that could be useful is the estimate the team agreed upon. That could be compared to the actual value as well. I decided not to do this since it hides the interesting information in each team member’s estimate. By using each team member’s input we can determine which members contribute more and which are further off in their estimates.
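Before any machine learning, a quick way to see who tends to under- or overestimate is the average signed error per team member. A minimal sketch in plain Python using the training data listed earlier; the function name mean_signed_error is made up for illustration.

```python
# Nine team-member estimates per task, in task order.
ESTIMATES = [
    [1, 8, 1, 13, 5, 8, 2, 5, 13],
    [3, 8, 5, 8, 8, 5, 3, 1, 8],
    [2, 5, 5, 5, 5, 2, 1, 8, 1],
    [8, 5, 6, 3, 1, 2, 2, 13, 5],
    [3, 5, 5, 8, 8, 8, 8, 13, 13],
    [1, 3, 5, 1, 1, 1, 1, 2, 5],
    [1, 3, 5, 1, 1, 5, 8, 5, 3],
    [5, 3, 5, 3, 2, 1, 1, 3, 2],
    [8, 8, 6, 5, 8, 8, 13, 3, 5],
    [2, 5, 5, 8, 8, 8, 8, 8, 8],
]
# Actual size of each task once completed.
ACTUALS = [8, 5, 3, 5, 13, 2, 2, 1, 5, 13]

def mean_signed_error(estimates, actuals):
    """Average of (estimate - actual) per team member.

    Negative means the member tends to underestimate, positive to overestimate.
    """
    n_tasks = len(actuals)
    n_members = len(estimates[0])
    return [
        sum(estimates[t][m] - actuals[t] for t in range(n_tasks)) / n_tasks
        for m in range(n_members)
    ]

bias = mean_signed_error(ESTIMATES, ACTUALS)
```

On this data TM1 comes out at -2.3, a consistent underestimate, while TM9 is at 0.6. With only ten tasks that is a hint rather than a finding.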

Gradient descent

I am not going to try to explain gradient descent, as there are others much better qualified to do that. I found the Stanford Machine Learning course to be the most useful. The downside is that the course uses Octave and I want to use Python. There is a bit of a learning curve in making the change. Hopefully I have this figured out.

The significant equations are below.

The cost function $J(\theta)$ represents how well theta can predict the outcome.

Where $x_j^{(i)}$ represents team member $j$’s estimate for task $i$.
$x^{(i)}$ represents the estimate (feature) vector for task $i$ of the training set.
$\theta^T$ is the transpose of the $\theta$ vector.

$h_\theta(x^{(i)})$ is the predicted value.

The math looks like this.
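For reference, these are the standard linear regression forms of the hypothesis, the cost function and the gradient descent update, as I understand them from the Stanford course. The symbols $m$ (number of tasks), $n$ (number of team members), $y^{(i)}$ (actual value of task $i$) and $\alpha$ (learning rate) are my labels and are not defined in the text above.

```latex
h_\theta(x^{(i)}) = \theta^T x^{(i)} = \sum_{j=0}^{n} \theta_j x_j^{(i)}

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

Here $x_0^{(i)} = 1$ for every task, so $\theta_0$ acts as the intercept, and all $\theta_j$ are updated simultaneously on each iteration.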

For this I am using the following python packages

numpy
pandas
matplotlib.pyplot
seaborn

Note: In order to use seaborn’s residplot I had to install ‘patsy’ and ‘statsmodels’.

easy_install patsy
pip install statsmodels

Set up pandas to display correctly:
pd.set_option('display.notebook_repr_html', False)

Now do something!
First calculate the prediction, the dot product of theta and the estimates.
Next perform the J(θ) calculation.
Calculate the cost and record it. This last step will tell us whether the cost is decreasing or not.
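The steps above can be sketched as follows. To keep the example dependency-free it uses plain Python lists rather than numpy, so it is not my notebook code. The learning rate, the iteration count and the function names are assumptions chosen for illustration; the data is the training set listed earlier.

```python
# Rows from the training data: nine team-member estimates, then the actual value.
DATA = [
    [1, 8, 1, 13, 5, 8, 2, 5, 13, 8],
    [3, 8, 5, 8, 8, 5, 3, 1, 8, 5],
    [2, 5, 5, 5, 5, 2, 1, 8, 1, 3],
    [8, 5, 6, 3, 1, 2, 2, 13, 5, 5],
    [3, 5, 5, 8, 8, 8, 8, 13, 13, 13],
    [1, 3, 5, 1, 1, 1, 1, 2, 5, 2],
    [1, 3, 5, 1, 1, 5, 8, 5, 3, 2],
    [5, 3, 5, 3, 2, 1, 1, 3, 2, 1],
    [8, 8, 6, 5, 8, 8, 13, 3, 5, 5],
    [2, 5, 5, 8, 8, 8, 8, 8, 8, 13],
]
X = [[1.0] + [float(v) for v in row[:-1]] for row in DATA]  # leading 1 = intercept
y = [float(row[-1]) for row in DATA]

def predict(theta, row):
    """Prediction for one task: dot product of theta with its feature row."""
    return sum(t * x for t, x in zip(theta, row))

def cost(theta, X, y):
    """J(theta): half the mean squared error over all tasks."""
    m = len(y)
    return sum((predict(theta, r) - v) ** 2 for r, v in zip(X, y)) / (2 * m)

def gradient_descent(X, y, alpha=0.001, iterations=500):
    """Batch gradient descent; returns theta and the cost after each step."""
    m, n = len(y), len(X[0])
    theta = [0.0] * n
    history = []
    for _ in range(iterations):
        errors = [predict(theta, r) - v for r, v in zip(X, y)]
        grads = [sum(e * r[j] for e, r in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grads)]  # simultaneous update
        history.append(cost(theta, X, y))
    return theta, history

theta, history = gradient_descent(X, y)
```

Plotting history against the iteration number is the quickest check that the process is working: the curve should fall steadily, and if it rises the learning rate is too large.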

This graph shows the linear fit between the predicted and actual values

This graph shows the difference between the predicted and actual values

The data set is far too small to conclude anything. Where the actual value was high there is less data and the error is greater. In order to get more data I’ll have to make it up. Having worked in development for (many) years, I know that people tend to follow a pattern when giving estimates. The type of task also dictates estimates: a UI task may seem simple to someone familiar with UI development, while a server/backend person may find it daunting. In deciding how to create sample data I devised a scheme that gives each team member a strength in the skills UI, database, and server. Each member also has an estimation rating, which defines how they tend to rate tasks: low, high, mix or random. Once I get this working I will start over and see how the new data performs.
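A rough sketch of what such a generator might look like. Everything numeric here is an assumption for illustration: each member gets a single strongest skill, a task outside that skill looks 50% bigger, and the low and high ratings scale the estimate by 0.6 and 1.6. None of these values are settled.

```python
import random

FIB = [1, 2, 3, 5, 8, 13]
SKILLS = ["ui", "database", "server"]

def nearest_fib(value):
    """Snap a number to the closest allowed estimate value."""
    return min(FIB, key=lambda f: abs(f - value))

def estimate(member, task, rng):
    """One member's estimate for one task.

    A task outside the member's strongest skill looks larger (assumed +50%).
    The rating then skews the result low, high, either way, or fully random.
    """
    size = task["actual"]
    if member["strength"] != task["skill"]:
        size *= 1.5
    rating = member["rating"]
    if rating == "random":
        return rng.choice(FIB)
    if rating == "mix":
        rating = rng.choice(["low", "high"])
    size *= 0.6 if rating == "low" else 1.6
    return nearest_fib(size)

rng = random.Random(42)
team = [{"strength": rng.choice(SKILLS),
         "rating": rng.choice(["low", "high", "mix", "random"])}
        for _ in range(9)]
tasks = [{"skill": rng.choice(SKILLS), "actual": rng.choice(FIB)}
         for _ in range(100)]

# Same layout as the training data: nine estimates, then the actual value.
rows = [[estimate(m, t, rng) for m in team] + [t["actual"]] for t in tasks]
```

Because the rows have the same layout as the hand-made training data, they can be fed to the same gradient descent code without changes.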

Project Tycho.
The project has gathered data (level 2) over a 126-year period (1888 to 2014). Divided into cases and deaths, it includes fifty diseases, fifty states and 1284 cities. Access is via a web service. There are calls to get a list of all diseases, states, cities, cases and deaths. Using Python, I pulled the various pieces and stored each in a file. The process takes a while, so it was better to get the data once and then format it as needed. For each state/city I also obtained the lat/lon information. Finally I gathered all of the data into one file where each record looked like the ones below:

The process:
1. Get all diseases.
2. Get all States.
3. For each State get all cities.
4. For each State/City geocode the city.
5. For each Disease:
   For each State and City get events.

Some python code:

The code below gives examples of how to pull the data from the Tycho site. The key is assigned by them. I found some cases where there are ‘[’ and ‘]’ characters in the data. Since I couldn’t determine what to do with these I simply skip them. I also check for commas and spaces, which make parsing difficult.
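The cleaning step can be sketched on its own. The function names and the exact handling of commas and spaces are assumptions: here commas are dropped and runs of whitespace collapsed, which may differ from what a given output format needs.

```python
def clean_field(value):
    """Return a cleaned field, or None if the record should be skipped.

    Values containing '[' or ']' are skipped outright. Commas are removed and
    runs of whitespace collapsed so the value is safe in a delimited file.
    """
    if "[" in value or "]" in value:
        return None
    return " ".join(value.replace(",", " ").split())

def clean_record(fields):
    """Clean every field; return None if any field had to be skipped."""
    cleaned = [clean_field(f) for f in fields]
    return None if None in cleaned else cleaned
```

For example, clean_field("New  York, NY") gives "New York NY", and any record with a bracketed field is dropped as a whole rather than stored half-cleaned.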

The hardware. For the most part I just use a Windows laptop. I need to run Hadoop and Spark, and since I use the laptop for work I need a different solution, something that I can run without disruption. Hadoop is marketed as running on commodity hardware. Let’s see. I have two old systems on which I have installed Linux (Ubuntu), along with Hadoop and a host of other support software. I need to get these set up as a cluster at some point.

I have been trying to work on data analysis for a while, but it’s been a lot of start and stop. I started with pure spatial data (University of Maine, Spatial Information) and then started working with public health data. Eventually I came to understand that the two are connected. Considering where events occurred can be helpful in understanding how to handle public health issues. Some guy named Snow figured this out in the 1850s. The big data movement has made things like machine learning, R, NLP, Hadoop, Pandas and Spark popular. I have decided to spend the next year mucking about with a couple of data sets to get a better idea of what can and can’t be done.

The data.
There are two data sets I plan to use (so far).
The first is public health information assembled by Project Tycho at the University of Pittsburgh. The project has gathered public health data for over a hundred years. It consists of events (cases or deaths) due to disease. Each event is associated with a State, City, Year, and Day of the year. I have added Lat and Lon for each City.

The second set is being created each day (when I remember…). It involves pulling data from a job board using their API. This data is nice because it changes every day. It also has a lot of free text that might be useful for NLP or classification.

Coding
My day job involves Java. For this effort I’d like to stay with Python. There are some exceptions where Java might make more sense, such as loading large data sets or Hadoop MapReduce. I am using Django to create web apps as needed.
Python has its own analysis libraries, but it also works well with R. Probably a good path to stay with.