Sustainability is important to me but lifestyle changes are never easy. To adopt new habits I have found that a strict 2 month 'cold turkey' period followed by a more realistic lifestyle adjustment works best. This year I decided to try my hand at DIY shampoo, body wash and conditioner with products I already use at home. My hope is to use a Castile soap for most things (hands, body, hair, dishes) and just refill it from the bulk soap at the grocery store. This would drastically minimize my container waste!

As great as it sounds, it's not necessarily easy, and I was worried about committing to two months of potentially destroying my hair and skin. The last time I tried to use Castile soap as shampoo it just turned my hair into a tangled, gnarled mess. This time I used a coconut milk recipe and an apple cider vinegar rinse. After three weeks I'm still going strong! It's an adjustment but I think I can do this! If you're curious about my regimen please let me know, I'd be happy to do a longer post on the subject.

The last few weeks have been a blur! After a final busy week in design haven Barcelona (thanks Gaudí) we took off for a sprint tour of Europe. We started in London, filling up on metropolitan culture and Yorkshire pudding. From there we found our way to Kinvara and the Cliffs of Moher. We ended our tour of the British Isles in Dublin with a pint of Guinness and a shot of Teeling whiskey. From Dublin we flew to Berlin where we learned how to tear down walls (pictured) and build up tolerance. Our final stop in Mainz was filled with family, food and foolery. We were both working throughout the trip, so our nights (and train rides, plane rides, and hours in the back seat of the car) were filled with hacking. Stay tuned for my next project update!

A. Background

Large data problems are inherently complex. The result is that I usually have to break problems down into chunks. In this post I investigate demographics that could be linked to the opioid epidemic in America.

All of my projects stem from personal curiosity but current events definitely influence my ideas. Last week, the Trump administration hinted that it would take a hard stance on recreational marijuana, reversing the decision of the Obama administration. In a press conference, Sean Spicer drew unsubstantiated links between the use of recreational marijuana and the opioid overdose epidemic. This contradicts evidence that the epidemic is caused by abuse of prescription opioids. In the words of the CDC, 'The best way to prevent opioid overdose deaths is to improve opioid prescribing to reduce exposure to opioids, prevent abuse, and stop addiction.' In fact, a recent infographic published by the CDC indicates that people addicted to prescription painkillers are 40x more likely to be addicted to heroin, compared to just 2x for alcohol addiction and 3x for marijuana addiction.

The problem, for me, is not the threat of a marijuana crackdown; it is the perpetuation of unsubstantiated claims that marijuana is a gateway drug to opioids. This is especially troubling considering that prescription opioids are significantly more likely to lead to an opioid addiction (and overdose) than weed. Additionally, medical marijuana could actually play a role in reversing the opioid epidemic. Spicer's sentiment reinforces the stigma that marijuana is dangerous and should be a Schedule I drug. This stagnates research into cannabinoids for pain management and makes it taboo for people to seek marijuana as an alternative to opioids. Given the complexity of this problem, I decided to see if I could predict narcotics deaths based on socioeconomic demographics (including marijuana legality/sentiment). This is a simplification of a highly complex problem and is more of a thought experiment than a research project, so please don't take the results as gospel.

B. Selecting Features

For this project, I recycled a number of the demographics from the U.S. maternal mortality project I did in January. The bullet point list was a little hard to look at so I re-organized the features into a table (see below).

1. Investigating New Variables

I have two new data sources: narcotics death data from the CDC (2015) and marijuana sentiment data from 2016. If you would like to learn more about any of the other data sources you can check out my full project write-up for US maternal mortality rates here. Please note that feature data is generally from 2010 or newer, with the exception of maternal mortality rates (2001-2006). Let's take a look at how the two new demographics pan out across the United States.

The rank for marijuana enthusiasm starts with Colorado as 'most enthusiastic' and ends with North Dakota as 'least enthusiastic'.

According to the CDC, the data for narcotics deaths from North and South Dakota is 'unreliable'. Read more on the CDC website.

I used the code below to graph these choropleths with plotly. You will need your own plotly credentials to save your graphs online or you can choose to save them locally. I derived my code from this example on the plotly website.

2. Relationships in the Data

As always, overly correlated features can hurt your model. So, it is always good to start out with a correlation matrix of all of the features you plan to include. This way you can determine which features are redundant and make some decisions about what is best to leave out.

Overall, the features have few strong correlations. However, some interesting things did show up. Teen Birth Rate per 1,000 live births is inversely correlated with median income (r = -0.746086). This suggests that the higher a state's median income, the fewer teen births there are. Another surprising correlation is found between abortion policy and marijuana enthusiasm (r = 0.645330). As a reminder, the feature 'weed enthusiasm' ranks states with 'most enthusiastic' as '1' and the feature 'abortion policy rank' rates 'least barriers to abortion' as '1'. So, the most liberal state for both policies is given the lowest number. This suggests that states that are anti-abortion are also less enthusiastic about marijuana usage. Producing this chart is really simple with Seaborn. You can see the code I used to produce the heat map below.
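The original heat map code is not reproduced in this post. As a rough, dependency-free illustration of what a single cell of a correlation matrix holds, here is a minimal sketch of the Pearson correlation coefficient. The function name pearson_r and the sample numbers are made up for illustration; they are not the project data.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: median income (thousands of $) and teen births per 1,000.
income = [45.0, 50.0, 55.0, 60.0, 70.0]
teen_birth_rate = [40.0, 35.0, 30.0, 24.0, 18.0]

r = pearson_r(income, teen_birth_rate)
print("r =", round(r, 3))
print("r squared =", round(r * r, 3))
```

The sign lives in r: a negative r means one feature falls as the other rises, while r squared is never negative.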

C. Building the model

1. Setup and Null Accuracy

For the first version of the model I decided to include all of the features from above. Once the model was optimized I removed some features to decrease the error further. The first step towards optimization is knowing what your null error is. I calculated the null root mean squared error (RMSE) for narcotics deaths as RMSE=4.61095354742. I used the mean of the response variable to determine the null RMSE with the code below.
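The original snippet is not included here, so the following is only a minimal sketch of the idea: the null model predicts the mean of the response for every state, and the null RMSE is the error of that guess. The helper name null_rmse and the five sample values are assumptions for illustration, not the CDC data.

```python
import math

def null_rmse(y_true):
    """RMSE of a baseline that predicts the mean of y_true for every row."""
    mean_y = sum(y_true) / len(y_true)
    squared_errors = [(y - mean_y) ** 2 for y in y_true]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical death rates for five states (not the real data).
deaths = [2.0, 4.0, 6.0, 8.0, 10.0]
baseline = null_rmse(deaths)
print("null RMSE:", baseline)
```

Any model worth keeping should come in under this number.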

2. Model Selection and Estimator Tuning

Next I selected the proper model for my response variable. Given that this is a complex continuous response variable, I chose the RandomForest Regressor model. Next I tuned the model for the ideal number of estimators and features.

Now that we know the ideal number of estimators (50) we can take a look at the features. In the next step we will look at the ideal number of features and the contribution of each feature to the model.

Tuning predicts the ideal number of features at 7 with an RMSE of 3.5. This is helpful but does not tell us which features we should include in our optimized model. To do this I computed the feature importances for all of the features.

The feature importances indicate that the majority of the predictive power of the model comes from the top 3 features.

Using the optimized features above with 20-fold cross validation I was able to get an un-optimized RMSE of 3.50146776799. Now that I have an idea of which features are important I can begin to optimize the model. This is done by restricting the features that are used. Usually, I would use the max features as a cut-off guideline, but I was able to get better performance by using fewer features. See the next section for the optimized model.

D. Model Optimization and Evaluation

1. Model Optimization

Utilizing the feature importances and the tuning strategies from above I was able to distill the features down to two: [ 'Teen Birth Rate per 1,000', 'total exports'].

Fitting the model with two features produced the feature importances seen above. With just these demographics the feature importance is even for both features.

2. Final RMSE and Model Evaluation

With the optimized RMSE I calculated how well the model improved over null RMSE:

nullRMSE: 4.61095354742

OptimizedRMSE: 3.23286248334

Improvement: 1-(OptimizedRMSE/nullRMSE) = 0.298873334964 (or ~ 30%).

out of bag accuracy score = 0.30322192270209503
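The improvement figure can be checked with a few lines of Python. This is just the arithmetic from the formula above applied to the two reported RMSE values; the function name improvement_over_null is made up for this sketch.

```python
def improvement_over_null(model_rmse, null_rmse):
    """Fractional reduction in RMSE relative to the null model."""
    if null_rmse <= 0:
        raise ValueError("null_rmse must be positive")
    return 1 - (model_rmse / null_rmse)

NULL_RMSE = 4.61095354742       # null RMSE reported in this post
OPTIMIZED_RMSE = 3.23286248334  # optimized RMSE reported in this post

gain = improvement_over_null(OPTIMIZED_RMSE, NULL_RMSE)
print(f"improvement: {gain:.3f}")
```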

E. Conclusions

Given the small amount of variation in narcotics deaths across the states, the accuracy did not improve by much when using the Random Forest Regressor. In the future, looking at the data with more granularity (by county or city) could improve the model. Converting the response variable from continuous to categorical could also improve the model. Although the out-of-bag accuracy score was quite low (30%), I was able to improve the model over the null by 30%.

It is interesting to note that total agricultural exports and teen birth rates are the main predictive demographics. Neither is particularly well correlated with narcotics deaths (total exports r = -0.338059, teen birth rates r = -0.354916). Both of these features were also predictive for maternal mortality rates.

When all of the features were included in the model, marijuana sentiment accounted for ~10% of the model prediction. Through optimization this feature was removed along with a number of other demographics. In this model there doesn't appear to be much of a connection between marijuana sentiment (measured by legality and usage) and narcotics overdose, which is consistent with previous studies. However, this model is not very accurate and further study is needed to draw any definitive conclusions.

In my next post I will move away from cannabinoids and take a more holistic approach to investigating underlying trends in opioid prescriptions.

After years of trying to learn new languages (and failing) I decided to find a job abroad where I can practice. I don't have anything permanent yet, but for the next month I am living in Barcelona! I'm practicing my Spanish and hunting for a job. This is our adorable little flat on 'the hill' north of Gracia. Exciting things are to come! Also, stay tuned for my next data project!

A. Background

My interest in data science comes from years of analyzing small self-generated datasets. The information eked from those experiments was invigorating and exciting but I quickly realized that self-generated data doesn't scale. This, taken together with the practically limitless supply of public data just waiting to be analyzed, convinced me to learn data science. I am personally interested in drug re-purposing, genetics and public health so I started there. I recently developed a model to investigate demographics contributing to global maternal mortality rates and decided to continue this project for the United States.

Created with Plotly. High maternal mortality rates are above 14 maternal deaths per 100,000 live births, low rates are less than 7 maternal deaths per 100,000 live births. Data for MMR were collected between 2001 and 2006.

In the literature, the focus is usually on tackling low-hanging fruit to reduce MMR. The suggestions usually aim to increase access to clinicians, emergency services and prenatal education. Most of these interventions help substantially in countries with poor medical access and infrastructure. However, the United States, which has one of the most expensive health care systems in the world, has an abysmally high maternal mortality rate. By my calculations, our average MMR of 14 deaths per 100,000 births puts us in 46th place out of 188 countries. That is right behind Qatar (45th) and just ahead of Uruguay (47th).

This poor performance is emphasized by the fact that these deaths are almost entirely preventable. It should also be noted that in some states MMR is at or below the global minimum. Unfortunately, other states have MMRs as high as 20 women per 100,000 live births. So why is this still a problem in the U.S.? Why are we behind other rich post-industrialized countries? I took a stab at developing a model to shed some light on the problem. You can see the full project write up as an ipython notebook on my github or read the brief summary below.

B. Selecting Features

Correlation Matrix

This heat map shows the correlations of each of the features and the response variable. Notice that median income is highly correlated with teen birth rate, percent of medicaid paid births and percent obesity in women.

Highly correlated features are redundant and were pared down to simplify and improve the accuracy of the model. In this case, removing the features that correlate with teen birth rate improved the model accuracy. The following features were included in the models (for detailed descriptions of the feature variables please see the project documentation on github):

Feature Choropleths and Descriptions

Made with Plotly. Darker color always corresponds to higher value. For more info about each feature see code descriptions below.

'State Taxes Per Capita' - Includes property taxes, income taxes, sales tax and other taxes per capita in US dollars ($)

'Total Exports' - measure of total agricultural exports

C. Evaluating Model

Response Variables:

MMR - The continuous MMR variable is the rate of maternal deaths per 100,000 live births. The random forest regressor model was used with this response variable.

MMR Classifier - In this response variable the MMRs were mapped to a scale from 1-4 based on quartile rank (highest MMR - 4, lowest MMR - 1). The random forest classifier model was used with this response variable.
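To make the quartile mapping concrete, here is a minimal sketch using only the Python standard library. The exact cut-offs and tie handling used in the project are not stated, so the choices below (statistics.quantiles with its default method, ties going to the lower class) are assumptions, and the sample MMRs are invented.

```python
import statistics

def quartile_classes(values):
    """Map each value to a class 1-4 by quartile (1 = lowest, 4 = highest)."""
    cuts = statistics.quantiles(values, n=4)  # three cut points
    # A value's class is 1 plus the number of cut points it exceeds.
    return [1 + sum(v > cut for cut in cuts) for v in values]

# Hypothetical MMRs (deaths per 100,000 live births), not the real state data.
mmr = [5, 7, 9, 11, 14, 16, 18, 20]
print(quartile_classes(mmr))
```

Binning this way trades detail for stability: the classifier only has to place a state in the right quarter of the ranking rather than predict its exact rate.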

D. Conclusions

Feature importance ranks are shown below for the random forest classifier. The teen birth rate per 1,000 births has the highest feature importance. This feature accounts for variation in the MMR response variable and is a predictive feature of the model. Proportional poverty for people of color is also high on the list. Total agricultural exports, state taxes per-capita and abortion policy rank are other important features of the model.

Discussion:

Given the small amount of variation in MMR across the states, the accuracy did not improve much when using the Random Forest Regressor. Once the MMRs were classified into quartiles the random forest model performed better. The final accuracy score for the random forest classifier improved over the null by ~40%. The overall accuracy was still fairly low (43%) but was a vast improvement over the null.

It is interesting to note that total agricultural exports rank quite high in feature importance. It is difficult to interpret this finding given that it does not correlate with any of the other measures of economic prosperity.

If you have feedback or ideas, please feel free to leave a comment below.

A. Background

If you are a fan of sour, kombucha might be the drink for you. Unfortunately, it comes with the health food price tag of $3/bottle. Brewing it is much cheaper and way more fun. For a healthy culture and delicious tasting 'buch you'll need to maintain a pretty high temperature (~77F/25C) while brewing (5-7 days). Keeping the brew that warm is challenging in colder climates. With a little help from a terrarium heater and some electronics, I created a thermostat for brewing year round. Check it out below!

C. Hardware Setup

1. Board Assembly

Board assembly is new for me so I had some help with this step. For the Feather, we went with female socket headers. For the MCP9808 temperature sensor (thermometer), plain male headers were used. Check out the assembly guides for more details: Feather assembly guide and MCP9808 thermometer assembly guide.

2. Load CircuitPython Firmware and Drivers:

If you're starting from scratch you will need to load the CircuitPython firmware onto your board. Start by plugging your Feather into your computer with the micro USB cord. Next, use this guide to install CircuitPython onto your Feather. If you're a beginner I STRONGLY suggest doing this on Mac OS in the terminal. Keep in mind CircuitPython is in development right now, so if something doesn't make sense don't give up! Post on the Adafruit message boards and they will help you.

Once you have the firmware flashed onto your board your computer should recognize it as a USB mass storage device called CIRCUITPY. Next, load the drivers for your thermometer. To do this go to the Adafruit CircuitPython repository and download the latest CircuitPython bundle. The file should look like this <adafruit-circuitpython-bundle-#.#.#.zip> where "#.#.#" will be the latest release number. Once downloaded, open the .zip file and drag and drop the adafruit_bus_device folder and the adafruit_mcp9808.mpy file onto your board (see picture below).

3. Connecting the Thermometer to the Feather

Now that you have the firmware and drivers installed, disconnect your Feather from your computer and grab your thermometer and jumper extension wires. Use the wires to connect the Feather to the thermometer following the diagram below. The VDD and GND pins will power your thermometer through the Feather. The SCL pin is the I2C clock pin and the SDA pin is the I2C data pin. Pinout diagrams for the Feather and the thermometer may be helpful at this point. If you're following my board assembly the male end of the jumper extension wires will go to the feather and the female ends will go to the thermometer.

4. Connecting the PowerSwitch Tail 2 to the Feather

For this step, it is important to know where your PowerSwitch Tail will plug-in in relation to your kombucha brew vessel and thermometer. For reference, my kombucha is in a cabinet about a meter away from an outlet. I used wire spool to connect my Feather to my PowerSwitch tail so it was the right length. Note: If you're using wire spool, you will have to strip the ends.

Now that you have the correct length of wire, connect pin A5 on the Feather to the PowerSwitch Tail at 1: +in and tighten the screw on the top until the wire is secure. Connect the lower left ground pin (GND) to the PowerSwitch Tail at 2: -in. Again, tighten the screw until the wire is secure. There are a number of pins that you can use to connect your Feather to your PowerSwitch Tail. I chose A5 because it was on the same side of the board as the ground pin. If you choose to use a different pin make sure you edit this in the code (see below).

D. Software

At this point your electronics should be connected (PowerSwitch Tail, Feather and Thermometer) and your feather should have CircuitPython firmware and thermometer drivers installed.

1. Getting Started

Plug your Feather into your computer with a Micro USB cable. The Feather will show up on your computer as a USB mass storage device called "CIRCUITPY". Open a new text file in the editor of your choice (I use Atom) and save it to the CIRCUITPY drive as "code.txt". To see the output of your Feather while coding you will need to connect a serial read–eval–print loop (REPL). If you are not sure how to open the serial REPL follow this guide. Once you have your code.txt file saved to your Feather (CIRCUITPY drive) and the serial REPL open you are ready to code!

2. Connect the thermometer:

In your code.txt file use the code below to import the thermometer and supporting libraries. This will also import the I2C protocol from nativeio and rename the thermometer output (here we used t). Save the code to your Feather and check to see that the temperature is printed in your serial REPL.

# Load the libraries we need.
import adafruit_mcp9808
from board import *
import nativeio
import time

# Do one reading and print it out.
with nativeio.I2C(SCL, SDA) as i2c:
    t = adafruit_mcp9808.MCP9808(i2c)
    print("temperature in celsius:", t.temperature)

3. Connect to the LED and the PowerSwitch Tail

For this section you do not need to have the PowerSwitch Tail plugged into the wall, the microcontroller will power its LED. This code labels your indicator LED at pin D13 and switches it to output. Next, it connects the PowerSwitch Tail to the Feather through pin A5. If you used a different pin to connect to the PowerSwitch Tail, you will need to change the code here (edit A5).

4. Write the code for the thermostat

I wanted to use as little power as possible to make my heater smart so I am using a pretty simple thermostat. First, the while loop creates an infinite loop for the microcontroller. Inside the loop, the temperature sampling is delayed to every 5 minutes with a sleep function. Since I am dealing with a large vat of tea, the temperature shouldn't fluctuate very much. To test your sensor you may want to change the "time.sleep()" function to something shorter or comment it out. As is, the print statement will output the temperature to the serial REPL every 5 minutes. The print statement is helpful during setup but can be commented out when you save the code to your board. The thermostat is a simple if/elif statement which turns the power on if the temperature falls below 24.5C and turns the power off if the temperature is over 26.5C. This works for me but if you are looking for something more exact you may consider a PID function.
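The if/elif logic described above can be tried out on any computer before it goes onto the board. This is a minimal sketch in plain Python, not the code from my Feather: the function name next_heater_state and the simulated readings are made up, and the hardware calls are left out so it runs anywhere.

```python
ON_BELOW_C = 24.5   # turn the heater on under this temperature
OFF_ABOVE_C = 26.5  # turn the heater off over this temperature

def next_heater_state(temp_c, heater_on):
    """Return the new heater state from the latest reading and the current state."""
    if temp_c < ON_BELOW_C:
        return True
    elif temp_c > OFF_ABOVE_C:
        return False
    return heater_on  # inside the band: leave the switch as it is

# Simulated readings stand in for the sensor. On the board, each reading would
# come from the thermometer, the result would be written to the PowerSwitch
# Tail pin, and the loop would sleep 5 minutes (time.sleep(300)) between samples.
readings = [25.0, 24.0, 25.5, 26.6, 25.5]
state = False
history = []
for temp in readings:
    state = next_heater_state(temp, state)
    history.append(state)
print(history)
```

The gap between the two thresholds keeps the heater from flicking on and off when the brew sits right at the target temperature.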

E. Connecting the Pieces

1. Testing the thermostat

Start by plugging your Feather into the wall socket with the micro USB AC adapter. If you have an infrared thermometer this would be a good time to use it. Measure the temperature of your thermometer and make sure it is behaving like you expect. If you don't have an infrared thermometer try this:

Check for the OFF position - To increase the temperature hold the thermometer in your closed hand, it should exceed 27C (~80F) quickly and the power should turn off. When the power is off the red indicator light on the PowerSwitch Tail AND the LED light on the Feather should be OFF.

Check for the ON position - Place the thermometer, temperature sensor side down, on a cold surface (a hard counter or window should work, if not, try the fridge). Once the temp is below 24.5C (~75F) the power should turn ON. When the power is ON both the indicator light on the PowerSwitch Tail AND the LED on the Feather should be ON.

2. Connect the heater and assemble the pieces.

If you're confident that your thermometer and thermostat code are functioning you can start connecting the pieces. The schematic below is an overview of my setup. Start by securing your thermometer to your kombucha brew vessel with electrical tape. Make sure the temperature sensor is touching the brew vessel. Next, attach your heater to the kombucha vessel. Plug the PowerSwitch Tail into the wall outlet and plug the heater into the PowerSwitch Tail. Next, plug your Feather into a wall outlet with the micro USB AC adapter. Observe your thermostat over the next few days to ensure that everything is functioning normally. Last but not least, brew for 5-7 days, and ENJOY YOUR 'BUCH!

F. Troubleshooting and Further Reading

Having trouble? I did too! Here are some great resources to get your thermostat up and running:

3. Help brewing kombucha

After a week of babysitting, eating junk food, and burning the candle at both ends it's no surprise I ended up with a cold to start off the new year! This seriously addictive 'mermaid' pillow is a kinesthetic dream that kept me occupied while I relaxed in bed #giftingwin. Although my new year started off slow, here's to a productive, healthy, happy January!!