It seemed like some very interesting data, but I found it very difficult to read the individual values of all the states, and compare them in my head. Therefore I imported the data into SAS, and started exploring it. I tried graphing the data several ways, and here is my favorite visualization (click the image below to see the full-size map, with hover-text):

There's been quite a bit of controversy about the number of undocumented immigrants in the US lately - for example, Ann Coulter claims that number is 30 million, whereas others claim it's about 11 million (readers of my blog are data-savvy, and would dig into the details of such claims, of course). It's difficult to get a definitive count of something that is by definition 'undocumented,' so I focus on something that is more easily quantifiable - the number of illegal aliens who have actually been apprehended.

I found the data on the US Customs & Border Protection (CBP) website, in the form of a table, as shown in the partial screen-capture below:

This is very interesting quantitative data on the topic, but I found it a bit difficult to digest in tabular form. There are just too many numbers to keep track of in my head as I look for trends and such. Therefore I imported the data into SAS and created a bar chart - this really helped me see how the numbers have changed over time. I also used SAS to calculate the grand total (nearly 50 million) and annotate it onto the graph using a large font.

I saw the dress photo as blue & black. If you're a woman, even if we perceived the exact same color, you might not have said 'blue & black'. That's because women have a larger color vocabulary than men, and you might have elaborated on exactly which blue and which black.

This blog post is a fun/unscientific comparison of the color names men and women use. If you do a Google search for 'men women color names' and look at the images, you will get several matches showing various visualizations of a spectrum of colors, in which women have a different name for each one, whereas men tend to lump them together into groups.

July has been an exciting month for me. Not only because of the historic Tour de France this year... but even more because this month the new offering SAS Factory Miner was officially released!

With SAS Factory Miner you can run predictive models in an automated model tournament environment to quickly identify the best performer for each segment. Wait a second… tournament, best performer, segment… doesn’t that sound like the Tour de France? Maybe it’s not such a coincidence after all that the launch of SAS Factory Miner falls during the world’s most prestigious cycling race. Let’s investigate how they relate.

Not one, but multiple winners

As organizations begin to apply analytics to growing numbers of customer and business segments, predictive models often must be developed at increasingly granular levels. SAS Factory Miner provides an environment for building, comparing and retraining predictive models at scale across multiple segments. With just a few clicks you can uncover the champion model for each segment.

Saint Peter’s is the latest to collaborate with SAS to offer such a program. We’ve helped launch more than 30 Master’s degrees and 60 certificate programs all over the world in analytics and related disciplines.

In the past year alone, we helped lay the foundation for new Master’s programs at Michigan State University, University of Maryland, University of Missouri, George Washington University, Shiv Nadar University, Indian Institute of Management and University of South Australia.

For us, it’s a no-brainer. We need analytics expertise as much as any company. And that expertise is scarce. Consider a recent report released by MIT Sloan Management Review. The upshot: technology is no longer the key inhibitor for organizations struggling to get value from analytics. It’s the lack of analytical talent.

What’s it like to build a data science program to bridge the gap? I asked Dr. Sylvain Jaume, Director of Saint Peter’s Data Science graduate program, what it entails. Key early steps, he said, were the school’s vision and strategic investment in the program as well as getting commitment from companies to provide practical experience for students.

When I was a kid, I remember a motivational poster on my dentist's wall that said "You don't have to brush all your teeth -- only the ones you want to keep." That poster really made me think, and brush my teeth! And now that I'm a data-analyst adult, I think I've found an even scarier motivational poster ... graphs showing the percentage of senior citizens who have lost all their natural teeth!

Before we get to the scary data though, here's a picture of my friend Becky's daughter, who pulled her first tooth while performing on-stage in the Sword of Peace outdoor drama. Hopefully once all her permanent teeth come in, she'll keep them for a very long time!

One of the most important skills for data scientists and business analytics professionals is communication. If decision makers and managers don't understand what the numbers mean, results won't turn into action.

I had the chance to interview Zeanah about his new course and the state of the analytics industry.

What are some of the biggest advancements in data mining over the past 10 years?

Data and change of focus based on data. There has been a subtle, meaningful, almost unspoken change from modeling populations to modeling individuals or individual events. So while there is a lot of discussion of “big data”, the reality is that the data is being parsed to get to specific details (a subset of the data) that relate to the detailed investigation. I like to call this reduction in data and dimensionality moving closer to an “Actionable Truth” – details around fact(s) that we can take action on.

As such, I believe even the term Analytics is now more descriptive than data mining as it implies greater use for decisions – and that in itself is an advancement.

Hadoop is an amazingly flexible platform for inexpensively storing and processing massive amounts of all types of data. With a well-provisioned Hadoop cluster & SAS, even more processing speed can be achieved. I have access to a small Hadoop cluster with the SAS Embedded Process software components installed, and SAS on Windows with licenses for the SAS/ACCESS Interface to Hadoop and the SAS In-Database Code Accelerator for Hadoop. With this arrangement, it's possible to run DS2 DATA step and thread code directly in Hadoop. If you are reading from and writing to Hadoop files, the DS2 code goes in and processes in Hadoop, and nothing comes out but the log! Reducing the need to move data to the compute platform should definitely improve processing speed.

I set out to compare processing data with DS2 threads in base SAS to processing the same data in-database in Hadoop. Here is the code I used for my experiment:

I managed to cut the elapsed time almost in half, even with my puny Hadoop test cluster! It makes a real difference when you can take the code to the data, instead of having to bring the data to the code.

I'm not going to post a ZIP file for this blog entry, because I can't give you my Hadoop environment to play with. But if you'd like to take DS2 and Hadoop for a test drive, you can see this and lots of other really amazing SAS & Hadoop technology by checking out the SAS Data Loader for Hadoop trial download. Better yet, join me in Boston for the next "DS2 Programming Essentials with Hadoop" class and we'll take a deep dive together. Or, if you would rather see a great introduction to Hadoop and an overview of all the ways it interacts with SAS, try our "Introduction to SAS and Hadoop" course, and I think you'll agree: SAS and Hadoop - it's a wonderful thing :-)

I recently came across some very interesting data on serial killings ... but it was in tabular/text form. This seemed like an invitation for me to create some graphs that make it easier to understand the data.

It seems many people have a morbid curiosity about serial killers. For example, some of the most popular shows on TV (such as Dexter and Criminal Minds) focus on them. So when I found this data on serial killings, I thought it would be interesting to 'bring it to life' with a graphical analysis.

Let's start with something simple - the number of victims per year, since 1900:

The bald eagle, the national bird of the United States, came perilously close to extinction here, but is now making a comeback! Let's look at the data with a SAS map!

When I was growing up in the 1970s & 80s here in North Carolina, I spent a lot of time outdoors but never once saw an eagle. That's because eagles were basically extinct in NC during those years. What happened to the eagles? The main factor was the widespread use of DDT as an insecticide after WWII ... one of its side effects was eggshell thinning, which caused eagle eggs to break before they could hatch. DDT was banned in the US in 1972, and the eagles have been making a dramatic comeback since then.

And for some proof of this comeback, here's a picture of a bald eagle that my friend Joe took at Jordan Lake (about 20 miles from the SAS headquarters)...