Friday, November 13, 2015

June 2016 update: Both of these posts are now filled. While the official application isn't up and running quite yet, I'm looking for a news apps developer (two, actually) to join our team at the Statesman. We've done some incredible work lately, and now it's time to build upon that with more, better and different. If you have interest, please fill out the official application. Please email me at cmcdonald@statesman.com and include the normal stuff: resume, examples of your work, etc. Here is a job description:

The Statesman is looking for a news applications specialist for the interactives team. This developer journalist will work with reporters, editors and other team members to design and build interactive graphics, data visualizations and news applications to support journalism ventures.

Share and expand knowledge with other team members, and learn from the experience of others.

Research new technology, best practices and tools, and analyze them for best fit, usage, stability and performance.

Communicate with both technical and non-technical colleagues to serve as a bridge between content and digital design of applications and visualizations.

Some reporting and contacting sources for data and information.

SKILLS & EXPERIENCE:

Understanding of data structures and database management

Familiarity with web APIs and common data visualization libraries

Demonstrated ability to turn concepts into user-focused apps using HTML5/CSS3/JavaScript. We use Node.js for package management, PHP/WordPress, Python and Django. While experience in these areas is preferred, we recognize developers are adaptable, and so are we.

Friday, June 19, 2015

I have this salary data -- title, gender, length of service and the annual salary -- and I need to do some comparisons. Seems like some easy stuff. Well, it's not as easy as you might think using our typical data workhorses of Excel and SQL.

Salaries are one of those data sets where extreme ranges can skew distributions, making the mean (or average) a poor representation of the data. A small number of really high or low salaries can move the average out of whack from the rest of the data.
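Here's a quick sketch of that effect in Python. The salary figures are made up for illustration (they are not from my data set): eight ordinary salaries plus one very high earner.

```python
from statistics import mean, median

# Made-up salaries for illustration: eight typical values and one outlier.
salaries = [38_000, 41_000, 42_500, 44_000, 45_000,
            46_500, 48_000, 52_000, 310_000]

# The single 310,000 salary drags the mean far above what most people earn,
# while the median stays at the middle value of the sorted list.
print(f"mean:   {mean(salaries):,.0f}")
print(f"median: {median(salaries):,.0f}")
```

The mean lands well above every salary except the outlier, while the median sits right in the middle of the pack.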

Excel does not have median as a summary option for pivot tables.

So, Viva La Median! Line all those salaries up in order and pick the one in the middle (or average the two in the middle) and you get a better sample to explain your data set. I have more than 10,000 rows of data with hundreds of job titles to compare, so I can't just use MEDIAN() at the bottom of an Excel column with so many titles.

So I whip out Excel's Pivot table, put my job titles in rows and annual salary in the values and then ... wait. What? No median? I can average, sum, min, max ... but no median? There might be some solutions that I haven't really checked out, because I figured I'd just do this in MySQL.

But I was able to solve this in about three minutes using Tableau. After importing my data, I set the Title on the Rows shelf and Annual Salary on the Text mark. I used the contextual menu for SUM(AnnualSalary) and changed the Measure to Median, as shown in the screenshot.

Once I had the data on the screen the way I wanted it, I went to the menus Worksheet > Export > Crosstab to Excel, which saved out the files as an Excel spreadsheet.

Of course, I can and did analyze and visualize the data within Tableau, but in this case I had a need to get the data out into another program as well.
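For anyone without Tableau, the same median-per-title rollup could be sketched in plain Python. This is not what I did for this project, just one possible approach. It assumes the data is a CSV with columns named Title and AnnualSalary (matching the field names above); the sample rows here are invented, and median_by_title is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict
from statistics import median

def median_by_title(rows):
    """Group rows by Title and return the median AnnualSalary for each."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Title"]].append(float(row["AnnualSalary"]))
    return {title: median(values) for title, values in groups.items()}

# Invented sample data standing in for the real salary export.
sample_csv = """Title,AnnualSalary
Clerk,30000
Clerk,32000
Clerk,90000
Analyst,50000
Analyst,54000
"""

medians = median_by_title(csv.DictReader(io.StringIO(sample_csv)))
for title, value in sorted(medians.items()):
    print(f"{title}: {value:,.0f}")
```

To run it against a real file, swap the io.StringIO(...) wrapper for an open file handle.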

Sunday, March 08, 2015

I'm writing this on the last day of #NICAR15, but I should've started this before #tapestryconf five days ago. Let this be a lesson to you on the way to Denver for #NICAR16. Take my advice:

Every night before you go to bed, take a couple of minutes to write down what you learned that day, and what you want to do with it. TiL and IWT for each and every day. Better yet, work on it all day. Do it on your phone if you have to. I'm doing this in the airport as I wait for my flight back to Austin, but I bet I miss stuff.

Getting to Tapestry 2015

TiL

That fog can bring an airport to its knees.

Tapestry 2015

TiL

The NYT is still awesome, but that's nothing new.

Meredith Broussard is full of energy and vigor, and it is contagious. Thank you for your enthusiasm.

Where is the YOU in my visualization? Chad Skelton gave a great talk at Tapestry about keeping the reader in mind when it comes to your presentations. Showing income? Add a comparison calculator.

Thursday, January 15, 2015

I haven't written in the ol' blog here in some time, and there has been LOTS going on. It all culminated this past week with the publication of Missed Signs, Fatal Consequences, an immersive story-telling, data-driven project by the Austin American-Statesman.

The project started with obtaining through public record requests abuse- and neglect-related child fatality reports from Child Protective Services. The documents, required since 2009, are only available as a PDF, and the content is not saved as data to be analyzed in any manner. Great law, but of little use if no one is using the reports.

Before I go any further, let it be known unto all the world that Andrew Chavez is the genius behind all the online development and design for this project, including that awesome data explorer. He's taken our online immersive template, rebuilt it with bionics and really made it sing. It's something we can build with and on into the future (including this weekend with another immersive project.)

The first thing we did (my contribution) was to create a Caspio database to collect certain fields from the 780 documents. Caspio is derided by many for good reason, but it is actually pretty good for this purpose ... to collect hand-entered information in a structured manner. (I would perform morally-suspect tasks for a JSON output direct from Caspio, though. Might even pay for it.) It would've been great if we could scrape the PDFs for our data, but the reporters were doing on-the-read analysis that couldn't be done programmatically. We also put all the documents in DocumentCloud to help with reporting (and for later use in our online data explorer and links within stories).

I made lots of changes to the forms as we went along, responding to requests and storylines found as the reporters read through the reports. We picked at it for a year, adding reports as they came out.

And then around summertime, the investigative team got serious. Stories were reported out by Andrea Ball and Eric Dexheimer, and they eventually went near full-time on the project. As stories gelled and sources were found, our visual folks Kelly West (video) and Laura Skelding (photography) started working their magic.

And Andrew. Wow. He created a template system that helped us wrangle all this mass of content (15 stories and about 100 images) into a clean, integrated, responsive, online masterpiece. I learned so much during this project just dabbling my little bit in the code ... I'm giddy with excitement about all the tech. Just love it.

What was that tech? Oh, gawd. Andrew should be listing all of it out, but I'll give it a go:

Backbone to bring our data across all pages, like with the child pop-ups where we reveal basic data and point to source documents when a child is mentioned.

Lots of grunt, including grunt-generator to bake out all our files into flat files. The finished project is entirely self-contained html/images/javascript with no server-side processing. It could run anywhere.

Friday, July 11, 2014

We're formalizing our news and data interactives work at the Statesman and we are looking for a developer to join Rob Villalpando and me to form a new News Interactives Team (or some other snazzy name we think up.)

If you read this story, and say, “Yep, that’s me” or “I want to make a difference like that,” then you are the kind of person I’m looking for. If you can buy into this philosophy and help us do some of the same type of work they do, then you are a good candidate. If you understand 80% of the posts on that blog, then you are probably a really good candidate. If you understand it AND were already a regular reader of that blog, then OMG CALL ME!

I’m looking for someone who loves to create and to share what they know and what they learned today. We’ll soak it up. We also have a lot to offer, so you’ll grow and learn, too.

Eagle Ford drugs did not break any new ground. In fact, it was a baby step for us built upon the work of others, inspired by the Globe and Mail's Magnetic North feature. We had a desire to get into this space, a good story by one of my UT-Austin students, Michael Marks, and some excellent photography by Jay Janner.

It was a method for us to discover the challenges we might face doing more of this kind of storytelling. It's really just the first step. My aim as the editor for our interactive team is to tell stories in completely different ways than a "long story with a bunch of pictures." I personally hold NPR's T-shirt project as seminal inspiration. We'll get there. (I could really use some help ... come help me build something bad ass.)

So, here is what we used:

Foundation: This is the responsive design HTML/CSS framework we used to build Eagle Ford drugs (and our XGames package and the Austin Homicide Project and some others). It gives us flexibility to focus more on the content than on how to present it. Our goal is to be mobile-first, and this code base allows us to do that quickly. I started with nothing on Tuesday and had everything finished by Friday, including some new development I hadn't done before.

Slick: A JavaScript plugin that allowed us to embed a swipeable? gallery within the story. I had a BUNCH of great photos that I wanted to include. Slick was one way to do this, but we'll keep experimenting with this and other JS plugins. This brought in quantity, but I didn't do that great a job in giving this multitude of great photos the play they deserved. We need BIGGER. Some kind of lightbox treatment or something. Also, we have some issues with Firefox on a desktop where the usability is a bit buggy.

The inset concept I lifted liberally from the aforementioned Globe and Mail presentation. I didn't see anything like this native in Foundation, so I looked at how they accomplished this and tried to build in that functionality to our Foundation framework. It worked for the most part, but there are some issues to work out with IE where our insets don't break the right margin of the text.

I used this project to build a template for future projects, but as you can see it still has some issues. I'm not sophisticated enough as a coder to share all this on Github, but we'll get there eventually. I'm sure there are plenty of other better places to start out there anyway.

Saturday, May 17, 2014

My Google Fusion Tables challenge this week was a single shapefile/KML that had about 25 different polygons in the same layer. When I converted it to KML, it put them all within the same tag. Fusion Tables would only show some of the shapes on the map, even though they all showed in the KML preview of the data row.

What the Fusion Tables map showed

What the KML preview from the data showed (which is correct)

So I put a question about it into the Fusion Tables API Users Google Group. Folks there explained there is a limitation with Fusion Tables (or maybe the Maps API) that shows only the 10 largest polygons if there are multiple in a Placemark in the KML file. They suggested I use “Singleparts to multipart” tool in QGIS to split the shapes.

Well, I tried that, but I got the following error and couldn't make it work no matter what I tried in that dialog.

So, I peeked at the KML file in a text editor. After all, it's really just an XML file, which is by nature structured, so I hoped I could figure out how it worked and split the file myself. I could see the Placemark and geometry tags wrapping very long runs of multiple polygon tags.

Word wrap is off, but the white line is really, really long.
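The hand edit I describe next could also be scripted. Here's a rough sketch in Python, not what I actually did. It assumes the KML uses the standard 2.2 namespace and that each oversized Placemark holds its shapes as Polygon elements inside a MultiGeometry; split_placemarks is a hypothetical helper name, and the limit of 10 comes from the advice I got in the Google Group.

```python
import copy
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # assumed: standard KML 2.2 namespace
ET.register_namespace("", KML_NS)          # keep output free of ns0: prefixes

def q(tag):
    """Return the namespace-qualified tag name ElementTree expects."""
    return f"{{{KML_NS}}}{tag}"

def split_placemarks(root, max_polygons=10):
    """Replace each Placemark holding too many Polygons with several copies,
    each carrying at most max_polygons of the original Polygons."""
    # Snapshot the tree first because we add and remove children as we go.
    for parent in list(root.iter()):
        for placemark in list(parent.findall(q("Placemark"))):
            multi = placemark.find(q("MultiGeometry"))
            if multi is None:
                continue
            polygons = multi.findall(q("Polygon"))
            if len(polygons) <= max_polygons:
                continue
            position = list(parent).index(placemark)
            parent.remove(placemark)
            starts = range(0, len(polygons), max_polygons)
            for offset, start in enumerate(starts):
                # Copy the whole Placemark so name, style, etc. carry over.
                clone = copy.deepcopy(placemark)
                clone_multi = clone.find(q("MultiGeometry"))
                for polygon in clone_multi.findall(q("Polygon")):
                    clone_multi.remove(polygon)
                chunk = polygons[start:start + max_polygons]
                clone_multi.extend([copy.deepcopy(p) for p in chunk])
                parent.insert(position + offset, clone)
    return root
```

With a file on disk, you would parse it with ET.parse(...), pass getroot() to split_placemarks and write the tree back out. A Placemark with 25 polygons would come out as three Placemarks holding 10, 10 and 5.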

So I duplicated the tag and then pulled out enough tags to make sure I had no more than 10 polygons within a specific Placemark. Luckily there wasn’t too much data to this shape, or that would’ve been a nightmare. The final KML looked like this: