Data Science Radar – Data Wrangler Profile

I started off as a Product Analyst doing a bit of this, a bit of that, but moved into a Data Analyst role in search of more data. My Data Analyst role swiftly moved into a Business Intelligence role where I tackled data integration and reporting challenges. I rose through the ranks, mentoring others, and started working on more predictive tasks using R. Still with a strong data platform focus, I try to help others get more value out of their data.

2. How would you describe what a Data Wrangler is in your own words?

A data wrangler knows how to integrate data from multiple sources, solve common transformation problems, and resolve data quality issues. A great data wrangler not only knows their data but helps the business enrich the data.

Not really! Wrangling data is where I started out and it remains a strong foundation upon which I build my other skills. I also use data wrangling as my key to help pick up and learn new languages – I have a strong grasp of the theory and design patterns so I can see how a language maps to those patterns, thus shortening the learning curve.

4. Is knowing this information beneficial to shaping your career development plan? If so, how?

I build my other skills upon data wrangling so I need to stay sharp. At the moment I’m focussing quite extensively on my communication and visualisation skills, to help me better convey the value I can help people get from their data.

5. How do you apply your skills as a Data Wrangler at Mango Solutions?

I’m a data wrangler but my skill set is quite broad, so I beef up the DataOps component of the consultancy. I’m able to help build data-centric solutions that deliver results, and most of the time do a reasonable job of explaining their value! Recently, I’ve also been delivering a lot of Data Wrangling focused training, including a Working with Databases workshop at LondonR.

6. If someone wanted to develop their Data Wrangler skills further, what would you recommend?

It can be tough, but the best way is handling as much dirty-ish data as possible. The sorts of cleansed, small, narrow data-sets we see most R examples run with do not allow you to learn how to wrangle data effectively – a good place to start for dirty-ish data is web scraping or probably some of the data you have in your company!

7. Which of your other highest scoring skills on the Radar complements a Data Wrangler skill set and why?

I think Data Wrangling is in some respects a composite skill with perhaps modelling being the least contributing skill. To be a good data wrangler you need to understand the business requirements (communication), you need to program your wrangling effectively (programming), understand the limitations of your platforms (technologist), and present back your results concisely (visualiser).

8. What cool data wrangler techs are there at the mo?

Not cool, but relational databases and SQL continue to hold the lead in data wrangling, with things like Impala making it easy to run SQL over Hadoop. In the non-relational world, Drill is proving quite interesting. For me, R continues to be a valuable asset for ETL. I’m always on the hunt for more – so do suggest one!