Pandas with Jeff Reback - Episode 98

February 26, 2017

Summary

Pandas is one of the most versatile and widely used tools for data manipulation and analysis in the Python ecosystem. This week Jeff Reback explains why that is, how you can use it to make your life easier, and what you can look forward to in the months to come.

Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? Check out Linode at linode.com/podcastinit or use the code podcastinit2019 and get a $20 credit to try out their fast and reliable Linux virtual servers. They’ve got lightning fast networking and SSD servers with plenty of power and storage to run whatever you want to experiment on.

Inspired by Python and the topics discussed during the show? Feeling it’s something you want to try and master? Start by choosing the right tools! JetBrains delivers intelligent software solutions that make developers more productive by simplifying their challenging tasks, automating the routine, and helping them adopt the best development practices. JetBrains PyCharm, a Python IDE for Professional Developers, provides the complete set of tools for productive Python, Web, and Scientific development and Data Analysis. For listeners of the show, JetBrains is offering a free 3-month PyCharm Professional Edition individual subscription. Use the promo code podcastinit during checkout at jetbrains.com/pycharm.

Preface

Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.

I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.

When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.

When you’re writing Python you need a powerful editor to automate routine tasks, maintain effective development practices, and simplify challenging things like refactoring. Our sponsor JetBrains delivers the perfect solution for you in the form of PyCharm, providing a complete set of tools for productive Python, Web, Data Analysis and Scientific development, available in 2 editions. The free and open-source PyCharm Community Edition is perfect for pure Python coding. PyCharm Professional Edition is a full-fledged tool, designed for professional Python, Web and Data Analysis developers. Today JetBrains is offering a 3-month free PyCharm Professional Edition individual subscription. Don’t miss this chance to use the best-in-class tool with intelligent code completion, automated testing, and integration with modern tools like Docker – go to <www.pythonpodcast.com/pycharm> and use the promo code podcastinit during checkout.

Visit the site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.

To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers

Your host as usual is Tobias Macey and today I’m interviewing Jeff Reback about Pandas, the swiss army knife of data analysis in Python.

Interview

Introductions

How did you get introduced to Python?

To start off, what is Pandas and what is its origin story?

How did you get involved in the project’s development?

For someone who is just getting started with Pandas what are the fundamental ideas and abstractions in the library that are necessary to understand how to use it for working with data?

Pandas has quite an extensive API and I noticed that the most recent release includes a nice cheat sheet. How do you balance the power and flexibility of such an expressive API with the usability issues that can be introduced by having so many options of how to manipulate the data?

There is a strong focus for use in science and data analytics, but there are a number of other areas where Pandas is useful as well. What are some of the most interesting or unexpected uses that you have seen or heard of?

What are some of the biggest challenges that you have encountered while working on Pandas?

Do you find the constraint of only supporting two dimensional arrays to be limiting, or has it proven to be beneficial for the success of pandas?