Getting Started with Python [Part 1]

Python is a splendid, flexible, open source language that is easy to learn, easy to use, and has powerful libraries for data analysis and data science. I have been meaning to write this article for over a year now. It is looooooong overdue. Time flies when you’re having fun exploring the constantly evolving world of analytics. Along with the weeds in my garden, there seems to be an endless list of neglected article topics piling up. Python is one of them! Without any further delay, let’s get started.

Why Python

Earlier this year, I wrote about top programming languages to learn. Python consistently ranked highly. In analytics and big data realms, it is one of the most popular programming languages in the world. Python is a general-purpose programming language with rich libraries such as scikit-learn for analytical and quantitative computing. It is used in scientific computing and highly quantitative domains such as finance, oil and gas, and physics.

Although I have tried PyCharm and liked it, I currently use the free version of Anaconda by Continuum Analytics. Anaconda Navigator's UX makes it simple to manage analytic environments (even RStudio), launch interactive IPython Jupyter notebooks, and find samples, training material, and community events.

It is also simple to set up, and to install (conda install x) or update (conda update x) commonly used analytics packages.
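After a conda install, one way to confirm which of the libraries described below are available in the active environment is to ask Python itself. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later). The package list and the helper name installed_versions are my own illustrative choices, and a package installed without Python packaging metadata may be reported as missing even though it imports.

```python
from importlib import metadata

# Distribution (install) names, which can differ from import names:
# for example, scikit-learn is imported as sklearn.
PACKAGES = ["pandas", "numpy", "scipy", "statsmodels", "matplotlib",
            "seaborn", "scikit-learn", "bokeh", "datashader"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if not found."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:15} {version or 'not installed'}")
```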

Pandas makes it easy to work with data and with data tables called DataFrames. If you have worked with data frames in R or Spark, DataFrames in Python will feel familiar.

NumPy is used for scientific computing. It is fast but not as easy to use as Pandas.

SciPy has statistics functions, along with functions for mathematics, science, and engineering.

Statsmodels is for statisticians. It has functions for exploring data, estimating statistical models, and performing descriptive statistics.

Matplotlib is a plotting library for the Python programming language.

Seaborn is built on Matplotlib and is used with it for better-looking visualizations.

Scikit-learn is for data science. It includes functions for preprocessing, supervised and unsupervised machine learning algorithms, model selection, and more.

Bokeh is a Python interactive visualization library.

Anaconda's datashader is for big data visualizations. This totally awesome library can be used in conjunction with Bokeh. It is amazing! With datashader, you can visualize millions or billions of points with no downsampling required. Stay tuned: datashader will get a dedicated blog post soon. Check out the white paper by Dr. James Bednar.
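To give a feel for how the first two libraries in that list fit together, here is a minimal pandas and NumPy sketch. The column names and sales figures are made up for illustration, and it assumes both packages are installed (the full Anaconda distribution includes them).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: monthly sales for two regions.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "sales": [120.0, 135.0, 98.0, 143.0],
})

# NumPy functions apply element-wise to a pandas column (vectorized log transform).
df["log_sales"] = np.log(df["sales"])

# pandas handles table-style work such as grouping and aggregating.
summary = df.groupby("region")["sales"].agg(["mean", "sum"])
print(summary)

# describe() gives quick descriptive statistics (count, mean, std, quartiles, ...).
print(df["sales"].describe())
```

Grouping by region returns one row per region (East and West, sorted), with the mean and total sales as columns.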

Unlike other getting-started-with-Python articles, I won't explain the Python 2 versus Python 3 decision; I'll let you look that up and decide for yourself. I have been staying far away from comparison articles these days.
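Whichever version you settle on, it helps to confirm which interpreter a notebook kernel is actually running. A small standard-library sketch follows; the helper name describe_interpreter is just an illustrative choice.

```python
import platform
import sys


def describe_interpreter():
    """Return the interpreter implementation and version, e.g. 'CPython 3.x.y'."""
    major, minor, micro = sys.version_info[:3]
    return f"{platform.python_implementation()} {major}.{minor}.{micro}"


# Run this in a notebook cell to see what the current kernel is using.
print(describe_interpreter())
```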

Other good resources that I found when getting started with Python include:

Python in Action

To see Python working, you can run commands in a command prompt, an IDE, or a notebook. I personally enjoy using IPython notebooks in Jupyter. Let's start with the programming classic "Hello World".

In Anaconda Navigator, click Launch on the Jupyter option. Then navigate to Files and choose New > Python 3 kernel.

IPython notebooks in Jupyter

When your first notebook is displayed, I highly recommend navigating to Help > User Interface Tour first to see how to interact with the notebook, run and cancel commands, get help, etc.

IPython Jupyter tour

Now go ahead and type in print("Hello World") and press Ctrl-Enter or click the arrow button to run the code in that cell. The output of your code, or an error message, is displayed below the cell.

Python Hello World
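Here is that first cell typed out, followed by a slightly longer cell. Note the straight quotes: curly quotes pasted from a word processor or web page make Python raise a SyntaxError. The variable names in the second part are only illustrative.

```python
# Straight double quotes are required; typographic quotes cause a SyntaxError.
print("Hello World")

# A cell can hold several statements; they run top to bottom when you press Ctrl-Enter.
name = "Jupyter"
greeting = f"Hello {name}"
print(greeting)
```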

You can continue typing in more Python code snippets, or open example notebooks (*.ipynb files) with File > Open and run them. I find running samples

Since "Hello World" is not thrilling for my data-savvy analytics audience, let's see a few analytics-related examples. In future articles in this series, I'll cover Python analytics libraries in more depth.

Jen Underwood is a Senior Director at DataRobot and founder of Impact Analytix, LLC. She has a unique blend of product management and “hands-on” experience in data warehousing, reporting, visualization, and advanced analytics. In addition to keeping a constant pulse on industry trends, she enjoys digging into oceans of data to solve complex problems with machine learning.
Over the past 20 years, Jen has held worldwide product management roles at Microsoft and served as a technical lead for system implementation firms. She has experience launching new products and turning around failed projects. Most recently she provided advisory, strategy, educational content development, and marketing services to 100+ technology vendors through her own firm. She has been mentioned by KD Nuggets, Information Management and Forbes for her work. She also has written for InformationWeek, O’Reilly Media, and numerous other tech industry publications.
Jen has a Bachelor of Business Administration – Marketing, Cum Laude from the University of Wisconsin, Milwaukee and a post-graduate certificate in Computer Science – Data Mining from the University of California, San Diego. She was also honored to be a former IBM Analytics Insider, Tableau Zen Master, and Top 10 Women Influencer.