Acquire and ingest data from files, data stores, and directly from the Web

Clean, munge, and manipulate data into shape so that it is ready for analysis

Conduct analyses that draw insights from the data

Determine and apply the most appropriate model to your data

Interpret the results of your analysis and modeling

Communicate your results through a visualization, report, or application

About

As increasing amounts of data are generated each year, the need to analyze and operationalize that data is more important than ever. Companies that know what to do with their data will have a competitive advantage over companies that don't, and this will drive higher demand for knowledgeable and competent data professionals.

Starting with the basics, this book will cover how to set up your numerical programming environment, introduce you to the data science pipeline (an iterative process by which data science projects are completed), and guide you through several data projects in a step-by-step format. By sequentially working through the steps in each chapter, you will quickly familiarize yourself with the process and learn how to apply it to a variety of situations with examples in the two most popular programming languages for data analysis—R and Python.

Features

Learn about the data science pipeline and use it to acquire, clean, analyze, and visualize data

Understand critical concepts in data science in the context of multiple projects

Expand your numerical programming skills through step-by-step code examples and learn more about the robust features of R and Python

Authors

Tony Ojeda

Tony Ojeda is an accomplished data scientist and entrepreneur, with expertise in business process optimization and over a decade of experience creating and implementing innovative data products and solutions. He has a master's degree in finance from Florida International University and an MBA with a focus on strategy and entrepreneurship from DePaul University. He is the founder of District Data Labs, is a cofounder of Data Community DC, and is actively involved in promoting data science education through both organizations.

Sean Patrick Murphy

Sean Patrick Murphy spent 15 years as a senior scientist at The Johns Hopkins University Applied Physics Laboratory, where he focused on machine learning, modeling and simulation, signal processing, and high-performance computing in the cloud. He now acts as an advisor and data consultant for companies in San Francisco, New York, and Washington DC. He graduated from The Johns Hopkins University and earned his MBA from the University of Oxford. He currently co-organizes the Data Innovation DC meetup and cofounded the Data Science MD meetup. He is also a board member and cofounder of Data Community DC.

Benjamin Bengfort

Benjamin Bengfort is an experienced data scientist and Python developer who has worked in the military, industry, and academia for the past eight years. He is currently pursuing his PhD in Computer Science at the University of Maryland, College Park, doing research in Metacognition and Natural Language Processing. He holds a Master's degree in Computer Science from North Dakota State University, where he taught undergraduate Computer Science courses. He is also an adjunct faculty member at Georgetown University, where he teaches Data Science and Analytics. Benjamin has been involved in two data science startups in the DC region, leveraging large-scale machine learning and Big Data techniques across a variety of applications. He has a deep appreciation for the combination of models and data for entrepreneurial effect, and he is currently building one of these startups into a more mature organization.

Abhijit Dasgupta

Abhijit Dasgupta is a data consultant working in the greater DC-Maryland-Virginia area, with several years of experience in biomedical consulting, business analytics, bioinformatics, and bioengineering consulting. He has a PhD in biostatistics from the University of Washington and over 40 collaborative peer-reviewed manuscripts, with strong interests in bridging the statistics/machine-learning divide. He is always on the lookout for interesting and challenging projects, and is an enthusiastic speaker and discussant on new and better ways to look at and analyze data. He is a member of Data Community DC and a founding member and co-organizer of Statistical Programming DC (formerly R Users DC).
