This course will introduce the learner to the basics of the Python programming environment, including fundamental Python programming techniques such as lambdas, reading and manipulating CSV files, and the NumPy library. The course will introduce data manipulation and cleaning techniques using the popular Python pandas data science library, and introduce the abstraction of the Series and DataFrame as the central data structures for data analysis, along with tutorials on how to use groupby, merge, and pivot tables effectively. By the end of this course, students will be able to take tabular data, clean it, manipulate it, and run basic inferential statistical analyses.
This course should be taken before any of the other Applied Data Science with Python courses: Applied Plotting, Charting & Data Representation in Python, Applied Machine Learning in Python, Applied Text Mining in Python, Applied Social Network Analysis in Python.

SI

Overall a good introductory course in Python for data science, but I feel it should have covered the basics in more detail, especially for those who do not have any prior programming background.

GS

Feb 20, 2017

Rating: 5 out of 5 stars

This course was fast-paced, but the material was interesting and not too complex. I would recommend this course to anyone who is interested in data science and already has a basic knowledge of Python.

From the lesson

Week 1

In this week you'll get an introduction to the field of data science, review common Python functionality and features that data scientists use, and be introduced to the Coursera Jupyter Notebook for the lectures. All of the course information on grading, prerequisites, and expectations is in the course syllabus, and you can find more information about the Jupyter Notebooks on our Course Resources page.

Taught by

Christopher Brooks

Transcript

Welcome to an introduction to Data Science with Python. This course is the first of five in a larger Python and Data Science Specialization. Each course builds on the knowledge from the previous ones to give you a well-rounded view of what data science is, while helping you develop the skills to practice it. The specialization is at an intermediate level of difficulty, and we expect that you have studied some basic programming and statistics in the past. In this specialization, we're focused on teaching applied skills using the Python programming language. There are many other tools one can use in data science, such as specialized statistical analysis languages like R, or more general-purpose programming languages like Java and C. We chose Python as the basis for this specialization for three reasons. First, it's easy to learn. Python is now the language of choice for introducing university students to programming: it's used in eight out of 10 of the top computer science programs in the US. Python programs tend to have less of the boilerplate you might have seen in other languages, and they have more natural constructs for the typical tasks you need to accomplish. If you have programming experience, but not Python-specific experience, you can pick up Python very quickly. Second, it's full-featured. Python is a very general programming language with a lot of built-in libraries, and it excels at manipulating data, network programming, and working with databases. It's mature, and there are plenty of resources available, from books to online courses. Finally, Python has a significant set of data science libraries one can use. The base of these is called the SciPy ecosystem, and it even has its own conference series.
Both the interface we're going to use for assignments, called Jupyter Notebooks, and the main libraries for the first two courses, pandas and Matplotlib, are part of the SciPy stack, and they provide an excellent basis for moving into machine learning, text mining, and network analysis. This first course is broken into four modules. The first module focuses on getting prerequisites in place and reviews some of the basics of the Python language. Don't worry: if you already have Python down and want to be challenged, we have some advanced Python in here as well. The advanced Python isn't strictly necessary for the rest of the specialization, but many examples you might see on the web, or in broader data science topics such as big data and real-time analytics, may require knowledge of some of these more specialized features. In the second module we're going to dig into the pandas toolkit. The pandas toolkit is fundamental to Python data science and provides a data structure for thinking about data in tabular form. It helps bring functionality that exists in R into the Python world, and it's seen significant adoption over the last five years. Much of the thinking behind pandas is similar to relational theory, so if you have a background in databases, you'll find the pandas environment fairly natural to work in. At the same time, some of the more advanced ways to query and manipulate pandas DataFrames, like boolean masking and hierarchical indexing, are different from what you find in databases and require some careful discussion, so we'll cover these in module three of this course. The final module of the course is dedicated to the course project, where you'll take some datasets, merge and clean them, then process the data and answer some questions. In that week we'll also discuss basic statistical tests and methods, to ensure you have a solid grasp going forward into the next course.
At the same time, the intent is for your course project to demonstrate the skills you've gained in turning messy data into something coherent. Before we go into programming fundamentals, though, we'll talk a bit more about what data science is and why it's sweeping the world.
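As a rough illustration of the lecture's point that pandas resembles relational databases while boolean masking works differently, here is a minimal sketch that assumes a tiny made-up table (the column names and values are hypothetical, not course data). A boolean mask is a Series of True/False values, one per row; indexing a DataFrame with it keeps the True rows, which plays roughly the role of a SQL `WHERE` clause. `pd.merge` with `how="inner"` then behaves much like a SQL inner join. This is only a preview; the course covers these techniques properly in later modules.

```python
import pandas as pd

# Hypothetical example data: a small table of students and their scores
df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Chen", "Dee"], "score": [88, 72, 95, 60]}
)

# A boolean mask: one True/False value per row of df
mask = df["score"] >= 75

# Indexing with the mask keeps only rows where it is True,
# roughly like SQL: SELECT * FROM df WHERE score >= 75
passed = df[mask]

# Hypothetical second table to join against
courses = pd.DataFrame({"name": ["Ana", "Chen"], "course": ["Stats", "ML"]})

# Relational-style join on the shared "name" column, like a SQL INNER JOIN
joined = pd.merge(passed, courses, on="name", how="inner")
print(joined)
```

The mask is an ordinary object you can inspect, combine, or reuse, which is part of why pandas querying feels different from writing a SQL statement.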