Other sites

The Team Data Science Process

As more and more organizations are setting up teams of data scientists to make sense of the massive amounts of data they collect, the need grows for a standardized process for managing the work of those teams. To help with this, the data science team at Microsoft has drawn on their experience with large-scale data science projects to develop the Team Data Science Process. The process is built around this data science lifecycle:

The Team Data Science Process proposes a standardised directory structure for managing the data, code and documents for a data science project, and provides for tracking of those artifacts using a version control system such as Git. It also proposes a shared distributed analytics infrastucture to provide the computational and storage resources that the data scientist tools rely on. It also provides two open-source utilities to support data scientists: