Exploratory Analysis

(200 hours)

Once you've completed Foundations, fill out an application and we'll have an admissions call to learn more about you. If accepted, you can enter the Exploratory Analysis section. You'll join other students with more programming background, and get help from teaching assistants when needed.

Work on projects with tech companies

Automate Updates and Enrichment of Client CRM

Details

Discrete Data Solutions provides software and data solutions to clients facing technology hurdles. In this instance, their client was a consulting firm with a large CRM, filled with thousands of outdated contacts. The client asked for a method to automate the updating of the contact information and addition of data from various social media sources.

To address this problem, Michael developed:

Several automated web-scraping tools

A data-processing pipeline that integrates data from multiple sources

A series of ML record-matching models that leverage word vectorization, Levenshtein and Mallows distances, and several other methods

An email-format predictor built on supervised learning models, trained on the email addresses of other employees at the client company and labeled by when each employee joined
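
The record-matching step can be sketched with a minimal edit-distance matcher. This is a pure-Python illustration, not Michael's actual code; the function names and the distance threshold are invented for the example:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def match_contact(stale_name: str, candidates: list[str], max_distance: int = 3):
    # Return the closest candidate by edit distance (case-insensitive),
    # or None if nothing is within the threshold.
    best = min(candidates, key=lambda c: levenshtein(stale_name.lower(), c.lower()))
    if levenshtein(stale_name.lower(), best.lower()) <= max_distance:
        return best
    return None
```

In a real pipeline this fuzzy matcher would be one signal among several (word vectors, exact-field matches) when linking scraped records back to CRM contacts.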

Tasks

Web Scraping

Data Cleaning and Processing

Record Linkage

Social Media Mining

Email Format Prediction

Michael Nemke, DS Consultant, Discrete Data

Examining Treatments and Healing Outcomes

Details

Parable reduces the cost & inconvenience of wound care through a platform that allows healthcare providers to better measure, monitor, and manage their wound patients. The platform facilitates care coordination and offers a mechanism to catch wound complications early. Specifically, Parable checks for clinical indicators of expected healing/development, captures a visual time lapse of the progression, and reminds patients (in ambulatory settings) about proper care instructions to ensure adherence.

For each case, a patient might check in multiple times. We used data wrangling to capture the change in the wound between check-ins, then modeled that change with the XGBoost implementation of gradient boosted trees: first to classify each treatment as improving or worsening the wound, and then to quantify the expected improvement, with the goal of recommending the best possible treatment for each case.
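
A minimal sketch of the modeling step, using scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost. The check-in data below is entirely synthetic; the real features would come from Parable's clinical indicators, which are not specified here:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic check-in data: wound area (cm^2) at a visit, plus a toy
# binary "treatment" feature. Treatment 1 tends to shrink the wound
# in this fabricated data set.
n = 200
area_before = rng.uniform(1.0, 10.0, n)
treatment = rng.integers(0, 2, n)
area_after = area_before * (1 - 0.3 * treatment + rng.normal(0, 0.05, n))

X = np.column_stack([area_before, treatment])
y = (area_after < area_before).astype(int)  # 1 = wound improved between check-ins

# Gradient boosted trees classify whether a treatment leads to improvement.
model = GradientBoostingClassifier().fit(X, y)
```

The same feature construction (per-check-in deltas) feeds the regression variant that quantifies how much improvement to expect.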

Tasks

Data Wrangling

Exploratory Data Analysis

Problem Formulation

Time Series Analysis

Classification

Gabriel Cypriano, Data Scientist, Creditas

Product Taxonomy Classification for Retailers

Details

Havenly provides remote interior design solutions for users through their online platform. The design process includes product selection from the catalogs of partner retailers. The product classifications provided are often inconsistent, because each partner retailer maintains its own hierarchy for product classification.

To address this problem, a process was developed for taxonomy classification using product descriptions provided by partner retailers. Natural language processing tools were used in conjunction with a support vector machine classifier to achieve an accuracy score of 0.99 on the test data set.
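
The described pipeline roughly corresponds to a TF-IDF + linear SVM setup in scikit-learn. The tiny catalog below is invented for illustration; the real training set would be thousands of retailer product descriptions mapped onto a shared taxonomy:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical product descriptions with shared-taxonomy labels.
descriptions = [
    "mid-century walnut coffee table",
    "oak dining table with four chairs",
    "linen throw pillow with tassels",
    "velvet lumbar pillow cover",
    "brass floor lamp with linen shade",
    "ceramic pendant lamp, white glaze",
]
categories = ["tables", "tables", "pillows", "pillows", "lighting", "lighting"]

# Vectorize free-text descriptions, then fit a linear SVM classifier.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(descriptions, categories)

print(clf.predict(["walnut dining table"])[0])
```

With enough labeled descriptions per category, this kind of pipeline can reach very high test accuracy, which is consistent with the 0.99 reported above.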

Tasks

Natural Language Processing

Taxonomy Classification

Christopher McLaughlin, BI Analyst, Pinnacle Agriculture

Get help from your data science mentor

Learn fast with an experienced data scientist there to guide you each step of the way.

These are example questions you might ask:

"I'm a bit fuzzy on concept X, can you share practical use cases for it?"

"Here's a machine learning project I'm working on. Can you review it and give me pointers on what to change or improve?"

Anything that helps you become a better data scientist is fair game.

Read what alumni have to say about us

Daniel Cardella, Portfolio Manager, KLR Group

"I was in the first cohort and have witnessed the development of the Curriculum since I first began and believe it to be continuing to improve from a very good to an excellent base. The highlight of the program, undoubtedly, was the mentorship experience. Meeting with my mentor twice a week for numerous months while I coded my projects has proven to be invaluable in shortening the learning cycles. The program surpassed my already high expectations."

William Ryan, Research Associate, U.C. Berkeley

"They have produced an excellent curriculum which teaches data science effectively — lectures and exercises get you familiar with the material, and then project-based work helps you apply it. The mentors and TAs were responsive and helpful, and proactive in offering help and advice. By the end of the program, you’ll be familiar with pretty much every tool and technique used by data scientists in their day-to-day work."

Fidel Cuevas, Quantitative Developer, UBS

"You'll work with Python and learn to use machine learning to predict a variety of outcomes in computer vision, forecasting, clustering, and classification. For some perspective: I took approximately 4 months to finish the curriculum and found a new role with a substantial increase in compensation after just 22 days of searching. This is a great option for any student or professional with the motivation to do great work."

What you need to get accepted and hired

You could be hired as a data analyst, software engineer, or applied scientist, among many other careers.

A quantitative academic degree

Most companies prefer candidates with strong academic coursework and research experience. An MS degree or PhD is required for most positions.

Experience with computer programming

You do not need professional experience; however, you should have spent time on your own learning and building programs.

Live in or be willing to move to a major tech hub

Approximately 80% of data science positions are located in the metropolitan areas of San Francisco and New York City. Another 15% are located in and around Boston, Chicago, Seattle, Washington DC, and Southern California (Los Angeles and San Diego). The remaining 5% are scattered throughout the country at large corporations, mid-sized companies, consulting firms and tech startups. If you are not in a large tech hub, you should be open to relocation in order to secure a data science role.