How to Build a Career in Data Science

What is Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge or insights from data in either structured or unstructured forms.

From scientific discovery to business intelligence, data science is changing our world. The dissemination of nearly all information in digital form, the proliferation of sensors, breakthroughs in machine learning and visualization, and dramatic improvements in cost, bandwidth, and scalability are combining to create enormous opportunity.

The field also presents enormous challenges, thanks to the relentless increase in the volume, velocity, and variety of information ripe for mining and analysis.

“Data scientist” has become a popular occupation with the Harvard Business Review dubbing it “The Sexiest Job of the 21st Century” and McKinsey & Company projecting a global excess demand of 1.5 million new data scientists.

How to Build Your Profile for MS in Data Science?

Kaggle is a platform for predictive modelling and analytics competitions in which companies and researchers post data and statisticians and data miners compete to produce the best models for predicting and describing the data. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know at the outset which technique or analyst will be most effective.

Kagglers come from a wide variety of backgrounds, including fields such as computer science, computer vision, biology, medicine, and even glaciology. The community also includes many of the world’s best-known researchers, including members of IBM Watson’s Jeopardy-winning team and the team working on Google’s DeepMind. Many of these researchers publish papers in peer-reviewed journals based on their performance in Kaggle competitions.

How Do Kaggle Competitions Work?

Companies and organizations prepare the data and a description of the problem. Kaggle helps the host frame the competition, anonymize the data, and integrate the winning model into its operations.

Participants, like you, experiment with different techniques and compete against each other to produce the best models. Work is shared publicly through Kaggle Scripts to achieve a better benchmark and to inspire new ideas. Submissions are made through Scripts or through private manual upload. For most competitions, submissions are scored immediately (based on their predictive accuracy relative to a hidden solution file) and summarized on a live leaderboard.
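To make the scoring step concrete, here is a minimal sketch of how submissions could be scored against a hidden solution file and ranked on a leaderboard. The participant names, row IDs and labels are made up for illustration, and plain accuracy is only an assumption here: each real competition defines its own evaluation metric.

```python
# Hypothetical hidden solution file: row id -> true label.
hidden_solution = {1: 0, 2: 1, 3: 1, 4: 0, 5: 1}

# Hypothetical submissions from two participants: row id -> predicted label.
submissions = {
    "alice": {1: 0, 2: 0, 3: 0, 4: 0, 5: 1},
    "bob": {1: 1, 2: 1, 3: 1, 4: 0, 5: 1},
}


def accuracy(predictions, solution):
    """Fraction of solution rows that the submission predicts correctly."""
    correct = sum(
        1 for row_id, truth in solution.items()
        if predictions.get(row_id) == truth  # a missing row counts as wrong
    )
    return correct / len(solution)


def leaderboard(all_submissions, solution):
    """Score every submission and sort from best to worst."""
    scores = {
        name: accuracy(preds, solution)
        for name, preds in all_submissions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for rank, (name, score) in enumerate(leaderboard(submissions, hidden_solution), start=1):
    print(rank, name, score)
```

Participants never see the hidden labels themselves, only the resulting score, which is what keeps the comparison fair.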

After the deadline passes, the host company pays the prize money for the winning solution. Many companies recruit participants based on their place on the leaderboard, final score, and submitted scripts.

Alongside its public competitions, Kaggle also offers private competitions limited to Kaggle’s top participants.

What Kaggle competition should a beginner start with?

I’d start with the tutorials first just to make sure you have a good grasp of the primary tools and techniques that most people use: https://www.kaggle.com/wiki/Home

Afterwards, Titanic: Machine Learning from Disaster is a good competition to start with. It will prep you with the fundamentals of data science – the data size is manageable, the problem is interesting, and the computational requirements are minimal.

Since your objective is learning, the most important place for you is the Kaggle forum. There is a ton of valuable information buried in those posts: what worked, what didn’t work, the issues others are facing, interesting patterns and visualizations, and neat tricks. I find it to be the best “practical” data science guide out there.

Career in Data Science

A career in Data Science involves statistics, mathematics, business, economics and Computer Science.

After a Master’s in Data Science, you can work in various sectors such as finance, healthcare, consulting, retail or consumer products – basically any field where there is lots of data and there is a requirement to analyze large data sets to develop custom models and algorithms to drive business solutions.

In Data Science, the primary focus is on applications rather than research. You use some knowledge from Computer Science (data structures, deep learning, computer vision, natural language processing, machine learning) in your data science role.

The average salary for a job in Data Science in the US is about $113,000 as per Glassdoor. Another source – Payscale – puts the median salary at about $93,000.

Let’s have a look at the application of data science in different fields.

#1 Data Science in Retail

With online commerce, retail data is increasing exponentially in volume, in the velocity at which it is generated, and in its value for the insights and profit it could offer. As per McKinsey’s report on Big Data, retailers using big data analytics could raise their operating margins by as much as 60 percent.

The following points are a few of the applications of big data in retail:

#2 Data Science in Health Care

In the US, health care expenses represented 17.6% of GDP in 2013, with annual spending of $2.6 trillion, and that share is estimated to rise to nearly 20% by 2020. Of the $2.6 trillion, $600 billion was consumed by waste and fraud.

Big Data has the potential to help physicians make better decisions across the board – from personalized treatments to preventive care, while, at the same time, slashing the cost of providing health care services.

The following list details some of the applications of big data in health care:

Genomics: Inexpensive DNA sequencing and next-generation genomic technologies are changing the way health care providers do business. Providers are gaining a better understanding of the genetic bases of drug response and disease by combining genomic data with other data in disease research.

Patient monitoring and home devices: Wearable body sensors – tracking everything from heart rate to testosterone to body water – can take patients’ vital stats every minute of the day. Personal ECG heart monitors, medical monitoring devices and mobile applications are cropping up daily.

#3 Data Science in Finance

There has been a flood of financial data in recent times from various sources such as social media activity, mobile interactions, server logs, real-time market feeds, customer service records, transaction details and, of course, information from existing databases.

The following list details some of the applications of big data in finance:

Sentiment analysis: Use natural-language processing, text analysis and computational linguistics to discover what people really think.

Automated risk credit management: Alibaba has successfully used big data to offer loans to entrepreneurial online vendors without any collateral by using their transaction records, customer ratings, shipping records and a host of other info.

Predictive analytics: For example, predict whether certain customers are likely to pay off their credit cards, using the demographic characteristics of customers’ neighborhoods.
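As a rough illustration of the sentiment analysis item above, here is a minimal lexicon-based sketch. The word lists are tiny, hand-picked and purely hypothetical; production systems typically rely on trained language models rather than a fixed vocabulary.

```python
import re

# Hypothetical, hand-picked word lists used only for this illustration.
POSITIVE = {"great", "love", "reliable", "helpful"}
NEGATIVE = {"terrible", "hate", "slow", "hidden"}


def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def label(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("I love the new app, support was helpful"))
print(label("Terrible fees and slow transfers"))
```

A word-counting approach like this misses negation and sarcasm, which is one reason real pipelines lean on natural-language processing models.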

#4 Big Data in Telecom

Mind Commerce, a market research firm, predicts that the big-data-driven telecom analytics market will grow by nearly 50 percent from 2014 to 2019 and forecasts that by the end of 2019, the market will be up to $5.4 billion in annual revenue.

Location-based initiatives: Using geo-fencing and sensor technology, data scientists can predict a subscriber’s location and specific data needs with stunning accuracy, for example to create targeted offers when a subscriber is in a supermarket.

Churn prevention: Combine variables such as calls made, minutes used, number of texts sent and average bill amount with behavior such as visiting a competitor’s website to predict the likelihood of a subscriber switching to a competitor for bargains.
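The churn idea above can be sketched as a simple logistic score that combines those variables into a probability. The weights, bias and subscriber records below are invented for illustration; a real model would learn its weights from historical subscriber data.

```python
import math

# Hypothetical weights, hand-picked for illustration only.
WEIGHTS = {
    "calls_made": -0.01,
    "minutes_used": -0.002,
    "texts_sent": -0.005,
    "avg_bill": 0.02,
    "visited_competitor_site": 1.5,
}
BIAS = -1.0


def churn_probability(subscriber):
    """Weighted sum of the variables squashed into (0, 1) by the logistic function."""
    z = BIAS + sum(WEIGHTS[name] * subscriber[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up subscribers: a heavy user, and a light user with a high bill
# who has visited a competitor's website.
loyal = {"calls_made": 200, "minutes_used": 600, "texts_sent": 300,
         "avg_bill": 40, "visited_competitor_site": 0}
at_risk = {"calls_made": 20, "minutes_used": 50, "texts_sent": 10,
           "avg_bill": 90, "visited_competitor_site": 1}

print(round(churn_probability(loyal), 3))
print(round(churn_probability(at_risk), 3))
```

Subscribers whose score crosses a chosen threshold could then be targeted with retention offers.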

There are similar applications of big data in other domains such as Utilities, Travel and Transportation, Insurance, Pharmaceutical, Manufacturing, Gaming, Hospitality, Biotech and Energy.

Let’s quickly compare a career in Data Science with a career in Machine Learning.

Career in Machine Learning

Machine learning is the study of how computers can learn complex concepts from data and experience, and seeks to answer the fundamental research questions underpinning the challenges outlined above.

The field of machine learning crosses a wide variety of disciplines that use data to find patterns in the ways both living systems, such as the human body, and artificial systems, such as robots, are constructed and perform.

Whether it’s being applied to analyze and learn from medical data, or to model financial markets, or to create autonomous vehicles, machine learning builds and learns from both algorithm and theory to understand the world around us and create the tools we need and want.

In a Machine Learning job, you are expected to solve new and emerging technical challenges related to human-machine interactions.

In your role, you will utilize core computer science and engineering skills like high-performance computing, distributed systems and applied math.

You are expected to have 5+ years of experience in programming parallel and distributed systems, debugging low-level problems, performance analysis and optimizations, and numerical methods.

Employers also look for experience in using machine learning techniques for classification, regression, or ranking problems; experience in building predictive models for recommendations or personalization; and experience in designing, implementing and shipping innovative consumer products.

How to shortlist universities for MS in Data Science?

Important factors for identifying the best universities in machine learning / data science

1) University reputation (rankings)

This factor is important in general but more so for data science programs. This is because most of them are relatively new, i.e. around 2-4 years old, and it’s difficult to establish credibility in the industry in such a short time. Thus, the university brand name plays a key role in how your candidature will be perceived in the industry after completing the degree. No doubt your knowledge will always matter more, but university reputation plays a crucial role for new courses.

2) Location

Location plays a pivotal role in practical learning opportunities outside the campus. Practical training typically comes in the form of internships, capstone projects, weekend hackathons, etc. Given that data science is a highly application-oriented domain, practical training will play a crucial role in your overall development.

While you are in the program, its location can have quite an impact on your profile in terms of getting good internship opportunities. Also, a strong data science community gives access to specialized skill meetups and hackathons. For instance, the data science communities in places like New York or Silicon Valley will be much stronger than those in suburban locations.

After the program, a good location definitely helps with the job search, as there will be ample employment opportunities.

3) Curriculum

I believe this is the most important aspect and the first thing you should check out. The curriculum tells you what subjects you’ll be studying and gives an impression of how relevant the program is for you. Typically, coursework is divided into core courses (compulsory courses) and electives. You should also check out the list of courses from which you can choose the electives. Curriculum flexibility, i.e. the share of elective courses, is another important factor. It can vary from as high as 60-70% in some programs to almost none in others.

4) Industry Collaborations

Since most data science programs are professional degrees, industry collaborations will play a key role in your experience through the program. You should check which companies are involved, which domains they belong to, and what sort of activities are conducted, such as technical talks, research collaborations and capstone projects.

5) First Hand Experience

The first step is to visit the university’s website and have a look at the details of the program. You can do a first level of filtering based on the information available on the website. But an equally important step is to talk to people who are already studying there, as well as to the university’s alumni. You can definitely apply to all the colleges you like, but for making the final choice, I can’t over-emphasize the importance of this step, which will give you a true picture of the college administration and the program’s recognition in the industry. These factors are really hard to judge from any university’s website. Also, given that these programs are mostly new, the amount of discussion on third-party websites like Quora is also limited. If you’re wondering how to find these people, LinkedIn and Facebook are your best friends!

6) Program name is not so important!

The traditional philosophy – ‘Don’t judge a book by its cover’ – works in this case as well. Since Data Science (and Machine Learning) is a non-traditional program, you’ll find all sorts of names like Masters in Analytics, Masters in Business Analytics, Masters in Data Science, Masters in Predictive Analytics, Masters in Marketing Analytics, Masters in Information Systems, etc. Trust me, names can be very misleading. Although they do give you an idea of what the program is all about, the name of the program should definitely be your last concern, if at all!

13 Schools for MS in Data Science that you can consider

The following schools are some of the best schools that offer programs in Data Science, and you can consider these for your reach, match and safe shortlist.

#1 University of Southern California

Program: MS Data Science

The MS in Computer Science – Data Science provides students with a core background in Computer Science and specialized algorithmic, statistical, and systems expertise in acquiring, storing, accessing, analyzing, and visualizing large, heterogeneous and real-time data associated with diverse real-world domains including energy, the environment, health, media, medicine, and transportation.

#2 Columbia University

Program: MS Data Science

The Master of Science in Data Science allows students to apply data science techniques to their field of interest. Our students have the opportunity to conduct original research, included in a capstone project, and interact with our industry partners and faculty. Students may also choose an elective track focused on entrepreneurship or a subject area covered by one of our seven centers.

Who should apply – Individuals looking to strengthen their career prospects or make a career change by developing in-depth expertise in data science. Candidates for the Master of Science in Data Science are required to complete a minimum of 30 credits, including 21 credits of required/core courses and 9 credits of electives.

#3 University of Rochester

Program: MS Data Science

The Goergen Institute for Data Science offers a STEM-accredited MS program in data science. This program allows students to study the broad area of data science or to concentrate their studies in one of the following areas:

Computational and statistical methods

Health and biomedical sciences

Business and social science

The program can be completed in either one year or one and a half years of full-time study. Each graduate will receive a degree conferred by the University’s School of Arts and Sciences.

#4 University of Massachusetts Amherst

Program: MS CS (With concentration in Data Science)

The Computer Science Masters with a Concentration in Data Science was created to help meet the need for expanded and enhanced training in the area of data science. It requires coursework in Theory for Data Science, Systems for Data Science, Data Analysis and Statistics.

The Masters Concentration in Data Science teaches you to develop and apply methods to collect, curate, and analyze large-scale data, and to make discoveries and decisions using those analyses.

#5 University of Washington Seattle

Program: MS Data Science

The Master of Science in Data Science at the University of Washington gives you the technical skills to extract knowledge from large, noisy, and heterogeneous datasets — big data — to provide insights that people and organizations can use.

Our interdisciplinary curriculum was developed by leading faculty from six top-ranked departments and schools at the UW, with input from top companies looking to hire data science professionals. In this program, you’ll build deep expertise in managing, modeling and visualizing big data to meet the growing needs of industry, government, nonprofit and research organizations today.

#6 University of California Irvine

Program: Master of Science in Business Analytics

The MSBA program at The Paul Merage School of Business at UC Irvine is a STEM-designated, intensive one-year, full-time degree program for students with or without work experience. The program is taught by world-class faculty and leading researchers in the field, and its graduates will be eminently qualified for big data and analytics careers.

#7 New York University

Program: MS in Data Science

The Master of Science in Data Science is a highly-selective program for students with a strong background in mathematics, computer science, and applied statistics. The degree focuses on the development of new methods for data science.

We live in the “Age of the Petabyte,” soon to become “The Age of the Exabyte.” Our networked world is generating a deluge of data that no human, or group of humans, can process fast enough. A new discipline has emerged to address the need for professionals and researchers to deal with the “data tidal wave.” Its object is to provide the underlying theory and methods of the data revolution. This emergent discipline is known by several names. We call it “data science,” and we have created the world’s first MS degree program devoted to it.

The curriculum is 36 credits and offers two ways to structure the graduate program, giving students the opportunity to pursue a specialization through tracks. NYU will offer a limited number of tuition scholarships to selected students admitted to the program. All applicants for admission will be considered for these awards on a competitive basis.

#8 University of Texas Dallas

Program: MS CS (Data Science Track)

Strategically located in the middle of the Telecom Corridor, which is home to hundreds of high-tech companies, the Computer Science Department is in the midst of a growth phase that includes the addition of new programs in cybersecurity, information assurance, data sciences and interactive computing, the hiring of a large number of new faculty, and a steep increase in external research funding.

#9 Michigan Technological University

Program: MS Data Science

Our degree will provide you with a broad-based education in data mining, predictive analytics, cloud computing, data-science fundamentals, communication, and business acumen. Additionally, you will gain a competitive edge through domain-specific specialization in disciplines of science and engineering. You will have the freedom to explore and develop your own interests in one or more domains.

#10 Illinois Institute of Technology

Program: MS Data Science

In Illinois Tech’s Master of Data Science program, you learn to explore data using high-level mathematics, statistics, and computer science. In particular, you learn how to analyze data, visualize your results, and articulate your discoveries. You will leave the program with the ability to think about the real problems that need to be solved, not to simply find technical solutions.

You will learn to question underlying premises and reformulate issues, explore and improve the structure of available data, create and evaluate models, construct and test hypotheses, draw conclusions, and determine if the results make sense in the real world. You will then learn to communicate these results to specialists and non-specialists alike.

The program is offered to full-time students on the Mies Campus, just minutes south of the Chicago Loop, a global finance center, whose businesses rely on amassing data in finance, healthcare, retail, manufacturing, consumer services, tourism, professional sports, and cultural activities. In this international city on the shores of Lake Michigan, data science students have the opportunity to engage in Chicago’s thriving tech community. Moreover, the City government allows and encourages access to its publicly available wealth of municipal data.

#11 Cleveland State University

Program: MS Computer & Information Science

The Information Systems (IS) track in the Master of Computer and Information Science (MCIS) program at Cleveland State University is a specialized degree program designed to prepare students for careers as information professionals. The IS track is housed within the Monte Ahuja College of Business. The coursework combines a blend of technology and management-oriented courses designed to prepare the next generation of technology managers to lead enterprises in innovative ways. Specializations within the program allow students to build specific skill sets in areas such as: information security, information technology management and business analytics.

#12 Rochester Institute of Technology

Data science, a term first coined in 2008 by data analytics leaders at Facebook and Google, is a new and inherently multidisciplinary field – combining computing, mathematics, statistics, and the sciences – devoted to the management and analysis of massive, mostly unstructured data. RIT’s approach to data science is distinctly different from the existing programs. Firstly, this degree is career focused, aiming to equip students with practical skills to handle large-scale data management and analysis challenges that arise in their daily work. The career-focused degree will also significantly benefit from one of the world’s largest co-op programs at RIT, which brings in practical problems, real-world data, and software tools commonly adopted in industry to enrich our curriculum.

Secondly, the program is highly interdisciplinary and domain driven, focusing on domain-specific problems and solutions. It also provides students with opportunities for interdisciplinary study. Important domains (e.g., biology, physics, and statistics) are judiciously selected and integrated as part of the curriculum to provide customized, domain-specific training to next-generation data scientists.

#13 Wayne State University

Program: MS Data Science and Business Analytics

The Mike Ilitch School of Business and College of Engineering have developed a novel Interdisciplinary Master of Science in Data Science and Business Analytics program, which is designed to help students excel in both industry and academia.

This novel Master’s program is designed to provide students with a broad range of data science and business analytics knowledge and skills. Each student will need to select one of the three major concentrations, a specialized track that provides more in-depth knowledge and skills in that area.

When applying to the program, applicants will have to select which Program and Major Concentration they want to apply for:

Data Science and Business Analytics – MS Business Program – Data-Driven “Business” Concentration
