Data Scientist vs Data Engineer: What’s the Difference?

Data Scientists and Data Engineers may be new job titles, but the core job roles have been around for a while. Traditionally, anyone who analyzed data would be called a “data analyst” and anyone who created backend platforms to support data analysis would be a “Business Intelligence (BI) Developer”.

With the emergence of big data, new roles began appearing in corporations and research centers, namely Data Scientists and Data Engineers.

Here’s an overview of the roles of the Data Analyst, BI Developer, Data Scientist and Data Engineer.

Data Analyst

Data Analysts are experienced data professionals who query and process data, produce reports, and summarize and visualize results for their organization. They have a strong understanding of how to leverage existing tools and methods to solve a problem, and they help people across the company answer specific questions with ad-hoc reports and charts.

However, they are not expected to deal with analyzing big data, nor are they typically expected to have the mathematical or research background to develop new algorithms for specific problems.

Business Intelligence Developers

Business Intelligence Developers are data experts who work closely with internal stakeholders to understand reporting needs, collect requirements, and then design and build BI and reporting solutions for the company. They design, develop and support new and existing data warehouses, ETL packages, cubes, dashboards and analytical reports.

Additionally, they work with both relational and multidimensional databases, and should have strong SQL development skills to integrate data from different sources. They use all of these skills to meet enterprise-wide self-service needs. BI Developers are typically not expected to perform data analyses.

Data Engineer

Data Engineers are the data professionals who prepare the “big data” infrastructure that Data Scientists analyze. They are software engineers who design and build big data systems, integrate data from various sources, and manage the resulting data. They also write complex queries against that data and make sure it is easily accessible and works smoothly. Their goal is to optimize the performance of their company’s big data ecosystem.

They might also run some ETL (Extract, Transform and Load) on top of big datasets and create big data warehouses that can be used for reporting or analysis by data scientists. Beyond that, because Data Engineers focus more on the design and architecture, they are typically not expected to know any machine learning or analytics for big data.
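To make the Extract, Transform and Load steps concrete, here is a minimal sketch in Python using only the standard library. The sample data, the `orders` table, and the function names are all made up for illustration. A real pipeline would read from production sources and load into an actual warehouse rather than an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from files, APIs, or logs.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
3,alice,5.01
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast fields to proper types."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        clean.append((int(row["order_id"]), row["customer"], float(row["amount"])))
    return clean


def load(conn, records):
    """Load: write the cleaned records into a (toy) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn, text=RAW_CSV):
    """Run the three steps in order and return the number of rows loaded."""
    records = transform(extract(text))
    load(conn, records)
    return len(records)


conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
loaded = run_etl(conn)

# A reporting-style query that an analyst or data scientist might run afterwards.
totals = conn.execute(
    "SELECT customer, ROUND(SUM(amount), 2) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
```

Keeping the three steps as separate functions is one common way to make each stage testable on its own; it is a design choice for this sketch, not a requirement of ETL.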

Data Scientist

A data scientist is the alchemist of the 21st century: someone who can turn raw data into purified insights. Data scientists apply statistics, machine learning and analytic approaches to solve critical business problems. Their primary function is to help organizations turn their volumes of big data into valuable and actionable insights.

Indeed, data science is not necessarily a new field per se, but it can be considered an advanced level of data analysis that is driven and automated by machine learning and computer science. In other words, compared with Data Analysts, Data Scientists are expected to have, in addition to analytical skills, strong programming skills, the ability to design new algorithms and handle big data, and some domain expertise.

Moreover, Data Scientists are also expected to interpret and eloquently deliver the results of their findings, whether through visualization techniques, by building data science apps, or by narrating interesting stories about the solutions to their data (business) problems.

The problem-solving skills of a data scientist require an understanding of traditional and new data analysis methods to build statistical models or discover patterns in data. Examples include creating a recommendation engine, predicting the stock market, diagnosing patients based on their similarity, or finding the patterns of fraudulent transactions.
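As a rough illustration of the first example, the sketch below builds a tiny recommendation engine with user-to-user cosine similarity. The ratings, the user and item names, and the `recommend` function are all invented for this example. Production recommenders use far more data and more sophisticated models.

```python
import math

# Made-up ratings (user -> {item: score}); purely illustrative.
RATINGS = {
    "ana": {"book_a": 5, "book_b": 4, "book_c": 1},
    "ben": {"book_a": 4, "book_b": 5, "book_d": 5},
    "cat": {"book_c": 5, "book_e": 4},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted scores."""
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], theirs)
        for item, score in theirs.items():
            if item not in ratings[user]:
                # Items liked by more similar users get a higher score.
                scores[item] = scores.get(item, 0.0) + sim * score
    return sorted(scores, key=scores.get, reverse=True)


suggestions = recommend("ana", RATINGS)
```

With this toy data, `ana` is most similar to `ben`, so the item only `ben` rated ends up ranked first.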

Data Scientists may sometimes be presented with big data without a particular business problem in mind. In this case, the curious Data Scientist is expected to explore the data, come up with the right questions, and provide interesting findings! This is tricky because, in order to analyze the data, a strong Data Scientist should have very broad knowledge of different techniques in machine learning, data mining, statistics and big data infrastructures.

They should have experience working with datasets of different sizes and shapes, and be able to run their algorithms on large data effectively and efficiently, which typically means staying up to date with the latest technologies. This is why it is essential to know computer science fundamentals and programming, including experience with languages and database (big/small) technologies.

Comments

Great article. It gives a lot of clarity regarding the developing roles in the data-driven community. However, I wouldn't know which combination of these skills is expected of a Chief Data Scientist. Is he expected to have, alongside the capabilities of a Data Scientist, a bit of engineering and business intelligence? I actually feel the business intelligence aspect is embodied in the Data Scientist package.

Only to clarify: none of these roles is above another. The difference is just the focus of their interests, beyond the tools they commonly use. My experience with "the alchemists" is that they have strong mathematical and algorithmic skills but poor basic programming knowledge, or vice versa: a strong computational background without the mathematical support. This happens when we describe a role only in terms of technique.

That is not a general rule, but I guess we usually mix terms. The term "scientist" generally applies to someone who uses the "scientific method" to test theories, in this case regarding data. The engineer, on the other hand, uses this knowledge to build things and tends to be more pragmatic.

I have been a data engineer for the past 3 years, working with Hadoop and mainly doing things at the infrastructure level, solving issues with the Hadoop platform. I have some experience with ML, using ML with Spark and doing some basic data science work. I want to move into being a Data Scientist.

Is it a good idea to do that, given that I do not have a degree in maths/stats or any formal data science training? I am a CS graduate, have experience with various databases and programming languages, and have been working in IT for 11 years.