When interviewing for any position, you should be evaluating the company just as much as they are evaluating you. While you can research the company beforehand on glassdoor and similar sites, interviews are the best place to get a deeper understanding of the company and ask important questions. Companies will never straight up tell you they are bad to work for, so you have to look for the signs yourself.

Here is our list of 12 signs the company you are interviewing with for a data scientist job should be avoided (and the questions to ask during the interview). The first six mainly apply to companies that already have multiple data scientists or analysts. If you’re thinking of joining a company as their first data scientist, you’ll face a whole different set of challenges, including most likely doing a lot of data engineering work (see flag 1) and spreading a data science mindset. Someone has to do it, but we generally advise against it for your first data science role unless you come from an engineering background and want to do that work. If there’s just a data science leader and they’re building out a team, ask how they plan to handle the issues raised below, but keep in mind it’s always easier to promise an ideal system than implement one.

Red flags on how the data science team runs

No data engineering or infrastructure. Data science requires data to be easily available for analysis. If the company doesn’t have a well-maintained data infrastructure, you won’t have what you need to do your job. A data engineer is a person who prepares data for analysis, and if your company doesn’t have them you’ll have to do the work yourself. If you feel qualified to take on the role of a data engineer that may be okay, but otherwise you’ll be struggling to deliver anything of value. Question to ask during the interview: what is your data infrastructure like and who maintains it? What format is the data typically in (Excel, a SQL database, csv)?

No peer review between data scientists. A strong data science team will have ways to ensure mistakes don’t slip through the cracks. These can include code reviews, practice presentations, and consistent check-ins with the team. If the team doesn’t consistently do these, mistakes won’t be found until the work is already delivered, which usually ends with someone getting reprimanded. Question to ask: what steps does the team take for QA and peer review?

No standard set of languages on the team: Many data science teams take the approach of letting anyone on the team use any language they want. The idea is that if everyone uses their favorite languages work will be completed faster. There is a huge problem with this: when everyone uses separate languages, no one will be able to pass off work to anyone else. Every data science task will have a single person responsible for it, and if they quit, get sick, or just need help no one will be able to do so, creating a very stressful environment. It’s fine to use R, Python, or even dare we say SAS, but just have a consistent set of languages amongst the team. Question to ask: what languages does your team use, and how do you decide whether to adopt a new one?

No understanding of the data hierarchy of needs: Similar to not having a data infrastructure, some companies get really excited about concepts like AI without having the foundation in place. Machine learning and AI require a company to have a high level of data science maturity, including understanding how to build models, their limitations, and how to deploy them. You might get blamed when their unrealistic expectations meet reality. Question to ask: how does the company balance spending time on complex approaches like AI with foundational work like cleaning data, checking data quality, and adding logging?

No version control: Mature data science teams use git to keep track of changes to analyses and code. Other teams instead use methods like shared network folders, which don’t let you see when things changed, why they are changed, or previous versions. Occasionally teams don’t share code at all and work just lives on the data scientists individual laptops. Avoid these last groups like the plague. Not having methods of sharing code means the team can’t work together. Question to ask: how to you share code amongst the team? Is all code shared or just some of it?

No clear delineation between people who run reports vs do analyses: The skillsets required to create and maintain reports, to build data science models, and to put machine learning models into production are all different. If the company doesn’t have a clear way of determining who does what work, you could start your job and end up doing work totally different than what you expected. You don’t want to walk in on your first day expecting to build a time-series forecast and find out your job is to refresh the monthly sales Excel spreadsheet. Question to ask: how are reporting, analysis, and production-model building tasks split?

Red flags on how they value people

A totally non-structured interview process: A structured interview process means that each candidate gets the same set of questions and can be more equally compared. Not only does it decrease bias, it also requires the team think through what’s important in the people they bring on. If the interview process is unstructured, with the interviewer seemingly asking questions off the top of their head, then it’s a strong sign they haven’t figured out what they want in a candidate and how to get it. If they don’t know what they want you are going to have trouble giving them what they want on the job. Suggestion: see if they bring a set of questions to the interview, or ask the meta-question of how they chose what to ask you.

No time for your questions: Since interviews are also for you to find out about the company, you need to have time to do so. If there isn’t time for you to ask questions, the interviewer isn’t interested in making you feel comfortable and allowing you to assess your fit. Suggestion: if you get to the interview and you didn’t have time to ask questions, make a note of it and ask the interviewer when would be a good time to ask them instead.

No coding required in the interview: While programming isn’t the most important skill for a data scientist, it is something you will have to do on the job. The coding part of the interview could be on-site or a take-home test, but it should definitely exist. If the interview process doesn’t include programming it could be for a few reasons: (1) The data science team is new so no one can run the interview. In this case, be aware you won’t have support on the job. (2) The team doesn’t have the time to create a programming interview. This is a sign they don’t value hiring. (3) They don’t program and use BI tools like Tableau and Excel for their work. (4) They trust your resume so much that they don’t need to test you. While flattering, this a sign that they are desperate to hire. Suggestion: if the interview doesn’t include a programming component, ask them how do they tell which candidates have the technical skills for the job.

No plan for your first few months: Companies put job postings out for good reasons. If they aren’t able to clearly articulate exactly what you’ll be doing in the first few months, that reason is probably “we are totally overwhelmed with work and we are going to throw people at the problem until we can handle it.” That’s an extremely dangerous way to grow a team. What’s worse, this usually happens at the companies that don’t have onboarding processes for new hires. So these situations are extremely stressful for the whole team and that usually falls on you too. Suggestion: ask if they have a clear project and on-boarding process for your start. If they don’t have an extremely precise answer for this, run.

No support for continuing education: Data science is a large and rapidly advancing field and if you don’t keep learning you’ll fall behind. Teams should have some way of helping people keep up, whether it’s by providing funding for online education or attending conferences, monthly meetings where you discuss industry blog posts, encouragement to attend meetups or get involved in open source, or a speaker series. This also shows they’re willing to invest in their people generally. Suggestion: ask how they support the continued education of the team. Is there funding for conferences or workshops?

Inconsistent answers between interviewers about the role: Usually interviews have you talking to many people within the company, including your future manager, teammates, and business stakeholders. If they each tell you different things around the level of responsibility, type of work, what the role delivers, and hours you’ll have to work, they themselves probably don’t agree. If they can’t agree, especially on things related to what work you’ll end up doing, your job will end up being full of conflict. Suggestion: keep track of what people say in different interviews. If you find an inconsistency, ask why.

While these twelve flags may feel like a lot, companies tend to show none of them or most of them, with only few in the middle. By keeping an eye out of them you can avoid the problem of getting to a job you dislike. If you want more data science thoughts and tips, we wrote a book on data science careers, “Build a Career in Data Science,” available for 40% off with the code buildbook40%.

Emily Robinson

I recently started playing Pokémon again - “Pokémon Let’s Go Eevee” on the Nintendo Switch to be specific. In the classic Pokémon games, you have a team of 6 Pokémon that you use to battle against other trainers. In battles, type match-ups are very important, as some types of moves are “super effective” against other types. For example, fire moves are super effective against grass Pokémon, which means they do double the damage they normally would.

In this post, I try something new and share an analysis I did without stopping to explain the code along the way (with a few exceptions). I analyze a dataset on Bob Ross paintings from last week’s Tidytuesday, an initiative by the R for Data Science online learning community. Each Monday, a new dataset is posted on GitHub with a short description. You can see some analyses and visualizations people have done by searching for the #tidytuesday hashtag on Twitter.

Have you ever had a “first this then that” question? For example, maybe you’re an e-commerce business and you want all the times people clicked on an item and then added it to their cart within 2 days, or the last page they visited before registering. Or you work with pharmaceutical data and need to know what drugs people took before drug x and which drugs they took afterward and when.

I recently completed Colin Fay’s excellent DataCamp course, Intermediate Functional Programming with purrr (full disclosure: I work at DataCamp, but part of why I joined was that I was a big fan of the short, interactive course format). Although I’ve used the purrr package before, there were a lot of functions in this course that were new to me. I wrote this post to hopefully demystify purrr a bit for those who find it overwhelming and illustrate some of its lesser known functions.

In early 2018, I gave a few conference talks on “The Lesser Known Stars of the Tidyverse.” I focused on some packages and functions that aren’t as well known as the core parts of ggplot2 and dplyr but are very helpful in exploratory analysis. I walked through an example analysis of Kaggle’s 2017 State of Data Science and Machine Learning Survey to show how I would use these functions in an exploratory analysis.

When I was working at Etsy, I benefited from a very robust A/B testing system. Etsy had been doing A/B testing for more than 6 years. By the time I left, Etsy’s in-house experimentation system, called Catapult, had more than 5 data engineers working on it full-time. Every morning, I was greeted with a homepage that listed all the experiments that Etsy had run in the prior four years. When you clicked on one, you got a summary of what the experiment was testing (usually written by the product manager).

This post was co-written with Jacqueline Nolis, Principal at Nolis, LLC. Check out the rest of her blog posts, including ones on prioritizing data science work, hiring data scientists, and what to do when your data science project isn’t working.
When interviewing for any position, you should be evaluating the company just as much as they are evaluating you. While you can research the company beforehand on glassdoor and similar sites, interviews are the best place to get a deeper understanding of the company and ask important questions.

Following Dave Robinson’s sage tweet to write a blog post when you’ve given the same advice three times, this post is a collection of my thoughts and recommendations for people interested in applying to data science jobs in the US. Many of these principles also apply to tech jobs in general.
A disclaimer: I have never worked as a recruiter or career coach. This knowledge comes from mainly from my study of Organizational Behavior (including negotiations and women in tech) in graduate school and my own career.

In my last post, I discussed the importance of building your network and some strategies for effectively reaching out. I closed with emphasizing how helpful your peers or people one step ahead of you can be. But there’s a specific area where people with more resources, status, or experience can help you: sponsorship.
What is sponsorship? When people discuss what they’re seeking from a more senior person in their field, they usually talk about “mentorship.

So you’ve heard you’re supposed to network. That’s the key in getting a job or establishing a reputation in your broader field, right? And it’s true that the importance of having a good network is supported by a lot of social sciences research. But if the thought of networking makes you cringe, you’re not alone. Many people equate networking to sending out millions of unsolicited Linkedin requests with no message, handing out 20 business cards at a meetup once a week, or sending emails to prominent data scientists with the subject line “Can I pick your brain?