Posts categorized "Hiring"

Business Insider reports on the bottom 20 college majors, with the warning "what not to study". The list looks extremely random, including everything from visual and performing arts to international relations to math and computer science (!).

The author explains these majors have the highest unemployment rates.

Here are a few reasons why you can and should ignore this study.

Presumably the journalist is advising college students, who will soon become new college graduates. But the analysis references all college graduates, which includes anyone with a college degree, from a new graduate to someone who has worked for 40 years. New college graduates have a much higher unemployment rate than college graduates as a whole. (See this recent Huffington Post article.)

Students do not randomly select a college major. Thus, students who choose to major in international relations are not the same type of people who decide to major in engineering. If an international relations student were to switch majors to engineering, it would not follow that this person's employment prospects had suddenly brightened. They might even worsen, because (a) the student could be relatively less competitive against other engineering majors; and (b) the student and the major might now be mismatched.

If everyone followed this journalist's advice, then each college would need only a handful of majors. Unless the economy suddenly produced a bonanza of jobs, where would the unemployed go? You got it... the unemployment would concentrate in the few remaining majors, instead of being spread out over hundreds of majors. The total number of unemployed graduates is determined by the number of jobs, not by the mix of majors, so the unemployment rate of these "good" majors would rise to the level of the average unemployment rate. Oops.

Not everyone needs or wants a job. Changing majors doesn't change that reality.

***

There is an even bigger howler in the article. The analyst at Bankrate equated "job stability" with a low unemployment rate, using language such as "Having a high-paying job doesn't necessarily mean you'll have job stability, and vice versa." So, by this logic, majors with high unemployment rates are bad because people job-hop.

But the unemployment rate is not a measure of job tenure, and thus not an indicator of job stability. In fact, if people job-hop a lot, the unemployment rate will be relatively low, because the same job can be held by multiple people in the course of a year.

Let's imagine a country of 10 people with 5 jobs. The unemployment rate is 50%. If no one changes jobs, the same five people are always employed, and the rate holds steady at 50%. Now suppose the government stipulates that no person can hold the same job for longer than 6 months. We put the 10 people in a circle, and every other person is given a chair and seated. Every 6 months, each person moves clockwise by one slot: the five previously seated lose their seats, and the five seats are now occupied by the five who were previously standing. Everyone is now employed 6 months out of the year. Measured over the year, the employment rate is 100% (counting part-timers).

Thus, an unemployment rate of zero can coincide with high job instability.
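For those who want to check the arithmetic, here is a minimal simulation of the musical-chairs economy above. The headcounts and six-month rotation come from the example; everything else is just bookkeeping.

```python
# Musical-chairs economy: 10 people in a circle, every other person seated
# in one of the 5 jobs; everyone shifts one slot clockwise every 6 months.
NUM_PEOPLE = 10

seated = set(range(0, NUM_PEOPLE, 2))   # people 0, 2, 4, 6, 8 start employed
employed_by_period = []

for period in range(2):                 # two 6-month periods make one year
    employed_by_period.append(set(seated))
    seated = {(p + 1) % NUM_PEOPLE for p in seated}  # shift one slot clockwise

# At any point in time, the unemployment rate is still 50%...
for period, employed in enumerate(employed_by_period):
    print(f"period {period}: unemployment = {1 - len(employed) / NUM_PEOPLE:.0%}")

# ...but everyone held a job at some point during the year, despite 6-month tenure.
worked_this_year = set().union(*employed_by_period)
print(f"annual unemployment = {1 - len(worked_this_year) / NUM_PEOPLE:.0%}")
```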

In Part 1 of my KDnuggets article, I explained what hiring managers mean when they look for critical thinking in the arena of data science and analytics. These requirements relate to the nature of data problems found in industry and business settings. The datasets are generally observational, self-selected, non-random, with hidden biases, and increasingly OCCAM (link); the business leaders have high-level objectives ("we want to increase customer loyalty"). The data scientist/analyst is the person in the "middle," trying to figure out how to make the problem precise, and solvable by a systematic analysis of available data.

In Part 2, I offer some practice case interview questions, based on three recent news events:

the college admissions scandal

IPOs of ride-sharing companies like Lyft and Uber

the Blue Apron post-IPO doldrums.

Long a staple of the management consulting hiring process, the case interview is a free-flowing dialogue between the interviewer and the interviewee. The interviewer holds back some data to simulate what is known at the beginning of a data analysis process. The interviewee must be willing to probe, digging out more data and shaping the structure of the analysis. The end product is an analytical framework. No one knows whether the framework will be successful until it is implemented.

Those who do well in case interviews are good at (a) thinking on their feet, (b) embracing uncertainty, e.g. by making appropriate assumptions, (c) listening to the interviewer's hints, and (d) persuading.

***

As I mentioned in Part 2, the best way to practice is to form a group of 3-5 people, and interview each other. If there is enough interest, we can start a group in the comments below.

When I ask job-seekers about their biggest obstacle to finding a job in data science and analytics, one of the most frequent answers is performing during the interview. Some of them are stumped by technical (coding) interviews, while even more are worried about the case interviews.

The purpose of the case interview is to test critical thinking. It is as challenging for the job candidate as for the hiring manager! Technical questions have pretty standard answers, and they are easy to score. Case interviews are like essays - the hiring manager has to make judgment calls.

My piece on critical thinking is featured on the KDnuggets blog, which I've followed since I was an analyst. In this first part, I explain the two aspects of critical thinking that the case interviewer is typically looking for. There will be a Part 2 in which I provide some practice examples.

This past weekend, I found my way to West Lafayette, Indiana, to speak at the Math, Data Science and Industry Conference, organized by Math Prof. Aaron Yip and Drew Swartz. I was very impressed with the quality and diversity of the talks there. They managed to strike a nice balance between academic talks and industry talks, and the BS quotient was minimal.

I will first outline my own talk, and then, in a later post, I will highlight some things from the other talks I attended.

The goal of my talk was to paint a broad-brush picture of the scope of jobs that are part of the current Data Revolution, and to give a flavor of the nature of the work so that graduate students may decide for themselves whether this "data science" industry is a good fit for them.

One key takeaway is the distinction between research jobs and industry jobs. Research jobs lead to innovative research that can be published in scholarly journals. Most industry jobs demand short-term results that impact the business, and it does not matter whether the methods used are innovative. The boom in data jobs, however, is in industry jobs. Only large corporations in cash-rich industries can afford research jobs, and even at those firms, there are hundreds if not thousands of industry jobs for each research position. Math graduates can totally get hired for industry positions, if they put in a little effort to prepare for this career path.

Within industry jobs, I like to think of three job types.

Data science jobs - these are the headline-catching jobs because they are disproportionately found in the high-tech industry. Think of these as software developers with advanced database skills. The culture here is automation, removing human beings from the process.

Business analytics jobs - these jobs are tethered to business teams, such as marketing, finance, operations and customer service. They are the champions of embedding data analyses in the everyday decision-making processes. They interact constantly with business managers, providing a form of consulting service.

Data IT jobs - these people keep the data flowing in the organization, so to speak. They are also responsible for "data governance" and standardizing the formats, definitions, quality, etc. of the data. This sector is experiencing rip-roaring growth.

There is a huge need for scientific thinkers and data-savvy people in all three job types, but at least half the open positions are in "business analytics." I discuss two particular gaps in skills that hiring managers often complain about in university graduates: (a) inability to develop the question, and (b) not knowing how to question the data.

There is a clear reason why such gaps exist. A typical question we pose to students in a problem set first lays out the problem to be solved, then presents the set of data to be used, and finally challenges the students to plug the data into an appropriate method or framework so that the solution to the problem drops out.

The professor is not going to look kindly on the student if s/he criticizes or revamps the question or points out flaws in the data! (University classes teach theory, models and frameworks so this is not surprising.)

This brings me full circle to the distinction between research and industry jobs. In research, you can "choose your battles" by making certain assumptions to move past obstacles. For example, you assume that the (biased) dataset that you obtained is representative - lots of research papers that use observed social-media data do this. You just argue that bias correction is a separate problem to be tackled at some other time, perhaps by some other research team.

In industry, you don't have that luxury. A great solution to the biased problem may turn out to be a horrible solution to the unbiased problem. When I was at SiriusXM, we had some data on people's online listening patterns but almost nothing on their in-car listening. Building great models using the online data isn't going to do much good because most of the listening happens in the car, and people who listen online are quite different from those who listen in the car.

Towards the end of the talk, I pointed out that in order to do well in these data jobs, one must be comfortable living in the "gray" areas. There is gray between science and social science, between models and heuristics, between data and intuition.

People were very friendly and we had some fine conversations at a bar after the day was over. I'm happy to report that at least a few people have indicated that they want to pursue these industry jobs.

I have been contributing to Andrew's thread on how to get into the data science field. A recent college grad with a degree in environmental science and a minor in statistics wants a job. Andrew suggests getting a job in industry - which I think is an excellent suggestion.

Here is my advice:

Figure out what he enjoys doing – is it coding or is it problem solving? Those are two different jobs: one is software engineering; the other is more statistics and analysis. If he is in NYC, he can come to one of my public lectures at NYPL, in which I explain how to pick a career path within this wide and exciting field. [The next one is on the May schedule.]

Once he has picked an area, and hopefully also an industry, he needs to reach out and talk to as many people in industry as possible. Go to networking events and meetups.

Then apply for jobs. The job search is a job in itself; keep applying until someone gives you a chance. You will encounter lots of rejection, but keep trying.

If nothing is working, consider going to a bootcamp. They are set up to give you practical skills that appeal to hiring managers. Talk to the bootcamp organizers to get a sense of what their vision is, and see if it’d help you make your case.

One reason I have organized a bootcamp is that for some, it will be very difficult to break into the field without extra help - both filling knowledge gaps and making industry connections. I give the above advice to my students as well. They need to find a job that matches their temperament, and then work hard at convincing hiring managers to take a chance.

***

Next Tuesday, we are hosting an Open House.

If you're interested in learning about our vision, drop by and say hello.

Last week, I got served a dose of predictive analytics. I got an email solicitation from LinkedIn, presenting a list of jobs that they think I might be interested in. This email is algorithmically generated, and LinkedIn tells me that I received it because I clicked on a job posting for a senior data analyst position at Ogilvy & Mather, a top advertising agency based in Manhattan.

Yes, I did click on that job posting while preparing for the information session for Principal Analytics Prep a few days before the email arrived. A podcast of the event is available here.

Here is what the email looks like.

There are several things one can learn from this email:

The field of analytics is absolutely exploding. There are six pages of jobs related to the one job I clicked on. Most of these jobs are junior positions (“senior analyst”), because the ad I clicked on is at that level.

Many top companies are hiring. The competitors of Ogilvy – AKQA, MRM/McCann, OMD USA, RAPP/Omnicom, J. Walter Thompson – are also competing for talent. Not just advertising but other related industries are also hiring senior analysts. I recognize Shazam (famous mobile song-recognition app), frog (top design agency), A+E Networks, Etsy (noteworthy startup retailer), Luxottica (maker of high-end glasses), AIG, Mercer (top management consultancy), Mastercard, S&P, and Burtch Works (a top executive recruiter in the data space – also our guest at the info session!). So, the jobs are at top companies, startups and small businesses.

The jobs are spread out over all industries. Just in that small, nonrandom sample, we have representation of advertising, retail, technology, media and entertainment, e-commerce, credit cards, finance and insurance, human resources, management consultancy, and graphic design.

Analytics are needed in all job functions. The job seeker should also consider unconventional career paths, e.g. the analyst at Burtch Works is a recruiter but with a specialty in data science and analytics. Another unusual path is to become an account manager at a digital advertising agency – you may not be running analyses all day long, but if you have superior analytical skills, you'd be much better at explaining results and data-driven recommendations to your clients. Similarly, a salesperson at an analytics product company can definitely put foundational analytics knowledge to use.

One of the key reasons I started Principal Analytics Prep is to open doors in the job market for people of diverse backgrounds and diverse career paths. Analytics and data jobs are not limited to technical people who are coders and engineers. There are plenty of exciting job opportunities across all industries and job functions for data wizards with unconventional backgrounds. Please contact us if you want to learn more about how we can help guide you to your next career in data.

For many years now, the field of Data Science and Business Analytics has been booming, and hiring managers are finding a severe dearth of high-quality job-seekers. Meanwhile, there are a good number of people interested in entering the field who keep bumping into walls. Hiring managers like to hire experienced people for a host of reasons, including the fear of other hiring managers poaching their trained employees. I have been interested in solving this problem for a number of years.

In the next two weeks, I am collaborating with the New York Public Library's Job Search Central unit to offer two free events.

The first is a public lecture on "How to Start a Data Science and Analytics Career." I will be spending significant time explaining what this field is about, and what data science and analytics teams do. I will also discuss the structure of the job market, and provide tips for how to land a job. You can find more information about it here. The talk is scheduled for April 1.

The second is a resume review workshop. From my experience as a hiring manager, I know that many job-seekers fall short in selling themselves to organizations. It is particularly challenging for those who are seeking to enter this field, and thus do not have direct experience to call upon. This is a hands-on workshop in which I will offer advice on individual resumes. The event is free but obviously, we have limited space, so you must pre-register here. Note that you will need to upload your resume to complete the registration. The workshop is being held on April 8.

***

I have updated the Events list on the right column of the blog. This is where you learn whether I will be appearing in a city near you. You can check out future and past events here as well.

In class last week, I discussed this New York Times article with the students. One of the claims in the article is that the U.S. News ranking of colleges is under threat by newcomers whose rankings are more relevant because they more directly measure outcomes such as earnings of graduates.

This specific claim in the article makes my head hurt: "If nothing else, earnings are objective and, as the database grows into the millions, reliable."

The entire Chapter 1 of Numbersense (link) is devoted to blowing apart the myth that school rankings are "objective." In fact, I go on to assert that even objective-sounding metrics like company revenues are not objective at all... if you know GAAP and the games accountants play with those numbers. If someone buys a car on eBay from another person for $20,000, does eBay book $20,000 in revenues or just x% of the revenues that the seller pays to eBay?
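To make the accounting question concrete, here is a toy calculation; the 5% marketplace fee is invented purely for illustration and is not eBay's actual rate:

```python
# Toy illustration of gross vs. net revenue recognition.
# The fee rate below is hypothetical, not eBay's actual take rate.
sale_price = 20_000          # the car sold between two individuals
fee_rate = 0.05              # hypothetical marketplace fee

gross_booking = sale_price             # $20,000 if revenue is booked gross
net_revenue = sale_price * fee_rate    # $1,000 if only the fee counts as revenue

print(f"gross: ${gross_booking:,}  net: ${net_revenue:,.0f}")
```

Same transaction, two defensible numbers that differ by a factor of twenty - so much for "objective."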

Objective implies there is a ground truth that can be verified. There is no true school ranking, nor is there true revenue.

Where does this "objective" earnings data come from? Apparently a company called Payscale, whose methodology is explained here. They say their data come from "individuals who fill out the PayScale Salary Survey." How did these people discover the survey? We don't really know. Are these people representative of the universe of employed people? Most likely not, but we again do not know.

Do people give out their real salaries voluntarily? Not in my experience. In fact, none of my co-workers in my 15+ years in the corporate world has ever told me how much money they make. PayScale claims that the salary number "combines base annual salary or hourly wage, bonuses, profit sharing, tips, commissions, overtime, and other forms of cash earnings, as applicable." Do people have their aggregate salaries at their fingertips? I highly doubt it (unless they make just a base salary).

In return for filling out the surveys, the individuals receive a free salary report. Does this encourage people to make up fake data just to obtain the salary report? Take a guess.

PayScale claims that it "rigorously tests and verifies" the data. Given that most employers wouldn't even do salary verification for other employers, I don't believe the salary data can be verified, and absolutely not "every data point" as claimed in their marketing materials.

The other ludicrous claim is that the "reliability" of the data improves with scale. This is only true if the data are a proper random sample of the population of all salaries, which they clearly aren't. Dumping more garbage on top of garbage is still garbage.
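A minimal simulation makes the point; the salary distribution and the self-selection rule below are entirely made up, but the mechanism is generic: a bigger self-selected sample gets more precise, not more accurate.

```python
import random

random.seed(1)

def population_salary():
    # arbitrary right-skewed salary distribution, made up for illustration
    return random.lognormvariate(11, 0.5)

def responds(salary):
    # hypothetical self-selection: higher earners respond more often
    return random.random() < min(1.0, salary / 150_000)

# approximate the true population mean by brute force
true_mean = sum(population_salary() for _ in range(200_000)) / 200_000

for n in (1_000, 100_000):
    sample = []
    while len(sample) < n:
        s = population_salary()
        if responds(s):
            sample.append(s)
    est = sum(sample) / len(sample)
    print(f"n = {n:>7,}: estimate = {est:,.0f}  (true mean ~ {true_mean:,.0f})")

# The estimate barely moves between n = 1,000 and n = 100,000:
# it has converged, but to the wrong number.
```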

Consider this scenario: if

(a) you are the Devious Dean of Admissions at a college, and

(b) improving your college's ranking is on your annual performance management plan, and

(c) you know that the Economist ranking is largely based on the PayScale Salary Report, and

(d) PayScale's data come from the Salary Surveys, which do not require explicit identity verification, and

(e) you have access to various devices that can access these Salary Surveys,

why are you not sending in a bunch of fake reports of outsized compensation?

Oh, my Mom reads this blog so let me not promote unethical behavior. You don't need to fake data. You can just target a bunch of alumni who have had successful careers, and encourage them to send in their true reports.

***

The other key sentence in the article is: "[The Economist] took the College Scorecard earnings data and performed a multiple regression analysis to assess how much a school’s graduates earn compared with how much they might have made had they attended another school."

I asked the students what explanatory (X) variables might be found in such a regression. Some of the answers: job title, gender, location of job, GPA, college major, number of years of work experience, family background. Essentially, the "all else equal" comparison requires a lot of covariates.

It turns out PayScale has a product called MarketMatch (link) which gives us some hints. Here is a description of this product:

The MarketMatch algorithm uses a two-step process for producing compensation data in a PayScale report. The first step is to understand which of our more than 250 compensable factors are important when it comes to pricing a job and how that job's pay is affected by these compensable factors. This is done in order to define a pay distribution for this job. The mix of compensable factors and their effect on pay is highly dependent upon the job. For example, coding languages and locations are important compensable factors for a Software Developer, while average sales prices and annual sales are important for an Account Executive.

This description leads to a very complex multiple regression model, with "more than 250" covariates, and a host of interaction effects (e.g. allowing the effect of location to depend on the job title). This model has at least 250 main effects. If it includes all pairwise (2-way) interactions, the regression equation has over 31,000 more terms.

PayScale did earlier impress us with their 1.4 million salary profiles (which, for the following discussion, we assume to be objective and reliable). This, they say, translates to anywhere from 50 to 4,000 profiles per school. While the lower limit is 50 students, PayScale actually does not publish results for schools with fewer than 325 profiles.

Even with 4,000 profiles, you can't estimate tens of thousands of regression coefficients with any semblance of accuracy. If each of the 250 factors were binary ("Yes"/"No"), there would be 2^250 unique types of individuals. That number is on the order of 10^75, and you only have 4,000 observations. For the overwhelming majority of the types of individuals for which predicted salaries are issued, there is zero data.
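The arithmetic above is easy to verify; a quick sketch, using the same binary-factor simplification as in the text:

```python
# Back-of-the-envelope checks for the model-size arithmetic.
from math import comb

main_effects = 250
pairwise_interactions = comb(main_effects, 2)
print(pairwise_interactions)              # 31,125 two-way interaction terms

distinct_types = 2 ** main_effects        # if every factor were binary
print(len(str(distinct_types)))           # 76 digits, i.e. on the order of 10^75

print(4_000 / distinct_types)             # observations per type: ~2e-72
```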

The New York Times has been making waves this week featuring management practices at Amazon and workplace tracking practices at various companies (link). These are essential references for how data make us dumber.

I am going to ignore the shocking claim by the journalist who stated that GE is "long a standard-setter in management practices." To give him some credit, he did not say "good" management practices. It is true that business schools like to glorify GE managers. But the most famous GE doctrine is to line all employees up at the end of the year, and give the bottom 10% pink slips. (See Jack Welch's Wiki page.) This practice is cut from the same cloth as the "purposeful Darwinism" that was vilified in the article about Amazon.

What I want to focus on is the completely bonkers line of argument paraded by software vendors who sell workplace tracking (i.e. surveillance) tools.

1. The performance of your workers is completely measured by our continuous and usually stealthy tracking of data.

2. Because of the continuous and stealthy nature of tracking, the data are objective, unbiased, trustworthy, and accurate.

I couldn’t imagine living in a world where I’m supposed to guess what’s important, a world filled with meetings, messages, conference rooms, and at the end of the day I don’t know if I delivered anything meaningful.

So what are the data that would allow each worker to know every day whether they "delivered something meaningful"? The article mentioned just two types of data: the usual tracking of how people spent their time at work; and little notes workers are encouraged to send to bosses to "nudge" or "cheer" each other.

Just because you can count "nudges" or "cheers", or tally the words, pairs of words, or triplets of words most frequently associated with someone, doesn't mean you know anything meaningful about that person's performance.

In fact, a lot of these data are manipulated, and probably worthless.

Even within the Times articles, there are multiple examples of why employee notes are not to be trusted. "People wouldn't put something negative in a public forum, because it would reflect poorly on them," said one vendor. At Amazon, employees reported that the secret feedback system is "frequently used to sabotage others". I find it hard to believe that we even need proof of such behavior. In fact, that is one of the key points I made in Numbersense.

Counting emails, or minutes spent on the work computer, is similarly pointless. Someone who spent 20 minutes on the computer is not necessarily more productive than someone who spent 10 minutes working and 10 minutes web-surfing random things. The former employee might be slower, or confused, or learning on the job, or day-dreaming. Again, it's hard to believe that we even need proof of this point.

There is a tendency to believe that data have intrinsic value. One of the worrying trends in the age of Big Data is insufficient time spent understanding if the data collected measure the right things, and whether the analyses provide even marginally trustworthy answers to the questions being asked.

In our newest column, we take on the recent media obsession with companies that make robots that hire people. (link)

As with most articles about data science, the journalists failed to dig up any evidence that these robots work, other than glowing quotes from the people who are selling them. We point out a number of challenges that such algorithms must overcome in order to generate proper predictions. We also discuss why measuring the outcomes of these predictions is so hard: one problem is that we have no objective standard for someone being the "correct" hire; another is that the action we take based on the predictions affects the outcome that was predicted.