AlixPartners' Analytics Challenge

At AlixPartners, we’re always looking for smart, accomplished, and quick learners with high energy and a demonstrated ability to get results. We put a lot of effort into the screening, interview, and assessment process to ensure not only that you have the right skills and experience, but also that our culture and core values are a good fit for you.

Data analytics is an important capability for our team. To see job descriptions and open positions in this area, please visit the Careers: Search Jobs page of our website and select Digital from the categories dropdown menu.

If you do not find an open position that suits you, but you are still interested in the firm, you may also submit your CV and cover letter to Digital-Recruiting@alixpartners.com.

If you are interested in joining our Digital team, consider taking the AlixPartners’ Analytics Challenge.

Data: at AlixPartners, we deal with a lot of it. Whether it’s parsing terabytes of forensic information in an electronic discovery setting or building a predictive model to improve the revenue forecasting ability for one of our Fortune 500 clients, the circumstances are always different but the challenge remains the same. What actionable insights can we provide, from the data we are able to obtain, to improve our client’s situation—when it really matters?

That’s where you come in. The high-pressure situations we typically operate under are rarely conducive to luxurious, clean, normalized data sets. If you love working with ambiguous, incomplete, duplicative, or outdated data, then tackling one of our challenges might be for you.

Challenge #1: P>N

As consultants, we are often required to make inferences from limited amounts of data. This dataset has more variables than observations, so traditional tools such as ordinary (unpenalized) regression may fail.
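Why ordinary regression breaks down here can be shown with a short rank argument. It uses the sizes given for this challenge (n = 250 training rows, p = 300 variables), with X the n × p training matrix and y the target vector:

```latex
X \in \mathbb{R}^{n \times p}, \qquad n = 250 < p = 300

\operatorname{rank}(X^\top X) = \operatorname{rank}(X) \le n = 250 < p
\quad\Longrightarrow\quad X^\top X \text{ is singular, so } (X^\top X)^{-1} \text{ does not exist}

\{\beta : X^\top X \beta = X^\top y\} = \hat\beta + \ker(X), \qquad \dim\ker(X) \ge p - n = 50
\qquad (\hat\beta \text{ any one solution})

\hat\beta_\lambda = \arg\min_{\beta_0,\,\beta}\; \frac{1}{n}\sum_{i=1}^{n}\Big[\log\big(1 + e^{\eta_i}\big) - y_i\,\eta_i\Big] + \lambda\,\lVert\beta\rVert_1,
\qquad \eta_i = \beta_0 + x_i^\top \beta
```

The same problem affects unpenalized logistic regression. With continuous variables, 250 points in 300 dimensions are in general position with probability one, so any labeling of them is linearly separable and the maximum-likelihood coefficients diverge. An L1 (lasso) penalty, as in the last line, is one common way to get a finite fit that also sets many coefficients exactly to zero.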

You'll receive a dataset with 300 random variables (each drawn from [0,1]). A secret algorithm was used to compute a binary target variable based on these data.
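One common starting point for a binary target with more variables than rows is L1-penalized logistic regression, which performs variable selection as part of fitting. The sketch below is not necessarily what the challenge authors expect. It uses only the Python standard library and fits the model by proximal gradient descent (ISTA), standardizing columns with training statistics first. The helper names (`fit_l1_logistic`, `predict_proba`, `kfold_indices`) and the defaults `lam=0.1`, `lr=0.5`, `n_iter=300` are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, lr=0.5, n_iter=300):
    """Fit L1-penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept is not penalized.
    Columns are standardized with training means and standard deviations so one
    lam treats every variable on the same scale.
    Returns (weights, intercept, means, sds).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
           for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for zi, yi in zip(Z, y):
            # Residual = derivative of the log-loss w.r.t. the linear predictor.
            r = sigmoid(b + sum(wj * zj for wj, zj in zip(w, zi))) - yi
            grad_b += r
            for j in range(p):
                grad_w[j] += r * zi[j]
        b -= lr * grad_b / n
        # Gradient step on the smooth loss, then soft-threshold: weights whose
        # evidence is weaker than lam are set exactly to zero.
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(p)]
    return w, b, means, sds


def predict_proba(model, X):
    """Predicted P(target = 1) for each row of X, using training standardization."""
    w, b, means, sds = model
    return [sigmoid(b + sum(wj * (xj - m) / s
                            for wj, xj, m, s in zip(w, row, means, sds)))
            for row in X]


def kfold_indices(n, k=5, seed=0):
    """Shuffle row indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]
```

In practice `lam` would be chosen by cross-validation on the training rows (for example with `kfold_indices`), scoring each held-out fold by AUC. Any variable screening should be repeated inside each fold; screening on all rows first makes the cross-validated score optimistic.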

The training dataset has 250 rows, and the test dataset has 19,750 rows. The goal is to build a model on the training dataset that accurately classifies the test dataset. Be careful which variable selection techniques you use, and don’t overfit. You'll be evaluated using the area under the ROC curve (AUC).
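Because submissions are scored by AUC, it helps to compute the same metric on held-out folds of the training rows. This standard-library sketch uses the rank (Mann-Whitney U) formulation of AUC. `roc_auc` is a hypothetical helper name, not the grader's code.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive is scored higher
    than a randomly chosen negative; tied scores count as one half.
    """
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    # Assign average 1-based ranks to runs of tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Only the ordering of the scores matters, so any monotone transformation of a model's outputs leaves its AUC unchanged.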

Submit a brief report describing your methodology and results. Apply your best model to the 19,750 Target_Evaluate rows and submit, for each row, the predicted probability that the target equals 1. The file must be a comma-separated values (.csv) file with 19,750 rows: the first column is the row id and the second is the predicted probability.
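A small sketch of writing the submission file with Python's `csv` module. It assumes no header line, since the specification asks for exactly 19,750 rows. `write_submission` and the six-decimal formatting are illustrative choices.

```python
import csv


def write_submission(ids, probs, out, expected_rows=19750):
    """Write (row id, probability) pairs as a two-column CSV with no header."""
    if len(ids) != len(probs):
        raise ValueError("ids and probs must have the same length")
    if len(ids) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(ids)}")
    writer = csv.writer(out, lineterminator="\n")
    for row_id, p in zip(ids, probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range for row {row_id}: {p}")
        writer.writerow([row_id, f"{p:.6f}"])
```

For a real file, pass a handle opened with `open("submission.csv", "w", newline="")` so the `csv` module controls line endings.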

Challenge #2: Fuzz-ography

Lack of data validation is often identified as the most common weakness in application security assessments. Employed correctly, however, validation not only improves security but also encourages complete input, efficiency, consistency, and fewer errors in the data that information systems capture.

Understanding these benefits raises an obvious question: “Why isn’t good validation employed everywhere?” The simple truth is that it isn’t easy. As the demands of data capture and tracking grow, validation rules must be developed in step with them. Doing so on both the client and server sides of an application is a tedious process, and tedium begets errors.

In this problem, you’ll be provided with an input data set of nearly three million City/Country pairs. To validate these data, use the city names and country codes from the free world cities database (http://www.geodatasource.com/download). Your program should match each entry to its correct, cleaned city spelling and country code.
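A minimal matching sketch, assuming the GeoDataSource file has already been loaded into a list of (city, country_code) tuples. Its column layout is not described here, so the loader is left out. Names are normalized (case, accents, punctuation, whitespace) and matched exactly within the stated country code. Failing that, they are matched fuzzily with the standard library's `difflib.get_close_matches`. `normalize`, `build_index`, `match_city`, and the 0.8 cutoff are illustrative choices, not a prescribed method.

```python
import difflib
import re
import unicodedata
from collections import defaultdict


def normalize(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def build_index(reference):
    """Map country code -> {normalized city name: canonical city spelling}."""
    index = defaultdict(dict)
    for city, code in reference:
        index[code.upper()][normalize(city)] = city
    return index


def match_city(index, city, code, cutoff=0.8):
    """Return the canonical spelling of city within country code, or "" if none."""
    candidates = index.get(code.strip().upper(), {})
    key = normalize(city)
    if not key:
        return ""
    if key in candidates:  # exact match after normalization
        return candidates[key]
    # Fuzzy fallback: best candidate whose similarity ratio is >= cutoff.
    close = difflib.get_close_matches(key, list(candidates), n=1, cutoff=cutoff)
    return candidates[close[0]] if close else ""
```

With nearly three million inputs, deduplicate the (city, code) pairs first and cache results. `get_close_matches` scans every candidate in the country, so large countries will be slow without a cheaper pre-filter. The sketch also trusts the input country code, and names in non-Latin scripts normalize to empty strings under this regex, so both cases would need extra handling.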

The output file should be a distinct data set (no duplicates), provided as a pipe-delimited ('|') .txt file. It should contain five fields: Input_City, Input_CountryCode, Output_City, Output_CountryCode, and Output_BlankCity (Yes/No). Descriptions or latitude/longitude pairs may be added as additional output fields.
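A sketch of writing that file with Python's `csv` module. It assumes a header line naming the five fields, and it assumes Output_BlankCity is "Yes" when no output city was found, which is one reading of the field. Duplicates are removed with `dict.fromkeys`, which keeps first-seen order. `write_matches` is a hypothetical helper name.

```python
import csv

FIELDS = ["Input_City", "Input_CountryCode", "Output_City",
          "Output_CountryCode", "Output_BlankCity"]


def write_matches(rows, out):
    """Write distinct result rows as a pipe-delimited file with a header.

    rows: iterable of (input_city, input_code, output_city, output_code) tuples.
    """
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow(FIELDS)
    for in_city, in_code, out_city, out_code in dict.fromkeys(rows):
        blank = "Yes" if not out_city else "No"
        writer.writerow([in_city, in_code, out_city, out_code, blank])
```

For the real output, open the .txt file with `newline=""` and `encoding="utf-8"` so accented city names are preserved.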

You’ll be evaluated solely on the percentage of correct matches, although we can’t say we won’t give bonus points for geocoding your results!
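The exact scoring procedure isn't described, but a hand-labeled sample can serve as a self-check before submitting. This sketch assumes a match counts as correct only when both the output city and the output country code equal the labeled answer. `match_percentage` is a hypothetical helper.

```python
def match_percentage(predictions, gold):
    """Percent of gold-labelled inputs whose predicted output matches exactly.

    predictions, gold: dicts mapping (input_city, input_code) to
    (output_city, output_code). Only keys present in gold are scored;
    a missing prediction counts as wrong.
    """
    if not gold:
        raise ValueError("gold must not be empty")
    correct = sum(1 for key, target in gold.items() if predictions.get(key) == target)
    return 100.0 * correct / len(gold)
```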
