By the submission deadline, you are required to submit 3 artifacts:

An executable that takes an input file in the same format as the validation
set and writes a file with the classification results. The output file should
not contain a header. Each line holds one classified label, in the same order
as the input data: “1” means “access” and “0” means “no access”.
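
As an illustration of the output format described above, here is a minimal Python sketch. The function name write_predictions and the file name out.txt are hypothetical and not part of the competition materials; the sketch assumes the labels have already been computed in input order.

```python
import os
import tempfile


def write_predictions(labels, path):
    """Write one label per line, with no header, in the order given.

    Hypothetical helper: `labels` is assumed to already follow the
    order of the rows in the input file.
    """
    with open(path, "w", newline="\n") as f:
        for label in labels:
            # Only the two labels named in the guidelines are allowed.
            if label not in (0, 1):
                raise ValueError(f"label must be 0 or 1, got {label!r}")
            f.write(f"{int(label)}\n")


if __name__ == "__main__":
    # Example: three input rows classified as access, no access, access.
    out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
    write_predictions([1, 0, 1], out_path)
    with open(out_path) as f:
        print(f.read(), end="")
```

Rejecting any value other than 0 or 1 is a design choice of this sketch, so that a malformed label fails loudly instead of producing an invalid output file.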

A short manual. The manual should have 3 parts: 1) the running environment,
e.g. Windows/Linux and required packages such as MATLAB, R, etc.; 2) a
tutorial on how to run the program to get the output; 3) metrics on the
validation set, including classification error (required), confusion matrix
(required), and ROC (optional).
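
The two required metrics can be computed from the true and predicted labels alone. The following Python sketch shows one way to do it; the function names and the [[TN, FP], [FN, TP]] layout of the matrix are assumptions made for illustration, not a layout prescribed by the guidelines.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1.

    The row is the true label, the column is the predicted label
    (an assumed layout; state the layout used in your manual).
    """
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def classification_error(y_true, y_pred):
    """Fraction of rows whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    y_true = [1, 0, 1, 1, 0]
    y_pred = [1, 1, 0, 1, 0]
    print(confusion_matrix(y_true, y_pred))
    print(classification_error(y_true, y_pred))
```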

Your validation process will provide a ‘YES’/‘NO’ [action] for a [resource]
given a [mgr_id, role_rollup_1, role_rollup_2, role_deptname, role_title,
role_family_desc, role_family, role_code] tuple. This data set has the
[action] column filled in so you can validate your model.

The objective of this competition is to build a model, learned using
historical data, that will determine an employee's access needs such
that manual access transactions (grants and revokes) are minimized as
the employee's attributes change over time. This is a
clustering/collaborative filtering exercise. The model will take an
employee attribute record and a resource code and will return true if
the employee should be given access to this resource and false if the
employee should not be given access to this resource.

The problem can be formulated as follows:

At time T, create a snapshot of STATUS(EMPLOYEE_ID, RESOURCE_ID), which is either 1 (access) or 0 (no access). Build a system F that models STATUS ~ {EMPLOYEE ATTRIBUTES, RESOURCE ATTRIBUTES}.

Therefore at time T, for each employee, we have an access profile PROFILE(EMPLOYEE_ID, T).

The measure of success is to minimize the cost of add/remove actions for all employees over a given time period.

add action: a manual add_access during the test period results in a penalty if EMPLOYEE_ID-RESOURCE_ID or RESOURCE_ID is not in PROFILE(EMPLOYEE_ID)

remove action: a manual remove_access action results in a penalty if EMPLOYEE_ID-RESOURCE_ID or RESOURCE_ID is in PROFILE(EMPLOYEE_ID)
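
The two penalty rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not give the size of a penalty, so each penalized action is counted as cost 1, and the data structures (a dict of sets for PROFILE, a list of tuples for manual actions) and the name manual_action_cost are hypothetical.

```python
def manual_action_cost(profile, actions):
    """Count penalized manual actions during the test period.

    profile: dict mapping EMPLOYEE_ID to the set of RESOURCE_IDs the
             model predicts the employee should have (assumed structure).
    actions: iterable of (kind, employee_id, resource_id) tuples, where
             kind is "add" or "remove" (assumed encoding).
    Each penalty is assumed to cost 1.
    """
    cost = 0
    for kind, employee_id, resource_id in actions:
        predicted = resource_id in profile.get(employee_id, set())
        if kind == "add" and not predicted:
            # Access had to be granted by hand: the profile missed it.
            cost += 1
        elif kind == "remove" and predicted:
            # Access had to be revoked by hand: the profile wrongly had it.
            cost += 1
    return cost


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    profile = {"e1": {"r1", "r2"}, "e2": {"r3"}}
    actions = [
        ("add", "e1", "r1"),     # already in profile: no penalty
        ("add", "e1", "r9"),     # not in profile: penalty
        ("remove", "e2", "r3"),  # in profile: penalty
        ("remove", "e2", "r4"),  # not in profile: no penalty
    ]
    print(manual_action_cost(profile, actions))
```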

No formal sign up is required. Just download the data and make a submission according to the
guidelines provided in the documentation.

Why does the data show that some employees report to a different manager at the same time?

You can ignore the employees that show they report to a different manager at the same time. There
are some employees that serve multiple purposes for the company and are the source for these data
points. This is real industry data; it doesn't always behave the way you think it should.