About

The ACM RecSys Challenge 2017 focuses on the problem of job recommendations on XING in a cold-start scenario. The challenge consists of two phases:

offline evaluation: fixed historic dataset and fixed targets for which recommendations/solutions need to be computed/submitted.

online evaluation: dynamically changing targets. Recommendations submitted by the teams will actually be rolled out in XING's live system (details about the online evaluation such as the approach for ensuring fair conditions between the teams will follow).

Both phases aim at the following task:

Task: given a new job posting p, the goal is to identify those users who (a) may be interested in receiving the job posting as a push recommendation and (b) are also appropriate candidates for the given job.

For both offline and online evaluation, the same evaluation metrics and the same types of data sets will be used. The offline evaluation is essentially used as an entry gate to the online evaluation:

the top teams (which also pass a XING baseline) will be allowed to participate in the online evaluation.

The online evaluation focuses on a push recommendation scenario in which new items (job postings) are given and users need to be identified...

who are interested in job postings in general (e.g. open to new job offers, willing to change their job)

who are interested in the particular job posting which they are notified about

who are an appropriate candidate for the given job posting (e.g. recruiters who own the job postings indicate that they are interested in the candidate)

In the online challenge, teams will submit only their best users for each item to the system.
For each target item, teams are allowed to submit one or more target users; however, each user can only be submitted once.
We decided on this restriction because push recommendations are presented to users in a prominent way. These
recommendations are then delivered to the user over the following channels.

Channels: given a list of recommendations such as (p1, u42), (p1, u23), ..., where pi is the i-th target posting and uj is the j-th target user, the recommendations are delivered to users through the following channels:

activity stream: "Vacancies matching your profile" story in the stream on xing.com and in the mobile apps. (see screenshot)

jobs marketplace: an orange notification bubble in the side-bar and an orange label "new" highlights the new job recommendation, e.g. on xing.com/jobs or in the mobile apps (see: screenshot)

emails: if the user did not see the push recommendation then the user may receive an email that points him/her to the job recommendation (see screenshot)

recruiter tools: users who receive a job posting as a push recommendation are also likely to appear as candidate recommendations to recruiters, for example, in the so-called XING Talent Manager (see screenshot)

Balancing user interest and recruiter demands: In contrast to last year's challenge, which focused solely on estimating how relevant a job is for a given user, this year we focus on both sides: job recommendations should be relevant to the users, and at the same time the users who receive those recommendations need to be appropriate candidates for the given job (e.g. whether a user received interest from a given recruiter is part of the evaluation measure).

Balancing relevance and revenue: Some of the content is paid and some users pay for subscriptions. Teams will need to balance the relevance of recommendations against monetary aspects (i.e. the revenue that is earned with the recommendation).

Novelty / sparsity: recommendations need to be computed particularly for newly created job postings (which have not yet received any interactions).

Smart targeting of push recommendations: teams will also need to estimate how likely it is that a user is actually interested in job recommendations at all. Users who are not interested may delete recommendations or, if they receive too many, disable push recommendation notifications entirely (the latter is primarily relevant for teams that participate in the online challenge).

The above evaluation metrics will be applied in both the offline evaluation and the online evaluation (in the offline evaluation, the target items won't change during the challenge, while in the online evaluation, new target items are released on a daily basis).

The training dataset is supposed to be used for experimenting and
training your models. You can split the interaction data into training
and test data. For example: you can leave out the last complete week
of interaction data and then try to predict, for a given job posting,
which users will positively interact with it.
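The time-based holdout suggested above can be sketched as follows. This is a minimal illustration with hypothetical in-memory rows (the real interactions.csv would be parsed first); the field order (user_id, item_id, created_at, interaction_type) follows the schema described below.

```python
# Hypothetical sample rows: (user_id, item_id, created_at, interaction_type).
# created_at is a Unix timestamp, as in interactions.csv.
interactions = [
    (1, 10, 1484000000, 1),
    (2, 10, 1484600000, 3),
    (1, 11, 1485000000, 1),
    (3, 11, 1485500000, 2),
]

def time_split(rows, holdout_seconds=7 * 24 * 3600):
    """Hold out the last week (by created_at) as a test set."""
    cutoff = max(r[2] for r in rows) - holdout_seconds
    train = [r for r in rows if r[2] <= cutoff]
    test = [r for r in rows if r[2] > cutoff]
    return train, test

train, test = time_split(interactions)
```

Splitting by time rather than at random mirrors the challenge setup, where models trained on past interactions must predict future ones.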

interactions.csv:
Interactions are all transactions between a user and an item including
recruiter interests as well as impressions. Fields:

user_id ID of the user who performed the interaction (points to users.id)

item_id ID of the item on which the interaction was performed (points to items.id)

created_at a Unix timestamp representing the time when the interaction was created

interaction_type the type of interaction that was performed on the item:

0 = XING showed this item to a user (= impression)

1 = the user clicked on the item

2 = the user bookmarked the item on XING

3 = the user clicked on the reply button or application form button that is shown on some job postings

4 = the user deleted a recommendation from his/her list of recommendations (clicking on "x"), which has the effect that the recommendation will no longer be shown to the user and that a new recommendation item will be loaded and displayed to the user

5 = a recruiter from the item's company showed interest in the user (e.g. clicked on the user's profile)
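A small sketch of working with these interaction_type codes: the grouping of codes into "positive" and "negative" below is an assumption for illustration (it is not the official scoring), treating clicks, bookmarks, replies, and recruiter interest as positive signals and deletions as negative, while ignoring plain impressions.

```python
from collections import Counter

# Assumed grouping of the documented interaction_type codes (not the
# official evaluation weights): 0 = impression is ignored.
POSITIVE = {1, 2, 3, 5}   # click, bookmark, reply/apply, recruiter interest
NEGATIVE = {4}            # recommendation deleted by the user

# Hypothetical (user_id, item_id, created_at, interaction_type) rows
rows = [
    (1, 10, 1484000000, 0),
    (1, 10, 1484000100, 1),
    (2, 10, 1484000200, 4),
    (3, 11, 1484000300, 2),
]

def positive_counts(rows):
    """Count positive interactions per item, ignoring impressions."""
    counts = Counter()
    for user_id, item_id, ts, itype in rows:
        if itype in POSITIVE:
            counts[item_id] += 1
    return counts

counts = positive_counts(rows)
```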

targetItems.csv: contains the list of item IDs (items.id) for which recommendations should be computed and submitted.

targetUsers.csv: set of user IDs (users.id) which are allowed to appear in the recommendations/solutions that are submitted. Hence, recommending an item to users that are not in targetUsers.csv will not gain any points during the offline challenge.

Note: solutions that are submitted are only allowed to contain items and users from the above files.
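The constraints above can be checked before submitting. The helper below is a hypothetical sketch, not an official validator: it verifies that every recommended pair uses only IDs from targetItems.csv and targetUsers.csv and, for the online challenge, that no user appears more than once.

```python
def validate_submission(recs, target_items, target_users, unique_users=True):
    """Check a list of (item_id, user_id) pairs against the target files.

    unique_users enforces the online-challenge rule that each user may
    be submitted only once; set it to False for the offline challenge.
    """
    seen_users = set()
    for item_id, user_id in recs:
        if item_id not in target_items or user_id not in target_users:
            return False          # ID outside the allowed target sets
        if unique_users:
            if user_id in seen_users:
                return False      # user submitted more than once
            seen_users.add(user_id)
    return True

ok = validate_submission([(1, 42), (1, 23)], {1, 2}, {42, 23})
bad = validate_submission([(1, 42), (2, 42)], {1, 2}, {42, 23})
```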

Data

Datasets that are released as part of the RecSys challenge are
semi-synthetic, non-complete samples, i.e. XING data is sampled and enriched with noise. Regarding the released datasets, participants have to stick to the following rules:

Attempting to identify users, or to reveal any private information about
the users or about the business from which the data comes,
is strictly forbidden.

It is strictly forbidden to share the datasets with others.

It is not allowed to use the data for commercial purposes.

The data may only be used for academic purposes.

It is not allowed to use the data in any way that harms XING or XING's customers.

Final Paper

Each team should submit a paper describing the algorithms that they developed for the task (see paper submissions & workshop). Teams without a paper submission to the RecSys Challenge workshop will be removed from the final leaderboard.

No Crawling on XING

It is not allowed to crawl additional information from XING (e.g. via XING's APIs or by scraping details from XING pages).

Fair Play

Please stick to the rules above, sign up for only one team, and respect the submission limits: you can upload at most 20 solutions per day (for the offline challenge). We may suspend a team from the challenge if we get the impression that the team is not playing fair.

Ask us

If you are unsure of whether something is allowed or not, contact us (e.g. create an issue on github) and we will be happy to help you. Above all remember it's all for science, so be creative, not evil!

Each team - not only the top teams - should submit a paper that describes the algorithms used for solving the challenge. Those papers will be reviewed by the program committee (non-blind review). At least one of the authors is expected to register for the RecSys Challenge workshop, which will take place as part of the RecSys conference in Como, Italy.