Online Service for Mapping PhD Skills to Industry Demands on Job Ads

Research areas

Description

Although over 60 per cent of Australian PhDs land their first job after graduation outside academia, how this qualification benefits job seeking is largely unknown. In this project, we are building a service for mapping industry demand for PhDs, as portrayed by nearly 30,000 authentic job ads a year on an Australian website.

In 2015-16, we developed a schema to rank each ad by its PhD relevance and to highlight the text relevant to Ability to work with minimal supervision, Creative approach to problems, and seven other PhD skills. We also established a gold standard dataset of approximately 500 ads, annotated for these document ranking and text classification tasks by two or more experts until they reached consensus.

This visual-interactive system has the potential to increase understanding of which industries absorb PhD graduates and why. This understanding can be used not only to improve doctoral education and recruitment but also to address the claim that Australia has a brain drain.

This interdisciplinary project is co-supervised by Adj/Prof Hanna Suominen, Assoc/Prof Inger Mewburn, and Dr Will Grant at The ANU.

Goals

In this project extension, we are evaluating the applicability of Machine Learning (ML) to automating these tasks. We want to improve on our current ranking performance, measured as the average swapped pairs percentage on a hold-out test set of 105 advertisements: 21.22 for a system using the advertisement text alone and a promising 9.42 for one that enriches the text with the skills highlights. In particular, we are focusing on the human-in-the-loop approach of active ML and on deep/transfer ML to maximise processing correctness whilst minimising the amount of training data. The concrete goal is to engineer an online visualisation system that predicts the ranking and highlighting; quantifies which PhD skills are the most sought after; characterises advertising that seeks PhD skills in Australia in terms of geographic location, industry sector, job title, working hours, continuity, and wage; and uses interactive visual feedback to train both people and machines to improve their inter- and intra-annotator agreement.
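As a rough illustration of the ranking measure mentioned above, the sketch below computes a swapped pairs percentage for one ranked list. The exact definition used in the project is not given here, so this is an assumption: the share of advertisement pairs whose relative order under the predicted scores contradicts the gold standard order. The function name `swapped_pairs_percentage`, the treatment of ties, and the toy scores are all hypothetical.

```python
from itertools import combinations


def swapped_pairs_percentage(gold, predicted):
    """Percentage of item pairs that the predicted scores order
    in the opposite direction to the gold scores.

    Assumptions (not taken from the project description):
    - pairs tied in the gold scores are skipped, since they carry
      no ordering information;
    - pairs tied only in the predicted scores are not counted as swapped.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    comparable = 0
    swapped = 0
    for i, j in combinations(range(len(gold)), 2):
        gold_diff = gold[i] - gold[j]
        if gold_diff == 0:
            continue  # gold tie: skip this pair
        comparable += 1
        pred_diff = predicted[i] - predicted[j]
        if gold_diff * pred_diff < 0:
            swapped += 1  # the two rankings disagree on this pair

    return 100.0 * swapped / comparable if comparable else 0.0


# Toy example: four ads with hypothetical PhD-relevance scores.
gold_scores = [3, 2, 1, 0]
predicted_scores = [3, 1, 2, 0]  # the two middle ads are swapped
print(swapped_pairs_percentage(gold_scores, predicted_scores))
```

In this toy case one of the six pairs is swapped. An average over several ranked lists, if that is how the reported figure is formed, would simply be the mean of the per-list values.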

Requirements

This project will appeal to students with excellent skills in experimentation, programming, and teamwork. Preference is given to students who have completed or are taking units in Artificial Intelligence, Document Analysis, and/or Machine Learning at The ANU or a similar institution.

Background Literature

Gain

This student project is part of the activities of the NLP Team within the ML Group at The Australian National University (ANU) and Data61 in Canberra, the capital of Australia. The OECD Regional Well-Being Report 2014 evaluated Canberra as the most livable city in the world.

The ML Group was ranked in 2014 among the top five in the world in ML, the others being Microsoft Research, Max Planck Institute Tübingen, University of California, Berkeley, and University of Cambridge. According to the QS World University Rankings for 2015-16, The ANU ranks within the top 20 universities globally, with an overall score of 91.0 out of 100.0 (19th), whilst the next best Australian university scored 83.1 (42nd). For the field of research (FOR) code of AI and Image Processing, which applies to ML and NLP and falls under Information and Computer Sciences, The ANU obtained the top score of 5 out of 5 in the Excellence in Research for Australia (ERA) evaluations in both 2010 and 2012.

The NLP Team is experienced in developing powerful low-cost techniques for converting free-form text into structured representations. Our deep and transfer ML methods can use fewer than a hundred expert-annotated sentences to achieve performance comparable to state-of-the-art systems initialised with ten times more data. Similarly, our language processing methods have been among the best in the ALTA, CLEF, and TREC shared tasks on automated understanding, use, summarisation, and translation in the difficult genres of “Doctors’ Latin” in electronic health records and “Lawyers’ French” in patents.