SENIOR DATA SCIENTIST

Detego GmbH

Simon Walk currently works as Senior Data Scientist at Detego GmbH.
From 2012 to 2013, he worked as a scientific developer at the Know-Center GmbH and as a project assistant at the Knowledge Management Institute at Graz University of Technology. From 2013 to 2014, he worked as a scientific developer at Virtual World Services GmbH. He was also a visiting researcher at the Stanford Center for Biomedical Informatics Research from November 2011 to February 2012 and from September 2013 to December 2013. In 2014, he started working as a University Assistant at the Institute for Information Systems and Computer Media at Graz University of Technology, where he received his PhD in 2016.
From 2016 to 2017, he worked as a Post-Doctoral Researcher at the Stanford Center for Biomedical Informatics Research at Stanford University. In 2017, he returned to Austria, where he worked as a Post-Doctoral Scholar at the Institute of Interactive Systems and Data Science at Graz University of Technology from June 2017 to January 2018.

M. Vitiello, S. Walk, R. Hernández, D. Helic and C. Gütl (2016). Classifying Students to improve MOOC dropout rates. In Proceedings of the European Stakeholder Summit on experiences and best practices in and around MOOCs (EMOOCS 2016), 501-508.

Narrative-Driven Recommendations Dataset

This dataset contains crowdsourced and manually curated annotations for submissions and comments to r/MovieSuggestions. Specifically, the annotations include movies (IMDb IDs), keywords, actors and genres for more than 1,400 submissions and 20,000 comments.

The dataset was generated for the purpose of analyzing narrative-driven recommendations, using data dumps available at pushshift.io/reddit/.

Data Structure

submissions.csv: contains crowdsourced and manually curated annotations for movie suggestion requests on r/MovieSuggestions. Specifically, the file includes the Reddit submission id, positively mentioned movie ids (IMDb), negatively mentioned movie ids (IMDb), as well as desired and undesired keywords, genres and actors.

comments.csv: contains annotations for comments posted on r/MovieSuggestions. Each line in comments.csv contains the id of the Reddit submission the comment was posted under, the individual Reddit comment id, and the IMDb movie ids annotated in that comment.

movie_titles.csv: includes a mapping between IMDb movie ids and their original titles (both found on IMDb).
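As a rough illustration of how comments.csv and movie_titles.csv fit together, the sketch below resolves the IMDb ids annotated in comments to titles and groups them by submission. The column names (submission_id, comment_id, imdb_ids, imdb_id, title), the "|" separator for multiple ids and the sample rows are all assumptions made for this example; the actual files may use different headers and separators, so check them before reusing the code.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows. Column names and the "|" separator are assumptions,
# not the documented layout of the real files.
COMMENTS_CSV = """submission_id,comment_id,imdb_ids
s1,c1,tt0133093|tt0111161
s1,c2,tt0133093
s2,c3,tt0111161
"""

TITLES_CSV = """imdb_id,title
tt0133093,The Matrix
tt0111161,The Shawshank Redemption
"""


def load_titles(handle):
    """Build a mapping from IMDb id to original title."""
    return {row["imdb_id"]: row["title"] for row in csv.DictReader(handle)}


def suggestions_by_submission(handle, titles):
    """Collect the titles annotated in comments, grouped by submission id."""
    suggestions = defaultdict(list)
    for row in csv.DictReader(handle):
        for imdb_id in row["imdb_ids"].split("|"):
            if imdb_id:
                # Fall back to the raw id when no title is known.
                suggestions[row["submission_id"]].append(titles.get(imdb_id, imdb_id))
    return dict(suggestions)


titles = load_titles(io.StringIO(TITLES_CSV))
result = suggestions_by_submission(io.StringIO(COMMENTS_CSV), titles)
print(result)
```

To run this against the real files, replace the io.StringIO objects with open(..., newline="", encoding="utf-8") handles and adjust the column names to the actual headers.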

A more detailed description of the dataset can be found in our publication below. Note that the dataset is free to use for research purposes but requires citing our paper as the source of the data.

setup: "2d" for 2d-setup, "2da" for 2d-asymmetric-setup, "3d" for 3d-setup

experiment: either "walking" or "random"

tags: number of tags involved in the experiment

iterations: number of iterations

milliseconds: milliseconds since beginning of the experiment

serial: the serial number extracted from the epc

rssi: the measured rssi value
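The fields listed above could be read into typed records along the following lines. This is only a sketch: it assumes the measurements are available as CSV with one column per field, that tags, iterations, milliseconds and serial are integers, and that rssi is numeric. The sample rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

VALID_SETUPS = {"2d", "2da", "3d"}
VALID_EXPERIMENTS = {"walking", "random"}

# Invented sample rows; the CSV layout itself is an assumption.
SAMPLE = """setup,experiment,tags,iterations,milliseconds,serial,rssi
2d,walking,10,5,0,0,-60.0
2d,walking,10,5,15,1,-70.0
2d,walking,10,5,30,0,-62.0
"""


@dataclass(frozen=True)
class Reading:
    setup: str
    experiment: str
    tags: int
    iterations: int
    milliseconds: int
    serial: int
    rssi: float


def parse_reading(row):
    """Convert one CSV row (a dict of strings) into a validated Reading."""
    if row["setup"] not in VALID_SETUPS:
        raise ValueError(f"unknown setup: {row['setup']!r}")
    if row["experiment"] not in VALID_EXPERIMENTS:
        raise ValueError(f"unknown experiment: {row['experiment']!r}")
    return Reading(
        setup=row["setup"],
        experiment=row["experiment"],
        tags=int(row["tags"]),
        iterations=int(row["iterations"]),
        milliseconds=int(row["milliseconds"]),
        serial=int(row["serial"]),
        rssi=float(row["rssi"]),
    )


def mean_rssi_by_serial(readings):
    """Average the measured RSSI values per tag serial number."""
    grouped = defaultdict(list)
    for reading in readings:
        grouped[reading.serial].append(reading.rssi)
    return {serial: mean(values) for serial, values in grouped.items()}


readings = [parse_reading(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
mean_rssi = mean_rssi_by_serial(readings)
print(mean_rssi)
```

Validating setup and experiment up front means a mislabeled row fails loudly instead of silently ending up in the wrong group.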

The corresponding ground truth datasets are located in the files 2d.npy, 2d_asymmetric.npy, and 3d.npy. The files contain the ground truth coordinates of the tags, relative to the tag with serial number 0.
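A minimal sketch of working with a ground truth file is shown below. It assumes that each .npy file holds a two-dimensional array with one row of coordinates per tag and that row i belongs to the tag with serial number i; neither assumption is confirmed by the description above, so inspect the array shape before relying on it. To keep the example runnable, it writes a small stand-in file with invented coordinates instead of reading the real 2d.npy.

```python
import os
import tempfile

import numpy as np


def load_ground_truth(path):
    """Load tag coordinates; assumes an array of shape (n_tags, n_dims)."""
    coords = np.load(path)
    if coords.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {coords.shape}")
    return coords


def distances_from_reference(coords):
    """Euclidean distance of every tag from the first row (assumed serial 0)."""
    return np.linalg.norm(coords - coords[0], axis=1)


# Stand-in file with invented coordinates, used only for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "2d.npy")
    np.save(path, np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]))
    coords = load_ground_truth(path)

distances = distances_from_reference(coords)
print(distances)
```

Because the coordinates are given relative to the tag with serial number 0, that tag's row should be the origin, and the distances computed here then equal the norms of the individual rows.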

The dataset is free to use for research purposes but requires citing our paper as the source of the data.

STUDENTS

I would like to thank all the students who currently work or worked with me on interesting research topics: Lukas Eberhard, Thomas Hasler, Clemens Hofer, Tomas Karas, Patrick Kasper, Philipp Koncar, Dietmar Maurer, Thomas Niedermair, Tiago Santos, Massimo Vitiello and Matthias Wölbitsch.