5.
Page: Which means:
● Pick the top 30 terms of the document to categorize
● Build a fuzzy full-text query from those terms
● Search for indexed articles that share most of the terms
● Rank the results by similarity score
● Use the top-ranked related Wikipedia articles as “Topics”
www.iks-project.eu
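The steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Lucene/Solr MoreLikeThis implementation: the TF-IDF weighting, the helper names (`tokenize`, `suggest_topics`) and the tiny in-memory "index" are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def suggest_topics(text, articles, max_terms=30, top_k=3):
    """Rank indexed articles by the weighted terms they share with `text`.

    `articles` maps an article title to its text (hypothetical in-memory index).
    """
    article_terms = {title: set(tokenize(body)) for title, body in articles.items()}
    # Document frequency: in how many articles each term occurs.
    doc_freq = Counter(t for terms in article_terms.values() for t in terms)
    n_docs = len(articles)

    # Weight each term of the input document by TF-IDF (assumed weighting),
    # ignoring terms that never occur in the index.
    tf = Counter(tokenize(text))
    weights = {
        t: count * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
        for t, count in tf.items()
        if t in doc_freq
    }
    # Keep only the top `max_terms` terms: this is the "query".
    best = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:max_terms]
    query = dict(best)

    # An article scores the sum of the weights of the query terms it contains,
    # so articles sharing most terms rank first (an OR query, not exact match).
    scores = {
        title: sum(w for t, w in query.items() if t in terms)
        for title, terms in article_terms.items()
    }
    ranked = sorted(
        ((title, s) for title, s in scores.items() if s > 0),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:top_k]


if __name__ == "__main__":
    index = {
        "Jazz": "jazz music improvisation saxophone trumpet",
        "Football": "football team goal league match",
        "Opera": "opera music singing orchestra stage",
    }
    print(suggest_topics("The saxophone solo was pure jazz improvisation", index))
```

In a real deployment the index and the scoring would be delegated to the search engine; the sketch only shows how term selection, querying and ranking fit together.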

7.
Page: Hierarchical Wikipedia Categorization
● Group the text of all articles categorized under a given Topic
● Use Wikipedia Categories as a Hierarchical Taxonomy
● Categorize a new document with MoreLikeThis on the aggregate text of the articles
● Available DBpedia dumps provide:
  ● Text summaries for each article
  ● “subject” relationships between articles and topics
  ● “broader” / “narrower” SKOS hierarchy between topics
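A minimal sketch of the grouping step, assuming the DBpedia data has already been loaded into plain dictionaries. The dictionary shapes, the function names (`aggregate_by_topic`, `ancestors`) and the sample identifiers are hypothetical; parsing the actual dump files is not shown.

```python
from collections import defaultdict


def aggregate_by_topic(abstracts, subjects):
    """Concatenate article summaries per topic.

    `abstracts` maps article -> summary text; `subjects` maps
    article -> list of topics (the "subject" relationship).
    """
    grouped = defaultdict(list)
    for article, topics in subjects.items():
        if article not in abstracts:
            continue  # no summary available for this article
        for topic in topics:
            grouped[topic].append(abstracts[article])
    return {topic: " ".join(texts) for topic, texts in grouped.items()}


def ancestors(topic, broader):
    """Walk the "broader" links breadth-first and list all ancestor topics."""
    seen = []
    queue = list(broader.get(topic, []))
    while queue:
        parent = queue.pop(0)
        if parent not in seen:  # also guards against cycles in the taxonomy
            seen.append(parent)
            queue.extend(broader.get(parent, []))
    return seen


if __name__ == "__main__":
    abstracts = {"Miles_Davis": "trumpet jazz", "John_Coltrane": "saxophone jazz"}
    subjects = {
        "Miles_Davis": ["Category:Jazz_musicians"],
        "John_Coltrane": ["Category:Jazz_musicians"],
    }
    broader = {"Category:Jazz_musicians": ["Category:Musicians"]}
    print(aggregate_by_topic(abstracts, subjects))
    print(ancestors("Category:Jazz_musicians", broader))
```

The aggregated text per topic is what would then be indexed, so that a MoreLikeThis query for a new document returns topics rather than individual articles; the ancestor walk lets a match on a narrow topic be reported under its broader categories as well.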

17.
Page: Track & Hack
● https://github.com/ogrisel/pignlproc
● https://issues.apache.org/jira/browse/STANBOL-201
● Help integrate into Stanbol EntityHub / Enhancer during the Hackathon
● IKS User Story S10: Automated document categorization
  ● I create a new document in my CMS by typing in an HTML edit form or by uploading a document with textual content (PDF, office file, XML file, ...). I want the CMS to suggest a list of at most 3 controlled properties, such as subjects/topics or geographical coverage, from a list of standardised options (IPTC subjects or world countries), based on the text content I provided.