Users often describe what they want to accomplish with an application in a language that is very different from the application's domain language. To address this gap between system and human language, we propose modeling an application's domain language by mining a large corpus of Web documents about the application using deep learning techniques. A high-dimensional vector space representation models the relationships between user tasks, system commands, and natural language descriptions, and supports mapping operations such as identifying likely system commands given natural language queries and identifying user tasks given a trace of user operations. We demonstrate the feasibility of this approach with a system, CommandSpace, for the popular photo editing application Adobe Photoshop. We build and evaluate several applications enabled by our model, showing the power and flexibility of this approach.

photoshop.dat.bin.gz (46Mb). This is a version of our vector dataset (vector length = 200). It should be suitable for most applications, but you'll need word2vec (or another implementation) to use it.
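
If you would rather not install word2vec, the following is a minimal sketch of a reader that needs only the Python standard library. It assumes the file uses the standard word2vec binary layout (a text header with the vocabulary size and vector length, then each token followed by a space and that many little-endian 32-bit floats). That layout is an assumption based on the file name, not something we have verified here, and the function name `load_word2vec_binary` is ours.

```python
import gzip
import struct


def load_word2vec_binary(path):
    """Read a word2vec-style binary file (optionally gzipped) into {token: [floats]}.

    Assumes the standard word2vec binary layout; adjust if the file differs.
    """
    opener = gzip.open if path.endswith(".gz") else open
    vectors = {}
    with opener(path, "rb") as f:
        # Header line: "<vocabulary size> <vector length>\n"
        vocab_size, dim = (int(x) for x in f.readline().split())
        for _ in range(vocab_size):
            # Token bytes run up to the first space.
            chars = []
            while True:
                ch = f.read(1)
                if ch == b" " or ch == b"":
                    break
                if ch != b"\n":  # some writers end each entry with a newline
                    chars.append(ch)
            if not chars:
                break  # reached the end of the file early
            token = b"".join(chars).decode("utf-8", errors="replace")
            # The vector is stored as dim little-endian float32 values.
            data = f.read(4 * dim)
            vectors[token] = list(struct.unpack("<%df" % dim, data))
    return vectors


# Example call (the file must be downloaded first):
# vectors = load_word2vec_binary("photoshop.dat.bin.gz")
# print(len(vectors), "tokens loaded")
```

Plain Python lists keep the sketch dependency-free; for real use, an array library or an existing word2vec implementation will be considerably faster.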

indexedcommands.dat has a list of the indexed system (Photoshop) features. These will appear in the vector file above. Note that some features are also standard terms (e.g., "nudge").
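
As a rough illustration of how the command list and the vectors can be combined, the sketch below ranks commands against a natural language query by cosine similarity. Averaging the query's word vectors is a simple stand-in chosen for this example, not necessarily the method used in CommandSpace. The toy three-dimensional vectors and the names `cosine` and `rank_commands` are invented for illustration; in practice the vectors would come from the vector file and the command list from indexedcommands.dat (whose exact layout we do not assume here).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_commands(query, vectors, commands, top_n=5):
    """Rank commands by similarity to the mean vector of the known query words."""
    words = [w for w in query.lower().split() if w in vectors]
    if not words:
        return []  # no query word is in the vocabulary
    dim = len(vectors[words[0]])
    mean = [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]
    scored = [(cosine(mean, vectors[c]), c) for c in commands if c in vectors]
    scored.sort(reverse=True)  # highest similarity first
    return [(command, score) for score, command in scored[:top_n]]


# Toy data, invented for illustration only (real vectors have length 200).
toy_vectors = {
    "trim": [0.9, 0.1, 0.0],
    "image": [0.5, 0.5, 0.1],
    "crop": [1.0, 0.2, 0.0],
    "nudge": [0.0, 1.0, 0.1],
    "blur": [0.0, 0.1, 1.0],
}
toy_commands = ["crop", "nudge", "blur"]  # stand-in for the indexed command list

ranked = rank_commands("trim image", toy_vectors, toy_commands)
for command, score in ranked:
    print(command, round(score, 3))
```

With the toy data, "crop" ranks first for the query "trim image". Restricting the candidates to the indexed commands matters because, as noted above, some entries in the vector file are ordinary words as well as features.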

commonwords.tar.gz has pairs of commands and the most common verbs, nouns, etc. between them. There is some noise due to multiple word meanings. We selected the most common form of the word based on POS tagging in the dataset.