Allow me to introduce Gist -- an online game inspired by Up Goer 5/Thing Explainer. The goal is to come up with simple definitions for complex concepts.

In addition to being fun (and introducing people to some unusual vocabulary), Gist is a part of a research project; the results will help computers deal with some tricky natural-language-processing tasks.

Any particular reason for scoring plurals and so on separately? The original Up Goer 5 makes quite a lot of use of plurals and the -ing suffix on top of the base list of words. Whereas Gist rewards me for writing things in more convoluted ways:

likes to eat ants -> would like to eat an ant
eats ants -> will eat an ant
place for buying carrots -> place for you to buy a carrot
city's pole that holds two flags -> pole in a city that can hold a flag and another flag

Oh. Great point. As a starting point for a scoring function, we used a list of the top 10,000 frequent words in English. The list includes inflections, so they are scored separately, but you're right -- we probably don't want to punish people for using plurals. Will throw some morphology on top of the algorithm, thanks.
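For anyone curious what "throwing some morphology on top" could look like, here is a minimal sketch, not Gist's actual code. It assumes the score depends on whether each word is found in the frequency list, and it uses crude suffix stripping in place of a real lemmatizer. COMMON_WORDS is a tiny made-up stand-in for the 10,000-word list, and the function names are hypothetical.

```python
import re

# Hypothetical stand-in for the 10,000-word frequency list.
COMMON_WORDS = {"like", "to", "eat", "ant", "place", "for", "buy", "carrot", "a", "an"}

# Suffixes to strip when looking for a base form (a crude approximation).
SUFFIXES = ("ing", "es", "s", "ed")


def base_forms(word):
    """Yield the word itself plus rough guesses at its base form."""
    yield word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            stem = word[: -len(suffix)]
            yield stem
            yield stem + "e"  # e.g. "likes" -> "lik" -> "like"


def is_common(word):
    """True if the word or any guessed base form is on the list."""
    return any(candidate in COMMON_WORDS for candidate in base_forms(word))


def uncommon_words(text):
    """Return the words that would still be penalized after suffix stripping."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if not is_common(w)]
```

With this, "likes to eat ants" and "place for buying carrots" come back with nothing flagged, so the plural and -ing forms cost no more than their base words. Irregular forms (went, mice) would still need a real lemmatizer or a lookup table.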

P.S. Now that I think about it, there's no way "Goer" is a popular word, is it? Do people actually use it?

buttered_cat_paradox wrote: P.S. Now that I think about it, there's no way "Goer" is a popular word, is it? Do people actually use it?

Perhaps the single most frequent individual use of it in public is people directly quoting Python.

Then there are discussions in scrapyards and automotive repair establishments, perhaps: "That blue hatchback with the rear impact damage... is it a goer, do you think?"

After that there are likely only increasingly obscure uses, like the discussion about the current boy/girl-friend in a wobbly relationship, where the conversation with more platonic and long-term acquaintances harks back to the lyrics of The Clash (by dint of comparison against "a stayer"), and there it may be deemed an autoantonym to the Pythonesque version.

(It's a valid and viable ad-hoc construction atop a common root, I'd say, but not itself a common word. Dialect usages may be more common.)

buttered_cat_paradox wrote: P.S. Now that I think about it, there's no way "Goer" is a popular word, is it? Do people actually use it?

The most common usage in news articles seems to be things like Church-goer or party-goer. Apart from that it's quite popular here to use goer to refer to a motivated person or successful thing (like this or this), an event or thing that will go ahead/start (like Soupspoon's automobile example), or just a person/thing that goes fast (athlete, race car, the Up Goer 5).

Obviously it's only been deemed permissible because 'go' is in the ten hundred words.

Bloopy wrote: Any particular reason for scoring plurals and so on separately? The original Up Goer 5 makes quite a lot of use of plurals and the -ing suffix on top of the base list of words.

... and issue fixed (at least mostly).

Interestingly, there seem to be quite a few tools that are good at lemmatization, but not many for going the other way around (pattern.en seemed promising, but it outputs words like "mutualed" and "mutualing"; adding a dictionary filter on top of it gives OK results).
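The dictionary-filter idea can be sketched roughly like this. It is only an illustration under stated assumptions: candidate_inflections is a naive, made-up generator standing in for whatever the inflection tool produces (it does not call pattern.en), and KNOWN_FORMS is a tiny hypothetical dictionary.

```python
# Hypothetical dictionary of attested word forms; a real one would be much larger.
KNOWN_FORMS = {
    "mutual",
    "walk", "walks", "walked", "walking",
    "like", "likes", "liked", "liking",
}


def candidate_inflections(lemma):
    """Naively over-generate inflected forms, valid or not."""
    stem = lemma[:-1] if lemma.endswith("e") else lemma
    return {lemma, lemma + "s", stem + "ed", stem + "ing"}


def inflections(lemma, known=KNOWN_FORMS):
    """Keep only the generated forms that the dictionary actually contains."""
    return sorted(candidate_inflections(lemma) & known)
```

Here the generator happily proposes "mutualed" and "mutualing", but the filter throws them out because they are not in the dictionary, while the forms of "walk" and "like" survive.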