Corpus Colorum

Digging through the source of the color explorers for Benjamin Moore, Sherwin-Williams, and Behr, I was able to find some JSON endpoints that give us names, RGB values, and color families for all of their currently available colors. There’s some other information (e.g. “color collection”, “goes great with”) that might be fun to play around with, but for now we’ll just grab the basics: name, RGB value, and color family.

Behr is a bit tricky as the data is inside of a JavaScript source file instead of a JSON endpoint.
Also, the color family data is stored separately from the color information, so we’ll have to join the two together.
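The extract-and-join step for Behr might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the variable name `colorData`, the field names, the shape of the family lookup, and the sample values are all made up, and the real file's layout may differ. It also assumes the array literal happens to be valid JSON.

```python
import json
import re

# Hypothetical stand-in for the JavaScript source: NOT Behr's real format or data.
JS_SOURCE = '''
var colorData = [
  {"id": "X1", "name": "Example Red", "rgb": "#cc0000"},
  {"id": "X2", "name": "Example Blue", "rgb": "#0000cc"}
];
'''

# Hypothetical family lookup, stored separately from the colors themselves.
FAMILIES = {"RED": ["X1"], "BLUE": ["X2"]}


def extract_json_array(js_source, var_name):
    """Pull the array literal assigned to `var_name` out of JS source text."""
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\]);"
    match = re.search(pattern, js_source, re.DOTALL)
    if match is None:
        raise ValueError(f"no array assigned to {var_name!r}")
    # Only works when the literal is strict JSON (double quotes, no trailing commas).
    return json.loads(match.group(1))


def join_families(colors, families):
    """Attach a 'family' key to each color by inverting the family -> ids map."""
    family_by_id = {cid: fam for fam, ids in families.items() for cid in ids}
    return [dict(color, family=family_by_id.get(color["id"])) for color in colors]


behr_colors = join_families(extract_json_array(JS_SOURCE, "colorData"), FAMILIES)
print(behr_colors[0])
```

Colors with no matching family end up with `family=None`, which makes gaps in the join easy to spot before training on the data.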

import colorsys
import sys

import husl
from colormath.color_conversions import convert_color
from colormath.color_objects import CMYKColor, LabColor, sRGBColor

# we'll run each classifier multiple times and look at the
# mean and standard deviation over all of the runs
RUNS_PER_CLASSIFIER = 5


class NaiveBayesHSVClassifier(NaiveBayesClassifier):
    def get_features(self, color):
        hsv = colorsys.rgb_to_hsv(*color.rgb)
        return dict(zip(("hue", "saturation", "value"), hsv))


class NaiveBayesHLSClassifier(NaiveBayesClassifier):
    def get_features(self, color):
        hls = colorsys.rgb_to_hls(*color.rgb)
        return dict(zip(("hue", "lightness", "saturation"), hls))


class NaiveBayesHUSLClassifier(NaiveBayesClassifier):
    def get_features(self, color):
        hsl = husl.rgb_to_husl(*color.rgb)
        return dict(zip(("hue", "saturation", "lightness"), hsl))


class NaiveBayesCMYKClassifier(NaiveBayesClassifier):
    def get_features(self, color):
        rgb = sRGBColor(*color.rgb)
        cmyk = convert_color(rgb, CMYKColor)
        return dict(zip(("cyan", "magenta", "yellow", "black"),
                        (getattr(cmyk, v) for v in CMYKColor.VALUES)))


class NaiveBayesLabClassifier(NaiveBayesClassifier):
    def get_features(self, color):
        rgb = sRGBColor(*color.rgb)
        lab = convert_color(rgb, LabColor)
        return dict(zip(("lightness", "green-red", "blue-yellow"),
                        (getattr(lab, v) for v in LabColor.VALUES)))
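The "mean and standard deviation over all of the runs" bookkeeping can be sketched on its own. Here `summarize_runs` and `fake_run` are hypothetical helpers, and `fake_run` just returns a made-up number in place of shuffling the corpus, training a classifier, and scoring it.

```python
import random
import statistics

RUNS_PER_CLASSIFIER = 5


def summarize_runs(run_once, runs=RUNS_PER_CLASSIFIER):
    """Call `run_once()` `runs` times and return (mean, sample standard deviation)."""
    accuracies = [run_once() for _ in range(runs)]
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Seeded generator so the illustration is repeatable.
_rng = random.Random(0)


def fake_run():
    """Hypothetical stand-in for one train-and-score run; the value is made up."""
    return 0.8 + _rng.uniform(-0.05, 0.05)


mean_accuracy, std_accuracy = summarize_runs(fake_run)
print(f"accuracy: {mean_accuracy:.3f} +/- {std_accuracy:.3f}")
```

Passing a zero-argument callable keeps the summary code independent of which classifier or color space is being tried.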

k-Nearest Neighbor

import numpy as np
from sklearn.neighbors import KNeighborsClassifier


class KNNColorClassifier(ColorFamilyClassifier):
    def __init__(self, color_corpus, train_percent=0.8, n_neighbors=5):
        # labels in the KNeighborsClassifier are integers, so we'll create
        # a unique integer label for each color family and map both ways for convenience
        families = color_corpus['family'].unique()
        self.family_map = {f: i for i, f in enumerate(families)}
        self.reverse_family_map = {v: k for k, v in self.family_map.items()}
        self.n_neighbors = n_neighbors
        super().__init__(color_corpus, train_percent)

    def get_label(self, color):
        return self.family_map[color.family]

    def init_classifier(self):
        self.classifier = KNeighborsClassifier(n_neighbors=self.n_neighbors)
        [features, labels] = zip(*self.train_set)
        self.classifier.fit(features, labels)

    def accuracy(self):
        [features, labels] = zip(*self.test_set)
        return self.classifier.score(features, labels)

    def classify(self, color):
        return self.reverse_family_map[
            self.classifier.predict([self.get_features(color)])[0]
        ]

    def get_neighbors(self, color):
        return self.shuffled.iloc[[
            i for i in self.classifier.kneighbors(
                np.array(self.get_features(color)).reshape(1, -1),
                return_distance=False,
            )[0]
        ]]

# we'll try values of n_neighbors in this range
MIN_N = 1
MAX_N = 20


class KNNRGBClassifier(KNNColorClassifier):
    def get_features(self, color):
        return color.rgb


class KNNHSVClassifier(KNNColorClassifier):
    def get_features(self, color):
        return colorsys.rgb_to_hsv(*color.rgb)


class KNNHLSClassifier(KNNColorClassifier):
    def get_features(self, color):
        return colorsys.rgb_to_hls(*color.rgb)


class KNNHUSLClassifier(KNNColorClassifier):
    def get_features(self, color):
        return husl.rgb_to_husl(*color.rgb)


class KNNCMYKClassifier(KNNColorClassifier):
    def get_features(self, color):
        rgb = sRGBColor(*color.rgb)
        cmyk = convert_color(rgb, CMYKColor)
        return tuple(getattr(cmyk, v) for v in CMYKColor.VALUES)


class KNNLabClassifier(KNNColorClassifier):
    def get_features(self, color):
        rgb = sRGBColor(*color.rgb)
        lab = convert_color(rgb, LabColor)
        return tuple(getattr(lab, v) for v in LabColor.VALUES)
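To see what the nearest-neighbor vote is doing conceptually, here is a standard-library-only sketch. It is not scikit-learn's implementation, and the handful of training colors is hand-picked for illustration rather than taken from the paint corpus. The `to_features` argument plays the role of `get_features` above: swapping it changes the space in which "nearest" is measured.

```python
import colorsys
import math
from collections import Counter

# Hypothetical, hand-picked training colors (RGB in the 0-1 range).
TRAINING = [
    ((1.0, 0.0, 0.0), "RED"),
    ((0.8, 0.1, 0.1), "RED"),
    ((0.0, 0.0, 1.0), "BLUE"),
    ((0.1, 0.1, 0.8), "BLUE"),
    ((0.0, 0.4, 0.9), "BLUE"),
]


def to_hsv(rgb):
    """Convert an (r, g, b) tuple to (h, s, v)."""
    return colorsys.rgb_to_hsv(*rgb)


def knn_classify(rgb, training, n_neighbors=3, to_features=lambda c: c):
    """Label `rgb` by majority vote among its nearest training colors."""
    target = to_features(rgb)
    nearest = sorted(
        training,
        key=lambda item: math.dist(target, to_features(item[0])),
    )[:n_neighbors]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify((0.9, 0.2, 0.2), TRAINING))                      # RGB features
print(knn_classify((0.9, 0.2, 0.2), TRAINING, to_features=to_hsv))  # HSV features
```

One caveat with hue-based spaces: hue is circular, so plain Euclidean distance treats hues near 0 and near 1 as far apart even though both are red.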

On your Markov, get set, go!

To generate names for our mystery color, let’s try training a Markov chain on not only the names of these closest colors, but the product of all synonyms of all component words within the names to give us more variety. We’ll limit synonyms by part of speech so the generated names make slightly more sense.
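As a toy illustration of the "product of all synonyms" idea, the sketch below uses a hand-written synonym table instead of a real thesaurus. The table, the function name `explode_with_table`, and the sample color name are all hypothetical.

```python
import itertools

# Hypothetical, hand-written synonym table standing in for a thesaurus lookup.
SYNONYMS = {
    "sky": ["sky", "heavens"],
    "blue": ["blue", "azure"],
}


def explode_with_table(color_name, synonyms=SYNONYMS):
    """Return every combination of per-word synonyms for a color name."""
    # Words missing from the table fall back to themselves.
    options = [synonyms.get(word, [word]) for word in color_name.lower().split()]
    return {" ".join(variant).upper() for variant in itertools.product(*options)}


variants = explode_with_table("Sky Blue")
print(sorted(variants))
# ['HEAVENS AZURE', 'HEAVENS BLUE', 'SKY AZURE', 'SKY BLUE']
```

The number of variants is the product of the synonym counts per word, so names with several common words can expand very quickly.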

import itertools

from nltk.corpus import wordnet
import spacy

# faster than wordnet for tokenizing and part-of-speech tagging
# (the "en" shortcut is from older spaCy releases; newer ones need
# a full model name such as "en_core_web_sm")
nlp = spacy.load("en")

# map spaCy POS to WordNet
POS_MAP = {
    'ADJ': 'a',
    'ADV': 'r',
    'NOUN': 'n',
    'VERB': 'v',
}


def get_syns(token):
    """get synonyms for a spaCy token"""
    synsets = wordnet.synsets(token.orth_, pos=POS_MAP.get(token.pos_))
    if synsets:
        return itertools.chain.from_iterable(s.lemma_names() for s in synsets)
    return [token.orth_]


def explode(color_name):
    """explode a color name into the product of all of its component words' synonyms"""
    return set(
        ' '.join(variant).replace('_', ' ').upper()
        for variant in itertools.product(
            *(get_syns(token) for token in nlp(color_name.lower()))
        )
    )

import string

import markovify


def make_markov_model(colors):
    return markovify.Text(
        None,
        # we're pre-parsing the sentences
        parsed_sentences=[
            variant.split()
            for variant in set(
                itertools.chain.from_iterable(colors['name'].apply(explode).values)
            )
        ],
    )


def name_color(color):
    model = make_markov_model(classifier.get_neighbors(color))
    return string.capwords(
        model.make_sentence(
            # we're generating short names and don't care about overlap with original text
            test_output=False,
            max_words=3,
        )
    )
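For intuition about what a word-level Markov chain does with these names, here is a minimal first-order chain built from scratch. It is a simplified sketch, not how markovify works internally: it looks back only one word, and it truncates at `max_words` rather than rejecting names that run long. The input names are made up.

```python
import random
from collections import defaultdict

START, END = "<start>", "<end>"


def build_chain(names):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for name in names:
        words = [START] + name.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate_name(chain, max_words=3, rng=random):
    """Walk the chain from START until END is drawn or `max_words` is reached."""
    words = []
    current = START
    while len(words) < max_words:
        current = rng.choice(chain[current])
        if current == END:
            break
        words.append(current)
    return " ".join(words)


# Hypothetical input names, not taken from the paint corpus.
chain = build_chain(["DEEP SEA BLUE", "SEA GLASS", "DEEP PURPLE HAZE"])
print(generate_name(chain, rng=random.Random(1)))
```

Because followers are stored with repeats, words that follow each other more often in the input are proportionally more likely to be chosen.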

Most of the generated names will be nonsensical (and many also NSFW), but I did come across a few good ones. Here are the highlights:

Unprompted Empurpled (PURPLE, #6e3281)

Million Dollar Marxist (RED, #b70a1f)

Induce Watercourse (BLUE, #10cffb)

Sharp-worded American Cheddar (ORANGE, #e77616)

Summertime Sorry (PURPLE, #8fb0eb)

Graeco-roman Chocolate-brown (BROWN, #52382e)

Cat Valium (GREEN, #5bfee2)

Unconscionable Orange Tree (ORANGE, #c44312)

Unused Butter (YELLOW, #f0d299)

Scandalmongering Shuttlecock (YELLOW, #e8c467)

Norse Naughty (PURPLE, #5d04f2)

Disconsolate Denim (BLUE, #74b7c5)

Italian Methamphetamine Green (GREEN, #84e8e9)

Odoriferous & Off-key (YELLOW, #d7b743)

Journey’s End (#BAC9D6)

For now it’s time to climb back out of the rabbit hole, but maybe one day we can teach our algorithm about puns (or even just include homophones in addition to synonyms to increase the likelihood of accidental puns).