Text-Mining, Geography, and Canonical African American Short Stories

The representation of predominantly black environments across a 100-year history of African American short stories perhaps extends far beyond the identification of specific street names and geographic landmarks. Placing Charles Chesnutt’s “The Wife of His Youth,” Rudolph Fisher’s “City of Refuge,” Zora Neale Hurston’s “Sweat,” and Richard Wright’s “Big Boy Leaves Home” in conversation with one another reveals that Southern environments are also characterized by language usage.

In Hurston’s and Wright’s stories, Southern landscapes are complemented by African American Vernacular English. The word “ah,” which stands for either “I” or “A,” appears 135 times across the two stories. Similarly, the word “kin,” which stands for “can” or serves as a shortened version of “kinship,” is used 25 times in Wright’s “Big Boy Leaves Home” and 11 times in Hurston’s “Sweat.” Both writers rely on other phonetic spellings as well: across the two stories, “yuh” appears 137 times, “git” 80 times, and “mah” 39 times.

These spellings suggest that Southern geographies are accompanied by specific linguistic representations associated with a region. Text-mining reveals that Hurston and Wright represent place and location through black vernacular rather than through the naming of specific places, as Edward P. Jones does.
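
Counts of phonetic spellings such as “ah,” “yuh,” and “mah” can be produced with a few lines of code. What follows is a minimal sketch in Python, not a description of the software used for this project. The function names, the tokenization rule, and the sample sentence are assumptions made for illustration, and a different tokenizer would give somewhat different totals.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens.

    Curly apostrophes are normalized first so that contractions such as
    "don’t" stay together as one token.
    """
    normalized = text.lower().replace("’", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)*", normalized)


def count_terms(text, terms):
    """Return how many times each term in `terms` occurs as a whole word."""
    counts = Counter(tokenize(text))
    return {term: counts[term] for term in terms}


# Invented sample sentence for illustration; not quoted from either story.
sample = "Ah kin git it, yuh know. Ah done tole yuh mah mind."
spellings = ["ah", "kin", "git", "yuh", "mah"]
print(count_terms(sample, spellings))
```

Run on the full text of a story, the same function would report a per-story count for each spelling. Because it matches whole words only, “ah” is not counted inside a word like “yeah.”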

Text-mining methods make it possible to quantify the total number of words in a story and the percentage of those words that are unique. Charles Chesnutt’s “The Wife of His Youth” has 30% unique words. Rudolph Fisher’s “City of Refuge” has 31% unique words. What I have been discovering is that stories that include characters who speak black vernacular have higher percentages of unique words than stories written only in so-called Standard English. Text-mining thus helps us account for the tangible contribution that vernacular speech makes to the linguistic creativity of writers’ stories.
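
The percentage of unique words can be computed by dividing the number of distinct word forms in a story by its total word count. The sketch below shows one way to do that in Python. It is an illustration under stated assumptions, not the tool used for the figures above: the function name, the tokenization rule, and the sample sentence are hypothetical, and choices such as lowercasing or the handling of apostrophes will shift the result.

```python
import re


def unique_word_percentage(text):
    """Return distinct word forms as a percentage of all word tokens."""
    normalized = text.lower().replace("’", "'")
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", normalized)
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return 100.0 * len(set(tokens)) / len(tokens)


# Invented sample for illustration: 11 tokens, 5 of them distinct.
sample = "The cat saw the dog and the dog saw the cat"
print(round(unique_word_percentage(sample), 1))
```

Each phonetic spelling counts as its own word form under this measure, so “mah” and “my” are tallied separately. The percentage also tends to fall as a text gets longer, so comparisons between stories of very different lengths should be read with care.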

As I hope my work shows, short stories by black writers address topics, such as African American geographic spaces and linguistic terms, that are typically not covered in digital humanities. At the same time, text-mining software can reward the literary scholar who seeks to quantify language usage in African American literature.