
Friday, November 18, 2016

Data Mining: Cool Tool of the Digital World

The internet has given people the ability to access research from around the world. This research can lead to future research, and further research after that. Nowadays, technology offers tools that allow users to conduct their own research in simple ways. Our Digital Humanities class recently used tools that allowed us to conduct interesting research within a short period of time; these tools were data mining technologies.

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Our Digital Humanities class spent some time working with two websites that allowed us to perform data mining: Voyant and Ngram. While working with these websites, our tribe made interesting observations and was able to form hypotheses regarding the data-mined information.

...No, not quite.

Voyant is a tool that takes in a large amount of text and finds the most commonly used words within it. The tool then takes those commonly used words and creates a collage of sorts for the user to analyze. While it may seem as though the site has just made a collage of random words, those commonly used words can say a lot about the text as well as its meaning.
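To make the idea of "most commonly used words" concrete, here is a minimal sketch of word counting in Python. This is not how Voyant itself is built; the function name `top_words`, the tiny stop-word list, and the sample sentence are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list; real tools use much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

def top_words(text, n=5):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Count every word that is not in the stop-word list.
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Invented sample sentence, not taken from any of our blog posts.
sample = "Digital media is media that is digital. The digital humanities study media."
print(top_words(sample, 3))
```

Filtering out filler words like "the" and "is" is what keeps the result from being dominated by words that say nothing about the text's themes.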

When using the tool, our tribe used some of our blog posts to see what kind of words we used the most:

The first chart presented from Voyant shows tribe member Patrick's word usage across all of his individual blog posts for the class, as well as the tribe posts he wrote. The second chart shows tribe member Morgan's word usage in the posts she wrote. Looking at the charts, it is unsurprising that students tend to use the words "digital", "media", and "humanities". When comparing the charts to each other, one may notice that both members used several of the exact same words, such as "digital", "like", "human", "humanities", and "just". This can reflect the ideas and themes that students are getting out of the Digital Humanities class.
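The comparison between the two charts can also be sketched in code: count each author's words, keep the most frequent ones, and see which appear in both lists. The function name `shared_top_words` and both sample strings are hypothetical placeholders, not the tribe's actual posts.

```python
from collections import Counter

def shared_top_words(text_a, text_b, n=10):
    """Return, in alphabetical order, words found in both texts' top-n lists."""
    # Build the set of each text's n most frequent words.
    top_a = {word for word, _ in Counter(text_a.lower().split()).most_common(n)}
    top_b = {word for word, _ in Counter(text_b.lower().split()).most_common(n)}
    # The set intersection keeps only words that appear in both sets.
    return sorted(top_a & top_b)

# Invented placeholder text standing in for two authors' posts.
author_one = "digital media digital like just human"
author_two = "like digital humanities human just media media"
print(shared_top_words(author_one, author_two))
```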

Ngram is another data mining tool that takes the usage of words into consideration. A user types in words, separated by commas, and the tool produces a graph that portrays the usage of those words in print texts from 1500 to 2008.
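A graph like this plots, for each year, how large a share of that year's printed words a given term makes up. The sketch below shows that calculation on a toy scale. The function name `yearly_share` and the two-entry corpus are assumptions invented for illustration; the real tool works from a vastly larger collection of scanned books.

```python
from collections import Counter

def yearly_share(corpus_by_year, word):
    """Map each year to the fraction of that year's words equal to `word`."""
    shares = {}
    for year, text in sorted(corpus_by_year.items()):
        words = text.lower().split()
        # Divide the word's count by the year's total so that years with
        # more text do not automatically look like years of higher usage.
        shares[year] = Counter(words)[word] / len(words) if words else 0.0
    return shares

# Invented toy corpus: one short "book" per year.
toy_corpus = {1900: "war and peace and war", 1950: "peace at last"}
print(yearly_share(toy_corpus, "war"))
```

Using a share instead of a raw count matters because far more books were printed in 2000 than in 1800, so raw counts would rise for almost every word.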

In this Ngram, I typed in the words happy, sad, angry, annoyed, and scared. The results were, in my opinion, quite fascinating. As one can see when looking at this graph, the usage of the word "happiness" in texts significantly decreased from the year 1800 to the year 1980, and slightly increased again from 1980 to 2000. Could the decreased use of this word in texts signal a decrease in societal happiness itself? The term "angry" is pretty static, and then spikes a bit at around 1990. After I contemplated this, my professor urged me to add the terms "war" and "peace":

The tribe found these results to be especially interesting. It is not at all surprising to see an increase in usage of the words "war" and "peace" in about 1918 and 1942; these are around the times when World War I and World War II occurred. We also noticed that during the times that the term "war" spiked in usage, "happiness" was decreasing, and the usage of the term "angry" slightly increased. The tribe feels as though this could say a lot about society's reaction to wartime.

The tribe also used Ngram to compare Morgan's and Patrick's commonly used words to research how often those words have been used within a 200-year span. We entered each of their number one most-used words into Ngram: "digital" and "media". The results show a prominent increase in the usage of both words beginning in the 1960s, yet the usage of the word "media" skyrockets. Media studies have been popular for quite some time, but digital studies (such as the Digital Humanities) are quite new. The word usage shows the difference in popularity.

Voyant and Ngram are tools that have allowed us to conduct research through the popularity and usage of words within texts and specific time frames. These data mining sites have allowed our tribe to observe interesting results, as well as form hypotheses that could be further researched. Tools like these can easily pose interesting questions, as well as lead to research that can further our attempt to understand society.