Information strings and their use in understanding digital journalism

Oxford Internet Institute MSc in Social Science of the Internet research proposal by Christopher Sheats

Introduction. In his book Information: A Very Short Introduction, Dr. Luciano Floridi distinguishes five types of semantic information (5-TSI): primary, secondary, meta, operational, and derivative. Floridi then tells a story and labels specific pieces of it as “primary” in nature, “operational” in nature, and so on. I have adapted Floridi’s 5-TSI into a framework that goes beyond independent pieces of information. My research links these pieces together to trace more effectively how the main topic or topics of a news article are described to the reader. I propose that each such chain of linked information types is an information string.

Identifying information strings allows one to analyze semantic information by qualifying and categorizing it to determine what is and is not present in any given article. My area of interest is the quality of information in politically motivated online news articles. A diverse and relevant range of information strings makes up a news article’s informativeness, a metric that describes how high or low the quality of its information is. My objective is to determine the quality of any given set of information, which may or may not indicate aspects of informativeness, misinformation, and disinformation.

Hypothesis. For my hypothesis to work, I had to invent an information-string system. “Primary” information is the smallest, least complicated “tier” of information, which I call a “first-tier” information string. Operational information concerning a news article topic is “primary-operational” information, or a second-tier information string, because it operationally describes the primary information. Each sub-tier is always a complication of its respective higher-tier information string, be it secondary, meta, operational, or derivative. I propose that every information string in a stand-alone news article should start as primary-“something”, because information in any news article should focus on or support a main topic.
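The tier system described above can be sketched as a small data structure. This is a minimal illustration and not part of the proposal itself: the names `InfoType`, `InformationString`, `tier`, `label`, `starts_with_primary`, and `extend` are hypothetical, and the sketch assumes that a string's tier equals the number of information types linked in it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class InfoType(Enum):
    """Floridi's five types of semantic information (5-TSI)."""
    PRIMARY = "primary"
    SECONDARY = "secondary"
    META = "meta"
    OPERATIONAL = "operational"
    DERIVATIVE = "derivative"


@dataclass(frozen=True)
class InformationString:
    """An ordered chain of information types (hypothetical representation)."""
    types: Tuple[InfoType, ...]

    def __post_init__(self):
        if not self.types:
            raise ValueError("an information string needs at least one type")

    @property
    def tier(self) -> int:
        # Assumption: a first-tier string has one type, a second-tier two, etc.
        return len(self.types)

    @property
    def label(self) -> str:
        # Produces names such as "primary-operational".
        return "-".join(t.value for t in self.types)

    def starts_with_primary(self) -> bool:
        # The proposal expects every string in a stand-alone article
        # to begin with primary information.
        return self.types[0] is InfoType.PRIMARY

    def extend(self, info_type: InfoType) -> "InformationString":
        # Adding a type yields the next sub-tier of this string.
        return InformationString(self.types + (info_type,))


first_tier = InformationString((InfoType.PRIMARY,))
second_tier = first_tier.extend(InfoType.OPERATIONAL)
print(second_tier.label, second_tier.tier)
```

Representing a string as an immutable tuple keeps each sub-tier distinct from the string it complicates, so the same first-tier string can be extended in several directions.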

I hypothesize that information strings can be logically and visually mapped in such a way that will enhance news aggregation websites.

With any given news article topic or topics (the primary information), there should be a substantial amount of related information already available online. With the release of new information by journalists, politicians, whistle-blower release sites, encyclopedia developments, and social media participants, the nature of that primary information will evolve. Over time, primary information strings change depending on a multitude of factors that affect their sub-tier information. Analyzing how the information strings change should reveal trends that help identify those factors.

I expect my research to support the premise that information string evolution shifts dramatically depending on specific sources, including individual journalists, individual political speakers, entire news agencies, or entire political organizations. Applying information strings to track information that has a high probability of being deceptive should allow me to visually represent how that information changes over time and to better understand the consequences of using low-quality information.
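One simple way to quantify the change described above is to compare the information strings identified in two snapshots of coverage on the same topic. The following is only a sketch under stated assumptions: the function name `string_shift` is hypothetical, strings are represented by plain labels such as "primary-operational", and the labels are assumed to have been assigned by a human coder beforehand.

```python
from collections import Counter
from typing import Dict, Iterable


def string_shift(earlier: Iterable[str], later: Iterable[str]) -> Dict[str, int]:
    """Return how the count of each information-string label changed.

    Positive values mean the label became more frequent in the later
    snapshot; labels whose count did not change are omitted.
    """
    before = Counter(earlier)
    after = Counter(later)
    labels = sorted(set(before) | set(after))
    # Counter returns 0 for labels it has not seen.
    return {
        label: after[label] - before[label]
        for label in labels
        if after[label] != before[label]
    }


# Hypothetical, hand-coded labels for an original article and a follow-up.
original = ["primary", "primary-operational"]
follow_up = ["primary", "primary-secondary", "primary-secondary"]
print(string_shift(original, follow_up))
```

Grouping such shifts by source (journalist, agency, or organization) would be one way to look for the source-specific trends the hypothesis anticipates.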

Methodology. With the help of my research mentor, Dr. Floridi, I will select a topic in news media to analyze. For example, in a public blog post, I looked at an NBC News article alleging that US officials claimed that the Iranian government was responsible for cyber attacks against the US government. By applying information strings, I was able to provide evidence that its low-quality information was later referenced in follow-up articles by the same and other news agencies, leading to systemic low-quality information and probable deception. Information strings can become dependent on false information, which then allows all kinds of further information strings to be generated in other stand-alone online news articles.

Surveys will need to be developed and administered to a wide range of participants to gauge how the diversity and order of specific information strings affect informativeness. Survey questions will depend on the development and application of my information strings framework. The following questions were developed to help shape the survey questions:

Is it feasible to track the behavior of information in semantic media using the 5-TSI?

How persuasive is one information type, of the 5-TSI, over another?

Which type of the 5-TSI affects the trend of semantic media the most? The least?

Is objective information composed more of one of the 5-TSI over any of the others?

In semantic media, can the 5-TSI be broken down into percentages and graphed?

How do subjective information and objective information affect the 5-TSI? Vice-versa?

Is it possible to identify the gaps between data and information in semantic media, depending on the type of information, either biological information or the 5-TSI?

Is it possible to automate the detection of the 5-TSI present in a piece of semantic media?

To what degree does biological information affect the 5-TSI?

Which types of the 5-TSI persuade a user of that information to ask more questions rather than make more assumptions? Vice-versa?

Is it possible to use one or many types of information to strategically develop information warfare operations?

Does the perception of the 5-TSI change for a biological entity that is limited to biological information?

Does understanding the information type affect one’s ability to understand information in a more objective sense?

How do we extract wanted information from all perceived information, of the 5-TSI?

How do we extract primary information from secondary information? Or vice-versa?

What percentage of the 5-TSI create more perceived information entropy, information, and/or contradictory information? Can the 5-TSI be broken down into these categories?

Do various types of the 5-TSI create any more or less information entropy?

Does the diversity or order of the 5-TSI affect information entropy?

Does information entropy shift as one learns more?

How does information entropy change and how is it affected by biological information and the 5-TSI?

Does semantic information have strong relationships with biological information? Can it be understood using complex adaptive systems analysis?

Is there a dualism to Floridi’s 5-TSI?

Is it feasible to minimize or maximize the use of meta information, except when in support of primary information, to better produce disinformation? Or any of the other 5-TSI?

Is it possible to systematically or systemically organize meta information as primary information, or secondary information as primary information, etc.?
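Two of the questions above, whether the 5-TSI can be broken down into percentages and whether their diversity affects information entropy, lend themselves to a small worked sketch. It assumes each passage of an article has already been hand-labeled with one of the five types, and it uses Shannon entropy over the label distribution as one possible reading of “information entropy”; that choice is an assumption of this sketch, not a definition from the proposal. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def type_percentages(labels: Sequence[str]) -> Dict[str, float]:
    """Share of each information type, as a percentage of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


def shannon_entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the distribution of information types.

    Zero when only one type is present; higher when the types are
    more evenly mixed.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical hand-coded labels for the passages of one article.
article = ["primary", "primary", "secondary", "meta"]
print(type_percentages(article))
print(shannon_entropy(article))
```

The percentages could be graphed directly, while the entropy value gives a single number for comparing the diversity of information types across articles.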

Conclusion. Depending on the probability of informativeness and the ever-present risk of deception in political news articles, a news aggregator such as Google News could eventually achieve two things. One, it could make targeted suggestions to information consumers that present the smallest amount of content to consume while achieving the greatest amount of informativeness based on open sources. Two, because there will eventually be a database of historical trends based on information string change, a news aggregator could strategically suggest information that will best support probable information changes.

This research will allow for the development of automated systems that support the actions of an information consumer with high-quality information, rather than leaving the consumer to wallow in unstructured, seemingly random news with no qualified risks of misinformation or disinformation. If my research is successful, I fully intend to pursue it further in the OII’s DPhil in Information, Communication and the Social Sciences program.

Henry Markram of the Blue Brain Project, founded in 2005 to attempt to create a synthetic brain, was quoted in a 2008 interview as saying, “So much of what we do in science isn’t actually science. I say let robots do the mindless work so that we can spend more time thinking about our questions.” The internet has an extraordinary capacity to meaningfully inform its users. We need better information management systems to help us ask the right questions when we consume information online.