Better algorithms and open data frameworks needed for inclusive and diverse online content

By: Arthur Gwagwa

Introduction

Because they can continually learn from collected data, artificial intelligence (AI)
technologies are transforming the publication of online content, for better or
for worse. Today, search engine algorithms decide the results of our
searches (Nair, 2018), while AI-driven apps like Inkitt are
connecting readers to authors. The effectiveness of such algorithms in
mediating online content depends largely on a combination of better
algorithms, comprehensive analytics, data availability, and repetition
based on reinforcement learning. Although African bloggers and content creators have
benefited from using multimedia digital platforms to create and distribute
their work, cyberspace, and the internet in particular, is still mostly dominated
by the Global North narrative. This is because Global North
engineers largely design the algorithms that analyze, harvest, index, present
and repurpose online data, as well as the applications that run on computers, while
Global North data scientists decide which training datasets are used to
train the algorithms. The interaction between people and machines has
also led to systems (social machines)
that blur the lines between computational processes and human input,
with a direct impact on the publication of online content. Because of unequal
access and connectivity to the internet, most of the data that feeds into
computer analytics is also generated in the Global North. Coupled with
data collection and distribution frameworks that are not built towards healthy
partnerships between industry and government, North-centric search and
sorting algorithms may prevent Global South countries from realizing the
potential in their data. At a technical level, improved algorithms are needed
for inclusive and diverse online content; equally important, at a
regulatory level, more open data frameworks are needed.

The issues

Imperfect but intelligent machines as arbiters of online content

As ‘smart machines’ increasingly pervade almost every aspect of human
existence, computational processes have increasingly replaced human
agency. For instance, machine learning algorithms now classify knowledge
through categorization and representation schemes, information retrieval,
recommendation and classification, thus creating their own culture. Although this
has led to more efficient public decision-making and implementation, such
systems have also produced and reinforced discriminatory patterns in content
inclusion and distribution. This may be unintentional; for example, it may be
occasioned by the non-linear variable selection process in the algorithm
design and by the output verification process.

A relevant real-life instance of discrimination is when algorithms push a
particular type of content to a certain class of online readers and clients,
such as the recent case of the Netflix algorithm matching black customers
with African content (Guardian, 2018).

The non-linear variable selection process in output verification is in part
due to the way the design of machine learning algorithms has evolved. During the
early phases of machine learning, programmers would explicitly instruct these
machines on what to do; now they simply give them a set of learning instructions
and data, and the machines derive the rest (bottom-up as opposed to top-down
machine learning algorithms). This effectively makes the algorithms
responsible for decisions, including acting as arbiters of online content. Such an
approach lacks transparency and accountability, a problem that may be compounded
when such algorithms are incorporated into already-opaque governance
structures. As K. Piwowar (2018) puts it, “Another challenge is algorithmic
opacity, understood as the inability to audit algorithms, including in-depth
inspection of data inputs, general algorithm design, as well as output data, in
conjunction with companies’ trade secret.” Piwowar also points to the difficulty
of effectively communicating when algorithms are used, for what purposes, and
with what effect(s). Further, although engineers have tried to replicate human
intelligence in machines, AI systems are not confined to methods that are
biologically observable, and they will need another two decades of development
because current algorithms lack “intuition” (Stanford University, 2018). This
view was also echoed in the article “Why computers shouldn’t teach calculus”.

Interactions between individuals, technologies and data/information (The ‘Social Machine’)

Machine learning algorithms are built from training datasets and can only be as
good as those datasets; in other words, they are a direct reflection of the data
they were trained on. Training an AI algorithm often requires a large amount of
data and many repetitions. The most influential corporations in this sphere,
economic agents like Amazon, Apple, Microsoft, Google, Facebook, and Baidu,
wield extraordinary power from a distance. Take China, for example, and the
sheer scope of the data generated by Chinese tech giants. Think of how much data
Facebook collects from its users and how that data powers the company’s
algorithms; now consider that Tencent’s popular WeChat app is basically
Facebook, Twitter, and your online bank account all rolled into one. China has
roughly three times as many mobile phone users as the US, and those phone users
spend nearly 50 times as much via mobile payments. China is, as The Economist
first put it, the Saudi Arabia of data.

As David Kaye (2018) recently observed, “Tech giants develop rules, standards,
and guidelines, often in Silicon Valley, to determine for people around the
world the appropriate boundaries of expression. In many places, American
companies provide the dominant source of news and information, having an
enormous impact on public life. Much as they may try, they are often out of
touch with local and national concerns in the places where they operate.”

Therefore, the algorithmic determination of knowledge can be
traced to decisions made by individuals and groups of individuals operating
within particular local, linguistic, regional, religious, and bureaucratic
cultures. The datasets used for training, decision-making and implementation,
as well as the algorithmic determination of knowledge, may therefore reflect
societal biases. For instance, there is currently a North-centric sensibility
in the creation and training of algorithms, and it dominates the larger
computational world, whereby Global North culture has been the
‘authoritative principle’ operative in and around algorithmic culture (IEEE
P7003 draft standard on culture). This is also reflected in the online content
that internet users access.
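To make the point about training data concrete, here is a deliberately simplified Python sketch, not drawn from any real platform. It assumes a naive popularity-based recommender (a hypothetical stand-in for the far more complex systems discussed above) trained on an invented click log in which 90% of clicks go to Global North content. The item names, the 90/10 split and the rule that users click only what they are shown are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical click log of (item, region_of_origin) pairs. The 90/10 split is an
# illustrative assumption standing in for North-dominated training data.
REGION = {f"north_news_{i}": "Global North" for i in range(5)}
REGION.update({f"africa_blog_{i}": "Africa" for i in range(5)})

training_clicks = [(f"north_news_{i % 5}", "Global North") for i in range(90)]
training_clicks += [(f"africa_blog_{i % 5}", "Africa") for i in range(10)]


def train_popularity_model(clicks):
    """Count clicks per item; popularity is the model's only signal."""
    return Counter(item for item, _ in clicks)


def recommend(model, k=5):
    """Return the k most-clicked items, as a naive ranking would."""
    return [item for item, _ in model.most_common(k)]


def african_share(clicks):
    """Fraction of all clicks that went to African content."""
    return sum(1 for _, region in clicks if region == "Africa") / len(clicks)


first_top = recommend(train_popularity_model(training_clicks))

# Feedback loop: assume users click only what they are shown (20 clicks per
# recommended item), and the model is retrained on the growing log each round.
clicks = list(training_clicks)
for _ in range(3):
    top = recommend(train_popularity_model(clicks))
    clicks += [(item, REGION[item]) for item in top for _ in range(20)]

print(first_top)
print(f"African share before: {african_share(training_clicks):.3f}")
print(f"African share after:  {african_share(clicks):.3f}")
```

Under these assumptions every recommended item comes from the Global North, and after three rounds of retraining the African share of the click log falls from 0.100 to 0.025: the model mirrors its training data, and repetition amplifies the imbalance.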

Proposed solutions

Algorithmic accountability and transparency

In light of the above, there have been calls for technology and data fairness
for the Global South. On one level, the solution could be found in
human-centred design or usability that balances security, privacy, and
transparency. However, as the use of big data by public institutions
increasingly shapes people’s lives (Vosloo, 2018), the issue extends beyond
protecting user data and privacy to the transparency and comprehension of big
data (ICTworks, 2018). ICTworks suggests that, to demonstrate a commitment to
being transparent and accountable for the data they collect, organisations
that mine big data need to become interpreters of their algorithms. Someone on
their data science team needs to be able to explain the math to the public, and
data visualizers and data storytellers should tell the story behind that data:
the “how we got here” explanation.

Creators and arbiters of data, including organisations that use third-party big
data analysis, should actively ask where the data comes from and what steps were
taken to audit it for inherent bias, as part of the chain of demanding
algorithmic accountability (ICTworks, 2018).

Open Contracting

However, knowing where the data originated requires Global South countries to
adopt the Open Contracting Data Standard (OCDS), which enables disclosure of
data and documents at all stages of the contracting process by defining a
common data model. The model was created to help organizations increase
contracting transparency and to allow deeper analysis of contracting data by a
wide range of users. At the moment, many countries in the Global South are not
being given the necessary access to their own data, which stays hidden under
contract rules, so citizens cannot access it and therefore cannot benefit from
it. The absence of regulations that mandate equal access to collected data will
likely prolong the current mismatch between the pace of data collection among
big established companies and that among small, new, and local businesses.
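As an illustration of what a common data model means in practice, the Python sketch below builds OCDS-style releases for one hypothetical contracting process, one release per stage (tender, award, contract), linked by a shared ocid. The field names follow our understanding of the OCDS 1.1 release schema; the identifiers, dates, titles and supplier are invented, and the required-field check is a simplification, not a substitute for validating against the official schema.

```python
import json

# Fields that the OCDS 1.1 release schema lists as required (to our understanding;
# check the official schema before relying on this).
REQUIRED_RELEASE_FIELDS = ("ocid", "id", "date", "tag", "initiationType")


def make_release(ocid, release_id, date, stage, details):
    """Build one OCDS-style release describing a single stage of a contracting process."""
    release = {
        "ocid": ocid,              # the same ocid links every release of one process
        "id": release_id,          # unique within this ocid
        "date": date,
        "tag": [stage],            # e.g. "planning", "tender", "award", "contract"
        "initiationType": "tender",
    }
    release.update(details)        # stage-specific blocks such as "tender" or "awards"
    return release


def missing_fields(release):
    """List any required fields absent from a release."""
    return [field for field in REQUIRED_RELEASE_FIELDS if field not in release]


# Hypothetical process: identifiers, dates, titles and supplier names are invented.
OCID = "ocds-xxxxxx-0001"
releases = [
    make_release(OCID, "r1", "2018-01-10T00:00:00Z", "tender",
                 {"tender": {"id": "t1", "title": "National data-hosting services"}}),
    make_release(OCID, "r2", "2018-03-01T00:00:00Z", "award",
                 {"awards": [{"id": "a1", "suppliers": [{"name": "Example Cloud Ltd"}]}]}),
    make_release(OCID, "r3", "2018-03-15T00:00:00Z", "contract",
                 {"contracts": [{"id": "c1", "awardID": "a1"}]}),
]

for release in releases:
    if missing_fields(release):
        raise ValueError(f"{release['id']} is missing {missing_fields(release)}")

# Publishing the releases as JSON is what lets outside users trace each stage.
print(json.dumps(releases, indent=2))
```

Because every stage is published in the same shape and under the same ocid, a journalist or citizen can follow one contract from tender to award to signature without special access to the parties involved.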

Equally important in the distribution of content is the fact that the vast
majority of social media platforms act like silos. APIs play an important role
in corporate business models, where the industry controls the data it collects
without rewarding users, let alone offering them transparency. Negotiating the
specification of APIs so as to make data a common resource should be
considered, since such an effort may align with citizens’ interests
(Cordova, 2018).

Free flow of non-personal data

Open contracting could be augmented by the free flow of non-personal data
across the region. For example, the European Union recently ended data
localisation requirements within its Member States by adopting a Regulation on
the free flow of non-personal data, proposed by the European Commission in
September 2017. This regulation is a key pillar of the Digital Single Market,
which is meant to facilitate a digital economy and society.

Opening up data through open contracting arrangements is also seen at local
levels, despite the competing values inherent in data stewardship. For
instance, some universities are imposing open access requirements, whereby
researchers must provide access to their data as a condition of obtaining grant
funding or publishing results in journals (Borgman, 2018).

Inclusive algorithm design

Algorithmic accountability in the context of open
contracting becomes even more necessary since big international corporations
such as Facebook have been signing secretive contracts with Global South
governments and local operators. This has led private sector platforms like
Facebook, Google, and Twitter to become primary sources of information and
vehicles for expression; they effectively function as the public square for
civic engagement. Their algorithms affect their users’ access to information
and how they form political opinions. This has created conceptual confusion
about the roles and responsibilities of social media platforms in democracy
(David Kaye, 2018).

As a first step towards data fairness, these corporations need to involve local
communities in providing input into the training data, and social media
companies need to involve such communities in governing their platforms.
According to David Kaye (2018), steps such as diversifying leadership, enabling
greater local content moderation that is not outsourced to contractors, and
engaging deeply with the communities where they operate are essential.

If the companies cannot make these kinds of changes, they need to explore how
they could design algorithms that reflect the diversity of the regions where
they operate or, in the case of social media platforms, spin off national
versions of their platforms (David Kaye, 2018).

Paper written for the HIVOS Africa Content Creators’ Summit, Nairobi, Kenya,
December 2018, by Arthur Gwagwa, Research Fellow, CIPIT, Strathmore University,
and Dr. Ansgar Koene, Senior Research Fellow at the Horizon Digital Economy
Research Institute, University of Nottingham.