By Dr Richard George, Chief Data Scientist
One of the lessons of Mary Shelley's story of Frankenstein's monster is that a whole is not always greater than the sum of its parts, however fine the parts themselves may be:
“I had selected his features as beautiful. Beautiful! Great God! His yellow skin scarcely covered the work of muscles and arteries beneath.”
An altogether less visceral, but equally composition-based, process goes into building today's artificial intelligence (AI) platforms. One of the most powerful approaches in use today is deep learning, a family of machine learning methods that identify patterns in large and varied sets of input data and use them to generate insights that help inform human decision-making. Deep learning passes data through artificial neural networks with many layers, creating a 'black box' of calculations that is extremely difficult for humans to interpret. In short, it can surface patterns that we could not identify ourselves.
Making a monster of AI
As with Frankenstein's monster, not knowing how the constituent parts of an AI algorithm interact with one another can undermine the whole, however good the individual parts are. Luckily for data scientists, preventing the creation of a 'monster' when developing AI calls for an understanding of data validity rather than of the supernatural.
AI platforms built on deep learning are based on the idea that more data equals better accuracy. This generally holds true, but the actionable insights an AI produces are only as good as the data it ingests. This is why frameworks like the Oxford-Munich Code of Data Ethics (OMCDE) must be applied to the collection, processing and analysis of data.
What is the Oxford-Munich Code of Data Ethics (OMCDE)?
The OMCDE is a code of conduct built upon the input of researchers and leading representatives in Europe, and designed to address both practical and hypothetical ethical situations pertaining to data science in industry, academia and the public sector. Its stipulations are grouped into seven areas: lawfulness; competence; dealing with data; algorithms & models; transparency, objectivity and truth; working alone and with others; and upcoming challenges.
Because it covers a wide variety of complex situations, the OMCDE assumes that even well-intentioned data professionals cannot always know how to act for the best without guidance. It is also subject to ongoing revision, so that its contents remain aligned with recognised best practice in the field.
Why does the OMCDE apply to AI?
As mentioned above, most forms of AI employed by organisations use neural networks (mathematical models that connect pieces of data together) and deep learning (drawing conclusions from many layers of these models). In other words, modern AI is essentially an automated system for reading data, finding patterns in it and generating outputs from it.
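To make the idea of "layers of mathematical models" a little more concrete, here is a deliberately tiny sketch of a neural network's forward pass in plain Python. Each layer takes weighted sums of its inputs and squashes them with a sigmoid function; stacking several layers is what makes a network "deep". The weights are hand-picked, hypothetical numbers: a real deep learning model learns millions of them from data during training, which this sketch does not show.

```python
import math


def layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per unit, then a sigmoid."""
    return [
        1 / (1 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weights, biases)
    ]


def forward(inputs, network):
    """Pass the data through each layer in turn; 'deep' just means many layers."""
    for weights, biases in network:
        inputs = layer(inputs, weights, biases)
    return inputs


# Hypothetical, hand-picked weights: 3 inputs -> 2 hidden units -> 1 output
network = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.2, -0.7]], [0.05]),
]
print(forward([1.0, 0.5, 2.0], network))  # a single score between 0 and 1
```

Even in this toy example, no individual weight corresponds to a concept a person would recognise, which hints at why networks with millions of weights behave like a black box.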
Because it can process and make sense of masses of data far faster than biological neurons can, AI can often outperform humans as a decision-maker. This was exemplified by DeepMind's AlphaGo when it defeated Lee Sedol, one of the world's strongest Go players, in a five-game match in 2016. But AI without human oversight may not always produce the best results, especially given the wide range of areas in which it can be applied.
Consider the following example. A company uses AI to analyse data on its existing workforce, technological advances and economic trends, and to produce a model predicting which job roles are most likely to be affected and face redundancy. Without proper interrogation of the data (looking for sampling biases, breaches of privacy, or problems with accuracy or validity, for example), any issues with the input will inevitably be carried over into the output. Unlike a board game, applications such as this put people's livelihoods at stake when poor data practices are compounded by AI.
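As a rough illustration of one such check for sampling bias, the sketch below compares how often each group appears in a training sample with its share of a reference population and flags gaps larger than a tolerance. The function name `representation_gaps`, the 5-percentage-point tolerance and the job-role figures are hypothetical assumptions for illustration; they are not part of the OMCDE or of any particular tool.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares, tolerance=0.05):
    """Return {group: observed_share - expected_share} for every group whose
    share of the sample differs from the reference share by more than
    `tolerance` (an absolute difference in proportions)."""
    counts = Counter(sample_groups)
    total = len(sample_groups)
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = round(observed - expected, 3)
    return gaps


# Hypothetical job-role labels for 100 training records
sample = ["admin"] * 60 + ["engineering"] * 32 + ["sales"] * 8
# Hypothetical shares of each role in the actual workforce
workforce = {"admin": 0.40, "engineering": 0.35, "sales": 0.25}

# Flags admin (over-represented) and sales (under-represented)
print(representation_gaps(sample, workforce))
```

A flagged gap does not prove the model will be biased, but it tells the team where to look before the model's predictions about those roles are trusted.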
How to apply the OMCDE to AI in practice
Data, analytics and AI groups within organisations typically follow an AI model development process that starts with the decision to build (or at least experiment with) AI, followed by designing, building, deploying and monitoring the AI model. At every stage, those involved must recognise their responsibility to ensure that good data governance practices, in line with the OMCDE, are followed. A good way to put this into practice is through activity documentation: an auditable, time-based record of the sources, methods and findings related to the data used.
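One way to picture such a record is as an append-only log in which every entry is timestamped and carries a hash of the entry before it, so that later edits become detectable. The sketch below is a minimal illustration under that assumption; the class and field names (`ActivityLog`, `ActivityEntry`, `stage`, `source`, `method`, `finding`), the example file name and the hash-chaining design are hypothetical choices, not something the OMCDE prescribes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, replace
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class ActivityEntry:
    """One entry in a hypothetical activity-documentation log."""
    timestamp: str   # UTC time the entry was recorded
    stage: str       # e.g. "design", "build", "deploy", "monitor"
    source: str      # where the data came from
    method: str      # what was done to the data
    finding: str     # what was discovered
    prev_hash: str   # digest of the previous entry, chaining the log together

    def digest(self) -> str:
        """SHA-256 over all fields, serialised deterministically."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


@dataclass
class ActivityLog:
    entries: list = field(default_factory=list)

    def record(self, stage: str, source: str, method: str, finding: str) -> ActivityEntry:
        """Append a timestamped entry linked to the one before it."""
        prev = self.entries[-1].digest() if self.entries else GENESIS
        entry = ActivityEntry(
            timestamp=datetime.now(timezone.utc).isoformat(),
            stage=stage,
            source=source,
            method=method,
            finding=finding,
            prev_hash=prev,
        )
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """True if every entry still points at its predecessor's current digest."""
        expected = GENESIS
        for entry in self.entries:
            if entry.prev_hash != expected:
                return False
            expected = entry.digest()
        return True


# Hypothetical usage: two build-stage notes about a workforce dataset
log = ActivityLog()
log.record("build", "hr_extract.csv", "dropped rows with no job role",
           "check whether removals cluster in one department")
log.record("build", "hr_extract.csv", "compared role mix with workforce census",
           "sales roles under-represented")
print(log.verify())  # True

# Quietly editing an earlier entry breaks the chain
log.entries[0] = replace(log.entries[0], finding="nothing to report")
print(log.verify())  # False
```

The hash chain only reveals that an entry was changed after a later entry was written; it does not prevent edits, and it cannot catch changes to the most recent entry. In practice such a log would also be kept somewhere the people it audits cannot rewrite.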
Aside from being briefed on how to spot potentially erroneous data, all stakeholders should also have full knowledge of the range of data being used. This not only shares the burden of responsibility across a wider group, but also promotes transparency in the widest forum permitted by legal and proprietary constraints. Finally, all stakeholders have a professional duty to correct any misunderstandings or unfounded expectations of colleagues, managers or decision makers who may rely on their work.
The ability to store, process and transfer data has grown exponentially over the past 50 years or so, while the relative cost of doing so has kept falling (a trend closely associated with Moore's Law, the observation that the number of transistors on a chip doubles roughly every two years). With the development of AI so closely tied to these capabilities, we will inevitably see AI algorithms used in a growing number of business applications, technological innovations and everyday situations.
A clear set of responsibilities and guidelines for those involved in developing AI is imperative if this future is to be sustainable. Without them, AI's decision-making power could be its undoing, and ours too.
Dr Richard George, Chief Data Scientist - Faethm AI