CHAPTER ONE – AN INTRODUCTION TO ARTIFICIALLY INTELLIGENT SYSTEMS
This is a book about technology and artificial intelligence – more specifically one that is designed to assist the non-scientifically minded legal practitioner to understand the implications of this new and exciting technology: to demystify, and hopefully provide clarity around the issues and implications that we as legal practitioners need to take into account when navigating our disparate legal disciplines. It does not purport to be definitive – the technology is far too new for this and is in a state of constant evolution. What it strives to do is to provide a grounding relative to the current “state of the art” as it applies today, and separate fact from fiction.
We are currently experiencing a golden age or renaissance period for machine learning and wider artificial intelligence-based systems. This book is predominantly focussed on the innovations provided by machine learning – that is to say technology which, through exponential growth and scale in computing power, has enabled effective use of neural networks to enable machines to adapt and learn without explicit programming. It is important however not to ignore associated technologies that complement and enhance machine learning systems, and it is this wider ensemble of techniques (including machine learning itself) which I refer to as artificial intelligence or AI.
To the casual observer, it seems that the technology is advancing on a daily or even hourly basis. You can now talk to Alexa or Siri using ordinary spoken words (in whatever language is your norm) and through the miracle of natural language processing get a cogent answer, or instruct Google to run context based searches on characteristics gleaned from your photo collection. Meta’s Facebook will now identify your friends and family for you and Spotify will try to define and predict your musical tastes by selecting music tracks based on your listening history.
Generative AI systems represent the latest explosion in machine learning capability. As we shall see, these types of machine learning systems are capable of generating entirely new content-rich outputs based on simple user text prompts – whether that is in the medium of pictures, via models such as Midjourney or DALL-E, text via OpenAI’s ChatGPT, music via Google’s MusicLM or speech synthesis via Uberduck. We are really only just beginning to exploit the potential of these technologies.
What makes artificial intelligence systems different from traditional computer systems and why should we be worried?
Sometimes it can be difficult to differentiate fact from fiction – particularly in an environment which is driven by marketing hype and plain old misinformation – especially so given that the idea of the intelligent machine, fuelled by Hollywood, plays heavily on our collective psyche. Is artificial intelligence, for example, the end of the world as we know it and are we about to be subordinated to a new species of supercomputer, or is it the key to a prosperous new future and a golden age of human endeavour? This book will at least give you an overview of the issues involved from an ethical and legal perspective, and I suspect an appreciation that the reality is somewhat more prosaic than either the pessimists or optimists would have us believe.
In order to understand the practical implications of the technology, it is worth understanding how traditional computer systems work, and comparing this with how neuroscience currently thinks that neurons in the human brain function. We’ll then move on and use this analogy to provide a functional explanation for machine learning.
Computers in the traditional sense need prescriptive and directive sets of instructions to execute complex tasks – in essence, their programming. A traditional computer is unable to handle complex tasks effectively unless every eventuality is programmed into its code. As we all know, this makes these machines highly effective at larger repetitive or data intensive tasks, with a defined set of pre-programmed variables. What such machines cannot do well is adapt, learn, evolve or extrapolate their decisions to new and unforeseen situations. In contrast, the human brain, although rather less good at repetitive tasks, is a marvel of flexibility and of navigating through a chaotic world. It does this through a process of conceptualisation.
The simple act of recognising a human face (or indeed any image) provides a real-life example. You learn from an early age as a human being to recognise a face from any angle or orientation, full on or profile. You can identify someone in low light and you don’t need an image of a face to be in colour or even in three dimensions for you to recognise it. Your brain can extrapolate a face from incomplete or partial data – indeed it is unbelievably good at “joining the dots” (so much so that we are often caught out by recognising anthropomorphic features in inanimate objects). A traditionally programmed computer finds this task almost insurmountably complex and difficult to achieve. You need to ensure that the face you want the machine to recognise is oriented in precisely the right way and under precisely the right lighting conditions – as otherwise it may not even identify it as a face. In this context, what the human neo-cortex achieves (and what current AI technologists are trying to replicate) is the holy grail of the “invariant representation”. This is the ability to learn what a “face” is as a concept (or indeed a cube, car, tree or any other animate or inanimate thing) and apply this to real world data. The notion of a concept introduces an entirely new dimension – a level of data abstraction, classification, recognition and labelling which enables semantic representation in areas such as pattern recognition and linguistics, in short, bringing order from chaos.
Put grandly, artificially intelligent systems aspire through their structures (to a greater or lesser degree) to have the ability to process unstructured data, to extrapolate it, and to adapt and evolve in ways which are comparable to human beings. How they achieve this (as we shall see later) is in fact through the application of deep mathematics, pattern correlation and statistical probabilities.
Robotics, Perception and Artificial Intelligence
So what makes robots different from machine learning systems? It is worth briefly covering the difference as, to the uninitiated, this can be a somewhat confusing question – particularly when the terms are used fast, loose and interchangeably in most commercial, non-technical literature.
Obviously, at its most simplistic level, a robot is a mechanism that is designed to replicate the actions and behaviours of living creatures. It is a manufactured autonomous agent in the real world which is capable of autonomous action to a defined degree. Robots in some form or another have been in existence for over 200 years.
As AI systems have become more sophisticated and able, robotic research has likewise developed more capable robotic machines, and scientists in the field of robotics have become more pre-occupied with replicating the animal characteristic of proprioception. This is the unconscious ability of living creatures to know at all times the boundaries and extent of their physical body: to know that a limb extended in front of them is part of their own body, and to sense its movement. Knowing also that the limb is sensing hot or cold or touching another object, and whether it is constrained or injured, is an integral part of this ability. It is this need to physically interact with the real world on an effective basis which has been described by some commentators as one of the primary catalysts of animal and human intelligence and which is the driving emphasis for AI research in the field of robotics.
A brief history of AI
Before we get into how current “state of the art” artificially intelligent systems are structured, it is worth (briefly) looking at the historical evolution of machine learning. Alan Turing, who through his brilliant efforts in the Second World War helped to break the German ciphers produced on the Enigma machine, wrote on the concept of machine intelligence in a seminal 1950 paper. (The term “artificial intelligence” itself was coined a few years later by John McCarthy, in the proposal for the 1956 Dartmouth workshop.) Turing’s analysis centred on human intelligence as a benchmark for artificial intelligence (more on this later). He postulated that if you could hold a test where a human conversed with a computer and that human could be fooled by a clever computer program into thinking that the machine they were talking to was in fact human, then the machine would have passed a benchmark for intelligence. This of course evolved into the famous “Turing test”.
The Turing test led to a surge in the mid-20th century in traditional programming techniques being used to emulate intelligence – however developers very soon realised the limitations of this approach. Programs such as Eliza, one of the earliest chatbots to be held up against the Turing test, fooled some users by adopting clever, but simple linguistic tricks (e.g. through the repetition of questions) which gave a superficial semblance of self-awareness and interaction through mimicry but did not create anything approaching human equivalent intelligence. Unsurprisingly AI development stagnated after these initial attempts.
Marvin Minsky, an American cognitive research scientist, is rightly credited as the father of AI. He developed the first randomly wired neural network learning machine, SNARC, in January 1952, and was also the author (with Seymour Papert) of the 1969 book Perceptrons, which became a seminal work in the field of artificial neural networks.
For the purposes of this book I am of course summarising a long and complex developmental history – there are many works which set out the historical evolution of artificial intelligence in much greater depth – but it was subsequent to the development of the Turing test that AI research bifurcated into two distinct directions: one which centred on the earlier approach of “emulation” – namely focussing on mimicking outwardly observable intelligent behaviours; and a new approach of “simulation” – one which was based on the view that in order to achieve machine intelligence the fundamental structure and processes of neurons firing in the human nervous system had to be simulated.
As we shall see later on, both branches of research are propelling AI development through its current renaissance. There are however still limitations, not least because, despite much scientific endeavour, we are still a very long way off from understanding precisely how the human brain functions. For this reason, even with the advent of generative AI, we are probably a long way off from developing a machine with human equivalent (or greater) levels of sentience (or feeling, perception and subjective experience), combined with objective reasoning and logic – the holy grail of Artificial General Intelligence (AGI) or “Strong AI”.
Inevitably, AI research and development initiatives have themselves evolved and differentiated themselves by trial and error – as some structures have shown promise, those have been refined, so now we have lines of machine learning research and applications which may in fact have very little relationship to how neuroscience currently understands the organisation and structure of the human brain (in addition of course to those that still strive to closely model real organic brain function in so far as it is understood).
So let’s now take a slightly more detailed look at what is actually happening in the field of artificial intelligence at the moment.
As I mentioned at the beginning of this chapter, we have seen a dramatic rise in the effectiveness and use of solutions based on “Weak AI” or “Narrow AI” or Artificial Specific Intelligence (ASI) – AI solutions that are based around a specific, narrowly defined task or application, collectively (and somewhat confusingly) falling under the umbrella term of IA or Intelligent Automation.
Generally speaking, machine learning systems fall into one of four categories – they can be optimised for use cases that involve either prediction, classification, generation or comprehension (understanding), or obviously any combination of the above:
- Prediction refers to the ability of the machine to identify and predict choices;
- Classification refers to the ability of the machine to discriminate between different classes of data;
- Generation refers to the ability of the machine to generate entirely new classes of data (a particular capability which we will examine in more detail in Chapter Three); and
- Comprehension refers to the ability of the machine to understand “real world” inputs, whether that is through text or speech.
Each of the above generally involves different machine learning optimisations and model types. Translating the above categories into use cases, there are several real-world artificial intelligence applications which are driving developments in the technology:
- Image processing and tagging
Image processing, as the name suggests, requires algorithms to analyse images or to discriminate between them to get data or to perform transformations. Facial recognition is the example that immediately springs to mind in this context, however more prosaic examples of this include identification/image tagging – as used in applications such as Facebook to apply name labels to individuals or to ascertain other data from a visual scan, such as health of an individual or location recognition for geodata. Optical Character Recognition – where algorithms learn to read handwritten text and convert documents into digital versions – is another good example of this.
- 3D Environment processing
3D environment processing is an extension of the image processing and tagging skill – most obviously translated into the skills required by an algorithm in a robot or a “CAV” (connected and autonomous vehicle) to spatially understand its location and environment. This uses image data but also potentially radar and lidar sourced spatial scanning data to process 3D geometries. Typically this technology could also be used in free roaming robot devices including pilotless drones or aisle patrolling robots in supermarkets tasked with reviewing stock levels.
- Textual Analysis
These are techniques and algorithms which extract information from or classify textual data. Textual analysis uses two distinct approaches – one based purely on pattern recognition of words and their meanings and concatenated sequences of the same, the other on grammar driven natural language processing. In terms of practical usage, the text analysed could include social media postings, tweets or emails. The technology may then be used to provide filtering (for spam); information extraction – for example to pull out particular pieces of data such as names and addresses; or more complex “sense based” sentiment analysis – to identify the mood of the person writing (as Facebook has recently implemented in relation to postings by members who may be potentially suicidal). Text analysis is also at the heart of Chatbot technology – allowing for interaction on social media, or providing for automated first line technical support.
- Speech Analysis
Speech processing takes equivalent skills to those used for textual documents and applies them to the spoken word. It is this area which is seeing an incredible level of investment in the creation of personal digital home assistants from the likes of Amazon (with its Echo device), Google’s Home device and Apple with Siri. As with textual analysis, speech analysis has moved into a more sophisticated realm of understanding not just the words themselves, but also the manner in which they are spoken. Emotion detection tools are now playing a key part in back office call centre SLA management.
- Data Mining
This is the process of discovering patterns or extrapolating trends from data. Data mining algorithms are used for such things as Anomaly detection – identifying for example fraudulent entries or transactions as outliers or classifying them as known types of fraud; Association rules – detecting supermarket purchasing habits by looking at a shopper’s typical shopping basket; and Predictions – predicting a variable from a set of others to extrapolate for example a credit score.
- Video game virtual environment processing
Video games are a multi-billion dollar entertainment industry but they are also key sandboxes for machines to interact with and learn behaviours in relation to other elements in a virtual environment, including interacting with the players themselves.
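To make one of the use cases above more concrete, the anomaly detection described under Data Mining can be sketched in a few lines of Python. This is a minimal illustration only: the transaction figures are invented, the function name find_outliers is hypothetical, and the rule used (flag anything more than two standard deviations from the average) is a deliberately crude assumption rather than how a commercial fraud system would work.

```python
from statistics import mean, pstdev

def find_outliers(amounts, threshold=2.0):
    """Return amounts lying more than `threshold` standard deviations from the mean."""
    centre = mean(amounts)
    spread = pstdev(amounts)
    if spread == 0:
        return []  # every value is identical, so nothing stands out
    return [a for a in amounts if abs(a - centre) / spread > threshold]

# Invented transaction amounts; one is wildly out of line with the rest.
transactions = [42.0, 38.5, 51.0, 45.2, 40.1, 39.9, 47.3, 2500.0]
suspicious = find_outliers(transactions)
print(suspicious)
```

The point to note is that nobody has told the program which transaction is unusual; it is identified purely from its statistical distance from the others.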
Machine Learning – Understanding the basic elements
At the most simplistic level, machine learning systems are no different from conventional computer systems in that both rely on the elements of computational hardware and software to function effectively.
Whilst many modern machine learning systems exploit huge advances in computing scale and power and make use of the vast amounts of data that are available in our “big data” society, they still need what would be recognisable as a computing platform. It is the logic or software that such systems use which differs markedly from traditional computer programs directly created by human programmers which require sequential, explicit and largely linear instructions that are followed to the letter by the machine.
I have already explained that technologists developing machine learning systems have developed a variety of solutions, methodologies, models and structures to get machines to “think” in a narrow AI sense. In fact so many approaches have been developed, it can be difficult for the non-computer scientist to effectively decode them. To further complicate this issue, many proprietary machine learning systems employ a “mix” of adaptive learning solutions which are optimised for the particular applications at hand. We’ll step through some of these models in a moment, but for the moment, and in order to provide a consistent framework for you, the reader, it is worth setting out the very basic absolutes of machine learning systems – in other words, those conceptual elements that all current machine learning systems use. It is important to stress that this framework applies to that subset of AI which is machine learning as defined at the beginning of this chapter – the terms may not be relevant to other wider or peripheral AI technologies which are generally out of scope for the purposes of this book.
At the most simplistic level, it is generally established that machine learning systems are comprised of three parts. First, the Model – that is to say the way in which the system is structured and organised – in short its mathematical architecture, including the nodes, links and weights which will be applied to the data to be processed. Second, the Parameters – these are the properties of the training data that are to be learned during training. Finally, the Learner – generally comprised of a learning algorithm (that part of the system that adjusts how the model processes the parameters on a supervised, unsupervised or reinforcement basis; see below under “Training”) and an activation or transfer algorithm, which defines the way in which data are to be transferred between nodes within the network by forwards and/or backwards propagation (see further below under “Backpropagation”).
The word “algorithm” is often used to generalise the entire process I have described above. In many texts you might as well substitute the words “magic spell” for it.
In fact, as we have seen above, an algorithm is merely a set of complex mathematical actions expressed as a formula conceptualising the model utilised by the system. Clearly machine learning algorithms are not magic spells. More prosaically, learning algorithms selectively adapt and change the model and parameters based on the data that is introduced into the machine learning system. Rather confusingly, it is also important to note (as this is often misunderstood) that simply describing a solution as “algorithmic” does not necessarily imply that that solution has any degree of artificial intelligence or machine learning.
Training
All machine learning systems need to be “trained” (or train themselves) with a data set – a set of data points which are intended to assist the system to “understand” the relevant narrow AI task at hand and which contain the parameters I have described above. Training data sets are, as we shall see later on in this book, an incredibly significant and important feature of these systems. The nature of the training data set provided however is very different depending upon whether the system is designed to learn on a “supervised”, “unsupervised” or “reinforcement” basis.
Supervised learning systems are provided with guiding or labelled training data sets that contain a mass of examples of the desired answers and outputs. In such cases, the training data are subsets of data that are well known in terms of all of their features, content, correlations and so forth and thus can provide a good representative example to benchmark the outputs of the system. Typically these data sets will be past, real life examples of the problem the system has been configured to resolve. Usually this type of machine learning supports evaluation, classification or prediction outcomes.
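For readers who find a worked example helpful, supervised learning can be sketched as follows. The sketch assumes a tiny, invented set of labelled examples (hours of sunshine and millimetres of rainfall, each labelled “dry” or “wet”) and uses one of the simplest possible techniques, a “nearest neighbour” rule, which labels a new case by copying the label of the most similar past example. The data and the function name classify are illustrative assumptions, not a description of any particular product.

```python
# Invented labelled training data: (hours of sunshine, rainfall in mm) -> label
training_data = [
    ((9.0, 1.0), "dry"),
    ((8.0, 2.0), "dry"),
    ((2.0, 12.0), "wet"),
    ((1.0, 15.0), "wet"),
]

def classify(point, examples):
    """Label a new point with the label of its closest labelled example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(examples, key=lambda example: squared_distance(example[0], point))
    return nearest[1]

print(classify((7.5, 3.0), training_data))   # a new, unlabelled day
print(classify((1.5, 10.0), training_data))  # another new, unlabelled day
```

Everything the system “knows” comes from the labelled examples it was given, which is why the quality and representativeness of those examples matters so much.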
Systems that are designed to learn on an unsupervised basis in contrast are provided with unlabelled data that may or may not contain the desired answer or output. In these cases, the systems attempt to find either outliers or correlations or patterns in the data without any form of guidance.
Provided data sets may be clustered into different classes that share some common characteristics – unsurprisingly, this is often referred to as “clustering” by the industry. As we’ll see in Chapter Three, Generative AI tools use innovative variations of this type of learning (typically on a massive scale), to generate novel outputs.
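Clustering can also be illustrated with a short sketch. The example below assumes a handful of invented, unlabelled shopping basket totals and applies a simplified, one-dimensional version of a well-known technique called k-means: each value is assigned to its nearest “centre”, and each centre is then moved to the average of the values assigned to it. The starting centres and the function name k_means_1d are assumptions made for illustration.

```python
def k_means_1d(values, centres, iterations=10):
    """Assign each value to its nearest centre, then move centres to their group's mean."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for value in values:
            nearest = min(range(len(centres)), key=lambda i: abs(value - centres[i]))
            clusters[nearest].append(value)
        # A centre with no values assigned to it stays where it was.
        centres = [
            sum(group) / len(group) if group else centres[i]
            for i, group in enumerate(clusters)
        ]
    return centres, clusters

basket_totals = [12.0, 15.0, 14.0, 95.0, 102.0, 99.0]  # invented, unlabelled figures
centres, clusters = k_means_1d(basket_totals, centres=[0.0, 50.0])
print(clusters)
```

No labels such as “small shop” or “weekly shop” were supplied; the two groups emerge from the data alone.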
Reinforcement learning is a similar process to unsupervised learning – however machine learning systems here are typically exposed to a rules-based competitive environment where they train themselves continuously using trial and error to try to find the best reward – which might be winning a game or earning more money. This type of AI system attempts to learn from past experience in order to refine and improve decision outcomes, and is particularly well suited to closed environments with a static framework.
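The trial and error at the heart of reinforcement learning can be sketched with a classic toy problem: a gambler choosing between two slot machines with unknown odds. In the sketch below the win probabilities, the exploration rate epsilon and the function name run_bandit are all illustrative assumptions. The program mostly plays whichever machine currently looks best, but occasionally tries the other one at random, and updates its running estimate of each machine after every play.

```python
import random

def run_bandit(win_probabilities, steps=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which option pays out most often."""
    rng = random.Random(seed)
    estimates = [0.0] * len(win_probabilities)  # current guess at each option's value
    pulls = [0] * len(win_probabilities)        # how often each option has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(win_probabilities))  # explore at random
        else:
            action = max(range(len(estimates)), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < win_probabilities[action] else 0.0
        pulls[action] += 1
        # Nudge the running average for this option towards the observed reward.
        estimates[action] += (reward - estimates[action]) / pulls[action]
    return estimates, pulls

estimates, pulls = run_bandit([0.2, 0.8])
print(estimates, pulls)
```

The rules of this game never change, which is one reason why closed, static environments of this kind suit the technique so well.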
Whether the system is designed to work on a supervised or unsupervised basis it is clearly vitally important that the data sets provided to train the system need to be representative of the underlying problem the system is being designed to resolve and should not be unbalanced or skewed in any particular direction (aka biased). As we’ll see later on in this book, there are particular issues with this which could trap the unwary. Proactively editing out bias too far in a data set may well have the opposite and undesired effect of making that data set too specific (so called “overfitting”) and less representative of the data class, in turn leading to the corresponding outputs of the system being prejudiced in favour of or against particular outcomes.
Backpropagation
Backpropagation is the process which allows a machine learning system to adjust and change by reference to previous outcomes.
In technical terms it is the means by which the machine learning system (via the activation algorithm, see above) mathematically works out how much each weight in the network contributed to the error in an output: the error is distributed back through the network nodes, layer by layer, to give the gradients that gradient descent then uses to adjust the weights.
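For the mathematically curious, the idea can be sketched on the smallest possible network: one input, one hidden node and one output. The structure, the starting weights, the learning rate and the function names below are all assumptions chosen for illustration. The error at the output is passed backwards to work out how much each weight should change, and a single small adjustment is then made.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    hidden = sigmoid(w1 * x)   # hidden node activation
    output = w2 * hidden       # output node (no activation, for simplicity)
    return hidden, output

def cost(x, target, w1, w2):
    _, output = forward(x, w1, w2)
    return 0.5 * (output - target) ** 2

def backpropagate(x, target, w1, w2):
    """Return the gradient of the cost with respect to each weight."""
    hidden, output = forward(x, w1, w2)
    error = output - target                 # how wrong the output was
    grad_w2 = error * hidden                # share of the blame for the output weight
    grad_hidden = error * w2                # error passed back to the hidden node
    grad_w1 = grad_hidden * hidden * (1.0 - hidden) * x  # blame for the first weight
    return grad_w1, grad_w2

x, target, w1, w2 = 1.0, 0.5, 0.3, -0.2
grad_w1, grad_w2 = backpropagate(x, target, w1, w2)
learning_rate = 0.1
new_w1 = w1 - learning_rate * grad_w1
new_w2 = w2 - learning_rate * grad_w2
before = cost(x, target, w1, w2)
after = cost(x, target, new_w1, new_w2)
print(before, "->", after)
```

Real networks repeat this adjustment many thousands of times across many more weights, but the principle is the same.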
So these are the basic features of AI or machine learning systems. As I mentioned above, there are a variety of different AI Models which are being deployed, and it is worth spending a little time on the most significant of these.
Artificial Neural Networks
Artificial neural networks aim to most closely model the functioning of the human brain through the simulation approach and contain all of the basic machine learning elements described above.
Neuroscience has established that the human neo-cortex, which is where most of our higher brain function resides, consists of a very dense population of interconnected neurons or brain cells. Neurons consist of the soma (the cell body) and axons and dendrites (the filaments that extend from the cell itself). Observed higher human brain functions require groups or networks of neurons to fire together in electrochemical activity. Even though the base physical structure of the cortex is the same, it is settled science that there are cortical regions that specialise in particular skills, such as language, motor movement, sight and hearing. More recently, it has been established that there are progressively “higher” levels of brain processing driven by layers of neurons. You might for example have a collection of neurons at a low level that are specialised in the visual detection of “edges”. Data from these lower functions is passed up the tree (or network) to higher functions, such as those that might detect nose shapes or eye shapes, so that collectively, you are able to perceive a face.
In the world of artificial intelligence, scientists have attempted to replicate or model these structures and their functionality by use of neural networks. In simplistic terms, neural networks can be organised as “shallow” – 1-3 layers or “deep” – over 7 layers (as is generally the case in the human neo-cortex).
Artificial neural networks are composed of artificial input “neurons” – virtual computing “cells” that activate, that is to say, assume a numeric value (by reference to a chosen algorithm which applies weights and biases to that numeric value – in effect influencing it) and then hand it off to another layer of the network, which also applies an algorithmic treatment to it, and so on and so forth until the data has passed through the entire network and is outputted.
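The passage of data through the layers described above can be sketched as follows. The weights, biases and input values are invented for illustration, the function name layer_forward is hypothetical, and the “sigmoid” activation used here is just one common choice among several.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each neuron sums its weighted inputs, adds its bias and applies the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

inputs = [0.5, -1.0]
# A hidden layer of two neurons, followed by an output layer of one neuron.
hidden = layer_forward(inputs, weights=[[0.4, 0.6], [-0.3, 0.8]], biases=[0.0, 0.1])
output = layer_forward(hidden, weights=[[1.0, -1.0]], biases=[0.0])
print(hidden, output)
```

Each layer simply hands its numeric values on to the next; it is the weights and biases, adjusted during training, that determine what the network as a whole computes.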
The process is heavily mathematical. Most neural networks apply something called a cost function, which is an additional mathematical process that measures how far the network’s outputs are from the desired answers, and which is used to determine how to adjust the network’s weights and biases in order to make it more accurate. Typically this is achieved by something called gradient descent – a calculus derived mathematical procedure which is designed to move step by step towards the minimum (the lowest possible, and hence the most accurate, value of the cost function described above).
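Gradient descent is easier to picture with a toy example. The sketch below assumes a deliberately simple cost function with a single adjustable weight, whose lowest point is known to be at 3; the starting point, learning rate and function names are illustrative assumptions. At each step the weight is moved a little in whichever direction reduces the cost.

```python
def cost(w):
    """A toy cost function whose lowest point is at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """The slope of the cost function at w."""
    return 2.0 * (w - 3.0)

def gradient_descent(start, learning_rate=0.1, steps=100):
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)  # step "downhill"
    return w

best_w = gradient_descent(start=-4.0)
print(best_w, cost(best_w))
```

In a real network there may be millions of weights rather than one, and the cost surface may have many dips, so the procedure is not guaranteed to find the very lowest point.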
Stepping away from the maths for a moment, it is worth making the point that Artificial Neural Networks process data in a highly complex non-linear manner through a number of ‘hidden layers’ – as we saw from the basic elements section above, data may backpropagate as well as move forwards through the network (a further mathematically driven process of output refinement and tuning). This complexity drives opacity in their processes which often leads them to being referred to as “black boxes”. This can make it very difficult to understand and equally difficult to explain how or why the system has reached a particular outcome in a particular instance.
It is also important to note that neural networks come in a variety of flavours, so for example Regression Neural Networks are neural networks that are trained to analyse data on either a linear or non-linear regression basis (a simple example of regression analysis might be extrapolating growth rates from height measurements taken from a child).
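The child’s height example can be turned into a very small worked illustration of linear regression. The ages and heights below are invented figures, and the sketch uses the standard “least squares” formula for a straight line rather than a neural network, purely to show what fitting a trend to data and extrapolating from it involves.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

ages = [2, 3, 4, 5, 6]                          # years (invented data)
heights_cm = [86.0, 95.0, 102.0, 109.0, 115.0]  # invented measurements
slope, intercept = fit_line(ages, heights_cm)
predicted_at_7 = slope * 7 + intercept          # extrapolate one year ahead
print(slope, intercept, predicted_at_7)
```

The slope is the estimated growth rate in centimetres per year. Extrapolating far beyond the measured ages would of course be unsafe, since children do not grow in a straight line indefinitely.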
Deep Learning Networks
Deep learning models are simply varieties of artificial neural networks that employ vastly greater computing power, more recent algorithmic innovations and much bigger data sets – in short, they take advantage of the much greater computing power available to us, but operate in much the same manner as I have described above. Of course, the challenges of explainability and transparency are also correspondingly amplified when such networks are used.
On a wider basis, it is probably worth providing a brief introduction to some other non-machine learning AI technologies, although I do not propose to delve into these in any great detail.
In contrast to machine learning, decision trees are a “white box” artificial intelligence model – typically their decisions are more easily explicable and they are principally used for classification style problems. In simple terms a decision tree works by processing data through a series of question “nodes” (similar to a flow diagram type structure). Each node hands off data to the corresponding next layer once a question has been answered. Decision trees usually work on a Boolean basis (i.e. yes or no).
Decision trees depend upon being able to classify data sets in an expected manner and are not suitable for applications that are based on unsupervised pattern correlation or recognition. This means that they are in turn susceptible to in-built bias if the overall problem they are designed to resolve has been incorrectly modelled.
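A decision tree of the kind described above can be sketched in a few lines. The tree below is entirely hypothetical (a two-question check on a payment), as are the field names and outcomes. The sketch records each question and answer along the way, which is what makes the eventual decision easy to explain.

```python
# Hypothetical tree for illustration: each internal node asks one yes/no question.
tree = {
    "question": "Is the amount over 1,000?",
    "test": lambda record: record["amount"] > 1000,
    "yes": {
        "question": "Is the payee already known?",
        "test": lambda record: record["known_payee"],
        "yes": "approve",
        "no": "refer for review",
    },
    "no": "approve",
}

def decide(node, record):
    """Walk from the root to a leaf, recording each question and its answer."""
    path = []
    while isinstance(node, dict):
        answer = "yes" if node["test"](record) else "no"
        path.append((node["question"], answer))
        node = node[answer]
    return node, path

outcome, path = decide(tree, {"amount": 5000, "known_payee": False})
print(outcome)
for question, answer in path:
    print(question, "->", answer)
```

Note that the questions here were written by hand. If they model the underlying problem badly, the tree will faithfully reproduce that flaw in every decision it makes.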
Random Forests and Deep Forests are very large ensembles of Decision Trees. In simple terms, Random Forests take random subsets of data to be analysed and assign these to individual trees. The collective output of the trees provides a range of responses which can be correlated on a statistical basis to provide a stronger prediction than a single decision tree alone. I do not propose to explain the intricacies of these structures further in this book, however the reader should note that by aggregating decision trees together in a very large hierarchy, such structures may become inherently less explicable in terms of their decision making.
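The aggregation idea can be sketched as follows, with heavy simplification. Each “tree” here is reduced to a single yes/no question (sometimes called a decision stump), each is trained on its own random sample of an invented data set, and the forest’s answer is the majority vote. The data, function names and number of trees are assumptions for illustration; real random forests use much deeper trees and also vary which features each tree may consider.

```python
import random
from collections import Counter

def train_stump(sample):
    """Pick the threshold (from the sample's own values) that best separates the labels."""
    best = None
    for threshold, _ in sample:
        correct = sum((x > threshold) == label for x, label in sample)
        if best is None or correct > best[1]:
            best = (threshold, correct)
    return best[0]

def train_forest(data, n_trees=25, seed=0):
    """Train each stump on its own random sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def predict(forest, x):
    """Each stump votes True or False; the majority wins."""
    votes = Counter(x > threshold for threshold in forest)
    return votes.most_common(1)[0][0]

# Invented data: (measurement, label)
data = [(1.0, False), (2.0, False), (3.0, False), (7.0, True), (8.0, True), (9.0, True)]
forest = train_forest(data)
print(predict(forest, 8.0), predict(forest, 2.0))
```

Even in this toy version, explaining a result means explaining twenty-five separate votes rather than one chain of questions, which hints at why larger ensembles become harder to explain.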
Probabilistic artificial intelligence
Probabilistic or Bayesian artificial intelligence techniques are some of the hardest to conceptualise and understand. They may or may not incorporate machine learning technology. Systems that work on this basis are attempting, by application of mathematical probability theory, to express all forms of uncertainty associated with the problem the model is trying to resolve. They then apply inverse probability (Bayes’ Rule) to infer unknown characteristics of the problem to make predictions about, and learn from, observed data (so called “inference algorithms”).
The greatest strength of Probabilistic models is that they know what they do not know or in effect have an internal representation of past outcomes that are learned, on the basis of which they can guess a probable outcome. We live and work in a very messy world – and many decisions we take are inferred from observable data sets that are incomplete. Such systems are therefore incredibly powerful but do depend on a very careful probabilistic representation of uncertainty.
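A short worked example may help to demystify Bayes’ Rule. The figures below are invented assumptions: suppose that 1% of transactions are fraudulent, that a warning flag is raised on 90% of fraudulent transactions, and that it is also (wrongly) raised on 5% of legitimate ones. The question is how likely it is that a flagged transaction is actually fraudulent. The function name posterior is hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Bayes' Rule: probability the hypothesis is true, given that the evidence was seen."""
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Invented figures: 1% of transactions are fraud; the flag fires on 90% of fraud
# and on 5% of legitimate transactions.
p_fraud_given_flag = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p_fraud_given_flag, 3))
```

On these assumptions only around 15% of flagged transactions are in fact fraudulent, because legitimate transactions vastly outnumber fraudulent ones. It is this kind of explicit handling of uncertainty that probabilistic systems are designed to perform.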
So that was an introduction to the key elements of the technology. In the following chapter we delve into a little more detail on the core ethical issues surrounding the use and deployment of machine learning systems (as these ethical issues form the keystone of nascent AI regulation). In Chapter Three we discuss the latest and most controversial aspects of the technology, Generative AI, and explain how that differs from what I would refer to as the established AI techniques discussed in this chapter.
 This is known scientifically as the concept of pareidolia
 This difficulty is often referred to as Moravec’s Paradox, articulated by Hans Moravec and others in 1988. He postulated that whilst it was comparatively easy to get computers to exhibit “adult” level skills, such as playing chess or checkers (draughts), it was difficult or impossible to give them the skills of a one year-old in terms of perception and mobility
 The term robot derives from the Czech word robota, which means forced labour or slavery
 Computing Machinery and Intelligence (1950) by A.M. Turing
 Stochastic neural analog reinforcement calculator
 Lidar measures spatial distances by illuminating objects with laser light and measuring reflections with a sensor. This can then be used to create 3D scans of objects in the immediate vicinity of the CAV
 See for example http://www.bbc.co.uk/news/technology-39126027 “Facebook artificial intelligence spots suicidal users”, 1st March 2017
 See for example Deloitte’s TrueVoice software enterprise product. TrueVoice applies behaviour and emotion analytics to assist in the optimal management of call centres
 See for example the August 2017 edition knowledge feature of Edge Magazine – “Machine Language” which discusses new startup SpiritAI – a business that has developed an intelligent character engine for NPCs (non-player characters) in video games, thus obviating the need for thousands of pages of pre-scripted dialogue
 Bizarrely, scientists have apparently discovered that there is a neuron that appears to be dedicated to Bill Clinton recognition. See On Intelligence, by Jeff Hawkins