What Is Artificial Intelligence?
If you follow the hype, Artificial Intelligence Is The Next Big Thing. Our houses, our cars, our toasters, all of these seem to be teeming, even overflowing with intelligence, like some great fungus gone amuck. AI is here to stay, and you need it, right now!
Okay, that may be a bit overblown, enough so that it is worth asking what exactly all this artificial intelligence stuff is, and whether it may actually not be as great as everyone claims (nor as bad as everyone fears, if you take the opposite stance that AI is going after everyone’s jobs).
A bit of a history lesson, then, is in order. Artificial intelligence as a concept has been around for as long as humans have been telling stories. Singing swords, enchanted items and the various other stuff of magic are ways of ascribing intelligence and free agency to inanimate objects. Hephaestus, the Greek god of the forge, supposedly created bronze handmaidens to help him when he was crafting the weapons of the gods. Talos, the bronze automaton that Hephaestus created to protect the isle of Crete, features in one of the more gripping episodes of the tale of Jason and the Argonauts, where the Argonauts (with the help of the sorceress Medea) were only able to defeat him by removing the plug in his ankle (the original Achilles heel) and letting the ichor that served as his lifeblood drain out.
Much more recently, in the 1950s, a team of researchers led by Marvin Minsky and John McCarthy established what would in time become the MIT Computer Science and Artificial Intelligence Laboratory. Minsky himself was a controversial figure during his life (he died in 2016). He was responsible for one of the first neural networks, an algorithm that roughly modeled the way a limited number of neurons worked in the brain, but his criticisms of others’ theories, notably Frank Rosenblatt’s work on what Rosenblatt called perceptrons, along with his skepticism about what such networks could do, dampened investor interest in AI dramatically. That contributed to what has since become known as the AI Winter, which lasted through much of the 1970s.
In retrospect, this may not have been a bad thing. Minsky was correct in his assessment that computing power at the time was insufficient for AI to really work, and it would take another thirty years of compounding under Moore’s Law (popularly, computing power doubling roughly every eighteen months) before computers began to have the horsepower to explore neural networks to a reasonable degree. Ironically, Rosenblatt’s perceptron would end up figuring prominently in that revival, along with the growing realization that non-linear mathematics lay at its heart.
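Indeed, non-linearity was at the core of one of the key arguments in Perceptrons, the 1969 book that Minsky wrote with mathematician Seymour Papert: a single-layer perceptron can only separate categories that a straight line (or flat plane) can divide, so it cannot learn even a simple non-linear function such as exclusive-or (XOR), and the multi-layer networks that could get past that limit were not practical to train with the technology of the time.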
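Where non-linearity bites is easy to see in code. What follows is a minimal Python sketch of the classic perceptron error-correction rule, a modern software rendering rather than Rosenblatt’s original hardware; the train_perceptron helper, learning rate and epoch count are illustrative choices. AND can be split by a single straight line, so training converges; XOR cannot, so no setting of the weights ever gets more than three of the four cases right.

```python
def train_perceptron(samples, epochs=25, lr=1.0):
    """Classic perceptron rule: nudge the weights toward each misclassified 0/1 target."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            if error:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every sample classified correctly: stop early
            break
    return weights, bias


def predict(weights, bias, x):
    """Fire (1) if the weighted sum crosses the threshold, else 0: a straight-line split."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def accuracy(samples, weights, bias):
    return sum(predict(weights, bias, x) == t for x, t in samples) / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

results = {}
for name, data in [("AND", AND), ("XOR", XOR)]:
    w, b = train_perceptron(data)
    results[name] = accuracy(data, w, b)
    print(name, results[name])
```

AND reaches an accuracy of 1.0; XOR stays at 0.75 or below however long it trains, because its two classes sit on opposite corners of the square and no single line separates them.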
Moving Beyond Linearity
Linearity is a mathematical concept that has a few different meanings. At its simplest, it means that you can solve problems using variations of y = a*x + b. For instance, the relationship between temperatures in Fahrenheit and Celsius is given as C = (5/9) * (F - 32). More generally, it means that you can transform a formula in such a way that the transformed formula has this kind of relationship. Exponential and logarithmic equations are often handled this way (taking the logarithm of y = a*e^(b*x) gives ln y = ln a + b*x, a straight line in x), and, if complex numbers (real + imaginary numbers) are used, this also includes trigonometric functions like sines and cosines.
All of these happen (not coincidentally) to be solutions of linear differential equations, which means, among other things, that they can be solved exactly and are comparatively easy to solve using numerical methods. Because they describe the behavior of a lot of engineering systems at a fundamental level, mathematicians work very hard to take problems and make them linear.
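As a small illustration of making a problem linear, the Python sketch below fits a straight line by ordinary least squares twice: once to the freezing and boiling points on the Fahrenheit and Celsius scales, and once to exponential data after taking logarithms. The fit_line helper and the sample constants (A = 3, B = 0.5) are made up for the example.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ≈ slope * xs + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Already linear: freezing and boiling points pin down C = (5/9) * (F - 32).
c_slope, c_intercept = fit_line([32, 212], [0, 100])
print(f"slope = {c_slope:.4f}, intercept = {c_intercept:.4f}")  # 0.5556 (5/9), -17.7778 (-160/9)

# Exponential data y = A * exp(B * x), with A = 3 and B = 0.5 chosen for illustration.
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * math.exp(0.5 * x) for x in xs]

# Taking logs linearizes it: ln(y) = ln(A) + B * x, so a straight-line fit recovers A and B.
B, ln_A = fit_line(xs, [math.log(y) for y in ys])
A = math.exp(ln_A)
print(f"A = {A:.3f}, B = {B:.3f}")  # A = 3.000, B = 0.500
```

The same straight-line machinery handles both cases; the only extra work for the exponential was choosing the right transformation first.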
Non-linear equations, on the other hand, describe a much wider domain of problems, but usually they cannot be transformed into linear equations, which makes them much harder to solve. Newton’s equations of motion, for instance, describe the behavior of idealized objects: a hockey puck on ice will keep moving at the velocity it was hit with until it encounters a barrier.
However, the same hockey puck on concrete will slow down dramatically, hop about, and spin. Why? Friction. Once you introduce friction into the equation, that equation goes non-linear, and its behavior becomes considerably harder to predict. Non-linear systems are much more sensitive to initial conditions, and can often be discontinuous, so that two points that are more or less next to one another in the source are mapped to points nowhere near one another in the target.
The simplest example of this is the hyperbolic equation y = 1/x. As x approaches zero from the positive side, y grows without bound, while for negative values of x approaching zero it plunges toward negative infinity. At x = 0, the equation is undefined. This is called a discontinuous function, and it’s the bane of mathematicians and physicists everywhere.
There is, however, another class of functions, which I’ll loosely call higher order functions (mathematicians usually speak of iterated functions), in which the output of a function is then used as the input to that same function. For instance, suppose that you have a function y = f(x) = x + 1. When x = 0, y = 1, a nice simple linear equation. However, f(f(0)) = f(1) = 2, f(f(f(0))) = 3 and so forth. This is an example of a recursive function.
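To see why iteration matters once the function is non-linear, the sketch below feeds each output back in as the next input. The linear f(x) = x + 1 keeps two nearly identical starting points exactly as far apart as they began; the logistic map x → r·x·(1 − x), a standard textbook example of a non-linear map that is not itself part of the discussion above, pulls them apart until they are unrelated. The value r = 4.0 is an assumption, chosen because it lies in the map’s chaotic regime.

```python
def iterate(f, x0, steps):
    """Feed each output of f back in as its next input; return the whole orbit."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(f(orbit[-1]))
    return orbit


def successor(x):
    return x + 1  # the linear example from the text


def logistic(x, r=4.0):
    return r * x * (1 - x)  # non-linear; r = 4.0 puts it in the chaotic regime


print(iterate(successor, 0, 3))  # [0, 1, 2, 3], i.e. f(f(f(0))) == 3

# Start two orbits one part in a billion apart and watch the gap between them.
lin_a, lin_b = iterate(successor, 0.0, 50), iterate(successor, 1e-9, 50)
log_a, log_b = iterate(logistic, 0.2, 50), iterate(logistic, 0.2 + 1e-9, 50)
for step in (0, 10, 20, 30, 40, 50):
    print(step, abs(lin_a[step] - lin_b[step]), abs(log_a[step] - log_b[step]))
```

The linear gap stays at about one billionth for all fifty steps, while the logistic gap roughly doubles every step and is as large as the values themselves by around step 30.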
Non-linear recursive functions tend to produce a scattered cloud of points, but the intriguing thing, first noticed by meteorologist Edward Lorenz in 1963, was that if you ran enough points, the cloud would settle onto an orbit that never quite repeated itself, what came to be called a strange attractor (a term coined by David Ruelle and Floris Takens in 1971).
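Lorenz’s system is three coupled differential equations rather than a single recursive function, but stepping it forward numerically turns it into exactly that kind of recursion: each new state is computed from the previous one. The sketch below uses a plain forward-Euler step with Lorenz’s classic parameters (σ = 10, ρ = 28, β = 8/3); the step size, run length and starting point are arbitrary choices.

```python
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz equations: the new state is computed from the old."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dx * dt, y + dy * dt, z + dz * dt)


state = (1.0, 1.0, 1.0)
trajectory = [state]
for _ in range(10_000):  # 100 time units at dt = 0.01
    state = lorenz_step(state)
    trajectory.append(state)

# Bounded: the orbit never flies off to infinity ...
bound = max(abs(coord) for point in trajectory for coord in point)
# ... yet it keeps hopping between the attractor's two "wings" (the sign of x) irregularly.
wing_switches = sum(
    1 for p, q in zip(trajectory, trajectory[1:]) if (p[0] > 0) != (q[0] > 0)
)
print(f"largest coordinate: {bound:.1f}, wing switches: {wing_switches}")
```

Plotting x against z for the stored trajectory produces the familiar butterfly-shaped attractor.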
Based upon the work of Lorenz and his own research on the similarity of stock market movements to the shape of coastlines, mathematician Benoit Mandelbrot popularized the visualizations of non-linear equations, coining the term “fractals” for them, because their dimension falls somewhere in between the whole-number dimensions (point, line, plane, space, hyperspace) that we’re familiar with.
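For shapes built from exact scaled copies of themselves, a dimension between dimensions can be computed directly. If a shape consists of N copies of itself, each shrunk by a factor r, its similarity dimension D satisfies N·r^D = 1, so D = log N / log(1/r). This is only one of several definitions of fractal dimension, shown here as a minimal Python sketch with an illustrative list of shapes.

```python
import math


def similarity_dimension(copies, scale):
    """Solve copies * scale**D == 1 for D, i.e. D = log(copies) / log(1 / scale)."""
    return math.log(copies) / math.log(1 / scale)


shapes = {
    "line segment": (2, 1 / 2),         # two half-length copies
    "filled square": (4, 1 / 2),        # four half-size copies
    "Koch curve": (4, 1 / 3),           # four one-third-size copies
    "Sierpinski triangle": (3, 1 / 2),  # three half-size copies
}
dimensions = {name: similarity_dimension(n, r) for name, (n, r) in shapes.items()}
for name, d in dimensions.items():
    print(f"{name}: {d:.4f}")
```

The line and square come out at 1.0000 and 2.0000 as expected, while the Koch curve (1.2619) and Sierpinski triangle (1.5850) land between a line and a plane.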
For a while, fractals became a popular source of computer screen backgrounds before the craze finally died down, but unbeknownst to most, they would find a second life in the world of computing as research into neural networks began to benefit from the increased speed and memory of computing systems.
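Many of those screen backgrounds were renderings of the Mandelbrot set, which comes from iterating the non-linear recursion z → z² + c for each point c of the complex plane and recording how quickly the result escapes. A minimal text-mode sketch in Python follows; the viewing window, image size, iteration cap and character palette are arbitrary choices.

```python
def escape_time(c, max_iter=50):
    """Iterate z -> z*z + c from z = 0; return the step at which |z| first exceeds 2."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return n
    return max_iter  # never escaped: treat c as belonging to the Mandelbrot set


def render(width=64, height=24, max_iter=50):
    """Map a window of the complex plane onto characters, denser for slower escape."""
    palette = " .:-=+*#%@"
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)
        row = []
        for i in range(width):
            x = -2.1 + 2.7 * i / (width - 1)
            n = escape_time(complex(x, y), max_iter)
            row.append(palette[min(n * len(palette) // max_iter, len(palette) - 1)])
        rows.append("".join(row))
    return "\n".join(rows)


print(render())
```

Swapping the characters for colors and raising the resolution and iteration cap is essentially all that separates this from the wallpapers of the era.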
Text Analytics In A Non-Linear World
However, another diversion is necessary to get there. Linguistic computing has long been something of a backwater in the computer science field. Yet as computers went from large vacuum tube systems down to notebooks and ultimately cell phones and tablets, the urge to talk to (okay, scream at) your computer has, if anything, only become stronger over time. Similarly, there are many tasks associated with curating books and magazine articles (determining salient points, identifying common topics and summarizing) that are both time intensive and require a great deal of skill to do well. If we could get computers to read and summarize (or, even more powerfully, read and translate) on the fly, it would solve one of the biggest headaches in almost any organization: finding the information that you need across documents and other media.
There is a whole field that has been around since the 1960s called textual analysis, which uses statistical functions to determine the thematic similarity between two works. Its success has been mixed: the search capabilities it brings are far better than the efforts of a legion of librarians manually summarizing, but relevancy is still pretty poor.
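One of the simplest statistical measures of thematic similarity treats each text as a bag of word counts and compares the resulting vectors by the cosine of the angle between them. The sketch below is a bare-bones version, without the stemming, stop-word removal or TF-IDF weighting that production systems typically add; the three sample sentences are invented for the example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag of words: lower-cased word counts, ignoring punctuation and word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors: 1.0 means identical proportions."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


docs = {
    "talos": "Talos was a bronze giant who guarded the island of Crete.",
    "argonauts": "Jason and the Argonauts sailed past Crete and the bronze giant Talos.",
    "perceptron": "The perceptron is a simple neural network that learns linear boundaries.",
}
query = term_vector(docs["talos"])
scores = {name: cosine_similarity(query, term_vector(text)) for name, text in docs.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

The two sentences about Talos score highest together, but shared filler words such as “the” and “a” still give the unrelated perceptron sentence a nonzero score, a small taste of why purely statistical matching yields middling relevancy.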
In most cases, what such systems actually use are indexes, typically recording some measure of the strength of association between words and phrases, coupled with where they are located. The statistics are (mostly) linear, which in general places significant limits on what can be interpreted.
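A toy inverted index makes the strength-plus-location idea concrete: each word maps to the documents containing it and the positions where it appears, and a query ranks documents by a crude strength score (here, simply the number of matching occurrences). The build_index and search helpers and the sample texts are hypothetical.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each word to {doc_id: [positions]}: where every word occurs, and how often."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(re.findall(r"[a-z']+", text.lower())):
            index[word][doc_id].append(position)
    return index


def search(index, *words):
    """Rank documents containing every query word by total occurrences (a crude strength score)."""
    postings = [index.get(word, {}) for word in words]
    if not postings:
        return []
    matching = set.intersection(*(set(p) for p in postings))
    scores = {doc: sum(len(p[doc]) for p in postings) for doc in matching}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    "doc1": "The bronze giant Talos guarded Crete. Talos was bronze.",
    "doc2": "Bronze age Crete traded widely.",
    "doc3": "Perceptrons learn linear boundaries.",
}
index = build_index(docs)
print(search(index, "bronze", "crete"))  # [('doc1', 3), ('doc2', 2)]
print(dict(index["talos"]))              # {'doc1': [3, 6]}
```

The positions are what let such systems support phrase and proximity queries; the scoring, however, is still just counting.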
Tim Berners-Lee, the creator of the first web browser and server and of HTTP, the communication protocol that underlies the World Wide Web, started this effort primarily as a way to make it easier to find documents at CERN in Switzerland. By tagging content and building metadata directly into the documents, Berners-Lee was able to make the documents more machine readable.
He came back to this theme a decade and a half later and saw that he could use a similar approach with any kind of data. The big difference was that the information in a “data document” could be broken down into an interconnected graph network of simpler assertions. Each node became an identifier of an entity or concept, each edge a directed link (a vector, in effect) that described a relationship with another node.
This “graph” view provided a number of huge advantages over traditional databases. First, metadata about something could be added simply by creating a link from the metadata back to the item in question. Second, a resource could have more than one value for a given property without requiring a whole separate table. Finally, it became much easier to abstract out patterns of behavior in the data using the metadata, patterns that could be traversed recursively.
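The first two advantages can be shown with a toy in-memory triple store in Python, where every fact is a (subject, predicate, object) assertion in the spirit of RDF. The TripleStore class, the ex: identifiers and the sample data are all hypothetical; real systems would use a dedicated RDF store and a query language such as SPARQL.

```python
class TripleStore:
    """A toy in-memory store of (subject, predicate, object) assertions."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return every triple matching the pattern, with None acting as a wildcard."""
        return sorted(
            (s, p, o)
            for s, p, o in self.triples
            if subject in (None, s) and predicate in (None, p) and obj in (None, o)
        )


store = TripleStore()
store.add("ex:Batman", "ex:alias", "Bruce Wayne")
# A second value for the same property is just one more assertion: no extra table.
store.add("ex:Batman", "ex:alias", "The Dark Knight")
# Metadata is added by linking from a (hypothetical) source record back to the item.
store.add("ex:source1", "ex:describes", "ex:Batman")
store.add("ex:source1", "ex:publisher", "DC Comics")

print(store.match(subject="ex:Batman", predicate="ex:alias"))
print(store.match(predicate="ex:describes", obj="ex:Batman"))
```

The second alias and the metadata record both arrive without any change to how Batman himself is stored, which is the flexibility a fixed relational schema lacks.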
Notice a pattern beginning to develop here? Recursion is awkward to express in a relational database, so there are very few recursive design patterns there. When working with a graph, by contrast, you can essentially reproduce a family tree with one query, you can traverse the graph without necessarily knowing the adjacent nodes in advance, and you can merge multiple graphs together without duplication.
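The family-tree case reduces to a recursive walk along parent edges. The sketch below assumes a plain Python dictionary standing in for the graph; the names are invented, and the seen set ensures that an ancestor reachable along two paths is collected only once.

```python
def ancestors(parents, person, seen=None):
    """Recursively follow 'parent' edges, collecting each ancestor exactly once."""
    if seen is None:
        seen = set()
    for parent in parents.get(person, ()):
        if parent not in seen:
            seen.add(parent)
            ancestors(parents, parent, seen)
    return seen


# Hypothetical family graph: each person maps to their recorded parents.
parents = {
    "Dana": ["Chris", "Alex"],
    "Chris": ["Bea", "Ari"],
    "Alex": ["Yara"],
    "Bea": ["Zed"],
    "Yara": ["Zed"],  # Zed is reachable along two paths but is collected once
}
print(sorted(ancestors(parents, "Dana")))  # ['Alex', 'Ari', 'Bea', 'Chris', 'Yara', 'Zed']
```

The function never needs to know how deep the tree goes; it simply keeps following edges until there are none left.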
This can be used to provide searches based upon connections: search for Batman and you get superhero as a concept, and from this you can search through all the black- (or dark gray-) clad caped crusaders. This is difficult to do with a relational database. It can be done with an XML or JSON database, but in general neither of these is very good at managing references to other entities.
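The Batman search is a pair of hops through the graph: from an entity out to its concepts, then back in to the other members of those concepts, filtered on further properties. This Python sketch uses a hypothetical, deliberately simplified set of triples and two hypothetical helper functions (objects and subjects), not a real knowledge base.

```python
# Hypothetical, simplified triples: (subject, predicate, object).
TRIPLES = {
    ("Batman", "is_a", "Superhero"),
    ("Batman", "wears", "black"),
    ("Batman", "has", "cape"),
    ("Batwoman", "is_a", "Superhero"),
    ("Batwoman", "wears", "black"),
    ("Batwoman", "has", "cape"),
    ("Nightwing", "is_a", "Superhero"),
    ("Nightwing", "wears", "black"),  # no cape recorded
    ("Superman", "is_a", "Superhero"),
    ("Superman", "wears", "blue"),
    ("Superman", "has", "cape"),
    ("Zorro", "wears", "black"),      # caped, but not linked to Superhero here
    ("Zorro", "has", "cape"),
}


def objects(subject, predicate):
    """Follow edges outward: what does subject point to via predicate?"""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """Follow edges inward: what points to obj via predicate?"""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


# Hop 1: from Batman out to the concepts he belongs to.
concepts = objects("Batman", "is_a")
# Hop 2: from each concept back in to all of its members.
members = set().union(*(subjects("is_a", c) for c in concepts))
# Filter: keep the black- (or dark gray-) clad, caped ones.
caped_crusaders = sorted(
    m for m in members
    if objects(m, "wears") & {"black", "dark gray"} and "cape" in objects(m, "has")
)
print(caped_crusaders)  # ['Batman', 'Batwoman']
```

Nothing in the query names Batwoman; she is found purely because she shares the Superhero concept and the filtered properties.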
The advantage of working with graphs of information is that the information gets atomized and refactored as distinct properties, often buildable from other properties. In effect, you’re dealing with multi-dimensional data, and can even change on the fly the categories that different facets of the data belong to. Finally, you can detect patterns based upon clusters of data dynamically, without necessarily knowing a priori what properties you’re working with. This means that the data can, over time, become self-organizing.
These types of graph databases are known as knowledge graphs. There are other forms of graph data stores that are optimized for different kinds of processing, but most of them work on the idea that information is stored in a graph or mesh connecting entities with relational information.
Wheels Within Wheels
As we approach 2020, artificial intelligence seems to be characterized by several key factors:
- Recursion (non-linear processing) is used to create multiple levels of abstraction.
- Information is worked on holistically – at any given point information is determined primarily by context, but context switching between and along abstraction layers is a critical part of finding and transforming content.
- Categorization is an integral part of the process, with categories coming into existence as a utility to perform certain operations before disappearing once again.
- The graph of information is constantly changing in response to filters that are themselves dynamically generated based on existing information.
- Intelligence is nodal but also distributed and stochastic. The information that you have in the system is never complete nor totally comprehensive, so decisions can only be made once a tipping point of facts confirming or denying a certain query is reached.
- At any given point, information exists in a model, but that model is itself flexible and has the potential to be self-modifying. This is unlike existing systems where the model is usually predetermined.
I think these all fit into our current definition of artificial intelligence, but there’s an additional definition that is only beginning to emerge:
- The system has a certain degree of self-awareness.
I add this final point with some trepidation, but I think that it is important. Awareness comes from the ability to detect anomalous patterns that threaten the fidelity of the data, actions that are potentially destructive, and actions that incentivize more efficient storage or access of information. Self-healing data systems are one such form of awareness. Systems that are able to determine (and later counter) unwanted bias are another. Once you have a data system that is capable of rejecting data not because of syntactical issues but because the provenance of that data “tastes funny”, you have a system that is beginning to become self-aware.
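At its most basic, spotting data that tastes funny starts with statistical outlier checks. The sketch below flags values using a modified z-score built on the median and the median absolute deviation, a common robust technique (the 0.6745 scaling constant and 3.5 cutoff are conventional choices); the flag_anomalies helper and the sensor readings are invented. It is a long way from self-awareness, but it is the kind of building block such a system would rest on.

```python
import statistics


def flag_anomalies(values, cutoff=3.5):
    """
    Flag values whose modified z-score exceeds the cutoff.

    Uses the median and the median absolute deviation (MAD), which a few bad
    records cannot drag around the way they can drag a mean and standard deviation.
    The 0.6745 factor makes the score comparable to an ordinary z-score for normal data.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:  # more than half the values are identical: flag anything different
        return [v != med for v in values]
    return [abs(0.6745 * (v - med) / mad) > cutoff for v in values]


readings = [20.1, 19.8, 20.4, 20.0, 19.9, 87.0, 20.2]
print([r for r, bad in zip(readings, flag_anomalies(readings)) if bad])  # [87.0]
```

A fuller system would weigh provenance as well as value, for example tracking which sources tend to produce flagged records.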
There’s another important point here that may not have been obvious: artificial intelligence is not an algorithm. It is a network of databases that uses both data science algorithms (which are mostly linear in the broader sense) and higher order functions (recursion and fractal analysis) to change its own state in real time.
I think this also sidesteps the Turing Test problem. The Turing Test basically says that an artificially intelligent system is one that has become indistinguishable from a human being in its ability to hold a conversation. That particular definition is too anthropocentric. To be honest, there are a great number of human beings who would appear to be incapable of holding a human conversation (look at Facebook). If anything, the bots are smarter.
The above set of definitions is also increasingly consistent with modern cognitive theories of human intelligence, which hold that intelligence exists because there are multiple nodes of specialized sub-brains that individually perform certain actions and retain certain state, and that our awareness comes from one particular sub-brain that samples aspects of the activity going on around it and uses that to synthesize a model of reality and of ourselves. We even have a pretty good idea how to turn that particular node on or off, via general anesthesia. The rest of the brain keeps working; only the part that is your consciousness is temporarily 404’d.
Within those definitions, there are a number of technologies that fall within the rubric of Artificial Intelligence:
- Machine Learning. Data systems that modify themselves by building, testing and discarding models recursively in order to better identify or classify input data.
- Reinforcement Learning. The use of rewards for achieving objectives (and penalties for missing them) in order to strengthen (or weaken) specific behaviors. This is frequently used with agent systems.
- Deep Learning. Systems that rely specifically upon many-layered, non-linear neural networks to build out machine learning systems, with each layer learning increasingly abstract features from the output of the layer beneath it. It is mostly a subset of machine learning with a specific emphasis on neural nets.
- Agent Systems. Systems in which autonomous agents interact within a given environment in order to simulate emergent or crowd-based behavior. Increasingly used in games in particular, but also in other forms of simulation.
- Non-Linear Grid Systems. A variation of agent systems in which cells in n-dimensional grids maintain internal state but also receive stimuli from adjacent cells and generate output to those cells. The distant ancestor of most of these is Conway’s Game of Life, but the idea is applied at a much higher degree of complexity in most weather and stock modeling systems, which are fundamentally recursive.
- Self-Modifying Graph Systems. These include knowledge bases and similar systems in which the state of the system changes in response to heuristics that are themselves contingent on that state.
- Knowledge Bases, Business Intelligence Systems and Expert Systems. These often form a spectrum from traditional data systems to aggregate semantic knowledge graphs. To a certain extent they are human curated, but some of this curation is increasingly switching over to machine learning for classification, categorization and abstraction.
- Chatbots and Intelligent Agents. These differ from agent systems. Intelligent agents in this sense are computer systems that are able to parse written or spoken text, use it to retrieve certain content or perform certain actions, and then respond using appropriately constructed content. The earliest such system, Eliza, dates back to the mid-1960s, but was very primitive. Today’s agents and chatbots, on the other hand, use a combination of semantics, Bayesian analysis and machine learning to both build up the appropriate information and learn about the user.
- Visual/Audio Recognition Systems. In most cases V/A systems work by converting the media in question to an encoded, compressed form, then using either indexes or machine learning systems to look for the closest matches. This is often enhanced with Bayesian analysis, where specific patterns are analyzed based upon their frequency of occurrence relative to one another, and is also often tied in with semantic systems that provide relationship information.
- Fractal Visualization. The connection between fractals and AI runs deep, and not surprisingly one of the biggest areas for AI is in the development of parameterized natural rendering – the movement of water, the roar of fire, the coarseness of rock, the effects of smoke in the air, all of which have become standard fare in big Hollywood blockbusters.
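Conway’s Game of Life, named above as the ancestor of non-linear grid systems, fits in a few lines: each cell’s next state depends only on its own state and the number of live neighbors (a dead cell with exactly three comes alive; a live cell with two or three survives). This Python sketch stores only the live cells in a set, which gives an unbounded grid; the glider is a standard example pattern.

```python
from collections import Counter


def step(live):
    """Advance Conway's Game of Life one generation; live is a set of (row, col) cells."""
    # Every live cell sends a "stimulus" to each of its eight neighbors.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
state = glider
for _ in range(4):
    state = step(state)
# After four generations the glider has its original shape, shifted one cell down and right.
print(state == {(r + 1, c + 1) for r, c in glider})  # True
```

Nothing in the rules mentions movement, yet the glider travels across the grid: emergent behavior from purely local state and stimuli.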
It’s also worth noting several technologies that aren’t themselves AI systems, but often play in the same general “space”:
- Autonomous Vehicles. These make use of visual recognition systems and real time modeling in order to both anticipate obstacles (static and moving) and to determine actions based upon objectives.
- Drones. A drone is an autonomous vehicle without a passenger, and can be as small as a dragonfly or as large as a jet. Drones can also act in a coordinated fashion, either by following swarm behavior (an agent system) or by following preprogrammed instructions.
- Data Science / Data Analytics. This is the use of data to identify patterns or predict behavior. It uses a combination of machine learning techniques and numeric statistical analysis, along with an increasingly large role for non-linear differential equations. The primary distinction is that most data science does not make heavy use of higher order functions or recursion, though again, this is changing.
- Blockchain and Distributed Ledgers. Distributed ledger technology underlies electronic coinage, but it is also playing a bigger and bigger role in tracking resources and transactions. One aspect of such systems is that they make it possible to bind virtual objects as if they were unique physical objects, in effect making intellectual property exchangeable. This has applications throughout the AI space, especially in the realm of agent systems, even if it is not AI per se.
- Internet of Things / Robotics. The Internet of Things is intended to provide network connectivity to devices so that they can communicate with other devices. Robotics involves creating autonomous physical agents capable of movement. Because both may end up managing their own state and relying upon AI-based systems to identify signals and determine responses, they use AI, but aren’t directly AI.
- GPUs. The Central Processing Unit is so last century. Artificial intelligence is taking advantage of Graphics Processing Units in a big way, as their massively parallel structure makes them ideal for both semantic analysis and recursive filter applications.
The modern high-end camera is a good example of an AI-enabled device. The focusing system is essentially a robot: it has servos and actuators, and it is capable of operating independently of a human host. Its auto-focusing system involves a continuous feedback loop to determine the best focus and exposure even before the picture is “taken”. The camera also uses light-sensitive arrays that convert light into digital signals, typically stores dozens or even hundreds of “photos” that it can then composite, and has AI routines capable of removing red-eye, improving focus and compensating for lighting conditions. In most cases these very complex operations are hidden from users, who simply know that they are taking better photographs than ever before.
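The feedback loop behind contrast-detection autofocus can be approximated as hill climbing: move the lens, measure sharpness, keep going while it improves, and reverse with a smaller step when it gets worse. The sketch below is a simplification under stated assumptions: sharpness is a hypothetical stand-in for a contrast measurement taken from the sensor, lens positions are abstract units, and real cameras combine this with phase-detection systems and much more.

```python
def sharpness(position, true_focus=37.0):
    """Hypothetical contrast score: peaks when the lens sits at the (unknown) true focus."""
    return 1.0 / (1.0 + (position - true_focus) ** 2)


def autofocus(measure, start=0.0, step=16.0, min_step=0.01):
    """Hill-climb on measured contrast: keep moving while it improves, else reverse and halve."""
    position, best = start, measure(start)
    direction = 1.0
    while step >= min_step:
        candidate = position + direction * step
        score = measure(candidate)
        if score > best:  # sharper: accept the move and keep going the same way
            position, best = candidate, score
        else:             # blurrier: turn around and take smaller steps
            direction = -direction
            step /= 2
    return position


print(round(autofocus(sharpness), 2))  # 37.0
```

The loop never needs a model of the optics, only repeated measurements, which is what lets it run continuously while the photographer frames the shot.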
As to where the line gets drawn between software and AI, the reality is that the line has been blurry and is becoming more so daily. The vanilla office suite – word processing documents, spreadsheets, presentation software and mail clients – would seem to be the last bastion of non-AI software, but in point of fact all of these packages learn from their users.
Clippy, the annoying pop-up sprite that defined an entire generation of Microsoft Office users, was AI, albeit a very primitive one. Clippy’s descendants are busy at work pointing out editing problems and misspellings, making suggestions about different presentation styles, identifying patterns and suggesting better tools to solve problems. Excel’s charting suite will create what it believes is the most suitable type of chart to display data, and will then factor in your own preferences when you override it, so that next time it may be more inclined to use your preferred chart types, and even note which color schemes you most like to work with.
Adobe Photoshop CC now has the ability to select figures of interest with a single menu click. Image recognition plays a big part in that capability, and increasingly that is also being tied back into semantic systems that will identify the person being selected and update the metadata so that Photoshop files can be searched automatically for pictures of that person, all without the user being aware that anything is happening.
In other words, while most of the software that users work with today is not itself an example of artificial intelligence, AI is in the background, making suggestions, handling classification and identification, pulling together options for making decisions and in some cases making those decisions autonomously. This trend will only continue.
It can be argued that artificial intelligence isn’t really a technology per se, but is instead, at any given point, a set of future-facing technologies that usually manifest near the end of an ascending business cycle. Some, like fully certified Level 5 autonomous vehicles, quantum communication systems and even artificial general intelligence (AGI), are years or decades in the future. However, seeds for these are being planted now, and it is likely that, even if there is an economic downturn, research will continue during the quiet periods.
So, one good way of thinking about artificial intelligence is that it is the process of teaching computers to deal with multiple levels of abstraction simultaneously. It’s information processing gone fractal. And it is, ironically enough, here to stay.