AI assistants capable of intelligent conversation are the Holy Grail of AI. This is a natural and reasonable expectation of real artificial intelligence. We would expect such assistants to remember what was said, to learn interactively, to understand complex sentences, and to be able to reason and explain themselves. Up to now, this has been a largely unsolved problem.
One reason for this is that human cognition and language are extremely complex, requiring fluid context, common-sense knowledge, adaptive learning, and as linguists will tell you, complex handling of syntax, semantics, and pragmatics. Here’s a small sampling of the issues involved:
Multiple word-meaning selection, tenses, plurals, conjugations, nouns, adjectives, adverbs, verbs, auxiliaries, determiners, genitives, preposition and clause attachment resolution, ditransitives, pronoun and co-reference resolution, actor/patient identification, entities versus concepts, personas, proper nouns, names, titles, genders, complex ontologies, inheritance (up & down), synonyms, antonyms, space and time, scalars, mass nouns, data types, units of measure, conversions, relationships (space, time, relatives, cause-effect), compounds, negation, and, or, (fuzzy) quantifiers, part-of, ownership, implications, conditionals, contradictions, fuzzy values, certainty, temporal information, patterns, sequences, analogy, same meaning, meta-cognition, emotions, confusion, error recovery, disambiguation…
Another reason that today’s assistants and chatbots are far from meeting our AI expectations is that they use the wrong technology, implementing roughly the same approach:
The standard ‘solution’ today is to train a fixed categorizer using tagged examples for each ‘intent’ or skill that is supported. For example, ‘bla bla bla Uber’ will trigger the Uber skill, or ‘bla bla, umbrella’ may select the weather report. Each ‘skill’ is hand-crafted or specified by extracting required keywords (e.g. Uber destination), and possibly handling follow-up questions. While some of the tools available to create these skills are quite sophisticated, there is little or no memory, no real-time learning, no deep parsing, no generalization or reasoning. Moreover, each skill operates largely in a vacuum: it is oblivious to overall goals and context, to what transpired before, and to what information may be needed for other related skills. That’s why these systems are often described as ‘voice search’; they perform one trick at a time.
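The keyword-triggered approach described above can be sketched in a few lines. The intent names and trigger words below are hypothetical illustrations, not any vendor's actual implementation; the point is that nothing carries over between calls.

```python
# A minimal sketch of the keyword-triggered 'intent' approach:
# pick a skill by scanning for trigger words. No memory, no parsing,
# no context survives from one utterance to the next.

def classify_intent(utterance: str) -> str:
    triggers = {
        "ride_hailing": {"uber", "ride", "taxi"},
        "weather": {"umbrella", "rain", "weather"},
    }
    words = set(utterance.lower().split())
    for intent, keywords in triggers.items():
        if words & keywords:
            return intent
    return "fallback"  # unrecognized -> generic 'voice search' behavior

print(classify_intent("Get me an Uber to the airport"))  # ride_hailing
print(classify_intent("Do I need an umbrella today?"))   # weather
```

Each call is stateless: asking a follow-up question like “How long will it take?” lands in the fallback branch because the classifier has no record of the ride just requested.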
Let’s contrast this with the start of a simple conversation that you could have, even with a five-year old child:
You: “My sister’s cat, Spock, is… pregnant.”
Now, just these six words convey several facts that the child would immediately understand and learn: you have a sister, your sister owns a cat, and the cat’s name is ‘Spock’. Upon hearing ‘Spock’, the child might assume the cat is male; however, learning that the cat is pregnant provides enough context for the child to understand that the cat is female.
The knowledge gained in these few words serves as context for all events that follow, and perhaps a week later the child would ask ‘Have the kittens arrived?’. Another scenario could occur if the child already knew you have two sisters, allowing her to ask ‘Which sister, Kate or June?’.
After telling Alexa or Siri those same six words you would be lucky to get a Star Trek episode for a response.
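The child’s incremental learning above can be sketched as facts accumulating in a simple store, with an inference step that lets new context (pregnancy) override an earlier guess (male, from the name). The triple representation and fact names here are hypothetical illustrations, not Aigo’s internal format.

```python
# Toy sketch: one sentence yields several facts, plus a context-driven
# inference. Pregnancy entails female, overriding the name-based guess.

facts = []

def learn(statement_facts):
    facts.extend(statement_facts)
    if ("Spock", "is", "pregnant") in facts:
        facts[:] = [f for f in facts if f != ("Spock", "gender", "male")]
        if ("Spock", "gender", "female") not in facts:
            facts.append(("Spock", "gender", "female"))

# "My sister's cat, Spock, is pregnant."
learn([
    ("speaker", "has_sister", "sister1"),
    ("sister1", "owns", "Spock"),
    ("Spock", "is_a", "cat"),
    ("Spock", "gender", "male"),      # default guess from the name
    ("Spock", "is", "pregnant"),
])

print(("Spock", "gender", "female") in facts)  # True
```

The retained facts then serve as context for later exchanges, which is what makes the child’s follow-up question about kittens possible a week later.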
To better understand why current technologies are not up to the task, it helps to look at what intelligence is, and what it requires. A fair description is:
“Intelligence is the cognitive ability to understand the world; to help achieve a wide variety of goals; and to integrate such knowledge and skills in ongoing learning. It needs to function in real time, in the real world, and with limited knowledge and time.
Human intelligence is special in that it features the ability to form and use highly abstract concepts, and to think and reason using symbols (language).”
Intelligence requires the ability to proactively adapt and learn knowledge in real time, and to generalize that knowledge.
Current chatbot/assistant technology does not fulfill these requirements for intelligence at all.
Experts agree that a fundamentally different technology from DL or ML is needed to achieve human-like cognition:
“To build better computer brains, we need to look at our own”
Demis Hassabis, Google’s Deep Mind founder
“My view is to throw it all away and start again”
Geoff Hinton, godfather of Deep Learning
“Electric light bulbs did not come about from the continuous improvement of the candle”
The Aigo Team heartily agrees with the sentiment of the DARPA report that AI needs to move towards the ‘Third Wave’.
The ‘First Wave’ refers to standard programming approaches such as rule systems and formal logic, while the ‘Second Wave’ encompasses the great strides made in machine learning (ML) and deep learning (DL) since about 2010.
It has been apparent for a long time that flowchart-like logic programming will not create intelligent, conversational AI: natural language simply admits far too many combinations for hand-coded rules to cover. As big data, computing power, and ‘deep’ neural networks started making strides in speech and image recognition a few years ago, hope was renewed that these ‘Second Wave’ technologies would solve complex natural language understanding and conversation, and that AI bots based on them could be trained sufficiently through exposure to conversation examples. Both practical experience and theoretical limitations show that sophisticated, meaningful conversation cannot be achieved through ‘Second Wave’ technology.
So, what is the Third Wave of AI? Will it bring us true artificial intelligence?
Research conducted by the Aigo Team demonstrates that a ‘cognitive architecture’ incorporating the elements of high-level cognition can achieve true intelligence. Among other things, this requires implementing short- and long-term memory, context, goals, focus, real-time learning of knowledge and skills, and meta-cognition.
This approach agrees with opinions expressed by AI luminaries: “unsupervised learning is key” (Geoffrey Hinton & Yann LeCun), “we need to look at how our brains work” (Demis Hassabis), and “we need more built-in structure” (Gary Marcus).
We see the ‘Third Wave’ being spearheaded by a cognitive architecture (CA) approach – which is officially described as ‘…hypothesis about the fixed structures that provide a mind… and how they work together – in conjunction with knowledge and skills embodied within the architecture – to yield intelligent behavior in a diversity of complex environments’.
Various CA implementations have been around for several decades but have not generally lived up to their promise. It is worth noting that until about 2012 ML and neural nets had also ‘been tried for several decades, and not worked’. They didn’t work until they did work (thanks to Big Data and computing power, and a few new insights).
In our opinion, the reason that CAs have generally underperformed is that (a) for many narrow AI applications ML/DL are hard to beat, and (b) CAs are often not implemented correctly. Here’s a schematic of a traditional CA:
The key problem with this design is the high degree of modularity – parsing is separate from memory and context, which in turn is isolated from inference, and so on. Our brains/minds don’t work this way – they are highly integrated, facilitating the synergistic support of each function by all the others.
That is why we designed our architecture from the ground up to be highly integrated and synergistic – for example, our deep semantic parser has full access to short- and long-term memory, context, goals, and inference. Moreover, all cognitive components in our system work against a common knowledge/ skills semantic graph.
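A rough sketch of this design point, with hypothetical class and attribute names (not Aigo’s actual code): every cognitive component holds a reference to one shared semantic graph, rather than passing messages between isolated modules.

```python
# Sketch: parser and inference both read and write ONE shared graph,
# so parsing can consult memory/context and inference sees fresh input.

class SemanticGraph:
    def __init__(self):
        self.triples = set()
    def add(self, s, p, o):
        self.triples.add((s, p, o))
    def query(self, s=None, p=None, o=None):
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

class Parser:
    def __init__(self, graph):
        self.graph = graph            # parser sees memory and context
    def parse(self, text):
        # A real parser would consult context here; this stub records input.
        self.graph.add("context", "last_utterance", text)

class Inference:
    def __init__(self, graph):
        self.graph = graph            # same graph instance, no copies
    def recall_last(self):
        hits = self.graph.query("context", "last_utterance")
        return hits[0][2] if hits else None

graph = SemanticGraph()
Parser(graph).parse("Spock is pregnant")
print(Inference(graph).recall_last())  # Spock is pregnant
```

Contrast this with the modular pipeline criticized above, where the parser would emit a parse tree and never see what memory or inference later did with it.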
Research and development at Aigo.ai, as well as the team’s prior commercial implementations, show that Aigo’s unique cognitive architecture addresses the key requirements of high-level intelligence:
The Aigo intelligence engine inherently implements the following essential functionality:
Further built-in skills include ad-hoc natural language learning via conversation and reading, question answering, reminders and alerts, lists and notes, messaging, calendar management, selected IoT control, and learning interaction preferences.
Knowledge and skills are implemented in three fully-integrated, compatible layers: core, app, and user.
Core knowledge and skills are common to all applications and are constantly expanded and improved. Application specifics include domain-specific ontologies (e.g. medical, legal, or crypto trading), skills, and interfaces to external systems. User specifics include personal (and potentially private) information taught, custom preferences and default behavior, as well as unique skills taught by the user.
This design allows users to customize and develop at both the app level and the individual level, while providing for ongoing, continuous improvement of common (sense) knowledge and skills.
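The three-layer lookup described above can be sketched as a chain of overrides: user-specific knowledge shadows app-level knowledge, which in turn shadows the shared core layer. The keys and values below are hypothetical illustrations.

```python
# Sketch of core / app / user layering: the most specific layer wins,
# so personal teaching overrides app defaults, which override core.

core = {"greeting": "Hello", "units": "metric"}
app  = {"domain": "medical"}                         # app-level additions
user = {"greeting": "Hi Sam", "units": "imperial"}   # personal overrides

def lookup(key):
    for layer in (user, app, core):   # most specific layer first
        if key in layer:
            return layer[key]
    return None

print(lookup("greeting"))  # Hi Sam   (user layer overrides core)
print(lookup("domain"))    # medical  (app layer)
print(lookup("units"))     # imperial (user layer overrides core)
```

Because the core layer is consulted last, it can be upgraded continuously without disturbing app- or user-level customizations.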