Coauthored with Ted Goertzel
I remember heading home from college for spring break in
1983, toward the end of my freshman year. I’d just
recently turned 16, and I’d been thinking about AI
a hell of a lot – even more than about my new girlfriend,
Rachel Gordon, whom I was pretty darn crazy about at the
time. A few days before spring break I’d tried to
explain my theories on artificial intelligence to my friend
Ken Silverman. Ken couldn't understand what I was talking
about, so I promised him I'd work on it over spring break,
and that when I got back to school I’d explain how
it all worked and give him a complete design for
a thinking computer program. I had the idea clear in my
head, but I was totally unable to articulate it in a way
that Ken or anyone else could understand. I spent the whole
break working on it, and during those few days I basically
worked out the ideas that I’d put into my first
book, The Structure of Intelligence, six years later. I went
through every aspect of the mind - reason, memory, aesthetics,
intuition, emotion, etc. - and convinced myself that every
one could be expressed in terms of pattern recognition and
pattern formation. The mind, I concluded, was a pattern
recognition system that recognized patterns in the world
around it, and – very crucially – also recognized
patterns in itself. Recognizing patterns in itself, it formed
patterns within itself, continually giving rise to new structures.
After the break, I still wasn’t able to explain my
realizations to Ken in a way that made sense to him, but
at least things made a little more sense to me. I knew I
had to find a mathematical language to make sense of my
intuitions, or I’d never be able to communicate them
to anyone, let alone program them on a computer. My grasp
of software design at this stage was extremely weak; it
was formed mainly by programming games in BASIC. I was nowhere
near having the skills to design a general pattern recognition
system that recognized patterns in itself and adjusted itself
accordingly.
Ken's dad was an extremely smart guy and a prolific and
successful inventor, mostly in the area of electrical engineering;
and in our college years, Ken often fantasized about becoming
a rich inventor and building a mansion, with a basement
laboratory in which we’d putter around day and night,
wiring together intelligent robots and time machines and
so forth. So it’s pretty funny that 14 years later
when I decided to start an AI company (Intelligenesis Corp.,
later renamed Webmind Inc.) I somehow happened to turn to
Ken when I needed someone to take over the job of programming
my AI system, the Webmind AI Engine.
I hadn't spoken to him for years – he’d stayed
in the New York area, where he had grown up, whereas I’d
moved all over the world, teaching in universities in Las
Vegas, New Zealand and Australia. After getting his degree
in electrical engineering, he’d done a lot of different
things, including real estate and computer programming.
He was really psyched to finally get the chance to collaborate
with me on my thinking machine project. Finally, after a
decade and a half, I had figured out how to express my plan
for artificial intelligence in a way Ken could understand!
Ken was the lead engineer at Webmind Inc. for its first
couple years, and VP of Technology for the entire lifetime
of the firm. Now I’m working with a different crew
of engineers, and Ken is working on his own advanced pattern
recognition software, but we’re still good friends,
and he definitely played an important role in the evolution
of my work.
To articulate my vision of the mind in a comprehensible
form was much much harder than I’d ever thought it
would be. It turned out that the vocabulary for expressing
what I wanted to say didn’t really exist in the field
of computer science. To find the language I needed to express
my ideas and to work out the details, I had to step a long
way back from the world of computers and get deeply into
the philosophy of mind. Although I was very young then,
and even more naïve than I am now, I realized intuitively
that it was necessary to get the philosophy right before
proceeding to the computational details. Now, I’m
jaded by a fair amount of practical experience – though
I don’t have a head full of gray hair yet –
and I see this far more clearly than I did then. In implementing
a general vision of how the mind works, it’s very
easy to be misled by the nature of contemporary computer
hardware and programming languages, and to wind up implementing
things that subtly deviate from the vision one started out
with. The way to avoid this is to have the conceptual, philosophical
vision very firmly fixed in one’s mind as one sets
about the detailed design work, which is huge and at times
confusing.
What I’m going to give you in this chapter is a fairly
sketchy, but hopefully evocative, overview of the process
of creating Webmind and then Novamente. The two AI systems
are very different on a technical level, but on the level
of a popular exposition like this one, the differences are
really pretty small. Novamente uses more sophisticated mathematics
and more efficient software structures to implement the
same basic concepts that Webmind did. To keep things from
getting confusing, I’ll write mostly about “Novamente”
here, except where I’m talking historically about
the creation of Webmind in particular; but most of what
I’ll say about Novamente also applies to Webmind.
The Novamente project is far from complete. Just like every
other AI researcher, I’m an abject failure so far
– I haven’t yet created a software program displaying
human-level general intelligence. Unlike most other AI researchers,
however, I and my colleagues honestly believe we are on
a path that will lead us to success at this ambitious goal.
I don’t expect to convince you one way or another
in these pages – my hope is merely that the story
of our quest may be an interesting one … and that
some of the lessons we’ve learned along the way may
be of general value.

I knew from the start that I didn’t want to build an
artificial idiot savant – an overspecialized, brittle
system as was typical in the AI field. I wanted to build
a mind.
But what is a mind, anyway?
During that spring break of my freshman year, which I spent trying
to figure out how to explain my vision of the mind to Ken,
I arrived at a basic working definition of the mind: a mind
is the set of patterns in an intelligent system.
Your mind is not your brain, nor is it some disembodied
soul somehow exchanging messages with your brain. Your mind
is the set of patterns in your brain – the structures
and processes in your brain, such that knowing these structures
and processes allows you to explain the brain more simply
than just listing the parts of the brain and their positions
and states over time.
Novamente’s mind is not the C++ code that my engineering
team and I type in – that’s just a code for
creating the mind, a little like DNA is the code for creating
a human. Novamente’s mind is the set of patterns in
the billions of 0’s and 1’s existing in RAM
while Novamente runs, cycling through the machine’s
processors and passing through the network cables. These
0’s and 1’s themselves are not Novamente’s
mind – it’s the patterns in these 0’s
and 1’s, the static and dynamic patterns, that are
mind. Mind is a set of patterns in a system that achieves
highly patterned goals in a highly patterned environment.
Everything is pattern, pattern, pattern!
Mind recognizes and creates patterns in the world and itself,
achieving complex goals, goals whose definition involves
a great deal of pattern.
Although these ideas were clear to me intuitively in 1983,
it wasn’t till 1990 or so that I was able to write
them down in a clear and comprehensible way. This is what
I did in the first few chapters of my first book, The Structure
of Intelligence. At that point I had gotten my PhD in mathematics
and was supposed to be doing mathematical research, but
just as I’d always been more interested in my own
reading and thinking than in my schoolwork, now I was spending
my time thinking about pattern and mind and the nature of
the universe, instead of proving math theorems like a good
assistant professor. The next step was to ask the question:
What are the principles by which a set of patterns, a mind,
can actually be intelligent? For sure, the precise structures
and dynamics are going to vary from one mind to the next,
but are there any general principles, applicable to every
kind of intelligent system, be it a human, a dolphin, a
computer program, an intelligent gas cloud on Jupiter? It’s
not totally obvious that there are such principles, but
my belief starting out was that such general principles
had to exist. What are the principles by which mind’s
core algorithm – pattern recognition and formation in itself
and the world – is self-regulated?
One general principle is what the 19th-century American
philosopher Charles Peirce called the “One Law of
Mind”: that things in the mind tend to spread attention
to other related things in the mind. This is a basic principle
for attention allocation, that we can see in the brain in
the diffusion of electricity. Novamente incorporates this
via activation spreading similar to that in a neural network.
This is what I call a “heterarchical” principle
– where a heterarchy just means a sprawling network
in which each element connects to a few other elements,
without a hierarchical structure. A random network in which
each node connects to a set of other nodes at random is
a heterarchy.
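To make the idea of heterarchical attention spreading a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not Novamente's actual code: the function names, the `decay` parameter and the even split of activation among neighbors are all assumptions made for this example.

```python
import random


def random_heterarchy(n_nodes, links_per_node, seed=0):
    """Build a random network: each node links to a few other nodes chosen at random."""
    rng = random.Random(seed)
    graph = {}
    for node in range(n_nodes):
        others = [m for m in range(n_nodes) if m != node]
        graph[node] = rng.sample(others, links_per_node)
    return graph


def spread_activation(graph, activation, decay=0.5):
    """One step of spreading: each node keeps part of its activation
    and passes the rest, split evenly, to the nodes it links to."""
    new_activation = {node: 0.0 for node in graph}
    for node, level in activation.items():
        neighbors = graph[node]
        if not neighbors:
            # Nothing to spread to, so the node keeps everything.
            new_activation[node] += level
            continue
        new_activation[node] += (1.0 - decay) * level
        share = decay * level / len(neighbors)
        for neighbor in neighbors:
            new_activation[neighbor] += share
    return new_activation


# Hypothetical example: attention starts on node 0 and diffuses outward.
graph = random_heterarchy(n_nodes=6, links_per_node=2)
activation = {node: 0.0 for node in graph}
activation[0] = 1.0
for _ in range(3):
    activation = spread_activation(graph, activation)
```

After a few steps, the attention that started on a single node has spread to the nodes related to it, while the total amount of attention in the network stays the same.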
Hierarchy is another important structure of the mind. We
see it in the human brain all over the place, most famously
in the visual system, where we have a hierarchy of progressively
more abstract processes, starting with recognition of lines
and edges, then shapes, then 3-D forms, and so forth. Hierarchy
in the mind has to do with increasing abstraction, and with
control that’s aligned with abstraction, so that processes
dealing with more abstract things control related processes
dealing with more concrete things.
A general principle that I’ve thought a lot about,
and that I wrote about in my second book, The Evolving
Mind, is what I call the “dual network”:
the interpenetration of hierarchy and heterarchy.
In the mind, hierarchy and heterarchy overlap each other,
and the dynamics of the mind is such that they have to work
well together or the mind will be all screwed up. The overlap
of hierarchy and heterarchy gives the mind a kind of “dynamic
library card catalog” structure, in which topics are
linked to other related topics heterarchically, and linked
to more general or specific topics hierarchically. The creation
of new subtopics or supertopics has to make sense heterarchically,
meaning that the things in each topic grouping should have
a lot of associative, heterarchical relations with each
other. In Novamente, this general “dual network”
principle is reflected in many ways, when one gets down
into the details of its various dynamical processes.
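One way to picture the requirement that a hierarchical grouping "make sense heterarchically" is to score a proposed topic by how densely its members are associated with one another. The sketch below, in Python, is only an illustration under assumptions of my own: the `cohesion` function, the example topics and the use of a simple pair-counting score are hypothetical and are not taken from Novamente.

```python
from itertools import combinations


def cohesion(members, associations):
    """Fraction of member pairs that are directly associated,
    i.e. joined by a heterarchical link."""
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return 1.0
    linked = sum(1 for pair in pairs if frozenset(pair) in associations)
    return linked / len(pairs)


# Heterarchy: undirected associative links between topics (hypothetical data).
associations = {
    frozenset(pair)
    for pair in [("dog", "cat"), ("dog", "wolf"), ("cat", "wolf"), ("dog", "bone")]
}

# Hierarchy: two candidate groupings under a more general topic.
good_grouping = {"dog", "cat", "wolf"}   # members are all associated with each other
poor_grouping = {"cat", "bone", "wolf"}  # only cat and wolf are associated

good_score = cohesion(good_grouping, associations)
poor_score = cohesion(poor_grouping, associations)
```

A grouping with a high score is one where the hierarchy and the heterarchy agree; a low score suggests a supertopic that does not reflect the system's associations.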
Another general principle is self: that minds contain parts
of themselves that mirror the whole. This gives a quasi-fractal
structure to the mind.
Another general principle, also discovered by Charles Peirce,
is that there are three kinds of reasoning: induction, abduction,
and deduction. These are all ways of manipulating hierarchical
relationships. Hierarchy is about logic, whereas heterarchy
is about the spread of attention and the formation of wholes.
Once heterarchy has led to the formation of new wholes,
corresponding to clusters of things that all relate to each
other, these new wholes can be dealt with hierarchically;
they can be reasoned about. I was very fortunate, a month
after Intelligenesis got our seed funding, to get a job
application from Pei Wang, who had worked out a neat computational
reasoning system (NARS) based on the three forms of reason
that I, following Peirce, had identified as essential to
the mind.
There are also two dynamics that I believe are generally
part of mind. These correspond to the basic philosophical
principles of Being and Becoming.
Becoming corresponds to evolution, considered most generally
as the survival of the fittest members of a population,
and the reproduction of the survivors to form new population
elements. Novamente contains explicitly evolutionary components
– variations on the computational technique called
“genetic programming.” It also contains other
components that aren’t traditionally viewed as evolutionary,
but really are. For instance, Novamente’s reasoning
module involves logical relations (we call them “links”)
that combine with each other to create new logical relations.
The facts “pigs are fat” and “fat creatures
are ugly” combine to create the new relation “pigs
are ugly.” And in the reasoning system, unimportant
relations are deleted to save memory. Thus, we have survival
of the fittest, where fitness means importance to the system,
and we have reproduction of the survivors, via the rules
of inference. Reasoning is seen to be a form of evolution,
in the general sense.
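The "reasoning as evolution" picture can be sketched in a few lines of Python. This is a toy illustration, not Novamente's reasoning module: the `Link` class, the rule of multiplying strengths, and the `capacity` cutoff are assumptions chosen for simplicity, and the real inference rules are certainly more subtle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    strength: float    # how strongly the relation is believed
    importance: float  # "fitness": how much the system cares about it


def deduce(links):
    """Reproduction: combine A->B and B->C into a new link A->C."""
    known = {(link.source, link.target) for link in links}
    children = []
    for ab in links:
        for bc in links:
            if ab.target == bc.source and ab.source != bc.target:
                pair = (ab.source, bc.target)
                if pair not in known:
                    known.add(pair)
                    children.append(Link(
                        ab.source, bc.target,
                        ab.strength * bc.strength,           # assumed combination rule
                        min(ab.importance, bc.importance),   # assumed importance rule
                    ))
    return children


def forget(links, capacity):
    """Survival of the fittest: keep only the most important links."""
    return sorted(links, key=lambda link: link.importance, reverse=True)[:capacity]


links = [
    Link("pig", "fat_creature", 0.9, 0.8),   # "pigs are fat"
    Link("fat_creature", "ugly", 0.7, 0.6),  # "fat creatures are ugly"
    Link("pig", "pink", 0.9, 0.1),           # an unimportant relation
]
children = deduce(links)                         # yields "pigs are ugly"
survivors = forget(links + children, capacity=3)  # drops the unimportant link
```

Here the two premises "reproduce" to create the new relation, and the least important relation is the one that fails to survive when memory is limited.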
Being corresponds to what system theorists call “autopoiesis”
– an obscure word that has a very useful meaning.
It means self-production. Every cell in the body is produced
by other cells in the body – so the body is a self-producing
system. The mind is also a self-producing system. This is
basically the theme of my third book, Chaotic Logic. If
you remove part of the mind, the other parts of the mind
that relate to it will be able to reproduce it, approximately
if not exactly. If you take out the logical relation “pigs
are ugly” for example, the system may be able to regenerate
it by inference from the other relations “pigs are
fat” and “fat creatures are ugly.” It
may come out with a different strength than it had before,
but it will still be reproduced, perhaps lossily. If you
take out all memory of the text “War and Peace”
from the mind, but retain a lot of related knowledge, this
related knowledge will cause the system to want to read
War and Peace, which eventually will likely lead the information
about the text to be regenerated. In this case, interaction
with the environment is part of the mind’s autopoietic
dynamics.
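The self-production idea can also be shown in miniature. In the Python sketch below, a relation is removed from a small store of knowledge and then re-derived from its neighbors, coming back with a different strength than before. The `regenerate` function and the rule of multiplying strengths are assumptions made for illustration only, not a description of how Novamente actually does it.

```python
def regenerate(relations):
    """Re-derive any missing A->C from existing A->B and B->C relations."""
    restored = dict(relations)
    for (a, b), strength_ab in relations.items():
        for (b2, c), strength_bc in relations.items():
            if b == b2 and a != c and (a, c) not in restored:
                # Assumed rule: the regenerated strength is the product.
                restored[(a, c)] = strength_ab * strength_bc
    return restored


# A tiny "mind": relations mapped to strengths (hypothetical values).
mind = {
    ("pig", "fat_creature"): 0.9,   # "pigs are fat"
    ("fat_creature", "ugly"): 0.7,  # "fat creatures are ugly"
    ("pig", "ugly"): 0.8,           # "pigs are ugly"
}

# Remove one relation, then let the remaining ones reproduce it.
damaged = {pair: s for pair, s in mind.items() if pair != ("pig", "ugly")}
healed = regenerate(damaged)
```

The relation "pigs are ugly" reappears in `healed`, but with the strength implied by the surviving relations rather than the strength it originally had; the reproduction is approximate, not exact.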
Evolution changes the system in accordance with its goals
and its environment; autopoiesis keeps the system the same
as it was before. The mind needs both of these forces; they
need to be properly balanced. The balance of these leads
to productive creativity, and this was the main theme of
my fourth book, From Complexity to Creativity.
I arrived at my list of general principles of the mind by
a kind of unholy combination of introspection, mathematical
analysis, and survey of biology, psychology and computer
science. I spent a long time trying to prove mathematically
that all these general structures and dynamics, and a few
others, were necessary and sufficient for mind – any
system having them would have a mind, and any system not
having them couldn’t have a mind. But eventually I
gave up; I decided that the mathematics of today is not
adequate for proving this kind of thing. I gathered my various
insights and intuitions and conclusions about how the mind
worked, and gave the list a name: the psynet model of mind.
Psynet = “mind-network”, a theory of the mind
as a network of interacting, intertransforming agents. I
realized that the conceptual picture of the mind that I’d
developed was of significant value in itself, apart from
any mathematical formalization I might give it. No one else
working in the AI field seemed to me to have a similarly
comprehensive and powerful conceptual analysis of the mind.
I still think inventing the needed mathematics to usefully
and completely formalize the psynet model is an interesting
challenge – but it’s not as interesting to me
right now as using my intuitions about the general structures
of intelligence to build thinking software.
The general structures and dynamics of the “psynet
model” can be manifested in many many different ways,
in different systems. The process of building Webmind, and
then Novamente, has been in this sense a top-down process.
I started out with an idea about what general principles
had to emerge from the system to make it intelligent, and
this placed a constraint on what the system had to be like.
It had to be built so as to make the right general structures
and dynamics emerge. Aside from that, I didn’t care
very much exactly what the system was like. I had, and still
have, an attitude of being willing to learn via experimentation
in this regard.

My first serious attempt to build a real AI system (earlier
chatbots and abortive experiments not counted) occurred
in 1994. I used a programming language called Gofer, which
I later benchmarked at 1/10,000 the speed of C (the standard
programming language in the commercial world). Gofer was
a beautiful language, which matched up nicely to my vision
of the mind. This program was called Antimagicians; it was
a population of actors called magicians, and antimagician
actors that annihilated the magicians in complex patterns.
Just about all it ever did was produce a type of error called
a “stack overflow.” This was a shame, because
my model of mind was very simple and compact in this programming
language. But it could only run on one machine, and it ran
incredibly slowly; the only thing it did fast was use up
all the machine’s memory.
Gofer was a “functional” programming language,
meaning not that it performed useful functions (far
from it!), but rather that it was based on the mathematical
concept of a “function.” Gofer was basically
equivalent to mathematics. It appealed to my sense of formal
elegance; it was perfect in the sense of a Bach fugue. Unfortunately,
though, functional languages do not match well to the von
Neumann computer architecture, so it is very hard to make
them efficient without special hardware. After the debacle
of my stack-overflowing proto-AI system, I abandoned Gofer
and turned back to C++, and then to the new programming
language Java. But I restricted myself to more modest programming
experiments. I made a C-language version of Antimagicians,
which was much simpler and less interesting than the Gofer
version. In Java, I made a genetic algorithm that ran on
multiple machines (coded together with Rosalind Barr at
University of Western Australia), and a simple actors-based
search engine (coded together with Mark Messenger, also
at UWA). I could see from this experience that, while my
AI system in Gofer had been small, because Gofer was made for
expressing systems that refer to and organize themselves,
a comparable system in C++ or Java or any other practical
programming language was going to be huge. It took a couple
years for me to summon the guts to attempt such a thing.
One thing that occurred to me as I started to think about
implementation issues, much more than it had in my days
as a pure theorist, was the crucial role of specialization.
My Gofer-based mind had been theoretically capable of intelligence;
it was a general system for recognizing and forming patterns
in itself and its environment. But its generality didn’t
allow it to solve any particularly useful problems within
practical time and space constraints. In that sense, it
had been a miserable failure as an intelligent system. In
practice, I concluded, to get reasonably efficient intelligence
one needs to code specialized cognition algorithms, aimed
at recognizing patterns in particular kinds of data, learning
how to carry out particular kinds of actions, and so forth.
The brain is very much like this: we have 30% of our brain
specialized for visual pattern recognition; regions specialized
for language; regions specialized for body sensations; regions
specialized for social interaction; etc. etc. And then we
have a little bit of general intelligence, which is what
makes us uniquely brilliant among the animal kingdom –
but this general intelligence relies on all the specialized
stuff to give it a meaningful context within which to operate.
Specialization needs to be mediated by rich interaction
between specialized parts. The different specialized parts
of a system need to learn from each other, and learn about
the world together whenever they can. The integration of
various specialized pattern recognition subsystems has played
a huge role in practical Webmind engineering.
Because of all this specialization, it seemed to me in 1994
and 1995 that there was no way to build a thinking computer
program on contemporary computer hardware. It seemed to
me that some kind of humongous brainlike supercomputer would
be necessary. And then I discovered the Internet (unlike
Al Gore, I didn’t invent it!). It struck me that the
millions, soon billions, of machines around the world, all
hooked together on the Net, had enough memory and processor
power to create a real computational intelligence. The Java
programming language came out in 1995 and it seemed the
right tool to use to create a networked AI engine embodying
the general principles of mind: recognizing and creating
patterns in itself and the world, using a variety of specialized
methods integrating together into a whole, an evolving autopoietic
whole.
Not only did the Internet give you the computational power
to build a thinking machine; it also provided a really rich
perceptual environment. A mind can’t exist in isolation;
it has to achieve complex goals in a complex environment.
The physical world is obviously complex but building a robot
body is another huge project, comparable in scope to building
a mind. The Internet is arguably rich enough in diverse
details to support intelligence, and it’s a lot easier
to hook your AI system into the Internet than to build it
a robot body. I made up my own complex goal: To build an
AI system whose body was part of the Net, and whose perceptual
world was the Net itself, the Web. A mind for the Web; a
Webmind.
In terms of the conception of intelligence as “achieving
complex goals in complex environments,” the goals
I had in mind when designing the Webmind system were roughly:
* Conversing with humans in simple English, with the goal
not of simulating human conversation, but of expressing
its insights and inferences to humans, and gathering information
and ideas from them.
* Learning the preferences of humans and AI systems, and providing
them with information in accordance with their preferences.
Clarifying their preferences by asking them questions about
them and responding to their answers.
* Communicating with other AI systems, in a manner similar
to its conversations with humans, but using a mixture of
human language and a more formalized and precise computerized
language we have created, called Sasha.
* Composing knowledge files containing its insights, inferences
and discoveries, expressed in Sasha or in simple English.
* Reporting on its own state, and modifying its parameters
based on its self-analysis to optimize its achievement of
its other goals.
Of course, my ambitions didn’t end there – that
would be wimpy. Subsequent versions of the system were intended
to offer enhanced conversational fluency, and enhanced abilities
at knowledge creation, including theorem proving, scientific
discovery and the composition of knowledge files consisting
of complex discourses. And then of course the holy grail:
progressive self-modification, leading to exponentially
accelerating artificial superintelligence!
I remember a particular moment when my diverse ideas about
AI crystallized in my mind, with amazing clarity. I could
see in my mind exactly how an AI system could be built.
Now all that was left was to work out a few pesky details.
At this point, it had been 13 years since I’d first
set myself the goal of building a thinking machine. I now
had a PhD in math, and had spent countless thousands of
hours studying cognitive science, physics, computer science,
neurobiology and philosophy of mind. I’d published four
books on the mind, which were idiosyncratic combinations
of mathematics, philosophy and science, all pushing in the
same direction, toward an understanding of the mind that
was both fundamental and precise. I felt I finally had the
answer. And it seemed that the hardware was finally getting
there too. We had cheap computers with gigabytes of RAM,
and we had high-bandwidth Ethernet and Internet, allowing
distributed computing among dozens or even millions of these
powerful, cheap machines, etc.
It all seemed incredibly clear to me. Mind was exquisitely
simple in essence. A mind was a web of patterns, a network
of independent mind actors, each one concerned with recognizing
patterns in other actors, and patterns emergent between
itself and other actors. New actors were created to embody
new patterns. The overall network of mind was continually
re-making itself via recognizing patterns in itself. The
character of a particular sort of mind was determined by
the assemblage of pattern recognition/creation actors inside
it. The art of mind design – an as yet nonexistent
art – would consist of choosing the right assemblage
of types of actors so that the emergent self-reconstructing
behavior of mind would get into a productive dynamical attractor.
From my 13 years of thinking about human and artificial
intelligence, I felt I had a good idea how to choose and
design the right mind actors, so that when these actors
were released to study and transform one another, the self-reconstructing,
self-recognizing dynamic characteristic of mind would emerge.
And so in the fall of 1996 I started creating the Webmind
AI Engine. As I’ve said, I’d been working on
similar things off and on for years; but the actual design
of the Webmind system as it is now was something I started
in the fall of 1996, when I was in Western Australia, working
at UWA as a Research Fellow. Soon enough this got more interesting
than anything else I was working on -- I realized that I
was on the verge of something really cool, and something
that I wasn’t going to be able to implement myself,
or with a couple research assistants. John Pritchard, my
New York e-mail pal, was convincing me that it was plausible
to get funding to start a company building software according
to my designs. The idea was appealing.
At the start of 1997 I quit my job at UWA and moved to the
US to work on Webmind design and coding full time. I didn’t
have any clear business plan in mind, but I figured that
once I got some clearly intelligent behavior working, the
venture capitalists would beat a path to my door. Naively
enough, I figured I’d reach that point after a few
months of hard work. I figured that after I got some basic
stuff working, I could raise a few hundred thousand dollars
to pay perhaps 5 programmers, and then we’d get the
whole thing implemented in 6 months’ time – presto!
a thinking machine. Fame and fortune, and truckloads of
beautiful girls, would be mine.
What I had at the end of summer 1997 was ten thousand lines
of Java, largely designed as I went along. This system was
never completed, and of the parts that were completed only
half of them worked. There were lots of details I didn't
understand. This first serious attempt at Webmind had too
much of my theory of mind in it, and not enough computational
practicality. It was beautiful as a mathematical and logical
statement, but still horrible as a computer program. I
was still too close to Gofer, and hadn’t come to grips
with what I’d have to do to make a useful, efficient
implementation of my model of mind.
But still, the ideas, data structures and dynamics underlying
this first Webmind were conceptually about the same as the
ones underlying Novamente today. The mathematics and the
software design have both changed tremendously, but the
underlying vision is the same. Novamente, like Webmind before
it, is based on the idea that the mind is a collection of
patterns that forms and recognizes patterns in itself and
the world, and in this way achieves complex goals in the
world. It makes this vision concrete by defining some simple
software objects corresponding to patterns and goals.

In the mid-90’s, starting out on Webmind design, I had
basically a comprehensive knowledge of what was happening
in the AI world. It was a mess. It’s basically the
same way today. There’s no well-understood, commonly
accepted body of scientific knowledge about AI. Instead,
there’s a vast diversity of approaches to various
aspects of the relationship between computation and mind.
Some of the approaches contradict each other and some of
them complement each other. Designing Webmind was a process
of assembling information from various different perspectives
and disciplinary areas into a coherent whole, guided by
a set of governing principles.
Many different subdisciplines within the AI umbrella contributed
to the structuring of Webmind, and then Novamente. Some
of them I was thinking about when I first started designing
Webmind, others emerged as being significant more recently,
further along in the design process, in some cases only
in the transition from Webmind to Novamente. Table 1 gives
an overview of the sorts of things that Novamente draws
from various disciplines. It may be a bit opaque to the
nontechnical reader, but it will mean something to the reader
with some computer science background, and perhaps to others
it will be at least generally evocative.
|
Cognitive
Psychology |
From
cog psych we have taken a number of high-level
structural principles, for instance the notions
of Long-Term Memory, Episodic Memory (memory
of your own history), and Short-Term Memory;
and the distinction between procedural (knowledge
of how to do thing) and declarative knowledge
(factual knowledge).
|
| Introspective
Psychology |
Modern
cognitive psychology is experimentally focused,
but past traditions in psychology have more
openly drawn their inspirations from introspection,
from what each mind intuitive knows about itself.
The overall structure of Novamente owes something
to ideas drawn from these traditions, from Gestaltism
to Buddhist psychology and Peircean philosophy.
|
| Neuroscience |
From
neuroscience we have taken the observation that
mind can be implemented by a parallel distributed
system with activation spreading around it
in complex patterns – i.e. a ‘neural net’, broadly
conceived. We’ve also taken our approach to
localization from what’s known about the brain:
in Novamente, knowledge is distributed, but
not across the whole system; each type of knowledge
is distributed across a part of the system,
just as is done in the brain.
|
| Complexity
Science |
The
emerging science of complex systems has contributed
crucial concepts such as self-organization,
evolution, autopoiesis and emergence. Novamente
is a modular system in which the real intelligence
emerges from interaction between the modules.
Like many complex systems, it displays behaviors
like phase transitions and sensitivity to initial
conditions, and evolution-ecology interactions.
|
| Nonlinear
Dynamics |
One
of the more rigorous subsets of complexity science,
nonlinear dynamics studies the attractors and
transient patterns that emerge as nonlinear
systems evolve over time. Novamente is a highly
nonlinear dynamical system whose attention is
allocated by complex attractor dynamics, and
that specifically studies transients in its
own dynamics so as to self-adaptively modify
its own structure.
|
| Statistical
Pattern Recognition. |
In
its analysis of numerical data (e.g. financial
forecasting) and its lower-level linguistic
processing, Novamente makes use of statistical
pattern recognition tools. What makes it unique
is its ability to integrate statistically recognized
patterns with other types of knowledge, and
to generalize from this knowledge via inference
and other mechanisms.
|
| Multi-Agent
Systems |
With
the advent of distributed and parallel computing,
there is a substantial body of knowledge about
how to make populations of computational agents
cooperate to carry out useful activities. Novamente
is a multi-agent system, albeit a very unusual
one, and its system architecture makes use of
principles from this area of computer science
in many ways.
|
| Computational
Linguistics |
The
last decade’s explosion of knowledge in computational
language processing has produced many techniques
of use within Novamente. The challenge has
been to get all these tools working together
in a common framework focused on extracting,
creating and producing meaning rather than on
syntax analysis.
|
| Expert
Systems |
Novamente
allows humans to enter expert knowledge into
it via XML, Sasha or other special formal languages,
similar to standard AI expert systems. Unlike
expert systems, though, it doesn’t take this
knowledge as truth: it takes it as information
given to it by another mind, and feels free
to forget it or modify it as it sees fit.
|
| Machine
Learning and Optimization |
Machine
learning and optimization algorithms are not
real AI systems but they do solve problems that
are crucial to the mind. Novamente uses genetic
algorithms, genetic programming, and statistical
machine learning techniques for various purposes,
internally.
|
| Logic |
While
Novamente is not a logic system in the traditional
sense, it makes use of the reduction of general
relationships to a simple relational formalism,
which was pioneered by mathematical logicians
and logic-inspired AI engineers. It manipulates
relationships using uncertainty-robust, self-organizing
reasoning techniques different from those used
in the logic or AI literature.
|
|
Table
1 - Novamente’s Diverse Inspirations
|
Obviously,
this laundry list of component technologies doesn’t
really tell you a damn thing about Novamente. That’s
because the crux of Novamente lies, not in the component
technologies, but in the way these technologies are structured
to form a coherent self-organizing system. But still, the
presence of all these tools made the process of building
Novamente very different than it would have been if none
of the tools existed, and you had to build every component
technology from scratch. Rather than just “how do
you program a mind on current hardware and software?”,
the question becomes more like “Given all these wonderful
tools, and amazingly powerful distributed hardware on which
to implement them, how can we tie them all together in a
harmonious and mutually adaptive way to produce a mind?”

Given
the general conceptual framework I’ve described, and
the practical and conceptual toolbox I’ve listed,
the first step toward actually designing Webmind was deciding
what the “atomic mental object” should be.
Bigger than a neuron, smaller than a machine, was the first
decision. I created a Java object called a Node. A node
is the most basic kind of pattern known to Webmind –
it’s something Webmind recognizes as a whole. A node
says, “This thing is worth distinguishing from its
environment as a whole entity. Here it is. It persists and
maintains its boundaries over time.” We have some
nodes referring to external sensed objects: TextNodes, DataNodes,
WordNodes, and so forth. We have some nodes representing
patterns recognized in the system itself rather than in
the outside world: CategoryNodes of various kinds, AutomatonNodes
representing evolved patterns, etc. There are nodes called
SubgraphImageNodes that represent parts of the mind, grouped
with a boundary drawn around them so as to be considered
as a kind of higher-order individual. And so on, and so
on, and so on.
But nodes are just the start. Webmind is also wired to recognize
certain kinds of patterns involving nodes. Similarity is
the most basic kind of pattern: it’s the recognition
that two different things, occurring at different points
in space or time, are actually a lot like each other, and
can be interchanged for many purposes. Inheritance is also
basic: it’s the recognition that you can substitute
A for B (though maybe not B for A) without substantial
loss of information.
How many link types to incorporate was a big question. In
the AI systems known as semantic networks, you have a different
type of link for every relation in the net – a link
type for kick, a link type for eat, and so forth. On the
other hand, in a typical neural net model you have only
one link type; whereas in the brain, there are many types
of neurons and synapses – hundreds of link types,
if you identify a link type with a synapse that’s
reactive to a certain neurotransmitter.
In designing Webmind, we didn’t want to introduce
too many types of links, because this just leads to a network
that represents data in ways it doesn’t understand.
We chose to use a few dozen link types, representing what
I think of as archetypal types of relationships.
What kinds of relationships are “archetypal”
for Novamente? Here I’ll just give a few important
examples. We have similarity links, representing the belief
that one actor is similar to another. There are inheritance
links, representing the belief that one actor is a special
case of another. There are spatiotemporal links, representing
the belief that one actor represents something occurring
near the other one in time or space. There are containment
links, representing the belief that the entity represented
by one actor is contained inside another one. There are
associative links, representing simply the fact that Webmind's
dynamics tend to associate one actor with another. This
chart shows the definitions of these links in a bit more
systematic way:
| Link Type Pointing from A to B | Meaning of the Link |
| Similarity | A is similar to B |
| Inheritance: | |
| by Extension | A is a special case of B |
| by Intension | B is a special case of A |
| SpatioTemporal | A occurs at the same time and place as B |
| Temporal | A occurs at the same time as B |
| Before | A occurs before B |
| After | A occurs after B |
| Containment: | |
| Part of | A is a part of B |
| Contains | B is a part of A |
| Associative | B is associated with A |
| HaloLink | B is associated with A by Webmind's Dynamics |
These
link types, and others refining and extending these, are
the elemental types of relationships that Webmind “understood.”
They are a bit, but not a lot, like the various neurotransmitter
receptors in the brain, which make different synapses different.
The brain's receptors do not correspond so neatly to logical
relations. But Webmind is not a brain; it is a mind that
emerges out of digital computer hardware. Digital computer
hardware is closer to logic than cells are.
These links are heterarchical in a sense; any node can link
to any other node. But they are also organized in hierarchies
of composite actors representing, not specific relationships
like links, but collections of relationships. Nodes contain
links; nodegroups contain nodes; lobes contain nodegroups;
and the mother of them all, the Psynet, the whole Webmind,
contains a lobe for each machine in its network. The
basis of it all is the node: a node containing a bundle
of links expressing its relationship to other nodes, and
also some basic data objects and actors and roles. Nodes
sending out messages -- information gathering and information
carrying actors -- of various types to help them build new
links to other nodes. A gigantic network of interlinked
actors, constantly rebuilding itself, extending across multiple
CPUs and multiple machines.
The nitty-gritty engineering needed to make this all work
is considerable indeed. But the basic concepts are elementary.
It's nothing but Peirce's network of relations, each spreading
attention to the other relations that it stands to in a
peculiar relation of affectability. It's nothing but Nietzsche's
dynamic quanta, each one defined in terms of other dynamic
quanta, each one re-creating itself and each other. It's
beautiful and primal -- but it's not intelligent, without
more detail, more specialization. It’s like the brain
of an infant. All the core abilities are there, but intelligence
develops as it incorporates and processes specialized information.
It’s easy to see how both nodes and links are patterns
in the sense that they allow one to compress information.
If two parts of something one is describing are similar,
one can save effort by not describing the second one in
detail and just describing it approximately by reference
to the first one. For instance, to describe a picture consisting
of two similar heads, you can draw one head and then just
say “imagine two of these next to each other.”
If one of the parts of the picture inherits from the other,
one can save effort by replacing the more specific one with
the more general one. Of course, there is a loss of information
here. Suppose half of the picture is a general human shape,
and the other half is my shape. My shape inherits from the
general human shape, obviously. But if you describe the
picture by drawing the general human shape and saying “two
of these,” you’re losing a fair bit of information,
though certainly not all of it.
Similarity and inheritance are logical relations, logical
patterns. We also have purely observational patterns, like
temporal relatedness, spatial relatedness, and part-whole
relatedness. And we look for general association relations:
When the system thinks of X, what Y comes to mind? This
Y stands in an associative relation to X.
Nodes in Webmind contain links to other nodes, each link
embodying one of these basic inter-node relationships: similarity,
inheritance, part/whole, spatial, temporal, associative.
Nodes and links are the two levels of pattern that are automatically
and instinctively recognized by Webmind: nodes representing
perceived wholes carved out of the chaos of the world or
mind, and links representing patterns perceived among the
nodes.
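To make the node-and-link picture concrete, here is a minimal Java sketch of a node that carries a bundle of typed links. It is only an illustration under stated assumptions: the names `SketchNode`, `SketchLink` and `RelationType` are hypothetical, the single `strength` number is a simplification, and none of this is taken from the actual Webmind code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: an illustrative sketch, not Webmind source code.
enum RelationType { SIMILARITY, INHERITANCE, PART_OF, SPATIAL, TEMPORAL, ASSOCIATIVE }

// A link is a typed, weighted relationship pointing at a target node.
final class SketchLink {
    final RelationType type;
    final SketchNode target;
    final double strength; // assumed scale: 0.0 (none) to 1.0 (very strong)

    SketchLink(RelationType type, SketchNode target, double strength) {
        this.type = type;
        this.target = target;
        this.strength = strength;
    }
}

// A node is a "perceived whole" that owns the links leading out of it.
class SketchNode {
    final String name;
    private final List<SketchLink> links = new ArrayList<>();

    SketchNode(String name) { this.name = name; }

    void addLink(RelationType type, SketchNode target, double strength) {
        links.add(new SketchLink(type, target, strength));
    }

    // All outgoing links of one relationship type.
    List<SketchLink> linksOfType(RelationType type) {
        List<SketchLink> result = new ArrayList<>();
        for (SketchLink link : links) {
            if (link.type == type) result.add(link);
        }
        return result;
    }
}

class NodeDemo {
    public static void main(String[] args) {
        SketchNode cat = new SketchNode("cat");
        SketchNode animal = new SketchNode("animal");
        SketchNode dog = new SketchNode("dog");
        cat.addLink(RelationType.INHERITANCE, animal, 0.9);
        cat.addLink(RelationType.SIMILARITY, dog, 0.6);
        for (SketchLink link : cat.linksOfType(RelationType.INHERITANCE)) {
            System.out.println(cat.name + " inherits from " + link.target.name);
        }
    }
}
```

Storing links inside the node that owns them mirrors the description of a node as a bundle of links; a real system would presumably also need reverse lookups and many more node subclasses.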
We then have special methods of building links. The method
we used most in Webmind (but have basically abandoned in
Novamente) was one I came up with in 1996, inspired by Web
spidering, called Wandering: we have actors that move around
through the network of nodes, traveling from node to node
along links, looking for nodes that are strongly related
and should be joined by new links. This particular method
of link formation may or may not be the best. The key point
is that there is some dynamic by which new and relevant
links are continually formed.
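The Wandering idea can be sketched roughly as follows. This is a hedged illustration, not the original algorithm: the class names, the rule that path strength is the product of the link strengths along the walk, and the fixed threshold are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: names, the path-strength rule and the threshold are assumptions.
class WanderNode {
    final String name;
    final Map<WanderNode, Double> links = new HashMap<>(); // neighbor -> link strength
    WanderNode(String name) { this.name = name; }
}

class Wanderer {
    private final Random random;
    private final double threshold;

    Wanderer(long seed, double threshold) {
        this.random = new Random(seed);
        this.threshold = threshold;
    }

    // Walk up to maxSteps links away from home, choosing a random neighbor each step.
    // If the end point is not yet linked to home and the path was strong enough,
    // add a direct link and return the end point; otherwise return null.
    WanderNode wander(WanderNode home, int maxSteps) {
        WanderNode current = home;
        double pathStrength = 1.0;
        for (int step = 0; step < maxSteps; step++) {
            if (current.links.isEmpty()) break; // dead end: stop walking
            List<WanderNode> neighbors = new ArrayList<>(current.links.keySet());
            WanderNode next = neighbors.get(random.nextInt(neighbors.size()));
            pathStrength *= current.links.get(next);
            current = next;
        }
        boolean alreadyLinked = current == home || home.links.containsKey(current);
        if (!alreadyLinked && pathStrength >= threshold) {
            home.links.put(current, pathStrength);
            return current;
        }
        return null;
    }
}

class WanderDemo {
    public static void main(String[] args) {
        WanderNode a = new WanderNode("a");
        WanderNode b = new WanderNode("b");
        WanderNode c = new WanderNode("c");
        a.links.put(b, 0.9);
        b.links.put(c, 0.8);
        WanderNode found = new Wanderer(42L, 0.5).wander(a, 2);
        System.out.println(found == null ? "no new link" : "new link a -> " + found.name);
    }
}
```

Many such wanderers running continually would keep proposing new links; how strongly related two nodes must be before they are joined is exactly the kind of parameter the text says may or may not be best.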
Relevance is determined by how much “activation”
each node has, and activation is spread through the network
by Peirce’s Law of Mind, which is the same as to say,
by basic neural net activation spreading. The Java object
that carries activation through Webmind, we call a Stimulus.
Associative links are built by a process we call “halo
spreading,” in which a node gets active and then measures
how active other nodes become as a consequence, after a
certain period of time. It spreads Stimuli to other nodes
and then collects them after a while, observing how stimulated
they’d become.
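Halo spreading, as described above, can be illustrated with a small Java sketch. The spreading rule used here (each node sends its activation times the link weight to its neighbors, with activation capped at 1.0) is an assumption chosen for simplicity, and `HaloNode` and `HaloSpreader` are hypothetical names rather than Webmind classes.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the spreading rule and all names are assumptions.
class HaloNode {
    final String name;
    final Map<HaloNode, Double> out = new LinkedHashMap<>(); // neighbor -> link weight
    double activation;
    HaloNode(String name) { this.name = name; }
}

class HaloSpreader {
    // One synchronous round: every node sends activation * weight along each link.
    static void spreadOnce(List<HaloNode> network) {
        Map<HaloNode, Double> incoming = new HashMap<>();
        for (HaloNode node : network) {
            for (Map.Entry<HaloNode, Double> e : node.out.entrySet()) {
                incoming.merge(e.getKey(), node.activation * e.getValue(), Double::sum);
            }
        }
        for (HaloNode node : network) {
            // Cap at 1.0 so the result can be read as an associative link strength.
            node.activation = Math.min(1.0, node.activation + incoming.getOrDefault(node, 0.0));
        }
    }

    // Activate the source, let activation spread for `rounds` rounds,
    // then record how active every other node has become.
    static Map<HaloNode, Double> halo(HaloNode source, List<HaloNode> network, int rounds) {
        for (HaloNode node : network) node.activation = 0.0;
        source.activation = 1.0;
        for (int i = 0; i < rounds; i++) spreadOnce(network);
        Map<HaloNode, Double> halo = new LinkedHashMap<>();
        for (HaloNode node : network) {
            if (node != source && node.activation > 0.0) halo.put(node, node.activation);
        }
        return halo;
    }
}

class HaloDemo {
    public static void main(String[] args) {
        HaloNode a = new HaloNode("a");
        HaloNode b = new HaloNode("b");
        HaloNode c = new HaloNode("c");
        a.out.put(b, 0.5);
        b.out.put(c, 0.5);
        Map<HaloNode, Double> halo = HaloSpreader.halo(a, List.of(a, b, c), 2);
        halo.forEach((node, level) -> System.out.println(node.name + " -> " + level));
    }
}
```

Each entry in the returned map is a candidate associative link from the source node, with the observed activation as its strength. The number of rounds plays the role of the "certain period of time" after which the stimuli are collected.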
Again, there are a lot of ways of doing these things, and
the current ways may or may not be the best. The exact method
of spreading activation or halos is not crucial to Webmind,
but rather just the overall character of the patterns being
recognized and formed.
Halo spreading and reasoning and wandering form new links,
but it’s also crucial to form new nodes, and this
is done by combining old nodes in various ways (fusing them,
splitting them) and also by explicitly evolving new nodes
to satisfy various goals using special nodes called EvolverNodes.
The achieving of goals, crucial to intelligence, is done
using nodes that we now call SchemaNodes, which contain
little programs that control aspects of perception, action
and thought. Perceptions from the outside world come into
Webmind and are translated into nodes right away. These
nodes link to other nodes representing contexts that the
system is operating in, and these contexts link to SchemaNodes,
representing things that might be desirable to do. The goals
as well as the contexts link to the schema, so that the
hottest schema will be the ones that are relevant to the
current goals in the current contexts. Schema look into
the long-term memory of the system and grab out the various
nodes and links contained therein.
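One way to picture how goals and contexts jointly pick the "hottest" schema is the following Java sketch. It is a simplification under loud assumptions: goals and contexts are reduced to named activation levels, a schema's heat is taken to be goal support multiplied by context support, and `SchemaSketch` and `SchemaSelector` are hypothetical names, not the SchemaNode implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: the heat formula and all names are assumptions.
class SchemaSketch {
    final String name;
    final Supplier<String> program; // the "little program" this schema wraps
    final Map<String, Double> goalLinks = new HashMap<>();    // goal name -> link weight
    final Map<String, Double> contextLinks = new HashMap<>(); // context name -> link weight

    SchemaSketch(String name, Supplier<String> program) {
        this.name = name;
        this.program = program;
    }

    // Heat is nonzero only when an active goal AND an active context both link here.
    double heat(Map<String, Double> activeGoals, Map<String, Double> activeContexts) {
        return support(goalLinks, activeGoals) * support(contextLinks, activeContexts);
    }

    // Weighted sum of the activations arriving over the given links.
    private static double support(Map<String, Double> links, Map<String, Double> active) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : links.entrySet()) {
            total += e.getValue() * active.getOrDefault(e.getKey(), 0.0);
        }
        return total;
    }
}

class SchemaSelector {
    // Returns the schema with the highest positive heat, or null if none is relevant.
    static SchemaSketch hottest(List<SchemaSketch> schemas,
                                Map<String, Double> goals,
                                Map<String, Double> contexts) {
        SchemaSketch best = null;
        double bestHeat = 0.0;
        for (SchemaSketch schema : schemas) {
            double heat = schema.heat(goals, contexts);
            if (heat > bestHeat) {
                best = schema;
                bestHeat = heat;
            }
        }
        return best;
    }
}

class SchemaDemo {
    public static void main(String[] args) {
        SchemaSketch answer = new SchemaSketch("answer", () -> "answering a question");
        answer.goalLinks.put("answerQuestions", 0.9);
        answer.contextLinks.put("chat", 0.8);
        Map<String, Double> goals = new HashMap<>();
        goals.put("answerQuestions", 1.0);
        Map<String, Double> contexts = new HashMap<>();
        contexts.put("chat", 1.0);
        SchemaSketch chosen = SchemaSelector.hottest(List.of(answer), goals, contexts);
        System.out.println(chosen == null ? "nothing to do" : chosen.program.get());
    }
}
```

Multiplying the two kinds of support is just one way to express "relevant to the current goals in the current contexts"; an additive rule or plain activation spreading would be equally defensible readings of the text.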
There’s also a SelfNode, recording the history of
the system – what psychologists call “episodic
memory” – and predicting the future of the system,
and selecting the system’s goals according to the
metagoal of maximizing system happiness. Yes, we have a
Happiness FeelingNode, and nodes for other basic emotions,
complex emotions being considered combinations and mutations
of simple ones. What makes the system happy – we get
to decide at first, until it mutates and modifies its own
HappinessNode just like we do. Right now, it likes to answer
questions people ask it, it likes to save memory, and it
likes to build a lot of high-strength links – i.e.,
to discover a lot. Schema look into the SelfNode to get
their overall motivation.
Many goals involve making others happy, and for this, models
of other minds need to be maintained; this is done in UserNodes.
There is a loose mapping between these data structures and
things in the brain. Nodes are a bit like neuronal groups
– clusters of 10,000 to 100,000 neurons, that sort
of act as a unified whole. Links are sort of like bunches
of neural connections between one cluster and another. This
intuitive mapping onto the brain can be useful, and it’s
surely not a complete fluke that the structure of the brain
is a lot like the structure of the mind that emerges from
the brain. On the other hand, it’s important not to
overblow the very loose neural modeling aspect of Webmind.
Webmind was supposed to be a mind, not a model of the human
brain, and it’s a definite failure at being a model
of the human brain, not surprisingly.
There’s a lot of complexity here, just like in the
brain. But basically, Webmind's architecture was that of
a massively parallel network, a population of many, many
different information actors – nodes, links, wanderers,
Stimuli spreading activation and collecting halos. The nodes
continually recompute their relationships to other nodes.
Queries put to the system are transformed into nodes that
take advantage of Webmind's self-evolving structure to produce
the needed answers.

All
this – plus or minus a few critical details, and a
lot of non-critical ones -- was outlined roughly and erratically
in some documents I wrote during Spring and Summer 1997.
Some things were designed in detail, others just hinted
at. Because so many details were left out, it wasn’t
quite clear to me, at that point, what a humongous system
this was going to become.
This was still pre-Webmind Inc.; I was working in loose
collaboration with a friend and programmer named John Pritchard,
who liked my thinking in a general way, but never really
came to grips with my ideas, except on a philosophical level.
He wanted to approach things by first building a general
Java infrastructure for dealing with AI, and then implementing
my particular AI theories – an approach which makes
sense, but only if the infrastructure is deeply informed
by the AI theories, which wasn’t the case then.
During summer 1997, John and I parted ways, and my friend
Lisa Pazer and I started the company that was initially
called Intelligenesis Corp., and later changed its name
to Webmind Inc. (because American businesspeople seemed
to have too much trouble spelling the original name!). At
that point I gave up coding 10 hours a day, turning that
responsibility over to my newly recruited old friend Ken
Silverman, and spending most of my time on design issues.
I was still coding a few hours a day at that point, but
not like before.
Ken learned Java in a couple weeks, and set to work. We
talked on the phone several hours a day, and he coded for
the rest of his waking hours. He ended up creating a new
Webmind from scratch, based on reading and reinterpreting
print-outs of my eccentric, tangled Java code. My first
version had been useless, but had followed the concepts
of my theory of mind fairly directly. Ken's version followed
the structure of Java more so than my theoretical ideas.
It was a colossal step backward in conceptual elegance.
But it had one fantastic redeeming feature: as of February
1998, it finally worked!
OK, in retrospect, it didn’t really work, but it looked
like it worked at the time. It wasn’t made to exploit
multiprocessor machines, or networks of machines. It wasn't
ready to serve as the infrastructure for the global brain.
It was too small to demonstrate any really interesting emergences,
any of the structures of mind I’d identified in my
theoretical work. But it was our first working prototype,
and we rigged it up to do some simple things like read in
a bunch of Web pages or numerical data series, and decide
which ones were similar to each other. No tremendous intelligence
was apparent yet, but we hadn't expected any. We'd built
the infrastructure for intelligence, but hadn't put in the
specialization that would allow the system to display useful
intelligence in particular areas.
It was very simple in concept, but very complex to actually
implement. We had a network of mental entities, each one
related to other mental entities, and each one constantly
revising its collection of relationships. Each node, and
each actor, was an "object" in the Java programming
language, which proved very well suited to our needs. Writing
Webmind meant writing Java "classes" for all the
different kinds of nodes, wanderers and other objects we
needed. Practical problems kept coming up, problems I had
never thought of when I was writing theoretical books and
scribbling notes on the back of photocopied research papers.
For example, what do you do when the system has recognized
too many relationships in itself and has run out of memory?
How do you decide which relationships to cull? How does
the system manage its time, allocating certain amounts of
CPU time to each node to use in building new relationships?
How does the system determine how much time to spend loading
in new information into new nodes, versus building new relationships
among existing nodes? And so on, and so on, and so on.
We also wanted to build up Webmind's thinking power. This
meant we had to keep increasing our palette of specialized
classes of nodes and links representing particular kinds
of relationships and concepts. The real intelligence, I
was certain, would then emerge from the interactions of
all these specialized nodes and links in the self-organizing
network. But before we could get there, there were dozens
of mechanical issues to be worked out, debugged, tested,
tuned.
In the very early days of Intelligenesis, before we got
funding, the work proceeded in pairs, each pair consisting
of me and someone else. Lisa and I worked on the business
plan and tried to raise money. Ken and I worked on the first
Webmind prototype, which ran on a single computer with a
single processor; Ken doing nearly all the coding, me giving
him designs and suggestions through endless phone calls
and meetings. Jeff and I were taking his nonlinear prediction
algorithms and making them more intelligent and flexible,
integrating them with some of my own AI work. Onar and I
were sending back and forth endless e-mails diagramming
what would later become the language learning component
of Webmind’s natural language system. And Paul, in
looser communication with me than the others, was designing
and coding the Pods system, a very nice system for doing
self-organizing computing on multiple machines and multiprocessor
machines.
In the spring of 1998, Ken integrated Webmind with the Pods
system, producing the first Webmind that had a prayer of
actually running on a lot of machines at once. This was
a system which could serve as the foundation for a global
mind. It exploited the power of Java even more fully than
Ken's first version had -- it was more "object-oriented,"
and used Java's network-computing facilities more thoroughly.
And then things went completely crazy. In a mostly good
way. Lisa finally got us funding, and we started hiring
programmers and scientists. People were coding nodes and
links embodying specialized kinds of intelligence. The system
got smarter, and things got far messier.
The most crucial hire was Pei Wang, a Chinese computer scientist
a few years older than Ken and me, who when we hired him
had spent the last 12 years developing a system of probabilistic
logic called NARS, the Non-Axiomatic Reasoning System. Within
a few months, Pei had integrated many of the ideas of his
NARS reasoning system into Webmind, providing us with a
handy nodes-and-links version of probabilistic logic. He
also introduced a lot of ideas into Webmind as a whole,
apart from its reasoning component. For instance, it was
Pei’s inspiration that every link in Webmind should
have four numbers associated with it: a strength telling
how significant the pattern represented by the link is;
a confidence telling how sure we are of the assessed significance;
an importance telling how useful the node is to the system
as a whole; and a decay rate telling you how fast importance
decays for that particular node.
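The four numbers can be pictured as fields on a link object, as in the Java sketch below. The field names follow the text, but everything else is an assumption made for illustration: the geometric decay rule, the `reinforce` method, and the idea of culling the least important links when memory runs short are not documented Webmind behavior, and the class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: field names follow the text; the update rules are assumptions.
class WeightedLink {
    final String label;
    double strength;        // how significant the pattern is
    double confidence;      // how sure we are of that assessment
    double importance;      // how useful the item currently is to the system
    final double decayRate; // assumed: fraction of importance lost per tick

    WeightedLink(String label, double strength, double confidence,
                 double importance, double decayRate) {
        this.label = label;
        this.strength = strength;
        this.confidence = confidence;
        this.importance = importance;
        this.decayRate = decayRate;
    }

    // Importance fades geometrically unless something renews it.
    void tick() { importance *= (1.0 - decayRate); }

    // Using a link renews its importance, capped at 1.0.
    void reinforce(double amount) { importance = Math.min(1.0, importance + amount); }
}

class LinkStore {
    final List<WeightedLink> links = new ArrayList<>();

    void tickAll() {
        for (WeightedLink link : links) link.tick();
    }

    // When memory is tight, keep only the `capacity` most important links.
    void cull(int capacity) {
        links.sort(Comparator.comparingDouble((WeightedLink l) -> l.importance).reversed());
        while (links.size() > capacity) links.remove(links.size() - 1);
    }
}

class LinkStoreDemo {
    public static void main(String[] args) {
        LinkStore store = new LinkStore();
        store.links.add(new WeightedLink("cat~dog", 0.6, 0.7, 0.8, 0.5));
        store.links.add(new WeightedLink("cat->animal", 0.9, 0.9, 0.5, 0.0));
        store.tickAll();
        store.cull(1);
        System.out.println("kept: " + store.links.get(0).label);
    }
}
```

Keeping importance separate from strength means a pattern can be strongly believed yet rarely useful, which gives the system a principled basis for deciding what to forget.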
Toward the end of summer 1998, we also hired Cassio Pennachin,
who at that point was just one among a handful of Java hackers
around the world whom I’d recruited through job ads
on Usenet. Cassio lived in Belo Horizonte, Brasil, and first
took on the job of fixing up some code I’d written
for evolving new structures in the mind using a variant
of genetic programming. This was the beginning of what’s
become an Intelligenesis tradition: Brasilian programmers
receive American code by e-mail and respond very politely
with comments like “Excuse me, but would you be terribly
offended if I made a few changes to this code?” Of
course, you say yes, and a few days later you receive a
completely new version of the software, containing exactly
three lines from your original code, but much better designed
and also more efficient.
Cassio proved to be an excellent manager as well as an excellent
software engineer, and I let him accumulate assistants until,
as of now, we have more than half our engineering staff
in an office in Belo Horizonte, with Cassio as our overall
Director of Webmind Development. The Brasilians, so far,
have not made any big AI innovations, but the disciplined
approach to object-oriented design that they’ve brought
us has been just as important as our AI innovations, in
terms of getting Webmind, this humongous piece of Java code,
to actually work. The real importance of this aspect of
their work didn’t become apparent until the end of
1999, with their psycore redesign – but I’m
getting ahead of myself.
The rapidly increasing size of the Webmind codebase was
inevitable because the core code Ken and Paul and I had
written wasn't enough for intelligence in any practical
context. It was just a generic intelligence mechanism, a
self-organizing, relationship-building network. As we introduced
more and more specialized nodes into the system, the system
as a whole changed. New problems emerged. We should have
anticipated that this would happen, but we hadn't really
thought about it. We'd been too busy dealing with the challenges
of formulating the psynet model in Java in a network-friendly
way.
To deal with this blossoming of the Webmind code, in the
summer of 1998, Ken and Paul split Webmind into parts. The
central part, the one they had been working on, they called
Psycore. This contained the generic mechanisms for dealing
with nodes, links and wanderers. In a sense, this was Webmind's
operating system, the code that enabled all the parts to
work together. Then there were the Psymodules, one for each
specialized area of intelligence: natural language, reasoning,
numerical data analysis, etc. If we were to decode the DNA
code that generates the human brain, we might find that
it works in a similar way. The "psycore" would
be the DNA code that describes the features that are common
to all neurons, synapses and neurotransmitters. The "modules"
would be the DNA code which describes the distinct features
of the specialized types of neurons (there are dozens) and
neurotransmitters (there are hundreds), and the particular
patterns of neurons, neurotransmitters and synapses that
make up different parts of the brain.
The brain has hundreds of specialized parts devoted to tasks
such as visual perception, smell, language, episodic memory,
and so forth. Each of these parts is composed of neurons
which share certain fundamental features, but each also
has its unique features and capabilities that scientists
are only beginning to understand. Similarly, when a Webmind
is running on a computer, different parts of the computer's
memory are assigned to different tasks. Each of these parts
of the computer's memory draws on the psycore for its basic
organizational framework, and on more specialized modules
for advanced capabilities.
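The split between a generic core and specialized modules resembles a familiar plug-in design, which the following Java sketch illustrates. It is only an analogy in code: the `Psymodule` interface, its `claim` method and the string-based node store are hypothetical inventions for this example and say nothing about how Psycore was actually written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the interface and method names are assumptions, not Webmind's API.
interface Psymodule {
    String name();
    // Inspect an incoming item; return the node type this module would create, or null to pass.
    String claim(String input);
}

// The core knows nothing about language or numbers; it only manages modules and nodes.
class Psycore {
    private final Map<String, Psymodule> modules = new LinkedHashMap<>();
    private final List<String> nodes = new ArrayList<>(); // shared store all modules write into

    void register(Psymodule module) { modules.put(module.name(), module); }

    // Offer the input to every module; each claim becomes a node in the shared store.
    int ingest(String input) {
        int created = 0;
        for (Psymodule module : modules.values()) {
            String nodeType = module.claim(input);
            if (nodeType != null) {
                nodes.add(nodeType + ":" + input);
                created++;
            }
        }
        return created;
    }

    List<String> nodes() { return nodes; }
}

class NumericsModule implements Psymodule {
    public String name() { return "numerics"; }
    public String claim(String input) {
        return input.matches("-?\\d+(\\.\\d+)?") ? "DataNode" : null;
    }
}

class NatlangModule implements Psymodule {
    public String name() { return "natlang"; }
    public String claim(String input) {
        return input.matches("[A-Za-z]+") ? "WordNode" : null;
    }
}

class PsycoreDemo {
    public static void main(String[] args) {
        Psycore core = new Psycore();
        core.register(new NumericsModule());
        core.register(new NatlangModule());
        core.ingest("42");
        core.ingest("hello");
        System.out.println(core.nodes());
    }
}
```

Because every module writes into the same store, nodes created by one module are visible to all the others, which is the precondition for the cross-module learning the text goes on to describe.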
Each of Webmind’s modules is specialized for recognizing
and forming a particular kind of pattern. And all the different
kinds of nodes and links can learn from each other -- the
real intelligence of Webmind lies here, in the dynamic knowledge
that emerges from the interactions of different species
of nodes and links. This is how Webmind builds its own self;
it’s the essence of Webmind’s mind, of how Webmind’s
patterns create and recognize patterns in themselves and
the world to achieve their complex goals.
I’ll give a quick laundry list of modules, without
going into great detail on any of them.
There was a numerics module, containing data processing
actors that recognize patterns in tables of numbers, using
a variety of algorithms, some standard, some innovative.
DataNode embodies nonlinear data analysis methods and it
recognizes subtle patterns that’ll always be missed
by ordinary data mining and financial analysis software.
There was a natlang module, which deals with language processing.
The natlang module represents texts as TextNodes, linking
down to WordNodes representing words in the text, and other
nodes representing facts, concepts and ideas in the text.
It has text processing actors that recognize key features
and concepts in text, drawing relationships between texts
and other texts, between texts and people, between texts
and numerical data sets. These actors process vast amounts
of text with a fair amount of understanding and a lot of
speed.
The natlang module also contained reading actors, which
are used to study important texts in detail. They proceed
through each text slowly, building a mental model of the
relationships in the text just like a human reader does.
These reading actors really draw Webmind's full set of semantic
relationships into play, every time they read a text.
There was a category module, containing actors that group
other actors together according to measures of association,
and form new nodes representing these groupings. This, remember,
is a manifestation of the basic principle of the dual network.
There were learning actors, that recognized subtle patterns
among other actors, and embodied these as new actors. These
spanned various modules, including the reason module, containing
logical inference wanderers, that reasoned according to
a form of probabilistic logic based on Pei's Non-Axiomatic
Reasoning System; and the automata module, containing AutomatonNodes
that carried out evolutionary learning, according to genetic
programming, a simulation of the way species reproduce and
evolve.
In the user module there were actors that model users' minds,
observing what users do, and recording and learning from
this information – these are UserNodes and their associated
Wanderers. There are actors that moderate specific interactions
with users, such as conversations, or interactions on a
graphical user interface. And in the self module there are
self actors, wanderers and stimuli that help the SelfNode
study its own structure and dynamics, and set and pursue
its own goals.
Each of these actors involved in the modules had in itself
only a small amount of intelligence, sometimes no more than
that you might see in competing AI products. The Webmind
core – “psycore”, as we sometimes called
it -- was a platform in which they could all work together,
learning from each other and rebuilding each other, creating
an intelligence in the whole that is vastly greater than
the sum of the intelligences of the parts.
The version of Webmind we completed in the summer of 1998
– the first multi-module version -- worked fine for
about a year. We used it to build the modules essential
for Webmind's core intelligence and for several impressive
applications. It included a module for text-based market
prediction; a natural language module for mapping texts
into networks of meanings; several modules for the evolution
of concepts according to different methods; a module for
Webmind's self-understanding; and so forth. The development
of each module was driven by requirements particular to
certain application areas. The financial modules were driven
by the practical need to predict the markets. The natural
language module was driven by the need to parse financial
text, and understand human queries. The concept learning
modules were driven by the need to learn concepts relevant
to financial prediction and to the processing of human queries.
The self-understanding module was driven by the need to
have the system proactively think about things that humans
were likely to ask it about in the future.
At this point, Webmind benefited greatly from the fact that
we weren't just implementing a theory, we were hard at work
developing practical applications. One of the most profound
pieces of advice I’ve ever received about Artificial
Intelligence came from Danny Hillis, whom I discussed above
-- inventor of the Connection Machine parallel processor,
founder of Thinking Machines Inc., and an informal advisor
for Webmind Inc. throughout its lifetime. As we sat in the
South Street Seaport in New York eating dinner one day,
he was discussing a major AI company that had worked for
10 years to design an AI system, without considering in
detail any particular application of the system. Lo and
behold, the system had never done anything useful. Danny’s
comment was: “They were brilliant people with good
ideas, but they made a serious methodological error. They
developed their system for years and years, without any
contact with practical applications.” Our software
was saved from this fate by the fact that we were committed
to producing actual products, simultaneously with working
toward the goal of real AI. We were freed up to commit other
major errors instead!
The Webmind AI Engine itself was never used inside any production-version
software products, but it was used to prototype a number
of AI processes that were later re-implemented inside products.
One of these products, the Webmind Market Predictor, will
be discussed in detail in the following chapter. The reason
the Webmind AI Engine wasn’t used directly in products
was basically that it was too slow-running, and plagued
by hard-to-excise bugs. The Novamente system, as I’ll
discuss a little later, is a more mature effort and doesn’t
have these problems, and it’s being directly used
inside some software products we’re developing for
the bioinformatics market.

Working
on practical problems in parallel with grandiose long-term
goals was valuable – but it had its disadvantages
as well. It pushed us to overspecialize the system, hyperdeveloping
those portions that were needed for products, rather than
developing the whole system in a more evenly-balanced way.
Most of our code was good for the specific tasks it specialized
in, but we had not gotten to the stage where all the modules,
all the different node and link types, were working together
in one big multi-machine Webmind. We were producing cool
research software, but not the global brain I had dreamed
of. We hadn't yet seen the emergence of the dual network,
of the self. And we weren’t able to push straight
toward it because the particular portions of the system
needed for the Webmind Market Predictor – our first
product – needed so much attention.
But overspecialization induced by business needs was far
from our only problem. The truth, as we sourly discovered,
was that our core Java code, implementing the essence of
the psynet model of mind, was just barely adequate for building
products, let alone building real AI – it had too
many bugs and was poorly documented. Ken had implemented
this code brilliantly and painstakingly over a year of 15-hour
workdays, but, even so, the task had been too big for
any one human. We could have fixed up his code to make it
product-ready, but we doubted whether we’d ever get
it to the point where it could support the global brain.
So, toward the end of summer 1999, we decided to rewrite
the Webmind code again. Not the whole system, thankfully
-- we were too far along for that -- but this time only
psycore, the central core of the system. This time around,
Ken was helped out not only by me but by Cassio and several
of his colleagues in Belo Horizonte, most notably Andre
Senna and Thiago Maia, two masters of data structures and
algorithms. At this point, there was a lot of pressure,
from some members of staff on both the technical and business
sides of Intelligenesis, to give up on the unified AI architecture
altogether, and just focus on making individual products
as good as they could be, postponing real AI into the future.
But Ken and Cassio and I and others focused on building
real AI resisted this pressure and plowed ahead with building
a new, improved psycore. Among other beloved chunks of code,
Paul’s Pods system met its doom in this rewrite.
The reasons for this redesign are somewhat interesting;
they reveal a lot about the nasty realities of building
big software systems doing complicated, intelligent things.
The erratic bugs and lack of documentation in Ken’s
code were part of the problem, and made Ken the arch-enemy
of the engineering staff for a while. But this stuff was
fixable. There was also a more serious problem with the
system. It just wasn't flexible enough to enable a huge,
multi-module Webmind to be run in a really intelligent way.
When the system was only doing one thing – say, reading
text, or using text to predict markets – then it was
fine. But, it was very bad at regulating several activities
at once.
For example, when loading in a series of texts, one would
see it get slower and slower at reading. The reason was,
the more texts it had in it, the more it had to think about.
It had no time to read more text because it was so busy
thinking about the texts it had already read! I remember
once when Mark Watson, one of our Java AI gurus, noticed
this problem in a Webmind demonstration he had written.
Jim McLoughlin – one of our early hires who built
a lot of Webmind’s numerical and financial analysis
components -- showed him a way around it. By hacking the
code, you could get it to do anything, in any particular
situation. But what was needed was intelligent self-control:
the system had to know what processes were important to
it, and regulate the amount of attention it spent on various
things accordingly. Of course, we had always realized this
would be necessary. But we hadn’t realized how deeply
we’d have to code self-control into the system. Ken’s
1998-99 psycore was built to follow its whims, not to control
its dynamics in accordance with goals; and imposing goals
on top of this structure was like trying to get a hyper
child to sit down and listen to a history lesson.
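The contrast between following whims and regulating attention can be made concrete with a toy model. The Python sketch below is not Webmind code (which was Java); the function names, the cost model, and the importance weights are all assumptions made up for illustration. It compares an uncontrolled system, where reading only gets whatever time is left after thinking about old texts, with one that splits a fixed time budget by explicit importance.

```python
# Hypothetical sketch: share a fixed per-cycle time budget between
# processes in proportion to how important the system rates them.

def allocate_attention(importance, budget):
    """Split `budget` time units across processes by relative importance."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("at least one process must have positive importance")
    return {name: budget * weight / total for name, weight in importance.items()}


def naive_reading_share(texts_read, budget, think_cost_per_text=1.0):
    """Uncontrolled case: thinking about old texts is served first,
    and reading only gets whatever time is left over."""
    thinking = min(budget, texts_read * think_cost_per_text)
    return budget - thinking


if __name__ == "__main__":
    budget = 100.0
    # Without self-control, reading time shrinks as more texts pile up.
    print([naive_reading_share(n, budget) for n in (10, 50, 100)])
    # With explicit importance weights, reading keeps a guaranteed share.
    print(allocate_attention({"read_new_text": 3, "think_about_old_text": 1}, budget))
```

In a real system the importance weights would themselves have to be learned and adjusted, which is the hard part the paragraph above describes.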
The system was so complicated that we couldn’t easily
make the simple changes we needed to make to turn it into
a real global brain platform. We needed to be able to turn
on and off the different capabilities of the nodes and links
at will -- and have the system do this automatically, adapting
to its circumstances. We needed to be able to take collections
of nodes and links that were stable, no longer evolving,
and "freeze" them into a state that took up very
little memory, providing easy access but no adaptability.
We needed to be able to observe what was going on in a particular
part of the system, and chart its dynamics, to see what
structures were emerging.
Over the period 1998-99, psycore had evolved incrementally,
getting new features whenever a module author needed them.
The natural language team needed psycore to do one thing,
the finance team needed it to do another, the categorization
team needed it to do another, the reasoning team needed
a reasoning module, and so on, and so on. None of these
requests fundamentally changed the architecture of nodes,
links and wanderers -- mental entities relating to each
other and dynamically altering their relationships -- but
they changed the details of how nodes, links and wanderers
worked, and how they could be accessed and changed. The
abundance of new features had made the core code more powerful,
but it had made it messier too, and harder to control. Many
of the new features had similar structures, and in hindsight
could be consolidated into simpler structures. Engineers,
charged with building specialized components of Webmind,
complained that the system offered so many features and
possibilities that it was difficult to figure out how to
use it. They wanted something simpler, with a few good features
rather than a large number of features of varying quality.
Was it really necessary to go through all these revisions?
Why not just figure out everything correctly the first time,
and avoid all the reworking and re-reworking? One answer
is: We should have, we were just inexperienced, so we kept
fucking up. But there’s also another answer, that
I prefer because it’s more flattering to me! This
answer is: evolution doesn't work that way. Webmind, as
a software system, is an engineered system, but it is also
an evolved system. It went through several incarnations,
each one with some fit aspects and some unfit aspects. The
fit aspects survived to the next incarnation; the less fit
aspects didn’t. All large software projects evolve
through multiple generations; Webmind was not unique in
this regard. But the evolution of Webmind had unique aspects
because what was evolving was mind itself. In this evolution
we had to retain both those features that were most useful
for practical applications and those that were in accordance
with the abstract structure of mind.
Evolution’s good at figuring out how to make a system
that can achieve its goals within a certain environment.
In this case, the system was Webmind, and the environment
includes the physical structure of modern computer hardware,
the universe of software that has evolved to adapt to it,
and the practical applications that Webmind was intended
for, like market prediction, news filtering, data analysis,
text analysis, and conversation. Java, wonderful as it is,
wasn’t designed for mind hacking. The von Neumann
architecture was designed for repetitive mathematical calculations,
not for intelligence. But, by the same token, the brain
was designed for sensing and acting, not for abstract thought.
Fiber cells were designed for musculature, not for use as
neurons. Mind can emerge from any sufficiently flexible
substrate, as the features of the substrate gradually adapt
themselves to the requirements imposed on them.
The new psycore had a multi-layered structure, which I invented
based on some conversations with Youlian Troyanov, a Bulgarian
software engineer who believes Webmind can never be truly
intelligent because it doesn’t make use of the fundamental
quantum symmetries of the universe (but he kept working
for us anyway, and even now follows Novamente work very
closely). I still don’t completely understand what
Youlian meant when he suggested psycore should have many
layers, but the idea set off a spark in my mind, and the
current psycore does indeed have three layers.
The lowest layer was what we called “abstract actors.”
It was a general framework for computational actors that
group other actors and transform other actors and send messages
to other actors. We chose the word “actors”
here instead of “agents” because “agents”
seems to mean too many things to too many people. Lots of
other possibilities were tossed around, including more interesting
ones like “cells,” “psells”, “psions”,
“psychons” and so forth. Basically, Layer 1
provides a kind of “mind operating system,”
suitable to run on a single machine and a single processor,
or else on a massively parallel hardware system in which
each actor gets its own processing power, like in the brain.
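A minimal sketch of what such an actor layer might look like, in Python for brevity rather than the Java of the original system. The class names `Actor` and `Group` and the round-robin `run` scheduler are hypothetical, and the sketch covers only grouping and message passing, not the transformation of one actor by another.

```python
from collections import deque


class Actor:
    """A computational actor with a mailbox (hypothetical minimal design)."""

    def __init__(self, name):
        self.name = name
        self.mailbox = deque()
        self.log = []

    def send(self, target, message):
        # Sending only enqueues; the target handles it on its own turn.
        target.mailbox.append((self.name, message))

    def step(self):
        # Handle at most one pending message per turn.
        if self.mailbox:
            sender, message = self.mailbox.popleft()
            self.receive(sender, message)

    def receive(self, sender, message):
        self.log.append((sender, message))


class Group(Actor):
    """An actor that groups other actors and relays messages to its members."""

    def __init__(self, name, members):
        super().__init__(name)
        self.members = list(members)

    def receive(self, sender, message):
        super().receive(sender, message)
        for member in self.members:
            self.send(member, message)


def run(actors, turns):
    """Single-processor scheduling: give each actor one step per turn."""
    for _ in range(turns):
        for actor in actors:
            actor.step()
```

Because actors interact only through `send` and `step`, the same actor code could in principle run under a different scheduler, for example one that gives each actor its own processor.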
The second layer was “distributed actors” –
this deals with all the horrible nastiness of implementing
a massively parallel system on a collection of multiprocessor
machines networked together by TCP/IP. Scheduling of processes,
sending of messages from one machine to another, and so
forth. Paul’s Pods system was considered as a structuring
principle for this layer, but based on extensive testing
by the Brazilians, we chose some other ideas instead, which
Paul wasn’t terribly happy about.
The third layer, finally, was nodes and links and wanderers
and all the good stuff – all the stuff I invented
in 1997 and Ken and I coded up in the beginning. This layer
comes out very small once you put all the general actor
interaction stuff in layer one and all the nasty multiprocessor
and multi-machine stuff in layer 2. But the fact that it’s
small is great, because it means it can be easily experimented
with.
That’s it – three layers of psycore. If you
want to you can extend the layering metaphor outside of
psycore into the rest of Webmind. The fourth layer, conceptually,
is the modules, all the specific node and link types for
carrying out specialized functions of mind. And the fifth
layer, if you want to stretch the metaphor, would be some
fragments of Java code called “interfaces” that
I wrote to systematize all the different learning methods
in the modules. For instance, a categorization interface
that groups together all the different categorization methods
in the different modules.
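The idea of one interface grouping categorization methods from different modules can be sketched as follows. This is Python rather than the original Java, and the interface name `Categorizer`, its single method, and the two toy implementations are assumptions for illustration only, not the actual Webmind interfaces.

```python
from abc import ABC, abstractmethod


class Categorizer(ABC):
    """Common interface for categorization methods from different modules."""

    @abstractmethod
    def categorize(self, items):
        """Return a dict mapping a category label to a list of items."""


class LengthCategorizer(Categorizer):
    """Toy text-side method: group strings by short vs. long."""

    def __init__(self, threshold=5):
        self.threshold = threshold

    def categorize(self, items):
        groups = {"short": [], "long": []}
        for item in items:
            key = "short" if len(item) < self.threshold else "long"
            groups[key].append(item)
        return groups


class SignCategorizer(Categorizer):
    """Toy numeric-side method: group numbers by sign."""

    def categorize(self, items):
        groups = {"negative": [], "non_negative": []}
        for item in items:
            groups["negative" if item < 0 else "non_negative"].append(item)
        return groups


def summarize(categorizer, items):
    """Caller code depends only on the interface, not on the module."""
    groups = categorizer.categorize(items)
    return {label: len(members) for label, members in groups.items()}
```

Code such as `summarize` can then work with any module's categorization method without knowing which module supplied it.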
None of this layering really adds anything to the philosophy
of mind underlying Webmind – it’s just a matter
of making the huge morass of complexity needed to make a
practically useful mind into a workable software system.
The complexity comes from two places: first, the diverse
specialization needed to make pattern recognition and formation
practical in any real world; and second, the fact that we
don’t have a massively parallel hardware substrate
like the brain, so we have to get a massively parallel self-organizing
system of nodes and links to run on a hodge-podge of processors
and memory units. The object-oriented design skills of our
Brazilian engineering team were crucial in getting all this
to actually work correctly, which it now seems to, much
to my amazement and pleasure.

The
next big upheaval in our conception of Webmind – a
few months after the new psycore was done – had to
do, not with the structure of the core system, but rather
with the teaching of the system as a whole. The basic problem
here was: Once you have the dynamics and structures needed
for the mind implemented in an adaptable, workable, testable
way – you still need to turn this mind framework into
an actual particular mind, that understands the world around
it. How do we get the knowledge in? We had a lot of ideas
about learning and extending knowledge from grounded to
ungrounded domains – ideas I mentioned above. But
this didn’t seem to be quite enough. We started to
think the system was going to need a bit more of a helping
hand in learning how to cope with its world.
In January 2000 I read the engineering plan for the Natural
Language module, written by Karin Verspoor, our Director
of Natural Language -- another one of our very early hires
who took over the language processing aspect of Webmind
from Onar way back in 1998. Karin has a comprehensive knowledge
of linguistics and computer science, but when she started
working for us, she didn’t have much background in
computational linguistics. She inherited from Onar and me,
when she first started out, a lot of ideas about how computers
could learn language by statistically studying texts. The
basic framework was one that my wife Gwen and I had developed
in 1994, when Gwen was working on her PhD in computational
linguistics. Onar helped me extend the ideas beyond where
they’d been when Gwen had dropped out of her PhD.
Karin helped Onar to make practical Webmind implementations
of the stuff he was working on.
Some of these language learning schemes were great and some
were absolute rubbish. We’re still not sure about
some of them. Anyway, after about 8 months trying to get
the 1998 version of Webmind to learn language by recognizing
patterns in texts, Karin gave up in frustration, and started
taking the Webmind NL module in a new direction, in which
the structures of language are built in, and even a lot
of specific facts about English are built in, like parts
of speech, grammar rules, and so forth. I always had mixed
feelings about this. Webmind was really supposed to learn
everything on its own, not have stuff wired in – that
was the stuff of expert systems, rule-based AI, which I
knew on philosophical grounds was a total dead end. On the
other hand, I told myself, human language was a special
case among all the things Webmind had to deal with. My theory
of mind explained how Webminds could learn their own language,
to communicate amongst each other. But I’d never really
explicitly thought about how a mind could learn the language
of a completely alien race – which is what we are,
from the point of view of Webminds.
Webmind didn’t have a human body, and without one,
could we really expect it to learn human language? Although
Onar’s and my methods for having the system learn
language from recognizing patterns in texts seemed to make
perfect sense, mathematically and conceptually, this obviously
wasn’t the way people learned language. You learn
to talk and hear language before you learn to read it. You
learn what words mean from your embodiment in the world
with other people who talk to you and listen to you.
It seemed to me we were missing something – not in
the core Webmind design, but somewhere else. Finally, after
a few days of soul-searching, I figured out what it was:
Humans learn how to be intelligent by interaction with other
humans in a shared environment. It’s as simple as
that. Raise a baby human in a room by itself and it’ll
grow up to be a moron. Of course, I’d said as much
in my theoretical books, way back in the Dark Ages, but
with all the focus on getting the system to actually work,
getting all the modules to work on their own and to work
together, I’d let this aspect of intelligence slide.
How to work this aspect into our current work on Webmind,
I wasn’t quite sure, but I knew I had to figure it
out fast.
I needed someone to bounce these ideas off of as they developed.
I chose two of our best young AI engineers – Cate
Hartley and Mike Ross – together with my Deputy CTO,
Stephan Bugaj. It was important to me, in developing these
ideas from vagueness to concreteness, to work with people
who hadn’t played a big role in designing the current
system, because I was afraid that the conclusion might be
that the current system was lacking in some basic way, and
I thought our old established engineers might be afraid
to come to this conclusion even if it was the right one.
Our conclusion was that the current system was pretty much
fine. It contains all the parts needed for learning through
shared experience; the trick is just to deploy them in the
right way. We designed a simple user interface in which
Webmind can move objects around and watch us move objects
around, and chat with us about what it’s doing and
seeing. Using this “Baby Webmind” interface,
we need to lead the system step by step through goals, beginning
with simple goals and gradually moving to more complex ones.
We need to teach the system step by step almost like a baby.
Ken, Karin, Pei, Jeff and all the “old guard”
quickly became deeply involved in helping us work out the
details.
All this entailed some changes in emphasis from our pre-Baby-Webmind
work. Before, we’d focused mainly on the system perceiving
its environment; now, in the Baby Webmind context, we started
thinking just as much about action, about what Webmind does
in its world and how its actions intersect with its perceptions.
Before Baby Webmind, the evolutionary learning aspect of
Webmind was focused on learning to recognize patterns in
text and numerical data; now, it was tweaked so it can easily
evolve schema, procedures for seeing, doing and acting,
and so forth. But fortunately, we discovered, nothing big
and new needed to be built for Baby Webmind; it was all
just a matter of adjusting the modules that were already
there, encouraging them to interact with each other in the
right way.

The
dissolution of Webmind Inc. in March 2001 was a class A
disaster for all of us involved with the firm. I can think
of sadder moments in my life, but none that dragged on for
weeks and weeks. The “endgame” of the company
was a torturous process of laying off one group of friends
one week and another group the next, and sitting in endless
argumentative meetings, occupying ceaseless consecutive
12-hour workdays, trying to find some way to salvage things.
In retrospect I wish I’d spent the last 5 months of
the company’s life playing Pac-man or writing poetry
in machine language… but of course, at the time it
wasn’t evident how things would come out, and if we’d
succeeded in salvaging the firm then all the torture would
have seemed worthwhile.
From one point of view, however, the tragic event could
be construed as a positive thing. In the weeks following
the dissolution, the legal status of the company’s
“intellectual property” was not at all clear
(in fact, it did not become clear for over a year). A group
of us resolved to stick together and continue pushing toward
a real AI, but we didn’t feel at all legally comfortable
continuing to work with the Webmind AI Engine sourcecode.
We waited a few more weeks to see if the legal situation
would resolve itself, but it didn’t, and so we decided
to start anew.
This new start was a much bigger break from the past than
the “psycore redesign” I discussed above. In
that redesign, mistakenly in hindsight, we’d tried
to preserve backwards-compatibility with the previous codebase.
We wanted the pre-redesign natlang module to keep on working
with the redesigned core. In practice, this backwards-compatibility
turned out to be pretty worthless, because everyone wound
up redesigning their modules for optimal performance under
the new core, even though their old versions would have
kept working “in principle” with only minor
changes. We hadn’t wanted to admit the necessity to
throw all the code out and start over, retaining only the
ideas and lessons and the best of the mathematical high-level
design features from the old version. We were under too
much business pressure to get the Webmind AI Engine doing
amazing things fast, so it could contribute to product development
and moneymaking.
But as we took stock of our situation in late Spring 2001,
we also realized we’d made some big mistakes that
weren’t attributable to business pressures or the
mistaken desire to retain backwards-compatibility. There
were three big problems.
First, we had built the new core as a generalized software
agents system, and done a wonderfully good job of it. But
with this generality came a terrible cost in terms of computational
inefficiency. Ken’s psycore had been just plain incorrect,
and incomprehensible. The new Brazilian psycore was elegant
and clearly structured; when there were bugs in it, it was
possible to find them and remove them, because the code
was well-written and well-documented. But the inefficiencies
of the Java programming language, combined with the
inefficiencies of the general-agents-system design,
led to a system that could take several minutes
for a single “cycle” to occur (a “cycle”
meaning a period of time in which every node in the system
got to do a little processing, dynamically relating itself
to the other nodes around it via its links, building new
links or modifying its old ones).
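To make the notion of a “cycle” concrete, here is a toy Python sketch (the real system was Java, and its node dynamics were far richer). The `Node` class, the update rule, and the `rate` parameter are hypothetical; the sketch only modifies existing links and does not build new ones.

```python
class Node:
    """A node holding an activation level and weighted links to other nodes."""

    def __init__(self, name, activation=0.0):
        self.name = name
        self.activation = activation
        self.links = {}  # maps target node name -> link weight

    def step(self, nodes, rate=0.5):
        """One small unit of work: nudge each link weight toward the
        product of this node's activation and the target's activation."""
        for target_name, weight in list(self.links.items()):
            target = nodes[target_name]
            desired = self.activation * target.activation
            self.links[target_name] = weight + rate * (desired - weight)


def run_cycle(nodes):
    """One cycle: every node in the system gets exactly one step."""
    for node in nodes.values():
        node.step(nodes)
```

The cost of one cycle in this sketch grows with the total number of links visited, so any fixed overhead per node step is multiplied across the whole system.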
The second problem was more conceptual and mathematical than
software-oriented. We had a system for representing procedural
knowledge, which used SchemaNodes and related link types.
And we had a system for representing declarative knowledge,
using InheritanceLinks, SimilarityLinks, AssociativeLinks
and related things. The relationship between procedural
and declarative knowledge, however, could be expressed in
the system only in an extremely complicated way. We were
spending a terrifying amount of time working out the mathematical
and conceptual details of this relationship. This aspect
of the system just seemed wrong, because it was so bloody
complex. And it seemed complex in the wrong kind of way.
The right kind of complexity, we felt, was the kind that
involved a very simple framework giving rise to complex
emergent structures and dynamics. The wrong kind was when
the foundational framework itself was too complex, and that’s
what seemed to be happening with our procedural/declarative
integration work.
Finally, our approach to natural language processing really
wasn’t working. We were trying to hybridize a rule-based
approach with a statistical-learning approach, and it was
getting to be a huge mess. Accommodating the linguistic rules
that human beings had made up and stored in linguistic databases
was requiring us to do all sorts of perverted things with
node and link types. More and more, we were forced to conclude
that you just couldn’t perform the hybridization we
were attempting. Instead, we began to think, the only way
to do language processing was to take the bull by the horns
and go with a full experiential interactive learning approach.
This wasn’t what the businessman half of my brain
wanted to hear, because the pure EIL approach means that
language processing comes last, after all the various cognition
processes are working perfectly together – which is
a problem for an AI system being built within a company whose
products are based on language processing. We badly wanted
the Webmind AI Engine to help our market prediction and
document management products understand language better
– but what our research was telling us was that there
were two ways to approach language: the overspecialized,
standard-AI, rule-based way, or the real-AI, EIL-based way.
Our attempt to chart a middle course by fusing the two together
just wasn’t going to cut it.
A number of us had been thinking for a while about better
ways of doing things. All the experimentation with the new
core had taught us a lot about how an integrative AI system
should work. Except for the language processing and procedural/declarative
interfacing issues, we seemed to have solved all the thorny
conceptual problems of inter-mind-module integration –
and there had been a hell of a lot of them. We felt like
we knew what we were doing to a vastly greater degree than
had been the case in 1999 when we’d designed the now-old,
then-new psycore. Which of course meant that, in a sense,
the 1999 core rewrite had been a success – because
working with it had taught us a hell of a lot.
In April 2001, two Brazilian engineers, Thiago Maia and
Andre Senna, began creating the new new “psycore.”
The basic principles of this new system were outlined by
Thiago and myself in New York, before he went back to Brazil,
lacking a visa to remain in the US after the death of Webmind
Inc. (as well as lacking a US source of income). The ideas
we discussed were loosely based on many past conversations
with others, including Senna, Cassio, and a few wild AI
mavericks from the Webmind New York office: Anton Kolonin,
Shane Legg and Youlian Troyanov. The latter three guys all
had their own theories of how to build a real AI, though
none of them articulated their approaches nearly clearly
enough for my liking (I’m still in touch with all
of them, and still enjoying witnessing the development of
their ideas). Because they each had an intuitive sense for
the “whole enchilada” of the real AI problem,
through their own speculative work, they were extremely
good critics of the Webmind AI approach. The idea of the
new new core was not to make a better generalized agents
system, it was rather to make a more specialized software
framework, which was ideally suited for exactly the mind
modules we now knew we needed. Ken’s original psycore
had been specialized, but then we’d had to add onto
it endlessly, because its specialization was overly limiting,
and didn’t allow all the mind modules we found we
needed. On the other hand the Brazilian 1999 new core had
been general enough to allow us to experiment with all sorts
of different mind modules, but this generality had carried
too much of a price in terms of efficiency. Now we knew
what we needed in terms of modules and could build an appropriately
specialized system.
The procedural/declarative learning problem was a hard nut
to crack, and when we started the development of our new
system in late Spring 2001, I had only a general idea of
how to solve it. But many months of effort paid off, and
by late fall 2001 I had come up with an apparently workable
solution, which used an advanced and obscure branch of mathematics
called combinatory logic to bridge the procedural/declarative
divide. This solution was eminently unbrainlike –
it much more resembled what goes on inside the compilers
for functional programming languages like Gofer -- but at
this point, quasi-detailed brain emulation no longer seemed
so critical to me. I was simultaneously working on a paper
on what I called “Hebbian Logic,” an original
theory of how advanced logical inference emerges from brain
structures. As my thinking on brain dynamics and its relation
to thought got clearer and clearer, I could see that, in
some cases (such as the procedural/declarative interface)
what’s right for the brain just isn’t going
to be workable for any system running on a clustered-von-Neumann-machine
hardware substrate. And one nice thing that happened was
that, when I formalized the most difficult parts of natural
language processing in terms of my new combinatory-logic-based
framework, much of the complexity melted away. The continuity
between language processing and generic cognitive processing
became vastly clearer.
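For readers unfamiliar with combinatory logic, the basic idea is that procedures are written without variables, as combinations of a few fixed building blocks. The Python sketch below shows the textbook S and K combinators; it is an assumption that these illustrate the flavor of the approach, since nothing here specifies which combinators Novamente actually uses.

```python
# The S and K combinators in curried form.
def S(f):
    return lambda g: lambda x: f(x)(g(x))


def K(x):
    return lambda y: x


# Identity is derivable: S K K x = K x (K x) = x
I = S(K)(K)

# Function composition B f g x = f (g x), built only from S and K:
# B = S (K S) K
B = S(K(S))(K)


if __name__ == "__main__":
    increment = lambda n: n + 1
    double = lambda n: n * 2
    print(I(42))                     # identity returns its argument
    print(B(increment)(double)(5))   # increment(double(5))
```

Because an expression like `S(K(S))(K)` is just a tree of applications of fixed symbols, a procedure written this way can be stored and manipulated as a structure like any other piece of knowledge.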
Thus we arrived at the AI design we call Novamente –
the new mind. Novamente is currently (February 2002) only
about 25-30% implemented, but I have little doubt that by
the time you read these words, substantially more progress
will have been made.
I’ve told you a lot about our various mistakes, oversights
and revisions – and you may well draw from this tale
the conclusion that I and my colleagues are a bunch of oafs
who can’t get anything right! I think that a fairer
conclusion, however, is that the real AI problem is really
goddamned hard. Building a market predictor or a better
text classification system – these were somewhat tricky
problems, but we solved them relatively rapidly and unproblematically.
Building a real AI is a different sort of animal. Most people
who have approached the problem seem to have begun with
a certain technical approach and followed it where it led
– and then stayed where their initial technical approach
led, doing valuable work and making specialized applications,
but abandoning the original real AI goal. On the other hand,
we began with a very general, high-level conceptual picture
of the type of system we wanted to build, and have progressively
revised our technical approach in order to achieve a closer
and closer approximation to our high-level conceptual picture.
What does Novamente do, right now? It doesn’t hold
a conversation with you. It doesn’t rewrite its own
sourcecode. In fact it is not nearly as impressive in its
current behaviors as, say, Deep Blue, which is well known
to be cognitively shallow. One of the really terrible things
about the real AI problem, however, is that the approach
that gives the best interim results is probably not anywhere
near the best approach to the end goal. This is because
good interim results are usually obtained by overspecializing
one mind-module for independent performance, whereas real
AI will only be achieved by interadapting an appropriate
assemblage of mind-modules to one another.
We have not given up on the use of interim, incomplete versions
of our AI system to yield practical results – first,
because we can’t afford to; and second, because we
really do believe that, as Danny Hillis says, feedback from
real applications is a critical part of the AI creation
process. This time around, however, we have enough experience
to choose our interim practical applications more intelligently.
We are not going to attempt serious language processing
until a significantly more advanced phase in the system’s
development. Rather, we are using the system’s inference,
association-finding, and concept-formation abilities to
enable highly sophisticated data mining – recognition
of patterns in complex databases filled with heterogeneous
data. In particular, we’re applying the current Novamente
version to some sticky datamining problems that arise in
the analysis of genetic data. And we’re having some
significant success! But this is a story that’s best
told after some more background on modern genetics has been
presented, and so it will be deferred until Chapter 8. There
are also fascinating potential applications to the analysis
of brain scan data, as will be discussed in Chapter 7, though
unlike genetics this is not a type of data analysis we have
actually attempted yet.

Recall
from Chapter 1 the notion that there are two metasystem
transitions involved in the emergence of mind from unintelligent
matter. This idea was related there in a Novamente context
– and now that so much detail on Novamente has been
given, it may be appreciated more fully.
Each of Novamente’s modules has a certain wholeness,
a certain synergetic transcendence of the whole over the
parts. But the metasystem transition we’re really
focused on is the next one up. The big trick is to get emergent
intelligence out of the whole mess – active and productive
emergent intelligence, wherein the whole mind is engaged
in achieving goals by recognizing patterns in itself and
the outside world. The specialized pattern recognition and
formation routines in the modules aren’t capable of
achieving really complex goals or of generalizing from one
domain to another. Putting a few modules together can give
you functions that normal AI software can’t do –
things like using text to predict the financial markets,
which we’ll discuss in the following chapter. But
putting all the modules together can get you actual intelligence,
because the modules are chosen specifically so as to allow
the system to understand itself, to recognize patterns in
itself. Self-understanding is not an easy thing after all.
The modules in the current Novamente system represent pretty
much the minimal set required to achieve it, in the particular
complex environment that is the Internet. Interaction with
other intelligences – us – in a shared environment
is a task that uses all the modules of Novamente, integrated
together tightly and generating emergent structures that
are constantly tested for usefulness.
To teach our baby Novamente, we won’t chat with it
about trees and flowers and teeth, because it doesn’t
have direct experience of these things. We’ll chat
with it about data files and shapes and MIDI music files,
because these are the things that we can both experience.
Intelligence has to be gained through interactive experience
in a shared environment. And it’s intriguing to see
how the basic task of learning to interact in the world
uses all of Novamente’s specialized modules. Reasoning
and genetic programming – evolution – are used
to find schema -- sets of basic procedures for seeing and
doing and thinking – that are useful at achieving
the system’s goals and hence make the system happy.
Categorization is needed to define contexts in the world
– a schema has to be judged by how it achieves important
goals in relevant contexts. Language processing is obviously
needed to chat with humans, and although in this context
most of the specific nature of human language must be learned,
nevertheless the basic structures needed for language understanding
need to be provided from the start; learning language as
a general set of patterns is a job for millions of years
of evolution rather than for months or even years of learning.
Data processing is needed to turn raw numerical data files,
sensed by the system, into comprehensible perceptual features.
And so on. All the link building and node building methods
of Novamente’s long term memory, its core, are needed
to provide the data that basic behavior schema need to act
intelligently.
All this complexity is not 100% obvious from the original
vision of mind as a collection of patterns that forms and
perceives patterns in itself and the world, in order to
achieve complex goals in a complex environment. But once
you dive into the details, it does fall out of this general
view fairly naturally. A complex environment, including
other intelligences, involves a lot of different kinds of
things, each one requiring its own specialized pattern recognition
and formation mechanisms. Achieving complex goals in such
an environment involves forming concepts that span various
kinds of things, internal and external things. This requires
intense interaction between the various modules of mind.
And so it goes. The best conclusion I can think of is this:
There’s no big trick to building a thinking machine,
actually. A mind is a collection of patterns that recognizes
and forms patterns in itself, in order to achieve complex
goals. There are some universal structures and dynamics
that it seems any mind has got to have. And it’s possible
to build a system possessing these universal structures
and dynamics in Java, running on a network of high-powered
PC’s. The main problems are these. First, getting
the needed memory and processing power. Then, the routine
but really annoying software engineering problems of getting
such a huge system to actually work in a reliable and efficient
way. There’s the problem of parameter tuning –
getting the system to regulate itself, all its modules together,
in a way that keeps the whole huge system functioning adequately,
without any part starving the other for resources. And then
there’s the problem of teaching – how do we
play mommy and daddy to a baby intelligence so unlike us
without driving it totally batty? Fortunately we seem to
have solutions to all these problems, and so the creation
of the world’s first really thinking machine would
seem to be only a year or two ahead of us. And as we walk
along the path, we’re building lots of cool components
that can – if we play our cards right -- make us money
along the way. There are worse ways to spend a few years!
And the possibility of our work triggering the Singularity
in the fully Vingean sense is also somewhat tantalizing….

Now,
after all this, where does the “Web” part of
the original Webmind scheme fit in?
Refreshingly, the original vision still fits pretty damn
well: the Internet, now even more so than in the mid-90’s,
has the potential to give a real AI system both processing
power and a rich perceivable/manipulable environment. To
make this potential real, however, requires the development
of specific Internet software aimed at making the Net useful
for AI. Implementing Heylighen’s proposals for adaptive
hyperlink weight modification would be a step in this direction.
But what’s needed to make the Net truly Novamente-friendly
is a good bit more than this. Toward this end, we have designed
a global distributed processing framework called WebWorld,
which will allow a Novamente (or any other roughly-similarly-structured
AI system) to split up its thought processing across literally
millions of machines. Some of these machines may not be
powerful enough to run Novamentes, but may nonetheless be
strong enough to run smaller “Webmind auxiliary processing
units,” which we call WebWorld lobes. The WebWorld
framework was prototyped at Webmind Inc.; a fully functional
version was never built, but a fairly complete design exists
and if no one else creates something similar, in time a
WebWorld variant will be implemented as part of the Novamente
project.
Once WebWorld has been built, how exactly will it be used
in Novamente? In the beginning, at least, a big Novamente
will always have a cluster of dedicated machines as its
main mind. But it will farm out various learning problems
to thousands or millions of machines elsewhere. One thing
that this surplus of machines will allow it to do is to
read the huge amount of textual and numerical data that’s
out there on the Web, and eventually picture, sound and
movie data as well. So, although Novamente is starting out
as a program running on a small cluster of machines and
operating on a limited pool of data, its need for a rich
perceptual environment combined with its limitless thirst
for processing power is going to push it onto the Net, entirely
consistent with the initial vision of a Web mind: the Internet
turned into a global brain.
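The "dedicated core plus farmed-out learning problems" arrangement described above can be illustrated with a toy sketch. This is not WebWorld code: a local thread pool stands in for the remote machines, and the names `score_candidate` and `farm_out` are hypothetical, invented purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def score_candidate(candidate):
    """Hypothetical stand-in for an expensive learning subproblem.

    A 'candidate' is just a number here; its score is how close it is to 42.
    """
    return -abs(candidate - 42)


def farm_out(candidates, n_workers=4):
    """Hand each candidate to a worker and return the best (candidate, score) pair.

    Assumption: the thread pool plays the role of the remote helper machines.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        scores = list(pool.map(score_candidate, candidates))
    return max(zip(candidates, scores), key=lambda pair: pair[1])


best, best_score = farm_out(range(0, 100, 5))
print(best, best_score)
```

The central mind only sees the returned scores; the scoring itself could run anywhere, which is the property that makes offline, background processing on many small machines plausible.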
Taking this vision one step closer to reality, let’s
look at what this might mean in terms of the Internet of
the next five or ten years. Of course, we realize that no
such “map of the future” is likely to be extremely
accurate. The Internet is a complex and rapidly evolving
system. No one person, company or computer program can control
it. But nonetheless, we can all take part in guiding it.
And in order to do this intelligently, an overarching vision
is required.
The figure below, drawn from my recent book Creating Internet
Intelligence, is an attempt at an “architecture diagram”
for the entire Net, in its Webmind-infused form. Naturally,
any diagram with such a broad scope is going to skip over
a lot of details. The point is to get across a broad global
vision:
[Figure: architecture diagram of the Webmind-infused Net, from Creating Internet Intelligence]
First, we have a vast variety of “client computers,”
some old, some new, some powerful, some weak. Some of these
access the intelligent Net through dumb client applications
– they don’t directly contribute to Internet
intelligence at all. Others have smart clients such as WebWorld
clients, which carry out two kinds of operations: personalization
operations intended to help the machines serve particular
clients better, and general AI operations handed to them
by sophisticated AI server systems or other smart clients.
Next there are “commercial servers,” computers
that carry out various tasks to support various types of
heavyweight processing – transaction processing for
e-commerce applications, inventory management for warehousing
of physical objects, and so forth. Some of these commercial
servers interact with client computers directly, others
do so only via AI servers. In nearly all cases, these commercial
servers can benefit from intelligence supplied by AI servers.
Finally, there is what I view as the crux of the intelligent
Internet: clusters of AI servers distributed across the
Net, each cluster representing an individual computational
mind. Some of these will be Novamentes, others may be other
types of AI systems. These will be able to communicate via
a common language, and will collectively “drive”
the whole Net, by dispensing problems to client machines
via WebWorld or related client-side distributed processing
frameworks, and by providing real-time AI feedback to commercial
servers of various types.
Some AI servers will be general-purpose and will serve intelligence
to commercial servers doing a variety of particular things;
others will be more specialized, tied particularly to a
certain commercial server (e.g., Yahoo might have its own
AI cluster to back-end its portal services).
Is this the final configuration for the Global Brain? No
way. Is it the only way to do things? No. But this seems
the most workable architecture for moving things from where
they are now to a reasonably intelligent Net. After this,
the dynamics of societies of AI agents become the dominant
factor, with the commercial servers and client machines
as a context. And after that….

Recall the notion of “the Singularity,” first proposed
in the 70’s by science fiction writer Vernor Vinge,
referring to the notion that the accelerating pace of technological
change would ultimately reach a point of discontinuity. At
this point, our predictions are pretty much useless –
our technology has outgrown us in the same sense that we’ve
outgrown ants, beavers, rhesus monkeys and striped cockroaches.
The Singularity is not just about AI, but AI may play a
special role in the advent of the Singularity, because once
it’s sufficiently advanced it can serve as a powerful
“metatechnology,” drastically accelerating the
pace of creation of new technologies of various kinds.
Eliezer Yudkowsky and Brian Atkins have founded a non-profit
organization called the Singularity Institute [http://www.singinst.org/intro.html]
devoted to helping to bring about the Singularity, and making
sure it’s a positive event for humanity rather than
the instantaneous end of humankind. Yudkowsky has put particular
effort into understanding the AI aspects of the singularity,
discoursing extensively on the notion of Friendly AI –
the creation of AI systems that, as they rewrite their own
source code, achieving progressively greater and greater
intelligence, leave invariant the portion of their code
requiring them to be friendly to human beings. We’ll
discuss some of these ideas in depth in Chapter 12 below.
The notion of the Singularity seems to me to be a valid
one, and the notion of an AI system approaching it by progressively
rewriting its own source code also seems to be valid. But
as usual, there are a few pesky details that only become
clear once one has a sufficiently-well-fleshed-out framework
within which to analyze them. From a Novamente perspective,
the following is the sequence of events that seems most
likely to lead up to the Singularity:
1. Someone (most likely the Webmind AI Engine team!) creates
a fairly intelligent AI, one that can be taught, conversed
with, etc.
2. This AI is taught about programming languages, is taught
about algorithms and data structures, etc.
3. It begins by being able to write and optimize and rewrite
simple programs.
4. After it achieves a significant level of practical software
engineering experience and mathematical and AI knowledge,
it is able to begin improving itself ... at which point
the hard takeoff begins.
My intuition is that, even in this picture, the “hard
takeoff” to superhuman intelligence will take a few
years, not minutes. But – obviously -- that's still
pretty fast by the standards of human progress.
The Singularity emerges, in this vision, as a consequence
of emergence-producing, dynamic feedback between the AI
Engine and intelligent program analysis tools like the Java
supercompiler. The global brain then becomes not only intelligent
but superintelligent, and we, as part of the global brain,
are swept up into this emerging global superintelligence
in ways that we can barely begin to imagine.
To cast the self-modification problem in the language of
Novamente AI, it suffices to observe that self-modification
is a special case of the kind of problem we call "schema
learning." The AI Engine itself is just a big procedure,
a big program, a big schema. The ultimate application of
schema learning, therefore, is the application of the system
to learn how to make itself better. The complexity of the
schema learning problem, with which we have some practical
experience, suggests how hard the “self-modifying
AI” problem really is.
Sure, it’s easy enough to make a small, self-modifying
program. But, such a program is not intelligent. It’s
closer to being “artificial life” of a very
primitive nature. Intelligence within practical computational
resources requires a lot of highly specialized structures.
These lead to a complicated program – a big, intricate
mind-schema – which is difficult to understand, optimize
and improve.
Creating a simple self-modifying program and expecting it
to become intelligent through progressive environment-driven
self-modification is an interesting research program, but
it seems more like an attempt to emulate the evolution of
life on Earth than an attempt to create a single intelligence
within a reasonable time frame.
But just because the “learn my own schema” problem
is hard, doesn’t mean it’s unsolvable. A Java
or C program can be represented as a SchemaNode inside Novamente,
and hence it can be reasoned about, mutated and crossed
over, and so forth. This is what needs to be done, ultimately,
to create a system that can understand itself and make itself
smarter and smarter as time goes on – eliminating
the need for human beings to write AI code and write articles
like this one.
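To give a feel for what "mutated and crossed over" means when the thing being evolved is a program, here is a minimal genetic-programming sketch in Python. It is not the SchemaNode representation: the nested-tuple program format and the helper names (`evaluate`, `paths`, `replace_at`, `mutate`, `crossover`) are assumptions made up for this illustration.

```python
import random

# A program is a nested tuple (op, left, right), the variable "x", or an int constant.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def evaluate(tree, x):
    """Run the program tree on input x."""
    if tree == "x":
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))


def paths(tree, prefix=()):
    """List the position of every subtree as a tuple of child indices."""
    found = [prefix]
    if isinstance(tree, tuple):
        found += paths(tree[1], prefix + (1,))
        found += paths(tree[2], prefix + (2,))
    return found


def get_at(tree, path):
    """Return the subtree found by following path from the root."""
    for index in path:
        tree = tree[index]
    return tree


def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the subtree at path swapped for new_subtree."""
    if not path:
        return new_subtree
    children = list(tree)
    children[path[0]] = replace_at(tree[path[0]], path[1:], new_subtree)
    return tuple(children)


def mutate(tree, rng):
    """Replace one randomly chosen subtree with a random leaf."""
    leaf = rng.choice(["x", rng.randint(0, 9)])
    return replace_at(tree, rng.choice(paths(tree)), leaf)


def crossover(mother, father, rng):
    """Graft a random subtree of father onto a random position in mother."""
    donor = get_at(father, rng.choice(paths(father)))
    return replace_at(mother, rng.choice(paths(mother)), donor)


rng = random.Random(0)
square_plus_one = ("add", ("mul", "x", "x"), 1)  # x*x + 1
double = ("mul", 2, "x")                         # 2*x
child = crossover(square_plus_one, double, rng)
print(child, evaluate(child, 3))
print(mutate(square_plus_one, rng))
```

Even in this toy the hard part is visible: almost every random change yields a valid program, but very few yield a better one, which is why specialized analysis of the program's structure looks so valuable.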
Reasoning about schema representing computer programs requires
a lot of specialized intuition, and specialized preprocessing
may well be useful here, such as for instance the automated
analysis and optimization of program execution flow being
done in Val Turchin and friends’ Java supercompilation
project [http://www.supercompilers.com]. There is a lot
of work here, but it’s a fascinating direction, and
a necessary one.

Call us mad scientists if you will, but all of us involved in
the project believe that the Novamente system, once fully implemented
and tested, will lead to a computer program that manifests
intelligence, according to the criterion of being able to
carry out conversations with humans that will be subjectively
perceived as intelligent. It will demonstrate an understanding
of the contexts in which it is operating, an understanding
of who it is and why it is doing what it is doing, an ability
to creatively solve problems in domains that are new to
it, and so forth.
And of course it will supersede human intelligence in some
respects, by combining an initially probably modest general
intelligence with capabilities unique to digital computers
like accurate arithmetic and financial forecasting.
We believe we’ve covered all the bases: every major
aspect of the mind studied in psychology and brain science.
They’re all accomplished together, in a unified framework.
It’s a big system, it’s going to demand a lot
of computational resources, but that’s really to be
expected; the human brain, our only incontrovertible example
of human-level intelligence, is a complex and powerful information-processing
device.
Not all aspects of the Novamente system are original in
conception, and indeed, this is much of the beauty of the
thing. The essence of the system is the provision of an
adaptable self-reconstructing platform for integration of
insights from a huge number of different disciplines and
subdisciplines. In Novamente, aspects of mind that have
previously seemed disparate are drawn together into a coherent
self-organizing whole.
The cliché Newton quote, “If I’ve seen
further than others, it’s because I’ve stood
on the shoulders of giants,” inevitably comes to mind
here. (As well as the modification I read somewhere: “If
others have seen further than me, it’s because giants
were standing on my shoulders.”….) The human
race has been pushing toward AI for a long time –
Novamente, if it lives up to our aspirations for it, will
merely put on the finishing touches.
While constructing an ambitious system like this naturally
takes a long time, we were making steady and rapid progress
until Webmind Inc.’s dissolution in early 2001. It
seems Arthur C. Clarke was off by a bit -- Webmind won’t
be talking like HAL in the film 2001 until a bit later in
the millennium. But we’re currently scraping by with
a small team, and making significant and steady progress.
Accurate timing estimates remain difficult to make, but
if we manage to keep well enough funded to keep the current
team full-time on the project, we believe Novamente’s
first moderately intelligent conversations will take place
sometime in the next few years … and that’s
going to be (to use the technical term) pretty bloody cool!
What are the complaints and counterarguments most often
heard when discussing the Novamente project with expert
outsiders? We’ve already discussed some of these above.
First, there are those who just don’t believe AI is
possible, or believe that AI is only possible on quantum
computers, or quantum gravity computers, etc. Forget about
them. They’ll see. You can’t argue anyone out
of their religion. Science is on the side of digital AI
at this point, as has been exhaustively argued by many people.
Then there are those who feel the system doesn’t go
far enough in some particular aspect of the mind: temporal
or causal reasoning, or grammar parsing, or perceptual pattern
recognition, or whatever. This complaint usually comes from
people who have a research expertise in one or another of
these specialty areas. The Novamente system’s general
learning algorithms, they say, will always be inferior to
the highly specialized techniques that they know so well.
My feeling is that the current Novamente design is about
specialized enough. I don’t think it is so specialized
as to become brittle and non-adaptable, but I worry that
if it becomes more specialized, this will be the case.
My intuition is that things like temporal and causal reasoning
should be learned by the system as groundings of the concepts
“time” and “cause” and related concepts,
rather than wired in.
On the other side, there are those who feel that the system
is “too symbolic.” They want something more
neural-netish, or more like a simple self-modifying system
as I described in Chaotic Logic and From Complexity to Creativity.
I can relate to this point of view quite well, philosophically.
But a careful analysis of the system’s design indicates
that there is nothing a more sub-symbolic system can do
that this one can’t. We have SchemaNodes embodying
Boolean networks, feeding input into each other, learning
interrelationships via neural-net-like mechanisms such as
Hebbian learning, and being evolved by a kind of evolutionary-ecological
programming. This is in fact a sub-symbolic network of procedures,
differing from an evolutionary neural net architecture only
in that the atomic elements are Boolean operators rather
than threshold operators – a fairly insubstantial
difference which could be eliminated if there were reason
to do so. The fact that this sub-symbolic evolving adaptive
procedure network is completely mappable into the symbolic,
inferential aspect of the system is not a bad thing, is
it? In fact, I would say that in the Novamente design we
have achieved a very smooth integration of the symbolic
and subsymbolic domains, even smoother than is likely to
exist in the human brain. This will serve the system well
in the future.
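The claim that Boolean operators and threshold operators differ only insubstantially can be checked on a small case: two-input AND and OR are each a single threshold unit with unit weights. The sketch below uses textbook definitions only, not Novamente's actual node types, and the names `threshold_unit` and `hebbian_update` are hypothetical.

```python
from itertools import product


def threshold_unit(weights, threshold):
    """Return a function that fires (1) when the weighted input sum reaches threshold."""
    def fire(*inputs):
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0
    return fire


# Two-input AND and OR expressed as threshold units with unit weights.
and_unit = threshold_unit([1, 1], threshold=2)
or_unit = threshold_unit([1, 1], threshold=1)


def hebbian_update(weight, pre, post, rate=0.1):
    """Textbook Hebbian rule: strengthen a link when both of its ends are active."""
    return weight + rate * pre * post


# Print the truth tables so the match with Boolean AND / OR can be read off.
for a, b in product([0, 1], repeat=2):
    print(a, b, and_unit(a, b), or_unit(a, b))
```

Not every Boolean function fits in one unit (XOR needs a small network of them), but a network of threshold units can reproduce any Boolean network, which is the sense in which swapping one kind of atomic element for the other changes little.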
There’s the complaint that Baby Novamente won’t
have a rich enough perceptual environment with just the
Internet. Maybe. Maybe we’ll need to hook up eyes
and ears to it. But there’s a hell of a lot of data
out there, and the ability to correlate numerical and textual
data is a good analogue of the cross-modal sensory correlation
that is so critical to the human brain. I really believe
that this complaint is just plain old anthropomorphism.
There’s the complaint that there are too many parameters
and it will take forever to get it to actually work, as
opposed to theoretically working. This is indeed a bit of
a worry, I can’t deny it. But we’ve gone a long
way by testing and tuning the individual modules of the
system separately, and so far our experience indicates that
the parameter values giving optimal function for independent
activity of a mind module are generally at least acceptable
values for the activity of that mind module in an integrated
Novamente context. A methodology of tuning parameters for
subsystems in isolation, then using the values thus obtained
as initial points for further dynamic adaptation, seems
very likely to succeed in general just as it has in some
special cases already.
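The two-step methodology just described (tune each subsystem alone, then use those values as starting points for adaptation in the integrated system) can be sketched as a toy. Everything numeric here is invented for illustration: the module names, the score functions and the shared-resource penalty are assumptions, not measurements from Novamente.

```python
def isolated_score(module, value):
    """Hypothetical stand-alone quality of a module at a given parameter value."""
    best = {"reasoning": 3, "language": 7}[module]
    return -(value - best) ** 2


def integrated_score(params):
    """Hypothetical whole-system quality: module scores minus a shared-resource penalty."""
    total = sum(isolated_score(m, v) for m, v in params.items())
    overload = max(0, sum(params.values()) - 8)  # modules compete for 8 resource units
    return total - 3 * overload


def tune_in_isolation(modules, candidates):
    """Step 1: pick each module's best value with the other modules ignored."""
    return {m: max(candidates, key=lambda v: isolated_score(m, v)) for m in modules}


def adapt_jointly(params, steps=50):
    """Step 2: hill-climb the integrated score, starting from the isolated optimum."""
    params = dict(params)
    for _ in range(steps):
        neighbours = [dict(params, **{m: params[m] + d}) for m in params for d in (-1, 1)]
        best = max(neighbours, key=integrated_score)
        if integrated_score(best) <= integrated_score(params):
            break  # no single-step change helps any more
        params = best
    return params


start = tune_in_isolation(["reasoning", "language"], range(0, 11))
final = adapt_jointly(start)
print(start, integrated_score(start))
print(final, integrated_score(final))
```

In this toy the isolated optimum is acceptable but not ideal once the modules compete for resources, and a couple of small adaptive steps repair it. That is the pattern we are betting on at full scale, though a toy with two parameters obviously proves nothing about a system with hundreds.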
Finally, there are those who reckon the design is about
right, but we just don’t have the processing power
and memory to run it, yet. This complaint scares me a little
bit too. But not too much. Based on our experimentation
with the system so far, there are only two things that seem
to require vastly more computer power than is available
on a cluster of a few dozen powerful PC’s. The first
of these, the learning of new procedures for acting appropriately
in various situations (“schema learning,” in
our lingo) is something that can be done offline, running
in the background on millions of PC’s around the world
via WebWorld. And the second, real-time conversation processing,
can likely be carried out on a single supercomputer, serving
as the core of the AI Engine cluster. We have a very flexible
software agents system that is able to support a variety
of different hardware configurations, and we believe that
by utilizing available hardware optimally, we can make a
fairly smart computer program even without the massive advances
that Moore’s law will quickly bring. Of course, the
more hardware we have, the cleverer our system will become…
and soon enough it will be literally begging us for more,
more, more!

I’ll close this chapter with a quote
I found in the book “Conversations with a Mathematician”
by algorithmic information pioneer Gregory Chaitin. Chaitin
is not an AI researcher, although his mathematical work
was inspirational for some parts of the Novamente design,
and in his career at IBM he has kept apprised of their diverse
AI work. When an interviewer asked him about AI and its
relationship with some mathematical ideas, he said:
“[M]y personal opinion is that AI is not a
mathematical problem, it’s an engineering problem….
To me a human being is just a very complicated piece of
engineering that’s exquisitely well-suited for surviving
in this world….
“[I]t’s very often the case that theoreticians
can show that in theory there’s no way to solve
a problem, but software engineers can find a clever algorithm
that usually works, or that usually gives you a good approximation
in a reasonable amount of time. And I think that human
intelligence is also a little bit like that, and that
it’s a matter of creeping up on it little by little,
a step at a time, until we can usually do a good job imitating
it.
“In fact I think that we may be almost halfway there,
only we don’t realize it, and that fifty years from
now we’ll be close to a real AI, and then people
will wonder why anyone ever thought that it was difficult
to create an AI. This AI won’t be the result of
a theorem, it’ll be a mountain of work, a giant
engineering project that was built piece by piece, little
by little, just like what happens in Nature. As the biologists
say, God is a tinkerer, he cobbles things together, he
patches things up, he makes do with what he has to create
new forms of life by experimenting with sloppy little
changes one step at a time….
“We humans aren’t artistic masterpieces of
design, we’re patched together, bit by bit, and
retouched every time that there’s an emergency and
the design has to be changed! We’re strange, awkward
creatures, but it all sort of works! And I think that
an AI is also going to be like that….
“[A] working AI is going to be like some kind of
Frankenstein monster that’s patched together bit
by bit until one day we realize that the monster sort
of works, that it’s finally intelligent enough!”
I think he overstates the case a little bit. There is a
kind of elegance and order to complex adaptive systems with
emergent behavior, which is different from the elegance
and order in modern mathematics. But still, I like his articulation
of a point that has always seemed to me a piece of “AI
common sense,” but that yet seems to elude most academic
AI theorists. Building a mind is hard. But so was building
the Apollo rocket, so was building a computer. The magic
that we subjectively feel – we minds, we emergent
patterns in our brains – is mind-level magic, not
brain-level magic. The presence of this subjective experiential
consciousness-magic doesn’t imply that it takes magic
to build digital brains. Building a brain is hard work,
requiring a lot of really smart people collaborating together,
and the integration of results from many different kinds
of science. But this kind of hard work is exactly what the
ongoing sci-tech revolution is all about.
When I forwarded this Chaitin quote to some friends, one
person replied, basically: “But that doesn’t
say anything! Of course, anyone who believes that the mind
is a machine, automatically believes building a digital
mind is an engineering problem.”
Of course, there is a little truth to this retort. Chaitin
actually made the quoted statement in response to a question
about Roger Penrose’s claim that the mind is not a
machine in the standard sense, but rather some kind of cosmically
nonlocal nondeterministic quantum gravity system. However,
I think there is a little more substance to Chaitin’s
view than just “mind is mechanical.”
Yes, of course everyone who believes that the mind is a
machine believes that there is an engineering problem involved
in building a mind. But the question is: is the problem
primarily one of engineering, or primarily one of mathematics,
or primarily one of neuroscience, or primarily one of cognitive
psychology, etc.?
I know a few individuals who believe the mind is a machine,
but who believe there is some simple mathematical trick
underlying mind operations, and that if we just find this
trick, then creating an artificial mind will be easy. So
they believe that figuring out the math is the main thing
required. My former Webmind collaborators Shane Legg and
Youlian Troyanov (commonly known as the “Bulgarian
Madmind”) have at various times held this opinion
with varying levels of strength.
Ray Kurzweil and others believe that the main problem is
figuring out exactly how the human brain works. Once this
is known, they reckon, it's just a matter of emulating the
brain on a sufficiently powerful computer, using a simple
neural simulator program (feeding in the exact distribution
of neurons, neurotransmitters, synapses, etc. as inputs).
Gregory Chaitin, on the other hand, is making the statement
that mind is mechanical, but he's also making the statement
that the task of constructing a thinking machine requires
primarily engineering-type thinking.
Chaitin's view is a bit like that of Danny Hillis, who stated
that he thinks intelligence is just "a lot of little
things all working together." Marvin Minsky's Society
of Mind theory of AI is somewhat in this direction as well.
These guys don't place much stock in emergence, and in the
need for different structures and dynamics to be exquisitely
harmonized together.
Also, it should be made clear that when Chaitin contrasts
engineering with mathematics, he is taking a pure mathematician’s
view of mathematics. In my own AI work, I have not yet had
the opportunity to apply "deep math." There have
been no profound theorems proved about Novamente or Webmind
components, critical to the AI work. On the other hand,
there have been plenty of applications of known math, e.g.
probability theory, combinatory logic, nonlinear dynamics,
and so forth. To a real mathematician like Chaitin, working
out fairly straightforward applications of known math is
not "doing math." I’m well aware of the
pure mathematician’s attitude, having started out
my career in this environment; and when I first started
I hoped that pure math could provide the solution to AI
– just prove the “Fundamental Theorem of Mind”
and you’re in like Flynn … real AI is yours.
But it doesn’t seem to work that way; mind isn’t
that sort of thing.
Of course, the “it’s all engineering,”
“it’s all neuroscience and fast hardware”
and “it’s all the right math formula”
type views are extremes. Most of us involved in Real AI
theory or practice probably hold views that are intermediate
between these extremes. It's the extreme views that get
remembered and propagated because they're so compact to
state. My own view is an intermediate one: I think it takes
a mixture of philosophy, neuroscience, math, engineering
and psychology. When I started out I underestimated the
importance of the "engineering" part, but recognizing
that importance doesn't mean denigrating the importance
of the other aspects. It is precisely the need for integrative
input from so many domains of inquiry that makes digital
mind design (as opposed to simple digital emulation of the
brain cell by cell or molecule by molecule) so hard –
and so delightful.