Coauthored with Ted Goertzel
I remember heading home from college for spring break in
1983, toward the end of my freshman year. I’d just
recently turned 16, and I’d been thinking about AI
a hell of a lot – even more than about my new girlfriend,
Rachel Gordon, whom I was pretty darn crazy about at the
time. A few days before spring break I’d tried to
explain my theories on artificial intelligence to my friend
Ken Silverman. Ken couldn't understand what I was talking
about, so I promised him I'd work on it over spring break,
and that when I got back to school I’d explain how
it all worked and give him a complete design for
a thinking computer program. I had the idea clear in my
head, but I was totally unable to articulate it in a way
that Ken or anyone else could understand. I spent the whole
break working on it, and during those few days I basically
worked out the ideas that I’d put into my first
book, The Structure of Intelligence, six years later. I went
through every aspect of the mind – reason, memory, aesthetics,
intuition, emotion, etc. – and convinced myself that every
one could be expressed in terms of pattern recognition and
pattern formation. The mind, I concluded, was a pattern
recognition system that recognized patterns in the world
around it, and – very crucially – also recognized
patterns in itself. Recognizing patterns in itself, it formed
patterns within itself, continually giving rise to new structures.
After the break, I still wasn’t able to explain my
realizations to Ken in a way that made sense to him, but
at least things made a little more sense to me. I knew I
had to find a mathematical language to make sense of my
intuitions, or I’d never be able to communicate them
to anyone, let alone program them on a computer. My grasp
of software design at this stage was extremely weak; it
was formed mainly by programming games in BASIC. I was nowhere
near having the skills to design a general pattern recognition
system that recognized patterns in itself and adjusted itself
accordingly.
Ken's dad was an extremely smart guy and a prolific and
successful inventor, mostly in the area of electrical engineering;
and in our college years, Ken often fantasized about becoming
a rich inventor and building a mansion, with a basement
laboratory in which we’d putter around day and night,
wiring together intelligent robots and time machines and
so forth. So it’s pretty funny that 14 years later
when I decided to start an AI company (Intelligenesis Corp.,
later renamed Webmind Inc.) I somehow happened to turn to
Ken when I needed someone to take over the job of programming
my AI system, the Webmind AI Engine.
I hadn't spoken to him for years – he’d stayed
in the New York area, where he had grown up, whereas I’d
moved all over the world, teaching in universities in Las
Vegas, New Zealand and Australia. After getting his degree
in electrical engineering, he’d done a lot of different
things, including real estate and computer programming.
He was really psyched to finally get the chance to collaborate
with me on my thinking machine project. Finally, after a
decade and a half, I had figured out how to express my plan
for artificial intelligence in a way Ken could understand!
Ken was the lead engineer at Webmind Inc. for its first
couple years, and VP of Technology for the entire lifetime
of the firm. Now I’m working with a different crew
of engineers, and Ken is working on his own advanced pattern
recognition software, but we’re still good friends,
and he definitely played an important role in the evolution
of my work.
To articulate my vision of the mind in a comprehensible
form was much much harder than I’d ever thought it
would be. It turned out that the vocabulary for expressing
what I wanted to say didn’t really exist in the field
of computer science. To find the language I needed to express
my ideas and to work out the details, I had to step a long
way back from the world of computers and get deeply into
the philosophy of mind. Although I was very young then,
and even more naïve than I am now, I realized intuitively
that it was necessary to get the philosophy right before
proceeding to the computational details. Now, I’m
jaded by a fair amount of practical experience – though
I don’t have a head full of gray hair yet –
and I see this far more clearly than I did then. In implementing
a general vision of how the mind works, it’s very
easy to be misled by the nature of contemporary computer
hardware and programming languages, and to wind up implementing
things that subtly deviate from the vision one started out
with. The way to avoid this is to have the conceptual, philosophical
vision very firmly fixed in one’s mind as one sets
about the detailed design work, which is huge and at times
confusing.
What I’m going to give you in this chapter is a fairly
sketchy, but hopefully evocative, overview of the process
of creating Webmind and then Novamente. The two AI systems
are very different on a technical level, but on the level
of a popular exposition like this one, the differences are
really pretty small. Novamente uses more sophisticated mathematics
and more efficient software structures to implement the
same basic concepts that Webmind did. To keep things from
getting confusing, I’ll write mostly about “Novamente”
here, except where I’m talking historically about
the creation of Webmind in particular; but most of what
I’ll say about Novamente also applies to Webmind.
The Novamente project is far from complete. Just like every
other AI researcher, I’m an abject failure so far
– I haven’t yet created a software program displaying
human-level general intelligence. Unlike most other AI researchers,
however, I and my colleagues honestly believe we are on
a path that will lead us to success at this ambitious goal.
I don’t expect to convince you one way or another
in these pages – my hope is merely that the story
of our quest may be an interesting one … and that
some of the lessons we’ve learned along the way may
be of general value.

I knew from the start that I didn’t want to build an
artificial idiot savant – an overspecialized, brittle
system as was typical in the AI field. I wanted to build
a mind.
But what is a mind, anyway?
During that spring break of my freshman year, which I spent trying
to figure out how to explain my vision of the mind to Ken,
I arrived at a basic working definition of the mind: a mind
is the set of patterns in an intelligent system.
Your mind is not your brain, nor is it some disembodied
soul somehow exchanging messages with your brain. Your mind
is the set of patterns in your brain – the structures
and processes in your brain, so that knowing these structures
and processes allows you to explain the brain more simply
than just listing the parts of the brain and their positions
and states over time.
Novamente’s mind is not the C++ code that my engineering
team and I type in – that’s just a code for
creating the mind, a little like DNA is the code for creating
a human. Novamente’s mind is the set of patterns in
the billions of 0’s and 1’s existing in RAM
while Novamente runs, cycling through the machine’s
processors and passing through the network cables. These
0’s and 1’s themselves are not Novamente’s
mind – it’s the patterns in these 0’s
and 1’s, the static and dynamic patterns, that are
mind. Mind is a set of patterns in a system that achieves
highly patterned goals in a highly patterned environment.
Everything is pattern, pattern, pattern!
Mind recognizes and creates patterns in the world and itself,
achieving complex goals, goals whose definition involves
a great deal of pattern.
Although these ideas were clear to me intuitively in 1983,
it wasn’t till 1990 or so that I was able to write
them down in a clear and comprehensible way. This is what
I did in the first few chapters of my first book, The Structure
of Intelligence. At that point I had gotten my PhD in mathematics
and was supposed to be doing mathematical research, but
just as I’d always been more interested in my own
reading and thinking than in my schoolwork, now I was spending
my time thinking about pattern and mind and the nature of
the universe, instead of proving math theorems like a good
assistant professor. The next step was to ask the question:
What are the principles by which a set of patterns, a mind,
can actually be intelligent? For sure, the precise structures
and dynamics are going to vary from one mind to the next,
but are there any general principles, applicable to every
kind of intelligent system, be it a human, a dolphin, a
computer program, an intelligent gas cloud on Jupiter? It’s
not totally obvious that there are such principles, but
my belief starting out was that such general principles
had to exist. What are the principles by which mind’s
core algorithm – pattern recognition and formation in itself
and the world – is self-regulated?
One general principle is what the 19th-century American
philosopher Charles Peirce called the “One Law of
Mind”: that things in the mind tend to spread attention
to other related things in the mind. This is a basic principle
for attention allocation, that we can see in the brain in
the diffusion of electricity. Novamente incorporates this
via activation spreading similar to that in a neural network.
This is what I call a “heterarchical” principle
– where a heterarchy just means a sprawling network
in which each element connects to a few other elements,
without a hierarchical structure. A random network in which
each node connects to a set of other nodes at random is
a heterarchy.
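To make the attention-spreading idea a little more concrete, here is a minimal Python sketch of activation spreading over a random heterarchy. It is an illustration only, not Novamente code: the function names, the spreading rate of 0.5 and the equal-shares rule are all assumptions made up for this example.

```python
import random

def random_heterarchy(n_nodes, links_per_node, seed=0):
    """Build a random network: each node links to a few other nodes at random."""
    rng = random.Random(seed)
    graph = {}
    for node in range(n_nodes):
        others = [m for m in range(n_nodes) if m != node]
        graph[node] = rng.sample(others, links_per_node)
    return graph

def spread_activation(graph, activation, rate=0.5):
    """One step: each node keeps (1 - rate) of its activation and
    shares the rest equally among the nodes it links to (assumed rule)."""
    new = {node: (1.0 - rate) * a for node, a in activation.items()}
    for node, a in activation.items():
        neighbours = graph[node]
        if not neighbours:
            new[node] += rate * a  # nothing to spread to, so keep it
            continue
        share = rate * a / len(neighbours)
        for m in neighbours:
            new[m] += share
    return new

graph = random_heterarchy(n_nodes=8, links_per_node=2)
activation = {node: 0.0 for node in graph}
activation[0] = 1.0  # attention starts on a single node
for _ in range(3):
    activation = spread_activation(graph, activation)
```

After a few steps the attention that started on one node has leaked out to related nodes, while the total amount of attention in the network stays the same.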
Hierarchy is another important structure of the mind. We
see it in the human brain all over the place, most famously
in the visual system, where we have a hierarchy of progressively
more abstract processes, starting with recognition of lines
and edges, then shapes, then 3-D forms, and so forth. Hierarchy
in the mind has to do with increasing abstraction, and with
control that’s aligned with abstraction, so that processes
dealing with more abstract things control related processes
dealing with more concrete things.
A general principle that I’ve thought a lot about
– and that I wrote about in my second book, The Evolving
Mind – is what I call the “dual network”:
the interpenetration of hierarchy and heterarchy.
In the mind, hierarchy and heterarchy overlap each other,
and the dynamics of the mind is such that they have to work
well together or the mind will be all screwed up. The overlap
of hierarchy and heterarchy gives the mind a kind of “dynamic
library card catalog” structure, in which topics are
linked to other related topics heterarchically, and linked
to more general or specific topics hierarchically. The creation
of new subtopics or supertopics has to make sense heterarchically,
meaning that the things in each topic grouping should have
a lot of associative, heterarchical relations with each
other. In Novamente, this general “dual network”
principle is reflected in many ways, when one gets down
into the details of its various dynamical processes.
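The “library card catalog” picture can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Novamente does it: the topic names, the `cohesion` measure (fraction of member pairs that are directly associated) and the 0.6 threshold are all invented for illustration.

```python
from itertools import combinations

# Heterarchical (associative) links, stored symmetrically.
related = {
    "cat": {"dog", "mouse"},
    "dog": {"cat", "mouse"},
    "mouse": {"cat", "dog", "keyboard"},
    "keyboard": {"mouse"},
}

# Hierarchical links: topic -> supertopic.
parent = {}

def cohesion(members, related):
    """Fraction of member pairs that are directly associated."""
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if b in related.get(a, set()))
    return linked / len(pairs)

def propose_supertopic(name, members, threshold=0.6):
    """Create the supertopic only if it makes sense heterarchically,
    i.e. its members are well associated with each other."""
    if cohesion(members, related) >= threshold:
        for m in members:
            parent[m] = name
        return True
    return False

accepted = propose_supertopic("animal", {"cat", "dog", "mouse"})
rejected = propose_supertopic("misc", {"cat", "dog", "keyboard"})
```

Here the “animal” grouping is accepted because its members are all associated with one another, while the “misc” grouping is turned down because most of its members have nothing to do with each other.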
Another general principle is self: that minds contain parts
of themselves that mirror the whole. This gives a quasi-fractal
structure to the mind.
Another general principle, also discovered by Charles Peirce,
is that there are three kinds of reasoning: induction, abduction,
and deduction. These are all ways of manipulating hierarchical
relationships. Hierarchy is about logic, whereas heterarchy
is about the spread of attention and the formation of wholes.
Once heterarchy has led to the formation of new wholes,
corresponding to clusters of things that all relate to each
other, these new wholes can be dealt with hierarchically;
they can be reasoned about. I was very fortunate, a month
after Intelligenesis got our seed funding, to get a job
application from Pei Wang, who had worked out a neat computational
reasoning system (NARS) based on the three forms of reason
that I, following Peirce, had identified as essential to
the mind.
There are also two dynamics that I believe are generally
part of mind. These correspond to the basic philosophical
principles of Being and Becoming.
Becoming corresponds to evolution, considered most generally
as the survival of the fittest members of a population,
and the reproduction of the survivors to form new population
elements. Novamente contains explicitly evolutionary components
– variations on the computational technique called
“genetic programming.” It also contains other
components that aren’t traditionally viewed as evolutionary,
but really are. For instance, Novamente’s reasoning
module involves logical relations (we call them “links”)
that combine with each other to create new logical relations.
The facts “pigs are fat” and “fat creatures
are ugly” combine to create the new relation “pigs
are ugly.” And in the reasoning system, unimportant
relations are deleted to save memory. Thus, we have survival
of the fittest, where fitness means importance to the system,
and we have reproduction of the survivors, via the rules
of inference. Reasoning is seen to be a form of evolution,
in the general sense.
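Here is a rough Python sketch of reasoning viewed as evolution, using the pigs example. It is a simplification for illustration, not Novamente’s actual inference rules: multiplying strengths, the hand-assigned `importance` scores and the fixed memory `capacity` are all assumptions.

```python
def deduce(links):
    """Reproduction: combine 'A is B' and 'B is C' into a new link 'A is C'.
    The strength of the child is the product of its parents (assumed rule)."""
    new_links = {}
    for (a, b), s1 in links.items():
        for (b2, c), s2 in links.items():
            if b == b2 and a != c and (a, c) not in links:
                strength = s1 * s2
                new_links[(a, c)] = max(strength, new_links.get((a, c), 0.0))
    return new_links

def forget(links, importance, capacity):
    """Survival of the fittest: keep only the most important links."""
    ranked = sorted(links, key=lambda link: importance.get(link, 0.0), reverse=True)
    return {link: links[link] for link in ranked[:capacity]}

links = {
    ("pigs", "fat"): 0.9,   # "pigs are fat"
    ("fat", "ugly"): 0.8,   # "fat creatures are ugly"
    ("pigs", "pink"): 0.7,
}
links.update(deduce(links))  # adds ("pigs", "ugly")

# Hypothetical importance scores; fitness means importance to the system.
importance = {
    ("pigs", "fat"): 1.0,
    ("fat", "ugly"): 0.9,
    ("pigs", "ugly"): 0.8,
    ("pigs", "pink"): 0.1,
}
links = forget(links, importance, capacity=3)
```

One round of inference creates “pigs are ugly”, and one round of forgetting deletes the least important relation to stay within the memory budget.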
Being corresponds to what system theorists call “autopoiesis”
– an obscure word that has a very useful meaning.
It means self-production. Every cell in the body is produced
by other cells in the body – so the body is a self-producing
system. The mind is also a self-producing system. This is
basically the theme of my third book, Chaotic Logic. If
you remove part of the mind, the other parts of the mind
that relate to it will be able to reproduce it, approximately
if not exactly. If you take out the logical relation “pigs
are ugly” for example, the system may be able to regenerate
it by inference from the other relations “pigs are
fat” and “fat creatures are ugly.” It
may come out with a different strength than it had before,
but it will still be reproduced, perhaps lossily. If you
take out all memory of the text “War and Peace”
from the mind, but retain a lot of related knowledge, this
related knowledge will cause the system to want to read
War and Peace, which eventually will likely lead the information
about the text to be regenerated. In this case, interaction
with the environment is part of the mind’s autopoietic
dynamics.
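A toy Python version of this self-repair, again with an assumed strength rule (the product of the two supporting strengths) and made-up numbers, might look like this:

```python
def regenerate(links):
    """Re-derive missing 'A is C' links from surviving 'A is B' and 'B is C' links."""
    restored = dict(links)
    for (a, b), s1 in links.items():
        for (b2, c), s2 in links.items():
            if b == b2 and a != c and (a, c) not in restored:
                restored[(a, c)] = s1 * s2  # assumed strength rule
    return restored

mind = {
    ("pigs", "fat"): 0.9,
    ("fat", "ugly"): 0.8,
    ("pigs", "ugly"): 0.95,  # originally held with its own strength
}

# Remove one relation, then let the remaining relations reproduce it.
damaged = {link: s for link, s in mind.items() if link != ("pigs", "ugly")}
healed = regenerate(damaged)
```

The relation comes back, but with a strength inferred from its neighbors rather than the one it had before: reproduced, though lossily.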
Evolution changes the system in accordance with its goals
and its environment; autopoiesis keeps the system the same
as it was before. The mind needs both of these forces; they
need to be properly balanced. The balance of these leads
to productive creativity, and this was the main theme of
my fourth book, From Complexity to Creativity.
I arrived at my list of general principles of the mind by
a kind of unholy combination of introspection, mathematical
analysis, and survey of biology, psychology and computer
science. I spent a long time trying to prove mathematically
that all these general structures and dynamics, and a few
others, were necessary and sufficient for mind – any
system having them would have a mind, and any system not
having them couldn’t have a mind. But eventually I
gave up; I decided that the mathematics of today is not
adequate for proving this kind of thing. I gathered my various
insights and intuitions and conclusions about how the mind
worked, and gave the list a name: the psynet model of mind.
Psynet = “mind-network”, a theory of the mind
as a network of interacting, intertransforming agents. I
realized that the conceptual picture of the mind that I’d
developed was of significant value in itself, apart from
any mathematical formalization I might give it. No one else
working in the AI field seemed to me to have a similarly
comprehensive and powerful conceptual analysis of the mind.
I still think inventing the needed mathematics to usefully
and completely formalize the psynet model is an interesting
challenge – but it’s not as interesting to me
right now as using my intuitions about the general structures
of intelligence to build thinking software.
The general structures and dynamics of the “psynet
model” can be manifested in many many different ways,
in different systems. The process of building Webmind, and
then Novamente, has been in this sense a top-down process.
I started out with an idea about what general principles
had to emerge from the system to make it intelligent, and
this placed a constraint on what the system had to be like.
It had to be built so as to make the right general structures
and dynamics emerge. Aside from that, I didn’t care
very much exactly what the system was like. I had, and still
have, an attitude of being willing to learn via experimentation
in this regard.

My first serious attempt to build a real AI system (earlier
chatbots and abortive experiments not counted) occurred
in 1994. I used a programming language called Gofer, which
I later benchmarked at 1/10,000 the speed of C (the standard
programming language in the commercial world). Gofer was
a beautiful language, which matched up nicely to my vision
of the mind. This program was called Antimagicians; it was
a population of actors called magicians, and antimagician
actors that annihilated the magicians in complex patterns.
Just about all it ever did was produce a type of error called
a “stack overflow.” This was a shame, because
my model of mind was very simple and compact in this programming
language. But it could only run on one machine, and it ran
incredibly slowly; the only thing it did fast was use up
all the machine’s memory.
Gofer was a “functional” programming language,
meaning not that it performed useful functions (far
from it!), but rather that it was based on the mathematical
concept of a “function.” Gofer was basically
equivalent to mathematics. It appealed to my sense of formal
elegance; it was perfect in the sense of a Bach fugue. Unfortunately,
though, functional languages do not match well to the von
Neumann computer architecture, so it is very hard to make
them efficient without special hardware. After the debacle
of my stack-overflowing proto-AI system, I abandoned Gofer
and turned back to C++, and then to the new programming
language Java. But I restricted myself to more modest programming
experiments. I made a C-language version of Antimagicians,
which was much simpler and less interesting than the Gofer
version. In Java, I made a genetic algorithm that ran on
multiple machines (coded together with Rosalind Barr at
University of Western Australia), and a simple actors-based
search engine (coded together with Mark Messenger, also
at UWA). I could see from this experience that, while my
AI system in Gofer had been small, because Gofer was made for
expressing systems that refer to and organize themselves,
a comparable system in C++ or Java or any other practical
programming language was going to be huge. It took a couple
years for me to summon the guts to attempt such a thing.
One thing that occurred to me as I started to think about
implementation issues, much more than it had in my days
as a pure theorist, was the crucial role of specialization.
My Gofer-based mind had been theoretically capable of intelligence;
it was a general system for recognizing and forming patterns
in itself and its environment. But its generality didn’t
allow it to solve any particularly useful problems within
practical time and space constraints. In that sense, it
had been a miserable failure as an intelligent system. In
practice, I concluded, to get reasonably efficient intelligence
one needs to code specialized cognition algorithms, aimed
at recognizing patterns in particular kinds of data, learning
how to carry out particular kinds of actions, and so forth.
The brain is very much like this: we have 30% of our brain
specialized for visual pattern recognition; regions specialized
for language; regions specialized for body sensations; regions
specialized for social interaction; etc. etc. And then we
have a little bit of general intelligence, which is what
makes us uniquely brilliant among the animal kingdom –
but this general intelligence relies on all the specialized
stuff to give it a meaningful context within which to operate.
Specialization needs to be mediated by rich interaction
between specialized parts. The different specialized parts
of a system need to learn from each other, and learn about
the world together whenever they can. The integration of
various specialized pattern recognition subsystems has played
a huge role in practical Webmind engineering.
Because of all this specialization, it seemed to me in 1994
and 1995 that there was no way to build a thinking computer
program on contemporary computer hardware. It seemed to
me that some kind of humongous brainlike supercomputer would
be necessary. And then I discovered the Internet (unlike
Al Gore, I didn’t invent it!). It struck me that the
millions, soon billions, of machines around the world, all
hooked together on the Net, had enough memory and processor
power to create a real computational intelligence. The Java
programming language came out in 1995 and it seemed the
right tool to use to create a networked AI engine embodying
the general principles of mind: recognizing and creating
patterns in itself and the world, using a variety of specialized
methods integrating together into a whole, an evolving autopoietic
whole.
Not only did the Internet give you the computational power
to build a thinking machine; it also provided a really rich
perceptual environment. A mind can’t exist in isolation;
it has to achieve complex goals in a complex environment.
The physical world is obviously complex but building a robot
body is another huge project, comparable in scope to building
a mind. The Internet is arguably rich enough in diverse
details to support intelligence, and it’s a lot easier
to hook your AI system into the Internet than to build it
a robot body. I made up my own complex goal: To build an
AI system whose body was part of the Net, and whose perceptual
world was the Net itself, the Web. A mind for the Web; a
Webmind.
In terms of the conception of intelligence as “achieving
complex goals in complex environments,” the goals
I had in mind when designing the Webmind system were roughly:
* Conversing with humans in simple English, with the goal
not of simulating human conversation, but of expressing
its insights and inferences to humans, and gathering information
and ideas from them.
* Learning the preferences of humans and AI systems, and providing
them with information in accordance with their preferences.
Clarifying their preferences by asking them questions about
them and responding to their answers.
* Communicating with other AI systems, in a manner similar
to its conversations with humans, but using a mixture of
human language and a more formalized and precise computerized
language we have created, called Sasha.
* Composing knowledge files containing its insights, inferences
and discoveries, expressed in Sasha or in simple English.
* Reporting on its own state, and modifying its parameters
based on its self-analysis to optimize its achievement of
its other goals.
Of course, my ambitions didn’t end there – that
would be wimpy. Subsequent versions of the system were intended
to offer enhanced conversational fluency, and enhanced abilities
at knowledge creation, including theorem proving, scientific
discovery and the composition of knowledge files consisting
of complex discourses. And then of course the holy grail:
progressive self-modification, leading to exponentially
accelerating artificial superintelligence!
I remember a particular moment when my diverse ideas about
AI crystallized in my mind, with amazing clarity. I could
see in my mind exactly how an AI system could be built.
Now all that was left was to work out a few pesky details.
At this point, it had been 13 years since I’d first
set myself the goal of building a thinking machine. I now
had a PhD in math, and had spent countless thousands of
hours studying cognitive science, physics, computer science,
neurobiology, philosophy of mind. I’d published four
books on the mind, which were idiosyncratic combinations
of mathematics, philosophy and science, all pushing in the
same direction, toward an understanding of the mind that
was both fundamental and precise. I felt I finally had the
answer. And it seemed that the hardware was finally getting
there too. We had cheap computers with gigabytes of RAM,
and we had high-bandwidth Ethernet and Internet, allowing
distributed computing among dozens or even millions of these
powerful, cheap machines, etc.
It all seemed incredibly clear to me. Mind was exquisitely
simple in essence. A mind was a web of patterns, a network
of independent mind actors, each one concerned with recognizing
patterns in other actors, and patterns emergent between
itself and other actors. New actors were created to embody
new patterns. The overall network of mind was continually
re-making itself via recognizing patterns in itself. The
character of a particular sort of mind was determined by
the assemblage of pattern recognition/creation actors inside
it. The art of mind design – an as yet nonexistent
art – would consist of choosing the right assemblage
of types of actors so that the emergent self-reconstructing
behavior of mind would get into a productive dynamical attractor.
From my 13 years of thinking about human and artificial
intelligence, I felt I had a good idea how to choose and
design the right mind actors, so that when these actors
were released to study and transform one another, the self-reconstructing,
self-recognizing dynamic characteristic of mind would emerge.
And so in the fall of 1996 I started creating the Webmind
AI Engine. As I’ve said, I’d been working on
similar things off and on for years; but the actual design
of the Webmind system as it is now was something I started
in the fall of 1996, when I was in Western Australia, working
at UWA as a Research Fellow. Soon enough this got more interesting
than anything else I was working on -- I realized that I
was on the verge of something really cool, and something
that I wasn’t going to be able to implement myself,
or with a couple research assistants. John Pritchard, my
New York e-mail pal, was convincing me that it was plausible
to get funding to start a company building software according
to my designs. The idea was appealing.
At the start of 1997 I quit my job at UWA and moved to the
US to work on Webmind design and coding full time. I didn’t
have any clear business plan in mind, but I figured that
once I got some clearly intelligent behavior working, the
venture capitalists would beat a path to my door. Naively
enough, I figured I’d reach that point after a few
months’ hard work. I figured that after I got some basic
stuff working, I could raise a few hundred thousand dollars
to pay perhaps 5 programmers, and then we’d get the
whole thing implemented in 6 months’ time – presto!
a thinking machine. Fame and fortune, and truckloads of
beautiful girls, would be mine.
What I had at the end of summer 1997 was ten thousand lines
of Java, largely designed as I went along. This system was
never completed, and of the parts that were completed only
half of them worked. There were lots of details I didn't
understand. This first serious attempt at Webmind had too
much of my theory of mind in it, and not enough computational
practicality. It was beautiful as a mathematical and logical
statement, but still horrible as a computer program. I was
still too close to Gofer, and hadn’t come to grips
with what I’d have to do to make a useful, efficient
implementation of my model of mind.
But still, the ideas, data structures and dynamics underlying
this first Webmind were conceptually about the same as the
ones underlying Novamente today. The mathematics and the
software design have both changed tremendously, but the
underlying vision is the same. Novamente, like Webmind before
it, is based on the idea that the mind is a collection of
patterns that forms and recognizes patterns in itself and
the world, and in this way achieves complex goals in the
world. It makes this vision concrete by defining some simple
software objects corresponding to patterns and goals.

In the mid-90s, starting out on Webmind design, I had
basically a comprehensive knowledge of what was happening
in the AI world. It was a mess. It’s basically the
same way today. There’s no well-understood, commonly
accepted body of scientific knowledge about AI. Instead,
there’s a vast diversity of approaches to various
aspects of the relationship between computation and mind.
Some of the approaches contradict each other and some of
them complement each other. Designing Webmind was a process
of assembling information from various different perspectives
and disciplinary areas into a coherent whole, guided by
a set of governing principles.
Many different subdisciplines within the AI umbrella contributed
to the structuring of Webmind, and then Novamente. Some
of them I was thinking about when I first started designing
Webmind, others emerged as being significant more recently,
further along in the design process, in some cases only
in the transition from Webmind to Novamente. Table 1 gives
an overview of the sorts of things that Novamente draws
from various disciplines. It may be a bit opaque to the
nontechnical reader, but it will mean something to the reader
with some computer science background, and perhaps to others
it will be at least generally evocative.
|
Cognitive
Psychology |
From
cog psych we have taken a number of high-level
structural principles, for instance the notions
of Long-Term Memory, Episodic Memory (memory
of your own history), and Short-Term Memory;
and the distinction between procedural (knowledge
of how to do thing) and declarative knowledge
(factual knowledge).
|
| Introspective
Psychology |
Modern
cognitive psychology is experimentally focused,
but past traditions in psychology have more
openly drawn their inspirations from introspection,
from what each mind intuitive knows about itself.
The overall structure of Novamente owes something
to ideas drawn from these traditions, from Gestaltism
to Buddhist psychology and Peircean philosophy.
|
| Neuroscience |
From
neuroscience we have taken the observation that
mind can be implemented by a parallel distributed
system with activation spreading around it
in complex patterns – i.e. a ‘neural net’, broadly
conceived. We’ve also taken our approach to
localization from what’s known about the brain:
in Novamente, knowledge is distributed, but
not across the whole system; each type of knowledge
is distributed across a part of the system,
just as is done in the brain.
|
| Complexity
Science |
The
emerging science of complex systems has contributed
crucial concepts such as self-organization,
evolution, autopoiesis and emergence. Novamente
is a modular system in which the real intelligence
emerges from interaction between the modules.
Like many complex systems, it displays behaviors
like phase transitions and sensitivity to initial
conditions, and evolution-ecology interactions.
|
| Nonlinear
Dynamics |
One
of the more rigorous subsets of complexity science,
nonlinear dynamics studies the attractors and
transient patterns that emerge as nonlinear
systems evolve over time. Novamente is a highly
nonlinear dynamical system whose attention is
allocated by complex attractor dynamics, and
that specifically studies transients in its
own dynamics so as to self-adaptively modify
its own structure.
|
| Statistical
Pattern Recognition. |
In
its analysis of numerical data (e.g. financial
forecasting) and its lower-level linguistic
processing, Novamente makes use of statistical
pattern recognition tools. What makes it unique
is its ability to integrate statistically recognized
patterns with other types of knowledge, and
to generalize from this knowledge via inference
and other mechanisms.
|
| Multi-Agent
Systems |
With
the advent of distributed and parallel computing,
there is a substantial body of knowledge about
how to make populations of computational agents
cooperate to carry out useful activities. Novamente
is a multi-agent system, albeit a very unusual
one, and its system architecture makes use of
principles from this area of computer science
in many ways.
|
| Computational
Linguistics |
The
last decade’s explosion of knowledge in computational
language processing has produced many techniques
of use within Novamente. The challenge has
been to get all these tools working together
in a common framework focused on extracting,
creating and producing meaning rather than on
syntax analysis.
|
| Expert
Systems |
Novamente
allows humans to enter expert knowledge into
it via XML, Sasha or other special formal languages,
similar to standard AI expert systems. Unlike
expert systems, though, it doesn’t take this
knowledge as truth: it takes it as information
given to it by another mind, and feels free
to forget it or modify it as it sees fit.
|
| Machine
Learning and Optimization |
Machine
learning and optimization algorithms are not
real AI systems but they do solve problems that
are crucial to the mind. Novamente uses genetic
algorithms, genetic programming, and statistical
machine learning techniques for various purposes,
internally.
|
| Logic |
While
Novamente is not a logic system in the traditional
sense, it makes use of the reduction of general
relationships to a simple relational formalism,
which was pioneered by mathematical logicians
and logic-inspired AI engineers. It manipulates
relationships using uncertainty-robust, self-organizing
reasoning techniques different from those used
in the logic or AI literature.
|
|
Table 1 - Novamente’s Diverse Inspirations
|
Obviously,
this laundry list of component technologies doesn’t
really tell you a damn thing about Novamente. That’s
because the crux of Novamente lies, not in the component
technologies, but in the way these technologies are structured
to form a coherent self-organizing system. But still, the
presence of all these tools made the process of building
Novamente very different than it would have been if none
of the tools existed, and you had to build every component
technology from scratch. Rather than just “how do
you program a mind on current hardware and software?”,
the question becomes more like “Given all these wonderful
tools, and amazingly powerful distributed hardware on which
to implement them, how can we tie them all together in a
harmonious and mutually adaptive way to produce a mind?”

Given
the general conceptual framework I’ve described, and
the practical and conceptual toolbox I’ve listed,
the first step toward actually designing Webmind was deciding
what the “atomic mental object” should be.
Bigger than a neuron, smaller than a machine, was the first
decision. I created a Java object called a Node. A node
is the most basic kind of pattern known to Webmind –
it’s something Webmind recognizes as a whole. A node
says, “This thing is worth distinguishing from its
environment as a whole entity. Here it is. It persists and
maintains its boundaries over time.” We have some
nodes referring to external sensed objects: TextNodes, DataNodes,
WordNodes, and so forth. We have some nodes representing
patterns recognized in the system itself rather than in
the outside world: CategoryNodes of various kinds, AutomatonNodes
representing evolved patterns, etc. There are nodes called
SubgraphImageNodes that represent parts of the mind, grouped
with a boundary drawn around them so as to be considered
as a kind of higher-order individual. And so on, and so
on, and so on.
But nodes are just the start. Webmind is also wired to recognize
certain kinds of patterns involving nodes. Similarity is
the most basic kind of pattern: it’s the recognition
that two different things, occurring at different points
in space or time, are actually a lot like each other, and
can be interchanged for many purposes. Inheritance is also
basic: it’s the recognition that you can substitute
A for B (though maybe not B for A) without substantial
loss of information.
How many link types to incorporate was a big question. In
the AI systems known as semantic networks, you have a different
type of link for every relation in the net – a link
type for kick, a link type for eat, and so forth. On the
other hand, in a typical neural net model you have only
one link type; whereas in the brain, there are many types
of neurons and synapses – hundreds of link types,
if you identify a link type with a synapse that’s
reactive to a certain neurotransmitter.
In designing Webmind, we didn’t want to introduce
too many types of links, because this just leads to a network
that represents data in ways it doesn’t understand.
We chose to use a few dozen link types, representing what
I think of as archetypal types of relationships.
What kinds of relationships are “archetypal”
for Novamente? Here I’ll just give a few important
examples. We have similarity links, representing the belief
that one actor is similar to another. There are inheritance
links, representing the belief that one actor is a special
case of another. There are spatiotemporal links, representing
the belief that one actor represents something occurring
near the other one in time or space. There are containment
links, representing the belief that the entity represented
by one actor is contained inside another one. There are
associative links, representing simply the fact that Webmind's
dynamics tend to associate one actor with another. This
chart shows the definitions of these links in a bit more
systematic way:
| Link Type Pointing from A to B | Meaning of the Link |
| Similarity | A is similar to B |
| Inheritance: | |
| by Extension | A is a special case of B |
| by Intension | B is a special case of A |
| SpatioTemporal | A occurs at the same time and place as B |
| Temporal | A occurs at the same time as B |
| Before | A occurs before B |
| After | A occurs after B |
| Containment: | |
| Part of | A is a part of B |
| Contains | B is a part of A |
| Associative | B is associated with A |
| HaloLink | B is associated with A by Webmind's Dynamics |
These
link types, and others refining and extending these, are
the elemental types of relationships that Webmind “understood.”
They are a bit, but not a lot, like the various neurotransmitter
receptors in the brain, which make different synapses different.
The brain's receptors do not correspond so neatly to logical
relations. But Webmind is not a brain; it is a mind that
emerges out of digital computer hardware. Digital computer
hardware is closer to logic than cells are.
These links are heterarchical in a sense; any node can link
to any other node. But they are also organized in hierarchies
of composite actors representing, not specific relationships
like links, but collections of relationships. Nodes contain
links; nodegroups contain nodes, lobes contain nodegroups,
and the mother of them all: the Psynet, the whole Webmind,
that contains a lobe for each machine in its network. The
basis of it all is the node: a node containing a bundle
of links expressing its relationship to other nodes, and
also some basic data objects and actors and roles. Nodes
sending out messages -- information gathering and information
carrying actors -- of various types to help them build new
links to other nodes. A gigantic network of interlinked
actors, constantly rebuilding itself, extending across multiple
CPUs and multiple machines.
The nitty-gritty engineering needed to make this all work
is considerable indeed. But the basic concepts are elementary.
It's nothing but Peirce's network of relations, each spreading
attention to the other relations that it stands to in a
peculiar relation of affectability. It's nothing but Nietzsche's
dynamic quanta, each one defined in terms of other dynamic
quanta, each one re-creating itself and each other. It's
beautiful and primal -- but it's not intelligent, without
more detail, more specialization. It’s like the brain
of an infant. All the core abilities are there, but intelligence
develops as it incorporates and processes specialized information.
It’s easy to see how both nodes and links are patterns
in the sense that they allow one to compress information.
If two parts of something one is describing are similar,
one can save effort by not describing the second one in
detail and just describing it approximately by reference
to the first one. For instance, to describe a picture consisting
of two similar heads, you can draw one head and then just
say “imagine two of these next to each other.”
If one of the parts of the picture inherits from the other,
one can save effort by replacing the more specific one with
the more general one. Of course, there is a loss of information
here. Suppose half of the picture is a general human shape,
and the other half is my shape. My shape inherits from the
general human shape, obviously. But if you describe the
picture by drawing the general human shape and saying “two
of these,” you’re losing a fair bit of information,
though certainly not all of it.
Similarity and inheritance are logical relations, logical
patterns. We also have purely observational patterns, like
temporal relatedness, spatial relatedness, and part-whole
relatedness. And we look for general association relations:
When the system thinks of X, what Y comes to mind? This
Y stands in an associative relation to X.
Nodes in Webmind contain links to other nodes, each link
embodying one of these basic inter-node relationships: similarity,
inheritance, part/whole, spatial, temporal, associative.
Nodes and links are the two levels of pattern that are automatically
and instinctively recognized by Webmind: nodes representing
perceived wholes carved out of the chaos of the world or
mind, and links representing patterns perceived among the
nodes.
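As a rough illustration of the structure just described, here is a minimal Java sketch of a node that carries its own bundle of typed links. It is not the original Webmind code: the names LinkType, Link, targets and the numeric strength field are hypothetical, chosen only to mirror the relationship types listed above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; these are not the original Webmind classes.
final class NodeLinkSketch {

    // Relationship types named in the text (the enum itself is an assumption).
    enum LinkType { SIMILARITY, INHERITANCE, PART_WHOLE, SPATIAL, TEMPORAL, ASSOCIATIVE }

    // A link points from the node that owns it to a target node.
    static final class Link {
        final LinkType type;
        final Node target;
        final double strength; // hypothetical weight in [0, 1]

        Link(LinkType type, Node target, double strength) {
            this.type = type;
            this.target = target;
            this.strength = strength;
        }
    }

    // A node is a named whole that contains its outgoing links.
    static final class Node {
        final String name;
        final List<Link> links = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        void link(LinkType type, Node target, double strength) {
            links.add(new Link(type, target, strength));
        }

        // All nodes this node points to through links of the given type.
        List<Node> targets(LinkType type) {
            List<Node> out = new ArrayList<>();
            for (Link l : links) {
                if (l.type == type) {
                    out.add(l.target);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Node cat = new Node("cat");
        Node animal = new Node("animal");
        Node dog = new Node("dog");
        cat.link(LinkType.INHERITANCE, animal, 0.9); // cat is a special case of animal
        cat.link(LinkType.SIMILARITY, dog, 0.6);     // cat is somewhat like dog
        for (Node n : cat.targets(LinkType.INHERITANCE)) {
            System.out.println("cat inherits from " + n.name);
        }
    }
}
```

Keeping the links inside the node, rather than in a separate global table, follows the description of a node as "a bundle of links"; a real system would need many more link types and richer link data.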
We then have special methods of building links. The method
we used most in Webmind (but have basically abandoned in
Novamente) was one I came up with in 1996, inspired by Web
spidering, called Wandering: we have actors that move around
through the network of nodes, traveling from node to node
along links, looking for nodes that are strongly related
and should be joined by new links. This particular method
of link formation may or may not be the best. The key point
is that there is some dynamic by which new and relevant
links are continually formed.
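The Wandering idea can be sketched in a few lines of Java. This is a toy under stated assumptions, not the 1996 design: the wanderer picks its next link at random, "strongly related" is taken to mean that the product of link strengths along the path stays above a threshold, and the names wander, threshold and maxSteps are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Illustrative sketch of a wanderer; not the original Webmind implementation.
final class WanderingSketch {

    static final class Node {
        final String name;
        final Map<Node, Double> links = new HashMap<>(); // target -> strength in [0, 1]

        Node(String name) {
            this.name = name;
        }
    }

    // Walk up to maxSteps from start, following one randomly chosen link per step.
    // Whenever the walk reaches a node that start is not yet linked to, and the
    // product of strengths along the path is at least threshold, add a direct link.
    // Returns the number of new links created.
    static int wander(Node start, int maxSteps, double threshold, Random rng) {
        int created = 0;
        Node current = start;
        double pathStrength = 1.0;
        for (int step = 0; step < maxSteps && !current.links.isEmpty(); step++) {
            List<Node> choices = new ArrayList<>(current.links.keySet());
            Node next = choices.get(rng.nextInt(choices.size()));
            pathStrength *= current.links.get(next);
            current = next;
            boolean alreadyLinked = start.links.containsKey(current);
            if (current != start && !alreadyLinked && pathStrength >= threshold) {
                start.links.put(current, pathStrength);
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        a.links.put(b, 0.9);
        b.links.put(c, 0.8);
        int created = wander(a, 2, 0.5, new Random(42));
        System.out.println("new links from a: " + created);
    }
}
```

Other measures of relatedness would work equally well here; the point is only that links get built by local exploration rather than by comparing every pair of nodes.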
Relevance is determined by how much “activation”
each node has, and activation is spread through the network
by Peirce’s Law of Mind, which is the same as to say,
by basic neural net activation spreading. The Java object
that carries activation through Webmind, we call a Stimulus.
Associative links are built by a process we call “halo
spreading,” in which a node gets active and then measures
how active other nodes become as a consequence, after a
certain period of time. It spreads Stimuli to other nodes
and then collects them after a while, observing how stimulated
they’d become.
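A minimal Java sketch of halo spreading, assuming a very simple spreading rule: in each round every active node passes its activation, scaled by link weight, to its targets, and the source tallies what each other node received. The graph representation, the halo method and the fixed number of rounds are assumptions made for illustration; the actual Stimulus objects are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of halo spreading; not the original Webmind implementation.
final class HaloSketch {

    // graph maps a node name to its outgoing links (target name -> weight in [0, 1]).
    // Returns, for every node other than source, the total stimulus it received
    // within the given number of rounds. These totals could then be used as the
    // strengths of associative links from source.
    static Map<String, Double> halo(Map<String, Map<String, Double>> graph,
                                    String source, int rounds) {
        Map<String, Double> received = new HashMap<>();
        Map<String, Double> frontier = new HashMap<>();
        frontier.put(source, 1.0); // the source node "gets active"
        for (int r = 0; r < rounds; r++) {
            Map<String, Double> next = new HashMap<>();
            for (Map.Entry<String, Double> active : frontier.entrySet()) {
                Map<String, Double> out = graph.getOrDefault(active.getKey(), Map.of());
                for (Map.Entry<String, Double> link : out.entrySet()) {
                    double stimulus = active.getValue() * link.getValue();
                    next.merge(link.getKey(), stimulus, Double::sum);
                    if (!link.getKey().equals(source)) {
                        received.merge(link.getKey(), stimulus, Double::sum);
                    }
                }
            }
            frontier = next;
        }
        return received;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> graph = Map.of(
                "a", Map.of("b", 0.5, "c", 0.2),
                "b", Map.of("c", 0.5));
        Map<String, Double> h = halo(graph, "a", 2);
        // c is stimulated directly by a and indirectly through b.
        System.out.println("halo of a: " + h);
    }
}
```

In this toy graph, node c ends up more strongly associated with a than its direct link alone would suggest, because it also collects stimulus routed through b.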
Again, there are a lot of ways of doing these things, and
the current ways may or may not be the best. The exact method
of spreading activation or halos is not crucial to Webmind,
but rather just the overall character of the patterns being
recognized and formed.
Halo spreading and reasoning and wandering form new links,
but it’s also crucial to form new nodes, and this
is done by combining old nodes in various ways (fusing them,
splitting them) and also by explicitly evolving new nodes
to satisfy various goals using special nodes called EvolverNodes.
The achieving of goals, crucial to intelligence, is done
using nodes that we now call SchemaNodes, which contain
little programs that control aspects of perception, action
and thought. Perceptions from the outside world come into
Webmind and are translated into nodes right away. These
nodes link to other nodes representing contexts that the
system is operating in, and these contexts link to SchemaNodes,
representing things that might be desirable to do. The goals
as well as the contexts link to the schema, so that the
hottest schema will be the ones that are relevant to the
current goals in the current contexts. Schema look into
the long-term memory of the system and grab out the various
nodes and links contained therein.
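To make the notion of the "hottest" schema concrete, here is a small Java sketch in which each active goal or context passes activation to the schema it links to, and the schema with the largest total wins. The additive scoring rule, the hottest method and all the node names in the example are assumptions for illustration only, not a description of how SchemaNodes were actually scored.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of schema selection; not the original Webmind implementation.
final class SchemaSelectionSketch {

    // sources: activation of each currently active goal or context.
    // links: source name -> (schema name -> link strength).
    // Returns the schema with the highest summed activation, or null if none is linked.
    static String hottest(Map<String, Double> sources,
                          Map<String, Map<String, Double>> links) {
        Map<String, Double> heat = new HashMap<>();
        for (Map.Entry<String, Double> s : sources.entrySet()) {
            Map<String, Double> out = links.getOrDefault(s.getKey(), Map.of());
            for (Map.Entry<String, Double> l : out.entrySet()) {
                heat.merge(l.getKey(), s.getValue() * l.getValue(), Double::sum);
            }
        }
        String best = null;
        double bestHeat = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> h : heat.entrySet()) {
            if (h.getValue() > bestHeat) {
                best = h.getKey();
                bestHeat = h.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // One active goal and one active context (hypothetical names).
        Map<String, Double> sources = Map.of("goal:answerQuestion", 1.0, "context:chat", 0.5);
        Map<String, Map<String, Double>> links = Map.of(
                "goal:answerQuestion", Map.of("lookupFact", 0.6, "summarize", 0.3),
                "context:chat", Map.of("lookupFact", 0.1, "summarize", 0.4));
        System.out.println("hottest schema: " + hottest(sources, links));
    }
}
```

A schema that is relevant to the goal but not to the current context, or the reverse, receives less total activation than one that is relevant to both, which is the behavior the paragraph above describes.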
There’s also a SelfNode, recording the history of
the system – what psychologists call “episodic
memory” – and predicting the future of the system,
and selecting the system’s goals according to the
metagoal of maximizing system happiness. Yes, we have a
Happiness FeelingNode, and nodes for other basic emotions,
complex emotions being considered combinations and mutations
of simple ones. What makes the system happy – we get
to decide at first, until it mutates and modifies its own
HappinessNode just like we do. Right now, it likes to answer
questions people ask it, it likes to save memory, and it
likes to build a lot of high-strength links – i.e.,
to discover a lot. Schema look into the SelfNode to get
their overall motivation.
Many goals involve making others happy, and for this, models
of other minds need to be maintained; this is done in UserNodes.
There is a loose mapping between these data structures and
things in the brain. Nodes are a bit like neuronal groups
– clusters of 10,000 to 100,000 neurons, that sort
of act as a unified whole. Links are sort of like bunches
of neural connections between one cluster and another. This
intuitive mapping onto the brain can be useful, and it’s
surely not a complete fluke that the structure of the brain
is a lot like the structure of the mind that emerges from
the brain. On the other hand, it’s important not to
overblow the very loose neural modeling aspect of Webmind.
Webmind was supposed to be a mind, not a model of the human
brain, and it’s a definite failure at being a model
of the human brain, not surprisingly.
There’s a lot of complexity here, just like in the
brain. But basically, Webmind's architecture was that of
a massively parallel network, a population of many, many
different information actors – nodes, links, wanderers,
Stimuli spreading activation and collecting halos. The nodes
continually recompute their relationships to other nodes.
Queries put to the system are transformed into nodes that
take advantage of Webmind's self-evolving structure to produce
the needed answers.

All
this – plus or minus a few critical details, and a
lot of non-critical ones -- was outlined roughly and erratically
in some documents I wrote during Spring and Summer 1997.
Some things were designed in detail, others just hinted
at. Because so many details were left out, it wasn’t
quite clear to me, at that point, what a humongous system
this was going to become.
This was still pre-Webmind Inc.; I was working in loose
collaboration with a friend and programmer named John Pritchard,
who liked my thinking in a general way, but never really
came to grips with my ideas, except on a philosophical level.
He wanted to approach things by first building a general
Java infrastructure for dealing with AI, and then implementing
my particular AI theories – an approach which makes
sense, but only if the infrastructure is deeply informed
by the AI theories, which wasn’t the case then.
During summer 1997, John and I parted ways, and my friend
Lisa Pazer and I started the company that was initially
called Intelligenesis Corp., and later changed its name
to Webmind Inc. (because American businesspeople seemed
to have too much trouble spelling the original name!). At
that point I gave up coding 10 hours a day, turning that
responsibility over to my newly recruited old friend Ken
Silverman, and spending most of my time on design issues.
I was still coding a few hours a day at that point, but
not like before.
Ken learned Java in a couple weeks, and set to work. We
talked on the phone several hours a day, and he coded for
the rest of his waking hours. He ended up creating a new
Webmind from scratch, based on reading and reinterpreting
print-outs of my eccentric, tangled Java code. My first
version had been useless, but had followed the concepts
of my theory of mind fairly directly. Ken's version followed
the structure of Java more so than my theoretical ideas.
It was a colossal step backward in conceptual elegance.
But it had one fantastic redeeming feature: as of February
1998, it finally worked!
OK, in retrospect, it didn’t really work, but it looked
like it worked at the time. It wasn’t made to exploit
multiprocessor machines, or networks of machines. It wasn't
ready to serve as the infrastructure for the global brain.
It was too small to demonstrate any really interesting emergences,
any of the structures of mind I’d identified in my
theoretical work. But it was our first working prototype,
and we rigged it up to do some simple things like read in
a bunch of Web pages or numerical data series, and decide
which ones were similar to each other. No tremendous intelligence
was apparent yet, but we hadn't expected any. We'd built
the infrastructure for intelligence, but hadn't put in the
specialization that would allow the system to display useful
intelligence in particular areas.
It was very simple in concept, but very complex to actually
implement. We had a network of mental entities, each one
related to other mental entities, and each one constantly
revising its collection of relationships. Each node, and
each actor, was an "object" in the Java programming
language, which proved very well suited to our needs. Writing
Webmind meant writing Java "classes" for all the
different kinds of nodes, wanderers and other objects we
needed. Practical problems kept coming up, problems I had
never thought of when I was writing theoretical books and
scribbling notes on the back of photocopied research papers.
For example, what do you do when the system has recognized
too many relationships in itself and has run out of memory?
How do you decide which relationships to cull? How does
the system manage its time, allocating certain amounts of
CPU time to each node to use in building new relationships?
How does the system determine how much time to spend loading
in new information into new nodes, versus building new relationships
among existing nodes? And so on, and so on, and so on.
We also wanted to build up Webmind's thinking power. This
meant we had to keep increasing our palette of specialized
classes of nodes and links representing particular kinds
of relationships and concepts. The real intelligence, I
was certain, would then emerge from the interactions of
all these specialized nodes and links in the self-organizing
network. But before we could get there, there were dozens
of mechanical issues to be worked out, debugged, tested,
tuned.
In the very early days of Intelligenesis, before we got
funding, the work proceeded in pairs, each pair consisting
of me and someone else. Lisa and I worked on the business
plan and tried to raise money. Ken and I worked on the first
Webmind prototype, which ran on a single computer with a
single processor; Ken doing nearly all the coding, me giving
him designs and suggestions through endless phone calls
and meetings. Jeff and I were taking his nonlinear prediction
algorithms and making them more intelligent and flexible,
integrating them with some of my own AI work. Onar and I
were sending back and forth endless e-mails diagramming
what would later become the language learning component
of Webmind’s natural language system. And Paul, in
looser communication with me than the others, was designing
and coding the Pods system, a very nice system for doing
self-organizing computing on multiple machines and multiprocessor
machines.
In the spring of 1998, Ken integrated Webmind with the Pods
system, producing the first Webmind that had a prayer of
actually running on a lot of machines at once. This was
a system which could serve as the foundation for a global
mind. It exploited the power of Java even more fully than
Ken's first version had -- it was more "object-oriented,"
and used Java's network-computing facilities more thoroughly.
And then things went completely crazy. In a mostly good
way. Lisa finally got us funding, and we started hiring
programmers and scientists. People were coding nodes and
links embodying specialized kinds of intelligence. The system
got smarter, and things got far messier.
The most crucial hire was Pei Wang, a Chinese computer scientist
a few years older than Ken and me, who when we hired him
had spent the last 12 years developing a system of probabilistic
logic called NARS, the Non-Axiomatic Reasoning System. Within
a few months, Pei had integrated many of the ideas of his
NARS reasoning system into Webmind, providing us with a
handy nodes-and-links version of probabilistic logic. He
also introduced a lot of ideas into Webmind as a whole,
apart from its reasoning component. For instance, it was
Pei’s inspiration that every link in Webmind should
have four numbers associated with it: a strength telling
how significant the pattern represented by the link is;
a confidence telling how sure we are of the assessed significance;
an importance telling how useful the node is to the system
as a whole; and a decay rate telling you how fast importance
decays for that particular node.
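The four numbers can be pictured as a small Java value object. This is a sketch under assumptions: the text says only that the decay rate governs how fast importance decays, so the multiplicative decay step and the reinforce method below are illustrative guesses, and the class and method names are hypothetical.

```java
// Illustrative sketch of the four per-link numbers; not the original Webmind code.
final class LinkValuesSketch {

    static final class LinkValues {
        final double strength;   // how significant the pattern is
        final double confidence; // how sure we are of that assessment
        double importance;       // how useful the item currently is to the system
        final double decayRate;  // fraction of importance lost per decay step (assumed)

        LinkValues(double strength, double confidence, double importance, double decayRate) {
            this.strength = strength;
            this.confidence = confidence;
            this.importance = importance;
            this.decayRate = decayRate;
        }

        // One decay step: importance shrinks by the decay rate.
        void decay() {
            importance *= (1.0 - decayRate);
        }

        // Using the link makes it more important again, capped at 1.0.
        void reinforce(double amount) {
            importance = Math.min(1.0, importance + amount);
        }
    }

    public static void main(String[] args) {
        LinkValues v = new LinkValues(0.7, 0.9, 0.8, 0.25);
        v.decay();
        v.decay();
        System.out.printf("importance after two decay steps: %.2f%n", v.importance);
    }
}
```

Separating strength and confidence from importance lets a system keep a well-supported but currently irrelevant relationship at low priority instead of forgetting what it knows about it.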
Toward the end of summer 1998, we also hired Cassio Pennachin,
who at that point was just one among a handful of Java hackers
around the world whom I’d recruited through job ads
on Usenet. Cassio lived in Belo Horizonte, Brasil, and first
took on the job of fixing up some code I’d written
for evolving new structures in the mind using a variant
of genetic programming. This was the beginning of what’s
become an Intelligenesis tradition: Brasilian programmers
receive American code by e-mail and respond very politely
with comments like “Excuse me, but would you be terribly
offended if I made a few changes to this code?” Of
course, you say yes, and a few days later you receive a
completely new version of the software, containing exactly
three lines from your original code, but much better designed
and also more efficient.
Cassio proved to be an excellent manager as well as an excellent
software engineer, and I let him accumulate assistants until,
as of now, we have more than half our engineering staff
in an office in Belo Horizonte, with Cassio as our overall
Director of Webmind Development. The Brasilians, so far,
have not made any big AI innovations, but the disciplined
approach to object-oriented design that they’ve brought
us has been just as important as our AI innovations, in
terms of getting Webmind, this humongous piece of Java code,
to actually work. The real importance of this aspect of
their work didn’t become apparent until the end of
1999, with their psycore redesign – but I’m
getting ahead of myself.
The rapidly increasing size of the Webmind codebase was
inevitable because the core code Ken and Paul and I had
written wasn't enough for intelligence in any practical
context. It was just a generic intelligence mechanism, a
self-organizing, relationship-building network. As we introduced
more and more specialized nodes into the system, the system
as a whole changed. New problems emerged. We should have
anticipated that this would happen, but we hadn't really
thought about it. We'd been too busy dealing with the challenges
of formulating the psynet model in Java in a network-friendly
way.
To deal with this blossoming of the Webmind code, in the
summer of 1998, Ken and Paul split Webmind into parts. The
central part, the one they had been working on, they called
Psycore. This contained the generic mechanisms for dealing
with nodes, links and wanderers. In a sense, this was Webmind's
operating system, the code that enabled all the parts to
work together. Then there were the Psymodules, one for each
specialized area of intelligence: natural language, reasoning,
numerical data analysis, etc. If we were to decode the DNA
code that generates the human brain, we might find that
it works in a similar way. The "psycore" would
be the DNA code that describes the features that are common
to all neurons, synapses and neurotransmitters. The "modules"
would be the DNA code which describes the distinct features
of the specialized types of neurons (there are dozens) and
neurotransmitters (there are hundreds), and the particular
patterns of neurons, neurotransmitters and synapses that
make up different parts of the brain.
The brain has hundreds of specialized parts devoted to tasks
such as visual perception, smell, language, episodic memory,
and so forth. Each of these parts is composed of neurons
which share certain fundamental features, but each also
has its unique features and capabilities that scientists
are only beginning to understand. Similarly, when a Webmind
is running on a computer, different parts of the computer's
memory are assigned to different tasks. Each of these parts
of the computer's memory draws on the psycore for its basic
organizational framework, and on more specialized modules
for advanced capabilities.
Each of Webmind’s modules is specialized for recognizing
and forming a particular kind of pattern. And all the different
kinds of nodes and links can learn from each other -- the
real intelligence of Webmind lies here, in the dynamic knowledge
that emerges from the interactions of different species
of nodes and links. This is how Webmind builds its own self;
it’s the essence of Webmind’s mind, of how Webmind’s
patterns create and recognize patterns in themselves and
the world to achieve their complex goals.
I’ll give a quick laundry list of modules, without
going into great detail on any of them.
There was a numerics module, containing data processing
actors that recognize patterns in tables of numbers, using
a variety of algorithms, some standard, some innovative.
DataNode embodies nonlinear data analysis methods and it
recognizes subtle patterns that’ll always be missed
by ordinary data mining and financial analysis software.
There was a natlang module, which deals with language processing.
The natlang module represents texts as TextNodes, linking
down to WordNodes representing words in the text, and other
nodes representing facts, concepts and ideas in the text.
It has text processing actors that recognize key features
and concepts in text, drawing relationships between texts
and other texts, between texts and people, between texts
and numerical data sets. These actors process vast amounts
of text with a fair amount of understanding and a lot of
speed.
The natlang module also contained reading actors, which
are used to study important texts in detail. They proceed
through each text slowly, building a mental model of the
relationships in the text just like a human reader does.
These reading actors really draw Webmind's full set of semantic
relationships into play, every time they read a text.
There was a category module, containing actors that group
other actors together according to measures of association,
and form new nodes representing these groupings. This, remember,
is a manifestation of the basic principle of the dual network.
There were learning actors, that recognized subtle patterns
among other actors, and embodied these as new actors. These
spanned various modules, including the reason module, containing
logical inference wanderers, that reasoned according to
a form of probabilistic logic based on Pei's Non-Axiomatic
Reasoning System; and the automata module, containing AutomatonNodes
that carried out evolutionary learning, according to genetic
programming, a simulation of the way species reproduce and
evolve.
In the user module there were actors that model users' minds,
observing what users do, and recording and learning from
this information – these are UserNodes and their associated
Wanderers. There are actors that moderate specific interactions
with users, such as conversations, or interactions on a
graphical user interface. And in the self module there are
self actors, wanderers and stimuli that help the SelfNode
study its own structure and dynamics, and set and pursue
its own goals.
Each of these actors involved in the modules had in itself
only a small amount of intelligence, sometimes no more than
that you might see in competing AI products. The Webmind
core – “psycore”, as we sometimes called
it -- was a platform in which they can all work together,
learning from each other and rebuilding each other, creating
an intelligence in the whole that is vastly greater than
the sum of the intelligences of the parts.
The version of Webmind we completed in the summer of 1998
– the first multi-module version -- worked fine for
about a year. We used it to build the modules essential
for Webmind's core intelligence and for several impressive
applications. It included a module for text-based market
prediction; a natural language module for mapping texts
into networks of meanings; several modules for the evolution
of concepts according to different methods; a module for
Webmind's self-understanding; and so forth. The development
of each module was driven by requirements particular to
certain application areas. The financial modules were driven
by the practical need to predict the markets. The natural
language module was driven by the need to parse financial
text, and understand human queries. The concept learning
modules were driven by the need to learn concepts relevant
to financial prediction and to the processing of human queries.
The self-understanding module was driven by the need to
have the system proactively think about things that humans
were likely to ask it about in the future.
At this point, Webmind benefited greatly from the fact that
we weren't just implementing a theory, we were hard at work
developing practical applications. One of the most profound
pieces of advice I’ve ever received about Artificial
Intelligence came from Danny Hillis, whom I discussed above
-- inventor of the Connection Machine parallel processor,
founder of Thinking Machines Inc., and an informal advisor
for Webmind Inc. throughout its lifetime. As we sat in the
South Street Seaport in New York eating dinner one day,
he was discussing a major AI company that had worked for
10 years to design an AI system, without considering in
detail any particular application of the system. Lo and
behold, the system had never done anything useful. Danny’s
comment was: “They were brilliant people with good
ideas, but they made a serious methodological error. They
developed their system for years and years, without any
contact with practical applications.” Our software
was saved from this fate by the fact that we were committed
to producing actual products, simultaneously with working
toward the goal of real AI. We were freed up to commit other
major errors instead!
The Webmind AI Engine itself was never used inside any production-version
software products, but it was used to prototype a number
of AI processes that were later re-implemented inside products.
One of these products, the Webmind Market Predictor, will
be discussed in detail in the following chapter. The reason
the Webmind AI Engine wasn’t used directly in products
was basically that it was too slow-running, and plagued
by hard-to-excise bugs. The Novamente system, as I’ll
discuss a little later, is a more mature effort and doesn’t
have these problems, and it’s being directly used
inside some software products we’re developing for
the bioinformatics market.

Working
on practical problems in parallel with grandiose long-term
goals was valuable – but it had its disadvantages
as well. It pushed us to overspecialize the system, hyperdeveloping
those portions that were needed for products, rather than
developing the whole system in a more evenly balanced way.
Most of our code was good for the specific tasks it specialized
in, but we had not gotten to the stage where all the modules,
all the different node and link types, were working together
in one big multi-machine Webmind. We were producing cool
research software, but not the global brain I had dreamed
of. We hadn't yet seen the emergence of the dual network,
of the self. And we weren’t able to push straight
toward it because the particular portions of the system
needed for the Webmind Market Predictor – our first
product – needed so much attention.
But overspecialization induced by business needs was far
from our only problem. The truth, as we sourly discovered,
was that our core Java code, implementing the essence of
the psynet model of mind, was just barely adequate for building
products, let alone building real AI – it had too
many bugs and was poorly documented. Ken had implemented
this code brilliantly and painstakingly over a year of 15-hour
workdays, but, even so, the task had been too big for
any one human. We could have fixed up his code to make it
product-ready, but we doubted whether we’d ever get
it to the point where it could support the global brain.
So, toward the end of summer 1999, we decided to rewrite
the Webmind code again. Not the whole system, thankfully
-- we were too far along for that -- but this time only
psycore, the central core of the system. This time around,
Ken was helped out not only by me but by Cassio and several
of his colleagues in Belo Horizonte, most notably Andre
Senna and Thiago Maia, two masters of data structures and
algorithms. At this point, there was a lot of pressure,
from some members of staff on both the technical and business
sides of Intelligenesis, to give up on the unified AI architecture
altogether, and just focus on making individual products
as good as they could be, postponing real AI into the future.
But Ken and Cassio and I and others focused on building
real AI resisted this pressure and plowed ahead with building
a new, improved psycore. Among other beloved chunks of code,
Paul’s Pods system met its doom in this rewrite.
The reasons for this redesign are somewhat interesting;
they reveal a lot about the nasty realities of building
big software systems doing complicated, intelligent things.
The erratic bugs and lack of documentation in Ken’s
code were part of the problem, and made Ken the arch-enemy
of the engineering staff for a while. But this stuff was
fixable. There was also a more serious problem with the
system. It just wasn't flexible enough to enable a huge,
multi-module Webmind to be run in a really intelligent way.
When the system was only doing one thing – say, reading
text, or using text to predict markets – then it was
fine. But it was very bad at regulating several activities
at once.
For example, when loading in a series of texts, one would
see it get slower and slower at reading. The reason was,
the more texts it had in it, the more it had to think about.
It had no time to read more text because it was so busy
thinking about the texts it had already read! I remember
once when Mark Watson, one of our Java AI gurus, noticed
this problem in a Webmind demonstration he had written.
Jim McLoughlin – one of our early hires who built
a lot of Webmind’s numerical and financial analysis
components – showed him a way around it. By hacking the
code, you could get it to do anything, in any particular
situation. But what was needed was intelligent self-control:
the system had to know what processes were important to
it, and regulate the amount of attention it spent on various
things accordingly. Of course, we had always realized this
would be necessary. But we hadn’t realized how deeply
we’d have to code self-control into the system. Ken’s
1998-99 psycore was built to follow its whims, not to control
its dynamics in accordance with goals; and imposing goals
on top of this structure was like trying to get a hyper
child to sit down and listen to a history lesson.
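To make the idea of regulating attention concrete, here is a minimal sketch in Python (not the original Java, and not the Webmind code). It assumes each activity carries a numeric importance score and that a fixed budget of processing cycles is split in proportion to those scores. The function name, the activity names, and the numbers are all hypothetical.

```python
# Hypothetical sketch: split a fixed budget of processing cycles among
# competing activities in proportion to an importance score.
# This illustrates the general idea only; it is not Webmind's mechanism.

def allocate_attention(importance, budget):
    """Return the number of cycles each activity gets, proportional to importance."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("at least one activity needs positive importance")
    return {name: budget * weight / total for name, weight in importance.items()}


# Reading new text is scored as more important than mulling over old text,
# so it keeps most of the budget however much old text has piled up.
shares = allocate_attention(
    {"read_new_text": 3.0, "think_about_old_text": 1.0},
    budget=100,
)
```

The point of the sketch is that the split depends on declared importance rather than on how much work each activity happens to generate, which is roughly the kind of goal-directed control the old psycore lacked.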
The system was so complicated that we couldn’t easily
make the simple changes we needed to make to turn it into
a real global brain platform. We needed to be able to turn
on and off the different capabilities of the nodes and links
at will -- and have the system do this automatically, adapting
to its circumstances. We needed to be able to take collections
of nodes and links that were stable, no longer evolving,
and "freeze" them into a state that took up very
little memory, providing easy access but no adaptability.
We needed to be able to observe what was going on in a particular
part of the system, and chart its dynamics, to see what
structures were emerging.
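The "freezing" requirement can be sketched roughly as follows, again in Python and under stated assumptions: nodes are taken to be objects holding a mutable table of weighted links, and freezing turns a collection of them into a read-only snapshot. The `Node` class and the `freeze` function are hypothetical names invented for this illustration.

```python
from types import MappingProxyType


class Node:
    """Hypothetical adaptive node: its links can still be added or reweighted."""

    def __init__(self, name):
        self.name = name
        self.links = {}  # target node name -> link weight

    def link(self, target, weight):
        self.links[target] = weight


def freeze(nodes):
    """Return a read-only snapshot mapping node name -> tuple of (target, weight)."""
    snapshot = {n.name: tuple(sorted(n.links.items())) for n in nodes}
    return MappingProxyType(snapshot)  # lookups work, assignment raises TypeError


cat = Node("cat")
cat.link("pet", 0.7)
cat.link("animal", 0.9)
frozen = freeze([cat])
```

The frozen form still answers lookups such as `frozen["cat"]`, but it can no longer be modified, which mirrors the trade described above: easy access, no adaptability.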
Over the period 1998-99, psycore had evolved incrementally,
getting new features whenever a module author needed them.
The natural language team needed psycore to do one thing,
the finance team needed it to do another, the categorization
team needed it to do another, the reasoning team needed
a reasoning module, and so on, and so on. None of these
requests fundamentally changed the architecture of nodes,
links and wanderers -- mental entities relating to each
other and dynamically altering their relationships -- but
they changed the details of how nodes, links and wanderers
worked, and how they could be accessed and changed. The
abundance of new features had made the core code more powerful,
but it had made it messier too, and harder to control. Many
of the new features had similar structures, and in hindsight
could be consolidated into simpler structures. Engineers,
charged with building specialized components of Webmind,
complained that the system offered so many features and
possibilities that it was difficult to figure out how to
use it. They wanted something simpler, with a few good features
rather than a large number of features of varying quality.
Was it really necessary to go through all these revisions?
Why not just figure out everything correctly the first time,
and avoid all the reworking and re-reworking? One answer
is: We should have, we were just inexperienced, so we kept
fucking up. But there’s also another answer, that
I prefer because it’s more flattering to me! This
answer is: evolution doesn't work that way. Webmind, as
a software system, is an engineered system, but it is also
an evolved system. It went through several incarnations,
each one with some fit aspects and some unfit aspects. The
fit aspects survived to the next incarnation; the less fit
aspects didn’t. All large software projects evolve
through multiple generations; Webmind was not unique in
this regard. But the evolution of Webmind had unique aspects
because what was evolving was mind itself. In this evolution
we had to retain both those features that were most useful
for practical applications and those that were in accordance
with the abstract structure of mind.
Evolution’s good at figuring out how to make a system
that can achieve its goals within a certain environment.
In this case, the system was Webmind, and the environment
included the physical structure of modern computer hardware,
the universe of software that has evolved to adapt to it,
and the practical applications that Webmind was intended
for, like market prediction, news filtering, data analysis,
text analysis, and conversation. Java, wonderful as it is,
wasn’t designed for mind hacking. The von Neumann
architecture was designed for repetitive mathematical calculations,
not for intelligence. But, by the same token, the brain
was designed for sensing and acting, not for abstract thought.
Fiber cells were designed for musculature, not for use as
neurons. Mind can emerge from any sufficiently flexible
substrate, as the features of the substrate gradually adapt
themselves to the requirements imposed on them.
The new psycore had a multi-layered structure, which I invented
based on some conversations with Youlian Troianov, a Bulgarian
software engineer who believes Webmind can never be truly
intelligent because it doesn’t make use of the fundamental
quantum symmetries of the universe (but he kept working
for us anyway, and even now follows Novamente work very
closely). I still don’t completely understand what
Youlian meant when he suggested psycore should have many
layers, but the idea set off a spark in my mind, and the
current psycore does indeed have three layers.
The lowest layer was what we called “abstract actors.”
It was a general framework for computational actors that
group other actors and transform other actors and send messages
to other actors. We chose the word “actors”
here instead of “agents” because “agents”
seems to mean too many things to too many people. Lots of
other possibilities were tossed around, including more interesting
ones like “cells,” “psells”, “psions”,
“psychons” and so forth. Basically, Layer 1
provides a kind of “mind operating system,”
suitable to run on a single machine and a single processor,
or else on a massively parallel hardware system in which
each actor gets its own processing power, like in the brain.
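As a rough illustration of what such an actor layer might look like on a single processor, here is a small Python sketch. Everything in it is an assumption made for the example: the `Actor` class, its `send`, `group` and `step` methods, and the round-robin `run` scheduler are hypothetical and are not taken from psycore. Transformation of one actor by another is left out for brevity.

```python
from collections import deque


class Actor:
    """Hypothetical actor: has an inbox, can group other actors, can send messages."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()   # pending (sender name, message) pairs
        self.members = []      # actors grouped under this one
        self.received = []     # messages already processed

    def send(self, other, message):
        other.inbox.append((self.name, message))

    def group(self, *actors):
        self.members.extend(actors)

    def step(self):
        """Process at most one pending message."""
        if self.inbox:
            self.received.append(self.inbox.popleft())


def run(actors, rounds):
    """Single-processor scheduler: every actor gets one step per round."""
    for _ in range(rounds):
        for actor in actors:
            actor.step()


a, b = Actor("a"), Actor("b")
team = Actor("team")
team.group(a, b)
a.send(b, "hello")
run([a, b, team], rounds=1)
```

On a massively parallel machine the `run` loop would disappear and each actor would step on its own processor; the actor interface itself would not need to change, which is the appeal of treating this layer as a kind of operating system.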
The second layer was “distributed actors” –
this deals with all the horrible nastiness of implementing
a massively parallel system on a collection of multiprocessor
machines networked together by TCP/IP: scheduling of processes,
sending of messages from one machine to another, and so
forth. Paul’s Pods system was considered as a structuring
principle for this layer, but based on extensive testing
by the Brazilians, we chose some other ideas instead, which
Paul wasn’t terribly happy about.
The third layer, finally, was nodes and links and wanderers
and all the good stuff – all the stuff I invented
in 1997 and Ken and I coded up in the beginning. This layer
comes out very