Technical Overview
Consider that many of the things
that you "know" come from experiences that you have
never had. That's one of the striking qualities of language; we
can share the experiences of others separated from us by space and
time. And though we lack first-hand knowledge, that doesn't prevent
us from speaking with authority about, say, the depth of the ocean,
even though we have never seen the bottom.
In the same way, Brainhat
can function with no real knowledge of the world, given a
sufficient foundation of brokered facts to build upon. For brainhat,
this boot-strap collection is called the basic knowledge pool.
The elements of the basic knowledge pool represent simple ideas,
like a ball, or the color red. These ideas are connected
hierarchically to others--e.g. balls are toys, and red is a color.
Links between the elements define the hierarchy's structure.
Every concept is the child of something else, and some are children
or parents of many.
define woman-1
label woman
child human-1
person first
related man-1
define human-1
label human
label person
child mammal-1
wants mood-1
define mammal-1
label mammal
label creature
child animal-1
define animal-1
label animal
child things
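To make the structure concrete, here is a minimal Python sketch that reads definitions in the format above and follows child links upward. It is not Brainhat's own loader: it assumes one tag per line, treats child as the only hierarchy link, and the names load_pool and ancestors are hypothetical.

```python
from collections import defaultdict

POOL_TEXT = """\
define woman-1
label woman
child human-1
person first
related man-1
define human-1
label human
label person
child mammal-1
wants mood-1
define mammal-1
label mammal
label creature
child animal-1
define animal-1
label animal
child things
"""

def load_pool(text):
    """Return {concept name: {tag: [values]}} from define-block text."""
    pool = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tag, _, value = line.partition(" ")   # "label to see" -> ("label", "to see")
        if tag == "define":
            current = pool.setdefault(value, defaultdict(list))
        elif current is not None:
            current[tag].append(value)
    return pool

def ancestors(pool, name):
    """Walk 'child' links upward, breadth-first, visiting each parent once."""
    seen, order, frontier = set(), [], [name]
    while frontier:
        node = frontier.pop(0)
        for parent in pool.get(node, {}).get("child", []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                frontier.append(parent)
    return order

pool = load_pool(POOL_TEXT)
print(ancestors(pool, "woman-1"))  # ['human-1', 'mammal-1', 'animal-1', 'things']
```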
Using the basic knowledge pool as a starting point, these simple
concepts can be combined to form arbitrarily complex
relationships. Within brainhat, these structures are called
Complex Concepts (CC)--ideas made from other ideas.
CCs can represent elementary assertions, e.g. "the ball is
red." They can be propositions, such as "if the golden
sun is shining then beautiful people are happy." They can
also be statements of cause-and-effect--"mario is happy
because he saw the princess." CCs can even represent
questions. (Doing research? Search for "descriptive
transformational grammars").
Brainhat casts these Complex Concepts into inverted trees. The
constituent concepts hang from their "roots", like
mobiles of ideas. The more abstract parts of the idea (e.g.
cause-and-effect) live near the top. The actors and their
attributes (golden sun, beautiful people) live near the bottom.
The links between them define their relationships to each other.
                    o Root
                   / \
                  /   \
           CAUSE /     \ EFFECT
                /       \
          Root o         o mario
             / | \        \
    SUBJECT /  |  \ OBJECT \ ATTRIBUTE
           /   |   \        \
    mario o    |    o        o happy
               |   princess
          VERB |
               o saw
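A CC tree can be modeled as nodes that each carry a concept and a set of labeled links to child nodes. The sketch below builds the tree pictured above for "mario is happy because he saw the princess." The CC class and its show method are hypothetical illustrations, not Brainhat's internal data structure.

```python
from dataclasses import dataclass, field

@dataclass
class CC:
    """A node in a Complex Concept tree; children hang from labeled links."""
    concept: str                                # e.g. "mario", or "Root" for a structural node
    links: dict = field(default_factory=dict)   # link label -> child CC

    def show(self, indent=0, via="ROOT"):
        """Return one text line per node, indented by depth."""
        lines = [" " * indent + f"{via}: {self.concept}"]
        for label, child in self.links.items():
            lines.extend(child.show(indent + 2, label))
        return lines

cause = CC("Root", {"SUBJECT": CC("mario"), "VERB": CC("saw"), "OBJECT": CC("princess")})
effect = CC("mario", {"ATTRIBUTE": CC("happy")})
idea = CC("Root", {"CAUSE": cause, "EFFECT": effect})
print("\n".join(idea.show()))
```

The printed outline lists the CAUSE subtree (subject, verb, object) and then the EFFECT subtree (mario with the attribute happy), mirroring the diagram from top to bottom.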
At runtime, CCs (e.g.
"the ball is red") are assembled, destroyed, evaluated,
compared and archived. Many live short lives as tendered (though
incorrect) interpretations of something the user may have said.
Others are deductions, generated from within the program. A few
CCs survive to become part of the context of the conversation in
progress, and to be added to the pool of things "known."
Brainhat's disarmingly human-like qualities of
understanding, learning, answering questions and speculating are
simply the products of creation and manipulation of CCs. Parsing
and pattern matching rules tell brainhat how to cast
particular fragments of speech into CCs, or how to recognize a
stored idea within a CC. Processing routines manipulate the CCs to
change their meaning, or combine them to make new ones. Brainhat
navigates through ambiguity in language by evaluating each CC
against itself (vertically), to see whether it makes sense alone,
and against a context buffer (horizontally), to see how it fares
against ideas that came before.
Some examples will show the prototype at work: In this first
segment, brainhat learns about a couple of objects, and
answers some questions. Each input sentence is echoed back to verify
its meaning.
>> the red ball is round
the ball is round
>> the blue ball is oval
the ball is oval
>> what shape is the red toy ?
ball is round is red
>> what color is the oval toy ?
ball is oval is blue
This next segment shows Brainhat exercising a chain of
reasoning, and explaining the outcome. The notion that "A is
near B implies B is near A" is part of the basic knowledge
pool, and precedes this example.
>> if luigi sees the princess then luigi is happy
if luigi sees the princess then he is glad
>> if luigi is near the princess then luigi can see the princess
if luigi is near the princess then he sees she
>> the princess is near luigi
the princess is near luigi
>> is luigi near the princess ?
he is near she yes
>> is luigi happy ?
luigi is near princess is glad yes
>> why is he happy
luigi is glad because he sees the princess
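The exchange above behaves like forward chaining over if/then rules, with a record of which fact produced each deduction; that record is what makes a "why" answer possible. This Python sketch assumes facts can be flattened into (subject, relation, object) triples. Brainhat itself reasons over CCs, and the triple encoding, the rules list and the chain function are hypothetical.

```python
# Facts are (subject, relation, object) triples (an assumed encoding).
rules = [
    (("luigi", "sees", "princess"), ("luigi", "is", "happy")),
    (("luigi", "near", "princess"), ("luigi", "sees", "princess")),
]

def symmetric_near(fact):
    """Basic-pool notion: A near B implies B near A."""
    s, rel, o = fact
    return (o, "near", s) if rel == "near" else None

def chain(facts):
    """Forward-chain to a fixed point; map each fact to the fact it came from
    (None for facts the user stated directly)."""
    known = dict.fromkeys(facts)
    changed = True
    while changed:
        changed = False
        for fact in list(known):
            derived = [(symmetric_near(fact), fact)]
            derived += [(then, fact) for cond, then in rules if cond == fact]
            for new, because in derived:
                if new is not None and new not in known:
                    known[new] = because
                    changed = True
    return known

known = chain([("princess", "near", "luigi")])
print(("luigi", "is", "happy") in known)   # True
print(known[("luigi", "is", "happy")])     # ('luigi', 'sees', 'princess')
```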
Because Brainhat organizes concepts hierarchically, it can
apply more general cases to specific events. For instance:
>> if a person is near a thing then the person can see the thing
if a person is near a thing then the person see the thing
>> mario is near a ball
mario is near a ball
>> can mario see the ball
he see a ball yes
>> why can he see the ball
mario see a ball because he is near a ball
>> can mario see a block
maybe
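Applying a general rule to a specific case amounts to checking that each participant descends from the category the rule names. In the sketch below, the parent table, the RULE encoding and the ask function are hypothetical. It assumes the rule's subject and object carry over unchanged into its conclusion, and it answers "maybe" when nothing supports the question either way.

```python
# Hypothetical slice of the hierarchy: child -> parent.
parent = {"mario": "person", "person": "things",
          "ball": "toy", "block": "toy", "toy": "things"}

def is_a(concept, category):
    """True if category is the concept itself or one of its ancestors."""
    while concept is not None:
        if concept == category:
            return True
        concept = parent.get(concept)
    return False

# "if a person is near a thing then the person can see the thing"
RULE = {"if": ("person", "near", "things"), "then": "see"}

def apply_rule(facts, rule):
    """Instantiate the rule for every fact whose participants fit its categories."""
    cat_s, rel, cat_o = rule["if"]
    return {(s, rule["then"], o) for s, r, o in facts
            if r == rel and is_a(s, cat_s) and is_a(o, cat_o)}

def ask(facts, question):
    return "yes" if question in facts | apply_rule(facts, RULE) else "maybe"

facts = {("mario", "near", "ball")}
print(ask(facts, ("mario", "see", "ball")))    # yes
print(ask(facts, ("mario", "see", "block")))   # maybe
```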
Underneath, parsing and processing are directed by pattern
templates and post-processing routines.
/* Where is x? */
define sent-where
label question
rule where $c`tobe`0! $r`csubobj`1
map VERB,SUBJECT
postp SPEAK
postp CHOOSEONE
postp PULLWHERES
postp TOBECOMPACT
postp PUSHTENSE
postp REQUIREWHERES
For example, the lines
above tell brainhat how to parse and evaluate a question of
the form "where <tobe> <something>" (such as
"where are the happy people ?"). The "rule"
line gives the basic format. A sub-rule ($r`csubobj`) expands the
"something" portion, and a concept match ($c`tobe`) picks up the
"to-be" portion of the question. A
"map" directive tells how the components should be
assembled into a CC. Finally, modular post-processing routines (postp
statements) reformat the resulting CCs. Each applies some simple
processing, typically modifying the shape of the CCs that are
passed in, and handing them off to the routine that appears above
it in the list.
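The bottom-up ordering of postp routines can be illustrated with a small pipeline. In this hypothetical sketch each routine only records its name (CHOOSEONE also keeps the first candidate as a stand-in for real ranking), so the trace shows the order in which the routines of sent-where would run.

```python
def make_postp(name, transform=lambda ccs: ccs):
    """Build a stand-in postp routine that logs its name, then transforms the CCs."""
    def routine(ccs, trace):
        trace.append(name)
        return transform(ccs)
    return routine

POSTP = [  # same top-to-bottom order as the postp lines in sent-where
    make_postp("SPEAK"),
    make_postp("CHOOSEONE", lambda ccs: ccs[:1]),
    make_postp("PULLWHERES"),
    make_postp("TOBECOMPACT"),
    make_postp("PUSHTENSE"),
    make_postp("REQUIREWHERES"),
]

def run_postp(ccs, routines):
    """Apply routines starting at the bottom of the list and moving upward."""
    trace = []
    for routine in reversed(routines):
        ccs = routine(ccs, trace)
    return ccs, trace

result, trace = run_postp(["candidate A", "candidate B"], POSTP)
print(trace)   # ['REQUIREWHERES', 'PUSHTENSE', 'TOBECOMPACT', 'PULLWHERES', 'CHOOSEONE', 'SPEAK']
print(result)  # ['candidate A']
```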
New rules extend brainhat's ability to understand. As an
example, modifying the program to recognize the "where"
question in a different format is a matter of adding a second
syntax rule to the definition above, like so:
$r`csubobj`1! $c`tobe`0! where
This new pattern would match a question like "mario was where
?"
Brainhat also uses pattern matching to identify structures
within CCs. Syntactically, CC pattern templates look very much
like input pattern templates.
$c`color-1`0!$c`toy-1`1
The CC pattern match rule above, for example, matches any of
"red ball", "blue ball", "red
block", "pink toy", and others. The common element
is that {red,blue,pink} all descend from color-1, and that each of
{ball,block,toy} is either toy-1 itself or a descendant of it.
This introduction was intended only to present the elements of
Brainhat programming. The sections that follow give more in-depth
(though certainly not exhaustive) overviews of Brainhat
patterns and knowledge pool programming.
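To illustrate the CC pattern match described above: a component of the form $c`parent`x succeeds when the concept in that slot is the parent itself or one of its descendants. The sketch below flattens a two-part CC into a list of concepts and assigns slots by list position rather than by explicit indices; both simplifications, the parents table and the cc_match name are assumptions.

```python
# child -> parents; color and pink links follow the definitions in this document,
# the toy links are assumed.
parents = {
    "red-1": ["color-1"], "blue-1": ["color-1"], "pink-1": ["red-1"],
    "ball-1": ["toy-1"], "block-1": ["toy-1"], "toy-1": ["things"],
}

def descends_from(concept, ancestor):
    """True if ancestor is the concept itself or reachable through parent links."""
    if concept == ancestor:
        return True
    return any(descends_from(p, ancestor) for p in parents.get(concept, []))

def cc_match(pattern, concepts):
    """pattern, e.g. ["color-1", "toy-1"] for $c`color-1`0!$c`toy-1`1.
    Return {position: concept} on success, None otherwise."""
    if len(pattern) != len(concepts):
        return None
    if all(descends_from(c, p) for p, c in zip(pattern, concepts)):
        return dict(enumerate(concepts))
    return None

print(cc_match(["color-1", "toy-1"], ["pink-1", "block-1"]))  # {0: 'pink-1', 1: 'block-1'}
print(cc_match(["color-1", "toy-1"], ["ball-1", "red-1"]))    # None
```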
Basic Knowledge Pool
Brainhat
learns about the relationships between basic concepts at start-up.
The notions that balls are toys, that pink is a shade of red, for
example, are things that you tell brainhat in advance.
Everything else (e.g. that the ball is in the river) is learned as Brainhat
executes.
Elements of the basic knowledge pool are kept in a file called data9.in
in the data directory (this will change in later releases). Each
concept that you want brainhat to know about starts with a DEFINE
statement. The definition continues until brainhat reaches
another DEFINE or end-of-file. Within a definition are a
number of tags that identify a concept's relationships to others
around it. Note that concepts can be defined in any order.
However, all references should be satisfied: if you refer to another
concept from within a definition, it should exist. A number of tag
types are available. A comprehensive list appears at the end of
this section.
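Because definitions may appear in any order, references can only be checked after the whole file has been read. Here is a minimal Python sketch of such a check; which tags name other concepts (the REFERENCE_TAGS set) is an assumption, and undefined_references is a hypothetical helper, not part of Brainhat.

```python
# Assumed set of tags whose value names another concept.
REFERENCE_TAGS = {"child", "wants", "orthogonal", "related"}

def undefined_references(text):
    """Return (concept, tag, target) for every reference to an undefined concept."""
    defined, refs, current = set(), [], None
    for line in text.splitlines():
        tag, _, value = line.strip().partition(" ")
        if tag == "define":
            current = value
            defined.add(value)
        elif tag in REFERENCE_TAGS and current is not None:
            refs.append((current, tag, value))
    # Definitions may come in any order, so only check once everything is read.
    return [ref for ref in refs if ref[2] not in defined]

SAMPLE = """\
define block-1
label block
child toy-1
wants color-1
wants size-1
define toy-1
label toy
child things
define things
"""
print(undefined_references(SAMPLE))
# [('block-1', 'wants', 'color-1'), ('block-1', 'wants', 'size-1')]
```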
define block-1
label block
child toy-1
wants color-1
wants size-1
The sample above describes a
block. The definition has a unique name, block-1. It also
has a label, block, by which you may refer to a
"block" in conversation with Brainhat. Multiple
definitions may have the label block (a block can also be a
technique in American football, for example); however, the
definition names must be unique. A concept can have multiple
labels, and so be known by multiple names. Each label would appear
on a line by itself.
The child tag identifies block-1 as a more specific
example of a toy-1. Concepts can be children of any number
of other concepts (or none). Care should be taken not to create
cycles: no concept should be its own ancestor.
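Cycles in the child links can be caught with an ordinary depth-first search that tracks which concepts are on the current path. This sketch uses the standard three-color method; the parents mappings and the plaything-1 concept in the bad example are hypothetical.

```python
def find_cycle(parents):
    """parents: {concept: [parent, ...]}. Return one cycle as a list, or None.
    WHITE = unvisited, GREY = on the current path, BLACK = fully explored."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for p in parents.get(node, []):
            state = color.get(p, WHITE)
            if state == GREY:                      # p is already on the path: cycle
                return stack[stack.index(p):] + [p]
            if state == WHITE:
                found = visit(p)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in parents:
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None

ok = {"block-1": ["toy-1"], "toy-1": ["things"]}
bad = {"block-1": ["toy-1"], "toy-1": ["plaything-1"], "plaything-1": ["block-1"]}
print(find_cycle(ok))    # None
print(find_cycle(bad))   # ['block-1', 'toy-1', 'plaything-1', 'block-1']
```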
A wants tag identifies a preference for certain other
concepts that might be used in combination with it. By saying that
block-1 "wants" color, for instance, we are
specifying that if brainhat sees a block discussed in
combination with a reference to a color, we should bias our
thinking towards the toy, in lieu of a football technique.
               o block-1
              /|\
             / | \
      CHILD /  |  \ WANTS
           /   |   \
    toy-1 o    |    o size-1
               |
         WANTS |
               o color-1
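One simple way to picture how wants biases interpretation is to score each sense of an ambiguous label by how many of its wanted concepts (or their parents) appear nearby. The scoring rule, the block-2 football sense and the one-level PARENTS table are assumptions for illustration; Brainhat's actual weighting may differ.

```python
SENSES = {
    "block": [
        {"name": "block-1", "wants": {"color-1", "size-1"}},  # the toy, from the text
        {"name": "block-2", "wants": {"player-1"}},           # football block (hypothetical)
    ],
}
PARENTS = {"red-1": {"color-1"}, "big-1": {"size-1"}}  # one level only, for brevity

def pick_sense(label, nearby):
    """Prefer the sense whose 'wants' overlap most with nearby concepts and their parents."""
    context = set()
    for concept in nearby:
        context |= {concept} | PARENTS.get(concept, set())
    best = max(SENSES[label], key=lambda sense: len(sense["wants"] & context))
    return best["name"]

print(pick_sense("block", ["red-1"]))     # block-1
print(pick_sense("block", ["player-1"]))  # block-2
```

A fuller version would walk each nearby concept's entire lineage rather than a single parent level.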
In some cases, we want to
identify a concept's uniqueness with respect to some parent.
Colors red and blue, for example, are unique with respect to
color. In conversation, I might refer first to a "red
ball," and then to a "blue ball." Because of your
experience with the uniqueness of color, you (as a person) will
automatically assume that I am talking about two different balls. Brainhat
makes the leap by looking at the balls' attributes, and noting
their orthogonality.
define blue-1
label blue
child color-1
orthogonal color-1
define red-1
label red
child color-1
orthogonal color-1
define pink-1
label pink
child red-1
orthogonal color-1
Brainhat
gives special consideration to concepts that are both orthogonal
and have a parent/child relationship. Pink will not be orthogonal
to red, but both will be orthogonal to blue.
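The rule just described can be expressed as: two different concepts are orthogonal if they share an orthogonal tag and neither is an ancestor of the other. The Python sketch below encodes the three definitions above; lineage and are_orthogonal are hypothetical names.

```python
PARENTS = {"blue-1": ["color-1"], "red-1": ["color-1"], "pink-1": ["red-1"]}
ORTHOGONAL = {"blue-1": {"color-1"}, "red-1": {"color-1"}, "pink-1": {"color-1"}}

def lineage(concept):
    """All ancestors of a concept, following child links upward."""
    found, todo = set(), [concept]
    while todo:
        for parent in PARENTS.get(todo.pop(), []):
            if parent not in found:
                found.add(parent)
                todo.append(parent)
    return found

def are_orthogonal(a, b):
    """Mutually exclusive: share an orthogonal tag and are not parent/child."""
    if a == b or a in lineage(b) or b in lineage(a):
        return False
    return bool(ORTHOGONAL.get(a, set()) & ORTHOGONAL.get(b, set()))

print(are_orthogonal("red-1", "blue-1"))   # True: a "red ball" and a "blue ball" differ
print(are_orthogonal("pink-1", "red-1"))   # False: pink is a kind of red
print(are_orthogonal("pink-1", "blue-1"))  # True
```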
The ultimate parent(s) of each concept determines what part of
speech it can play. Nouns must be children of things;
adjectives are children of attribute-1; verbs are children
of action-1; prepositions are children of preposition;
articles are children of article-1 (which is a child of attribute-1),
and so on. The lineage of a ball, for example, may be ball->toy-1->things,
which makes it a candidate to fill a noun slot.
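Deriving candidate parts of speech from a concept's lineage can be sketched as follows. The root-to-role table mirrors the list above; the links from color-1 to attribute-1 and from tosee-1 through sense-1 to action-1, and the the-1 article, are assumed for the example. Because article-1 is itself a child of attribute-1, an article picks up both roles.

```python
POS_ROOTS = {"things": "noun", "attribute-1": "adjective", "action-1": "verb",
             "preposition": "preposition", "article-1": "article"}
PARENTS = {"ball-1": ["toy-1"], "toy-1": ["things"],
           "red-1": ["color-1"], "color-1": ["attribute-1"],
           "the-1": ["article-1"], "article-1": ["attribute-1"],
           "tosee-1": ["sense-1"], "sense-1": ["action-1"]}

def parts_of_speech(concept):
    """Collect a role for every part-of-speech root found in the concept's lineage."""
    roles, todo, seen = set(), [concept], set()
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in POS_ROOTS:
            roles.add(POS_ROOTS[current])
        todo.extend(PARENTS.get(current, []))
    return roles

print(parts_of_speech("ball-1"))   # {'noun'}
print(parts_of_speech("tosee-1"))  # {'verb'}
```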
Actions (verbs) require some special handling. Brainhat
needs the freedom to handle various verb tenses. Accordingly, verb
tenses should be organized as children of the infinitive. Special
tags define the tense, number and person of
each verb. From these, brainhat can choose an appropriate tense,
number and person when speaking.
define tosee-1
label to see
child sense-1
define see-1
label see
child tosee-1
number plural
tense present
person third
define sees-1
label sees
tense present
person third
number single
child tosee-1
define saw-1
label saw
child tosee-1
number single
tense past
person third
The definitions above create
the infinitive form to see, and a couple of subordinate
forms. At a minimum, the infinitive and the third person singular
present form of the verb should be defined.
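With the tense, number and person tags in place, choosing a form for output becomes a lookup among the infinitive's children. In this hypothetical sketch, requests with no exact match fall back to the third person singular present, the one form every verb is expected to define; that fallback policy is an assumption.

```python
FORMS = {  # name -> (label, tense, number, person), copied from the definitions above
    "see-1": ("see", "present", "plural", "third"),
    "sees-1": ("sees", "present", "single", "third"),
    "saw-1": ("saw", "past", "single", "third"),
}
CHILDREN = {"tosee-1": ["see-1", "sees-1", "saw-1"]}

def conjugate(infinitive, tense, number, person="third"):
    """Return the label of the child form whose tags all match the request."""
    fallback = None
    for name in CHILDREN[infinitive]:
        label, *tags = FORMS[name]
        if tags == [tense, number, person]:
            return label
        if tags == ["present", "single", "third"]:
            fallback = label
    return fallback

print(conjugate("tosee-1", "past", "single"))     # saw
print(conjugate("tosee-1", "present", "plural"))  # see
```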
Input Processing
Brainhat
(as it exists today) attempts to match user input against a set of
input patterns, one at a time, until it finds a fit. (See the file
data/input-patterns.) The "fit" is a
parts-of-speech match; it does not presuppose the meaning of the
matched text. Rather, many permutations may be generated, with
many different meanings. "Boy saw bat," for instance,
might generate CCs that represent "bat" as a winged
mammal, and as a wooden baseball club. "Saw" could
mean "viewed," or it could mean "cut in half."
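Generating every interpretation is essentially a Cartesian product over the senses of each word. The sense names below are hypothetical; the sketch only shows how the number of candidate CCs multiplies.

```python
from itertools import product

# Hypothetical senses for each word of "boy saw bat".
SENSES = {
    "boy": ["boy-1"],
    "saw": ["saw-view", "saw-cut"],
    "bat": ["bat-mammal", "bat-baseball"],
}

def interpretations(words):
    """One candidate per combination of word senses (1 * 2 * 2 = 4 here)."""
    return [list(combo) for combo in product(*(SENSES[w] for w in words))]

for candidate in interpretations(["boy", "saw", "bat"]):
    print(candidate)
```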
As a simplification, a rule that matches "boy saw bat"
might look like this:
define xxx
label sentence
rule $c`things`0! $c`actions`1! $c`things`2
map SUBJECT,VERB,OBJECT
Pattern components
matching "boy", "saw" and
"bat" appear in the corresponding positions. The $c`parent`x
construct says that brainhat should attempt to match a word
of type parent, and assign it to the xth position.
The "!" character indicates the termination of a
pattern component. It may or may not be needed, depending on the
character that follows.
This pattern is pretty inflexible; all parts must be present and
in the prescribed order. The good news is that the pattern can
match a wide variety of input; the sentence "ball hit
wall" could fit the pattern as well.
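The behavior described above (each $c`parent`x component consumes one word whose concept descends from parent, with all components required and in order) can be sketched as follows. The lexicon and parents tables are hypothetical, and unlike Brainhat, which keeps every permutation, this sketch keeps only the first matching sense of each word.

```python
import re

# $c`parent`position, optionally terminated by "!"
COMPONENT = re.compile(r"\$c`([^`]+)`(\d+)!?")

lexicon = {"boy": ["boy-1"], "saw": ["saw-1"], "bat": ["bat-1"],
           "ball": ["ball-1"], "hit": ["hit-1"], "wall": ["wall-1"]}
parents = {"boy-1": ["things"], "bat-1": ["things"], "ball-1": ["things"],
           "wall-1": ["things"], "saw-1": ["actions"], "hit-1": ["actions"]}

def is_a(concept, ancestor):
    return concept == ancestor or any(is_a(p, ancestor) for p in parents.get(concept, []))

def match_rule(rule, sentence):
    """Return {position: concept} if every component matches one word, in order."""
    components = COMPONENT.findall(rule)        # [(parent, position), ...]
    words = sentence.split()
    if len(words) != len(components):
        return None                              # every part required, nothing extra
    slots = {}
    for (parent, pos), word in zip(components, words):
        hits = [c for c in lexicon.get(word, []) if is_a(c, parent)]
        if not hits:
            return None
        slots[int(pos)] = hits[0]
    return slots

RULE = "$c`things`0! $c`actions`1! $c`things`2"
print(match_rule(RULE, "boy saw bat"))    # {0: 'boy-1', 1: 'saw-1', 2: 'bat-1'}
print(match_rule(RULE, "ball hit wall"))  # {0: 'ball-1', 1: 'hit-1', 2: 'wall-1'}
print(match_rule(RULE, "saw boy bat"))    # None
```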
When an input pattern matches, many Complex Concepts (CCs)
are created. Each is a permutation representing a possible
interpretation of the input. The map directive describes
what the resulting CCs should look like. There will always
be a root node. From that, components hang down, one level
deep.
          o Root
         /|\
   VERB / | \ SUBJECT
       /  |  \
  hit o   |   o ball
          |
   OBJECT |
          o wall
The map directive in
our example will create CCs like the one pictured above. In some
cases, one of the components may be specifically nominated as the
"Root." As an example, the pattern below would match
gorilla-like declarations such as "girl happy" or
"ball red."
define xxx2
label sentence
rule $c`things`0! $c`attribute-1`1
map ROOT,ATTRIBUTE
The map directive will
generate forms that Brainhat will interpret as "girl
is happy" or "ball is red" by attaching the attributes
to their subjects. The subject will assume the "Root"
position. The resulting CC would look like this:
o girl
\
\ ATTRIBUTE
\
o happy
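Judging from the examples in this section, the nth name in a map directive labels the component at position n, and ROOT promotes that component to the root. Here is a minimal sketch of that mapping under this inferred assumption; apply_map is a hypothetical name, and CCs are plain dictionaries.

```python
def apply_map(map_spec, slots):
    """Build a one-level CC from matched slots.
    map_spec: e.g. "SUBJECT,VERB,OBJECT"; the nth name labels slot n.
    A slot mapped to ROOT becomes the root concept; the others hang from it."""
    names = map_spec.split(",")
    root, links = "Root", {}
    for position, name in enumerate(names):
        if position not in slots:
            continue                    # e.g. an optional component that was absent
        if name == "ROOT":
            root = slots[position]
        else:
            links[name] = slots[position]
    return {"concept": root, "links": links}

print(apply_map("SUBJECT,VERB,OBJECT", {0: "ball", 1: "hit", 2: "wall"}))
# {'concept': 'Root', 'links': {'SUBJECT': 'ball', 'VERB': 'hit', 'OBJECT': 'wall'}}
print(apply_map("ROOT,ATTRIBUTE", {0: "girl", 1: "happy"}))
# {'concept': 'girl', 'links': {'ATTRIBUTE': 'happy'}}
```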
Of course, most sentences
aren't as simple as the ones in these examples. A mildly
complicated idea may parse into CCs many levels deep. And the
sentence structure may vary widely. Accordingly, CCs are typically
constructed from other CCs. Matching descends and rises, striving
to build from the bottom up. Expanding a previous example a
little, we might match more complicated utterances such as
"the boy saw the bat," or "the boy saw mary"
using the patterns below:
define xxx3
label sentence
rule $r`subobj`0! $c`actions`1! $r`subobj`2
map SUBJECT,VERB,OBJECT
define zzz
label subobj
rule [$c`article`0! ]$c`things`1
map ATTRIBUTE,ROOT
Rules can invoke other rules:
the $r`subobj`x construct instructs Brainhat
to attempt sub-rules of the type subobj and assign each match
to the xth position (SUBJECT and OBJECT in the map above). By delegating the
construction of individual components (subject, object, etc.) to
other rules, we can construct multi-level CCs. (Note: rules must
not invoke themselves.)
          o Root
         /|\
   VERB / | \ SUBJECT
       /  |  \
  saw o   |   o boy
          |    \
   OBJECT |     \ ATTRIBUTE
          |      \
          o mary  o the
Rule components that appear
in square brackets ("[ ]") are optional. They are mapped if they appear
in the input stream, and ignored otherwise.
There may be multiple rules sharing a common label. These
will be tried one after another whenever $r`label`
is invoked. The first match wins. Accordingly, order matters: the
current version of Brainhat loads the rules into memory
such that the most complicated (least likely to match) form should
appear first, followed by the simplest form, and then by increasingly
complex forms.
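Rule delegation, optional components and first-match-wins can be combined into a small recursive-descent matcher. This is a sketch under stated assumptions: rules are written as Python data rather than in Brainhat's rule syntax, the category table is hypothetical, and, as with Brainhat's own rules, a rule that invoked itself before consuming input would recurse forever.

```python
# Hypothetical word -> category table.
category = {"the": "article", "a": "article", "boy": "things", "bat": "things",
            "mary": "things", "saw": "actions"}

# label -> list of (components, map); tried in order, first match wins.
# Component: (kind, name, position, optional); "c" = word of category `name`,
# "r" = any rule carrying label `name`.
RULES = {
    "sentence": [([("r", "subobj", 0, False), ("c", "actions", 1, False),
                   ("r", "subobj", 2, False)], "SUBJECT,VERB,OBJECT")],
    "subobj": [([("c", "article", 0, True), ("c", "things", 1, False)],
                "ATTRIBUTE,ROOT")],
}

def match_label(label, words):
    """Try each rule with this label; return (cc, remaining words) or None."""
    for components, map_spec in RULES[label]:
        result = match_components(components, map_spec.split(","), words)
        if result is not None:
            return result
    return None

def match_components(components, names, words):
    slots = {}
    for kind, name, pos, optional in components:
        if kind == "c":
            if words and category.get(words[0]) == name:
                slots[pos], words = words[0], words[1:]
            elif not optional:
                return None
        else:
            sub = match_label(name, words)
            if sub is not None:
                slots[pos], words = sub          # sub-CC, then the words it left
            elif not optional:
                return None
    root, links = "Root", {}
    for pos, value in slots.items():
        if names[pos] == "ROOT":
            root = value
        else:
            links[names[pos]] = value
    return {"concept": root, "links": links}, words

def parse(sentence):
    result = match_label("sentence", sentence.split())
    if result is None or result[1]:
        return None                              # no match, or words left over
    return result[0]

print(parse("the boy saw mary"))
```

The result for "the boy saw mary" has boy (with the article as an ATTRIBUTE) under SUBJECT, saw under VERB, and mary under OBJECT, matching the tree pictured earlier.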
Upon making a successful match, Brainhat skewers the
permutation candidates (CCs) together and passes them to post-processing
routines. These routines may change the shape of the
CCs, eliminate a few, or use them for speech or to direct further
processing.
/* Where is x? */
define sent-where
label question
rule where $c`tobe`0! $r`csubobj`1
map VERB,SUBJECT
postp SPEAK
postp CHOOSEONE
postp PULLWHERES
postp TOBECOMPACT
postp PUSHTENSE
postp REQUIREWHERES
Post-processing
starts with the routine at the bottom of the list and proceeds upwards. In
this example, the routines work to answer a question about the
location of something. A CC represents the question at hand.
Assume that a previous sentence told Brainhat that
"the boy is in the water." Before any post-processing,
the question "where is the boy?" might look like this:
               o Root
              / \
     SUBJECT /   \ VERB
            /     \
       boy o       o is
          / \      |
 OBJPREP /   \ PREP| TENSE
        /     \    |
 water o    in o   o present
Briefly, REQUIREWHERES
tacks a REQUIRES tag onto each of the permutation CCs. The
tag indicates that a prepositional phrase is a must-have for
answering the question. PUSHTENSE grabs the tense of the
verb and applies it to the requirement, making it more
restrictive:
                       Root o
                           /|\
                  SUBJECT / | \ VERB
              boy o------'  |  \
                  |         |   o is
                  |         |   |
        ATTRIBUTE |         |   | TENSE
                  |         |   |
             Root o         |   o present
                 / \        |
        OBJPREP /   \ PREP  | REQUIRES
               /     \      |
        water o       o in  |
                            |
                            |        TENSE
                       Root o-------o present
                           / \
                  OBJPREP /   \ PREPOSITION
                         /     \
                  thing o       o preposition
Routine TOBECOMPACT
changes the shape of the CC by removing the verb and placing the
subject in the role of "Root." PULLWHERES makes
multiple copies of the CC. Each is the same as the original except
that only one prepositional phrase remains in each copy. (In
our example there is only one prepositional phrase anyway:
"in the water.")
                        boy o
                           /|
               ATTRIBUTE  / |
             Root o------'  |
                 / \        |
        OBJPREP /   \ PREP  | REQUIRES
               /     \      |
        water o       o in  |
                            |
                            |        TENSE
                       Root o-------o present
                           / \
                  OBJPREP /   \ PREPOSITION
                         /     \
                  thing o       o preposition
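The two reshaping steps described above can be sketched on a CC stored as a concept plus a list of (label, child) pairs, which allows repeated ATTRIBUTE links. The second prepositional phrase ("near the dock") is hypothetical, added so that PULLWHERES has more than one phrase to split; tobe_compact and pull_wheres are illustrative stand-ins, not Brainhat's implementations.

```python
def node(concept, *links):
    """A CC node: a concept plus an ordered list of (label, child) links."""
    return {"concept": concept, "links": list(links)}

def tobe_compact(cc):
    """Drop the VERB link and promote the SUBJECT to the root position.
    The subject keeps its own links; other root links (e.g. REQUIRES) move to it."""
    subject = next(child for label, child in cc["links"] if label == "SUBJECT")
    others = [(l, c) for l, c in cc["links"] if l not in ("SUBJECT", "VERB")]
    return node(subject["concept"], *subject["links"], *others)

def is_prep_phrase(cc):
    return any(label == "PREP" for label, _ in cc["links"])

def pull_wheres(cc):
    """Return one copy per prepositional-phrase ATTRIBUTE, each keeping only that phrase."""
    def is_where(link):
        return link[0] == "ATTRIBUTE" and is_prep_phrase(link[1])
    keep = [link for link in cc["links"] if not is_where(link)]
    return [node(cc["concept"], *keep, link) for link in cc["links"] if is_where(link)]

present = node("present")
in_water = node("Root", ("OBJPREP", node("water")), ("PREP", node("in")))
near_dock = node("Root", ("OBJPREP", node("dock")), ("PREP", node("near")))
requirement = node("Root", ("OBJPREP", node("thing")),
                   ("PREPOSITION", node("preposition")), ("TENSE", present))
question = node("Root",
                ("SUBJECT", node("boy", ("ATTRIBUTE", in_water), ("ATTRIBUTE", near_dock))),
                ("VERB", node("is", ("TENSE", present))),
                ("REQUIRES", requirement))

compact = tobe_compact(question)
copies = pull_wheres(compact)
print(compact["concept"], [label for label, _ in compact["links"]])
# boy ['ATTRIBUTE', 'ATTRIBUTE', 'REQUIRES']
print(len(copies))  # 2
```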
CHOOSEONE
selects the best result, and SPEAK voices it.
Post-processing routines are many in number and varied in function.