Brainhat - Technical Notes

Preface
Technical Overview
Vocabulary Programming
Inferences
Creating Memes
Installation
Preprocessing Data Files
Running Brainhat
Debug Overview
Download

These are technical notes, installation and use instructions for Brainhat in the non-database, Linux version. The notes deal with the basics of vocabulary creation, grammar and inferences.

These notes do not cover the SQL back-end, meme-loading, meme-shifting, meme maps or the GUI, all of which become active when the database is added.

Technical Overview

Consider that many of the things that you "know" come from experiences that you have never had. That's one of the striking qualities of language; we can share the experiences of others separated from us by space and time. And though we lack first-hand knowledge, it doesn't prevent us speaking with authority about, say, the depth of the ocean, even though we have never seen the bottom.

In the same way, Brainhat can function with no real knowledge of the world, given a sufficient foundation of brokered facts to build upon.
Brainhat's vocabulary contains simple concepts, like a ball, or the color red. These concepts are connected hierarchically to others--e.g. balls are toys, and red is a color. Links between the elements define the taxonomy's structure. Everything is the child of something else, and some are the child or parent of many.

define   woman-1
         label          woman
         child-of       human-1
         person         first
         related        man-1

define   human-1
         label          human
         label          person
         child-of       mammal-1
         wants          mood-1

define   mammal-1
         label          mammal
         label          creature
         child-of       animal-1

define   animal-1
         label          animal
         child-of       things
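To make the idea of lineage concrete, here is a minimal Python sketch (not Brainhat's own code) that stores the child-of links from the definitions above in a dictionary and walks them upward. The names CHILD_OF, ancestors and is_a are hypothetical and exist only for this illustration.

```python
# Hypothetical in-memory form of the child-of links shown above.
CHILD_OF = {
    "woman-1": ["human-1"],
    "human-1": ["mammal-1"],
    "mammal-1": ["animal-1"],
    "animal-1": ["things"],
}

def ancestors(concept):
    """Return every concept reachable by following child-of links upward."""
    seen = []
    stack = list(CHILD_OF.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(CHILD_OF.get(parent, []))
    return seen

def is_a(concept, candidate):
    """True if concept is candidate itself or one of its descendants."""
    return concept == candidate or candidate in ancestors(concept)

print(ancestors("woman-1"))
print(is_a("woman-1", "things"))
```

Because woman-1 reaches things through its parents, a lookup like this is enough to decide that "woman" can fill a noun slot.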

These simple concepts can be combined to form arbitrarily complex relationships. Within brainhat, these phrase structures are called Complex Concepts (CC)--ideas made from other ideas. CCs can represent elementary assertions, e.g. "the ball is red." They can be propositions, such as "if the golden sun is shining then beautiful people are happy." They can also be statements of cause-and-effect--"mario is happy because he saw the princess." CCs can even represent questions.
Brainhat casts these Complex Concepts into inverted trees. The constituent concepts hang from their "roots", like mobiles of ideas. The more abstract parts of the idea (e.g. cause-and-effect) live near the top. The actors and their attributes (golden sun, beautiful people) live near the bottom. The links between them define their relationships to each other.


                      o Root
                     / \
                    /   \
            CAUSE  /     \  EFFECT
                  /       \
               Root        mario
              / |  \           \
    SUBJECT  /  |   \ OBJECT    \ ATTRIBUTE
            /   |    \           \
           /    |      princess   happy
      mario     | VERB
                |
              saw

At runtime, CCs (e.g. "the ball is red") are assembled, destroyed, evaluated, compared and archived. Many live short lives as tendered (though incorrect) interpretations of something the user may have said. Others are deductions, generated from within the program. A few CCs survive to become part of the context of the conversation in progress, and to be added to the pool of things "known."

Brainhat's disarmingly human-like qualities of understanding, learning, answering questions and speculating are simply the products of creation and manipulation of CCs as part of a transformational grammar. Parsing and pattern matching rules (context free grammar) tell brainhat how to cast particular fragments of speech into CCs, or how to recognize a stored idea within a CC. Processing routines manipulate the CCs to change their meaning, or combine them to make new ones. Brainhat navigates through ambiguity in language by evaluating each CC against itself (vertically), to see whether it makes sense alone, and against a context buffer (horizontally), to see how it fares against ideas that came before.

Some examples will show the prototype at work: In this first segment, brainhat learns about a couple of objects, and answers some questions. Each sentence input is echoed to verify its meaning.

>> the red ball is round
the ball is round
>> the blue ball is oval
the ball is oval
>> what shape is the red toy ?
ball is round is red
>> what color is the oval toy ?
ball is oval is blue

This next segment shows Brainhat exercising a chain of reasoning, and explaining the outcome. The notion that "A is near B implies B is near A" is part of the basic knowledge pool, and precedes this example.

>> if luigi sees the princess then he is happy
if luigi sees the princess then he is glad.
>> if luigi is near the princess then luigi can see the princess
if luigi is near the princess then he sees the princess
>> luigi is near the princess
luigi is near the princess. luigi sees the princess. he is glad.
>> why?
he is glad because luigi sees the princess.
>> why?
luigi sees the princess because luigi is near the princess.
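The exchange above can be imitated with a very small forward-chaining loop. The Python sketch below is only an illustration: it treats each statement as an opaque string, whereas Brainhat works on CCs, and the names RULES and forward_chain are hypothetical. It records which conditions produced each conclusion so that a "why?" question can be answered.

```python
# Each rule is (list of conditions, conclusion), all as plain strings.
RULES = [
    (["luigi sees the princess"], "luigi is glad"),
    (["luigi is near the princess"], "luigi sees the princess"),
]

def forward_chain(facts, rules):
    """Apply rules until nothing new can be derived.

    Returns the set of known statements and a dict mapping each derived
    statement to the conditions that produced it.
    """
    known = set(facts)
    why = {}
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                why[conclusion] = list(conditions)
                changed = True
    return known, why

known, why = forward_chain(["luigi is near the princess"], RULES)
print(why["luigi is glad"])
print(why["luigi sees the princess"])
```

Asking "why?" twice corresponds to looking up the conclusion, then looking up its condition in turn.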

Because Brainhat organizes concepts hierarchically, it can apply more general cases to specific events. For instance:

>> if a person is near a thing then the person can see the thing
if a somebody is near a something then a somebody sees a something
>> mario is near a ball
mario is near a ball. mario sees a ball
>> does mario see a toy?
yes. mario sees a ball.
>> why?
mario sees a ball because mario is near a ball
>> does he see a block
maybe. I do not know.

We can uncouple what Brainhat is told from what it believes. In this next run, we have unset a "verbatim" option to make Brainhat more critical of its input:

>> if i say something is red then something is blue
if You say something is red then something is blue.
>> the ball is red
the ball is red. the ball is blue.
>> why?
the ball is blue because You say the ball is red.
>> do i say that the ball is red?
yes. You say the ball is red.
>> do i say that the ball is blue?
maybe. I do not know.

Underneath, parsing and processing is directed by grammar patterns and match-processing routines.

/* Where is x?
*/
define   sent-where
         label            question
         rule             where $c`tobe`0! $r`csubobj`1
         map              VERB,SUBJECT
         mproc            SPEAK
         mproc            CHOOSEONE
         mproc            PULLWHERES
         mproc            TOBECOMPACT
         mproc            PUSHTENSE
         mproc            REQUIREWHERES

For example, the lines above tell brainhat how to parse and evaluate a question of the form "where <tobe> <something>" (such as "where are the happy people ?"). The "rule" line gives the basic format. Sub-rules expand the "something" portion ($r`csubobj`) and "to-be" ($c`tobe`) portion of the question. A "map" directive tells how the components should be assembled into a CC. Finally, modular post-processing routines (mproc statements) reformat the resulting CCs. Each applies some simple processing, typically modifying the shape of the CCs that are passed-in, and handing them off to the routine that appears above it in the list.
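The bottom-up ordering of mproc statements can be sketched in a few lines of Python. This is an illustration only: the routines here are stand-ins that merely record their own names, and run_order, run_pipeline and make_tracer are hypothetical helpers, not part of Brainhat.

```python
# The mproc lines in the order they appear in the sent-where definition.
MPROCS = ["SPEAK", "CHOOSEONE", "PULLWHERES",
          "TOBECOMPACT", "PUSHTENSE", "REQUIREWHERES"]

def run_order(mprocs):
    """Routines run from the last one listed to the first one listed."""
    return list(reversed(mprocs))

def run_pipeline(ccs, mprocs, routines):
    """Pass the candidate CCs through each routine, bottom of the list first."""
    for name in run_order(mprocs):
        ccs = routines[name](ccs)
    return ccs

def make_tracer(name):
    """Stand-in routine: appends its name to every CC it is handed."""
    return lambda ccs: [cc + [name] for cc in ccs]

routines = {name: make_tracer(name) for name in MPROCS}
result = run_pipeline([[]], MPROCS, routines)
print(result[0])
```

The trace shows REQUIREWHERES running first and SPEAK last, which matches the description of each routine handing its output to the one listed above it.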

New rules extend brainhat's ability to understand. As an example, modifying the program to recognize the "where" question in a different format is a matter of adding a second syntax rule to the definition above, like so:

$r`csubobj`1! $c`tobe`0! where

This new pattern would match a question like "mario was where ?"

Brainhat also uses pattern matching to identify structures within CCs. Syntactically, CC grammar patterns look very much like input grammar patterns.

$c`color-1`0!$c`toy-1`1

The CC pattern match rule above, for example, matches any of "red ball", "blue ball", "red block", "pink toy", and others. The common elements are that {red,blue,pink} and {ball,block,toy} are all "children" of colors and toys, respectively.
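As a rough illustration of how such a pattern might be tested, the Python sketch below treats the pattern as a list of parent concepts, one per slot, and accepts any sequence of concepts whose members descend from the corresponding parents. The data and the names is_a and match_pattern are hypothetical; in particular, happy-1 being a child of mood-1 is an assumption made for the example.

```python
# Hypothetical child-of links for the concepts used in the example.
CHILD_OF = {
    "red-1": ["color-1"], "blue-1": ["color-1"], "pink-1": ["red-1"],
    "ball-1": ["toy-1"], "block-1": ["toy-1"],
    "happy-1": ["mood-1"],  # assumed for illustration
}

def is_a(concept, candidate):
    """True if concept is candidate itself or a descendant of it."""
    if concept == candidate:
        return True
    return any(is_a(parent, candidate) for parent in CHILD_OF.get(concept, []))

def match_pattern(pattern, concepts):
    """Return {slot: concept} if every concept descends from its slot's parent."""
    if len(pattern) != len(concepts):
        return None
    if all(is_a(c, p) for c, p in zip(concepts, pattern)):
        return dict(enumerate(concepts))
    return None

# Slot 0 wants a color-1, slot 1 wants a toy-1.
PATTERN = ["color-1", "toy-1"]
print(match_pattern(PATTERN, ["red-1", "ball-1"]))
print(match_pattern(PATTERN, ["pink-1", "toy-1"]))
print(match_pattern(PATTERN, ["happy-1", "ball-1"]))
```

The third call fails because happy-1 has no color-1 ancestor, while pink-1 succeeds by way of red-1.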

This introduction was intended simply to introduce the elements of brainhat programming. The sections that follow give more in-depth (though certainly not exhaustive) overviews of Brainhat input grammar and vocabulary programming.

Vocabulary

Brainhat learns about the relationships between basic concepts at start-up. The notions that balls are toys, that pink is a shade of red, for example, are things that you tell brainhat in advance. Everything else (e.g. the ball is in the river) is learned as brainhat executes.
Elements of the vocabulary are kept in a data directory in the non-database version, or in a database in the SQL version. They may be created with the Brainhat development GUI or with a file editor.

Each concept definition starts with a DEFINE statement. A definition continues until brainhat reaches another DEFINE or end-of-file. Within a definition are a number of tags that identify a concept's relationships to others around it. Concepts can be defined in any order. However, all references should be satisfied; if you refer to another concept from within a definition, it should exist.

define   block-1
         label         block
         child-of      toy-1
         wants         color-1
         wants         size-1

The sample above describes a block. The definition has a unique name, block-1. It also has a label, block, by which you may refer to a "block" in conversation with brainhat. Multiple definitions may have the label block (a block can also be a technique in American football, for example); however, the definition names should be unique. A concept can have multiple labels, and so be known by multiple names. Each label would appear on a line by itself.

The child-of tag identifies block-1 as a more specific example of a toy-1. Concepts can be children of any number of other concepts (or none). Care should be taken not to create cycles: no concept should be its own parent.

A wants tag identifies a preference for certain other concepts that might be used in combination with it. By saying that block-1 "wants" color, for instance, we are specifying that if brainhat sees a block discussed in combination with a reference to a color, we should bias our thinking towards the toy, in lieu of a football technique.

                    o  block-1
                   /|\
                  / | \
           CHILD /  |  \  WANTS	
                / WANTS \
               /    |    \
       toy-1  o     |     \
                    |      o size-1
                    o 
                 color-1

In some cases, we want to identify a concept's uniqueness with respect to some parent. Colors red and blue, for example, are unique with respect to color. In conversation, I might refer first to a "red ball," and then to a "blue ball." Because of your experience with the uniqueness of color, you (as a person) will automatically assume that I am talking about two different balls. Brainhat makes the leap by looking at the balls' attributes, and noting their orthogonality.

define   blue-1
         label                  blue
         child-of               color-1
         orthogonal             color-1

define   red-1
         label                  red
         child-of               color-1
         orthogonal             color-1

define   pink-1
         label                  pink
         child-of               red-1
         orthogonal             color-1

Brainhat makes special consideration for concepts that are both orthogonal and have a parent/child relationship. Pink will not be orthogonal to red, but both will be orthogonal to blue.
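One way to read that rule is sketched below in Python: two concepts are treated as mutually exclusive when they share the same orthogonal basis and neither is an ancestor of the other. This is an interpretation of the text, not Brainhat's actual test, and the names ORTHOGONAL, is_a and orthogonal are hypothetical.

```python
# child-of and orthogonal tags taken from the three definitions above.
CHILD_OF = {"blue-1": ["color-1"], "red-1": ["color-1"], "pink-1": ["red-1"]}
ORTHOGONAL = {"blue-1": "color-1", "red-1": "color-1", "pink-1": "color-1"}

def is_a(concept, candidate):
    """True if concept is candidate itself or a descendant of it."""
    if concept == candidate:
        return True
    return any(is_a(parent, candidate) for parent in CHILD_OF.get(concept, []))

def orthogonal(a, b):
    """True if a and b share an orthogonal basis and are not parent and child."""
    basis = ORTHOGONAL.get(a)
    if basis is None or basis != ORTHOGONAL.get(b):
        return False
    return not (is_a(a, b) or is_a(b, a))

print(orthogonal("pink-1", "red-1"))   # parent and child
print(orthogonal("pink-1", "blue-1"))
print(orthogonal("red-1", "blue-1"))
```

Under this reading a "pink ball" and a "red ball" could be the same ball, while a "pink ball" and a "blue ball" could not.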

The ultimate parent(s) of each concept determines what part of speech it can play. Nouns must be children of things; adjectives are children of adjective; verbs are children of action (or tobe); prepositions are children of preposition; articles are children of article-1, and so on. The lineage of a ball, for example, may be ball->toy-1->things, which makes it a candidate to fill a noun slot.

Actions (verbs) require some special handling. Brainhat needs the freedom to handle various verb tenses. Accordingly, verb tenses should be organized as children of the infinitive. Special tags define the tense, number and person of each verb. From these, brainhat can choose an appropriate tense, number and person when speaking.

define   tosee-1
         label         to see
         child-of      sense-1

define   see-1
         label         see
         child-of      tosee-1
         number        plural
         tense         present
         person        third

define   sees-1
         label         sees
         tense         present
         person        third
         number        singular
         child-of      tosee-1

define   saw-1
         label         saw
         child-of      tosee-1
         number        singular
         tense         past
         person        third

The definitions above create the infinitive form to see, and a couple of subordinate forms. As a minimum, the infinitive and the third person singular present form of the verb should be defined.
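Choosing a form when speaking amounts to a lookup on the tense, number and person tags. The Python sketch below illustrates that lookup over the three subordinate forms defined above; FORMS and choose_form are hypothetical names, and what Brainhat does when no form matches is not stated here, so the sketch simply returns None.

```python
# The subordinate forms of tosee-1, with the tags from the definitions above.
FORMS = [
    {"name": "see-1", "label": "see",
     "number": "plural", "tense": "present", "person": "third"},
    {"name": "sees-1", "label": "sees",
     "number": "singular", "tense": "present", "person": "third"},
    {"name": "saw-1", "label": "saw",
     "number": "singular", "tense": "past", "person": "third"},
]

def choose_form(tense, number, person):
    """Return the label of the first form whose tags all match, else None."""
    for form in FORMS:
        if (form["tense"], form["number"], form["person"]) == (tense, number, person):
            return form["label"]
    return None

print(choose_form("present", "singular", "third"))
print(choose_form("past", "singular", "third"))
print(choose_form("future", "singular", "third"))
```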

Input Processing

Brainhat (as it exists today) attempts to match user input against a set of grammar patterns, one at a time, until it finds a fit. (See the file data/input-patterns). The "fit" is a parts-of-speech match; it does not pre-suppose the meaning of the matched text. Rather, many permutations may be generated, with many different meanings. "Boy saw bat," for instance, might generate CCs that represent "bat" as a winged mammal, and as a wooden baseball mallet. "Saw" could mean "viewed," or it could mean "cut in half."

As a simplification, a rule that matches "boy saw bat" might look like this:

define   xxx
         label   sentence
         rule    $c`things`0! $c`actions`1! $c`things`2
         map     SUBJECT,VERB,OBJECT

The rule matches when words corresponding to "boy", "saw" and "bat" appear in the corresponding locations. The $c`parent`x construct says that brainhat should attempt to match a word of type parent, and assign it to the xth position. The "!" character indicates the termination of a pattern component. It may or may not be needed, depending on the character that follows.

This pattern is pretty inflexible; all parts must be present and in the prescribed order. The good news is that the pattern can match a wide variety of input; the sentence "ball hit wall" could fit the pattern as well.

When an input pattern matches, many complex concepts (CCs) are created. Each is a permutation representing a possible interpretation of the input. The map directive describes what the resulting CCs should look like. There will always be a root node. From that, components hang down, one level deep.

            o Root
           /|\
      VERB/ | \SUBJECT
         /  |  \
    hit o   |   o ball
            |
      OBJECT|
            o wall

The map directive in our example will create CCs like the one pictured above. In some cases, one of the components may be specifically nominated as the "Root." As an example, the pattern below would match gorilla-like declarations such as "girl happy" or "ball red."


define   xxx2
         label   sentence
         rule    $c`things`0! $c`attribute-1`1
         map     ROOT,ATTRIBUTE

The map directive will generate forms that Brainhat will interpret as "girl is happy" or "ball is red" by attaching the attributes to their subjects. The subject will assume the "Root" position. The resulting CC would look like this:

            o girl
             \
              \ ATTRIBUTE
               \
                o happy
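The two map forms seen so far can be contrasted with a short Python sketch. It builds a nested dictionary from the matched words: with no ROOT entry the components hang from an anonymous root, and with a ROOT entry the nominated word becomes the root itself. The dictionary layout and the name apply_map are assumptions made for this illustration; they do not reflect Brainhat's internal data structures.

```python
def apply_map(map_directive, matched):
    """Build a nested-dict CC from matched words given in slot order.

    map_directive is the comma-separated list from a "map" line;
    matched holds the word captured for slot 0, slot 1, and so on.
    """
    links = map_directive.split(",")
    if len(links) != len(matched):
        raise ValueError("map and match lengths differ")
    # A component mapped as ROOT becomes the root node itself.
    root = matched[links.index("ROOT")] if "ROOT" in links else "Root"
    cc = {"node": root, "links": {}}
    for link, word in zip(links, matched):
        if link != "ROOT":
            cc["links"][link] = {"node": word, "links": {}}
    return cc

print(apply_map("SUBJECT,VERB,OBJECT", ["ball", "hit", "wall"]))
print(apply_map("ROOT,ATTRIBUTE", ["girl", "happy"]))
```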

Of course, most sentences aren't as simple as the ones in these examples. A mildly complicated idea may parse into CCs many levels deep. And the sentence structure may vary widely. Accordingly, CCs are typically constructed from other CCs. Matching descends and rises, striving to build from the bottom up. Expanding a previous example a little, we might match more complicated utterances such as "the boy saw the bat," or "the boy saw mary" using the patterns below:

define   xxx3
         label   sentence
         rule    $r`subobj`0! $c`actions`1! $r`subobj`2
         map     SUBJECT,VERB,OBJECT

define   zzz
         label   subobj
         rule    [$c`article`0! ]$c`things`1
         map     ATTRIBUTE,ROOT

Rules can invoke other rules: the $r`subobj`x construct instructs Brainhat to attempt sub-rules of the type subobj and assign the match to the xth position (SUBJECT for position 0 in the example above). By delegating the construction of individual components (subject, object, etc.) to other rules, we can construct multi-level CCs.

            o Root
           /|\
      VERB/ | \SUBJECT
         /  |  \
    saw o   |   o boy
            |    \
      OBJECT|     \ ATTRIBUTE
            o mary \
                    o the

Rule components that appear in "[]"'s are optional. They are mapped if they appear in the input stream, and ignored otherwise.
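A minimal Python sketch of the subobj rule shows how an optional component behaves. The word lists are assumptions standing in for children of article and things, and match_subobj is a hypothetical helper; real matching works on concepts, not raw strings.

```python
ARTICLES = {"the", "a", "an"}     # assumed children of article
THINGS = {"boy", "bat", "mary"}   # assumed children of things

def match_subobj(tokens):
    """Match an optional article followed by a thing.

    Returns (cc, number_of_tokens_used) on success, or None on failure.
    """
    pos = 0
    article = None
    # The bracketed component: consume it if present, skip it otherwise.
    if pos < len(tokens) and tokens[pos] in ARTICLES:
        article = tokens[pos]
        pos += 1
    # The required component.
    if pos < len(tokens) and tokens[pos] in THINGS:
        cc = {"ROOT": tokens[pos]}
        if article is not None:
            cc["ATTRIBUTE"] = article
        return cc, pos + 1
    return None

print(match_subobj(["the", "boy"]))
print(match_subobj(["mary"]))
print(match_subobj(["the"]))
```

The article is mapped to ATTRIBUTE only when it appears, so "the boy" and "mary" both match the same rule.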

There may be multiple rules sharing a common label. These will be tried one after another whenever $r`label` is invoked. The first match wins. Accordingly, order matters: the current version of Brainhat loads the rules into memory such that the most complicated (least likely to match) form should appear first, followed by the simplest form, and then by increasingly more difficult forms.

Upon making a successful match, Brainhat skewers the permutation candidates (CCs) together and passes them to post processing routines. These routines may change the shape of the CCs, eliminate a few, or use them for speech or to direct further processing.

/* Where is x? */

define   sent-where
         label            question
         rule             where $c`tobe`0! $r`csubobj`1
         map              VERB,SUBJECT
         mproc            SPEAK
         mproc            CHOOSEONE
         mproc            PULLWHERES
         mproc            TOBECOMPACT
         mproc            PUSHTENSE
         mproc            REQUIREWHERES

Post processing routine selection starts at the bottom and proceeds upwards. In this example, the routines are working to answer a question about the location of something. A CC represents the question at hand. Assume that a previous sentence told Brainhat that "the boy is in the water." Before any post-processing, the question "where is the boy?" might look like this:

                o Root
               / \
      SUBJECT /   \ VERB
             /     \
        boy o       o is
           / \      |
   OBJPREP/   \PREP | TENSE
         /     \    |
  water o    in o   o present

Briefly, REQUIREWHERES tacks a REQUIRES tag onto each of the permutation CCs. The tag indicates that a prepositional phrase is a must-have for answering the question. PUSHTENSE grabs the tense of the verb and applies it to the requirement, making it further restrictive:

         Root o 
             /|\
    SUBJECT / | \ VERB
           /  |  \
          /   |   o is
         /    |   |
        o boy |   |
        |     |   |
    ATTRIBUTE |   |
        |     |   |
   Root o     |   |TENSE
       / \    |   |
 OBJPREP PREP |   o present
     /     \  |
    o       o | REQUIRES
  water    in |
              |
              |
              | TENSE
         Root o-------o present
     OBJPREP/   \PREPOSITION
           /     \
    thing o       o preposition

Routine TOBECOMPACT changes the shape of the CC by removing the verb and placing the subject in the role of "Root." PULLWHERES makes multiple copies of the CC. Each is the same as the original except that only one prepositional phrase remains per copy. (In our example there is only one prepositional phrase anyway: "in the water.")

          boy o 
             /|
  ATTRIBUTE / |
           /  |
          /   |
         /    |
   Root o     |
       / \    |
 OBJPREP PREP |
     /     \  |
    o       o | REQUIRES
  water    in |
              |
              |
              | TENSE
         Root o-------o present
             / \
     OBJPREP/   \PREPOSITION
           /     \
    thing o       o preposition
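The copying step attributed to PULLWHERES can be pictured with the Python sketch below. It is a guess at the shape of the operation based only on the description above: the CC is a plain dictionary with a hypothetical prep_phrases list, and pull_wheres is not Brainhat's routine.

```python
import copy

def pull_wheres(cc):
    """Return one copy of cc per prepositional phrase, each keeping only that phrase."""
    copies = []
    for phrase in cc.get("prep_phrases", []):
        clone = copy.deepcopy(cc)
        clone["prep_phrases"] = [copy.deepcopy(phrase)]
        copies.append(clone)
    return copies

# A second phrase is added here purely to show the split.
cc = {
    "root": "boy",
    "prep_phrases": [
        {"PREP": "in", "OBJPREP": "water"},
        {"PREP": "near", "OBJPREP": "river"},
    ],
}
for candidate in pull_wheres(cc):
    print(candidate)
```

Each copy can then be checked on its own against the REQUIRES tag before CHOOSEONE picks among them.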

CHOOSEONE selects the best result, and SPEAK voices it.

Post processing routines are many in number and function.

Vocabulary Programming

The bootstrap vocabulary for Brainhat is a collection of simple concepts, like "ball" or "red." These ideas are connected hierarchically to others--e.g. balls are toys, and red is a color. Links between the elements define the hierarchy's structure. Everything is the child of something else, and some are the child or parent of many.

define   woman-1
         label          woman
         child          human-1
         person         first
         related        man-1

define   human-1
         label          human
         label          person
         child          mammal-1
         wants          mood-1

define   mammal-1
         label          mammal
         label          creature
         child          animal-1

define   animal-1
         label          animal
         child          things

Brainhat comes with a vocabulary that consists of words that are required or sometimes used in testing. A new application may require additional vocabulary. Let's take a few paragraphs to look at some of the conventions.

Concepts

Concept definitions can appear in any order. One definition ends and another begins whenever Brainhat encounters a define statement. The name given to the definition must be unique. By convention, one cardinally orders different uses of the same word, e.g. ball-1, ball-2, etc., where the first might describe a toy, the second a grand social event.

define   ball-1
         label           ball
         child           toy-1
         wants           color-1
         wants           size-1
         typically       round-1

The sample above shows some of the basic elements of a concept definition. Statements subsequent to define can appear in any order. Tab characters position the columns. Most other characters count; when defining concepts, take care to avoid trailing blanks. Let's look at the components of a concept definition:

"label"

The label in a concept definition describes the names by which the concept will be known. You can have as many labels as you like; they will all act as synonyms for one another. Labels can contain multiple words, separated by spaces.

define   lollipop-1
         label          lollipop
         label          lolly
         label          sucker
         child          candy-1

"child"

The child tag describes the concept's place in the grand scheme of things. A "poodle", for instance, might be a child of "dog." In some cases, a concept will have multiple parents. Consider a tomato: technically, it's a fruit, but most people consider it a vegetable.

define   tomato-1
         label          tomato
         label          love apple
         label          tomatoe
         child          fruit-1
         child          vegetable-1

"wants"

The wants tag provides a little hint to Brainhat about the usage of the concept. The definition of a ball, for example, is more complete if you know the color and shape. A "ticket" definition might want a cop, or might want a ballet, depending on the kind of ticket we mean. Be careful here though; we are not talking about what an animate object being defined might want. Rather, we are talking about other concepts that might appear in conjunction with the one being defined.

"orthogonal"

We can also provide hints by telling Brainhat when a concept is exclusive of others in use. Rooms in a house are typically distinct, for example. We might tell Brainhat that they are orthogonal. That way, when we talk about the lamp in the bedroom, Brainhat will know that it is different than the lamp in the kitchen. You may choose any concept as the basis for orthogonality. Typically, though, one would choose a basis that has a parent relationship.


define   kitchen-1
         label          kitchen
         child          room-1
         orthogonal     room-1

define   bedroom-1
         label          bedroom
         child          room-1
         orthogonal     room-1

"number"

The number tag is commonly used with verbs and nouns. Possible values include singular and plural.

"tense"

The tense tag is primarily for use with verbs and auxiliary verbs ("enablers") such as can, will, does and did. Possible values include past, present, future imperfect, and so on.

"person"

Used primarily with verbs and auxiliary verbs, the person tag can take the values first, second and third.

Nouns

All concepts that can trace their lineage back to "things" can be treated as nouns. The tags we discussed above all apply to noun concept definition. There are a few other guidelines as well. Particularly, it helps in output generation if Brainhat knows whether a noun is the singular or plural form of a word. And there's also a matter of how the singular and plural forms are related in the hierarchy.

define  berth-1
        label           berth
        label           sleeping berth
        label           sleeping station
        child           place-1
        child           berths-1
        number          singular

define  berths-1
        label           berths
        label           sleeping berths
        label           sleeping stations
        child           place-1
        number          plural

The two concepts above are the singular and plural forms of "berth". They are linked together so that the singular, "berth" is a child of the plural, "berths". Both are children of "place-1" as well. This way, we can talk about a berth as location independent of the plural form, or we can talk about a berth as a special case of berths. If there were a concept "places-1" we might link "berths-1" to that as well.

Verbs

All verbs must be able to trace their lineage back to action. This means that, by following child links, you should be able to find action as an ancestor. Furthermore, verbs should be organized so that the infinitive form is a parent to all of the subordinate forms, as in this set of definitions:

define   tosee-1
         label       to see
         child       sense-1

define   see-1
         label       see
         child       tosee-1
         number      plural
         tense       present
         person      third

define   sees-1
         label       sees
         tense       present
         person      third
         number      singular
         child       tosee-1

define   saw-1
         label       saw
         child       tosee-1
         number      singular
         tense       past
         person      third

There is no requirement that any of the subordinate forms be present. However, at a minimum, it is a good idea to define the infinitive and the third-person present singular forms. Notice how the forms are linked together; the subordinate forms are children of the infinitive.

Prepositions

All prepositions must be able to trace their lineage back to preposition. This means that, by following child-of links, you should be able to find preposition as an ancestor.

Attributes

All attributes must be able to trace their lineage back to attribute-1. This means that, by following child links, you should be able to find attribute-1 as an ancestor.

Articles

All articles must be able to trace their lineage back to article. This means that, by following child links, you should be able to find article as an ancestor.

Adverbs

Adverbs have adverb-1 as their ultimate parent.

Inferences

Brainhat can evaluate simple inferences to arrive at simple conclusions. I might say, for instance:

if the ball is in the water then the ball is wet

Because every concept that Brainhat knows about is descended from another, broader observations can be applied to specific objects. For example, here is a more general statement about being in the water:

if a thing is in the water then a thing is wet

In either case, if I tell Brainhat that a ball is in the water, the program will be able to infer that it is wet. The difference is that with the second inference, I can throw a block, the princess and spiders into the water, and they'll be wet too.
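The difference between the narrow and the general inference can be sketched in Python by making the rule's subject a parent concept and testing facts against it by lineage. This is an illustration under assumptions: facts are simple (subject, relation, object) tuples, the concept names and links for toys and the princess are invented for the example, and apply_wet_rule is a hypothetical helper.

```python
# Hypothetical child-of links; princess-1 under woman-1 is an assumption.
CHILD_OF = {
    "ball-1": ["toy-1"], "block-1": ["toy-1"], "toy-1": ["things"],
    "princess-1": ["woman-1"], "woman-1": ["human-1"],
    "human-1": ["mammal-1"], "mammal-1": ["animal-1"], "animal-1": ["things"],
}

def is_a(concept, candidate):
    """True if concept is candidate itself or a descendant of it."""
    if concept == candidate:
        return True
    return any(is_a(parent, candidate) for parent in CHILD_OF.get(concept, []))

def apply_wet_rule(facts, subject_type):
    """'if a <subject_type> is in the water then it is wet' over fact tuples."""
    derived = set()
    for subject, relation, obj in facts:
        if relation == "in" and obj == "water" and is_a(subject, subject_type):
            derived.add((subject, "is", "wet"))
    return derived

facts = {
    ("ball-1", "in", "water"),
    ("block-1", "in", "water"),
    ("princess-1", "in", "water"),
}
print(sorted(apply_wet_rule(facts, "ball-1")))   # the narrow inference
print(sorted(apply_wet_rule(facts, "things")))   # the general inference
```

With ball-1 as the subject type only the ball gets wet; with things, everything in the water does.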

The program can also chain inferences together to draw more complex conclusions. For example, the inferences and statements below allow Brainhat to answer the question "is mario happy?"

if a man has a girlfriend then a man is happy

if a man likes a woman and a woman likes a man then the man has a girlfriend and the woman has a boyfriend

if a man sees a beautiful woman then a man likes the woman

if a person is near a thing then a person can see a thing

if a man is handsome then the princess likes the man

the princess is beautiful

mario is near the princess

mario is handsome

is mario happy?

Brainhat works its way through the inferences to discover that mario is indeed happy because the princess is his girlfriend. Some other facts are unearthed along the way. Particularly, Brainhat learns that the princess has a boyfriend. (Note, however, there's no inference to suggest that she is happy about it.)

The concept definitions include some special words that make inferences easier to understand--both for you and for Brainhat. Examples are the concepts "thing1" and "thing2". I might use these in an inference such as:



if thing1 is near thing2 then thing2 is near thing1

Concepts "thing1" and "thing2" are the same as other concepts except that they have one child tag called x-template, just one other parent ("things" in this case), and no children of their own. They behave differently than other placeholders in an inference in that they need not be placeholders for children of their own, but instead represent children of their (only) parent. You may add x-template concepts as you like.
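The placeholder behavior can be pictured as variable binding. In the Python sketch below, thing1 and thing2 are bound to whatever appears in the matching positions of a fact, and the bindings are then substituted into the consequence. This is an illustration only: facts are flat tuples, the check that each bound value descends from the placeholder's parent is omitted for brevity, and TEMPLATES and apply_template are hypothetical names.

```python
TEMPLATES = {"thing1", "thing2"}  # placeholders tagged x-template

def apply_template(condition, consequence, facts):
    """Bind placeholders in condition against each fact, then fill in consequence."""
    derived = set()
    for fact in facts:
        if len(fact) != len(condition):
            continue
        bindings = {}
        ok = True
        for slot, value in zip(condition, fact):
            if slot in TEMPLATES:
                # A placeholder must stand for the same value everywhere it appears.
                if bindings.setdefault(slot, value) != value:
                    ok = False
            elif slot != value:
                ok = False
        if ok:
            derived.add(tuple(bindings.get(slot, slot) for slot in consequence))
    return derived

# if thing1 is near thing2 then thing2 is near thing1
facts = {("mario", "near", "princess"), ("ball", "in", "water")}
print(apply_template(("thing1", "near", "thing2"),
                     ("thing2", "near", "thing1"), facts))
```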

Note that Brainhat won't understand every kind of inference you might want to pose. Some of the current limitations are:

A single inference can have at most two conditions and two consequences, logically ANDed or ORed together.
Brainhat cannot evaluate through unconstrained objects in an inference: An example of an inference that will fail is "if thing1 is near thing2 and thing2 is near thing3 then thing1 is near thing3." In this case, thing2 is unconstrained; it could be anything in the Universe.

Creating Memes or Scenarios

Say that you own an ice cream parlor. Whenever you bring on a new employee, you expect that you will have to train them. You explain the job: "this lever pumps the cones... never pump ice cream into your mouth... the chocolate shots are in a bucket under the sink..." The new employee bobs his or her head in agreement.

What next? Do you toss them the keys and say "lock up at eight"? Of course not. What comes next is testing and repetition; you want to make sure that they "get it."

For the purposes of describing Brainhat programming, the dumber the counter sitter, the better the analogy; Brainhat only knows what you tell it. Furthermore, the task progression that comes naturally to a human--the concept of "steps"--is meaningless to Brainhat. You must lay out inference milestones that help a conversation graduate from one stage to another. To assure that the scenario you are describing to Brainhat "works" requires testing and repetition.

Let's take a specific example. Say that you want an ice cream parlor exchange to progress from the arrival of a customer to the question "do you want a sugar cone?"

If I am hungry then I want ice cream. If I will have something then I want something. If I want ice cream then ask me if I like vanilla ice cream. If I do not like vanilla ice cream then ask if I like chocolate ice cream. If I do not like chocolate ice cream then tell me that I like weird ice cream and ask if I am unhealthy. If I like ice cream then ask if I want a sugar cone.

You have chocolate ice cream. You have vanilla ice cream.

Consider that we might take multiple paths along the way. The user might want chocolate or she might want vanilla. She might even ask for ice cream of no particular flavor (weird). Notice that I have tried to bound the conversation so that we get to the "sugar cone" question by almost any path. This is not a requirement; the conversation can become unscripted at any time. However, once we fall off the subject at hand, Brainhat will run out of things to say. It will be up to the user to take over the lead in the conversation.

How might the conversation go?

>> hi
hello.
>> i am hungry
You are famished. You want ice cream. do You like vanilla ice cream?
>> do you have chocolate ice cream?
yes. I have chocolate ice cream. do You like vanilla ice cream?
>> i like chocolate ice cream.
You like chocolate ice cream. Do you want a cone?
>> yes
You want a cone.

Scenario development takes some patience. Occasionally, Brainhat's logic will take a turn that you hadn't anticipated. Or you may ask the user a question that is likely to be answered in a fashion that Brainhat doesn't interpret correctly. Accordingly, the questions you ask and the statements that you make should lead the interlocutor toward a dialogue you have already tested.

The good news is that you can stuff Brainhat full of amusing, off-topic information so that if the user does stray, there will be something unexpected to discover. Today, for instance, I heard someone in our office tell Brainhat that they don't like cold weather. Brainhat told the person that Egypt is warm, and that the person should go to Egypt! I had coded that into a scenario and forgotten about it. It was an amusing surprise to hear it re-surface.

If you wish to experiment with the ice-cream scenario, here are the additional words you will need to add to Brainhat's input files:

define flavor-1
       label flavor
       child-of attribute-1

define chocolate-1
       label chocolate
       child-of flavor-1
       orthogonal flavor-1

define vanilla-1
       label vanilla
       child-of flavor-1
       orthogonal flavor-1

define ice-cream-1
       label ice cream
       child-of food-1
       typically cold-1

define sugar-cone-1
       label sugar cone
       label cone
       child-of food-1
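
If you want to check the links in definitions like these before loading them, a short script can read the blocks and walk the child-of chain. The following Python is a rough sketch, not part of Brainhat. It assumes one link per line, with the link name first and the value after it, and it does not handle anything that simplecpp might expand. The function names parse_definitions, parents and ancestors are invented for the example.

```python
VOCAB = """\
define flavor-1
       label flavor
       child-of attribute-1

define chocolate-1
       label chocolate
       child-of flavor-1
       orthogonal flavor-1

define ice-cream-1
       label ice cream
       child-of food-1
       typically cold-1
"""


def parse_definitions(text):
    """Return {concept name: [(link, value), ...]} read from 'define' blocks."""
    concepts = {}
    current = None
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # blank line between blocks
        if parts[0] == "define" and len(parts) == 2:
            current = concepts.setdefault(parts[1].strip(), [])
        elif current is not None and len(parts) == 2:
            # The first word is the link name; the rest (e.g. "ice cream") is its value.
            current.append((parts[0], parts[1].strip()))
    return concepts


def parents(concepts, name):
    """List the direct child-of targets of a concept."""
    return [value for link, value in concepts.get(name, []) if link == "child-of"]


def ancestors(concepts, name):
    """Follow child-of links upward; parents not defined in the text are still listed."""
    seen = []
    queue = parents(concepts, name)
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(parents(concepts, parent))
    return seen
```

With the text above, ancestors(parse_definitions(VOCAB), "chocolate-1") lists flavor-1 and then attribute-1. The walk stops at attribute-1 only because that concept is not defined in this fragment; in the full vocabulary the chain would continue upward.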

Installation

This section describes installation of the non-database, Linux (Intel) version of Brainhat. Other versions exist, but are not described in this document.

The Linux server distribution can be used as a stand-alone command-line Brainhat program or as a text-based TCP daemon.

Retrieve the Linux server distribution.
Create an installation directory. Directory /usr/local/etc/brainhat is preferred. Brainhat will run from this directory in daemon mode, if possible, to reduce security risks.
Copy the Brainhat Unix distribution to the directory created in the step above.
Extract with the command:
tar xvfz brainhat.tar.gz
If you will be running Brainhat as a daemon, see that you have a user nobody defined in your passwd file. Brainhat will run as nobody in daemon mode, if possible, to reduce security risks. Make /usr/local/etc/brainhat owned by nobody.

Unpacking the distribution will create several directories:

./data: This directory contains the data files that Brainhat needs when it runs, including the vocabulary (words.txt) and grammar (input-patterns.txt).
./memes: A few pre-prepared Brainhat scenarios.
./robottest: This directory contains sample code for building Brainhat interfaces to other programs (and robots).

A few of the important program modules include:

./brainhat: This is the Brainhat binary.
./simplecpp: Program that preprocesses the data files before ./brainhat is invoked.
./run: A script to preprocess Brainhat data files and invoke the program in interactive mode.

Preprocessing

(This applies only to the non-database version).

The data files in the data directory must be preprocessed before they are available to the program in stand-alone or daemon mode. This will make changes in vocabulary or grammar available to the program.

Preprocessing is not necessary if you are changing meme (English) programming input.

Preprocessing is done by the simplecpp program. If you start Brainhat with either the run or runv scripts, preprocessing takes place automatically.

If you wish to preprocess by hand:


cd data
../simplecpp < data.in > data

Running Brainhat

As a command line, stand-alone program

The quickest way to start Brainhat is to type ./runv in the directory where you unpacked the distribution. This will preprocess the data files in the ./data directory and supply them to Brainhat at startup.

The ./runv command is equivalent to typing:

#!/bin/csh
cd data
echo -n "Pre-processing..."
../simplecpp < data.in > data
echo "Initializing"
cd ..
./brainhat data/data +verbatim

This runs simplecpp to preprocess data/data.in into data/data, and then feeds the result to Brainhat.

As text daemon, listening on port 4144

To run Brainhat as a daemon, become root and preprocess the data files if necessary (using simplecpp, as above).

Invoke with:

./brainhat -d +verbatim [data/data]

You may test the daemon by telnetting to localhost on port 4144.
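
If you would rather script the test than type into telnet, a few lines of Python can do the same job. This is a minimal sketch under assumptions: the daemon is treated as a plain line-oriented text service, the input is terminated with CR LF as a telnet client would send it, and the function name ask_brainhat is made up here. Whatever the daemon sends before going quiet (including any greeting or prompt) is returned as-is.

```python
import socket


def ask_brainhat(line, host="localhost", port=4144, timeout=2.0):
    """Send one line to the daemon and return everything it sends back."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Assumption: input is terminated with CR LF, as a telnet client does.
        conn.sendall(line.encode("ascii", "replace") + b"\r\n")
        while True:
            try:
                data = conn.recv(4096)
            except socket.timeout:
                break  # nothing more arrived within the timeout; assume the reply is done
            if not data:
                break  # the daemon closed the connection
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")


# Example (requires a running daemon):
#   print(ask_brainhat("i am hungry"))
```

The timeout is the only way this sketch knows a reply has ended, so a slow reply may be cut short; raise the timeout if that happens.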

By default, Brainhat will initialize itself with anything it finds in the local brainhat.init file. You can also have Brainhat read files after it has started by entering "input" on the command line. You may also initialize the program or daemon from a specific file at startup:

./brainhat -d -i filename +verbatim

Start brainhat with the "-a" option to see other alternatives.

Debug Quick Intro

This is a quick introduction to the debug facilities within Brainhat. You can enter debug mode at any time by entering:

>> break

at the ">>" prompt. You can return to processing by entering the

debug> cont

command.

Debug mode is useful for examining Brainhat's context and discourse buffers, and for looking at individual concept definitions from the context and from the basic knowledge pool. The following is the output of the help command within debug mode:



Break in debug at the start:
debug> help
Commands:
DEBUG routine-or-process[ location]
  Enable debug type, and optionally stop at listed locations.
NODEBUG routine-or-process
  Disable debug type.
DUMP [n]
  Dump optional argument concept (if concept).
  Dump optional conchain (if conchain).
  "n" chooses conchain element.
RDUMP [n]
  Dump recursively.
LIST label
  Show all symbols matching label.
WORDS
  Dump pre-prepared word list.
STOP
  Stop.
S[R]DUMP label
  Dump symbol from basic knowledge pool.
C[R]DUMP label
  Dump symbol from context.
X[R]DUMP label
  Dump CC from context.
D[R]DUMP label
  Dump symbol from discourse buffer.
X[R]DUMP label
  Dump CC from context.
SPEAK [n]
  Speak optional argument concept (if concept).
  Speak optional conchain (if conchain).
XSPEAK [n]
  Speak nth context entry.
DSPEAK [n]
  Speak nth discourse entry.
  "n" chooses conchain element.
"location" is one of {start, loc[1-3], finish, all}

debug> cont
>>

Say that you want to examine what Brainhat has recorded for definitions of "block". You could step into debug mode and ask Brainhat to list all concepts known as "block".


>> the block is red
 the block is red.
>> break
 
Break in debug at the start:
debug> list block
block-1 (symtab)
block-1-36d0 (context)

Two definitions appear--one from the clean vocabulary, and one from the dirty context. To examine the clean copy, use the srdump command, as described above. To examine a concept from the context, use the crdump command, as below:

debug> crdump block-1-36d0

define block-1-36d0 [2210]
       label block
       attribute red-1-36cf
       article the-1-3600
       typically square-1
       wants size-1
       wants color-1
       child toy-1

define red-1-36cf [2236]
       label red
       tense present
       orthogonal color-1
       child color-1

define present [7153]
       child tense

define color-1 [2224]
       label color
       child attribute-1

define the-1-3600 [1888]
       label the
       child definite
       child article-1

define square-1 [2350]
       label square
       orthogonal shape-1
       child shape-1

define shape-1 [2339]
       label shape
       child attribute-1

define size-1 [2284]
       label size
       typically small-1
       child attribute-1

define small-1 [2290]
       label small
       orthogonal size-1
       child size-1

define color-1 [2224]
       label color
       child attribute-1

debug> cont

Debug also allows you to examine the contents of concepts and chains of concepts as they proceed through processing. You can select from the many breakpoints embedded in the code. Most of the post-processing routines, for example, can be break-pointed at the entry, at the exit, and in the middle, as appropriate. Furthermore, a breakpoint may have data associated with it. A concept or a string of concepts (a "conchain") may be available once the break occurs--it depends upon how the breakpoint was coded.

As an example, the following sets a breakpoint at the start of a routine called chooseone that has the job of skinnying down a list of candidate complex concepts into a single winner:


debug> break chooseone start

When the program runs and processing passes to chooseone, the program will go into debug mode. At that point, you may exercise any of the commands listed above. A typical command to issue upon breaking at the start of a post-processing routine would be:


debug> dump

This will list the contents of the conchain passed to chooseone, and thereby allow you to see the field of candidates available to the routine before it makes its selection.

In addition to routine breakpoints, you may also set debug flags (coded to look like breakpoints) that will reveal something about processing along the way. One often uses a flag called ptrace. It will show which rules in input-patterns are matching input.


debug> break ptrace
debug> cont

There is much more to this subject, and much more documentation needed. Explore debug to see what you can find.