Building a Sentient Computer is Hard Work
I've been at this for about 20 years now, on and off, if you
count the FORTRAN profiler that formed the basis for the parser.
I picked the approach that was most promising at the time and resolved to
get to the end.
I still have that resolve.
This site has been online since 1996.
Passers-by have mistaken it for a speech recognition technology.
Others have assumed it's a chat bot.
And for the computational linguists and other practitioners, grammar-based natural language processing
like this has been very out of style for the last ten years;
statistical natural language processing has enjoyed the limelight.
There have been plenty of cul-de-sacs to keep people busy.
I haven't defended my effort very well.
When you're working on a schedule built of decades of stolen moments, you don't really see the
need.
And, admittedly, my view of what I am doing has changed from time to time.
I thought I'd take a few paragraphs to explain the way I see the project at this time.
But first, I'll describe a few of the cul-de-sacs.
Chat Bots Are a Waste of Your Time
There was a runaway fascination with chat bots.
I think it's subsided, thankfully.
Here's all the science you need to understand about a chat bot:
<category><pattern>MY NAME IS *</pattern><template><srai>Nice to meet you <star/></srai>.</template></category>
The snippet shown above is in a language called Artificial Intelligence Markup Language.
The thing matched on the left is substituted on the right.
People have spent great amounts of time constructing huge AIML
sets to mimic dialog.
It's not intelligence.
The thing has no goals, no introspection.
It's recursive string matching with substitution, like a bloated editor macro.
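If you'd like to see just how little is going on, here is a rough sketch in Python (not AIML, and not anybody's production bot) of the same machinery: a table of made-up patterns, a wildcard capture, substitution into a template, and an srai-style redirect that feeds the result back through the matcher.
import re

# A toy chat bot in the spirit of AIML.  Each entry maps a pattern with a
# wildcard to a template.  "*" in a template is replaced by whatever the
# wildcard matched; an <srai> wrapper re-feeds the result through the
# matcher, which is the "recursive" part.  The patterns are invented.
CATEGORIES = {
    "MY NAME IS *": "<srai>HELLO *</srai>",
    "HELLO *":      "Nice to meet you, *.",
    "*":            "Tell me more.",
}

def respond(text):
    text = text.upper().strip()
    for pattern, template in CATEGORIES.items():
        regex = "^" + re.escape(pattern).replace(r"\*", "(.*)") + "$"
        match = re.match(regex, text)
        if match:
            star = match.group(1).strip() if match.groups() else ""
            reply = template.replace("*", star)
            redirect = re.search(r"<srai>(.*)</srai>", reply)
            if redirect:
                return respond(redirect.group(1))   # recurse: match again
            return reply
    return "I do not understand."

print(respond("my name is Kevin"))    # Nice to meet you, KEVIN.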
To quote Quint from Jaws:
"...it's got lifeless eyes. Black eyes. Like a doll's eyes. When he comes at ya,
doesn't seem to be living... until he bites ya, and those black eyes roll over white and then...
ah then you hear that terrible high-pitched screamin."
Sounds like someone based a business plan on chat bots.
The Problems with Curve-Fitting
Curve-fitting is a mathematical art where one takes the observed domain and
range of a system and creates a model to mimic its behaviour.
The example everyone remembers from school is linear interpolation:
the teacher gives you some {x,y} coordinate pairs and you have to craft a line through them.
Hopefully, when presented with a previously unseen "x" coordinate,
the interpolation will predict approximately where the "y" value
will fall.
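For what it's worth, here is how little code that takes. The {x,y} pairs below are invented, and numpy's polyfit does the line-crafting; the point is only that the fitted line lets you guess a "y" for an unseen "x".
import numpy as np

# Made-up {x, y} pairs for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Fit a first-order polynomial (a line) through the observations.
slope, intercept = np.polyfit(x, y, deg=1)

# Predict the y value for a previously unseen x coordinate.
x_new = 2.5
y_guess = slope * x_new + intercept
print(f"predicted y at x={x_new}: {y_guess:.2f}")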
Curve-fitting in higher dimensions is problematic.
A model is only an approximation, after all,
and the true nature of the modeled system is separate from the curve that mocks it.
When the data hide piecewise discontinuities, or when there are sample shortages
in busy sections, the model will produce unexpected results at times.
The only way to improve the model is to stuff it with more actual data, which
in turn can drive up the order.
"Curious"
When the problem space is of some unknown number of dimensions
and inestimable complexity, we can fudge an unseen curve by
creating a trained system.
A neural network is an example.
After the discussion about layers and feedback is over, a neural network
is simply a matrix of coefficients that, when applied against an input vector,
will generate a result that looks like the output data used to train it
against similar inputs.
Neural networks are essentially curve-fitting with all the messy details
of higher order mathematics hidden from view, but with all the
attendant pitfalls.
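To make the "matrix of coefficients" claim concrete, here is a minimal sketch with invented numbers. At run time, a small trained network is nothing but weight matrices and offsets applied to the input vector.
import numpy as np

# A trained feed-forward network, reduced to what it is at run time: matrices
# of coefficients (weights) and offsets (biases).  These numbers are invented;
# in a real network they would come from training.
W1 = np.array([[0.8, -0.3],
               [0.1,  0.9]])
b1 = np.array([0.05, -0.20])
W2 = np.array([[1.2, -0.7]])
b2 = np.array([0.10])

def forward(x):
    hidden = np.tanh(W1 @ x + b1)   # layer 1: multiply, add, squash
    return W2 @ hidden + b2         # layer 2: multiply, add

# Apply the coefficients to an input vector; the result mimics whatever
# output the training data paired with inputs like this one.
print(forward(np.array([0.5, -1.0])))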
Pierce Brosnan plays a professor in Mars Attacks!
Performing an autopsy on a dead Martian,
he reaches into the brain and pulls out some goop.
"Curious...," he says.
There's nothing to be understood from looking at the goop.
The goop is the Martian's intelligence, rendered as coefficients or probabilities.
It's not anything you can debug.
She always says the right thing
In 1982, Byte published an article about computers writing stories based on
other stories: pick a word, say "the".
Write it down. Look for an arbitrary occurrence of the word in the text.
Select the word that follows and repeat the process.
(Don't forget to stop at some point!)
The result will be a block of collocated words.
It may not read like much, but the ordering has
organic support in the source text.
If you extend the look-back to the last two, three or even four
words, the output will start to resemble intelligent use of language.
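The Byte recipe fits in a couple of dozen lines. Here is a sketch of the two-word look-back version in Python; "source.txt" stands in for whatever large body of text you care to feed it.
import random
from collections import defaultdict

def build_table(words, order=2):
    # Map each order-word history to the list of words seen to follow it.
    table = defaultdict(list)
    for i in range(len(words) - order):
        table[tuple(words[i:i + order])].append(words[i + order])
    return table

def babble(words, length=40, order=2):
    table = build_table(words, order)
    history = random.choice(list(table.keys()))   # pick a starting point
    out = list(history)
    for _ in range(length):                       # don't forget to stop!
        followers = table.get(tuple(out[-order:]))
        if not followers:
            break
        out.append(random.choice(followers))      # an arbitrary occurrence
    return " ".join(out)

words = open("source.txt").read().split()         # any large body of text
print(babble(words))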
The word collocations and their N-wise sequence probabilities can be summarized for computation;
we needn't keep the original text around.
The probability that the next token is L, given that the last token was probably K and
the one before that was probably J, captures the sense of conditional probability
that forms the basis for a Hidden Markov Model.
It's powerful stuff.
A sufficiently rich model with a large enough training text allows one to guess the next
token in the presence of noise and uncertainty because,
in all probability, these tokens have appeared in the same sequence sometime in the past.
HMM?
Hidden Markov Models (HMMs) are what make speech recognition as amazingly good as it is.
The tokens are phonemes.
The systems are trained with parallel corpora—sequences of phonemes matched
to written words or phrases.
When the system believes it has a reasonably high confidence sequence of probable phonemes (or it
runs out of time), it will produce a corresponding best-guess text.
HMMs and parallel corpora are also the basis for automatic machine translation, though things start
to get uglier because two natural languages will differ in construction and word order.
"Leider spreche ich nicht Deutsch." (Unfortunately, speak I not German.)
Lining up the mapping requires a human hand before training.
Myriad other uses for HMMs include mapping human language to parts of speech and predicting
what a given system will do next, based on past actions.
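To give a flavor of the computation, and nothing more, here is a toy Viterbi decoder. The candidate words, the phoneme-like observations, and every probability in it are invented; in a real recognizer those tables come from training on parallel corpora.
# A toy Viterbi decoder over made-up tables.
states  = ["HELLO", "YELLOW"]
start_p = {"HELLO": 0.6, "YELLOW": 0.4}
trans_p = {"HELLO":  {"HELLO": 0.3, "YELLOW": 0.7},
           "YELLOW": {"HELLO": 0.6, "YELLOW": 0.4}}
emit_p  = {"HELLO":  {"h": 0.5, "eh": 0.3, "l": 0.1, "ow": 0.1},
           "YELLOW": {"h": 0.1, "eh": 0.2, "l": 0.4, "ow": 0.3}}

def viterbi(observations):
    # best[s] holds (probability, path) of the best path ending in state s.
    best = {s: (start_p[s] * emit_p[s].get(observations[0], 1e-9), [s])
            for s in states}
    for obs in observations[1:]:
        best = {s: max((best[prev][0] * trans_p[prev][s] * emit_p[s].get(obs, 1e-9),
                        best[prev][1] + [s])
                       for prev in states)
                for s in states}
    return max(best.values())   # highest-probability guess and its path

probability, guess = viterbi(["h", "eh", "l", "ow"])
print(probability, guess)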
Did you think I was going to bash statistical natural language processing?
Well, "...it's got lifeless eyes. Black eyes. Like a doll's eyes. When he comes at ya,
doesn't seem to be living..."
I like to think that the things my friends say aren't the result of a Markov chain.
I can't be sure.
Actually, thinking about it, I think we might all be HMMs.
In any case, HMMs are a kind of curve-fitting.
The output is a long way from the input, and there's a lot of goop in between.
I was really hoping to compute with knowledge.
That's the fun part.
No one said this would be easy
The complaint that current practitioners of
computational linguistics have about old-school methods is that
they don't scale very well and that they are brittle.
They also point out that grammar-based systems are too restrictive.
It's all true.
One wouldn't want to create a natural language translator or
mine text using a grammar-based system.
And idiomatic use of language is a challenge.
But once the meaning is acquired, a world of processing
opens up that isn't available to curve-fitters.
The goal is computing with knowledge
Language and reasoning are bound together in your head.
You think about things using one language or another.
This is because language provides the tags and primitives you need to
organize your thoughts: actors, actions, recipients of actions,
spatial relationships, and so on.
But the language isn't the thought;
you can resurrect a thought in any number of ways, and possibly
in a number of languages.
It can be the same, computationally.
Once the meaning is acquired, the words are no longer important.
Language is the delivery vehicle.
The relationships of concepts to one another are the basis
for the semantic content of a captured thought.
Accordingly, computing with knowledge isn't really about
natural language at all.
It's about fiddling with ideas.
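This is not Brainhat's internal representation (that's described elsewhere on this site); it's just a doodle to illustrate the point: two differently worded sentences can reduce to the same set of concept-to-concept relationships, and once they do, you compute with the relationships rather than the words.
# Not Brainhat's representation; just a doodle.  A captured thought is a set
# of relationships among concepts, independent of the words that carried it.
thought = {
    ("actor",     "dog"),
    ("action",    "bite"),
    ("recipient", "mailman"),
    ("tense",     "past"),
}

# "The dog bit the mailman" and "The mailman was bitten by the dog" would
# both reduce to the structure above.  Once the meaning is captured, we can
# ask who did what to whom without the original sentence.
def role(thought, wanted):
    return next((concept for relation, concept in thought if relation == wanted), None)

print(role(thought, "actor"), role(thought, "action"), role(thought, "recipient"))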
I outlasted Pascal
How is computing with knowledge different from programming?
It's associative, not serial.
It tolerates generalizing and ambiguity.
And it deals naturally in tenses, including the past, present, future and
hypothetical forms.
Consider:
"If you asked about my pet then you might want to eat my dog."
To Brainhat, that's a computational directive.
It can execute.
Elsewhere on this site you'll find descriptions of how it works.
Hats off
My hat is off to other projects that are difficult but promising.
Cyc is one; I have experimented with OpenCyc and plan to
incorporate it fairly soon.
Aside from my rant, I see great uses for statistical NLP in this
project, and it's on the drawing board.
How can I help?
— Kevin Dowd, dowd@atlantic.com