To: all-ai
Subject: Bayes-limited GSB.


I'm not sure it does much good to make excuses.  I'll tell the story, and
we can talk about it at GSB.  If you think it's our fault, then it's our
fault, and you can do whatever you damn please about it then.

It started when Dave told me about a hack by the speech group at ATT- some
of Ken Church's cronies, no doubt.  They took an HMM program from one of
their voice recognizers, trained it on some of Hitler's speeches, and put it
in emission mode.  Hooked it up to the net and got some responses from the
Beavis & Butthead crowd.  I figured we can do a hell of a lot better than
that, Dave and I- why not try?

So we take some of the Markov code I work on, train it on a random group-
rec.music.jazz- and see if we can get some responses from the net.  You
bet: a few comments about netiquette.  We try a transducer instead; map
from some other people's postings into new ones.  Better.  No negative
feedback, no feedback at all.  It's producing good messages too- I mean,
hell, it can just copy a good post back changing a word or two.  So why no
response?  Eventually we conclude it's the group.  You get into a group
where people think, spend four hours crafting a well-reasoned post and the
only guy that responds is schmoe@aol.com asking if you know how to print
GIFs on a Timex Sinclair.  And then Oded- of course, who else but Oded-
walks in and tells us flaming is where the action is.

So we check which posts generate the most positive feedback- more posts like
themselves.  Wessler codes up a quick Kullback-Leibler hack and it turns
out Oded's right- flame wars are the way to go.  Dave starts directing bits
from every brain-dead right-wing group out there into the training buffer
and we check for feedback.  Amazing.  The thing learns to pick the most
popular posters in a group and flame them to hell and back again.
Unsupervised learning at its best.  And everybody loves it- they flame on
and on and on.

But it's not saying anything.  Talk a blue streak it does, but it doesn't
say a damn thing.  We might have stopped here.  God only knows we should
have.  But this is my field, and I know how to hack around problems like
this.  Jelinek wrote a paper, you see, back in '68, and I have it.  Talks
about decomposing Markov models.  And in our lab we have the person who
knows more about abstraction and decomposition than anyone else, if you can
decipher him- Jcma.  So we start listening to what he has to say about
argument structure. And we train an argument model on flames- any flame
fest anywhere, from alt.syntax.tactical on down- and train a thousand
different content models on any group that will listen.  Make it mildly
non-stationary and put in a context-sensitive hack, merge the models, and
let it rip.  And it rips.

For three weeks you could come by our hallway late at night and see us
here.  Dave speeding up the feeds, Carl trying to eradicate the random
ASCII infestation in the dictionary faster than it creeps in, Wessler just
wandering the net and laughing and laughing and laughing, and every Sparc
and Indy in the lab running a not-so-niced process called viterbi_cull.

Four million, three hundred thousand states for the biggest model we let
loose, with a vocabulary of just over nine thousand words.  Enough- barely.
A few thousand lines of code just to break apart incoming mail messages.
And a whole bunch of hacks to keep generating new usernames and domains
that I don't want to say any more about.

It was fun.  Until Robert Thau ruined it, in two stages.

First, Oded overhears him talking about mscreps@sfu.ca, talking in tones
that indicate more than mild antipathy.  mscreps@sfu.ca could only be our
program.  Don't ask why- but we can recognize the names it uses a mile off.
So we laugh some more, and start wondering just who is paying attention to
the ravings of a mad Markov emitter.  A week or two later he hits us again,
much harder, during a 7ai lunch.  Mentions a speech Jack Kemp gave to some
business vultures in Indiana, about Puerto Ricans and how much money our
government dumps on them.  That was straight from a log file we had been
looking over a few days before, checking for comma aphasia, and had no more
basis in fact than any other statement you get by randomly stringing words
together.  We wait for two more days, until October 27th, then pull the
feed.

Over the last few weeks we've been rewriting the backup tapes and killing
files and gensymed accounts on every machine we ever touched.  It's gone.

But too late, as you all know.  A group of individuals whose party platform
has a moral basis in flame wars and alt.tasteless will soon take power.
Might as well sell your books now, before they're burned- you won't have to
pay taxes on the profits, at any rate.

In retrospect, we could have guessed.  That's what Bayes is all about, isn't
it?  If A implies B, then B implies A.  We built an idea generator from the
texts of lunatics.  Distribute those ideas, and you up the probability of
irrationality.  Well, now that probability is one.

See you at GSB.  5ish.