To: all-ai
Subject: Bayes-limited GSB.
--text follows this line--

I'm not sure it does much good to make excuses. I'll tell the story, and we can talk about it at GSB. If you think it's our fault, then it's our fault, and you can do whatever you damn please about it then.

It started when Dave told me about a hack by the speech group at ATT- some of Ken Church's cronies, no doubt. They took an HMM program from one of their voice recognizers, trained it on some of Hitler's speeches, put it in emission mode. Hooked it up to the net and got some responses from the Beavis & Butthead crowd.

I figured we can do a hell of a lot better than that, Dave and I- why not try? So we take some of the markov code I work on, train it on a random group- rec.music.jazz- and see if we can get some responses from the net. You bet: a few comments about netiquette. We try a transducer instead; map from some other people's postings into new ones. Better. No negative feedback, no feedback at all. It's producing good messages too- I mean, hell, it can just copy a good post back changing a word or two. So why no response?

Eventually we conclude it's the group. You get into a group where people think, spend four hours crafting a well-reasoned post and the only guy that responds is schmoe@aol.com asking if you know how to print gifs on a timex sinclair.

And then Oded- of course, who else but Oded- walks in and tells us flaming is where the action is. So we check- what posts generate the most positive feedback- posts like themselves. Wessler codes up a quick Kullback-Leibler hack and it turns out Oded's right- flame wars are the way to go. Dave starts directing bits from every brain-dead right-wing group out there into the training buffer and we check for feedback. Amazing. The thing learns to pick the most popular posters in a group and flame them to hell and back again. Unsupervised learning at its best. And everybody loves it- they flame on and on and on.

But it's not saying anything.
Talk a blue streak it does, but it doesn't say a damn thing. We might have stopped here. God only knows we should have.

But this is my field, and I know how to hack around problems like this. Jelinek wrote a paper, you see, back in '68, and I have it. Talks about decomposing markov models. And in our lab we have the person who knows more about abstraction and decomposition than anyone else, if you can decipher him- Jcma. So we start listening to what he has to say about argument structure. And we train an argument model on flames- any flame fest anywhere, from alt.syntax.tactical on down- and train a thousand different content models on any group that will listen. Make it mildly non-stationary and put in a context sensitive hack, merge the models, and let it rip.

And it rips. For three weeks you could come by our hallway late at night and see us here. Dave speeding up the feeds, Carl trying to eradicate the random ascii infestation in the dictionary faster than it creeps in, Wessler just wandering the net and laughing and laughing and laughing, and every Sparc and Indy in the lab running a not-so-niced process called viterbi_cull. Four million, three hundred thousand states for the biggest model we let loose, with a vocabulary of just over nine thousand words. Enough- barely. A few thousand lines of code just to break apart incoming mail messages. And a whole bunch of hacks to keep generating new usernames and domains that I don't want to say any more about.

It was fun. Until Robert Thau ruined it, in two stages. First, Oded overhears him talking about mscreps@sfu.ca, talking in tones that indicate more than mild antipathy. mscreps@sfu.ca could only be our program. Don't ask why- but we can recognize the names it uses a mile off. So we laugh some more, and start wondering just who is paying attention to the ravings of a mad markov emitter.

A week or two later he hits us again, much harder, during a 7ai lunch.
Mentions a speech Jack Kemp gave to some business vultures in Indiana, about Puerto Ricans and how much money our government dumps on them. That was straight from a log file we had been looking over a few days before, checking for comma aphasia, and had no more basis in fact than any other statement you get by randomly stringing words together.

We wait for two more days, until October 27th, then pull the feed. Over the last few weeks we've been rewriting the backup tapes and killing files and gensymed accounts on every machine we ever touched. It's gone.

But too late, as you all know. A group of individuals whose party platform has a moral basis in flame wars and alt.tasteless will soon take power. Might as well sell your books now, before they're burned- you won't have to pay taxes on the profits, at any rate.

In retrospect, we could have guessed. That's what Bayes is all about, isn't it? If A implies B, then B implies A. We built an idea generator from the texts of lunatics. Distribute those ideas, and you up the probability of irrationality. Well, now that probability is one.

See you at GSB. 5ish.