The Web knows everything there is to know; we just have to find ways of getting at it while separating the real knowledge from the junk. The current trend in natural language processing is to use existing corpora to guide the development of text-understanding systems. Taken to its natural limit, what if we assumed that data were everything, i.e., when in doubt, just throw in more data? I'll show how we could harness the World Wide Web, the single largest collection of textual information known to humans, to help solve interesting natural language processing problems such as question answering, word sense disambiguation, coreference resolution, and even commonsense reasoning.
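To make the "just throw in more data" idea concrete, here is a minimal sketch of count-based word sense disambiguation. It is purely illustrative and not any particular system from the talk: the tiny `web_text` snippet, the `hits` helper, and the cue-word lists are all stand-ins; in practice the counts would come from web-scale corpora or search-engine result counts.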
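```python
import re

# Tiny stand-in for web-scale text. In the data-driven approach sketched
# above, these counts would instead come from web-scale corpora or
# search-engine result counts; this snippet and helper are illustrative only.
web_text = """
The bank approved the loan and set a new interest rate for the mortgage.
She went to the bank to deposit a check and withdraw some cash.
The children played on the grassy bank beside the river all afternoon.
Fishermen lined the muddy bank, casting into the slow water.
"""
tokens = re.findall(r"[a-z]+", web_text.lower())

def hits(*words, window=8):
    """Crude proxy for web hit counts: the number of occurrences of 'bank'
    whose surrounding token window contains all of the given words."""
    count = 0
    for i, tok in enumerate(tokens):
        if tok == "bank":
            ctx = set(tokens[max(0, i - window): i + window + 1])
            if all(w in ctx for w in words):
                count += 1
    return count

# Disambiguate a new occurrence of "bank" by asking which sense's cue
# words co-occur with its context words most often in the "web" data.
context = ["water", "casting"]                  # words around the new "bank"
senses = {"financial": ["loan", "deposit"],
          "riverside": ["river", "fishermen"]}

scores = {sense: sum(hits(c, cue) for cue in cues for c in context)
          for sense, cues in senses.items()}
print(scores)                       # evidence counts per sense
print(max(scores, key=scores.get))  # more supporting data -> riverside
```

The point of the sketch is that nothing clever happens inside the model: the decision falls out of raw co-occurrence counts, and the quality of the answer scales with how much text sits behind those counts.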