Monday, May 12, 2008

What AskWiki Told Me

A former colleague suggested that I run the seven questions I had put to Powerset past AskWiki, which is also trying to take a semantic approach, in its case supported by the Wiki culture of community involvement. The extent of my own involvement was to classify the answer I received (if I received one) as Correct, Incorrect, or Uncertain. This seemed like a fair test, since it was again based on content available through Wikipedia.

The reason I inserted that parenthetical remark is that my first two questions (Where is the Floss? Where is St. Oggs?) did not receive any answer. I realized that I had not compared my Powerset results with keywords submitted to Wikipedia. The Wikipedia search for "Floss" took me to the "Embroidery thread" pages (with a pointer to the "Dental floss" page); but the St. Oggs search provided several pointers, the first of which was to the page for The Mill on the Floss. So I am not quite sure why AskWiki choked on both of these questions.

On the other hand, AskWiki did something that Powerset did not do at all. It reminded me that one of my questions (Why is there no general algebraic solution for cubic equations?) was wrong! It is not cubic equations that are unsolvable but quintics! AskWiki was clever (sic) enough to figure out the question I had wanted to ask and answered that the insolvability of quintic equations was the result of the Abel-Ruffini theorem. This theorem has its own Wikipedia page, which I was able to consult to confirm that this theorem was, indeed, based on Galois theory; so I took a fair amount of satisfaction in reporting this as a Correct answer!

The remaining answers were all reported as Incorrect. Here is a brief summary of why they were wrong:

  1. How many operas did Cavalli write? I was given a single sentence about the history of opera that did not even mention Cavalli.
  2. When did Brahms write his first string quartet? I was given a sentence from a description of A German Requiem, which at least got the right composer!
  3. Who were the Smithfield martyrs? I was given a pointer to the Wikipedia page for Smithfield in Cumbria.

This leads me to reinforce that "sic" qualifying my use of the adjective "clever." I have no idea why AskWiki gave me an answer consistent with my personal cubic/quintic confusion; but I strongly suspect that it was a fluke. Nevertheless, it does point out another subtlety of that social side of communicative action, which is that the question we ask is not always the question we had intended to ask. Conversation thus not only provides a context for understanding the question being asked, as I had posited in my previous post, but also affords the opportunity to "fine tune" the question before making the commitment to provide the most useful answer. This is an aspect of conversation that Erving Goffman studied extensively, and it offers further perspective on the limitations of just how "sharp" that "semantic edge" can ever be.

2 comments:

Complex Event Processing said...

The Wikipedia article on "information entropy" contains this interesting observation:

"The entropy of English text is between 1.0 and 1.5 bits per letter, or as low as 0.6 to 1.3 bits per letter, according to estimates by Shannon based on human experiments."

The first result above is based on a statistical cryptanalysis of text, while the second result is an empirical statement about how humans recognize symbols. The relevance of this observation to the behavior of AskWiki may be contingent upon what sorts of probabilistic models AskWiki employs as it tries to make accurate guesses about what it's being asked.
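For a sense of what "bits per letter" measures, here is a minimal Python sketch that estimates entropy from single-letter frequencies alone. The function name `letter_entropy` is made up for illustration, and nothing here is claimed about how AskWiki works. Because it ignores all context between letters, this estimate comes out well above the figures quoted above for ordinary English text; those lower figures reflect how much the surrounding letters and words help in predicting the next one.

```python
import math
from collections import Counter


def letter_entropy(text: str) -> float:
    """Shannon entropy, in bits per letter, of the single-letter distribution.

    Hypothetical helper for illustration: it looks only at how often each
    letter occurs, not at the order in which letters appear.
    """
    # Keep alphabetic characters only, folding case so 'A' and 'a' match.
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    counts = Counter(letters)
    total = len(letters)
    # H = -sum(p * log2(p)) over the observed letter probabilities.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    sample = "Where is the Floss? Where is St. Oggs?"
    print(f"{letter_entropy(sample):.2f} bits per letter")
```

A text in which two letters occur equally often scores exactly 1 bit per letter under this measure, and a text that repeats a single letter scores 0.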

I'm reminded of how good human pattern recognition is when it comes to correctly identifying words wherein individual letters are rendered incompletely. Although the discrepancy between the figures above might seem minor, there are whole sciences devoted to the profound implications of such minor discrepancies: in genetics, for example, these slight differences separate humans from other mammals. No doubt a great many researchers laboring under the rationalist assumptions of the 17th Century's Scientific Revolution would find it repugnant to suppose there are limits to what phenomena they can model given enough time and the proper tools. Yet just as quantum mechanics must delicately negotiate between observing the world and observing its own observational methods, perhaps computer science researchers will be presented with similar limitations on how much information can be extracted from a system.

Complex Event Processing said...

P.S. I hate it when a CAPTCHA can't figure out I'm a human...