Friday, September 12, 2008

Absence of Judgment

There is a lot to be learned from Elinor Mills' account of the Securities and Exchange Commission (SEC) investigation into how United Airlines stock took a 75 percent drop in reaction to the online release of a six-year-old news story. The primary concern of the SEC is whether this is yet another case of improper (or, as I would prefer to call it, pathological) conduct facilitated by technology (such as the circulation of an unfounded rumor about Apple, whose primary consequences also involved stock price) or whether this is the latest instance of what I have called "the usual 'nobody's fault' syndrome," arising from deploying a technology whose undesirable consequences had not been adequately anticipated. The virtue of Mills' report is that she focused on laying out the basic evidence that the SEC will have to interpret as part of its analysis:

The SEC has opened a "preliminary inquiry" into the online distribution of a Chicago Tribune article from 2002 about United Airlines' bankruptcy filing, people familiar with the matter said.

The Tribune Co. said in a statement on Wednesday that it believes a single visit to the archived story on the site of its South Florida Sun-Sentinel newspaper during a low-traffic time period resulted in the computer system displaying it under a tab titled "Popular Stories Business: Most Viewed."

The article was then picked up by Google News and displayed with no indication of the original date of publishing. It was later distributed by Bloomberg.

Google's automated search agent "Googlebot" misclassified the article because it is unable to differentiate between breaking news and frequently viewed stories on the newspaper Web sites, the Tribune said, adding that it had asked Google to stop crawling its sites months ago, but the process had continued.

Asked for comment, Google spokesman Gabriel Stricker said: "The claim that the Tribune Company asked Google to stop crawling its newspaper Web sites is untrue."

Google's crawler had no reason to suspect that the article was old, according to a Google News blog that was first posted on Monday and updated on Wednesday. "The article failed to include a standard newspaper article dateline, but the Sun-Sentinel page had a fresh date above the article on the top of the page of "September 7, 2008" (Eastern)," it said.

Resolving the conflicting claims between the Tribune Company and Google will not be an easy matter, but addressing the content of that Google News blog post may be more relevant to how the SEC decides to rule. Therefore, I would like to flesh out the context from which Mills extracted her quote:

On Saturday, September 6th at 10:36 PM Pacific Daylight Time (or Sunday, September 7th at 1:36 AM Eastern Daylight Time), the Google crawler detected a new link on the Florida Sun-Sentinel's website in a section of the most viewed stories labeled "Popular Stories: Business." The link had newly appeared in that section since the last time Google News' Googlebot webcrawler had visited the page (nineteen minutes earlier), so the crawler followed the link and found an article titled "UAL Files for Bankruptcy." The article failed to include a standard newspaper article dateline, but the Sun-Sentinel page had a fresh date above the article on the top of the page of "September 7, 2008" (Eastern).

Because the Sun-Sentinel included a link to the story in its "Popular Stories" section, and provided a date on the article page of September 7, 2008, the Google News algorithm indexed it as a new story. We removed this story as soon as we were notified that it was posted in error.

While we don't know why the Sun-Sentinel's website included the link in its "Popular Stories" section, our timestamps show that Google News first crawled the UAL story after following the link from the Sun-Sentinel's "Popular Stories" box:
  • At 10:17:35 PM/PDT, our crawler retrieved a copy of the Sun-Sentinel business section page. [Confirming page image] As you can see, no UAL story appears at this time.
  • At 10:36:38 PM/PDT, our crawler retrieved an updated copy of the same section. This updated version included a new link in the "Popular Stories: Business" section to a story titled "UAL Files for Bankruptcy." [Confirming page image]
  • At 10:36:57 PM/PDT, our crawler followed the new link and fetched this copy of the UAL story. [Confirming page image] At that point, our index was updated to include the article with the date that the story was crawled, and the story became searchable on Google News.
  • At 10:39:57 PM/PDT, the Sun-Sentinel received its first referral to the UAL story from Google News, with a user clicking on a Google News link to the Sun-Sentinel's UAL story.
The Tribune Co. (owner of the Sun-Sentinel) has confirmed in its September 9, 2008 press release that the first referral from Google News to the article came after the UAL story appeared in the "Popular Stories" section.
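
The timing in the Google News account is easy to check by hand, but a short sketch makes the intervals explicit. The timestamps below are the ones quoted above; the event labels, variable names, and the use of fixed UTC offsets for PDT and EDT are my own assumptions for illustration, not anything taken from Google's systems.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for daylight time in September 2008 (an illustrative simplification).
PDT = timezone(timedelta(hours=-7), "PDT")
EDT = timezone(timedelta(hours=-4), "EDT")


def at(hour, minute, second):
    """Build a timestamp on Saturday, September 6, 2008, Pacific Daylight Time."""
    return datetime(2008, 9, 6, hour, minute, second, tzinfo=PDT)


# The four events from the quoted timeline; the labels are paraphrases.
events = [
    ("section page crawled, no UAL link", at(22, 17, 35)),
    ("section page crawled, UAL link present", at(22, 36, 38)),
    ("UAL story fetched and indexed", at(22, 36, 57)),
    ("first referral from Google News", at(22, 39, 57)),
]

# Elapsed time between each consecutive pair of events.
gaps = [later[1] - earlier[1] for earlier, later in zip(events, events[1:])]

# The same indexing moment expressed in Eastern Daylight Time.
indexed_eastern = events[2][1].astimezone(EDT)

for (label, _), gap in zip(events[1:], gaps):
    print(f"{label}: {gap} after the previous event")
print("indexed at", indexed_eastern.strftime("%A, %B %d, %Y %I:%M:%S %p %Z"))
```

The second crawl of the section page came about nineteen minutes after the first, and the story was fetched within the same minute as the crawl that first saw the link. Converted to Eastern time, that fetch falls on September 7, the same date shown in the page header.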

Thus, the real conflict may be between the Sun-Sentinel software (which may or may not have been provided to the newspaper by the Tribune Company) and Google's crawler, which is why this may be a "nobody's fault" problem. The Tribune statement that Mills cites indicates a defect in how "Popular Stories" are detected, resulting in an error that could have been avoided had there been a human editor monitoring the process. (Note that one can imagine situations in which a six-year-old article would again become popular; but, by the Tribune's own admission, this was not one of those situations.)
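
To make the nature of that defect concrete, here is a minimal sketch of a "most viewed" ranking with two guards: a floor on the number of views and a hold for human review when a story is old. Everything in it is hypothetical. The function name, the parameters, the story key, and the placeholder publication date are my own inventions, and nothing here is drawn from the Sun-Sentinel's actual software.

```python
from datetime import date


def popular_stories(views, published, today, top_n=5, min_views=1, review_age_days=None):
    """Rank stories by view count.

    Returns (shown, held): stories to display, and stories set aside
    for an editor because they are older than review_age_days.
    """
    ranked = sorted(views, key=lambda story: views[story], reverse=True)
    shown, held = [], []
    for story in ranked:
        if views[story] < min_views:
            continue  # too few views to count as popular
        age_days = (today - published[story]).days
        if review_age_days is not None and age_days > review_age_days:
            held.append(story)  # old story: let a human decide
        else:
            shown.append(story)
    return shown[:top_n], held


# One visit to a 2002 story during a quiet period (placeholder date within 2002).
views = {"ual-2002": 1}
published = {"ual-2002": date(2002, 1, 1)}
today = date(2008, 9, 7)

unguarded = popular_stories(views, published, today)
with_floor = popular_stories(views, published, today, min_views=25)
with_review = popular_stories(views, published, today, review_age_days=365)
```

With no guards, the single visit is enough to put the story on the list. A view floor drops it silently, while the age hold routes it to an editor, which also leaves room for the legitimate case of an old article becoming popular again.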

On the other hand, human intervention could also have flagged an article that lacked a dateline: While, in the first two page images from the Google News blog, all of the Heroux columns (present and past) are dated, no dates are provided in the "Popular Stories" window, reflecting the above problem with the Sun-Sentinel software. More interesting, however, is that the retrieved "Popular Story" itself has no date, stating only that the original source was the Chicago Tribune and naming the staff reporters. The only date on this page is the one on the generic header, which (clearly?) has nothing to do with any of the non-generic content on that page. Thus, the Google crawler results needed monitoring by a human editor as much as the "Popular Stories" detector did; and, since Google is not, itself, a newspaper, the SEC may wish to raise the question of whether or not a business that is providing a news service needs to have that service monitored by a professionally qualified news editor. We can probably guess how Google (probably reinforced with Web 2.0 ideology) would answer that question; but I doubt whether their answer would make much difference. After all, it is one thing for the SEC to rule against the moral equivalent of an individual shouting "FIRE!" in a crowded auditorium; but I cannot imagine that they have ever had to rule on a piece of unmonitored software doing the shouting. Lawrence Lessig would probably say that this is precisely the sort of problem that needs to be resolved through "code;" but that code can only arise from the kind of "self-conscious control" he envisages in his Code book. Can we really expect to see the necessary "self-conscious control" arise in Google in response to this unfortunate incident?
