<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Google abandons PageRank for Wikipedia data?</title>
	<atom:link href="http://www.sgi.nu/diary/2007/02/14/google-abandons-pagerank-for-wikipedia-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sgi.nu/diary/2007/02/14/google-abandons-pagerank-for-wikipedia-data/</link>
	<description>The musings of a research software engineer ...</description>
	<lastBuildDate>Wed, 08 Sep 2010 10:28:22 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: aaron</title>
		<link>http://www.sgi.nu/diary/2007/02/14/google-abandons-pagerank-for-wikipedia-data/comment-page-1/#comment-14478</link>
		<dc:creator>aaron</dc:creator>
		<pubDate>Wed, 14 Feb 2007 03:59:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.sgi.nu/diary/2007/02/14/google-abandons-pagerank-for-wikipedia-data/#comment-14478</guid>
		<description>The &#039;Andrew info&#039; result is a mystery.

My first theory was that Andrew appeared in the document&#039;s history and Google had an old version, but no.

My second theory, since there is a French version of the Andy Andy article, was that a French-to-English translation turned Andy into Andrew, but according to Babelfish it&#039;s just Andy, so no.

My third theory was that Wikipedia might be cloaking pages for Googlebot, but wget says no to that idea.

Why not link to wiki/Andrew_Snoid? If it were wiki/Andy blah, would it still get there?

Why does &#039;info james&#039; point to wiki/James_(Nip/Tuck) and not wiki/Jimmy_Jimmy?

Weird behaviour. Now I&#039;m confused and won&#039;t sleep properly tonight :-&#124; It&#039;s definitely a different smell to normal Google indexing, especially as site:en.wikipedia.org andy doesn&#039;t list Andy Andy on the first page.

Even weirder, no other articles link to Andy Andy, so Googlebot would have to come in from the All Pages index.</description>
		<content:encoded><![CDATA[<p>The &#8216;Andrew info&#8217; result is a mystery.</p>
<p>My first theory was that Andrew appeared in the document&#8217;s history and Google had an old version, but no.</p>
<p>My second theory, since there is a French version of the Andy Andy article, was that a French-to-English translation turned Andy into Andrew, but according to Babelfish it&#8217;s just Andy, so no.</p>
<p>My third theory was that Wikipedia might be cloaking pages for Googlebot, but wget says no to that idea.</p>
<p>Why not link to wiki/Andrew_Snoid? If it were wiki/Andy blah, would it still get there?</p>
<p>Why does &#8216;info james&#8217; point to wiki/James_(Nip/Tuck) and not wiki/Jimmy_Jimmy?</p>
<p>Weird behaviour. Now I&#8217;m confused and won&#8217;t sleep properly tonight <img src='http://www.sgi.nu/diary/wp-includes/images/smilies/icon_neutral.gif' alt=':-|' class='wp-smiley' /> It&#8217;s definitely a different smell to normal Google indexing, especially as site:en.wikipedia.org andy doesn&#8217;t list Andy Andy on the first page.</p>
<p>Even weirder, no other articles link to Andy Andy, so Googlebot would have to come in from the All Pages index.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Lampert</title>
		<link>http://www.sgi.nu/diary/2007/02/14/google-abandons-pagerank-for-wikipedia-data/comment-page-1/#comment-14472</link>
		<dc:creator>Andrew Lampert</dc:creator>
		<pubDate>Wed, 14 Feb 2007 03:09:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.sgi.nu/diary/2007/02/14/google-abandons-pagerank-for-wikipedia-data/#comment-14472</guid>
		<description>Interesting examples, Aaron. Things are obviously more complicated than simple title comparisons in some cases.

All the nationalities I tried (Australian, Swedish, Dutch) map to the wiki page about the relevant country. Some have additional titles that map to the country article (so /wiki/Australian goes straight to /wiki/Australia), but others, like Swedish, have a disambiguation page that, by my reckoning, is ignored. I suspect that the Spanish -&gt; Spain style mapping could be due to simple word lookup. It&#039;s not simply word stemming, since you&#039;re never going to get from Dutch -&gt; Netherlands by that route.

The Microsoft example is very puzzling to me. Searching Wikipedia using Google with the query &lt;i&gt;&#039;microsoft&#039;&lt;/i&gt; returns the obvious /wiki/Microsoft page. The Game Studios page doesn&#039;t appear in the first 10 links. The /wiki/Microsoft page is both a better answer and seemingly simpler to identify. I can&#039;t think of a reason that the Microsoft_Game_Studios page would be returned instead. Any ideas?

Another weird example is searching for &lt;i&gt;&#039;Andrew info&#039;&lt;/i&gt;. The OneBox result is wiki/Andy_Andy. That article isn&#039;t linked anywhere I can find, and I can&#039;t understand how it would be the chosen article. Again, there&#039;s obviously an Andrew-&gt;Andy mapping, but I can&#039;t see where this is being made using Wikipedia data, so perhaps it&#039;s being done separately using Google resources?</description>
		<content:encoded><![CDATA[<p>Interesting examples, Aaron. Things are obviously more complicated than simple title comparisons in some cases.</p>
<p>All the nationalities I tried (Australian, Swedish, Dutch) map to the wiki page about the relevant country. Some have additional titles that map to the country article (so /wiki/Australian goes straight to /wiki/Australia), but others, like Swedish, have a disambiguation page that, by my reckoning, is ignored. I suspect that the Spanish -> Spain style mapping could be due to simple word lookup. It&#8217;s not simply word stemming, since you&#8217;re never going to get from Dutch -> Netherlands by that route.</p>
<p>The Microsoft example is very puzzling to me. Searching Wikipedia using Google with the query <i>&#8216;microsoft&#8217;</i> returns the obvious /wiki/Microsoft page. The Game Studios page doesn&#8217;t appear in the first 10 links. The /wiki/Microsoft page is both a better answer and seemingly simpler to identify. I can&#8217;t think of a reason that the Microsoft_Game_Studios page would be returned instead. Any ideas?</p>
<p>Another weird example is searching for <i>&#8216;Andrew info&#8217;</i>. The OneBox result is wiki/Andy_Andy. That article isn&#8217;t linked anywhere I can find, and I can&#8217;t understand how it would be the chosen article. Again, there&#8217;s obviously an Andrew->Andy mapping, but I can&#8217;t see where this is being made using Wikipedia data, so perhaps it&#8217;s being done separately using Google resources?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: aaron</title>
		<link>http://www.sgi.nu/diary/2007/02/14/google-abandons-pagerank-for-wikipedia-data/comment-page-1/#comment-14471</link>
		<dc:creator>aaron</dc:creator>
		<pubDate>Wed, 14 Feb 2007 02:01:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.sgi.nu/diary/2007/02/14/google-abandons-pagerank-for-wikipedia-data/#comment-14471</guid>
		<description>Interesting to try and work out the algorithm for it.

Any thoughts about the following?
&#039;microsoft info&#039; goes to /wiki/Microsoft_Game_Studios even though /wiki/Microsoft is a thorough (and more applicable) page.

&#039;spanish info&#039; goes to /wiki/spain

And &#039;java bad performance info&#039; doesn&#039;t return anything. Is Google&#039;s indexing so good that it doesn&#039;t bother wasting bytes returning common knowledge? That&#039;s clever :P</description>
		<content:encoded><![CDATA[<p>Interesting to try and work out the algorithm for it.</p>
<p>Any thoughts about the following?<br />
&#8216;microsoft info&#8217; goes to /wiki/Microsoft_Game_Studios even though /wiki/Microsoft is a thorough (and more applicable) page.</p>
<p>&#8216;spanish info&#8217; goes to /wiki/spain</p>
<p>And &#8216;java bad performance info&#8217; doesn&#8217;t return anything. Is Google&#8217;s indexing so good that it doesn&#8217;t bother wasting bytes returning common knowledge? That&#8217;s clever <img src='http://www.sgi.nu/diary/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /></p>
]]></content:encoded>
	</item>
</channel>
</rss>
