The double-edged sword of regression testing
Sat through an interesting seminar yesterday from Kevin Schofield, General Manager of Research at Microsoft Research, during his visit to our Marsfield Lab. In what was a relatively short presentation, Kevin covered only a tiny part of the work at MSR in any detail. Despite the time constraints, a couple of the points he raised really made me stop and think, including his clearly heartfelt comments on the disappearance of Jim Gray.
On the technical side, one of the astounding take-home points for me was the magnitude of complexity in Microsoft’s various code bases. Code complexity is something I’ve been thinking about a bit lately, particularly in terms of concurrency. Kevin’s point was in rather a different dimension of complexity – that of software testing. In quantifying this complexity, he noted that running the full suite of regression tests over the Windows code base takes 8 weeks! On a large farm of servers!
Just think about what the impact of 8 weeks would be on your release schedule. Kinda makes it hard to have a reliable yet agile release cycle, no? If Microsoft wants to run their full suite of regression tests to ensure old bugs have not been reintroduced by new code changes, then there is a huge impact on the agility with which Microsoft can release new versions, respond to bugs and release critical security patches. While the magnitude of their problem may be somewhat larger than most due to the age, size and complexity of their code base, I’m quite sure Microsoft is not alone in having to face such a problem.
Understandably, MSR has been working to address this issue, in part by deriving mappings of the code exercised by each and every test in the regression suite. These mappings are stored and used to prioritise the regression tests, such that the tests that cover the modified code are exercised first.
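Out of curiosity, here's a minimal sketch of what that kind of coverage-based prioritisation could look like. This is purely my own illustration in Python; the data structures, names and selection rule are my guesses at the general idea, not how MSR actually implements it.

```python
# Toy sketch of coverage-based regression test prioritisation.
# The coverage map and scoring rule here are illustrative only.

def prioritise_tests(coverage_map, changed_files):
    """Order tests so that those exercising changed code run first.

    coverage_map: dict mapping test name -> set of source files it exercises
    changed_files: set of source files touched by the current change
    """
    def overlap(test):
        # How many of the changed files does this test exercise?
        return len(coverage_map[test] & changed_files)

    # Tests with the greatest overlap run first; the rest trail behind.
    return sorted(coverage_map, key=overlap, reverse=True)


if __name__ == "__main__":
    coverage_map = {
        "test_scheduler": {"sched.c", "timer.c"},
        "test_filesystem": {"ntfs.c", "cache.c"},
        "test_network": {"tcp.c", "ip.c"},
    }
    changed = {"sched.c"}
    print(prioritise_tests(coverage_map, changed))
    # -> ['test_scheduler', 'test_filesystem', 'test_network']
```

With a mapping like this stored ahead of time, the tests most likely to catch a regression in the modified code can be run long before the full eight-week suite completes.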
Sounds like a very logical approach, and I’m surprised that I haven’t come across such techniques before. Perhaps I just haven’t looked in the right places. How do you manage your regression test suites? Do you partition or prioritise them in any novel ways?
Google abandons PageRank for Wikipedia data?
Something I hadn’t noticed until recently is that, in addition to information about topics such as weather, stock reports, and news, Google OneBox now provides results from Wikipedia if the word ‘info’ or ‘information’ is included in a user’s search query.
The concept of Google OneBox appears to be an attempt to gradually and subtly integrate question-answering-style results into the more familiar ranked list of results. This is done only for well understood domains with comprehensive and trusted data sources – of which Wikipedia is increasingly an excellent example.
The integration of Wikipedia results means that now, if you enter a query such as “csiro info”, you’ll sometimes get a result from Wikipedia above your ranked list of general web links that looks like this:
A little experimentation with this feature reveals some curious results.
If we search for “csir info”, we don’t get any Wikipedia OneBox results. A CSIR page does exist on Wikipedia, but it is actually a disambiguation page that points to several possibly intended topics, including the CSIR in India, the CSIR in South Africa, and Australia’s CSIRO (which was once called the CSIR).
More interesting, however, is that Google itself does not appear to be using its own ranking algorithms to determine which Wikipedia page to display in the OneBox results. If we use the search query “csir site:en.wikipedia.org”, which constrains our search to pages from the English Wikipedia, the highest ranked result is in fact the Wikipedia page about the CSIR in South Africa. The disambiguation page appears as the second result. Thus, if Google were using its own ranking algorithms for selecting Wikipedia results for OneBox, we would expect to see the CSIR South Africa Wikipedia article in our OneBox result for our “csir info” search query.
Instead, it appears that results from Wikipedia are only included in OneBox results if there is an exact (or perhaps very close) match between the search query and the title of a Wikipedia article. Disambiguation pages, which prompt a user to choose between multiple topics that might be referred to by a single phrase or term, appear to never be included in the Google OneBox results, even if the title is an exact match.
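If that's right, the decision rule might look something like the toy sketch below. Again, this is pure speculation on my part: the function, the trigger words and the matching logic are my assumptions based on the behaviour above, not anything Google has documented.

```python
# Toy sketch of the OneBox behaviour the experiments suggest --
# purely speculative, not Google's actual implementation.

def onebox_wikipedia_result(query, wikipedia_titles, disambiguation_titles):
    """Return a Wikipedia article title to show in OneBox, or None."""
    # Strip the trigger words ('info'/'information') from the query.
    topic = " ".join(
        w for w in query.lower().split() if w not in ("info", "information")
    )

    for title in wikipedia_titles:
        # Only an exact (or near-exact) title match qualifies...
        if title.lower() == topic:
            # ...and disambiguation pages are never shown.
            if title in disambiguation_titles:
                return None
            return title
    return None


# "csiro info" matches the CSIRO article exactly, so it is shown.
print(onebox_wikipedia_result("csiro info", ["CSIRO", "CSIR"], ["CSIR"]))
# "csir info" matches only a disambiguation page, so nothing is shown.
print(onebox_wikipedia_result("csir info", ["CSIRO", "CSIR"], ["CSIR"]))
```

Note that nowhere in this rule does Google's own ranking of the candidate articles come into play, which is exactly what the "csir" experiment seems to show.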
More importantly, even if external evidence suggests that an article is relevant to a search query, that article won't be displayed if its title doesn't match the query terms.
Arguably, this makes perfect sense: only results that Google is very confident about are included in OneBox. What is curious is that in determining this confidence level, Google seems to rate the title(s) of Wikipedia articles as a better indicator of relevance than their own ranking algorithms. For the OneBox results, Google relies on Wikipedia title data, which is really just another form of user-supplied metadata, above any combination of external evidence such as anchor text from pages that link to Wikipedia articles.
I think this can be interpreted as an early example (perhaps the first?) of Google relying on user generated metadata (Wikipedia article titles) above their sophisticated, and highly tuned, mathematical ranking algorithms. Is this a sign of things to come? Is Google beating Wikia at their own game before they’ve even got a beta of their social search engine out the door?