Subdividing Search with Open Source Solr
Habib says another reason open source products tend to find their way into the process is that developers can download them and start working with them without going through a procurement process. "You can imagine what the approval process is like for a purchase at a large corporation. Open source empowers developers to come up with solutions," he says.
That's what happened with Solr: one of the developers working on SouthParkStudios.com began experimenting with it on his own initiative, Cohen says. There, the advantage was that Solr made it easier to subdivide search results by categories such as the names of the characters featured in a given episode, he says.
To achieve those results, the website must be configured to provide an XML data feed to Solr that exposes the structure and categories stored in the site's content management system (CMS). By contrast, MTV's other major search solution, the Google appliance, relies primarily on crawling web content, so its ability to find search facets is limited to identifying different file types or extracting categorization information from HTML meta tags, Cohen says.
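The article does not describe MTV's actual feed or schema, but Solr's standard XML update format gives a sense of what such a feed involves. The minimal sketch below assumes hypothetical CMS records with made-up field names (`id`, `title`, `character`) and turns them into a Solr `<add>` message; a list value becomes a repeated field, which Solr accepts when the schema declares that field multi-valued.

```python
import xml.etree.ElementTree as ET


def build_solr_add_xml(records):
    """Build a Solr XML <add> update message from CMS records.

    Each record is a dict mapping a field name to a value; list values
    are emitted as repeated <field> elements (a multi-valued field).
    Field names are hypothetical and must match the Solr schema in use.
    """
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, <, > for us
    return ET.tostring(add, encoding="unicode")


# Hypothetical record exported from a CMS; the character names are the
# kind of category a search page could later facet on.
feed = build_solr_add_xml([
    {"id": "ep-101", "title": "Example episode", "character": ["Cartman", "Kenny"]},
])
print(feed)
```

In a typical deployment, a string like this would be POSTed with a `text/xml` content type to the core's `/update` handler and then committed; the host, core name, and commit policy depend on the installation.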
So Solr works best with sites such as TheDailyShow.com, which runs off a single CMS, rather than MTV.com, which features layers of technology built up over decades, Cohen says. On more complex sites, such as ComedyCentral.com, Solr serves as a back-end tool for indexing content but not for handling website search requests.
In addition to helping viewers of TheDailyShow.com find segments featuring their favorite comic correspondents, Solr feeds a slider-style timeline widget that lets users narrow results by when a segment aired. Cohen says the faceting feature also makes it possible to feed search results to multiple international versions of the MTVMusic site from a single index.
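The features described here map onto standard Solr request parameters: field faceting for correspondents, range faceting on a date for a timeline, and a filter query for carving one shared index into per-site results. The sketch below only builds the query string and unpacks a facet list; the field names `correspondent`, `air_date`, and `site` and the site code are hypothetical, and nothing here reflects MTV's actual configuration.

```python
from urllib.parse import urlencode


def build_facet_query(text, site=None):
    """Build a Solr select query string with field and range facets.

    Field names are hypothetical placeholders for whatever the schema defines.
    """
    params = [
        ("q", text),
        ("facet", "true"),
        # Field facet: how many matching segments feature each correspondent.
        ("facet.field", "correspondent"),
        # Range facet on the air date, bucketed by year, to drive a timeline slider.
        ("facet.range", "air_date"),
        ("facet.range.start", "NOW/YEAR-10YEARS"),
        ("facet.range.end", "NOW"),
        ("facet.range.gap", "+1YEAR"),
        ("wt", "json"),
    ]
    if site is not None:
        # Filter query: restrict one shared index to a single regional site.
        params.append(("fq", f"site:{site}"))
    return urlencode(params)


def facet_pairs(flat):
    """Turn Solr's flat facet list [value, count, value, count, ...] into pairs."""
    return list(zip(flat[0::2], flat[1::2]))


print(build_facet_query("election", site="uk"))
print(facet_pairs(["Correspondent A", 12, "Correspondent B", 8]))
```

Because every regional site queries the same index and differs only in its `fq` filter, the content is indexed once. That is one plausible reading of how a single index could serve several international sites.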
The one drawback, Cohen says, is the "tooling" that comes with Solr. The management and monitoring utilities are not as slick as those that come with commercial products, he says.