Solr is an open source search engine managed as part of the Apache Lucene project. Lucene is a core Java library for building search applications. Solr is based on Lucene and was originally developed at CNET for use on websites such as CNET Reviews.
Solr in a Nutshell
You add documents to the Solr index via XML over HTTP. You query it via HTTP GET and receive XML results.
• Advanced Full-Text Search Capabilities
• Optimized for High-Volume Web Traffic
• Standards-Based Open Interfaces: XML and HTTP
• Comprehensive HTML Administration Interfaces
• Server Statistics Exposed over JMX (Java Management Extensions)
• Scalability: Efficient Replication to Other Solr Search Servers
• Flexible and Adaptable with XML Configuration
• Extensible Plugin Architecture
Source: lucene.apache.org
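To make the "XML over HTTP" indexing step concrete, the sketch below builds an update message in the `<add><doc><field name="...">` form that Solr's XML update handler accepts. The document and its field names (`id`, `name`, `manu`) are hypothetical and would have to match fields declared in your schema; nothing is sent over the network here.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML update message: <add><doc><field name="...">value</field>...</doc></add>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # One <field> element per field; the name attribute must match the schema.
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical example document.
message = build_add_message([{"id": "cam-001", "name": "Example Camera", "manu": "Canon"}])
print(message)
```

In a typical setup the message would be POSTed with `Content-Type: text/xml` to an update URL such as http://localhost:8983/solr/update (an assumption based on the example server's default port), followed by a `<commit/>` message so the new document becomes searchable.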
How Does Solr Compare to Nutch?
Another Lucene sub-project, Nutch, is a web crawler and search engine built on a distributed architecture for very large search applications. The Hadoop distributed computing system arose as a spin-off of Nutch. While both Nutch and Solr are based on Lucene, Solr is simpler to implement and provides useful features for categorizing, or faceting, search results.
Why is Faceting Important?
Faceting is a technique for subdividing a set of search results into categories, with a count of matching records for each, to make them more useful. It is particularly useful for searching structured collections, such as product catalogs, where the key categories and attributes of records can be defined in advance in an XML schema.
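Conceptually, a facet is a tally of field values across the documents that matched a query. The pure-Python sketch below illustrates the idea on a small, hypothetical in-memory result set; Solr computes these counts inside the index rather than by looping over documents like this.

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many matching documents carry each value of `field`, most common first."""
    counts = Counter(doc[field] for doc in results if field in doc)
    return counts.most_common()


# Hypothetical search results for the query "camera".
results = [
    {"name": "Compact A", "manu": "Canon", "camera_type": "compact"},
    {"name": "SLR B", "manu": "Nikon", "camera_type": "slr"},
    {"name": "Compact C", "manu": "Canon", "camera_type": "compact"},
    {"name": "SLR D", "manu": "Canon", "camera_type": "slr"},
]

print(facet_counts(results, "manu"))         # [('Canon', 3), ('Nikon', 1)]
print(facet_counts(results, "camera_type"))  # [('compact', 2), ('slr', 2)]
```

Each (value, count) pair is what a faceted interface renders as a clickable category such as "Canon (3)".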
Real-World Example: CNET Reviews
Annotated Image: Courtesy Lucid Imagination
Solr Web Services Example
Web applications retrieve results from Solr using a REST web services interface. Take this query:
http://localhost:8983/solr/select?q=camera&facet=true&facet.field=manu&facet.field=camera_type
The parameters tell Solr to search for "camera" and break down the results by two facets, manufacturer and camera type.
[Sample XML response from Solr, showing facet counts for each manufacturer and camera type]
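A client for this interface mostly builds a URL and reads the facet section of the XML reply. The sketch below rebuilds the query URL above from its parts and extracts facet counts from the `facet_counts`/`facet_fields` section of a Solr XML response. The sample response and its counts are hypothetical, written only to illustrate the structure.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_facet_query(base_url, query, facet_fields):
    """Build a Solr select URL that requests facet counts for each field in facet_fields."""
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]  # repeated parameter
    return base_url + "?" + urlencode(params)


def parse_facet_fields(xml_text):
    """Return {field: [(value, count), ...]} from the facet_fields section of a Solr XML response."""
    root = ET.fromstring(xml_text)
    facets = {}
    path = "./lst[@name='facet_counts']/lst[@name='facet_fields']/lst"
    for field_list in root.findall(path):
        facets[field_list.get("name")] = [
            (item.get("name"), int(item.text)) for item in field_list.findall("int")
        ]
    return facets


# Hypothetical response fragment; real responses also carry a responseHeader and the matching docs.
SAMPLE = """<response>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="manu"><int name="canon">3</int><int name="nikon">1</int></lst>
      <lst name="camera_type"><int name="compact">2</int><int name="slr">2</int></lst>
    </lst>
  </lst>
</response>"""

url = build_facet_query("http://localhost:8983/solr/select", "camera", ["manu", "camera_type"])
print(url)
print(parse_facet_fields(SAMPLE))
```

Using a list of pairs rather than a dict lets `urlencode` emit `facet.field` twice, which is how Solr expects multiple facet fields to be requested.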
Another Example, MTV Networks
While MTV.com and other large sites use the Google Search Appliance for search, Solr is a popular choice on the more focused sites managed by the MTV digital team and is emerging as the default choice for its new and revamped websites.
Some of the sites using Solr for full-text search:
• www.thedailyshow.com
• www.colbertnation.com
• www.mtvmusic.com
• www.mtvmusica.es
• www.mtvmusica.com
• music.mtv.uol.com.br
• www.gametrailers.com
• www.jokes.com
• www.southparkstudios.com
• www.southparkstudios.de
• www.parentsconnect.com
• www.spongebob.com
SouthParkStudios.com
The South Park website lets users browse search results by facets such as characters and interviews (the categories on the left-hand side of the screen).
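The site's actual implementation is not described here, but a common way to drill down when a visitor clicks a facet value is to re-run the query with a Solr filter query (`fq`) for that value. The sketch below assumes that approach; the field name `character` and the selected value are hypothetical.

```python
from urllib.parse import urlencode


def drill_down_params(query, facet_fields, selections):
    """Keep faceting on and add one fq filter per selected (field, value) pair."""
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    # Quoting keeps multi-word values together as a phrase;
    # values containing double quotes would need escaping (not handled here).
    params += [("fq", f'{field}:"{value}"') for field, value in selections]
    return urlencode(params)


# Hypothetical field and value.
print(drill_down_params("episode", ["character"], [("character", "Eric Cartman")]))
```

Because `fq` narrows the result set without changing relevance scoring, the facet counts returned with the refined query reflect only the selected category.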
TheDailyShow.com
In addition to allowing visitors to enter keyword queries, TheDailyShow.com presents facets by date, by guest, and by correspondent as alternative ways of navigating the search index. The sliders on the timeline widget at the top of the screen let visitors select a specific year, month, and day.
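How the timeline is wired up is not documented here; one plausible approach, assumed for this sketch, is to turn the selected year, month, or day into an inclusive Solr date range that the page sends as a filter query. The field name `air_date` is hypothetical.

```python
import calendar
from datetime import date


def date_range_filter(field, year, month=None, day=None):
    """Return an inclusive Solr range filter covering one year, one month, or one day (UTC)."""
    if month is None:
        first, last = date(year, 1, 1), date(year, 12, 31)
    elif day is None:
        first = date(year, month, 1)
        last = date(year, month, calendar.monthrange(year, month)[1])  # last day of the month
    else:
        first = last = date(year, month, day)
    return f"{field}:[{first.isoformat()}T00:00:00Z TO {last.isoformat()}T23:59:59.999Z]"


print(date_range_filter("air_date", 2009))
# air_date:[2009-01-01T00:00:00Z TO 2009-12-31T23:59:59.999Z]
print(date_range_filter("air_date", 2009, 2))
# air_date:[2009-02-01T00:00:00Z TO 2009-02-28T23:59:59.999Z]
```

Ending the range at 23:59:59.999 on the last day keeps both brackets inclusive, which avoids relying on mixed inclusive/exclusive range syntax.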
More Information:
Solr project page: http://lucene.apache.org/solr/
Consultants and Support: http://wiki.apache.org/solr/Support