Strategies for creating a meaningful Sitecore site search experience – Part 1

Today, data is considered a strategic asset, and companies are working hard to turn that data into meaningful content.

Search has evolved because customers expect the information they are looking for to be both quick to find and relevant.

Providing a search feature has become the norm for websites. Customers are focused: they want relevant information within a fraction of a second.

The most common example is Google search; you can read the basics of search, crawling, and indexing here: https://www.google.com/search/howsearchworks/crawling-indexing/

Let us move on from generic search and focus on Sitecore as a Content Management System (CMS) and its search capabilities.

If you are using Sitecore, you have the option to use Lucene or SOLR as the indexing mechanism for search.

Sitecore comes with a default set of search indexes that, along with your search and indexing provider, help improve the performance of your website search.

When you configure Sitecore servers in a scalable environment, you first decide whether you want to use Lucene or SOLR as your search and indexing provider. You then configure the indexes you need on each server.

At a high level:

  • Lucene is file-based. Sharing indexes is not supported, so each server must maintain its own Lucene indexes.
  • With SOLR, index storage is centralized and can be shared across multiple servers.

The basic configuration of Lucene and SOLR is covered in the Sitecore documentation linked below.

Sitecore 8.1

https://doc.sitecore.net/sitecore_experience_platform/81/setting_up_and_maintaining/search_and_indexing

Sitecore 8.2

https://doc.sitecore.net/sitecore_experience_platform/setting_up_and_maintaining/search_and_indexing

Indexes required in a scalable Sitecore environment

https://doc.sitecore.net/sitecore_experience_platform/81/setting_up_and_maintaining/search_and_indexing/indexing/search_indexes_required_in_a_scalable_environment

 

You can also integrate COVEO, a standalone search product, with Sitecore. COVEO for Sitecore: http://www.coveo.com/en/solutions/coveo-for-sitecore

 

We are going to skip over all this wonderful information and assume you know the basics of hooking SOLR and Lucene up to Sitecore as we explore further.

 

The purpose of this two-part blog is to look beyond the basics and list some Sitecore strategies for obtaining meaningful search results using SOLR or Lucene.

Strategy 1: Crawlers and Crawler root – Configuration approach

You can customize the Sitecore index configuration and update the crawlers to crawl individual sections of the site, so that search results are more relevant.

For the purpose of this blog, we use sitecore_web_index as the example.

SOLR

File: Sitecore.ContentSearch.Solr.Index.Web.config

<locations hint="list:AddCrawler">
  <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
    <Database>web</Database>
    <Root>/sitecore</Root>
  </crawler>
</locations>

Lucene

File: Sitecore.ContentSearch.Lucene.Index.Web.config

<locations hint="list:AddCrawler">
  <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
    <Database>web</Database>
    <Root>/sitecore</Root>
  </crawler>
</locations>
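To crawl only one section of the site, the crawler root can be narrowed. The following is a minimal sketch rather than a definitive setup: it assumes the change is made through a Sitecore include (patch) file instead of editing the shipped file, that the patch file name is hypothetical (for example App_Config/Include/zzz.WebIndexCrawlerRoot.config), and that /sitecore/content/Home is where your site content lives. The same shape should apply to both the SOLR and Lucene versions of sitecore_web_index.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!-- Hypothetical patch file: narrows the sitecore_web_index crawler root. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <!-- Matches the existing index definition by its id. -->
          <index id="sitecore_web_index">
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <!-- Assumed site root; replaces the default /sitecore root. -->
                <Root>/sitecore/content/Home</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```

After a change like this, rebuild the index so that items outside the new root are dropped from it.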

You can also create multiple crawlers for the same index, each pointing to a different location in the Sitecore content tree, so that only the meaningful content types are indexed.

Example for either SOLR or Lucene, using sitecore_web_index:

<locations hint="list:AddCrawler">
  <TreeCrawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
    <Database>web</Database>
    <Root>/sitecore/Content/Home</Root>
  </TreeCrawler>
  <MediaCrawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
    <Database>web</Database>
    <Root>/sitecore/Media Library</Root>
  </MediaCrawler>
</locations>

The advantages of having multiple crawlers in the same index are:

  1. Improved Sitecore/application performance.
  2. Business rules such as pagination and page sorting can be applied directly in the query.
  3. Easier maintenance.

 

Whether to use one crawler or several depends on your business and technical requirements, keeping performance and maintenance in mind.

Strategy 2: Selective indexing strategy – Configuration approach

You can include or exclude templates and/or fields from the index. The idea is to index only the relevant items, so the search provider has less content to filter. This is achieved by excluding the templates and fields that you do not want indexed.

Example: Sitecore.ContentSearch.Solr.Index.Web.config

SOLR

<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <!-- In the shipped file, the elements below sit inside the index definition for sitecore_web_index, which is omitted here. -->
          <configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration">
            <documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
              <indexAllFields>true</indexAllFields>
              <!-- GLOBALLY EXCLUDE TEMPLATES FROM BEING INDEXED
                   This setting allows you to exclude items that are based on specific templates from the index.
              -->
              <exclude hint="list:AddExcludedTemplate">
                <js>{72B84DB6-F483-4F97-815F-D561E3AEC704}</js>
                <css>{FAF0DED3-4F58-4C4B-B301-AEA0CF8CC5F1}</css>
                <MicroSitePage>{915B3E93-D01D-4C06-8BB4-79FCE4D760F5}</MicroSitePage>
                <Services_Page>{01D8C754-52ED-44EE-B36C-0E744A06E068}</Services_Page>
              </exclude>
              <exclude hint="list:AddExcludedField">
                <__Created>{25BED78C-4957-4165-998A-CA1B52F67497}</__Created>
              </exclude>
            </documentOptions>
          </configuration>
          <strategies hint="list:AddStrategy">…
          </strategies>
          <locations hint="list:AddCrawler">…
          </locations>   …
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
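The example above shows exclusion only. Inclusion follows the same pattern; the fragment below is a hedged sketch of the documentOptions element, not a drop-in configuration. It assumes the include lists use the hints list:AddIncludedTemplate and list:AddIncludedField, as in the Sitecore 8.x default index configuration. The template name ArticlePage, the field name Title, and both IDs are hypothetical placeholders that you would replace with names and IDs from your own Sitecore instance.

```xml
<documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
  <!-- Index only the listed fields instead of every field. -->
  <indexAllFields>false</indexAllFields>
  <include hint="list:AddIncludedTemplate">
    <!-- Hypothetical template name and placeholder ID. -->
    <ArticlePage>{00000000-0000-0000-0000-000000000001}</ArticlePage>
  </include>
  <include hint="list:AddIncludedField">
    <!-- Hypothetical field name and placeholder ID. -->
    <Title>{00000000-0000-0000-0000-000000000002}</Title>
  </include>
</documentOptions>
```

Whether an include list or an exclude list is easier to maintain depends on how many templates you have: an include list suits a site where only a handful of page types should be searchable.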

Lucene

You can also include and exclude templates/fields for Lucene.

Example: Sitecore.ContentSearch.Lucene.Index.Web.config

The format for inclusion and exclusion is similar to the SOLR file above.

Note: Name each entry after the template name or field name in Sitecore, and use the corresponding template ID or field ID as its value.
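For reference, a Lucene version of the exclusion lists might look like the fragment below. This is a sketch under the assumption that the Lucene configuration uses the LuceneDocumentBuilderOptions type from the Sitecore.ContentSearch.LuceneProvider assembly and the same list hints as SOLR; the IDs are reused from the SOLR example above. Check the type name against the configuration files shipped with your Sitecore version.

```xml
<!-- Assumed Lucene counterpart of the SOLR documentOptions element. -->
<documentOptions type="Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilderOptions, Sitecore.ContentSearch.LuceneProvider">
  <indexAllFields>true</indexAllFields>
  <!-- Templates whose items should not be indexed. -->
  <exclude hint="list:AddExcludedTemplate">
    <js>{72B84DB6-F483-4F97-815F-D561E3AEC704}</js>
    <css>{FAF0DED3-4F58-4C4B-B301-AEA0CF8CC5F1}</css>
  </exclude>
  <!-- Fields that should not be indexed. -->
  <exclude hint="list:AddExcludedField">
    <__Created>{25BED78C-4957-4165-998A-CA1B52F67497}</__Created>
  </exclude>
</documentOptions>
```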
