Integrating Sitecore with Algolia

Out of the box, Sitecore ships with Solr as its search provider. As a Sitecore developer, you need relatively little Solr knowledge because you access it through Sitecore's own APIs. That makes it simple to get going without a huge amount of effort. However, this is where all that's good about using Solr seems to end.

It's not that Solr is bad. It's actually very powerful and has a load of config options for boosting fields, result items and so on. If there's something you want to do with your search results, it can probably do it. For admin users, though, it's a bit of a black box. Results come out in an order and sometimes you're not sure why. If you asked a content editor to change the order of search results, they would look at you blankly and not have a clue where to start, other than to ask their dev to do it for them.

Algolia, on the other hand, has been designed for the end user. They can try searches through the admin interface, drag and drop results into a different order, run campaigns and affect results in numerous other ways. It also offers analytics, so they can see which searches return no results and which return results but get no click-throughs.

For devs, it's also easy to see what's actually in the search index, and front end devs can integrate directly through the APIs rather than needing a .NET dev to write something against Sitecore's search provider for them.

Creating a search with Algolia and Sitecore

In this article I'm going to show you how to populate an Algolia index with data from Sitecore. What I'm not doing is creating a new Sitecore search provider for Algolia. Other people have attempted that before, but it requires a lot of maintenance. You also have to implement a lot of functionality that you're unlikely to use!

My aim also isn't to replace Solr. Sitecore uses it for some of its own functionality and it does that job perfectly well. My aim is to add a search to the front end of a Sitecore site, powered by Algolia, so that a content editor can make use of Algolia's features. For that I just need relevant content from Sitecore to be added to and removed from Algolia's index when a publish happens.

Populating Algolia's Index from Sitecore

We want our Algolia index to contain data for published items and to update as future publishes occur, a publish being when content is copied from the Master DB to the Web DB. A good way to do this is to hook into the Sitecore publishing pipeline.

In my solution I am creating a new pipeline processor that calls Algolia directly. In my case the amount of content is relatively small and, for usability, I want content editors to be confident that the index has been updated by the time the publish dialog completes. A more scalable, less blocking solution would be to first post the data to a service bus and then have the Algolia integration subscribe to the bus. This way any delay caused by Algolia won't affect the publishing experience.
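
To give a feel for the decoupled approach, here is a minimal sketch that uses an in-process queue as a stand-in for a real service bus. It is not what I built for this article: the IndexChange and IndexChangeQueue types are hypothetical names of my own, and a real bus would add durability and retries that this lacks. The idea is only that the publish processor enqueues a small message and returns straight away, while a background worker does the slow call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical message shape; not part of Sitecore or Algolia.
public sealed class IndexChange
{
    public string ObjectId { get; set; }
    public bool IsDelete { get; set; }
}

// In-process stand-in for a service bus. Enqueue returns immediately and
// a single background task drains the queue and invokes the handler.
public sealed class IndexChangeQueue : IDisposable
{
    private readonly BlockingCollection<IndexChange> _queue =
        new BlockingCollection<IndexChange>();
    private readonly Task _worker;

    public IndexChangeQueue(Action<IndexChange> handler)
    {
        _worker = Task.Run(() =>
        {
            // Blocks until items arrive; ends once CompleteAdding is called
            // and the queue is empty.
            foreach (var change in _queue.GetConsumingEnumerable())
            {
                try
                {
                    handler(change);
                }
                catch (Exception)
                {
                    // Swallowed so one failure does not stop the worker.
                    // A real bus would retry or dead-letter the message.
                }
            }
        });
    }

    public void Enqueue(IndexChange change)
    {
        _queue.Add(change);
    }

    public void Dispose()
    {
        // Stop accepting work, let the worker finish what is queued, then clean up.
        _queue.CompleteAdding();
        _worker.Wait();
        _queue.Dispose();
    }
}
```

The handler passed to the constructor is where the call to Algolia would live. The trade-off is that the publish dialog can finish before the index has caught up, which is exactly what I wanted to avoid for my content editors.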

The default PublishItem pipeline in Sitecore is as follows:

<publishItem help="Processors should derive from Sitecore.Publishing.Pipelines.PublishItem.PublishItemProcessor">
<processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessingEvent, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckVirtualItem, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckSecurity, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.DetermineAction, Sitecore.Kernel"/>
<processor type="Sitecore.Buckets.Pipelines.PublishItem.ProcessActionForBucketStructure, Sitecore.Buckets" patch:source="Sitecore.Buckets.config"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.MoveItems, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.AddItemReferences, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.RemoveUnknownChildren, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessedEvent, Sitecore.Kernel" runIfAborted="true"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.UpdateStatistics, Sitecore.Kernel" runIfAborted="true">
<traceToLog>false</traceToLog>
</processor>
</publishItem>

The Sitecore.Publishing.Pipelines.PublishItem.PerformAction step in the pipeline is the one that does the actual work of updating the Web DB.

To capture deletes as well as inserts / updates we need a step to happen both before and after this action. The step before will capture the deletes and the step after will push the update to Algolia.

My code for capturing the deletes is as follows. It is needed because once the PerformAction step has finished the deleted item no longer exists, so we need to grab it first.

using Sitecore.Diagnostics;
using Sitecore.Publishing;
using Sitecore.Publishing.Pipelines.PublishItem;

namespace SitecoreAlgolia
{
  public class DelateAlgoliaItemsAction : PublishItemProcessor
  {
      public override void Process(PublishItemContext context)
      {
          Assert.ArgumentNotNull(context, "context");

          // We just want to process deletes because this is the only time the item being deleted may exist.
          if (context.Action != PublishAction.DeleteTargetItem && context.Action != PublishAction.PublishSharedFields)
              return;

          // Attempt to find the item. If not found, the item has already been deleted. This can occur when more than one language is published, as the first language will delete the item.
          var item = context.PublishHelper.GetTargetItem(context.ItemId) ??
                     context.PublishHelper.GetSourceItem(context.ItemId);

          if (item == null)
              return;

          // Hold onto the item for the PublishChangesToAlgoliaAction PublishItemProcessor.
          context.CustomData.Add("Item", item);
      }
  }
}

With the deletes captured the next pipeline action will push each change to Algolia.

The first part of my function is going to ignore anything we're not interested in. This includes:

  • Publishes where the result of the operation was skipped or none (as nothing has changed)
  • Publishes where we don't have an item
  • Items whose template isn't one we're interested in pushing to Algolia
  • Standard values items

// Skip if the publish operation was skipped or none.
if ((context.Action != PublishAction.DeleteTargetItem || context.PublishOptions.CompareRevisions) && (context.Result.Operation == PublishOperation.Skipped || context.Result.Operation == PublishOperation.None))
    return;

// For deletes the VersionToPublish is the parent, so we need to get the item from the previous step
var item = (Item)context.CustomData["Item"] ?? context.VersionToPublish;
if (item == null)
    return;

// Restrict items to certain templates
var template = TemplateManager.GetTemplate(item);
// SearchableTemplates is a List<ID>. Also bail out if the template could not be resolved.
if (template == null || !SearchableTemplates.Any(x => template.ID == x))
    return;

// Don't push standard values items to Algolia
if (item.ParentID == item.TemplateID)
    return;

Next I convert the Sitecore item into a simple POCO. This is what the Algolia client requires for updating the index.

Notice the first property is called ObjectID. This is a required property for Algolia and is used to identify the record for updates and deletes. I'm using the Sitecore Item ID for this.

// Convert item to the model for Algolia
var searchItem = new SearchResultsItem()
{
  ObjectID = item.ID.ToString(),
  Title = item.Fields[FieldNames.Standard.MenuTitle].Value,
  Content = item.Fields[FieldNames.Base.Content].Value,
  Description = item.Fields[FieldNames.Standard.ShortDescription].Value,
};
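
I haven't shown the model class itself, so here is a minimal sketch of what it might look like. The property names and the SitecoreAlgolia.Models namespace come from the code in this article, but treating every property as a plain string is my assumption for illustration.

```csharp
namespace SitecoreAlgolia.Models
{
    // Assumed shape of the record pushed to Algolia.
    public class SearchResultsItem
    {
        // Identifies the record in the index; here it holds the Sitecore Item ID.
        public string ObjectID { get; set; }

        public string Title { get; set; }

        public string Content { get; set; }

        public string Description { get; set; }
    }
}
```

Any extra property added to the class becomes another attribute on the record, so this is also the place to add things like a URL or a date.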

One thing I've not included here is link fields, which need some care. If you want to add a URL into Algolia, you may find that the site context the publishing pipeline runs in is not the same as your final website, so you need to set some additional UrlOptions on the LinkManager to get the correct URLs.
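
As a rough sketch of what that could look like, the helper below builds a URL against a named site rather than whatever context the publish happens to run in. SearchUrlHelper and the siteName parameter are hypothetical names of my own, and the code assumes a Sitecore version where UrlOptions and LinkManager.GetDefaultUrlOptions are still the way to do this (newer versions, as far as I know, favour ItemUrlBuilderOptions instead), so check it against your own version.

```csharp
using Sitecore.Configuration;
using Sitecore.Data.Items;
using Sitecore.Links;

namespace SitecoreAlgolia
{
    // Hypothetical helper; not part of Sitecore.
    public static class SearchUrlHelper
    {
        public static string GetPublicUrl(Item item, string siteName)
        {
            // Start from the configured defaults, then point them at the public site.
            var options = LinkManager.GetDefaultUrlOptions();

            // siteName is assumed to match a <site> definition in your config.
            options.Site = Factory.GetSite(siteName);

            // Include the host name so the URL works outside the Sitecore context.
            options.AlwaysIncludeServerUrl = true;

            return LinkManager.GetItemUrl(item, options);
        }
    }
}
```

The result could then be assigned to an extra property on the model, for example a Url property, before the record is saved.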

Finally to push to Algolia it's a case of creating the SearchClient, initializing the index and picking the relevant operation on the index. Just make sure you install the Algolia.Search NuGet package.

// Init Algolia Client
SearchClient client = new SearchClient("<Application ID>", "<API Key>");
SearchIndex index = client.InitIndex("<Index Name>");

// Decide what type of update is going to Algolia
var operation = (context.Action == PublishAction.DeleteTargetItem && !context.PublishOptions.CompareRevisions) ? PublishOperation.Deleted : context.Result.Operation;
switch (operation)
{
    case PublishOperation.Deleted:
        // Delete
        index.DeleteObject(item.ID.ToString());
        break;
    case PublishOperation.Skipped:
        // Skipped
        break;
    default:
        // Created / Update
        index.SaveObject(searchItem);
        break;
}
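
The placeholders above are hard-coded to keep the example short. One possible way to tidy that up, sketched here under my own assumptions, is to read the values from Sitecore settings and create the index object once rather than for every published item. AlgoliaIndexProvider and the three setting names are hypothetical, and you would need to define those settings in your own patch config.

```csharp
using System;
using Algolia.Search.Clients;
using Sitecore.Configuration;

namespace SitecoreAlgolia
{
    // Hypothetical helper; not part of the Algolia client or Sitecore.
    public static class AlgoliaIndexProvider
    {
        // Lazy<T> defers creation until first use and only runs the factory once.
        private static readonly Lazy<SearchIndex> LazyIndex = new Lazy<SearchIndex>(() =>
        {
            // Setting names are assumptions; use whatever you define in config.
            var client = new SearchClient(
                Settings.GetSetting("Algolia.ApplicationId"),
                Settings.GetSetting("Algolia.ApiKey"));

            return client.InitIndex(Settings.GetSetting("Algolia.IndexName"));
        });

        public static SearchIndex Index
        {
            get { return LazyIndex.Value; }
        }
    }
}
```

With something like this in place, the processor would call AlgoliaIndexProvider.Index.SaveObject(searchItem) instead of building a client each time, and the API key stays out of source code.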

My complete class looks like this. To keep the article simple I've built it quite crudely, with everything in one giant function. For production you would want to split it up as per good coding standards.

using Algolia.Search.Clients;
using SitecoreAlgolia.Models;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Data.Managers;
using Sitecore.Diagnostics;
using Sitecore.Links;
using Sitecore.Publishing;
using Sitecore.Publishing.Pipelines.PublishItem;
using System.Collections.Generic;
using System.Linq;

namespace SitecoreAlgolia
{
  public class PublishChangesToAlgoliaAction : PublishItemProcessor
  {
      private static readonly List<ID> SearchableTemplates = new[] {
              ItemIds.Templates.PageTemplates.EventItem,
              ItemIds.Templates.PageTemplates.Content,
          }.Select(x => new ID(x))
          .ToList();

      public override void Process(PublishItemContext context)
      {
          Assert.ArgumentNotNull(context, "context");

          // Skip if the publish operation was skipped or none.
          if ((context.Action != PublishAction.DeleteTargetItem || context.PublishOptions.CompareRevisions) &&
              (context.Result.Operation == PublishOperation.Skipped ||
               context.Result.Operation == PublishOperation.None))
              return;
          
          // For deletes the VersionToPublish is the parent, we need to get the item from the previous step
          var item = (Item)context.CustomData["Item"] ?? context.VersionToPublish;
          if (item == null)
              return;

          // Restrict items to certain templates
          var template = TemplateManager.GetTemplate(item);
          // SearchableTemplates is a List<ID>. Also bail out if the template could not be resolved.
          if (template == null || !SearchableTemplates.Any(x => template.ID == x))
              return;

          // Don't push standard values items to Algolia
          if (item.ParentID == item.TemplateID)
              return;

          // Convert item to the model for Algolia
          var searchItem = new SearchResultsItem()
          {
              ObjectID = item.ID.ToString(),
              Title = item.Fields[FieldNames.Standard.MenuTitle].Value,
              Content = item.Fields[FieldNames.Base.Content].Value,
              Description = item.Fields[FieldNames.Standard.ShortDescription].Value,
          };

          // Init Algolia Client
          SearchClient client = new SearchClient("<Application ID>", "<API Key>");
          SearchIndex index = client.InitIndex("<Index Name>");

          // Decide what type of update is going to Algolia
          var operation = (context.Action == PublishAction.DeleteTargetItem && !context.PublishOptions.CompareRevisions) ? PublishOperation.Deleted : context.Result.Operation;
          switch (operation)
          {
              case PublishOperation.Deleted:
                  // Delete
                  index.DeleteObject(item.ID.ToString());
                  break;
              case PublishOperation.Skipped:
                  // Skipped
                  break;
              default:
                  // Created / Update
                  index.SaveObject(searchItem);
                  break;
          }          
      }
  }
}

To get our code to run we now need to patch the two processors in using a config file.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
<sitecore>
  <pipelines>
    <publishItem>
      <processor patch:before="*[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']" type="SitecoreAlgolia.DelateAlgoliaItemsAction, SitecoreAlgolia"/>
      <processor patch:after="*[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']" type="SitecoreAlgolia.PublishChangesToAlgoliaAction, SitecoreAlgolia"/>
    </publishItem>
  </pipelines>
</sitecore>
</configuration>

And that's it. As you publish changes the Algolia index will get updated, and a front end can be implemented against Algolia's APIs as it would be on any other site.

Special thanks to Mike Scutta for his blog post on Data Integrations with Sitecore which served as a basis for the logic here.