Tag: Pipelines
Integrating Sitecore with Algolia

Out of the box, Sitecore ships with Solr as its search provider. As a Sitecore developer, the amount of Solr knowledge you need is relatively low, as you access it through Sitecore's own APIs. This makes it simple to get going without a huge amount of effort. However, this is where all that's good about using Solr seems to end.

It's not that Solr is bad. It's actually very powerful and has a load of config options for boosting fields, result items etc. If there's something you want to do with your search results then it can probably do it. For admin users though, it's just a bit of a black box. Results come out in an order and sometimes you're not sure why. If you asked a content editor to change the order of search results they would look at you blankly and not have a clue where to start, other than to ask their dev to do it for them.

Algolia, on the other hand, has been designed for the end user. They can try searches through the admin interface, drag and drop results into a different order, run campaigns and affect results in numerous other ways. Not only that, but it offers analytics so they can see which searches are returning no results, along with searches that have results but no click-throughs.

For devs it's also easy to see what's actually in the search index, and front-end devs can integrate directly through the APIs rather than requiring a .NET dev to write something against Sitecore's search provider for them.

Creating a search with Algolia and Sitecore

In this article I'm going to show you how to populate an Algolia index with data from Sitecore. What I'm not doing is creating a new Sitecore search provider for Algolia. Other people have attempted that before, but it requires a lot of maintenance. You also have to implement a lot of functionality that you're unlikely to use!

My aim also isn't to replace Solr. Sitecore uses it for some of its functionality and it is doing that job perfectly fine. My aim is to add a search to the front end of a Sitecore site, powered by Algolia, so that a content editor can make use of Algolia's features. For that I just need relevant content from Sitecore to be added to and removed from Algolia's index when a publish happens.

Populating Algolia's Index from Sitecore

We want our Algolia index to contain data for published items and to update as future publishes occur, publishing being when content is copied from the Master DB to the Web DB. A good way to do this is to hook into the Sitecore publishing pipeline.

In my solution I am creating a new pipeline processor that calls Algolia directly. In my case the amount of content is relatively small, and for usability I want content editors to be confident that the index has updated by the time the publish dialogue completes. A more scalable, less blocking solution would be to first post the data to a service bus and then have the Algolia integration subscribe to the bus. This way any delay caused by Algolia won't affect the publishing experience.
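To make the decoupling idea concrete, here is a minimal sketch that uses an in-process BlockingCollection as a stand-in for a real service bus. It is not what the solution in this article does, and the IndexChange and IndexChangeQueue names are hypothetical. The publish processor would only enqueue a small message, and a background consumer would do the slow work of calling Algolia.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical message describing one index change; not a Sitecore or Algolia type.
public sealed class IndexChange
{
    public string ObjectID { get; set; }
    public bool IsDelete { get; set; }
}

// In-process stand-in for a service bus: producers enqueue and return immediately,
// while a single background task hands each message to the supplied handler.
public sealed class IndexChangeQueue : IDisposable
{
    private readonly BlockingCollection<IndexChange> _queue = new BlockingCollection<IndexChange>();
    private readonly Task _consumer;

    public IndexChangeQueue(Action<IndexChange> handler)
    {
        _consumer = Task.Run(() =>
        {
            // Blocks until items arrive; ends once CompleteAdding has been called
            // and the queue has been drained.
            foreach (var change in _queue.GetConsumingEnumerable())
            {
                handler(change);
            }
        });
    }

    // Would be called from the publish pipeline; does not wait for the handler.
    public void Enqueue(IndexChange change)
    {
        _queue.Add(change);
    }

    // Stop accepting messages and wait for the queued ones to be handled.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _consumer.Wait();
        _queue.Dispose();
    }
}
```

An in-memory queue like this loses its messages if the application restarts, which is the main reason a durable service bus would be the better fit for production.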

The default PublishItem pipeline in Sitecore is as follows:

<publishItem help="Processors should derive from Sitecore.Publishing.Pipelines.PublishItem.PublishItemProcessor">
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessingEvent, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckVirtualItem, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckSecurity, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.DetermineAction, Sitecore.Kernel"/>
  <processor type="Sitecore.Buckets.Pipelines.PublishItem.ProcessActionForBucketStructure, Sitecore.Buckets" patch:source="Sitecore.Buckets.config"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.MoveItems, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.AddItemReferences, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.RemoveUnknownChildren, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessedEvent, Sitecore.Kernel" runIfAborted="true"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.UpdateStatistics, Sitecore.Kernel" runIfAborted="true">
    <traceToLog>false</traceToLog>
  </processor>
</publishItem>

The Sitecore.Publishing.Pipelines.PublishItem.PerformAction step in the pipeline is the one that does the actual work of updating the Web DB.

To capture deletes as well as inserts / updates we need a step to happen both before and after this action. The step before will capture the deletes and the step after will push the update to Algolia.

My code for capturing the deletes is as follows. It is needed because once the PerformAction step has finished the item no longer exists, so we need to grab it first.

using Sitecore.Diagnostics;
using Sitecore.Publishing;
using Sitecore.Publishing.Pipelines.PublishItem;

namespace SitecoreAlgolia
{
    public class DelateAlgoliaItemsAction : PublishItemProcessor
    {
        public override void Process(PublishItemContext context)
        {
            Assert.ArgumentNotNull(context, "context");

            // We just want to process deletes because this is the only time the item being deleted may exist.
            if (context.Action != PublishAction.DeleteTargetItem && context.Action != PublishAction.PublishSharedFields)
                return;

            // Attempt to find the item. If not found, the item has already been deleted.
            // This can occur when more than one language is published. The first language will delete the item.
            var item = context.PublishHelper.GetTargetItem(context.ItemId) ??
                       context.PublishHelper.GetSourceItem(context.ItemId);

            if (item == null)
                return;

            // Hold onto the item for the PublishChangesToAlgoliaAction PublishItemProcessor.
            context.CustomData.Add("Item", item);
        }
    }
}

With the deletes captured the next pipeline action will push each change to Algolia.

The first part of my function is going to ignore anything we're not interested in. This includes:

  • Publishes where the result of the operation was skipped or none (as nothing's changed)
  • If we don't have an item
  • If the template of the item isn't one we're interested in pushing to Algolia
  • If the item is a standard values item

// Skip if the publish operation was skipped or none.
if ((context.Action != PublishAction.DeleteTargetItem || context.PublishOptions.CompareRevisions) &&
    (context.Result.Operation == PublishOperation.Skipped || context.Result.Operation == PublishOperation.None))
    return;

// For deletes the VersionToPublish is the parent, we need to get the item from the previous step
var item = (Item)context.CustomData["Item"] ?? context.VersionToPublish;
if (item == null)
    return;

// Restrict items to certain templates
var template = TemplateManager.GetTemplate(item);
// SearchableTemplates is a List<ID>
if (!SearchableTemplates.Any(x => template.ID == x))
    return;

// Don't publish messages for standard values
if (item.ParentID == item.TemplateID)
    return;

Next I convert the Sitecore item into a simple POCO. This is what the Algolia client requires for updating the index.

Notice the first property is called ObjectID. This is a required property for Algolia and is used to identify the record for updates and deletes. I'm using the Sitecore Item ID for this.

// Convert item to the model for Algolia
var searchItem = new SearchResultsItem()
{
    ObjectID = item.ID.ToString(),
    Title = item.Fields[FieldNames.Standard.MenuTitle].Value,
    Content = item.Fields[FieldNames.Base.Content].Value,
    Description = item.Fields[FieldNames.Standard.ShortDescription].Value,
};
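The SearchResultsItem model itself isn't listed in this article. A minimal sketch of what it could look like follows, assuming all four properties are plain strings and that the class lives in the SitecoreAlgolia.Models namespace referenced by the using statements in the complete class further down.

```csharp
namespace SitecoreAlgolia.Models
{
    // Simple POCO sent to Algolia. The property types here are an assumption;
    // the article only shows the property names being assigned.
    public class SearchResultsItem
    {
        // Identifies the record in the index. Populated with the Sitecore Item ID.
        public string ObjectID { get; set; }

        public string Title { get; set; }

        public string Content { get; set; }

        public string Description { get; set; }
    }
}
```

Keeping the model free of Sitecore types means it can be shared with whatever reads from the index later, without pulling in a Sitecore dependency.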

One thing I've not included here: be careful with any link fields. If you want to add a URL into Algolia, you may find that the site context the publishing pipeline runs in is not the same as your final website, so you need to set some additional UrlOptions on the LinkManager to get the correct URLs.
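As a rough illustration of the link point, the sketch below resolves an item's URL against a named site rather than the context the publish runs in. It assumes a Sitecore version where LinkManager.GetDefaultUrlOptions and UrlOptions are still the current API (later versions move to ItemUrlBuilderOptions). The AlgoliaUrlHelper class, its method and the site name "website" are hypothetical and not part of the original solution.

```csharp
using Sitecore.Configuration;
using Sitecore.Data.Items;
using Sitecore.Links;

namespace SitecoreAlgolia
{
    // Hypothetical helper, shown only to illustrate setting UrlOptions explicitly.
    public static class AlgoliaUrlHelper
    {
        // Assumed site name; use the name from your own site definition.
        private const string PublicSiteName = "website";

        public static string GetPublicUrl(Item item)
        {
            // Start from the configured defaults, then point the options at the public site
            // instead of whichever site context the publishing pipeline is running in.
            var options = LinkManager.GetDefaultUrlOptions();
            options.Site = Factory.GetSite(PublicSiteName);

            // Include the host name so the stored URL is absolute.
            options.AlwaysIncludeServerUrl = true;

            return LinkManager.GetItemUrl(item, options);
        }
    }
}
```

The returned value could then be assigned to an extra property on the search model. Check the generated URLs in your own environment, as the result depends on how the target site's host name is configured.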

Finally, to push to Algolia it's a case of creating the SearchClient, initializing the index and picking the relevant operation on the index. Just make sure you install the Algolia.Search NuGet package.

// Init Algolia Client
SearchClient client = new SearchClient("<Application ID>", "<API Key>");
SearchIndex index = client.InitIndex("<Index Name>");

// Decide what type of update is going to Algolia
var operation = (context.Action == PublishAction.DeleteTargetItem && !context.PublishOptions.CompareRevisions) ? PublishOperation.Deleted : context.Result.Operation;
switch (operation)
{
    case PublishOperation.Deleted:
        // Delete
        index.DeleteObject(item.ID.ToString());
        break;
    case PublishOperation.Skipped:
        // Skipped
        break;
    default:
        // Created / Update
        index.SaveObject(searchItem);
        break;
}

My complete class looks like this. To keep the article simple I've built it quite crudely, with everything in one giant function. For production you would want to split it up as per good coding standards.

using Algolia.Search.Clients;
using SitecoreAlgolia.Models;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Data.Managers;
using Sitecore.Diagnostics;
using Sitecore.Links;
using Sitecore.Publishing;
using Sitecore.Publishing.Pipelines.PublishItem;
using System.Collections.Generic;
using System.Linq;

namespace SitecoreAlgolia
{
    public class PublishChangesToAlgoliaAction : PublishItemProcessor
    {
        private static readonly List<ID> SearchableTemplates = new[] {
                ItemIds.Templates.PageTemplates.EventItem,
                ItemIds.Templates.PageTemplates.Content,
            }.Select(x => new ID(x))
            .ToList();

        public override void Process(PublishItemContext context)
        {
            Assert.ArgumentNotNull(context, "context");

            // Skip if the publish operation was skipped or none.
            if ((context.Action != PublishAction.DeleteTargetItem || context.PublishOptions.CompareRevisions) &&
                (context.Result.Operation == PublishOperation.Skipped ||
                 context.Result.Operation == PublishOperation.None))
                return;

            // For deletes the VersionToPublish is the parent, we need to get the item from the previous step
            var item = (Item)context.CustomData["Item"] ?? context.VersionToPublish;
            if (item == null)
                return;

            // Restrict items to certain templates
            var template = TemplateManager.GetTemplate(item);
            // SearchableTemplates is a List<ID>
            if (!SearchableTemplates.Any(x => template.ID == x))
                return;

            // Don't publish messages for standard values
            if (item.ParentID == item.TemplateID)
                return;

            // Convert item to the model for Algolia
            var searchItem = new SearchResultsItem()
            {
                ObjectID = item.ID.ToString(),
                Title = item.Fields[FieldNames.Standard.MenuTitle].Value,
                Content = item.Fields[FieldNames.Base.Content].Value,
                Description = item.Fields[FieldNames.Standard.ShortDescription].Value,
            };

            // Init Algolia Client
            SearchClient client = new SearchClient("<Application ID>", "<API Key>");
            SearchIndex index = client.InitIndex("<Index Name>");

            // Decide what type of update is going to Algolia
            var operation = (context.Action == PublishAction.DeleteTargetItem && !context.PublishOptions.CompareRevisions) ? PublishOperation.Deleted : context.Result.Operation;
            switch (operation)
            {
                case PublishOperation.Deleted:
                    // Delete
                    index.DeleteObject(item.ID.ToString());
                    break;
                case PublishOperation.Skipped:
                    // Skipped
                    break;
                default:
                    // Created / Update
                    index.SaveObject(searchItem);
                    break;
            }
        }
    }
}

To get our processors to run, we now need to patch them in using a config file.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <publishItem>
        <processor patch:before="*[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']" type="SitecoreAlgolia.DelateAlgoliaItemsAction, SitecoreAlgolia"/>
        <processor patch:after="*[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']" type="SitecoreAlgolia.PublishChangesToAlgoliaAction, SitecoreAlgolia"/>
      </publishItem>
    </pipelines>
  </sitecore>
</configuration>

And that's it. As you publish changes the Algolia index will get updated, and a front end can be implemented against Algolia's APIs as it would be on any other site.

Special thanks to Mike Scutta for his blog post on Data Integrations with Sitecore which served as a basis for the logic here.

Pipelines - remember the big picture

Sitecore pipelines are great. With them you can relatively easily add and remove functionality as you wish. Pipelines like httpRequestBegin, httpRequestProcessed and mvc.beginRequest are also really useful if you need some logic to run on a page load that shouldn't really be part of a rendering. This could be anything from login checks to updating the way 404 pages are returned. However, you do need to remember the big picture of what you are changing.

Pipelines don't just affect the processes of the website you're building; Sitecore uses them too. That means when you add a new processor to the httpRequestBegin pipeline, it's going to affect every request in the admin CMS too. Just checking the context item also isn't enough, as some things, e.g. opening a node in a tree view, will have the context of the node you clicked on!

Adding this snippet of code to the beginning of your processor should keep you safe though.

using Sitecore.Diagnostics;
using Sitecore.Pipelines.HttpRequest;
using Sitecore;
using Sitecore.SecurityModel;

namespace CustomPipelineNamespace
{
    public class CustomPipeline : HttpRequestProcessor
    {
        public override void Process(HttpRequestArgs args)
        {
            // Check args isn't null
            Assert.ArgumentNotNull(args, "args");
            // Check we have a site
            if (Context.Site == null)
                return;
            // Skip sites that belong to the sitecore domain, i.e. the CMS itself
            if (Context.Site.Domain == DomainManager.GetDomain("sitecore"))
                return;
            // Check that we're not in a redirect
            if (args.Url.FilePathWithQueryString.ToUpperInvariant().Contains("redirected=true".ToUpperInvariant()))
                return;
            // Check that we're not in the page editor
            if (Context.PageMode.IsPageEditor)
                return;

            // DO CODE
        }
    }
}

Updating the response headers on your 404 Page in Sitecore

A few weeks ago I blogged about how to create a custom 404 Page in Sitecore. Following on from that, one thing you may notice in the response header of your 404 Page is that the status code is 200 OK, rather than 404 Page not found.

When Sitecore can't find a page, what actually happens is that a 302 redirect is issued to the page not found page which, as it's an ordinary page, will return a 200 OK. Thankfully Google is actually quite good at detecting pages as being 404s even when they return the wrong status code, but it would be better if our sites issued the correct headers.

Method 1

The simplest solution is to create a view rendering with the following logic and place it somewhere on your page not found page. This will update the response headers with the correct values.

@{
    Response.TrySkipIisCustomErrors = true;
    Response.StatusCode = 404;
    Response.StatusDescription = "Page not found";
}

However, personally I don't think this is a particularly neat solution. The contents of a view should really be left for what's going in a page rather than interfering with its headers, even if it does have access to the Response object.

Method 2

Rather than using a view, my solution is to add some code to the httpRequestEnd pipeline. It checks the context item's ID against a setting that stores the ID of the 404 page item in Sitecore and, if the two match, updates the response header.

The solution will look like this:

Pipeline logic

using Sitecore.Configuration;
using Sitecore.Data;
using Sitecore.Pipelines.HttpRequest;

namespace Pipelines.HttpRequest
{
    public class PageNotFoundResponseHeader : HttpRequestProcessor
    {
        private static readonly string PageNotFoundID = Settings.GetSetting("PageNotFound");

        public override void Process(HttpRequestArgs args)
        {
            if (Sitecore.Context.Item != null && Sitecore.Context.Item.ID == new ID(PageNotFoundID))
            {
                args.Context.Response.TrySkipIisCustomErrors = true;
                args.Context.Response.StatusCode = 404;
                args.Context.Response.StatusDescription = "Page not found";
            }
        }
    }
}

Patch config file

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <httpRequestEnd>
        <processor
          patch:after="processor[@type='Sitecore.Pipelines.PreprocessRequest.CheckIgnoreFlag, Sitecore.Kernel']"
          type="Pipelines.HttpRequest.PageNotFoundResponseHeader, MyProjectName" />
      </httpRequestEnd>
    </pipelines>
    <settings>
      <!-- Page Not Found Item Id -->
      <setting name="PageNotFound" value="ID of 404 Page" />
    </settings>
  </sitecore>
</configuration>

What's the TrySkipIisCustomErrors property?

Quite simply, this stops a scenario where you end up on IIS's 404 page rather than your own. If you don't set this, when you update the header status code to 404, IIS likes to return the page from its settings rather than continuing with your own.