Tag: Publishing
Integrating Sitecore with Algolia

Out of the box, Sitecore ships with Solr as its search provider. As a Sitecore developer, the amount of Solr knowledge you need is relatively low, as you access it through Sitecore's own APIs. This makes it simple to get going, as it doesn't require a huge amount of effort. However, this is where all that's good about using Solr seems to end.

It's not that Solr is bad. It's actually very powerful and has a load of config options for boosting fields, result items, etc. If there's something you want to do with your search results, it can probably do it. For admin users, though, it's a bit of a black box. Results come out in an order and sometimes you're not sure why. If you asked a content editor to change the order of search results, they would look at you blankly and not have a clue where to start, other than to ask their dev to do it for them.

Algolia, on the other hand, has been designed for the end user. They can try searches through the admin interface, drag and drop results into a different order, run campaigns and affect results in numerous other ways. Not only that, but it offers analytics so they can see which searches return no results, along with searches that return results but get no click-throughs.

For devs, it's also easy to see what's actually in the search index, and front-end devs can integrate directly through the APIs rather than requiring a .NET dev to write something against Sitecore's search provider for them.

Creating a search with Algolia and Sitecore

In this article I'm going to show you how to populate an Algolia index with data from Sitecore. What I'm not doing is creating a new Sitecore search provider for Algolia. Other people have attempted that before, but it requires a lot of maintenance. You also have to implement a lot of functionality that you're unlikely to use!

My aim also isn't to replace Solr. Sitecore uses it for some of its functionality and it is doing that job perfectly well. My aim is to add a search to the front end of a Sitecore site, powered by Algolia, so that a content editor can make use of Algolia's features. For that I just need relevant content from Sitecore to be added to and removed from Algolia's index when a publish happens.

Populating Algolia's Index from Sitecore

We want our Algolia index to contain data for published items and to update as future publishes occur, a publish being when content is copied from the Master DB to the Web DB. A good way to do this is to hook into the Sitecore publishing pipeline.

In my solution I am creating a new pipeline processor that calls Algolia directly. In my case the amount of content is relatively small and, for usability, when the publish dialogue completes I want the content editors to be confident that the index has updated. A more scalable, less blocking solution would be to first post the data to a service bus and then have the Algolia integration subscribe to the bus. This way any delay caused by Algolia won't affect the publishing experience.
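
As a rough illustration of that decoupling, the sketch below uses an in-memory BlockingCollection as a stand-in for a real service bus. IndexChange and IndexChangeQueue are hypothetical names of my own, not part of Sitecore or Algolia, and a real bus would also give you durability and retries, which this does not.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical message describing one change to push to the search index.
public sealed class IndexChange
{
    public string ObjectId { get; set; }
    public bool IsDelete { get; set; }
}

// In-memory stand-in for a service bus: the publish processor enqueues,
// a background consumer does the slow work of talking to the index.
public sealed class IndexChangeQueue : IDisposable
{
    private readonly BlockingCollection<IndexChange> _queue =
        new BlockingCollection<IndexChange>();

    // Called from the publish processor; returns without waiting on the consumer.
    public void Enqueue(IndexChange change) => _queue.Add(change);

    // Signals that no more changes will arrive, letting the consumer finish.
    public void CompleteAdding() => _queue.CompleteAdding();

    // Runs the handler for each queued change on a background task.
    public Task StartConsumer(Action<IndexChange> handler) =>
        Task.Run(() =>
        {
            foreach (var change in _queue.GetConsumingEnumerable())
                handler(change);
        });

    public void Dispose() => _queue.Dispose();
}
```

With something like this in place, the pipeline processor would only call Enqueue, and the handler passed to StartConsumer would be the code that calls Algolia. The trade-off is that the publish dialogue can complete before the index has caught up.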

The default PublishItem pipeline in Sitecore is as follows:

<publishItem help="Processors should derive from Sitecore.Publishing.Pipelines.PublishItem.PublishItemProcessor">
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessingEvent, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckVirtualItem, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckSecurity, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.DetermineAction, Sitecore.Kernel"/>
  <processor type="Sitecore.Buckets.Pipelines.PublishItem.ProcessActionForBucketStructure, Sitecore.Buckets" patch:source="Sitecore.Buckets.config"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.MoveItems, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.AddItemReferences, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.RemoveUnknownChildren, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessedEvent, Sitecore.Kernel" runIfAborted="true"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.UpdateStatistics, Sitecore.Kernel" runIfAborted="true">
    <traceToLog>false</traceToLog>
  </processor>
</publishItem>

The Sitecore.Publishing.Pipelines.PublishItem.PerformAction step in the pipeline is the one that does the actual work of updating the Web DB.

To capture deletes as well as inserts / updates we need a step to happen both before and after this action. The step before will capture the deletes and the step after will push the update to Algolia.

My code for capturing the deletes is as follows. This step is needed because once PerformAction has finished, a deleted item no longer exists, so we need to grab it first.

using Sitecore.Diagnostics;
using Sitecore.Publishing;
using Sitecore.Publishing.Pipelines.PublishItem;

namespace SitecoreAlgolia
{
    public class DelateAlgoliaItemsAction : PublishItemProcessor
    {
        public override void Process(PublishItemContext context)
        {
            Assert.ArgumentNotNull(context, "context");

            // We only want to process deletes, because this is the last point at which the item being deleted may still exist.
            if (context.Action != PublishAction.DeleteTargetItem && context.Action != PublishAction.PublishSharedFields)
                return;

            // Attempt to find the item. If it is not found, it has already been deleted.
            // This can occur when more than one language is published: the first language deletes the item.
            var item = context.PublishHelper.GetTargetItem(context.ItemId) ??
                       context.PublishHelper.GetSourceItem(context.ItemId);

            if (item == null)
                return;

            // Hold onto the item for the PublishChangesToAlgoliaAction PublishItemProcessor.
            context.CustomData.Add("Item", item);
        }
    }
}

With the deletes captured the next pipeline action will push each change to Algolia.

The first part of my function is going to ignore anything we're not interested in. This includes:

  • Publishes where the result of the operation was skipped or none (as nothing has changed)
  • If we don't have an item
  • If the template of the item isn't one we're interested in pushing to Algolia
  • If the item is a standard values item

// Skip if the publish operation was skipped or none.
if ((context.Action != PublishAction.DeleteTargetItem || context.PublishOptions.CompareRevisions) &&
    (context.Result.Operation == PublishOperation.Skipped ||
     context.Result.Operation == PublishOperation.None))
    return;

// For deletes the VersionToPublish is the parent, so we need to get the item from the previous step
var item = (Item)context.CustomData["Item"] ?? context.VersionToPublish;
if (item == null)
    return;

// Restrict items to certain templates
var template = TemplateManager.GetTemplate(item);
// SearchableTemplates is a List<ID>
if (!SearchableTemplates.Any(x => template.ID == x))
    return;

// Don't publish messages for standard values
if (item.ParentID == item.TemplateID)
    return;

Next I convert the Sitecore item into a simple POCO. This is what the Algolia client requires for updating the index.

Notice the first property is called ObjectID. This is a required property for Algolia and is used to identify the record for updates and deletes. I'm using the Sitecore Item ID for this.

// Convert item to the model for Algolia
var searchItem = new SearchResultsItem()
{
    ObjectID = item.ID.ToString(),
    Title = item.Fields[FieldNames.Standard.MenuTitle].Value,
    Content = item.Fields[FieldNames.Base.Content].Value,
    Description = item.Fields[FieldNames.Standard.ShortDescription].Value,
};
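
The SearchResultsItem class itself isn't shown here. A minimal sketch of what it might look like is below; the property names come from the initializer above and the namespace from the using statements in the complete class, but the string types are my assumption.

```csharp
namespace SitecoreAlgolia.Models
{
    // Assumed shape of the model pushed to Algolia. Only the property
    // names are taken from the article; the types are a guess.
    public class SearchResultsItem
    {
        // Identifies the record in the index; set to the Sitecore item ID.
        public string ObjectID { get; set; }

        public string Title { get; set; }

        public string Content { get; set; }

        public string Description { get; set; }
    }
}
```

Keeping the model this flat makes it easy to see in the Algolia dashboard exactly what was sent for each item.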

One thing I've not included here is link fields, which need some care. If you want to add a URL into Algolia, you may find that the site context the publishing pipeline runs in is not the same as your final website, so you need to set some additional UrlOptions on the LinkManager to get the correct URLs.
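
As a sketch of what that could look like, the helper below builds URL options that target a named site rather than the current context. AlgoliaUrlHelper and the "website" site name are placeholders of my own, and I'm assuming a Sitecore version where LinkManager.GetDefaultUrlOptions is still available (newer versions mark it obsolete in favour of other options classes), so treat it as a starting point rather than a drop-in.

```csharp
using Sitecore.Data.Items;
using Sitecore.Links;
using Sitecore.Sites;

namespace SitecoreAlgolia
{
    // Hypothetical helper, not part of Sitecore or the Algolia client.
    public static class AlgoliaUrlHelper
    {
        // siteName should match the name of your public site in the sites config;
        // "website" is only the out-of-the-box default.
        public static string GetPublicUrl(Item item, string siteName = "website")
        {
            var options = LinkManager.GetDefaultUrlOptions();

            // Resolve the URL against the public site, not the site
            // the publishing pipeline happens to be running in.
            options.Site = SiteContextFactory.GetSiteContext(siteName);

            // Emit an absolute URL so the record is usable outside Sitecore.
            options.AlwaysIncludeServerUrl = true;

            return LinkManager.GetItemUrl(item, options);
        }
    }
}
```

The result could then be assigned to an extra property on the model, for example a Url string, before the object is saved to the index.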

Finally to push to Algolia it's a case of creating the SearchClient, initializing the index and picking the relevant operation on the index. Just make sure you install the Algolia.Search NuGet package.

// Init Algolia Client
SearchClient client = new SearchClient("<Application ID>", "<API Key>");
SearchIndex index = client.InitIndex("<Index Name>");

// Decide what type of update is going to Algolia
var operation = (context.Action == PublishAction.DeleteTargetItem && !context.PublishOptions.CompareRevisions) ? PublishOperation.Deleted : context.Result.Operation;
switch (operation)
{
    case PublishOperation.Deleted:
        // Delete
        index.DeleteObject(item.ID.ToString());
        break;
    case PublishOperation.Skipped:
        // Skipped
        break;
    default:
        // Created / Update
        index.SaveObject(searchItem);
        break;
}

My complete class looks like this. To keep the article simple I've built this quite crudely, with everything in one giant function. For production you would want to split it up as per good coding standards.

using Algolia.Search.Clients;
using SitecoreAlgolia.Models;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Data.Managers;
using Sitecore.Diagnostics;
using Sitecore.Links;
using Sitecore.Publishing;
using Sitecore.Publishing.Pipelines.PublishItem;
using System.Collections.Generic;
using System.Linq;

namespace SitecoreAlgolia
{
    public class PublishChangesToAlgoliaAction : PublishItemProcessor
    {
        private static readonly List<ID> SearchableTemplates = new[] {
            ItemIds.Templates.PageTemplates.EventItem,
            ItemIds.Templates.PageTemplates.Content,
        }.Select(x => new ID(x))
         .ToList();

        public override void Process(PublishItemContext context)
        {
            Assert.ArgumentNotNull(context, "context");

            // Skip if the publish operation was skipped or none.
            if ((context.Action != PublishAction.DeleteTargetItem || context.PublishOptions.CompareRevisions) &&
                (context.Result.Operation == PublishOperation.Skipped ||
                 context.Result.Operation == PublishOperation.None))
                return;

            // For deletes the VersionToPublish is the parent, so we need to get the item from the previous step
            var item = (Item)context.CustomData["Item"] ?? context.VersionToPublish;
            if (item == null)
                return;

            // Restrict items to certain templates
            var template = TemplateManager.GetTemplate(item);
            // SearchableTemplates is a List<ID>
            if (!SearchableTemplates.Any(x => template.ID == x))
                return;

            // Don't publish messages for standard values
            if (item.ParentID == item.TemplateID)
                return;

            // Convert item to the model for Algolia
            var searchItem = new SearchResultsItem()
            {
                ObjectID = item.ID.ToString(),
                Title = item.Fields[FieldNames.Standard.MenuTitle].Value,
                Content = item.Fields[FieldNames.Base.Content].Value,
                Description = item.Fields[FieldNames.Standard.ShortDescription].Value,
            };

            // Init Algolia Client
            SearchClient client = new SearchClient("<Application ID>", "<API Key>");
            SearchIndex index = client.InitIndex("<Index Name>");

            // Decide what type of update is going to Algolia
            var operation = (context.Action == PublishAction.DeleteTargetItem && !context.PublishOptions.CompareRevisions) ? PublishOperation.Deleted : context.Result.Operation;
            switch (operation)
            {
                case PublishOperation.Deleted:
                    // Delete
                    index.DeleteObject(item.ID.ToString());
                    break;
                case PublishOperation.Skipped:
                    // Skipped
                    break;
                default:
                    // Created / Update
                    index.SaveObject(searchItem);
                    break;
            }
        }
    }
}

To get our code to run, we now need to patch both processors in using a config file.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <publishItem>
        <processor patch:before="*[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']" type="SitecoreAlgolia.DelateAlgoliaItemsAction, SitecoreAlgolia"/>
        <processor patch:after="*[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']" type="SitecoreAlgolia.PublishChangesToAlgoliaAction, SitecoreAlgolia"/>
      </publishItem>
    </pipelines>
  </sitecore>
</configuration>

And that's it. As you publish changes the Algolia index will get updated, and a front end can be implemented against Algolia's APIs as it would be on any other site.

Special thanks to Mike Scutta for his blog post on Data Integrations with Sitecore which served as a basis for the logic here.

Sitecore html cache not clearing on publish

So you've got separate content management and content delivery servers, but when you publish, the change is only visible on the content management box.

A likely cause is that you've enabled some caching but haven't updated the config files to clear the cache on your content delivery server.

Sitecore's config files contain a list of handlers for what should happen when the events publish:end and publish:end:remote are triggered. publish:end is for the content management server, whereas publish:end:remote is for your delivery servers. The handler we're interested in is Sitecore.Publishing.HtmlCacheClearer, which contains a list of sites to have their caches cleared.

By default this will contain one entry for website, the default name given to your site in the sites config when you install Sitecore. However, you will have changed this if your solution supports multiple sites, or if you changed it as part of some future planning to support multiple sites. If your site is missing, just add it to the list (via a patch file, of course).

<!-- Html Cache clear on publish events -->
<!-- Force FULL cache clear on publish-->
<event name="publish:end">
  <handler type="Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel" method="ClearCache" patch:source="BaseSettings.config">
    <sites hint="list">
      <site>SiteOne</site>
      <site>SiteTwo</site>
      <site>SiteThree</site>
    </sites>
  </handler>
</event>
<!-- Html Cache clear on publish events -->
<!-- Force FULL cache clear on publish-->
<event name="publish:end:remote">
  <handler type="Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel" method="ClearCache" patch:source="BaseSettings.config">
    <sites hint="list">
      <site>SiteOne</site>
      <site>SiteTwo</site>
      <site>SiteThree</site>
    </sites>
  </handler>
</event>

Note: in the sample above I have removed all other handlers to simplify the example. You should not remove these from your solution.

For more info on cache clearing and optimising it, see John West's blog series on the subject: https://community.sitecore.net/technical_blogs/b/sitecorejohn_blog/posts/sitecore-output-cache-clearing-optimization-1-8-introduction-john-west-sitecore-blog