Sitecore

Integrating Sitecore with Algolia

Out of the box, Sitecore ships with Solr as its search provider. As a Sitecore developer, the amount of Solr knowledge you need is relatively low, as you access it through Sitecore's own APIs. This makes it simple to get going without a huge amount of effort. However, this is where all that's good about using Solr seems to end.

It's not that Solr is bad; it's actually very powerful and has a load of config options for boosting fields, result items, etc. If there's something you want to do with your search results, it can probably do it. For admin users, though, it's a bit of a black box. Results come out in an order and sometimes you're not sure why. If you asked a content editor to change the order of search results, they would look at you blankly and have no idea where to start, other than to ask their dev to do it for them.

Algolia, on the other hand, has been designed for the end user. Editors can try searches through the admin interface, drag and drop results into a different order, run campaigns and affect results in numerous other ways. It also offers analytics, so they can see which searches return no results, along with searches that return results but get no click-throughs.

For devs, it's also easy to see what's actually in the search index, and front-end devs can integrate directly through the APIs rather than needing a .NET dev to write something against Sitecore's search provider for them.

Creating a search with Algolia and Sitecore

In this article I'm going to show you how to populate an Algolia index with data from Sitecore. What I'm not doing is creating a new Sitecore search provider for Algolia. Other people have attempted that before, but it requires a lot of maintenance, and you also have to implement a lot of functionality that you're unlikely to use!

My aim also isn't to replace Solr. Sitecore uses it for some of its own functionality, and it does that job perfectly well. My aim is to add a search to the front end of a Sitecore site, powered by Algolia, so that content editors can make use of Algolia's features. For that, I just need relevant content from Sitecore to be added to and removed from Algolia's index when a publish happens.

Populating Algolia's Index from Sitecore

We want our Algolia index to contain data for published items and to update as future publishes occur (publishing being when content is copied from the Master DB to the Web DB). A good way to do this is to hook into the Sitecore publishing pipeline.

In my solution I am creating a new pipeline processor that calls Algolia directly. In my case the amount of content is relatively small, and for usability I want content editors to be confident that the index has been updated when the publish dialog completes. A more scalable, less blocking solution would be to first post the data to a service bus and then have the Algolia integration subscribe to the bus. That way any delay caused by Algolia won't affect the publishing experience.
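To illustrate the decoupling idea, here is a minimal in-process sketch. It is not a service bus. A BlockingCollection stands in for the bus so the shape of the pattern is visible: the publish pipeline only enqueues a change and returns straight away, and a background consumer does the slow work. The IndexChange and IndexChangeQueue types are hypothetical names, and the pushToAlgolia delegate is where calls such as SaveObject or DeleteObject would go.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical message describing one change destined for Algolia.
public sealed class IndexChange
{
    public IndexChange(string objectId, bool isDelete)
    {
        ObjectId = objectId;
        IsDelete = isDelete;
    }

    public string ObjectId { get; }
    public bool IsDelete { get; }
}

// In-process stand-in for a service bus: producers enqueue, a single
// background task forwards each change to Algolia via the supplied delegate.
public sealed class IndexChangeQueue : IDisposable
{
    private readonly BlockingCollection<IndexChange> _queue = new BlockingCollection<IndexChange>();
    private readonly Task _consumer;

    public IndexChangeQueue(Action<IndexChange> pushToAlgolia)
    {
        _consumer = Task.Run(() =>
        {
            // Blocks while the queue is empty; ends once CompleteAdding has been called
            // and the remaining items have been consumed.
            foreach (var change in _queue.GetConsumingEnumerable())
            {
                try
                {
                    pushToAlgolia(change);
                }
                catch (Exception ex)
                {
                    // A real bus would retry or dead-letter; here we just report it.
                    Console.Error.WriteLine(ex);
                }
            }
        });
    }

    // Called from the publish pipeline; returns immediately.
    public void Enqueue(IndexChange change) => _queue.Add(change);

    // Stop accepting changes and wait for the backlog to drain.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _consumer.Wait();
        _queue.Dispose();
    }
}
```

Because there is a single consumer, changes reach Algolia in the order they were published. Anything still in memory is lost if the app pool recycles, which is the main reason to use a real message bus in production.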

The default PublishItem pipeline in Sitecore is as follows:

<publishItem help="Processors should derive from Sitecore.Publishing.Pipelines.PublishItem.PublishItemProcessor">
<processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessingEvent, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckVirtualItem, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckSecurity, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.DetermineAction, Sitecore.Kernel"/>
<processor type="Sitecore.Buckets.Pipelines.PublishItem.ProcessActionForBucketStructure, Sitecore.Buckets" patch:source="Sitecore.Buckets.config"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.MoveItems, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.AddItemReferences, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.RemoveUnknownChildren, Sitecore.Kernel"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessedEvent, Sitecore.Kernel" runIfAborted="true"/>
<processor type="Sitecore.Publishing.Pipelines.PublishItem.UpdateStatistics, Sitecore.Kernel" runIfAborted="true">
<traceToLog>false</traceToLog>
</processor>
</publishItem>

The Sitecore.Publishing.Pipelines.PublishItem.PerformAction step in the pipeline is the one which does the actual work of updating the web db.

To capture deletes as well as inserts/updates, we need a step both before and after this action. The step before captures the item that is about to be deleted, and the step after pushes the change to Algolia.

My code for capturing deletes is as follows. This is needed because once the PerformAction step has finished, a deleted item no longer exists, so we need to grab it first.

using Sitecore.Diagnostics;
using Sitecore.Publishing;
using Sitecore.Publishing.Pipelines.PublishItem;

namespace SitecoreAlgolia
{
  public class DelateAlgoliaItemsAction : PublishItemProcessor
  {
      public override void Process(PublishItemContext context)
      {
          Assert.ArgumentNotNull(context, "context");

          // We just want to process deletes because this is the only time the item being deleted may exist.
          if (context.Action != PublishAction.DeleteTargetItem && context.Action != PublishAction.PublishSharedFields)
              return;

          // Attempt to find the item. If not found, the item has already been deleted. This can occur when more than one language is published; the first language will delete the item.
          var item = context.PublishHelper.GetTargetItem(context.ItemId) ??
                     context.PublishHelper.GetSourceItem(context.ItemId);

          if (item == null)
              return;

          // Hold onto the item for the PublishChangesToAlgoliaAction PublishItemProcessor.
          context.CustomData.Add("Item", item);
      }
  }
}

With the deletes captured the next pipeline action will push each change to Algolia.

The first part of my function is going to ignore anything we're not interested in. This includes:

  • Publishes where the result of the operation was Skipped or None (as nothing has changed)
  • If we don't have an item
  • If the template of the item isn't one we're interested in pushing to Algolia
  • If the item is a standard values item

// Skip if the publish operation was skipped or none.
if ((context.Action != PublishAction.DeleteTargetItem || context.PublishOptions.CompareRevisions) &&
    (context.Result.Operation == PublishOperation.Skipped || context.Result.Operation == PublishOperation.None))
    return;

// For deletes the VersionToPublish is the parent, so we need the item captured by the previous step.
var item = (Item)context.CustomData["Item"] ?? context.VersionToPublish;
if (item == null)
    return;

// Restrict items to certain templates (SearchableTemplates is a List<ID>).
var template = TemplateManager.GetTemplate(item);
if (!SearchableTemplates.Any(x => template.ID == x))
    return;

// Don't publish messages for standard values.
if (item.ParentID == item.TemplateID)
    return;
Next, I convert the Sitecore item into a simple POCO. This is what the Algolia client requires for updating the index.

Notice the first property is called ObjectID. This is required by Algolia and is used to identify the record for updates and deletes. I'm using the Sitecore item ID for it.

// Convert item to the model for Algolia
var searchItem = new SearchResultsItem()
{
  ObjectID = item.ID.ToString(),
  Title = item.Fields[FieldNames.Standard.MenuTitle].Value,
  Content = item.Fields[FieldNames.Base.Content].Value,
  Description = item.Fields[FieldNames.Standard.ShortDescription].Value,
};
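The SearchResultsItem class itself isn't shown in this article. Here is a minimal version, assuming it only needs the four properties set above. It is my reconstruction, not necessarily the original model. It is a plain POCO, so the Algolia client can serialise its public properties into the record.

```csharp
namespace SitecoreAlgolia.Models
{
    // Assumed shape of the model pushed to Algolia; one record per Sitecore item.
    public class SearchResultsItem
    {
        // Required by Algolia: the unique key used to update or delete the record.
        // Populated with the Sitecore item ID.
        public string ObjectID { get; set; }

        public string Title { get; set; }

        public string Content { get; set; }

        public string Description { get; set; }
    }
}
```

Any extra attributes you want to search or facet on (a URL, a date, a category) would simply be further properties on this class.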

One thing I've not included here: be careful with any link fields. If you want to add a URL into Algolia, you may find that the site context the publishing pipeline runs in isn't the same as your final website. In that case you need to set some additional URL options on the LinkManager to get the correct URLs.

Finally, to push to Algolia it's a case of creating the SearchClient, initializing the index and calling the relevant operation on the index. Just make sure you install the Algolia.Search NuGet package.

// Init Algolia client
SearchClient client = new SearchClient("<Application ID>", "<API Key>");
SearchIndex index = client.InitIndex("<Index Name>");

// Decide what type of update is going to Algolia
var operation = (context.Action == PublishAction.DeleteTargetItem && !context.PublishOptions.CompareRevisions)
    ? PublishOperation.Deleted
    : context.Result.Operation;
switch (operation)
{
    case PublishOperation.Deleted:
        // Delete
        index.DeleteObject(item.ID.ToString());
        break;
    case PublishOperation.Skipped:
        // Skipped
        break;
    default:
        // Created / Updated
        index.SaveObject(searchItem);
        break;
}

My complete class looks like this. To keep the article simple, I've built it quite crudely, with everything in one giant function. For production you would want to split it up in line with good coding standards.

using Algolia.Search.Clients;
using SitecoreAlgolia.Models;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Data.Managers;
using Sitecore.Diagnostics;
using Sitecore.Links;
using Sitecore.Publishing;
using Sitecore.Publishing.Pipelines.PublishItem;
using System.Collections.Generic;
using System.Linq;

namespace SitecoreAlgolia
{
  public class PublishChangesToAlgoliaAction : PublishItemProcessor
  {
      private static readonly List<ID> SearchableTemplates = new[] {
              ItemIds.Templates.PageTemplates.EventItem,
              ItemIds.Templates.PageTemplates.Content,
          }.Select(x => new ID(x))
          .ToList();

      public override void Process(PublishItemContext context)
      {
          Assert.ArgumentNotNull(context, "context");

          // Skip if the publish operation was skipped or none.
          if ((context.Action != PublishAction.DeleteTargetItem || context.PublishOptions.CompareRevisions) &&
              (context.Result.Operation == PublishOperation.Skipped ||
               context.Result.Operation == PublishOperation.None))
              return;
          
          // For deletes the VersionToPublish is the parent, we need to get the item from the previous step
          var item = (Item)context.CustomData["Item"] ?? context.VersionToPublish;
          if (item == null)
              return;

          // Restrict items to certain templates
          var template = TemplateManager.GetTemplate(item);
          // SearchableTemplates is a List<ID>
          if (!SearchableTemplates.Any(x => template.ID == x))
              return;

          // Don't publish messages for standard values
          if (item.ParentID == item.TemplateID)
              return;

          // Convert item to the model for Algolia
          var searchItem = new SearchResultsItem()
          {
              ObjectID = item.ID.ToString(),
              Title = item.Fields[FieldNames.Standard.MenuTitle].Value,
              Content = item.Fields[FieldNames.Base.Content].Value,
              Description = item.Fields[FieldNames.Standard.ShortDescription].Value,
          };

          // Init Algolia Client
          SearchClient client = new SearchClient("<Application ID>", "<API Key>");
          SearchIndex index = client.InitIndex("<Index Name>");

          // Decide what type of update is going to Algolia
          var operation = (context.Action == PublishAction.DeleteTargetItem && !context.PublishOptions.CompareRevisions) ? PublishOperation.Deleted : context.Result.Operation;
          switch (operation)
          {
              case PublishOperation.Deleted:
                  // Delete
                  index.DeleteObject(item.ID.ToString());
                  break;
              case PublishOperation.Skipped:
                  // Skipped
                  break;
              default:
                  // Created / Update
                  index.SaveObject(searchItem);
                  break;
          }          
      }
  }
}
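As one starting point for that split, the decision about what to send to Algolia can be pulled out into a pure function that is easy to unit test. This sketch is hypothetical. PublishActionKind and PublishResultKind are local stand-ins containing only the members this logic needs. They are not Sitecore's PublishAction and PublishOperation types, which lets the code run outside Sitecore. In the real processor you would map context.Action, context.PublishOptions.CompareRevisions and context.Result.Operation onto it.

```csharp
// Local stand-ins (not the Sitecore types) so the rule can be tested in isolation.
public enum PublishActionKind { PublishVersion, PublishSharedFields, DeleteTargetItem }
public enum PublishResultKind { None, Created, Updated, Deleted, Skipped }
public enum AlgoliaCommand { Ignore, Save, Delete }

public static class AlgoliaPublishRules
{
    // Mirrors the checks in PublishChangesToAlgoliaAction:
    // - a delete without revision comparison always removes the record;
    // - otherwise Skipped/None results are ignored;
    // - a Deleted result removes the record and anything else saves it.
    public static AlgoliaCommand Decide(PublishActionKind action, bool compareRevisions, PublishResultKind result)
    {
        if (action == PublishActionKind.DeleteTargetItem && !compareRevisions)
            return AlgoliaCommand.Delete;

        switch (result)
        {
            case PublishResultKind.None:
            case PublishResultKind.Skipped:
                return AlgoliaCommand.Ignore;
            case PublishResultKind.Deleted:
                return AlgoliaCommand.Delete;
            default:
                return AlgoliaCommand.Save;
        }
    }
}
```

The processor would then only need to switch on the returned AlgoliaCommand, which keeps the Sitecore and Algolia calls out of the branching logic.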

To get our code to run, we now need to patch the processors into the pipeline using a config file.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
<sitecore>
  <pipelines>
    <publishItem>
      <processor patch:before="*[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']" type="SitecoreAlgolia.DelateAlgoliaItemsAction, SitecoreAlgolia"/>
      <processor patch:after="*[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']" type="SitecoreAlgolia.PublishChangesToAlgoliaAction, SitecoreAlgolia"/>
    </publishItem>
  </pipelines>
</sitecore>
</configuration>

And that's it. As you publish changes, the Algolia index will be updated, and a front end can be built against Algolia's APIs just as it would be on any other site.

Special thanks to Mike Scutta for his blog post on Data Integrations with Sitecore which served as a basis for the logic here.

Reducing the size of Sitecore Master DB

When it comes to Sitecore development, an issue every developer has likely experienced is the size of the databases relative to the size of their hard disk. In an ideal world, production DBs would contain production data, test environments would have data suited to testing, and developer workstations would have the minimum required to develop the solution.

In reality, though, I think most people have experienced everything being a copy of production. This happens for several reasons. A client may require UAT to look the same as prod, or QA may need prod content to replicate a bug. And although critical Sitecore items may have been serialized, not having any content locally makes it hard to develop.

When a website is new and doesn't have much content this isn't a huge issue, but when you inherit one that's 5 years old with a 25 GB DB, things start to become a problem. Not only is the hard disk space an issue, but just setting up a new developer takes hours because of download times.

After getting a new laptop and being faced with the challenge of copying multiple DBs (and not even having enough space to back them up on my existing machine), I decided to finally do something about reducing their size.

Removing old item versions

Having a history of item versions is a great feature for content editors, however as a dev I don't really need them on my local. They also hold references to media items that aren't used any more.

This Sitecore PowerShell script from Webbson does exactly that, and even lets you specify how many versions you want to keep. I went for 1.

<#
This script will remove old versions of items in all languages so that the items only contains a selected number of versions.
#>

$item = Get-Item -Path "master:\content"
$dialogProps = @{
  Parameters = @(
      @{ Name = "item"; Title="Branch to analyse"; Root="/sitecore/content/Home"},
      @{ Name = "count"; Value=10; Title="Max number of versions";  Editor="number"},
      @{ Name = "remove"; Value=$False; Title="Do you wish to remove items?"; Editor="check"}
  )
  Title = "Limit item version count"
  Description = "Sitecore recommends keeping 10 or fewer versions on any item, but policy may dictate this to be a higher number."
  Width = 500
  Height = 280
  OkButtonName = "Proceed"
  CancelButtonName = "Abort"
}

$result = Read-Variable @dialogProps 

if($result -ne "ok") {
  Close-Window
  Exit
}

$items = @()
Get-Item -Path master: -ID $item.ID -Language * | ForEach-Object { $items += @($_) + @(($_.Axes.GetDescendants())) | Where-Object { $_.Versions.Count -gt $count } | Initialize-Item }
$ritems = @()
$items | ForEach-Object {
  $webVersion = Get-Item -Path web: -ID $_.ID -Language $_.Language
  if ($webVersion) {
      $minVersion = $webVersion.Version.Number - $count
      $ritems += Get-Item -Path master: -ID $_.ID -Language $_.Language -Version * | Where-Object { $_.Version.Number -le $minVersion }
  }
}
if ($remove) {
  $toRemove = $ritems.Count
  $ritems | ForEach-Object {
      $_ | Remove-ItemVersion
  }
  Show-Alert "Removed $toRemove versions"
} else {
  $reportProps = @{
      Property = @(
          "DisplayName",
          @{Name="Version"; Expression={$_.Version}},
          @{Name="Path"; Expression={$_.ItemPath}},
          @{Name="Language"; Expression={$_.Language}}
      )
      Title = "Versions proposed to remove"
      InfoTitle = "Sitecore recommendation: Limit the number of versions of any item to the fewest possible."
      InfoDescription = "The report shows all items that have more than <b>$count versions</b>."
  }
  $ritems | Show-ListView @reportProps
}

Close-Window
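The script's trimming rule is easy to misread, because "count" is measured back from the version currently published to web, not from the latest version in master. The C# sketch below reproduces just that arithmetic and is not part of the script; VersionTrimRule and its parameters are hypothetical names. Any master version numbered at or below (published web version - count) is proposed for removal.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class VersionTrimRule
{
    // Same rule as the script: minVersion = web version number - count,
    // then every master version numbered <= minVersion is removed.
    public static List<int> VersionsToRemove(IEnumerable<int> masterVersions, int publishedWebVersion, int count)
    {
        int minVersion = publishedWebVersion - count;
        return masterVersions.Where(v => v <= minVersion).OrderBy(v => v).ToList();
    }
}
```

For example, with versions 1 to 8 in master, version 6 published and a count of 1, versions 1 to 5 are removed and 6, 7 and 8 are kept. Newer, unpublished drafts therefore survive. Items that don't exist in the web database are skipped by the script entirely.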

Removing unpublished items

After a few failed attempts at reducing the size of the DBs, I discovered that the content editors working on the website had seemingly never deleted any content. Instead, they had just marked things as unpublishable. I can see the logic in this, but after 5+ years they had a lot of unpublished content filling up the content tree.

Well, if it's unpublished I probably don't need it on my local machine, so let's delete it.

Here's a script I wrote. The first part removes items set to never publish. After running just this part, I found that lots of content items were set to publish but had versions set to hidden. The second part loops through the versions on each item and removes any version set to hidden. If the item has no versions left, it is removed too.

# Remove items set to never publish
Get-ChildItem -Path "master:\sitecore\content" -Recurse |
    Where-Object { $_."__Never publish" -eq "1" } |
    Remove-Item -Recurse -Force -Confirm:$false

# Loop through items and remove versions set to hidden, then remove the item if it has no versions left
foreach ($item in Get-ChildItem -Path "master:\sitecore\content" -Recurse) {
    foreach ($version in $item.Versions.GetVersions($true)) {
        # Read the field via the item indexer, which works on the raw items GetVersions returns
        if ($version["__Hide version"] -eq "1") {
            # No -Recurse: the outer loop already visits every descendant
            $version | Remove-ItemVersion -Confirm:$false
        }
    }

    if ($item.Versions.GetVersions($true).Count -eq 0) {
        $item | Remove-Item -Recurse -Force -Confirm:$false
    }
}

Remove dead links

In a later step I rebuild the link database, but I kept ending up with entries in the link table whose target items didn't exist. After a bit of searching I came across an admin page for clearing up dead links.

/sitecore/admin/RemoveBrokenLinks.aspx

With this page you can remove all those pesky dead links caused by editors deleting items and leaving the links behind.

Remove broken links screen in Sitecore

Clean Up DBs

With our content reduced, the DBs now need a clean-up before we do anything else.

In the admin section there is a DB Cleanup page that lets you perform various tasks on the DB. I suggest running all of them.

/sitecore/admin/DBCleanup.aspx

Sitecore Database cleanup page

Once this is done, navigate to the Control Panel and rebuild the link database. From the Control Panel you can also run the clean up databases task, but it won't give you as much feedback.

/sitecore/client/Applications/ControlPanel.aspx?sc_bw=1

Sitecore rebuild link database screen

Remove unused media

With all the old versions, items and dead links removed and the DBs cleaned up, it's time to get rid of any unused media items. If you have a large DB, it's likely that most of the space is taken up by media items. Fortunately, with another PowerShell script we can remove any media that isn't linked to.

This PowerShell script is an adapted version of one by Michael West. You can find his version here https://michaellwest.blogspot.com/2014/10/sitecore-powershell-extensions-tip.html?_sm_au_=iVVB4RsPtStf5MfN

The main difference is I've been more aggressive and removed the checks on item owner and age.

filter Skip-MissingReference {
  $linkDb = [Sitecore.Globals]::LinkDatabase
  if($linkDb.GetReferrerCount($_) -eq 0) {
      $_
  }
}

$items = Get-ChildItem -Path "master:\sitecore\media library" -Recurse | 
  Where-Object { $_.TemplateID -ne [Sitecore.TemplateIDs]::MediaFolder } |
  Skip-MissingReference

if($items) {
  Write-Log "Removing $($items.Length) item(s)."
  $items | Remove-Item
}

Shrink databases

Lastly, using SQL Server Management Studio, shrink your databases and files to recover the unused space you should now have from removing all of that media.

In my case I was able to turn a 20+ GB database into a 7 GB database by doing these steps.

If your local instance runs with both web and master DBs, you should now do a full publish. The published item versions should stay exactly the same, as the scripts kept each published version and only removed content that was set not to publish. You should, however, see a reduction in the size of your web DB from the media items being removed.

Creating an SSL Certificate for your local Sitecore Site

When you install Sitecore, the installer quite handily sets up some SSL certificates for you. That way, when you test locally, your site will correctly run under HTTPS. However, for various reasons you may not have used the installer to set up your local instance, in which case you need to do it yourself.

Creating a self-signed SSL certificate, however, is one of those things that has always been far harder than it should be. Previously I've written about how you can do it using mkcert, but recently I've found another way.

Creating a new self-signed SSL certificate with PowerShell

First, open a PowerShell window (a PowerShell tab in the new Windows Terminal will also do). Make sure you run it as an administrator, or you'll run into permissions errors.

Then run the following command filling in your site URL and a friendly name.

New-SelfSignedCertificate -CertStoreLocation Cert:\LocalMachine\My -DnsName "my-site.local" -FriendlyName "MySiteName" -NotAfter (Get-Date).AddYears(5)

This creates a certificate that expires in five years' time.

Next, it needs copying to the Trusted Root Certification Authorities store so that your browser will trust it.

Click Start and type:

certlm.msc

Find the certificate you just created under Personal > Certificates, and copy it into Trusted Root Certification Authorities > Certificates.

IIS HTTPS Site Bindings

To instruct your site to use the new certificate you need to update the IIS bindings for your site.

Go to IIS Manager, select your site, click Bindings... and then edit the https binding.

In the SSL certificate drop-down, pick your newly created certificate.

At this point you should have an SSL certificate which browsers actually like!

Other Sitecore Settings

Even with the certificate working, you may still run into issues with Sitecore. These can be caused by SSL thumbprints in config files, or by URL config settings that don't include https. For example, in {IDENTITYSERVER_ROOT}/Config/production/Sitecore.IdentityServer.Host.xml there is an AllowedCorsOrigins setting that will need the https version of the URL.