Tag: Search
Integrating Sitecore with Algolia

Out of the box, Sitecore ships with Solr as its search provider. As a Sitecore developer, the amount of Solr knowledge you need is relatively low, as you access it through Sitecore's own APIs. This makes it simple to get going without a huge amount of effort. However, this is where all that's good about using Solr seems to end.

It's not that Solr is bad. It's actually very powerful and has a load of config options for boosting fields, result items etc. If there's something you want to do with your search results then it can probably do it. For admin users though, it's just a bit of a black box. Results come out in an order and sometimes you're not sure why. If you asked a content editor to change the order of search results they would look at you blankly and not have a clue where to start, other than to ask their dev to do it for them.

Algolia, on the other hand, has been designed for the end user. They can try searches through the admin interface, drag and drop results into a different order, run campaigns and affect results in numerous other ways. Not only that, but it offers analytics so they can see which searches return no results, along with searches that have results but no click-throughs.

For devs it's also easy to see what's actually in the search index and front end devs can easily integrate through the APIs rather than requiring a .NET dev to write something against Sitecore's search provider for them.

Creating a search with Algolia and Sitecore

In this article I'm going to show you how to populate an Algolia index with data from Sitecore. What I'm not doing is creating a new Sitecore search provider for Algolia. Other people have attempted that before, but it requires a lot of maintenance. You also have to implement a lot of functionality that you're unlikely to use!

My aim also isn't to replace Solr. Sitecore uses it for some of its functionality and it is doing that job perfectly fine. My aim is to add a search to the front end of a Sitecore site powered by Algolia, so that a content editor can make use of Algolia's features. For that I just need relevant content from Sitecore to be added to and removed from Algolia's index when a publish happens.

Populating Algolia's Index from Sitecore

We want our Algolia index to contain data for published items and to update as future publishes occur, a publish being when content is copied from the Master DB to the Web DB. A good way to do this is to hook into the Sitecore publishing pipeline.

In my solution I am creating a new pipeline processor that calls Algolia directly. In my case the amount of content is relatively small and, for usability, when the publish dialog completes I want the content editors to be confident that the index has updated. A more scalable solution that is less blocking would be to first post the data to a service bus and then have the integration to Algolia subscribe to the bus. This way any delay caused by Algolia won't affect the publishing experience.
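As a rough illustration of that decoupled approach, here is a minimal sketch that uses an in-process BlockingCollection as a stand-in for a real service bus. The IndexChange and IndexChangeQueue types are hypothetical names invented for this example, and the handler passed to the consumer is where the Algolia call would go. It shows the shape of the idea only, not a production integration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical message describing one change to push to the search index.
public sealed class IndexChange
{
    public string ObjectId { get; set; }
    public bool IsDelete { get; set; }
}

// In-process stand-in for a service bus queue (an assumption for this sketch).
public sealed class IndexChangeQueue
{
    private readonly BlockingCollection<IndexChange> _queue =
        new BlockingCollection<IndexChange>();

    // Called from the publish pipeline; returns without waiting for the consumer.
    public void Enqueue(IndexChange change) => _queue.Add(change);

    // Signals that no more messages will be added.
    public void Complete() => _queue.CompleteAdding();

    // Processes messages on a background task until Complete() is called.
    public Task StartConsumer(Action<IndexChange> handler) =>
        Task.Run(() =>
        {
            foreach (var change in _queue.GetConsumingEnumerable())
                handler(change);
        });
}

public static class Program
{
    public static void Main()
    {
        var queue = new IndexChangeQueue();
        var handled = new List<string>();

        // The handler is where the call to the search index would happen.
        var consumer = queue.StartConsumer(c =>
            handled.Add((c.IsDelete ? "delete " : "save ") + c.ObjectId));

        queue.Enqueue(new IndexChange { ObjectId = "item-1" });
        queue.Enqueue(new IndexChange { ObjectId = "item-2", IsDelete = true });
        queue.Complete();
        consumer.Wait();

        Console.WriteLine(string.Join(", ", handled));
    }
}
```

The trade-off is the one described above: the publish finishes without waiting for the index, so editors lose the guarantee that the index is up to date the moment the dialog closes.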

The default PublishItem pipeline in Sitecore is as follows:

<publishItem help="Processors should derive from Sitecore.Publishing.Pipelines.PublishItem.PublishItemProcessor">
    <processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessingEvent, Sitecore.Kernel"/>
    <processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckVirtualItem, Sitecore.Kernel"/>
    <processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckSecurity, Sitecore.Kernel"/>
    <processor type="Sitecore.Publishing.Pipelines.PublishItem.DetermineAction, Sitecore.Kernel"/>
    <processor type="Sitecore.Buckets.Pipelines.PublishItem.ProcessActionForBucketStructure, Sitecore.Buckets" patch:source="Sitecore.Buckets.config"/>
    <processor type="Sitecore.Publishing.Pipelines.PublishItem.MoveItems, Sitecore.Kernel"/>
    <processor type="Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel"/>
    <processor type="Sitecore.Publishing.Pipelines.PublishItem.AddItemReferences, Sitecore.Kernel"/>
    <processor type="Sitecore.Publishing.Pipelines.PublishItem.RemoveUnknownChildren, Sitecore.Kernel"/>
    <processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessedEvent, Sitecore.Kernel" runIfAborted="true"/>
    <processor type="Sitecore.Publishing.Pipelines.PublishItem.UpdateStatistics, Sitecore.Kernel" runIfAborted="true">
        <traceToLog>false</traceToLog>
    </processor>
</publishItem>

The Sitecore.Publishing.Pipelines.PublishItem.PerformAction step in the pipeline is the one that does the actual work of updating the Web DB.

To capture deletes as well as inserts / updates we need a step to happen both before and after this action. The step before will capture the deletes and the step after will push the update to Algolia.

My code for capturing the deletes is as follows. This is needed as once the PerformAction step has finished, the item no longer exists so we need to grab it first.

using Sitecore.Diagnostics;
using Sitecore.Publishing;
using Sitecore.Publishing.Pipelines.PublishItem;

namespace SitecoreAlgolia
{
    public class DelateAlgoliaItemsAction : PublishItemProcessor
    {
        public override void Process(PublishItemContext context)
        {
            Assert.ArgumentNotNull(context, "context");

            // We just want to process deletes because this is the only time the item being deleted may exist.
            if (context.Action != PublishAction.DeleteTargetItem && context.Action != PublishAction.PublishSharedFields)
                return;

            // Attempt to find the item. If not found, the item has already been deleted.
            // This can occur when more than one language is published. The first language will delete the item.
            var item = context.PublishHelper.GetTargetItem(context.ItemId) ??
                       context.PublishHelper.GetSourceItem(context.ItemId);

            if (item == null)
                return;

            // Hold onto the item for the PublishChangesToAlgoliaAction PublishItemProcessor.
            context.CustomData.Add("Item", item);
        }
    }
}

With the deletes captured the next pipeline action will push each change to Algolia.

The first part of my function is going to ignore anything we're not interested in. This includes:

  • Publishes where the result of the operation was skipped or none (as nothing has changed)
  • Publishes where we don't have an item
  • Items whose template isn't one we're interested in pushing to Algolia
  • Items that are standard values

// Skip if the publish operation was skipped or none.
if ((context.Action != PublishAction.DeleteTargetItem || context.PublishOptions.CompareRevisions) &&
    (context.Result.Operation == PublishOperation.Skipped ||
     context.Result.Operation == PublishOperation.None))
    return;

// For deletes the VersionToPublish is the parent, we need to get the item from the previous step
var item = (Item)context.CustomData["Item"] ?? context.VersionToPublish;
if (item == null)
    return;

// Restrict items to certain templates
var template = TemplateManager.GetTemplate(item);
// SearchableTemplates is a List<ID>
if (!SearchableTemplates.Any(x => template.ID == x))
    return;

// Don't publish messages for standard values
if (item.ParentID == item.TemplateID)
    return;

Next I convert the Sitecore item into a simple POCO. This is what the Algolia client requires for updating the index.

Notice the first property is called ObjectID, this is a required property for Algolia and is used to identify the record for updates and deletes. I'm using the Sitecore Item ID for this.

// Convert item to the model for Algolia
var searchItem = new SearchResultsItem()
{
    ObjectID = item.ID.ToString(),
    Title = item.Fields[FieldNames.Standard.MenuTitle].Value,
    Content = item.Fields[FieldNames.Base.Content].Value,
    Description = item.Fields[FieldNames.Standard.ShortDescription].Value,
};

One thing I've not included here is the handling of link fields, which need some care. If you want to add a URL into Algolia, you may find that the site context the publishing pipeline runs in is not the same as your final website, and therefore you need to set some additional UrlOptions on the LinkManager to get the correct URLs.
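A minimal sketch of what that could look like, assuming a Sitecore version where UrlOptions and LinkManager.GetItemUrl are still the supported API (later versions favour a newer options type). The SearchUrlHelper class is a hypothetical name for this example, and "website" is an assumed site name that you would replace with the name from your own site definition.

```csharp
using Sitecore.Configuration;
using Sitecore.Data.Items;
using Sitecore.Links;

namespace SitecoreAlgolia
{
    // Hypothetical helper, not part of the original solution.
    public static class SearchUrlHelper
    {
        // Assumed site name; use the one from your site definition.
        private const string PublicSiteName = "website";

        public static string GetPublicUrl(Item item)
        {
            // Start from the defaults, then pin the options to the public site
            // rather than the site context the publish happens to run in.
            UrlOptions options = LinkManager.GetDefaultUrlOptions();
            options.Site = Factory.GetSite(PublicSiteName);
            options.SiteResolving = true;
            options.AlwaysIncludeServerUrl = true;

            return LinkManager.GetItemUrl(item, options);
        }
    }
}
```

The result could then be assigned to an extra property on the model that is sent to Algolia.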

Finally to push to Algolia it's a case of creating the SearchClient, initializing the index and picking the relevant operation on the index. Just make sure you install the Algolia.Search NuGet package.

// Init Algolia Client
SearchClient client = new SearchClient("<Application ID>", "<API Key>");
SearchIndex index = client.InitIndex("<Index Name>");

// Decide what type of update is going to Algolia
var operation = (context.Action == PublishAction.DeleteTargetItem && !context.PublishOptions.CompareRevisions)
    ? PublishOperation.Deleted
    : context.Result.Operation;
switch (operation)
{
    case PublishOperation.Deleted:
        // Delete
        index.DeleteObject(item.ID.ToString());
        break;
    case PublishOperation.Skipped:
        // Skipped
        break;
    default:
        // Created / Update
        index.SaveObject(searchItem);
        break;
}

My complete class looks like this. To keep the article simple I've built this quite crudely, with everything in one giant function. For production you would want to split it up as per good coding standards.

using Algolia.Search.Clients;
using SitecoreAlgolia.Models;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Data.Managers;
using Sitecore.Diagnostics;
using Sitecore.Links;
using Sitecore.Publishing;
using Sitecore.Publishing.Pipelines.PublishItem;
using System.Collections.Generic;
using System.Linq;

namespace SitecoreAlgolia
{
    public class PublishChangesToAlgoliaAction : PublishItemProcessor
    {
        private static readonly List<ID> SearchableTemplates = new[] {
            ItemIds.Templates.PageTemplates.EventItem,
            ItemIds.Templates.PageTemplates.Content,
        }.Select(x => new ID(x))
         .ToList();

        public override void Process(PublishItemContext context)
        {
            Assert.ArgumentNotNull(context, "context");

            // Skip if the publish operation was skipped or none.
            if ((context.Action != PublishAction.DeleteTargetItem || context.PublishOptions.CompareRevisions) &&
                (context.Result.Operation == PublishOperation.Skipped ||
                 context.Result.Operation == PublishOperation.None))
                return;

            // For deletes the VersionToPublish is the parent, we need to get the item from the previous step
            var item = (Item)context.CustomData["Item"] ?? context.VersionToPublish;
            if (item == null)
                return;

            // Restrict items to certain templates
            var template = TemplateManager.GetTemplate(item);
            // SearchableTemplates is a List<ID>
            if (!SearchableTemplates.Any(x => template.ID == x))
                return;

            // Don't publish messages for standard values
            if (item.ParentID == item.TemplateID)
                return;

            // Convert item to the model for Algolia
            var searchItem = new SearchResultsItem()
            {
                ObjectID = item.ID.ToString(),
                Title = item.Fields[FieldNames.Standard.MenuTitle].Value,
                Content = item.Fields[FieldNames.Base.Content].Value,
                Description = item.Fields[FieldNames.Standard.ShortDescription].Value,
            };

            // Init Algolia Client
            SearchClient client = new SearchClient("<Application ID>", "<API Key>");
            SearchIndex index = client.InitIndex("<Index Name>");

            // Decide what type of update is going to Algolia
            var operation = (context.Action == PublishAction.DeleteTargetItem && !context.PublishOptions.CompareRevisions)
                ? PublishOperation.Deleted
                : context.Result.Operation;
            switch (operation)
            {
                case PublishOperation.Deleted:
                    // Delete
                    index.DeleteObject(item.ID.ToString());
                    break;
                case PublishOperation.Skipped:
                    // Skipped
                    break;
                default:
                    // Created / Update
                    index.SaveObject(searchItem);
                    break;
            }
        }
    }
}

To get our code to run we now need to patch both processors in using a config file.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
    <sitecore>
        <pipelines>
            <publishItem>
                <processor patch:before="*[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']" type="SitecoreAlgolia.DelateAlgoliaItemsAction, SitecoreAlgolia"/>
                <processor patch:after="*[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']" type="SitecoreAlgolia.PublishChangesToAlgoliaAction, SitecoreAlgolia"/>
            </publishItem>
        </pipelines>
    </sitecore>
</configuration>

And that's it. As you publish changes the Algolia index will get updated, and a front end can be implemented against Algolia's APIs as it would be on any other site.

Special thanks to Mike Scutta for his blog post on Data Integrations with Sitecore which served as a basis for the logic here.

Populating the internal search report in Sitecore

Out of the box, Sitecore ships with a number of reports pre-configured. Some of these will show data without you doing anything, e.g. the pages report will automatically start showing the top entry and exit pages, as a page view is something Sitecore can track.

Others, like the internal search report, will just show a message of no data to display, which can be confusing and frustrating for your users, particularly when they've just spent money on a license fee to get great analytics data only to see a blank report.

The reason it doesn't show any information is relatively straightforward. Sitecore doesn't know how your site search is going to work and therefore it can't do the data capture part of the process. That part of the process, however, is actually quite simple to do.

Sitecore has a set of page events that can be registered in the analytics tracker. Some of these, like Page Visited, will be handled by Sitecore. In this instance the one we are interested in is Search, and we will have to register it manually.

To register the search event use some code like this (note, there is a constant that references the item id of the search event). The query parameter should be populated with the search term the user entered.

using Sitecore.Analytics;
using Sitecore.Analytics.Data;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;
using SitecoreItemIds;

namespace SitecoreServices
{
    public class SiteSearch
    {
        public static void TrackSiteSearch(Item pageEventItem, string query)
        {
            Assert.ArgumentNotNull(pageEventItem, nameof(pageEventItem));
            Assert.IsNotNull(pageEventItem, $"Cannot find page event: {pageEventItem}");

            if (Tracker.IsActive)
            {
                var pageEventData = new PageEventData("Search", ContentItemIds.Search)
                {
                    ItemId = pageEventItem.ID.ToGuid(),
                    Data = query,
                    DataKey = query,
                    Text = query
                };
                var interaction = Tracker.Current.Session.Interaction;
                if (interaction != null)
                {
                    interaction.CurrentPage.Register(pageEventData);
                }
            }
        }
    }
}

Now, after the code has been triggered a few times, your internal search report should start to be populated.

Sitecore Search and Indexing: Creating a simple search

With Sitecore 7, Sitecore introduced the new Sitecore.ContentSearch API, which out of the box can query Lucene and Solr based indexes.

Searching the indexes has been made easier through Linq to Sitecore, which allows you to construct a query using Linq, the same as you would with things like Entity Framework or Linq to SQL.

To do a query you first need a search context. Here I'm getting a context on one of the default indexes:

using (var context = ContentSearchManager.GetIndex("sitecore_web_index").CreateSearchContext()) { ... }

Next, a simple query would look like this. Here I'm doing a where clause on the "body" field:

using (var context = ContentSearchManager.GetIndex("sitecore_web_index").CreateSearchContext())
{
    IQueryable<SearchResultItem> searchQuery = context.GetQueryable<SearchResultItem>().Where(item => item["body"] == "Sitecore");
}

But what if you want to add a search to your site? Typically you would want to filter on more than one field, what the user enters may be a collection of words rather than an exact phrase, and you'd also like some intelligent ordering to your results.

Here I am splitting the search term on spaces and then building a predicate that has an "or" between each of its conditions. For each condition rather than doing a .Contains on a specific field, I'm doing it on a content field that will contain data for all fields in the item.

using (var context = ContentSearchManager.GetIndex("sitecore_web_index").CreateSearchContext())
{
    IQueryable<SearchResultItem> query = context.GetQueryable<SearchResultItem>();

    // Start from False when combining with Or: starting from True would match every item.
    var predicate = PredicateBuilder.False<SearchResultItem>();

    // criteria is the raw search string entered by the user
    foreach (string term in criteria.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
    {
        // Copy to a local so each condition captures its own word
        string word = term;
        predicate = predicate.Or(p => p.Content.Contains(word));
    }

    SearchResults<SearchResultItem> searchResults = query.Where(predicate).GetResults();

    // results is declared outside this block
    results = (from hit in searchResults.Hits
               select hit.Document).ToList();
}

The intelligent ordering of results you will get for free, based on what was searched for.

Introduction to Real Time Search

The Real Time Search this article relates to is a DLL included in the Web Client Software Factory, one of Microsoft's patterns & practices downloads. What it can do, and what this example will demonstrate, is cause a postback that refreshes an update panel as the user types into a search box. This can give a great effect and make a web app really user friendly in scenarios like searching photos, emails or any general list.

First let me state that this is in no way the most optimal way to program this. In most scenarios you could build a better result using something like JSON, as there will be a lot less data transfer, which is also a general reason to avoid update panels. However, this is also very quick and very easy to implement, not to mention that if you've ever used update panels before you already know 90% of what's needed. It also only works well in situations where you have a good search that returns results quickly, rather than leaving the user sitting there trying to work out why nothing's happening and where the search button has gone.

Implementing Real Time Search

For this example I will be filtering a table from a DB based on the search criteria and refreshing a GridView with the results. I will be using a normal C# Web Site project with the Adventure Works sample DB from Microsoft. The DB connection will be done using LINQ with Entity Framework. There is no need to use this, however; it is just my preference for the example.

First off, set up your website and DB and make sure both are working with no problems. As results will be displayed in an UpdatePanel, add one of these along with a ScriptManager to your page, so it looks something like this:

<form id="form1" runat="server">
    <div>
        <asp:ScriptManager ID="ScriptManager1" runat="server">
        </asp:ScriptManager>
        <asp:UpdatePanel ID="UpdatePanel1" runat="server">
            <ContentTemplate></ContentTemplate>
        </asp:UpdatePanel>
    </div>
</form>

Next let's get the search working in the normal way, so I'm going to create my Entity Model and add a TextBox and GridView to show the results. Again, you can connect and show your results however you want. You should now have something like this in your aspx file:

<form id="form1" runat="server">
    <div>
        <asp:ScriptManager ID="ScriptManager1" runat="server">
        </asp:ScriptManager>

        <asp:TextBox ID="txtSearch" runat="server" OnTextChanged="TextChanged" Text=""></asp:TextBox>

        <asp:UpdatePanel ID="UpdatePanel1" runat="server">
            <ContentTemplate>

                <asp:LinqDataSource ID="LinqDataSource1" runat="server" onselecting="LinqDataSource1_Selecting">
                </asp:LinqDataSource>

                <asp:GridView ID="GridView1" runat="server" AutoGenerateColumns="False" DataSourceID="LinqDataSource1">
                    <Columns>
                        <asp:BoundField DataField="ProductID" HeaderText="ProductID"
                            ReadOnly="True" SortExpression="ProductID" />
                        <asp:BoundField DataField="Name" HeaderText="Name"
                            ReadOnly="True" SortExpression="Name" />
                        <asp:BoundField DataField="ProductNumber" HeaderText="ProductNumber" ReadOnly="True" SortExpression="ProductNumber" />
                        <asp:BoundField DataField="Color" HeaderText="Color"
                            ReadOnly="True" SortExpression="Color" />
                        <asp:BoundField DataField="SafetyStockLevel" HeaderText="SafetyStockLevel"
                            ReadOnly="True" SortExpression="SafetyStockLevel" />
                        <asp:BoundField DataField="ReorderPoint" HeaderText="ReorderPoint"
                            ReadOnly="True" SortExpression="ReorderPoint" />
                    </Columns>
                </asp:GridView>
            </ContentTemplate>

        </asp:UpdatePanel>
    </div>
</form>

And this in your code behind:

protected void LinqDataSource1_Selecting(object sender, LinqDataSourceSelectEventArgs e)
{
    Model.AdventureWorks2008Entities adventureWorks = new Model.AdventureWorks2008Entities();

    // Filter products by the text currently in the search box
    var products = from p in adventureWorks.Product
                   where p.Name.Contains(txtSearch.Text)
                   select p;
    e.Result = products;
}

Next it's time to add the Real Time Search. Make sure you have the DLL downloaded (you may need to compile the download to get it) and add it to your bin folder. Add the following to your page in the relevant places:

<%@ Register Assembly="RealTimeSearch" Namespace="RealTimeSearch" TagPrefix="cc1" %>

<cc1:RealTimeSearchMonitor ID="RealTimeSearchMonitor1" runat="server" AssociatedUpdatePanelID="UpdatePanel1">
    <ControlsToMonitor>
        <cc1:ControlMonitorParameter EventName="TextChanged" TargetID="txtSearch" />
    </ControlsToMonitor>
</cc1:RealTimeSearchMonitor>

Important things to notice here are the AssociatedUpdatePanelID, which tells the control what it has to refresh, and the ControlsToMonitor section, which sets the ID of the control to watch and the name of the event that will be fired when the postback is created. You will now need the corresponding event handler in your code behind, like so:

protected void TextChanged(object sender, EventArgs e)
{
    GridView1.DataBind();
}

Run the site and you should find that the grid view now updates as you type (albeit with a slight delay).

To improve on this you can add other controls, such as an UpdateProgress, to show that something is happening, which will help with any delays in displaying the results.

How to search inside Stored Procedures?

A common problem faced by many developers when it comes to databases and SQL Server is how to search the text inside a stored procedure.

In many systems, particularly older Classic ASP solutions, functional code has been moved from the actual application to stored procedures inside the database. This is usually because it will either run faster there, or because it was just a lot easier to perform the necessary task using T-SQL. Following this though comes the problem of how you can search what's in all those stored procedures, especially when you're getting into the hundreds of them. Let's say there was a Users table that contained fields for an address, but that address now needs to be moved to a table of its own. You would need to search all the code for things accessing those table columns, but SQL Server Management Studio certainly doesn't provide any search boxes with the power to do this.

Never fear though, syscomments is here. syscomments contains the original text of, amongst other things, all the stored procedures in the DB, so all you need to do is search it for what you're looking for:

Select OBJECT_NAME(id), [text]
From syscomments
Where [text] like '%Create%'

The function OBJECT_NAME will also help you by converting the id number in the result set into the actual name of the stored procedure (or view, function etc). If you wanted to limit the result to just stored procedures you can add the following line to the where clause:

AND OBJECTPROPERTY(id, 'IsProcedure') = 1