Sitecore: Extend profile matching over multiple visits

In Sitecore, to gain a better understanding of our visitors' interests, we can define Profile Keys and Profile Cards and tag our content with them. As visitors navigate through the site, Sitecore uses this data to build up a profile of each visitor. The predefined Pattern Card that most closely resembles the visitor's profile is then assigned to them, and it can be used as the basis for selecting the content to display on a page for that visitor.

However, what this doesn't do is carry the visitor's profile over multiple sessions. Each time a visitor returns to the site in a new session, their profile key values are reset to zero.

So what’s Sitecore actually doing?

Before working out how to carry this information between visits, let's look at how a profile is actually created.

If we look in the Profiles table within the Analytics database, we can see the profile data that's been recorded for each visit.

Sitecore profile data

The Pattern Values column contains the visitor's current score for each profile key they have scored against, for example:

background=40;scope=50

If the visitor were to visit a page with a scope score of 5 and a background score of 10, these values would be added to the visitor's current key scores:

background=50;scope=55
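To make the arithmetic concrete, here is a minimal C# sketch that parses the Pattern Values format shown above and adds a page's scores to it. It assumes whole-number scores and the simple key=value;key=value layout seen in the table; `PatternValues` and its methods are hypothetical helpers, not part of the Sitecore API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: models the Pattern Values string, not Sitecore's own code.
public static class PatternValues
{
    // Parse "background=40;scope=50" into key/score pairs, sorted by key for stable output.
    public static SortedDictionary<string, int> Parse(string patternValues)
    {
        var scores = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (string part in patternValues.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] pair = part.Split('=');
            scores[pair[0].Trim()] = int.Parse(pair[1].Trim());
        }
        return scores;
    }

    // Add a page's profile key scores to the visitor's running totals.
    public static SortedDictionary<string, int> Accumulate(
        SortedDictionary<string, int> visitor, IReadOnlyDictionary<string, int> page)
    {
        var result = new SortedDictionary<string, int>(visitor, StringComparer.Ordinal);
        foreach (var kv in page)
        {
            result.TryGetValue(kv.Key, out int current); // 0 if the key is new
            result[kv.Key] = current + kv.Value;
        }
        return result;
    }

    // Turn the scores back into the "key=value;key=value" form.
    public static string Format(IEnumerable<KeyValuePair<string, int>> scores) =>
        string.Join(";", scores.Select(kv => $"{kv.Key}={kv.Value}"));
}
```

Accumulating a page with scope 5 and background 10 onto "background=40;scope=50" gives "background=50;scope=55", matching the example above.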

When a pattern card is assigned, the card whose keys have the closest shape is chosen. For example, if the visitor has a high value for background and a low value for scope, they will be assigned a pattern card with similar proportional key values.
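Sitecore's internal matching isn't spelled out here, so the following is only a sketch of the "closest shape" idea, assuming we normalize each set of scores so they sum to 1 and then pick the card with the smallest Euclidean distance. `PatternMatcher` and the card names in the usage note are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of proportional pattern matching; not Sitecore's implementation.
public static class PatternMatcher
{
    // Scale scores so they sum to 1, so only the proportions ("shape") matter.
    private static Dictionary<string, double> Normalize(IReadOnlyDictionary<string, double> scores)
    {
        double total = scores.Values.Sum();
        return scores.ToDictionary(p => p.Key, p => total == 0 ? 0.0 : p.Value / total);
    }

    // Euclidean distance between two normalized score sets; a missing key counts as 0.
    private static double Distance(IReadOnlyDictionary<string, double> a, IReadOnlyDictionary<string, double> b)
    {
        double sum = 0;
        foreach (string key in a.Keys.Union(b.Keys))
        {
            a.TryGetValue(key, out double x);
            b.TryGetValue(key, out double y);
            sum += (x - y) * (x - y);
        }
        return Math.Sqrt(sum);
    }

    // Return the name of the card whose normalized scores are nearest the visitor's.
    public static string ClosestCard(
        IReadOnlyDictionary<string, double> visitor,
        IReadOnlyDictionary<string, IReadOnlyDictionary<string, double>> cards)
    {
        var v = Normalize(visitor);
        return cards.OrderBy(c => Distance(v, Normalize(c.Value))).First().Key;
    }
}
```

A visitor with background=50;scope=5 would be matched to a card weighted 10:1 towards background rather than one weighted 1:10 towards scope, even though neither card's absolute values resemble the visitor's totals.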

How do we extend this over multiple visits?

The easiest way to carry the profile from one visit to the next would be to simply copy the profile key values from the previous visit into the new one. The code for this would look similar to the following:

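// Requires: using System.Linq; (for Any) and using Sitecore.Analytics; (for Tracker)
var currentVisitIndex = Tracker.CurrentVisit.VisitorVisitIndex;

// Only continue for returning visitors whose current visit has profile data
if (currentVisitIndex <= 1 || !Tracker.CurrentVisit.Profiles.Any())
{
    return;
}

var previousVisit = Tracker.Visitor.GetVisit(currentVisitIndex - 1, VisitLoadOptions.All);
if (previousVisit == null)
{
    return;
}

foreach (var profile in previousVisit.Profiles)
{
    var currentProfile = Tracker.CurrentVisit.GetOrCreateProfile(profile.ProfileName);

    currentProfile.BeginEdit();

    // Add each key score from the previous visit to the current visit's profile
    foreach (var profileKey in profile.Values)
    {
        currentProfile.Score(profileKey.Key, profileKey.Value);
    }

    // Recalculate which pattern card best matches the updated scores
    currentProfile.UpdatePattern();

    currentProfile.EndEdit();
}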

Now the visitor's profile is how it was when they left and, crucially, we can use this data to personalize the site's homepage for them.

So why shouldn’t we do this?

As simple as this is, it comes with one potentially massive downside. If we go back to the way the profile values are built up, the key values are essentially just accumulated. Each time the visitor views an item with a background score of 10, the visitor's background profile key score is increased by 10.

Our visitors are humans going through different stages of their lives, with constantly changing jobs and interests. Nothing ever reduces a profile key's score other than the fact that everything is normally zeroed at the start of each visit. If we copy the data from the last visit at the start of the next, this never happens and the profile keys keep counting up forever. A key value obtained from an item viewed 2 months ago would count for just as much as a value obtained from an item viewed today.

So if you were running a travel site and a visitor looked at summer holidays for 3 weeks, they would have a profile heavily weighted towards summer holidays. If they then started looking at winter holidays, we wouldn't want them to have to look at winter holidays for another 3 weeks just to reach an even split between summer and winter.
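A quick simulation shows how much difference a decay factor makes. This sketch assumes one visit per day, a score of 10 per visit, and that the carried-over scores are multiplied by a fixed factor at the start of each new visit (1.0 means plain copying, 0.5 means halving). `DecayDemo` and its parameters are hypothetical.

```csharp
using System;

// Hypothetical simulation of carried-over profile scores; not Sitecore code.
public static class DecayDemo
{
    // Returns how many days of winter browsing it takes for the winter score
    // to catch up with the summer score built up over summerDays days.
    public static int DaysToCatchUp(double carryFactor, int summerDays = 21, double dailyScore = 10)
    {
        double summer = 0, winter = 0;

        // Summer phase: carry the previous total over, then score today's visit.
        for (int day = 0; day < summerDays; day++)
        {
            summer = summer * carryFactor + dailyScore;
        }

        // Winter phase: summer only decays, winter keeps being scored.
        int winterDays = 0;
        while (winter < summer)
        {
            summer *= carryFactor;
            winter = winter * carryFactor + dailyScore;
            winterDays++;
        }
        return winterDays;
    }
}
```

With a factor of 1.0 the winter score needs the full 21 days to catch up. With a factor of 0.5 the summer total can never exceed 20, so after one winter visit it has dropped just below 10 and winter (10) has already drawn level and overtaken it.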

Overcoming this issue isn't so simple and largely depends on your business needs. If your visitors' interests could change each week, you need something that degrades the old visit data quickly. If, on the other hand, you're trying to differentiate between people in a 2-week and a 6-month buying pattern, you need to retain that data for a lot longer.

Some things we could do when copying the data from the visitor's previous profile include:

  • Halving the profile scores, or reducing them by some other factor. This reduces the importance of values obtained on previous visits, so if a visitor received a 10 on their first visit, it would be worth 5 on the second, 2.5 on the third, and so on.
  • Looking at the date of the last visit. Is it too old to still be relevant, or can we use its age to determine the factor to reduce the scores by?
  • Looking at a combination of the last few visits to establish what the recent scores were.
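The last two ideas can be combined: weight each of the last few visits by its age and sum the results. This is a minimal sketch, assuming we have the scores earned during each previous visit on its own (not including values already carried over, which would otherwise be counted twice), and using a half-life plus a maximum age as the tuning knobs. `VisitScores`, `ProfileDecay` and the values in the usage note are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical snapshot of one previous visit: when it started and the key
// scores earned during that visit only.
public sealed record VisitScores(DateTime Start, IReadOnlyDictionary<string, double> Scores);

public static class ProfileDecay
{
    // Weight for data that is `age` old: halves every `halfLife`, dropped entirely beyond `maxAge`.
    public static double AgeFactor(TimeSpan age, TimeSpan halfLife, TimeSpan maxAge)
    {
        if (age > maxAge) return 0.0;
        if (age <= TimeSpan.Zero) return 1.0;
        return Math.Pow(0.5, age.TotalDays / halfLife.TotalDays);
    }

    // Combine several previous visits into the starting scores for the new visit.
    public static Dictionary<string, double> StartingScores(
        IEnumerable<VisitScores> previousVisits, DateTime now, TimeSpan halfLife, TimeSpan maxAge)
    {
        var result = new Dictionary<string, double>();
        foreach (var visit in previousVisits)
        {
            double factor = AgeFactor(now - visit.Start, halfLife, maxAge);
            if (factor == 0.0) continue; // too old to be relevant

            foreach (var pair in visit.Scores)
            {
                result.TryGetValue(pair.Key, out double current);
                result[pair.Key] = current + pair.Value * factor;
            }
        }
        return result;
    }
}
```

With a 7-day half-life and a 30-day maximum age, a score of 40 from a visit a week ago contributes 20, while a score from a visit two months ago contributes nothing. Choosing those two values is where the business needs discussed above come in.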

All these ideas, though, need to be used in conjunction with what you're trying to profile. If it's age, then you know people are going to get older. If it's an interest that changes frequently, then you know the data needs to degrade quickly, but if it's male/female, then that doesn't necessarily need to degrade at all.
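One way to reflect this is to give each profile key its own decay rate. This sketch assumes a hypothetical per-key half-life table in which keys without an entry (such as a gender key) are carried over unchanged; `KeyDecayPolicy` and the key names in the usage note are illustrative only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-key decay policy for scores carried over from the last visit.
public static class KeyDecayPolicy
{
    // Keys with no entry in halfLives (or a null entry) are carried over unchanged;
    // other keys halve once for every half-life elapsed since the last visit.
    public static Dictionary<string, double> Apply(
        IReadOnlyDictionary<string, double> previous,
        IReadOnlyDictionary<string, TimeSpan?> halfLives,
        TimeSpan sinceLastVisit)
    {
        var result = new Dictionary<string, double>();
        foreach (var kv in previous)
        {
            halfLives.TryGetValue(kv.Key, out TimeSpan? halfLife);
            double factor = halfLife.HasValue
                ? Math.Pow(0.5, sinceLastVisit.TotalDays / halfLife.Value.TotalDays)
                : 1.0;
            result[kv.Key] = kv.Value * factor;
        }
        return result;
    }
}
```

For example, with a 7-day half-life on an "interest" key and none on a "gender" key, after 14 days a score of 10 becomes 2.5 for interest and stays at 10 for gender.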

Programmatically add request tracking for an item in Sitecore

Sitecore's Engagement Analytics engine automatically tracks all page requests. When you assign profile cards to items, viewing those items also starts building up data about a person's interests against their visit record, which can then be used to create a personalized site experience.

However, what if you want items that are not pages to also contribute to a user's profile?

This scenario came about on a recent site we took over, which the client wanted to add personalisation to. The site had (for unknown reasons) been built with a single product page that looked at a querystring parameter to determine which product's information to display (the querystring format was hidden from end users through a URL rewrite). Product data was stored as Sitecore items, so the products could have profile cards assigned, but as those items were never visited, their profile card values were never applied to the user's profile.

After a bit of searching through Sitecore.Analytics.dll, I stumbled across the TrackingFieldProcessor class. This class contains a Process method that takes an item as a parameter and in turn triggers all the functionality for processing the campaigns, profiles and events related to that item.

To use it, your code would look like this:

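Sitecore.Data.Database db = Sitecore.Configuration.Factory.GetDatabase("web");
Sitecore.Data.Items.Item productItem = db.GetItem(new Sitecore.Data.ID("395BDEF7-16CB-4C94-B9B6-A6EAC148401F"));
// Apply the item's campaigns, profile cards and events to the current visit (skip if the item wasn't found)
if (productItem != null) new TrackingFieldProcessor().Process(productItem);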

This will cause the profile key values to be updated, but in the visitor history it still looks like the visitor only viewed the one product page. To change the URL recorded for the current page, we can do this:

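// Overwrite the URL recorded for the current page so the visit history shows the product URL
VisitorDataSet.PagesRow currentPage = Tracker.CurrentVisit.GetOrCreateCurrentPage();
currentPage.Url = "new url value";
currentPage.UrlText = "new url value";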

Note: I was unable to find any documentation on these functions, or any official way of doing this. Use at your own risk!

Muddlings with Sitecore Indexes

On a recent Sitecore project we needed a faceted product search. For this we opted to use the Lucene-based Search and Indexing functionality that comes with Sitecore. Overall this proved very easy to use, but here are the details of a couple of issues we encountered.

Items Duplicating on Publish and Never Deleting

The first issue we found was that although the index was being built and we could read it, deleted items were never removed from it. Equally, if you saved and published an item, it would become duplicated in the index.

Doing a manual rebuild of the index would clear the items back down to what we would normally expect. But for some reason changes were clearly just being added to the index rather than updating it.

Looking through Sitecore's “Sitecore Search and Indexing Guide” (http://sdn.sitecore.net/upload/sitecore7/75/sitecore_search_and_indexing_guide_sc75-a4.pdf) wasn’t much help; as far as we could tell, the index was set up correctly. Comparing it to the default index that comes with a blank install of Sitecore didn’t help much either.

In the end, it transpired that your index’s field name definitions must include a field for “_uniqueid”. We had assumed some config like this must be needed; however, Sitecore’s indexing guide doesn’t actually mention it anywhere.

<fieldNames hint="raw:AddFieldByFieldName">
  <field fieldName="_uniqueid"            storageType="YES" indexType="TOKENIZED"    vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
    <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
  </field>
  <!-- other field definitions for the index -->
</fieldNames>
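Why would a missing field cause both duplicates and undeletable items? We didn't dig into the provider's source, but the symptoms match the way Lucene updates documents: IndexWriter.UpdateDocument deletes any document matching a term and then adds the new one. The toy model below (a hypothetical `ToyIndex`, not Sitecore or Lucene code) assumes the provider uses the item's unique id as that term; if the `_uniqueid` field is never written to the index, the delete matches nothing, so updates pile up and deletes do nothing.

```csharp
using System.Collections.Generic;

// Toy in-memory "index": each document is just a set of stored field/value pairs.
public sealed class ToyIndex
{
    private readonly List<Dictionary<string, string>> _docs = new List<Dictionary<string, string>>();

    public int Count => _docs.Count;

    // Remove every document whose stored keyField equals key (like deleting by a term).
    public int Delete(string keyField, string key) =>
        _docs.RemoveAll(d => d.TryGetValue(keyField, out var value) && value == key);

    // Delete-then-add, the same shape as an update-by-term.
    public void Update(string keyField, string key, Dictionary<string, string> doc)
    {
        Delete(keyField, key);
        _docs.Add(doc);
    }
}
```

If each stored document carries `_uniqueid`, saving the same item twice leaves one document. If the field is missing from the stored documents, the same two saves leave two documents, and deleting by that id removes nothing, which is exactly what we saw.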

Index not updating in the Content Delivery environment

At this point our indexes were working fine in our own test environments. Upon deploying to our client's servers, however, the indexes never updated on either their Content Management or Content Delivery servers. A manual rebuild would cause the Content Management server's index to update, but the Content Delivery server's index remained permanently empty.

Clearly there was some sort of difference between their environment and ours. We didn't have direct access to their servers, so we checked the config settings that are viewable by going to /sitecore/admin/showconfig.aspx

Sure enough there was a difference.

This Sitecore install was running 7.2, and prior to that the latest version they had used was 6.6. They had set up a custom config setting which removed the hooks section from the config, because some of Sitecore's default hooks interfered with performance monitoring tools they use on their sites. Unfortunately, it was also removing the hook that loads Sitecore.ContentSearch. Without it, indexes are never updated in response to events.

<hooks>
  <hook type="Sitecore.ContentSearch.Hooks.Initializer, Sitecore.ContentSearch" patch:source="Sitecore.ContentSearch.config"/>
</hooks>
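If, like us, you can only see the merged configuration, a quick check for this hook can be scripted. This is a minimal sketch, assuming you've saved the XML shown by /sitecore/admin/showconfig.aspx into a string; `HookCheck` is a hypothetical helper, and it only compares the type name, ignoring the assembly part.

```csharp
using System;
using System.Xml;

// Hypothetical helper: checks a merged Sitecore config for the ContentSearch initializer hook.
public static class HookCheck
{
    // Type name taken from the hook element above.
    private const string InitializerType = "Sitecore.ContentSearch.Hooks.Initializer";

    // Returns true if any <hook> element has the ContentSearch initializer type.
    public static bool HasContentSearchHook(string configXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(configXml);
        foreach (XmlNode node in doc.GetElementsByTagName("hook"))
        {
            var hook = (XmlElement)node;
            // Compare only the type name, ignoring the ", AssemblyName" part and whitespace.
            string typeName = hook.GetAttribute("type").Split(',')[0].Trim();
            if (string.Equals(typeName, InitializerType, StringComparison.Ordinal))
            {
                return true;
            }
        }
        return false;
    }
}
```

Running this against both the Content Management and Content Delivery configs would have flagged the missing hook without needing direct server access.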