Tag: Bulk Insert

Bulk Inserting data using Entity Framework

Using tools like Entity Framework makes life far easier for a developer. Recently I blogged about how using them is what makes .Net Core one of the best platforms for prototype development, but the benefits don’t end there. They are also great from a security perspective by cutting a lot of risk around SQL injection attacks just by avoiding easy mistakes when using regular ADO.NET.

However, they do have some downsides, a main one being that they are particularly slow when it comes to doing bulk inserts to a database.

For example, assume you have an application which regularly receives an xml import file consisting of 200,000 records and each one either needs to be an insert of an update into the db. You’ll quickly learn that looping through the whole lot and then calling save changes results in a process taking an extremely long time to run, it may even just timeout. You then decide to get rid of that long save changes line by breaking it up into blocks of 500 and call save changes for each of those. That may save the timeout issue, but it still results in a process potentially lasting around an hour.

The problem is that this is a scenario Entity Framework or EF.Core just weren’t designed to handle. As a solution you could opt to drop Entity Framework altogether and revert to something like a native SQL Bulk Insert command, but what if you need to be doing some processing in code on the record before the import happens? What if you have one of those classic not quite always valid XML, XML files which would cause SQLs Bulk Insert to fail.

The solution is to use an open source extension called EFCore.BulkExtensions.

EFCore.BulkExtensions

EFCore.BulkExtensions is a set of extension methods to Entity Framework that provide the functionality to do bulk inserts. You can add it to your project using NuGet and you’ll find the project on GitHub here https://github.com/borisdj/EFCore.BulkExtensions

Usage is also very simple to do. Let’s assume you have some existing tradition EF code that loops through a collection and for each one create a new db item and adds it to the db:

public void DoImport(List<foo> collection)
{
  foreach (var item in collection)
  {
      Jobs job = new Jobs();
      
      job.DateAdded = DateTime.UtcNow;
      job.Name = item.Name;
      job.Location = item.Location;

      await dbContext.Jobs.AddAsync(job);
  }

  await dbContext.SaveChangesAsync();
}

Rather than adding each item to the Entity Framework db context, you instead create a list of those objects and then call a BulkInsert function with them on your db context.

public void DoImport(List<foo> collection)
{
  List<Jobs> importJobs
  foreach (var item in collection)
  {
      Jobs job = new Jobs();
      
      job.DateAdded = DateTime.UtcNow;
      job.Name = item.Name;
      job.Location = item.Location;
      
      importJobs.Add(job);
  }

  await dbContext.BulkInsert(importJobs);
}

If also works for updates, but rather than creating a new item, first retrieve it form the db and then at the end call BulkInsertOrUpdate with the list.

await dbContext.BulkInsertOrUpdate(importJobs);

From my experience doing this took my import process that would run for over an hour down to something which would complete in a few minutes.

Invert list selection with Sitecore PowerShell

I recently needed to run a script on a block of Sitecore content in invert the selection of a checklist and multilist. As I couldn't find any example of how to do this at the time, I thought I'd share what I wrote.

#script to update tier
Get-ChildItem -r -Path "master:\content\Home" -Language * | ForEach-Object {
  if ($_.PSobject.Properties.name -match "Tier") {
      [String[]]$tiers = $_.Tier -split "\|"

      $_.Editing.BeginEdit()
      $newtiers = Get-ChildItem 'master:\content\Lookups\Tiers\' | Where-Object { $tiers -notcontains $_.Id }
      $_.Tier = $newtiers.Id -join "|"
      $_.Editing.EndEdit()
  }
}

Get-ChildItem -r -Path "master:\content\Home" -Language * | ForEach-Object {

This line is getting the child items from the home node in the master db. The -r specified that it should be recursive and the -Language * specifies to include all languages. The results are then piped to a for each loop.

if ($_.PSobject.Properties.name -match "Tier") {

The field I needed to update was called Tier, as this was included in multiple templates I checked that the object included a field called Tier, rather than checking the template types.

[String[]]$tiers = $_.Tier -split "\|"

List fields in Sitecore get stored as a pipe separated list of the item Id's for the selected items. In order to do a comparison in Powershell I needed to turn the string back into an array using the split command. Notice the backslash that is needed before the pipe.

$_.Editing.BeginEdit()

To edit an item you need to begin an edit

$newtiers = Get-ChildItem 'master:\content\Components\Lookups\Tiers\' | Where-Object { $tiers -notcontains $_.Id }

This is where we get the new list of tiers to set as the selected ones for the item. The Get-ChildItem command is retrieving the original options that could be selected and the where-object statement is then excluding the ones that are in the $tiers array we just made.

$_.Tier = $newtiers.Id -join "|"

To save the new list we need to convert the results of the query into a pipe separated list using a join.

$_.Editing.EndEdit() } }

End the editing to save and close the the if and loop statements.