Bulk Inserting data using Entity Framework

Using tools like Entity Framework makes life far easier for a developer. Recently I blogged about how using them is what makes .Net Core one of the best platforms for prototype development, but the benefits don’t end there. They are also great from a security perspective by cutting a lot of risk around SQL injection attacks just by avoiding easy mistakes when using regular ADO.NET.

However, they do have some downsides, a main one being that they are particularly slow when it comes to doing bulk inserts to a database.

For example, assume you have an application which regularly receives an xml import file consisting of 200,000 records and each one either needs to be an insert of an update into the db. You’ll quickly learn that looping through the whole lot and then calling save changes results in a process taking an extremely long time to run, it may even just timeout. You then decide to get rid of that long save changes line by breaking it up into blocks of 500 and call save changes for each of those. That may save the timeout issue, but it still results in a process potentially lasting around an hour.

The problem is that this is a scenario Entity Framework or EF.Core just weren’t designed to handle. As a solution you could opt to drop Entity Framework altogether and revert to something like a native SQL Bulk Insert command, but what if you need to be doing some processing in code on the record before the import happens? What if you have one of those classic not quite always valid XML, XML files which would cause SQLs Bulk Insert to fail.

The solution is to use an open source extension called EFCore.BulkExtensions.

EFCore.BulkExtensions

EFCore.BulkExtensions is a set of extension methods to Entity Framework that provide the functionality to do bulk inserts. You can add it to your project using NuGet and you’ll find the project on GitHub here https://github.com/borisdj/EFCore.BulkExtensions

Usage is also very simple to do. Let’s assume you have some existing tradition EF code that loops through a collection and for each one create a new db item and adds it to the db:

public void DoImport(List<foo> collection)
{
    foreach (var item in collection)
    {
        Jobs job = new Jobs();
        
        job.DateAdded = DateTime.UtcNow;
        job.Name = item.Name;
        job.Location = item.Location;

        await dbContext.Jobs.AddAsync(job);
    }

    await dbContext.SaveChangesAsync();
}

Rather than adding each item to the Entity Framework db context, you instead create a list of those objects and then call a BulkInsert function with them on your db context.

public void DoImport(List<foo> collection)
{
    List<Jobs> importJobs
    foreach (var item in collection)
    {
        Jobs job = new Jobs();
        
        job.DateAdded = DateTime.UtcNow;
        job.Name = item.Name;
        job.Location = item.Location;
        
        importJobs.Add(job);
    }

    await dbContext.BulkInsert(importJobs);
}

If also works for updates, but rather than creating a new item, first retrieve it form the db and then at the end call BulkInsertOrUpdate with the list.

await dbContext.BulkInsertOrUpdate(importJobs);

From my experience doing this took my import process that would run for over an hour down to something which would complete in a few minutes.

Top reasons .Net is amazing for prototyping

The other day someone told me .net was slow to get something built, and to be fair to the person I can see why he would have thought that. Most of his interaction with .net projects have been on complex with large enterprise applications that often have integrated multiple other applications.

However, I would maintain that .net is a framework that is actually really fast to develop on and that what he had perceived as being slow was the complexities of a project rather than the actual coding time.

In fact, I would say it’s so fast to get something built in it, it actually becomes the fastest thing to develop a prototype in. Here are my top reasons why.

ASP.NET Core

ASP.NET Core inherits all the best bits from the ASP.NET Framework that came before it, giving the framework almost 20 years of refinement since its original release in 2002. The days of WebForms in the original ASP.NET are now long behind us and we now have the choice of building web applications with either MVC or Razor Pages.

Razor provides the perfect combination of a view language providing helpers to render your html without limiting what can be done on the front end. How you code your HTML is still completely up to you, the helpers just provide features like binding that make it even faster to do.

Another great thing about .Net core over that which came before it, is its platform independent. Rather than being confined to just Windows, you can run it on Mac or Linux too.

Great starter templates

What kicks of a great prototype project is starting with great templates, and ASP.NET Core has a bunch.

As already mentioned, you can build a Web Application with either Razor Pages or MVC, but the templates also provide you with the base for building an API, Angular App, React.js or React.js and Redux or you can simply create an Empty application.

My preference is to go for MVC as it’s what I’m the most familiar with and a key thing for rapidly building a prototype is that you develop rapidly. The idea is to focus on creating something new and unique, not learn how to develop in a new framework.

The MVC Web Application gives you a base site to work with a few pages already set up, bootstrap and jQuery are already included so you start right at the point of working on your logic rather than spending time doing setup.

SQL Server and EF Core

I’ve always been a bit of a database guy. I’m not sure why, but its a topic that has always just made sense to me, and despite being a topic that can get quite complex, the reasons behind it being complex always feel logical.

When it comes to building a prototype though there are two aspects which make storage with .net core super simple.

Firstly, Entity Framework Core (EF Core) means you don’t really need to know any SQL or spend any time writing it. It helps if you do, but at a minimum all you need to be doing is creating a model in your code, adding a few lines for a DB context that tells EF.Core that a model is a table and how they relate. Then turn on migrations and you’re done. When you run the application, the DB gets created for you and each time you change your model, you just add another migration and the next time the application runs the application the DB schema gets updated.

Querying your DB is done by writing LINQ queries against your entity framework model, allowing you to have next to no understanding of SQL and how the DB works. Everything is just done by magic for you.

The second part is SQL Server and its different versions. Often when you think of SQL Server you think of the big DB engine with its many many components that you install on a server or your local machine but there’s two others which are even more important. LocalDB and Azure SQL.

LocalDB is an option that can be installed as part of Visual Studio. No separate application is needed or services to be running in the background. It is essentially the minimum required to start the DB engine to be used for development purposes. In practical terms this means when you start your application EF.Core can run off a LocalDB which didn’t require any setup, but as far as your application in concerned it is no different than working with any other version of SQL Server.

Azure SQL as the name implies is SQL Server on Azure. The only thing I really need to say about this is that you can swap LocalDB and Azure SQL with ease. They may be different but as far as your prototype is concerned, they are the same.

Scaffolding

The only thing quicker than writing code is having someone else do it for you. So we’ve created our application from a template, added a model which generated our database and now its time to create some pages. Well the good news is it’s still not time to write much code because Visual Studio can scaffold out pages based on our model for us!

Adding a controller to our project in Visual Studio gives us some options on what should be generated for us, one of which is MVC Controller with views, using Entity Framework. What that means is given a model it will create controllers and views for listing items, creating them, editing them and deleting them. No coding by us required!

Now it’s unlikely that is exactly what you’re after, but it’s generally a good starting place and deleting code you don’t need is far quicker then writing it.

Azure

Lastly there is Azure. You may have spotted a theme to all these points and that is they all remove any effort required to do any setup and instead focus on building your own logic, and this point is no different.

I remember a time, when if I wanted a server to put an application on, I had to request it, and then wait a while. What I would get back would either be a server that already had resources running on it, or a blank server that would need applications installed on it. e.g. SQL Server or .Net Framework. IIS wouldn’t have been configured and it would be a number of hours before my application would be running.

With Azure you don’t even really need to leave Visual Studio. From the publish dialog box you can create a new App Service and DB, and then publish. All the connection strings are sorted out for you. There are service plans which cost next to nothing, a domain is configured for you and at the end of the publish the website opens and is working. The whole process has taken less than 10 minutes.

Logging with .net core and Application Insights

When you start builing serverless applications like Azure functions or Azure web jobs, one of the first things you will need to contend with is logging.

Traditionally logging was simply achieved by appending rows to a text file that got stored on the same server your application was running on. Tools like log4net made this simpler by bringing some structure to the proces and providing functionality like automatic time stamps, log levels and the ability to configure what logs should actually get written out.

With a serverless application though, writing to the hard disk is a big no no. You have no guarantee how long that server will exist for and when your application moves, that data will be lost. In a world where you might want to scale up and down, having logs split between servers is also hard to retrieve when an error does happen.

.net core

The first bit of good news is that .net core supports a logging API. Here I am configuring it in a web job to output logs to the console and to Application insights. This is part of the host builder config in the program.cs file.

//3. LOGGING function execution :
//3 a) for LOCAL - Set up console logging for high-throughput production scenarios.
hostBuilder.ConfigureLogging((context, b) =>
{
    b.AddConsole();

    // If the key exists in appsettings.json, use it to enable Application Insights rather than console logging.
    //3 b) PROD - When the project runs in Azure, you can't monitor function execution by viewing console output (3 a). 
    // -The recommended monitoring solution is Application Insights. For more information, see Monitor Azure Functions.
    // -Make sure you have an App Service app and an Application Insights instance to work with.
    //-Configure the App Service app to use the Application Insights instance and the storage account that you created earlier.
    //-Set up the project for logging to Application Insights.
    string instrumentationKey = context.Configuration["APPINSIGHTS_INSTRUMENTATIONKEY"];
    if (!string.IsNullOrEmpty(instrumentationKey))
    {
        b.AddApplicationInsights(o => o.InstrumentationKey = instrumentationKey);
    }
});

Microsofts documentation on logging in .NET Core and ASP.NET can be found here.

Creating a log in your code is then as simple as using dependency injection on your classes to inject an instance of ILogger and then using it’s functions to create a log.

public class MyClass
{
    private readonly ILogger logger;

    public MyClass(ILogger logger)
    {
        this.logger = logger;
    }

    public void Foo()
    {
        try
        {
            // Logging information
            logger.LogInformation("Foo called");
        }
        catch (Exception ex)
        {
            // Logging an error
            logger.LogError(ex, "Something went wrong");
        }
    }
}

Application Insights

When your application is running in Azure, Application Insights is where all your logs will live.

What’s great about App Insights is it will give you the ability to write queries against all your logs.

So for instance if I wanted to find all the logs for an import function starting, I can write a filter for messages containing “Import function started”.

One of my favourite where commands is ago(30m). It will output the time for a given timespan in the past. This is great when your running the same query frequently and are only interested in the last x amount of time, as you can simply write where timestamp > ago(30m) for the last 30 minutes of logs rather than trying to remember for date time format your string should be in

Queries can also be saved or pinned to a dashboard if they are a query you need to run frequently.

For all regular logs your application makes you need to query the traces. What can be confusing with this though is the errors.

With the code above I had a try catch block and in the catch block I called logger.LogError(ex, “Something went wrong”); so in my logs I expect to see the message and as I passed an exception I also expect to see an exception. But if we look at this example from application insights you will see an error in the traces log but no strack trace or anything else from the exception.

LogError will write to both the traces and exceptions log. If you want to view the exceptions you have to look at the exceptions log, not the traces.

This is just the start of the functionality that Application Insights provides, but if your just starting out, hopefully this is a good indication not only of how easy it is to add logging to your application, but also how much added value App Insights can offer over just having text files.

Two ways to import an XML file with .Net Core or .Net Framework

It’s always the simple stuff you forget how you do. For years I’ve mainly been working with JSON files, so when faced with that task of reading an XML file my brain went “I can do that” followed by “actually how did I used to do that?”.

So here’s two different methods. They work on .Net Core and theoretically .Net Framework (my project is .Net Core and haven’t checked that they do actually work on framework).

My examples are using an XML in the following format:

<?xml version="1.0" encoding="utf-8"?>
<jobs>
    <job>
        <company>Construction Co</company>
        <sector>Construction</sector>
        <salary>£50,000 - £60,000</salary>
        <active>true</active>
        <title>Recruitment Consultant - Construction Management</title>
    </job>
    <job>
        <company>Medical Co</company>
        <sector>Healthcare</sector>
        <salary>£60,000 - £70,000</salary>
        <active>false</active>
        <title>Junior Doctor</title>
    </job>
</jobs>

Method 1: Reading an XML file as a dynamic object

The first method is to load the XML file into a dynamic object. This is cheating slightly by first using Json Convert to convert the XML document into a JSON string and then deserializing that into a dynamic object.

using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace XMLExportExample
{
    class Program
    {
        static void Main(string[] args)
        {
            string jobsxml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><jobs>    <job><company>Construction Co</company><sector>Construction</sector><salary>£50,000 - £60,000</salary><active>true</active><title>Recruitment Consultant - Construction Management</title></job><job><company>Medical Co</company><sector>Healthcare</sector><salary>£60,000 - £70,000</salary><active>false</active><title>Junior Doctor</title></job></jobs>";

            byte[] byteArray = Encoding.UTF8.GetBytes(jobsxml);
            MemoryStream stream = new MemoryStream(byteArray);
            XDocument xdoc = XDocument.Load(stream);

            string jsonText = JsonConvert.SerializeXNode(xdoc);
            dynamic dyn = JsonConvert.DeserializeObject<ExpandoObject>(jsonText);

            foreach (dynamic job in dyn.jobs.job)
            {
                string company;
                if (IsPropertyExist(job, "company"))
                    company = job.company;

                string sector;
                if (IsPropertyExist(job, "sector"))
                    company = job.sector;

                string salary;
                if (IsPropertyExist(job, "salary"))
                    company = job.salary;

                string active;
                if (IsPropertyExist(job, "active"))
                    company = job.active;

                string title;
                if (IsPropertyExist(job, "title"))
                    company = job.title;

                // A property that doesn't exist
                string foop;
                if (IsPropertyExist(job, "foop"))
                    foop = job.foop;
            }

            Console.ReadLine();
        }

        public static bool IsPropertyExist(dynamic settings, string name)
        {
            if (settings is ExpandoObject)
                return ((IDictionary<string, object>)settings).ContainsKey(name);

            return settings.GetType().GetProperty(name) != null;
        }
    }
}

A foreach loop then goes through each of the jobs, and a helper function IsPropertyExist checks for the existence of a value before trying to read it.

Method 2: Deserializing with XmlSerializer

My second approach is to turn the XML file into classes and then deserialize the XML file into it.

This approch requires more code, but most of it can be auto generated by visual studio for us, and we end up with strongly typed objects.

Creating the XML classes from XML

To create the classes for the XML structure:

1. Create a new class file and remove the class that gets created. i.e. Your just left with this

using System;
using System.Collections.Generic;
using System.Text;

namespace XMLExportExample
{

}

2. Copy the content of the XML file to your clipboard
3. Select the position in the file you want to the classes to go and then go to Edit > Paste Special > Paste XML as Classes

If your using my XML you will now have a class file that looks like this:

using System;
using System.Collections.Generic;
using System.Text;

namespace XMLExportExample
{

    // NOTE: Generated code may require at least .NET Framework 4.5 or .NET Core/Standard 2.0.
    /// <remarks/>
    [System.SerializableAttribute()]
    [System.ComponentModel.DesignerCategoryAttribute("code")]
    [System.Xml.Serialization.XmlTypeAttribute(AnonymousType = true)]
    [System.Xml.Serialization.XmlRootAttribute(Namespace = "", IsNullable = false)]
    public partial class jobs
    {

        private jobsJob[] jobField;

        /// <remarks/>
        [System.Xml.Serialization.XmlElementAttribute("job")]
        public jobsJob[] job
        {
            get
            {
                return this.jobField;
            }
            set
            {
                this.jobField = value;
            }
        }
    }

    /// <remarks/>
    [System.SerializableAttribute()]
    [System.ComponentModel.DesignerCategoryAttribute("code")]
    [System.Xml.Serialization.XmlTypeAttribute(AnonymousType = true)]
    public partial class jobsJob
    {

        private string companyField;

        private string sectorField;

        private string salaryField;

        private bool activeField;

        private string titleField;

        /// <remarks/>
        public string company
        {
            get
            {
                return this.companyField;
            }
            set
            {
                this.companyField = value;
            }
        }

        /// <remarks/>
        public string sector
        {
            get
            {
                return this.sectorField;
            }
            set
            {
                this.sectorField = value;
            }
        }

        /// <remarks/>
        public string salary
        {
            get
            {
                return this.salaryField;
            }
            set
            {
                this.salaryField = value;
            }
        }

        /// <remarks/>
        public bool active
        {
            get
            {
                return this.activeField;
            }
            set
            {
                this.activeField = value;
            }
        }

        /// <remarks/>
        public string title
        {
            get
            {
                return this.titleField;
            }
            set
            {
                this.titleField = value;
            }
        }
    }

}

Notice that the active field was even picked up as being a bool.

Doing the Deserialization

To do the deserialization, first create an instance of XmlSerializer for the type of the object we want to deserialize too. In my case this is jobs.

            var s = new System.Xml.Serialization.XmlSerializer(typeof(jobs));

Then call Deserialize passing in a XML Reader. I’m creating and XML reader on the stream I used in the dynamic example.

            jobs o = (jobs)s.Deserialize(XmlReader.Create(stream));

The complete file now looks like this:

using System;
using System.IO;
using System.Text;
using System.Xml;

namespace XMLExportExample
{
    class Program
    {
        static void Main(string[] args)
        {
            string jobsxml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><jobs>    <job><company>Construction Co</company><sector>Construction</sector><salary>£50,000 - £60,000</salary><active>true</active><title>Recruitment Consultant - Construction Management</title></job><job><company>Medical Co</company><sector>Healthcare</sector><salary>£60,000 - £70,000</salary><active>false</active><title>Junior Doctor</title></job></jobs>";

            byte[] byteArray = Encoding.UTF8.GetBytes(jobsxml);
            MemoryStream stream = new MemoryStream(byteArray);

            var s = new System.Xml.Serialization.XmlSerializer(typeof(jobs));
            jobs o = (jobs)s.Deserialize(XmlReader.Create(stream));

            Console.ReadLine();
        }
    }
}

And thats it. Any missing nodes in your XML will just be blank rather than causing an error.