Blog
Reducing the size of Sitecore Master DB

Reducing the size of Sitecore Master DB

When it comes to Sitecore development, an issue every developer has likely experienced is the size of the databases in relation to the size of their hard disk. In an ideal world production DBs would contain production data, test environments would have data ideally suited for testing and developer workstation would have the minimum required to develop the solution.

In reality though, I think most people have the experience of everything being a copy from production. This ends up being the case due to a clients requirements that UAT needs to looks the same as prod, QA needs prod content to replicate a bug and although critical Sitecore items may have been serialized, not having any content in your local makes it a bit hard to dev.

When a website is new and doesn't have much content this isn't a huge issue, but when you inherit one 5 years old with a 25gb DB, things start to become a problem. Not only is the hard disc space required an issue, but just setting a new developer up takes hours from download times.

After getting a new laptop and being faced with the challenge of needing to copy multiple DBs (and not even having enough space to back them up on my existing machine), I decided to finally do something about reducing the size of them,

Removing old item versions

Having a history of item versions is a great feature for content editors, however as a dev I don't really need them on my local. They also hold references to media items that aren't used any more.

This Sitecore Powershell script from Webbson does exactly that and even lets your specify how many version you want to keep. I went for 1.

1<#
2 This script will remove old versions of items in all languages so that the items only contains a selected number of versions.
3#>
4
5$item = Get-Item -Path "master:\content"
6$dialogProps = @{
7 Parameters = @(
8 @{ Name = "item"; Title="Branch to analyse"; Root="/sitecore/content/Home"},
9 @{ Name = "count"; Value=10; Title="Max number of versions"; Editor="number"},
10 @{ Name = "remove"; Value=$False; Title="Do you wish to remove items?"; Editor="check"}
11 )
12 Title = "Limit item version count"
13 Description = "Sitecore recommends keeping 10 or fewer versions on any item, but policy may dictate this to be a higher number."
14 Width = 500
15 Height = 280
16 OkButtonName = "Proceed"
17 CancelButtonName = "Abort"
18}
19
20$result = Read-Variable @dialogProps
21
22if($result -ne "ok") {
23 Close-Window
24 Exit
25}
26
27$items = @()
28Get-Item -Path master: -ID $item.ID -Language * | ForEach-Object { $items += @($_) + @(($_.Axes.GetDescendants())) | Where-Object { $_.Versions.Count -gt $count } | Initialize-Item }
29$ritems = @()
30$items | ForEach-Object {
31 $webVersion = Get-Item -Path web: -ID $_.ID -Language $_.Language
32 if ($webVersion) {
33 $minVersion = $webVersion.Version.Number - $count
34 $ritems += Get-Item -Path master: -ID $_.ID -Language $_.Language -Version * | Where-Object { $_.Version.Number -le $minVersion }
35 }
36}
37if ($remove) {
38 $toRemove = $ritems.Count
39 $ritems | ForEach-Object {
40 $_ | Remove-ItemVersion
41 }
42 Show-Alert "Removed $toRemove versions"
43} else {
44 $reportProps = @{
45 Property = @(
46 "DisplayName",
47 @{Name="Version"; Expression={$_.Version}},
48 @{Name="Path"; Expression={$_.ItemPath}},
49 @{Name="Language"; Expression={$_.Language}}
50 )
51 Title = "Versions proposed to remove"
52 InfoTitle = "Sitecore recommendation: Limit the number of versions of any item to the fewest possible."
53 InfoDescription = "The report shows all items that have more than <b>$count versions</b>."
54 }
55 $ritems | Show-ListView @reportProps
56}
57
58Close-Window

Removing unpublished items

After a few failed attempts at reducing the size of the DB's, I discovered that the content editors working on the website had seemingly never deleted any content. Instead that had just marked things as unpublishable. I can see the logic in this, but after 5+ years, they have a lot of unpublished content filling up the content tree.

Well if it's unpublished I probably don't need it on my local machine so lets delete it.

Here's a script I wrote, the first part removes items set to never publish. After running just this part I found lots of the content items had the item set to publish but the version set to hidden. The second part loops through versions on items and removes any version set to hidden. If the item has no version left then it is removed too.

1
2
3// Remove items set to never publish
4Get-ChildItem -Path "master:\sitecore\content" -Recurse |
5Where-Object { $_."__Never publish" -eq "1" } | Remove-Item -Recurse -Force -Confirm:$false
6
7// Loop through items and remove versions set to never publish, then remove the item if it has no versions left
8foreach($item in Get-ChildItem -Path "master:\sitecore\content" -Recurse) {
9
10 $item
11 foreach ($version in $item.Versions.GetVersions($true))
12 {
13 $version
14 $version."__Hide version"
15 if ($version."__Hide version" -eq "1" ) {
16 $version| Remove-ItemVersion -Recurse -Confirm:$false
17 }
18 }
19
20 if ($item.Versions.GetVersions($true).count -eq 0) {
21 $item | Remove-Item -Recurse -Force -Confirm:$false
22 }
23}

Remove dead links

In the next step I rebuild the links DB, but I kept ending up with entries in the link table with target items that didn't exist. After a bit of searching I came across an admin page for clearing up dead links.

/sitecore/admin/RemoveBrokenLinks.aspx

With this page you can remove all those pesky dead links caused by editors deleting items and leaving the links behind.

Remove broken links screen in Sitecore

Clean Up DBs

With our content reduced the DB's now need a clean up before we do anything else.

In the admin section there is a DB Cleanup page that will let you perform various tasks on the DB. I suggest doing all of these.

/sitecore/admin/DBCleanup.aspx

Sitecore Database cleanup page

Once this is done navigate to the control panel and rebuild the link database. From the control panel you can also run the clean up database script, but it won't give you as much feedback.

/sitecore/client/Applications/ControlPanel.aspx?sc_bw=1

Sitecore rebuild link database screen

Remove unused media

With all the old versions/items/dead links removed and the DB's cleaned up its time to get rid of any unused media items. It's likely if you have a large DB that most of the space will be taken up by the media items. Fortunately with another PowerShell script we can removed any media that isn't linked too.

This PowerShell script is an adapted version of one by Michael West. You can find his version here https://michaellwest.blogspot.com/2014/10/sitecore-powershell-extensions-tip.html?_sm_au_=iVVB4RsPtStf5MfN

The main difference is I've been more aggressive and removed the checks on item owner and age.

1filter Skip-MissingReference {
2 $linkDb = [Sitecore.Globals]::LinkDatabase
3 if($linkDb.GetReferrerCount($_) -eq 0) {
4 $_
5 }
6}
7
8$items = Get-ChildItem -Path "master:\sitecore\media library" -Recurse |
9 Where-Object { $_.TemplateID -ne [Sitecore.TemplateIDs]::MediaFolder } |
10 Skip-MissingReference
11
12if($items) {
13 Write-Log "Removing $($items.Length) item(s)."
14 $items | Remove-Item
15}

Shrink databases

Lastly through SQL management studio, shrink your database and files to recover unused space you hopefully now have from removing all of that media.

In my case I was able to turn a 20+ GB database into a 7 GB database by doing these steps.

If your local is running with both web and master DB, you should now do a full publish. The item versions which are published should stay exactly the same as we only removed items set to not publish. You should however get a reduction in your web DB from the media items being removed.

Which JavaScript for loop should I use?

Which JavaScript for loop should I use?

When given a choice of 4 seemingly different ways to do the same thing, I've always been the person that wants to know what the difference is? To often we can blindly write code the way the person before us did and not understand the reason to do it a certain way, but what if the scenario has changed. When multiple options exist there's usually a reason why. It can be a language style improvement to make things easier to write, but more often than not there's a fundamental difference in what it actually does.

So, for loops, four seems like a lot, gotta be some differences right?

for

This is likely to be the first type of for loop you encounter when learning to program. It's a very manual type of loop where you create a initial expression, a condition expression to keep running the loop and an increment expression.

1for (let i = 0; i < 5; i++) {
2 console.log(i);
3}
4
5// Result:
6// 0
7// 1
8// 2
9// 3
10// 4

The big downside of the for loop is it doesn't actually loop through a collection. e.g. If you had an array and wanted to loop through each of it's objects you could do this as follows:

1const arr = ['a', 'b', 'c'];
2
3for (let i = 0; i < arr.length; ++i) {
4 console.log(arr[i]);
5}

In effect the loop is incrementing a number and you then use that number to access the position of the array. The result is you have more code to write that isn't serving any real purpose.

However as this is a much more manual loop process you have far more control over the loop. e.g.

  • You could increment by a different number.
  • You could go backwards.
  • The condition might not be linked to the length of an array.
  • You might not even be looping through an array.

Pros

  • Potentially faster in performance.
  • Break statement can be used to come out of the loop.
  • It works with the await keyword.
  • Not just for looping through arrays.

Cons

  • Not as readable as others.
  • You have more code to write.

for...in

You probably don't want this one. Here's an example:

1const arr = ['a', 'b', 'c'];
2
3for (const i in arr) {
4 console.log(arr[i]);
5}
6
7// Result:
8// "a"
9// "b"
10// "c"

Although we're now specifically looping through a collection like an array and don't need to do all of that i < array.length stuff, what we're given isn't the object in the array but a property name to access it on the object.

Here's another example:

1const arr = ['a', 'b', 'c'];
2arr.foo = 'John'
3
4for (const i in arr) {
5 console.log(arr[i]);
6}
7
8// Result:
9// "a"
10// "b"
11// "c"
12// "John"

That looks a bit weird! The for...in loop doesn't just loop through an array, it loops through an objects enumerable properties, which could be the items in an array, but it could be other stuff too.

For this reason, unless your doing something quite niche its probably not the loop you are looking for and is likely to cause you an issue which your objects has one more property than you were expecting.

Pros

  • Can loop through all the enumerable properties on an object.

Cons

  • Looping through all the properties of an object isn't likely what you want.

forEach

As the name implies, a for each loop will iterate through each element in an array. Here's an example:

1const arr = ['a', 'b', 'c'];
2
3arr.forEach((i) => {
4 console.log(i);
5})
6
7// Result:
8// "a"
9// "b"
10// "c"

The forEach function takes an anonymous function as a parameter and will then call that function for each object in the array, passing in the object from the array.

This offers a big improvement over the initial for loop as we have a lot less code to write and we're actually given the object rather than a variable to go an find it.

However there are some downsides due to it effectively being a function call passing an anonymous function to execute.

Firstly you can't stop the forEach part way through. With the others you can use the break keyword to stop the iteration.

Secondly there's added confusion around the scope of what this is. Assuming your function is in a class, unless you use the arrow syntax (as in the example) you won't be able to access any of the other functions in your class as passing a regular function would change the scope.

1// This works
2class foo
3{
4 myFunction() {
5 const arr = ['a', 'b', 'c'];
6
7 arr.forEach((i) => {
8 this.outputValue(i)
9 })
10 }
11
12 outputValue(x) {
13 console.log(x);
14 }
15}
16
17// This Doesn't
18class foo
19{
20 myFunction() {
21 const arr = ['a', 'b', 'c'];
22
23 arr.forEach(function(i) {
24 this.outputValue(i)
25 })
26 }
27
28 outputValue(x) {
29 console.log(x);
30 }
31}

You also can't use an await within the foreach loop. e.g.

1async function run() {
2 const arr = ['a', 'b', 'c'];
3 arr.forEach(el => {
4 // SyntaxError
5 await new Promise(resolve => setTimeout(resolve, 1000));
6 console.log(el);
7 });
8}

Pros

  • Much shorter to write and easier to read.
  • Iterator provides the object from the array.

Cons

  • Easy to mess up context of this.
  • No way to break the loop.
  • Async/Await doesn't work on functions within the loop.
  • Can only loop through arrays.
  • It's performance is slower (nice article here comparing the performance differences), not to such an extent it would matter on a small object, but large objects could start to take a hit.

for...of

When looping through an array, the for...of loop combines the best of all the other loops. It has the same simple syntax of the for...in loop but instead of enumerating over enumerable properties, it loops through the iterable objects such as the items in an array.

Like the forEach loop you are provided with the object in the array rather than it's index or property name.

1const arr = ['a', 'b', 'c'];
2
3for (const i of arr) {
4 console.log(i);
5}
6
7// Result:
8// "a"
9// "b"
10// "c"

You can also break the loop and access properties outside of the loop.

1const stopValue = 'b'
2const arr = ['a', 'b', 'c'];
3
4for (const i of arr) {
5 console.log(i);
6 if (i == stopValue)
7 break;
8}
9
10// Result:
11// "a"
12// "b"

Pros

  • Lowest amount of extra code to write.
  • Iterator provides the object.
  • Doesn't use an anonymous function so scope doesn't change.
  • Loop can be stopped as needed.
  • Async still works.
  • Works with more than just arrays.

Cons

  • Have to be looping over the iterable items in an object.
  • Can't easily access the index value.

Conclusion

If you want to iterate through something like an array, for...of would be my recommendation to use. A for...in loop is likely to be less relevant but has its place, and for all other loops which don't relate to the objects in an array a good old for loop still has it's place.

Creating a SSL for your local Sitecore Site

Creating a SSL for your local Sitecore Site

When you install Sitecore, the installer will quite handily setup some SSL certificates for you. That way when you test locally your site will correctly run under https. However for various reasons you may not have used the installer to setup your local instance, in which case you need to do it yourself.

Creating a self signed SSL certificate however is one of those things that's always been far harder than is should. Previously I've written about how you can do it using mkcert, but recently I've found another way.

Creating a new self-signed SSL certificate with PowerShell

First open a PowerShell window or if you use the new Windows Terminal then one of those will do. Make sure you run it as an administrator or you'll run into permissions errors.

Then run the following command filling in your site URL and a friendly name.

1New-SelfSignedCertificate -CertStoreLocation Cert:\LocalMachine\My -DnsName "my-site.local" -FriendlyName "MySiteName" -NotAfter (Get-Date).AddYears(5)

This will create a cert with the expiry date set in 5 years time.

Next, it needs moving to the Trusted Root Certification Authorities store.

Click Start and type:

1certlm.msc

Find the certificate you just created in your personal certificates, and copy it into the trusted root certificates.

IIS HTTPS Site Bindings

To instruct your site to use the new certificate you need to update the IIS bindings for your site.

Go to IIS > selects site > Bindings... and then choose the https bindings.

You should have something like this.

In the SSL certificate drop down, pick your newly created certificate.

At this point you should have an SSL certificate which browsers actually like!

Other Sitecore Settings

Despite the newly working certificate you may still run into issues with Sitecore which could either be due to SSL thumbprints in config files or config settings for URLs not including https. e.g. In {IDENTITYSERVER_ROOT}/Config/production/Sitecore.IdentityServer.Host.xml there is a setting for AllowedCorsOrigins which will need the https version of the url.

SonarQube for .NET Framework with GitHub Actions

SonarQube for .NET Framework with GitHub Actions

If you haven't tried SonarQube or SonarCloud out then I suggest you do. The cloud version is quite straightforward to setup and from my experience the stuff it finds can be quite insightful. Like all these tools, at times you'll disagree with what they say, but there's always the option to change the rules.

What I particularly like with SonarQube is the examples you get with each bug that clearly explains why there's an issue and what you need to do in order to fix it.

What I didn't like however were the instructions for setting a project using .NET Framework. There are instructions labelled .NET, but this heavily assumes your using .NET Core, which while that might be our general preference, products like Sitecore could force your hand back to .NET Framework and all those legacy projects didn't just go away.

How to setup SonarQube using GitHub Actions for .NET Framework

The GitHub setup instructions (https://docs.sonarqube.org/latest/analysis/github-integration/) will give you the following code to create your GitHub Action with. This is also the same code you will get if you follow the wizard in SonarQube.

1name: Build
2on:
3 push:
4 branches:
5 - master # or the name of your main branch
6 pull_request:
7 types: [opened, synchronize, reopened]
8jobs:
9 build:
10 name: Build
11 runs-on: windows-latest
12 steps:
13 - name: Set up JDK 11
14 uses: actions/setup-java@v1
15 with:
16 java-version: 1.11
17 - uses: actions/checkout@v2
18 with:
19 fetch-depth: 0 # Shallow clones should be disabled for a better relevancy of analysis
20 - name: Cache SonarQube packages
21 uses: actions/cache@v1
22 with:
23 path: ~\sonar\cache
24 key: ${{ runner.os }}-sonar
25 restore-keys: ${{ runner.os }}-sonar
26 - name: Cache SonarQube scanner
27 id: cache-sonar-scanner
28 uses: actions/cache@v1
29 with:
30 path: .\.sonar\scanner
31 key: ${{ runner.os }}-sonar-scanner
32 restore-keys: ${{ runner.os }}-sonar-scanner
33 - name: Install SonarQube scanner
34 if: steps.cache-sonar-scanner.outputs.cache-hit != 'true'
35 shell: powershell
36 run: |
37 New-Item -Path .\.sonar\scanner -ItemType Directory
38 dotnet tool update dotnet-sonarscanner --tool-path .\.sonar\scanner
39 - name: Build and analyze
40 env:
41 GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # Needed to get PR information, if any
42 shell: powershell
43 run: |
44 .\.sonar\scanner\dotnet-sonarscanner begin /k:"example" /d:sonar.login="${{ secrets.SONAR_TOKEN }}" /d:sonar.host.url="${{ secrets.SONAR_HOST_URL }}"
45 dotnet build
46 .\.sonar\scanner\dotnet-sonarscanner end /d:sonar.login="${{ secrets.SONAR_TOKEN }}"

There's two aspects to notice with this. Firstly the Build and analyze section is running a command dotnet build which is fine if your running .Net Core, but for .Net Framework it isn't going to work.

Secondly it's highly likely your solution will use NuGet packages and there's no step in here to restore them.

To setup and restore NuGet packages add in the following steps before the Build and analyze step. Be sure to put your solution filename in the restore command.

1 - name: Setup Nuget
2 uses: Nuget/setup-nuget@v1.0.5
3
4 - name: Restore nuget packages
5 run: nuget restore MySolution.sln

To do a build that will compile your .Net Framework code you will need to use MsBuild rather than dotnet. However if you just swap them over you'll get an invalid command error. First you need to add msbuild to PATH. Change your build steps as follows.

1 - name: Add msbuild to PATH
2 uses: microsoft/setup-msbuild@v1.0.2
3
4 - name: Build and analyze
5 env:
6 GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # Needed to get PR information, if any
7 shell: powershell
8 run: |
9 .\.sonar\scanner\dotnet-sonarscanner begin /k:"example" /d:sonar.login="${{ secrets.SONAR_TOKEN }}" /d:sonar.host.url="${{ secrets.SONAR_HOST_URL }}"
10 dotnet build
11 .\.sonar\scanner\dotnet-sonarscanner end /d:sonar.login="${{ secrets.SONAR_TOKEN }}"

With that now in place you can now compile some .Net Framework code and have the results sent back to your SonarQube instance.