There is no place like home, and also: there is no place like production. Production is the only place where your code is really put to the test by your users. This insight has helped developers accept that we have to go to production fast and often. But we also want to do so in a responsible way. Common tactics for going to production often and safely are, for example, blue-green deployments and feature flags.

While both help to limit the impact of mistakes, the downside of these two approaches is that they only ever run one version of your code. Whether you switch binaries or switch code paths, either the old or the new implementation is executed, never both. But what if we could execute both the old and the new code in parallel and compare the results? In this post I will show you how to run two algorithms in parallel and compare the results, using a library called scientist.net, which is available on NuGet.

The use case

To experiment a bit with experimentation, I have created a straightforward site for calculating the nth position in the Fibonacci sequence, as you can see below.

I started out with the simplest implementation that seemed to meet the requirements. In other words, the calculation is implemented using recursion, as shown below.

public class RecursiveFibonacciCalculator : IRecursiveFibonacciCalculator
{
  public Task<int> CalculateAsync(int position)
  {
    return Task.FromResult(InnerCalculate(position));
  }

  private int InnerCalculate(int position)
  {
    if (position < 1)
    {
      return 0;
    }
    if (position == 1)
    {
      return 1;
    }

    return InnerCalculate(position - 1) + InnerCalculate(position - 2);
  }
}

While this is a very clean and to-the-point implementation code-wise, the performance leaves, to say the least, room for improvement. So I took a few more minutes and came up with an implementation which I believe is also correct, but much more performant, namely the following:

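public class LinearFibonacciCalculator : ILinearFibonacciCalculator
{
  public Task<int> CalculateAsync(int position)
  {
    // Mirror the recursive implementation: positions below 1 yield 0.
    // Without this guard, results[1] is out of range for position 0.
    if (position < 1)
    {
      return Task.FromResult(0);
    }

    var results = new int[position + 1];

    results[0] = 0;
    results[1] = 1;

    // Build the sequence bottom-up, so every value is computed only once.
    for (var i = 2; i <= position; i++)
    {
      results[i] = results[i - 1] + results[i - 2];
    }

    return Task.FromResult(results[position]);
  }
}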

However, just swapping implementations and releasing didn’t feel good to me, so I thought: how about running an experiment on this?

The experiment

With the experiment that I am going to run, I want to achieve the following goals:

  • On every user request, run both my recursive and my linear implementation.
  • Return the result of the recursive implementation, which I know to be correct, to the user.
  • While doing this, record:
    • Whether the linear implementation yields the same result.
    • Any performance differences.

To do this, I installed the Scientist NuGet package and added the code shown below as my new implementation.

public async Task OnPost()
{
  HasResult = true;
  Position = FibonacciInput.Position;

  Result = await Scientist.ScienceAsync<int>("fibonacci-implementation", experiment =>
  {
    experiment.Use(async () => await _recursiveFibonacciCalculator.CalculateAsync(Position));
    experiment.Try(async () => await _linearFibonacciCalculator.CalculateAsync(Position));

    experiment.AddContext("Position", Position);
  });
}

This code calls into the Scientist functionality and sets up an experiment with the name fibonacci-implementation that should return an int. The experiment is configured using the calls to Use(..) and Try(..):

Use(..): The use method is called with a lambda that should execute the known, trusted implementation of the code that you are experimenting with.

Try(..): The try method is called with another lambda, but now the one with the new, not yet verified implementation of the algorithm.

Both the Use(..) and Try(..) methods accept a sync lambda as well, but I use async/await here on purpose. The advantage of using an async lambda is that both implementations will be executed in parallel, thus reducing the duration of the web server call. The final thing I do, with the call to the AddContext(..) method, is add a named value to the experiment. I can use this context property bag later on to interpret the results and to pin down scenarios in which the new implementation is lacking.
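
To make the mechanics of such an experiment concrete, here is a minimal sketch of the pattern in Python rather than C#. It is not how Scientist is implemented: run_experiment, Observation and the publish callback are hypothetical names used only for illustration, and for simplicity the sketch runs both implementations one after the other instead of in parallel. It runs the trusted control and the candidate, times both, reports whether the results match, and always hands the control result back to the caller.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Observation:
    """Outcome of one experiment run (hypothetical type, not the Scientist API)."""
    name: str
    matched: bool
    control_seconds: float
    candidate_seconds: float
    context: Dict[str, Any] = field(default_factory=dict)


def _timed(func: Callable[[], Any]):
    """Run func and return (result, exception, elapsed seconds)."""
    start = time.perf_counter()
    try:
        return func(), None, time.perf_counter() - start
    except Exception as exc:  # a failing candidate must never break the caller
        return None, exc, time.perf_counter() - start


def run_experiment(name, control, candidate, publish, context=None):
    """Run both implementations, publish the comparison, return the control result."""
    control_result, control_error, control_time = _timed(control)
    candidate_result, candidate_error, candidate_time = _timed(candidate)

    matched = (control_error is None and candidate_error is None
               and control_result == candidate_result)
    publish(Observation(name, matched, control_time, candidate_time, dict(context or {})))

    if control_error is not None:
        raise control_error  # the caller sees exactly what the trusted code did
    return control_result


def fib_recursive(n):
    return 0 if n < 1 else 1 if n == 1 else fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_linear(n):
    a, b = 0, 1
    for _ in range(max(n, 0)):
        a, b = b, a + b
    return a


observations = []
result = run_experiment(
    "fibonacci-implementation",
    control=lambda: fib_recursive(10),
    candidate=lambda: fib_linear(10),
    publish=observations.append,
    context={"Position": 10},
)
print(result, observations[0].matched)
```

The important design choice is that the candidate can only ever influence the published observation, never the value returned to the user.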

Processing the runs

While the code above takes care of running two implementations of the Fibonacci sequence in parallel, I am not working with the results yet – so let’s change that. Results can be redirected to an implementation of the IResultPublisher interface that ships with Scientist by assigning an instance to the static ResultPublisher property as I do in my StartUp class.

var resultsPublisher = new ExperimentResultPublisher();
Scientist.ResultPublisher = resultsPublisher;

In the ExperimentResultPublisher class, I have added the code below.

public class ExperimentResultPublisher : IResultPublisher, IExperimentResultsGetter
{
  public LastResults LastResults { get; private set; }
  public OverallResults OverallResults { get; } = new OverallResults();

  public Task Publish<T, TClean>(Result<T, TClean> result)
  {
    if (result.ExperimentName == "fibonacci-implementation")
    {
      LastResults = new LastResults(! result.Mismatched, result.Control.Duration, result.Candidates.Single().Duration);
      OverallResults.Accumulate(LastResults);
    }

    return Task.CompletedTask;
  }
}

For all instances of the fibonacci-implementation experiment, I am saving the results of the last observation. Observation is the Scientist term for a single execution of the experiment. Once I have moved the results over to my own class, LastResults, I add these last results to another class of my own, OverallResults, which calculates the minimum, maximum and average duration for each algorithm.
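
The OverallResults class itself is not shown in this post. As a rough idea of what such an accumulator could look like, here is a minimal Python sketch. The names DurationStats, accumulate and mismatches are assumptions made for this example and are not taken from the actual project.

```python
from dataclasses import dataclass


@dataclass
class DurationStats:
    """Running minimum, maximum and average for one algorithm (hypothetical helper)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = 0.0

    def add(self, seconds: float) -> None:
        self.count += 1
        self.total += seconds
        self.minimum = min(self.minimum, seconds)
        self.maximum = max(self.maximum, seconds)

    @property
    def average(self) -> float:
        # Avoid dividing by zero before the first observation arrives.
        return self.total / self.count if self.count else 0.0


class OverallResults:
    """Accumulates every observation of the experiment (illustrative only)."""

    def __init__(self) -> None:
        self.control = DurationStats()
        self.candidate = DurationStats()
        self.mismatches = 0

    def accumulate(self, matched: bool, control_seconds: float,
                   candidate_seconds: float) -> None:
        self.control.add(control_seconds)
        self.candidate.add(candidate_seconds)
        if not matched:
            self.mismatches += 1


overall = OverallResults()
overall.accumulate(True, 4.0, 1.0)
overall.accumulate(False, 2.0, 3.0)
print(overall.control.average, overall.candidate.average, overall.mismatches)
```

In a web application several requests can publish results at the same time, so a real implementation would probably need a lock around the accumulation.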

The LastResults and OverallResults properties are part of the IExperimentResultsGetter interface, which I later on inject in my Razor page.

Results

All of the above, combined with some HTML, then gave me the following results after a number of experiments.

I hope you can see here how you can take this forward and extract more meaningful information from this type of experimentation. One thing that I would highly recommend is finding all observations where the existing and new implementation do not match and logging a critical error from your application.

Just imagine how you can have your users iterate and verify all your test cases, without them ever knowing.

I am a fan of private agents when working with Azure Pipelines, compared to the hosted Azure Pipelines agents that are also available. In my experience the hosted agents can have a delay before they start and sometimes run slowly in comparison with dedicated resources. Of course this is just an opinion. For this reason, I have been running my private agents on a virtual machine for years. Unfortunately, this solution is not perfect and has a big downside as well: no isolation between jobs.

No isolation between different jobs that are executed on the same agent means that all files left over from an earlier job are available to any job currently running. It also means that it is possible to pollute NuGet caches, change files on the system, etc. And of course, running virtual machines in general is cumbersome due to patching, updates and all the operational risks and responsibilities.

So it was time to adopt one of the biggest revolutions in IT: containers. In this blog I will share how I created a Docker container image that hosts an Azure Pipelines agent and how to run a number of those images within Azure Container Instances. As a starting point I have taken the approach that Microsoft has laid down in its documentation, but I have made a number of tweaks and made my solution more complete. Doing so, I wanted to achieve the following:

  1. Have my container instances execute only a single job and then terminate and restart, ensuring nothing from a running job will become available to the next job.
  2. Have a flexible number of containers running that I can change frequently and with a single push of a button.
  3. Have a source code repository for my container image build, including a pipelines.yml that allows me to build and publish new container images on a weekly schedule.
  4. Have a pipeline that I can use to roll out 1..n agents to Azure Container Instances, depending on the number I need at that time.
  5. Do not make the PAT token that is needed for (de)registering agents available to anyone using the agent image.
  6. Automatically handle the registration of agents, as soon as a new container instance becomes available.

Besides the step-by-step instructions below I have also uploaded the complete working solution to GitHub at https://github.com/henrybeen/ContainerizedBuildAgents.

Let’s go!

Creating the container image

My journey started with reading the Microsoft documentation on creating a Windows container image with a Pipelines Agent. You can find this documentation at https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/docker?view=azure-devops. I found two downsides to this approach for my use case. First, the PAT token that is used for (de)registering the agent is available during the complete lifetime of the container. This means that everyone executing jobs on that agent can pick up the PAT token and abuse it. Secondly, the agent is downloaded and unpacked at runtime. This means that the actual agent is slow to spin up.

To work around these downsides I started with splitting the Microsoft provided script into two parts, starting with a file called Build.ps1 as shown below.

param (
    [Parameter(Mandatory=$true)]
    [string]
    $AZDO_URL,

    [Parameter(Mandatory=$true)]
    [string]
    $AZDO_TOKEN
)

if (-not $(Test-Path "agent.zip" -PathType Leaf))
{
    Write-Host "1. Determining matching Azure Pipelines agent..." -ForegroundColor Cyan

    $base64AuthInfo = [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$AZDO_TOKEN"))
    $package = Invoke-RestMethod -Headers @{Authorization=("Basic $base64AuthInfo")} "$AZDO_URL/_apis/distributedtask/packages/agent?platform=win-x64&`$top=1"
    $packageUrl = $package[0].Value.downloadUrl
    Write-Host "Package URL: $packageUrl"

    Write-Host "2. Downloading the Azure Pipelines agent..." -ForegroundColor Cyan

    $wc = New-Object System.Net.WebClient
    $wc.DownloadFile($packageUrl, "$(Get-Location)\agent.zip")
}
else {
    Write-Host "1-2. Skipping downloading the agent, found an agent.zip right here" -ForegroundColor Cyan
}

Write-Host "3. Unzipping the Azure Pipelines agent" -ForegroundColor Cyan
Expand-Archive -Path "agent.zip" -DestinationPath "agent"

Write-Host "4. Building the image" -ForegroundColor Cyan
docker build -t docker-windows-agent:latest .

Write-Host "5. Cleaning up" -ForegroundColor Cyan
Remove-Item "agent" -Recurse

The script downloads the latest agent if agent.zip does not exist yet, and unzips that file. Once the agent is in place, the Docker image is built using the call to docker build. Once completed, the unpacked agent folder is removed, just to keep things tidy. This clean-up also allows for rerunning the script from the same directory multiple times without warnings or errors. A fast feedback loop is also the reason I test for the existence of agent.zip before downloading it.
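
The authentication part of the script is easy to overlook: the PAT is sent as the password of a Basic authentication header with an empty user name, which is why the string that gets Base64-encoded starts with a colon. The following Python sketch only builds the header and the request URL, mirroring what the PowerShell above does. The function names are made up for this illustration and no request is actually sent.

```python
import base64


def basic_auth_header(pat: str) -> dict:
    """Build the Authorization header for a PAT: empty user name, PAT as password."""
    raw = f":{pat}".encode("ascii")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def agent_package_url(azdo_url: str) -> str:
    """URL used by Build.ps1 to look up the newest win-x64 agent package."""
    return f"{azdo_url.rstrip('/')}/_apis/distributedtask/packages/agent?platform=win-x64&$top=1"


headers = basic_auth_header("not-a-real-token")
url = agent_package_url("https://dev.azure.com/example-organization")
print(url)
```

The organization name and token in this example are placeholders, not real values.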

The next file to create is the Dockerfile. The changes here are minimal. As you can see I also copy over the agent binaries, so I do not have to download these anymore when the container runs.

FROM mcr.microsoft.com/windows/servercore:ltsc2019

WORKDIR c:/azdo/work

WORKDIR c:/azdo/agent
COPY agent .

WORKDIR c:/azdo
COPY Start-Up.ps1 .

CMD powershell c:/azdo/Start-Up.ps1

First we ensure that the directory c:\azdo\work exists by setting it as the working directory. Next we move to the directory that will contain the agent files and copy those over. Finally, we move one directory up and copy the Start-Up script over. To run the container image, a call into that script is made. So, let’s explore Start-Up.ps1 next.

if (-not (Test-Path Env:AZDO_URL)) {
  Write-Error "error: missing AZDO_URL environment variable"
  exit 1
}

if (-not (Test-Path Env:AZDO_TOKEN)) {
  Write-Error "error: missing AZDO_TOKEN environment variable"
  exit 1
}

if (-not (Test-Path Env:AZDO_POOL)) {
  Write-Error "error: missing AZDO_POOL environment variable"
  exit 1
}

if (-not (Test-Path Env:AZDO_AGENT_NAME)) {
  Write-Error "error: missing AZDO_AGENT_NAME environment variable"
  exit 1
}

$Env:VSO_AGENT_IGNORE = "AZDO_TOKEN"

Set-Location c:\azdo\agent

Write-Host "1. Configuring Azure Pipelines agent..." -ForegroundColor Cyan

.\config.cmd --unattended `
  --agent "${Env:AZDO_AGENT_NAME}" `
  --url "${Env:AZDO_URL}" `
  --auth PAT `
  --token "${Env:AZDO_TOKEN}" `
  --pool "${Env:AZDO_POOL}" `
  --work "c:\azdo\work" `
  --replace

Remove-Item Env:AZDO_TOKEN

Write-Host "2. Running Azure Pipelines agent..." -ForegroundColor Cyan

.\run.cmd --once

The script first checks for the existence of four mandatory environment variables. I will provide these later on from Azure Container Instances, where we are going to run the image. Since the PAT token is still in the AZDO_TOKEN environment variable at this point, we set another variable, VSO_AGENT_IGNORE, which ensures that AZDO_TOKEN is not listed or exposed by the agent, even though we unset it later on. From here on, the agent is configured with a series of command-line arguments that allow for a headless registration of the agent with the correct agent pool.

It is good to know that I am using a non-random name on purpose. This allows me to re-use the same agent name for each Azure Container Instances instance, which prevents an ever-increasing list of offline agents in my pool. This is also the reason I have to add the --replace argument. Omitting it would cause the registration to fail.

Finally, we run the agent, specifying the --once argument. This argument makes the agent pick up only a single job and terminate once that job is complete. Since this is the final command in the PowerShell script, this also terminates the script. And since the script is the only CMD specified in the Dockerfile, this also terminates the container.

This ensures that every run of my container executes only a single job, so the side effects of any job cannot propagate to the next job.

Once these files exist, it is time to execute the following from the PowerShell command-line.

PS C:\src\docker-windows-agent> .\Build.ps1 -AZDO_URL https://dev.azure.com/**sorry** -AZDO_TOKEN **sorry**
1-2. Skipping downloading the agent, found an agent.zip right here
3. Unzipping the Azure Pipelines agent
4. Building the image
Sending build context to Docker daemon  542.8MB
Step 1/7 : FROM mcr.microsoft.com/windows/servercore:ltsc2019
 ---> 782a75e44953
Step 2/7 : WORKDIR c:/azdo/work
 ---> Using cache
 ---> 24bddc56cd65
Step 3/7 : WORKDIR c:/azdo/agent
 ---> Using cache
 ---> 03357f1f229b
Step 4/7 : COPY agent .
 ---> Using cache
 ---> 110cdaa0a167
Step 5/7 : WORKDIR c:/azdo
 ---> Using cache
 ---> 8d86d801c615
Step 6/7 : COPY Start-Up.ps1 .
 ---> Using cache
 ---> 96244870a14c
Step 7/7 : CMD powershell Start-Up.ps1
 ---> Running in f60c3a726def
Removing intermediate container f60c3a726def
 ---> bba29f908219
Successfully built bba29f908219
Successfully tagged docker-windows-agent:latest
5. Cleaning up

and…

docker run -e AZDO_URL=https://dev.azure.com/**sorry** -e AZDO_POOL=SelfWindows -e AZDO_TOKEN=**sorry** -e AZDO_AGENT_NAME=Agent007 -t docker-windows-agent:latest

1. Configuring Azure Pipelines agent...

  ___                      ______ _            _ _
 / _ \                     | ___ (_)          | (_)
/ /_\ \_____   _ _ __ ___  | |_/ /_ _ __   ___| |_ _ __   ___  ___
| | | |/ /| |_| | | |  __/ | |   | | |_) |  __/ | | | | |  __/\__ \
\_| |_/___|\__,_|_|  \___| \_|   |_| .__/ \___|_|_|_| |_|\___||___/
                                   | |
        agent v2.163.1             |_|          (commit 0a6d874)


>> Connect:

Connecting to server ...

>> Register Agent:

Scanning for tool capabilities.
Connecting to the server.
Successfully added the agent
Testing agent connection.
2019-12-29 09:45:39Z: Settings Saved.
2. Running Azure Pipelines agent...
Scanning for tool capabilities.
Connecting to the server.
2019-12-29 09:45:47Z: Listening for Jobs
2019-12-29 09:45:50Z: Running job: Agent job 1
2019-12-29 09:47:19Z: Job Agent job 1 completed with result: Success

This shows that the container image can be built and that running it allows me to execute jobs on the agent. Let’s move on to creating the infrastructure within Azure that is needed for running the images.

Creating the Azure container registry

As I usually do, I created a quick ARM template for provisioning the Azure Container Registry. Below is the resource part.

{
    "name": "[variables('acrName')]",
    "type": "Microsoft.ContainerRegistry/registries",
    "apiVersion": "2017-10-01",
    "location": "[resourceGroup().location]",
    "sku": {
        "name": "Basic"
    },
    "properties": {
        "adminUserEnabled": true
    }
}

Creating a container registry is fairly straightforward: specifying a SKU and enabling the creation of an administrative account is enough. Once this is done, we roll this template out to a resource group called Tools, tag the image against the new registry, authenticate and push the image.

PS C:\src\docker-windows-agent> New-AzureRmResourceGroupDeployment -ResourceGroupName Tools -TemplateFile .\acr.json -TemplateParameterFile .\acr.henry.json

DeploymentName          : acr
ResourceGroupName       : Tools
ProvisioningState       : Succeeded
Timestamp               : 29/12/2019 20:07:25
Mode                    : Incremental
TemplateLink            :
Parameters              :
                          Name             Type                       Value
                          ===============  =========================  ==========
                          discriminator    String                     hb

Outputs                 :
DeploymentDebugLogLevel :

PS C:\src\docker-windows-agent> az acr login --name hbazdoagentacr
Unable to get AAD authorization tokens with message: An error occurred: CONNECTIVITY_REFRESH_TOKEN_ERROR
Access to registry 'hbazdoagentacr.azurecr.io' was denied. Response code: 401. Please try running 'az login' again to refresh permissions.
Unable to get admin user credentials with message: The resource with name 'hbazdoagentacr' and type 'Microsoft.ContainerRegistry/registries' could not be found in subscription 'DevTest (a314c0b2-589c-4c47-a565-f34f64be939b)'.
Username: hbazdoagentacr
Password:
Login Succeeded

PS C:\src\docker-windows-agent> docker tag docker-windows-agent hbazdoagentacr.azurecr.io/docker-windows-agent

PS C:\src\docker-windows-agent> docker push hbazdoagentacr.azurecr.io/docker-windows-agent
The push refers to repository [hbazdoagentacr.azurecr.io/docker-windows-agent]
cb08be0defff: Pushed
1ed2626efafb: Pushed
f6c3f9abc3b8: Pushed
770769339f15: Pushed
8c014918cbca: Pushed
c57badbbe459: Pushed
963f095be1ff: Skipped foreign layer
c4d02418787d: Skipped foreign layer
latest: digest: sha256:7972be969ce98ee8c841acb31b5fbee423e1cd15787a90ada082b24942240da6 size: 2355

Now that all the scripts and templates are proven, let’s automate this through a pipeline. The following YAML is enough to build and publish the container.

trigger:
  branches:
    include: 
      - master
  paths:
    include: 
      - agent
    
pool:
  name: Azure Pipelines
  vmImage: windows-2019

workspace:
    clean: all

steps:
  - task: PowerShell@2
    displayName: 'PowerShell Script'
    inputs:
      targetType: filePath
      workingDirectory: agent
      filePath: ./agent/Build.ps1
      arguments: -AZDO_URL https://dev.azure.com/azurespecialist -AZDO_TOKEN $(AZDO_TOKEN)

  - task: Docker@2
    displayName: Build and push image to container registry
    inputs:
      command: buildAndPush
      containerRegistry: BuildAgentsAcr
      repository: docker-windows-agent
      Dockerfile: agent/Dockerfile
      tags: latest
      buildContext: agent

To make the above work, two more changes are needed:

  1. The Build.ps1 needs to be changed a bit, just remove steps 4 and 5
  2. A service connection to the ACR has to be made with the name BuildAgentsAcr

With the container image available in an ACR and a repeatable process to get updates out, it is time to create the Azure Container Instances that are going to run the image.

Creating the Azure container instance(s)

Again, I am using an ARM template for creating the Azure Container Instances.

{
    "copy": {
        "name": "acrCopy", 
        "count": "[parameters('numberOfAgents')]" 
    },
    "name": "[concat(variables('aciName'), copyIndex())]",
    "type": "Microsoft.ContainerInstance/containerGroups",
    "apiVersion": "2018-10-01",
    "location": "[resourceGroup().location]",
    "properties": {
        "imageRegistryCredentials": [{
            "server": "[reference(resourceId('Microsoft.ContainerRegistry/registries',variables('acrName'))).loginServer]",
            "username": "[listCredentials(resourceId('Microsoft.ContainerRegistry/registries',variables('acrName')),'2017-03-01').username]",
            "password": "[listCredentials(resourceId('Microsoft.ContainerRegistry/registries',variables('acrName')),'2017-03-01').passwords[0].value]"
        }], 
        "osType": "Windows", 
        "restartPolicy": "Always",
        "containers": [{
            "name": "[concat('agent-', copyIndex())]",
            "properties": {
                "image": "[parameters('containerImageName')]",
                "environmentVariables": [
                    {
                        "name": "AZDO_URL",
                        "value": "[parameters('azdoUrl')]"
                    },
                    {
                        "name": "AZDO_TOKEN",
                        "secureValue": "[parameters('azdoToken')]"
                    },
                    {
                        "name": "AZDO_POOL",
                        "value": "[parameters('azdoPool')]"
                    },
                    {
                        "name": "AZDO_AGENT_NAME",
                        "value": "[concat('agent-', copyIndex())]"
                    }
                ],
                "resources": {
                    "requests": {
                        "cpu": "[parameters('numberOfCpuCores')]",
                        "memoryInGB": "[parameters('numberOfMemoryGigabytes')]"
                    }
                }
            }
        }]
    }
}

At the start of the resource definition, we specify that we want multiple copies of this resource, not just one. The actual number of resources is specified using a template parameter. This construct allows us to specify the number of agents that we want to run in parallel in ACI every time this template is redeployed. If we combine this with the complete deployment mode, agents in excess of that number get removed automatically as well. When providing the name of the ACI instance, we concatenate the name with the outcome of the function copyIndex(). This function returns an integer that specifies in which iteration of the copy loop the template currently is. This way unique names are generated for all the resources.
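
To illustrate how the copy loop and the complete deployment mode work together when scaling, here is a small Python model. It is a deliberately simplified simulation and does not call Azure: the function names and the base name aci are made up for the example. It relies on copyIndex() starting at zero by default, and on complete mode deleting resources in the resource group that are no longer part of the template.

```python
def aci_names(base_name: str, number_of_agents: int) -> list:
    """Names generated by concat(base_name, copyIndex()) for each copy iteration."""
    return [f"{base_name}{index}" for index in range(number_of_agents)]


def plan_complete_deployment(existing: list, desired: list) -> dict:
    """Simplified model of what a complete-mode deployment does to the resource group."""
    existing_set, desired_set = set(existing), set(desired)
    return {
        "create": sorted(desired_set - existing_set),
        "keep": sorted(desired_set & existing_set),
        "delete": sorted(existing_set - desired_set),
    }


# Scale down from four agents to two: the two highest-numbered instances disappear.
current = aci_names("aci", 4)
plan = plan_complete_deployment(existing=current, desired=aci_names("aci", 2))
print(plan)
```

Because the names are deterministic, scaling back up later recreates instances with the same names instead of adding new ones.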

As with most modern resources, the container instance takes the default resource properties in the root object and contains another object called properties that contains all the resource specific configuration. Here we first have to specify the imageRegistryCredentials. These are the details needed to connect ACI to the ACR, for pulling the images. I am using ARM template syntax to automatically fetch and insert the values, without ever looking at them.

The next interesting property is restartPolicy. The value Always instructs ACI to automatically restart my container whenever it stops running, whether it exited with an error or successfully. This way, whenever the agent has run a single job and the container completes, it gets restarted within seconds.

In the second properties object, a reference to the container image, the environment variables and the container resources are specified. The values here should be self-explanatory, and of course the complete ARM template with all parameters and other plumbing is available on the GitHub repository.

With the ARM template done and ready to go, let’s create another pipeline – now using the following YAML.

trigger:
  branches:
    include: 
      - master
  paths:
    include: 
      - aci
    
pool:
  name: Azure Pipelines
  vmImage: windows-2019

workspace:
    clean: all

steps:
  - task: AzureResourceGroupDeployment@2
    displayName: 'Create or update ACI instances'
    inputs:
      azureSubscription: 'RG-BuildAgents'
      resourceGroupName: 'BuildAgents'
      location: 'West Europe'
      csmFile: 'aci/aci.json'
      csmParametersFile: 'aci/aci.henry.json'
      deploymentMode: Complete
      overrideParameters: '-azdoToken "$(AZDO_TOKEN)"'

Nothing new in here really. This pipeline just executes the ARM template against another Azure resource group. Running this pipeline succeeds and a few minutes later, I have the following ACI instances.

And that means I have the following agents in my pool.

All-in-all: It works! Here you can see that when scaling the number of agents up or down, the names of the agents stay the same and that I will get a number of offline agents that is at most the number of agents to which I have scaled up at a certain point in time.

And with this, I have created a flexible, easy-to-scale setup for running Azure Pipelines jobs in isolation, in containers!

Downsides

Of course, this solution is not perfect and a number of downsides remain. I have found the following:

  1. More work
  2. No caching

More work. The largest drawback that I have found with this approach is that it is more work to set up. Creating a virtual machine that hosts one or more private agents can be done in a few hours, while the above has taken me well over a day to figure out (granted, I was new to containers).

No caching. So slower again. With every job running in isolation, many of the caching benefits that come with using a private agent in a virtual machine are gone again. Some of this can be overcome by building a more elaborate image that includes more SDKs and tools by default. But still, complete sources have to be downloaded every time and no intermediate results are cached for free.

If you find more downsides, have identified a flaw, or like this approach, please do let me know!

But still, this approach works for me and I will continue to use it for a while. Also, I have learned a thing or two about working with containers, yay!

This year there is an Azure advent calendar, organized by Gregor Suttie and Richard Hooper, both fellow Microsoft MVPs. In total 75 sessions will be published, all covering an interesting Azure topic. Of course I jumped on the opportunity to get aboard, and fortunately I was just in time to be added to the list. As always, I am happy to share with the community and delighted that I am allowed to do so. In this case, my contribution is on the topic of Logic Apps and can be found on the calendar's YouTube channel.

When I started preparing for my session and looked over the calendar again, I found that there were two sessions on Logic Apps planned on the calendar. Next to my own, there was also a session by Simon Waight. After a bit of coordination and a few e-mails back and forth, we divided up the topics. In the recording by Simon you will find:

  • An introduction to Logic Apps, connectors and actions
  • An introduction to pay per use environments and Integration Service Environments
  • Some best practices
  • A step by step guide on building your first Logic App

In my session I will expand upon this with the following subjects:

  • The introduction of a real world use case where I am working with Logic Apps
  • Infrastructure as Code for Logic Apps, so you can practice continuous deployment for Logic Apps
  • A short introduction on building your own Logic Apps.

I hope you enjoy the recording, which will go live sometime today on the calendar's YouTube channel.

Merry Christmas!

This week I had the privilege of attending the Update Conference in Prague.

It was a great opportunity to visit this beautiful city and to attend talks given by some very bright people from all around the world. I had a great time, am happy I was invited, and would be delighted to return next year.

As requested by some of the attendees, I am sharing my slides for both presentations here.

  1. Infrastructure as Code: Azure Resource Manager – inside out
  2. Secure deployments: Keeping your Application Secrets Private

Thank you for attending and I hope you enjoyed it just as much as I did!

When I visited my own website a few days ago, I was shocked at how long it has been since I posted here. The reason for this is that I have been working on two large projects, next to my regular work, for several months now. A few weeks ago the first one came to completion and I am really proud to say that my training Introduction to Azure DevOps for A Cloud Guru has been launched.

I think this has been the most ambitious side project I have started and completed for a while. And while I really enjoyed the work on this project, I also found it challenging. A great deal of things I had to learn on the fly and often I found I had to spend a tremendous amount of time on something that was seemingly easy. For this reason, I thought it wise to recap what I have learned. And by sharing it, I hope others might benefit from my experiences as well.

1. Yes, it is work!

I think I have spent roughly one day per week for five months or so on this online course. My estimate is that this equates to roughly 100 to 150 hours of work for 3 hours of video material. I have no idea how this compares to other authors of online training material, and maybe this is where I find out I completely suck. Also, my guesstimate may be completely wrong, maybe due to the time spent thinking about a video while going for a run, or by taking a break and losing track of time. The point, however, is that this is a lot of time that you have to invest. This ratio of work hours to video hours shows that after a day of hard work, you have completed maybe twenty minutes of video or even less.
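
As a rough check of that ratio, the small Python sketch below converts the estimate into minutes of finished video per working day. The eight-hour working day is an assumption made for this calculation and is not stated above, and the function name is made up for the example.

```python
def video_minutes_per_workday(total_work_hours: float, video_hours: float,
                              hours_per_day: float = 8.0) -> float:
    """Minutes of finished video produced per working day (8-hour day is an assumption)."""
    work_hours_per_video_hour = total_work_hours / video_hours
    return hours_per_day / work_hours_per_video_hour * 60


# Lower and upper bound of the estimate: 100 and 150 hours of work for 3 hours of video.
optimistic = video_minutes_per_workday(100, 3)
pessimistic = video_minutes_per_workday(150, 3)
print(round(optimistic, 1), round(pessimistic, 1))
```

Under these assumptions a full day of work yields somewhere between roughly ten and fifteen minutes of video, which fits the "or even less" end of the estimate.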

For me it is clear that you cannot do this type of work with a "twenty minutes here and half an hour there" attitude. You will have to make serious room in your calendar and embrace the fact that you will spend at least one day a week on the project. If you cannot do that, progress will be too slow and your project might slowly fade into oblivion.

2. Find something small to start with

When I first came into contact with A Cloud Guru, I was strongly advised not to start working on a large project right away but start with a smaller project first. For me this meant the opportunity to create a more use-case focused, ACG Project style video. This is a 25-minute video that takes the viewer along in creating a cool project or showing a specific capability in a single video. This allowed me to practice and improve a lot of the skills that I would need when creating a complete course.

The advantage of creating and completing something smaller first, compared to splitting a larger project into parts, is that you are getting the full experience. You will have to write a project proposal, align your style of working with that of your editor, write a script, record, get feedback, start over, create a final recording, get more feedback and finally go through the steps of completing a project: adding a description, link to related resources, putting the sources online, etc.

Having this complete experience for a smaller project before moving on to a larger project really helped me.

3. Quality, quality, quality and … some more quality

Creating courses or videos for a commercial platform is paid work. This means that there is a clear expectation that the work you perform will be of a certain quality. For example, I got a RØDE Podcaster delivered to my home to help with sound quality. However, just using a high-quality microphone was not enough. I edited every audio sample in both Audacity and Camtasia to ensure that the audio did not contain any background sound or hum that would distract students. Also, all the videos were recorded on a large screen, configured to use a resolution of only 1080p. Every single time, all excess windows needed to be closed, or a re-record was necessary. Every time, excess browser tabs needed to be closed, or a re-record was necessary. Browser windows and terminals needed the correct level of zoom, or a re-record was necessary. Favorites had to be removed and the browser cache cleared to ensure there were no pop-ups of previous entries, etc.

Conclusion? Producing high-quality content entails a tremendous number of details that you have to keep in mind every single time. Be prepared for that!

4. Be ready for (constructive) criticism

I feel really lucky in how my interactions with my editors at ACG went. Being on opposite sides of the world, most of the communication was offline through documents or spreadsheets and yet somehow, they managed to make all feedback feel friendly and constructive. In the occasional video call there was always time for a few minutes of pleasantries, before we got down to business. But yes, there were things to talk about. Feedback and criticism were frequent and very strict. I have re-edited some videos five or six times to meet the standards that I was supposed to meet. Especially on the quality of audio and video, there was a clear expectation of quality and that was rigorously verified by people who definitely had more experience than me.

All in all, I can say I have learned a lot recording these videos. But if you want to do this, be prepared and open to feedback.

5. Slides are cheap, demos are hard!

This is really a topic on its own and I think I will write more about it later. But if there is one thing I have learned, it is that demos are hard. Doing a demo on stage is hard, but much more forgiving than doing one in a video. When presenting, no one minds if your mouse cursor is floating around a bit, searching for that button. When doing a live demo, it is cool to see someone debug a typo in a command on the fly. When presenting, you can make a minor mistake, correct it and then explain what went wrong and how to handle that. The required level of perfection is not as high as for a recorded demo. And then there is the sound! I found it impossible to record the video and audio for my demos in one go and have developed my own approach to it, which I will write about some other time.

In my experience, if you must record five minutes of demo, it might take four to five times as long as recording a five-minute slide presentation.

6. A race against time

While recording your project, your subject might be changing. For example, while I was creating my training on Azure DevOps, multi-stage YAML builds were introduced, the user interface for Test Plans was changed, and several smaller features that I showed in my demos were removed, renamed or moved to another location. Honestly, there are parts that I have recorded multiple times due to the changes in Azure DevOps. Want more honesty? By the time the course went public, it was already outdated at some points. And yes, I know that I will have to update my course to include multi-stage YAML when it goes out of preview.

The point is, you will have to invest enough time every week in your project to ensure that your work in creating the course is not being overtaken by changes from the vendor. Software development, and cloud in particular, is changing at such a rate that you will have to plan for incoming changes and know how to adapt. Also, circling back, taking a ‘yes this is work’ attitude will help you spend enough time on your project, shortening its duration and decreasing the chance of being overtaken by changes.

Concluding

If you ever go down the path of creating an online training, I would recommend keeping the above in mind. Along with one final tip: make sure you enjoy doing it. One thing that I do know for sure now is that if I had not enjoyed my work on this training while having to do it next to my other work, I would never have finished it.

Oh by the way, more details on that second large project? That will have to wait a few more months I’m afraid.

 

Please note: I have written a follow-up to this blog post, detailing what is, in my opinion, a new and better approach.

One of the services in Azure that I enjoy most lately, is Azure Functions. Functions are great for writing the small single-purpose type of application that I write a lot nowadays. However, last week I was struggling with the configuration of bindings using an Azure Key Vault and I thought I’d share how to fix that.

When you create a new Azure Function for writing a message to a queue every minute, you might end up with something like the code below.

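using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.ServiceBus; // EntityType, as found in the Service Bus extension of that era
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;

public class DemoFunction
{
  private readonly ILogger<DemoFunction> _logger;
  private readonly IConfiguration _configuration;

  // Dependencies are supplied by the Functions runtime through constructor injection.
  public DemoFunction(ILogger<DemoFunction> logger, IConfiguration configuration)
  {
    _logger = logger;
    _configuration = configuration;
  }

  [FunctionName(nameof(DemoFunction))]
  public async Task Run(
    [TimerTrigger("0 */1 * * * *")] TimerInfo timer,
    [ServiceBus("%queueName%", Connection = "serviceBusConnectionString", EntityType = EntityType.Queue)]
      IAsyncCollector<string> serviceBusQueue)
  {
    // The number of messages to send on each run, read from configuration.
    var loopCount = int.Parse(_configuration["loopCount"]);

    for (var i = 0; i < loopCount; i++)
    {
      await serviceBusQueue.AddAsync(i.ToString());
    }

    // Make sure all collected messages are actually sent before the function completes.
    await serviceBusQueue.FlushAsync();
  }
}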

As you can see, I am using Functions V2 and the new approach to dependency injection using constructors and non-static functions. And this works great! Not being in a static context anymore is highly satisfying for an OOP programmer and it also means that I can retrieve dependencies like my logger and configuration through the constructor.

One of the things I am doing with my Function is pulling the name of the queue and the connection string to connect to that queue from my application settings. For a connection string this is the default, and for the name of a queue or topic I can do that by using a variable name enclosed in %-signs. After adding the correct three settings to my application settings, this function runs fine locally. My IConfiguration instance is automatically built and filled by the Functions runtime, and my queueName and serviceBusConnectionString settings are in my local.settings.json.
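For reference, a local.settings.json for this function might look roughly like the sketch below. The three setting names come from the code above; all values are placeholders, and the AzureWebJobsStorage and FUNCTIONS_WORKER_RUNTIME entries are assumptions about a typical local setup, not something copied from my project.

```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet",
    "queueName": "<your-queue-name>",
    "serviceBusConnectionString": "<your-service-bus-connection-string>",
    "loopCount": "10"
  }
}
```

Everything under Values is exposed to the function as configuration when running locally, which is why both the %queueName% binding expression and the _configuration["loopCount"] lookup can be resolved.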

The problem comes when trying to move this function to the cloud. Here I do not have a local.settings.json, nor do I want to have secrets in my application settings, the default location for the Functions runtime to pull its settings from. What I want to do is use an Azure Key Vault for storing my secrets and load any secrets from there.

It might be that my Google-fu has been failing, but unfortunately I have not found any hook or method that allows loading extra configuration values for an Azure Function. Integrating with the runtime was important for me, since I also wanted the configuration of my Function itself (such as its bindings) to come from this configuration, not just the values that are used inside my function.

Anyhow, what I ended up doing after a while of searching was the following:

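using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Azure.KeyVault;
using Microsoft.Azure.Services.AppAuthentication;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Configuration.AzureKeyVault;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// Registers the Startup class below with the Functions runtime.
[assembly: FunctionsStartup(typeof(Startup))]

public class Startup : FunctionsStartup
{
  public override void Configure(IFunctionsHostBuilder builder)
  {
    var services = builder.Services;

    // Resolve the hosting environment to find the content root and the environment name.
    var hostingEnvironment = services
      .BuildServiceProvider()
      .GetService<IHostingEnvironment>();

    var configurationBuilder = new ConfigurationBuilder()
      .SetBasePath(hostingEnvironment.ContentRootPath)
      .AddEnvironmentVariables();

    if (!hostingEnvironment.IsDevelopment())
    {
      // Build once to read the name of the Key Vault from the application settings.
      var currentConfiguration = configurationBuilder.Build();
      var tokenProvider = new AzureServiceTokenProvider();
      var kvClient = new KeyVaultClient((authority, resource, scope) =>
        tokenProvider.KeyVaultTokenCallback(authority, resource, scope));

      // Append the secrets from the Key Vault to the configuration.
      configurationBuilder
        .AddAzureKeyVault($"https://{currentConfiguration["keyVaultName"]}.vault.azure.net/",
          kvClient, new DefaultKeyVaultSecretManager());
    }

    // Register under IConfiguration so that it takes the place of the configuration built by the runtime.
    services.AddSingleton<IConfiguration>(configurationBuilder.Build());

    // More dependencies ...
  }
}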

The solution above will, if running in the cloud, use the Managed Identity of the Function App to pull the values from a Key Vault and append them to the configuration. It works like a charm; however, it feels a bit hacky to override the existing configuration this way. If you find a better way, please do let me know!

Last week I had the pleasure of attending the 4DotNet event in Zwolle, The Netherlands. Next to catching up with old friends, I enjoyed presenting and listening to two other talks.

Pat Hermens | Learning from Failure – Finding the ‘second story’

The first speaker up was Pat Hermens. Pat talked to us about failure. He went over three examples, exploring how a catastrophic event came to be and continuously returning to the question: was this a failure [by someone]? His point was that the answer was “no” in all these cases. Instead of focusing on what we believe, in hindsight, was a mistake or human error, we should focus on the circumstances that made such an error even possible. Assuming no ill intent, no one wants a nuclear meltdown or a space shuttle crash to occur – still they did, while everyone believed they were making correct decisions. Focusing on the second story, that is, the circumstances or culture that allowed the wrong decision to look like a good decision, is the way forward in his opinion.

Patrick Schmidt | Valkuilen bij het maken van high performance applicaties (Pitfalls when building high-performance applications)

Next up was Patrick Schmidt. Patrick talked about some .NET internals and explained how you can still create memory leaks in a managed language. He showed how creating a lambda function that uses a closure over a large object can end in memory leaks. He then moved on to explain some garbage collector internals and how careless object creation and destruction can ruin your performance. Of course the prime example of string concatenation vs. the StringBuilder came along here. Finally, he talked about some pitfalls when using Entity Framework: the N+1 problem and how you can accidentally download a whole table and only do the selection in memory by mixing up IQueryable and IEnumerable.
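To illustrate that last pitfall, here is a minimal sketch of my own; it is not taken from Patrick's slides. It uses an in-memory list wrapped with AsQueryable() as a hypothetical stand-in for a database table, so no SQL is involved, but it shows how the static type of the variable decides which Where is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database table; an Entity Framework DbSet<T> is also an IQueryable<T>.
IQueryable<int> table = new List<int> { 1, 2, 3, 4, 5 }.AsQueryable();

// Stays IQueryable: Queryable.Where records the filter as an expression tree,
// which a provider such as Entity Framework can translate into SQL.
IQueryable<int> filteredByProvider = table.Where(x => x > 3);

// Typed as IEnumerable: the compiler now binds to Enumerable.Where, so the
// filter runs in memory while all rows are enumerated from the source.
IEnumerable<int> everything = table;
IEnumerable<int> filteredInMemory = everything.Where(x => x > 3);

Console.WriteLine(filteredByProvider is IQueryable<int>); // True
Console.WriteLine(filteredInMemory is IQueryable<int>);   // False
```

Both queries return the same elements. The difference is where the filtering happens, which against a real database is the difference between fetching a few rows and fetching the whole table.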

Henry Been | Logging, instrumentation, dashboards, alerts and all that – for developers

For the final session I had the privilege of presenting myself. In this session I shared what I have learned about monitoring and logging over the last year when using Azure Monitor in a number of applications. The slide deck for this session can be downloaded. If you are looking for an example application to try things out yourself, you can continue working with the example I showed during the talk.

 

If you have read any of my blogs before, or know me even a little bit, you know I am a huge fan of ARM templates for Azure Resource Manager. However, every now and then I run into some piece of infrastructure that I would like to set up for my application, only to find out that it is not supported by ARM templates. Examples are Cosmos DB databases and collections. Having to createIfNotExists() these was always a pain to code and also mixed the responsibility of resource allocation up with business logic. But no more: as part of all the #MSBuild news, the following came in!

As of right now, you can specify the creation of a CosmosDB database and collection using ARM templates. To create a CosmosDB database for use with the SQL API, you can now use the following template:

{
    "type": "Microsoft.DocumentDB/databaseAccounts/apis/databases",
    "name": "accountName/sql/databaseName",
    "apiVersion": "2016-03-31",
    "properties": {
        "resource": {
            "id": "databaseName"
        },
        "options": {
            "throughput": 400
        }
    }
}

After setting up a database, it is time to add a few containers. In this case I already provisioned throughput at the database level, so I can add as many containers as I need without additional cost. But, let’s start with just one:

{
    "type": "Microsoft.DocumentDB/databaseAccounts/apis/databases/containers",
    "name": "accountName/sql/databaseName/containerName",
    "apiVersion": "2016-03-31",
    "dependsOn": [
        "[resourceId('Microsoft.DocumentDB/databaseAccounts/apis/databases', 'accountName', 'sql', 'databaseName')]"
    ],
    "properties":
    {
        "resource":{
            "id":  "containerName",
            "partitionKey": {
                "paths": [
                    "/PartitionKey"
                ],
                "kind": "Hash"
            },
            "indexingPolicy": {
                "indexingMode": "consistent",
                "includedPaths": [{
                        "path": "/*",
                        "indexes": [
                            {
                                "kind": "Range",
                                "dataType": "number",
                                "precision": -1
                            },
                            {
                                "kind": "Hash",
                                "dataType": "string",
                                "precision": -1
                            }
                        ]
                    }
                ]
            }
        }
    }
}

Not only can I create the container and specify the now-mandatory partition key, I can also specify custom indexing policies. Putting this together with the template that I already had for creating a CosmosDB account, I can now automatically create all the dependencies for my application using the following ARM template:

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "discriminator": {
      "type": "string",
      "minLength": 1
    }
  },
  "variables": {
      "accountName": "[concat(parameters('discriminator'), '-adc-demo')]",
      "databaseName": "myDatabase",
      "usersContainerName": "users",
      "customersContainerName": "customers"
  },
  "resources": [
    {
      "type": "Microsoft.DocumentDB/databaseAccounts",
      "name": "[variables('accountName')]",
      "apiVersion": "2016-03-31",
      "location": "[resourceGroup().location]",
      "kind": "GlobalDocumentDB", 
      "properties": {
        "databaseAccountOfferType": "Standard",
        "consistencyPolicy": {
          "defaultConsistencyLevel": "Session",
          "maxIntervalInSeconds": 5,
          "maxStalenessPrefix": 100
        },
        "name": "[variables('accountName')]"
      }
    },
    {
      "type": "Microsoft.DocumentDB/databaseAccounts/apis/databases",
      "name": "[concat(variables('accountName'), '/sql/', variables('databaseName'))]",
      "apiVersion": "2016-03-31",
      "dependsOn": [
        "[resourceId('Microsoft.DocumentDB/databaseAccounts', variables('accountName'))]"
      ], 
      "properties": { 
        "resource": { 
          "id": "[variables('databaseName')]"
        },
        "options": { 
          "throughput": "400" 
        } 
      }
    },
    {
      "type": "Microsoft.DocumentDB/databaseAccounts/apis/databases/containers",
      "name": "[concat(variables('accountName'), '/sql/', variables('databaseName'), '/', variables('usersContainerName'))]",
      "apiVersion": "2016-03-31", 
      "dependsOn": [
        "[resourceId('Microsoft.DocumentDB/databaseAccounts/apis/databases', variables('accountName'), 'sql', variables('databaseName'))]"
      ], 
      "properties": { 
        "resource": {
          "id": "[variables('usersContainerName')]", 
          "partitionKey": {
            "paths": [
               "/CustomerId"
            ],
             "kind": "Hash"
          }, 
          "indexingPolicy": {
            "indexingMode": "consistent", 
            "includedPaths": [
              { 
                "path": "/*", 
                "indexes": [
                  { "kind": "Range", 
                    "dataType": "number", 
                    "precision": -1
                  },
                  { 
                    "kind": "Hash", 
                    "dataType": "string", 
                    "precision": -1
                  }
                ]
              }
            ]
          }
        } 
      }
     },
     {
       "type": "Microsoft.DocumentDB/databaseAccounts/apis/databases/containers",
       "name": "[concat(variables('accountName'), '/sql/', variables('databaseName'), '/', variables('customersContainerName'))]",
       "apiVersion": "2016-03-31", 
       "dependsOn": [
         "[resourceId('Microsoft.DocumentDB/databaseAccounts/apis/databases', variables('accountName'), 'sql', variables('databaseName'))]"
       ], 
       "properties": { 
         "resource": {
           "id": "[variables('customersContainerName')]", 
           "partitionKey": {
             "paths": [
                "/City"
             ],
              "kind": "Hash"
           }, 
           "indexingPolicy": {
             "indexingMode": "consistent", 
             "includedPaths": [
               { 
                 "path": "/*", 
                 "indexes": [
                   { "kind": "Range", 
                     "dataType": "number", 
                     "precision": -1
                   },
                   { 
                     "kind": "Hash", 
                     "dataType": "string", 
                     "precision": -1
                   }
                 ]
               }
             ]
           }
         } 
       }
      }
  ]
}
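When deploying this template, the discriminator parameter has to be supplied, for example through a parameters file. The sketch below assumes a made-up value of contoso, which would result in an account named contoso-adc-demo; pick your own value and keep it lowercase so it yields a valid account name.

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "discriminator": {
      "value": "contoso"
    }
  }
}
```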

I hope you enjoy CosmosDB database and collection support just as much as I do, happy coding!

Following up on my previous post on this subject (https://www.henrybeen.nl/add-a-ssl-certificate-to-your-azure-web-app-using-an-arm-template/), I am sharing a minimal yet complete working example of an ARM template that can be used to provision the following:

  • An App Service Plan
  • An App Service, with:
    • A custom domain name
    • The Let's Encrypt Site extension installed
    • All configuration of the Let's Encrypt Site extension prefilled
  • An Authorization Rule for a Service Principal to install certificates

The ARM template can be found at: https://github.com/henrybeen/ARM-template-AppService-LetsEncrypt

To use this to create a Web App with a Let's Encrypt certificate and to automatically renew that certificate, you have to do the following:

  • Pre-create a new Service Principal in your Azure Active Directory and obtain the objectId, clientId and a clientSecret for that Service Principal
  • Fill in the parameters.json file with a discriminator to make the names of your resources unique, the obtained objectId, clientId and clientSecret, a self-chosen GUID to use as the authorizationRule name and a customHostname
  • Create a CNAME record pointing from that domain name to the following URL: {discriminator}-appservice.azurewebsites.net
  • Roll out the template
  • Open up the Let's Encrypt extension, find all settings prefilled and request a certificate!

This is a subject that I have been wanting to write about for a while now and I am happy to say that I finally found the time while in an airplane*. I think I first discussed this question with Wouter de Kort at the start of this year (2019). Since then he has written an interesting blog post on how to structure your DTAP streets in Azure DevOps. His advice is to structure your environments as a pipeline that allows your code to flow only from source control to production, via other environments. Very sound advice, but next to that I wonder: do you need all those environments?

Traditionally a lot of us have been creating a number of environments, and for good reasons. One for developers to deploy to and test their own work. One for testers to execute automated tests and do exploratory testing. And only when a release is of sufficient quality is it promoted to the next environment: acceptance. Here the release is validated one final time in a production-like environment, before it is pushed to production. The value that an acceptance environment is supposed to add is that it is as production-like as possible. It is often connected to other live systems to test integrations, whereas a test environment might be connected to stubs or not connected at all.

Drawbacks of multiple environments

However, creating and maintaining all these environments also comes with drawbacks:

  • It’s easy to see that your costs increase with the number of environments. Not necessarily in setting the environment up and maintaining it (since that is all done through IaC, right?), but we still have to pay for the resources we use, and that can be serious money.
  • Since our code and the value it delivers can only flow to production after it has gone through all the other environments, the number of environments has an impact on how quickly we can deliver our code to production.

All these drawbacks are a plea for limiting the number of environments to as few as possible. Now with that in mind, do you really need an acceptance environment? I am going to argue that you might not. Especially when I hear things like: “Let’s go to acceptance quickly, so we do not have to wait another two days before we can go to production,” I die a little on the inside.

Why you might not need acceptance

So let’s go over some reasons for having an acceptance environment and see if we can make these redundant.

No wait, we do need an acceptance environment where our customer can explore the new features and accept their working, before we release them to all users.

While I hope that you involve your customer and other stakeholders before you have a finalized product, there can be value in having customers explicitly approve the release of features to users. However, is it really necessary to do this from an acceptance environment? Have you considered using feature toggles? This way you can release your code to the production environment and allow only your customer access to this new feature. Only after they approve do you open the feature up to more users. In other words, if we can ensure that shipping the new binaries to production does not automatically entail the release of new features, we do not need an acceptance environment for final feature acceptance by the client. More information on feature flags (also called feature toggles) can be found here.

We need a production-like environment to do final tests

Trust me, there is no place like home. And for your code, production is home. The only way to truly validate your code and the value it should bring is by running it in production. An acceptance environment, even if it is more integrated with other systems and has more realistic data than a test environment, does not compare to production. You cannot fully predict what your users will do with your features, estimate the impact of real-world usage or foresee all deviating scenarios. Here again, feature flags are a much better approach: they let you progressively open up a new feature to more and more users. And if issues show up, just stop the rollout for a bit or even reverse it while you are shipping a fix.
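To make the idea of progressively opening up a feature a bit more concrete, here is a minimal sketch. It is not the API of any particular feature flag library: BucketFor, IsEnabled and the user ids are hypothetical names that I made up for this illustration. Each user is mapped to a stable bucket from 0 to 99, and the feature is on for users on an explicit early-access list and for everyone whose bucket falls below the current rollout percentage.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps a user id to a stable bucket from 0 to 99.
// A hand-rolled FNV-1a-style hash is used because string.GetHashCode()
// is not guaranteed to return the same value across processes.
static int BucketFor(string userId)
{
    unchecked
    {
        uint hash = 2166136261;
        foreach (char c in userId)
        {
            hash ^= c;
            hash *= 16777619;
        }
        return (int)(hash % 100);
    }
}

// The feature is on for explicitly listed users (for example the customer who
// has to approve it) and for everyone whose bucket is below the percentage.
static bool IsEnabled(string userId, ISet<string> earlyAccess, int rolloutPercentage) =>
    earlyAccess.Contains(userId) || BucketFor(userId) < rolloutPercentage;

var approvers = new HashSet<string> { "customer-admin" };

Console.WriteLine(IsEnabled("customer-admin", approvers, 0)); // True: explicitly listed
Console.WriteLine(IsEnabled("some-user", approvers, 0));      // False: 0% of the other users
Console.WriteLine(IsEnabled("some-user", approvers, 100));    // True: fully rolled out
```

Because the bucket of a user never changes, raising the percentage only ever adds users, and lowering it again reverses the rollout without shipping new binaries.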

Now, do you think you can go without an acceptance environment? And if not, please let me know why not and I might just add a counter argument.

Now, while I do realize that the above does not hold for every organization and every development team, I would definitely recommend that you keep challenging yourself on the number of environments you need and whether you can reduce that number.

 

(*) An airplane, where there is absolutely nothing else to do, is an environment I bet we all find inspiring!