I am a fan of private agents when working with Azure Pipelines, compared to the hosted Azure Pipelines agents that are also available. In my experience the hosted agents can have a delay before they start and sometimes work slowly in comparison with dedicated resources. Of course this is just an opinion. For this reason, I have been running my private agents on a virtual machine for years. Unfortunately, this solution is not perfect and has a big downside as well: no isolation between jobs.

No isolation between different jobs that are executed on the same agent means that all files left over from an earlier job are available to any job currently running. It also means that it is possible to pollute NuGet caches, change files on the system, and so on. And of course, running virtual machines in general is cumbersome due to patching, updates and all the operational risks and responsibilities.

So it was time to adopt one of the biggest revolutions in IT: containers. In this blog I will share how I created a Docker container image that hosts an Azure Pipelines agent and how to run a number of those containers within Azure Container Instances. As a starting point I have taken the approach that Microsoft has laid down in its documentation, but I have made a number of tweaks and made my solution more complete. Doing so, I wanted to achieve the following:

  1. Have my container instances execute only a single job and then terminate and restart, ensuring nothing from a running job will become available to the next job.
  2. Have a flexible number of containers running that I can change frequently and with a single push on a button.
  3. Have a source code repository for my container image build, including a pipelines.yml that allows me to build and publish new container images on a weekly schedule.
  4. Have a pipeline that I can use to roll-out 1..n agents to Azure Container Instances – depending on the amount I need at that time.
  5. Do not have the PAT token that is needed for (de)registering agents available to anyone using the agent image.
  6. Automatically handle the registration of agents, as soon as a new container instance becomes available.

Besides the step-by-step instructions below I have also uploaded the complete working solution to GitHub at https://github.com/henrybeen/ContainerizedBuildAgents.

Let’s go!

Creating the container image

My journey started with reading the Microsoft documentation on creating a Windows container image with a Pipelines Agent. You can find this documentation at https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/docker?view=azure-devops. I found two downsides to this approach for my use case. First, the PAT token that is used for (de)registering the agent stays available during the complete lifetime of the container. This means that everyone executing jobs on that agent can pick up the PAT token and abuse it. Secondly, the agent is downloaded and unpacked at runtime, which means that the agent is slow to spin up.

To work around these downsides I started with splitting the Microsoft provided script into two parts, starting with a file called Build.ps1 as shown below.

param (
    [Parameter(Mandatory=$true)]
    [string]
    $AZDO_URL,

    [Parameter(Mandatory=$true)]
    [string]
    $AZDO_TOKEN
)

if (-not $(Test-Path "agent.zip" -PathType Leaf))
{
    Write-Host "1. Determining matching Azure Pipelines agent..." -ForegroundColor Cyan

    $base64AuthInfo = [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$AZDO_TOKEN"))
    $package = Invoke-RestMethod -Headers @{Authorization=("Basic $base64AuthInfo")} "$AZDO_URL/_apis/distributedtask/packages/agent?platform=win-x64&`$top=1"
    $packageUrl = $package[0].Value.downloadUrl
    Write-Host "Package URL: $packageUrl"

    Write-Host "2. Downloading the Azure Pipelines agent..." -ForegroundColor Cyan

    $wc = New-Object System.Net.WebClient
    $wc.DownloadFile($packageUrl, "$(Get-Location)\agent.zip")
}
else {
    Write-Host "1-2. Skipping downloading the agent, found an agent.zip right here" -ForegroundColor Cyan
}

Write-Host "3. Unzipping the Azure Pipelines agent" -ForegroundColor Cyan
Expand-Archive -Path "agent.zip" -DestinationPath "agent"

Write-Host "4. Building the image" -ForegroundColor Cyan
docker build -t docker-windows-agent:latest .

Write-Host "5. Cleaning up" -ForegroundColor Cyan
Remove-Item "agent" -Recurse

The script downloads the latest agent, if agent.zip does not exist yet, and unzips that file. Once the agent is in place, the Docker image is built using the call to docker build. Once completed, the unpacked agent folder is removed, just to keep things tidy. This clean-up also allows for rerunning the script from the same directory multiple times without warnings or errors. A fast feedback loop is also the reason I test for the existence of agent.zip before downloading it.

The next file to create is the Dockerfile. The changes compared to the Microsoft example are minimal. As you can see, I also copy over the agent binaries, so I do not have to download them anymore when the container runs.

FROM mcr.microsoft.com/windows/servercore:ltsc2019

WORKDIR c:/azdo/work

WORKDIR c:/azdo/agent
COPY agent .

WORKDIR c:/azdo
COPY Start-Up.ps1 .

CMD powershell c:/azdo/Start-Up.ps1

First we ensure that the directory c:\azdo\work exists by setting it as the working directory. Next we move to the directory that will contain the agent files and copy those over. Finally, we move one directory up and copy the Start-Up script over. To run the container image, a call into that script is made. So, let’s explore Start-Up.ps1 next.

if (-not (Test-Path Env:AZDO_URL)) {
  Write-Error "error: missing AZDO_URL environment variable"
  exit 1
}

if (-not (Test-Path Env:AZDO_TOKEN)) {
  Write-Error "error: missing AZDO_TOKEN environment variable"
  exit 1
}

if (-not (Test-Path Env:AZDO_POOL)) {
  Write-Error "error: missing AZDO_POOL environment variable"
  exit 1
}

if (-not (Test-Path Env:AZDO_AGENT_NAME)) {
  Write-Error "error: missing AZDO_AGENT_NAMEenvironment variable"
  exit 1
}

$Env:VSO_AGENT_IGNORE = "AZDO_TOKEN"

Set-Location c:\azdo\agent

Write-Host "1. Configuring Azure Pipelines agent..." -ForegroundColor Cyan

.\config.cmd --unattended `
  --agent "${Env:AZDO_AGENT_NAME}" `
  --url "${Env:AZDO_URL}" `
  --auth PAT `
  --token "${Env:AZDO_TOKEN}" `
  --pool "${Env:AZDO_POOL}" `
  --work "c:\azdo\work" `
  --replace

Remove-Item Env:AZDO_TOKEN

Write-Host "2. Running Azure Pipelines agent..." -ForegroundColor Cyan

.\run.cmd --once

The script first checks for the existence of four mandatory environment variables. I will provide these later on from Azure Container Instances, where we are going to run the image. Since the PAT token is passed in through an environment variable, we set VSO_AGENT_IGNORE to ensure that this variable is not listed or exposed by the agent as a capability, even though we will also unset it later on. From here on, the configuration of the agent is started with a series of command-line arguments that allow for a headless registration of the agent with the correct agent pool.

It is good to know that I am using a non-random name on purpose. This allows me to re-use the same agent name (per Azure Container Instances instance), which prevents an ever-increasing list of offline agents in my pool. This is also the reason I have to add the --replace argument. Omitting this would cause the registration to fail.

Finally, we run the agent, specifying the --once argument. This argument ensures that the agent picks up only a single job and terminates once that job is complete. Since this is the final command in the PowerShell script, this also terminates the script. And since this is the only CMD specified in the Dockerfile, this also terminates the container.

This way my container will only ever execute one job, so that the side effects of any job cannot propagate to the next job.

Once these files exist, it is time to execute the following from the PowerShell command-line.

PS C:\src\docker-windows-agent> .\Build.ps1 -AZDO_URL https://dev.azure.com/**sorry** -AZDO_TOKEN **sorry**
1-2. Skipping downloading the agent, found an agent.zip right here
3. Unzipping the Azure Pipelines agent
4. Building the image
Sending build context to Docker daemon  542.8MB
Step 1/7 : FROM mcr.microsoft.com/windows/servercore:ltsc2019
 ---> 782a75e44953
Step 2/7 : WORKDIR c:/azdo/work
 ---> Using cache
 ---> 24bddc56cd65
Step 3/7 : WORKDIR c:/azdo/agent
 ---> Using cache
 ---> 03357f1f229b
Step 4/7 : COPY agent .
 ---> Using cache
 ---> 110cdaa0a167
Step 5/7 : WORKDIR c:/azdo
 ---> Using cache
 ---> 8d86d801c615
Step 6/7 : COPY Start-Up.ps1 .
 ---> Using cache
 ---> 96244870a14c
Step 7/7 : CMD powershell Start-Up.ps1
 ---> Running in f60c3a726def
Removing intermediate container f60c3a726def
 ---> bba29f908219
Successfully built bba29f908219
Successfully tagged docker-windows-agent:latest
5. Cleaning up

and…

docker run -e AZDO_URL=https://dev.azure.com/**sorry** -e AZDO_POOL=SelfWindows -e AZDO_TOKEN=**sorry** -e AZDO_AGENT_NAME=Agent007 -t docker-windows-agent:latest

1. Configuring Azure Pipelines agent...

  ___                      ______ _            _ _
 / _ \                     | ___ (_)          | (_)
/ /_\ \_____   _ _ __ ___  | |_/ /_ _ __   ___| |_ _ __   ___  ___
|  _  |_  / | | | '__/ _ \ |  __/| | '_ \ / _ \ | | '_ \ / _ \/ __|
| | | |/ /| |_| | | |  __/ | |   | | |_) |  __/ | | | | |  __/\__ \
\_| |_/___|\__,_|_|  \___| \_|   |_| .__/ \___|_|_|_| |_|\___||___/
                                   | |
        agent v2.163.1             |_|          (commit 0a6d874)


>> Connect:

Connecting to server ...

>> Register Agent:

Scanning for tool capabilities.
Connecting to the server.
Successfully added the agent
Testing agent connection.
2019-12-29 09:45:39Z: Settings Saved.
2. Running Azure Pipelines agent...
Scanning for tool capabilities.
Connecting to the server.
2019-12-29 09:45:47Z: Listening for Jobs
2019-12-29 09:45:50Z: Running job: Agent job 1
2019-12-29 09:47:19Z: Job Agent job 1 completed with result: Success

This shows that the container image can be built and that running it allows me to execute jobs on the agent. Let’s move on to creating the infrastructure within Azure that is needed for running the images.

Creating the Azure container registry

As I usually do, I created a quick ARM template for provisioning the Azure Container Registry. Below is the resource part.

{
    "name": "[variables('acrName')]",
    "type": "Microsoft.ContainerRegistry/registries",
    "apiVersion": "2017-10-01",
    "location": "[resourceGroup().location]",
    "sku": {
        "name": "Basic"
    },
    "properties": {
        "adminUserEnabled": true
    }
}

Creating a container registry is fairly straightforward: specifying a sku and enabling the creation of an administrative account is enough. Once this is done, we roll this template out to a resource group called Tools, tag the image against the new registry, authenticate, and push the image.

PS C:\src\docker-windows-agent> New-AzureRmResourceGroupDeployment -ResourceGroupName Tools -TemplateFile .\acr.json -TemplateParameterFile .\acr.henry.json

DeploymentName          : acr
ResourceGroupName       : Tools
ProvisioningState       : Succeeded
Timestamp               : 29/12/2019 20:07:25
Mode                    : Incremental
TemplateLink            :
Parameters              :
                          Name             Type                       Value
                          ===============  =========================  ==========
                          discriminator    String                     hb

Outputs                 :
DeploymentDebugLogLevel :

PS C:\src\docker-windows-agent> az acr login --name hbazdoagentacr
Unable to get AAD authorization tokens with message: An error occurred: CONNECTIVITY_REFRESH_TOKEN_ERROR
Access to registry 'hbazdoagentacr.azurecr.io' was denied. Response code: 401. Please try running 'az login' again to refresh permissions.
Unable to get admin user credentials with message: The resource with name 'hbazdoagentacr' and type 'Microsoft.ContainerRegistry/registries' could not be found in subscription 'DevTest (a314c0b2-589c-4c47-a565-f34f64be939b)'.
Username: hbazdoagentacr
Password:
Login Succeeded

PS C:\src\docker-windows-agent> docker tag docker-windows-agent hbazdoagentacr.azurecr.io/docker-windows-agent

PS C:\src\docker-windows-agent> docker push hbazdoagentacr.azurecr.io/docker-windows-agent
The push refers to repository [hbazdoagentacr.azurecr.io/docker-windows-agent]
cb08be0defff: Pushed
1ed2626efafb: Pushed
f6c3f9abc3b8: Pushed
770769339f15: Pushed
8c014918cbca: Pushed
c57badbbe459: Pushed
963f095be1ff: Skipped foreign layer
c4d02418787d: Skipped foreign layer
latest: digest: sha256:7972be969ce98ee8c841acb31b5fbee423e1cd15787a90ada082b24942240da6 size: 2355

Now that all the scripts and templates are proven, let’s automate this through a pipeline. The following YAML is enough to build and publish the container.

trigger:
  branches:
    include: 
      - master
  paths:
    include: 
      - agent
    
pool:
  name: Azure Pipelines
  vmImage: windows-2019

workspace:
    clean: all

steps:
  - task: PowerShell@2
    displayName: 'PowerShell Script'
    inputs:
      targetType: filePath
      workingDirectory: agent
      filePath: ./agent/Build.ps1
      arguments: -AZDO_URL https://dev.azure.com/azurespecialist -AZDO_TOKEN $(AZDO_TOKEN)

  - task: Docker@2
    displayName: Build and push image to container registry
    inputs:
      command: buildAndPush
      containerRegistry: BuildAgentsAcr
      repository: docker-windows-agent
      Dockerfile: agent/Dockerfile
      tags: latest
      buildContext: agent

To make the above work, two more changes are needed:

  1. The Build.ps1 needs to be changed a bit: just remove steps 4 and 5 (see the sketch after this list)
  2. A service connection to the ACR has to be made with the name BuildAgentsAcr
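
As a reference, here is a minimal sketch of what the trimmed-down Build.ps1 could look like; it keeps steps 1 to 3 from the original script and leaves building the image to the Docker@2 task:

param (
    [Parameter(Mandatory=$true)]
    [string]
    $AZDO_URL,

    [Parameter(Mandatory=$true)]
    [string]
    $AZDO_TOKEN
)

if (-not $(Test-Path "agent.zip" -PathType Leaf))
{
    Write-Host "1. Determining matching Azure Pipelines agent..." -ForegroundColor Cyan

    # query Azure DevOps for the latest win-x64 agent package
    $base64AuthInfo = [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$AZDO_TOKEN"))
    $package = Invoke-RestMethod -Headers @{Authorization=("Basic $base64AuthInfo")} "$AZDO_URL/_apis/distributedtask/packages/agent?platform=win-x64&`$top=1"
    $packageUrl = $package[0].Value.downloadUrl

    Write-Host "2. Downloading the Azure Pipelines agent..." -ForegroundColor Cyan

    $wc = New-Object System.Net.WebClient
    $wc.DownloadFile($packageUrl, "$(Get-Location)\agent.zip")
}
else {
    Write-Host "1-2. Skipping downloading the agent, found an agent.zip right here" -ForegroundColor Cyan
}

Write-Host "3. Unzipping the Azure Pipelines agent" -ForegroundColor Cyan
Expand-Archive -Path "agent.zip" -DestinationPath "agent"

# Steps 4 (docker build) and 5 (clean-up) are removed here;
# the Docker@2 task in the pipeline builds and pushes the image instead.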

With the container image available in an ACR and a repeatable process to get updates out, it is time to create the Azure Container Instances that are going to run the image.

Creating the Azure container instance(s)

Again, I am using an ARM template for creating the Azure Container Instances.

{
    "copy": {
        "name": "acrCopy", 
        "count": "[parameters('numberOfAgents')]" 
    },
    "name": "[concat(variables('aciName'), copyIndex())]",
    "type": "Microsoft.ContainerInstance/containerGroups",
    "apiVersion": "2018-10-01",
    "location": "[resourceGroup().location]",
    "properties": {
        "imageRegistryCredentials": [{
            "server": "[reference(resourceId('Microsoft.ContainerRegistry/registries',variables('acrName'))).loginServer]",
            "username": "[listCredentials(resourceId('Microsoft.ContainerRegistry/registries',variables('acrName')),'2017-03-01').username]",
            "password": "[listCredentials(resourceId('Microsoft.ContainerRegistry/registries',variables('acrName')),'2017-03-01').passwords[0].value]"
        }], 
        "osType": "Windows", 
        "restartPolicy": "Always",
        "containers": [{
            "name": "[concat('agent-', copyIndex())]",
            "properties": {
                "image": "[parameters('containerImageName')]",
                "environmentVariables": [
                    {
                        "name": "AZDO_URL",
                        "value": "[parameters('azdoUrl')]"
                    },
                    {
                        "name": "AZDO_TOKEN",
                        "secureValue": "[parameters('azdoToken')]"
                    },
                    {
                        "name": "AZDO_POOL",
                        "value": "[parameters('azdoPool')]"
                    },
                    {
                        "name": "AZDO_AGENT_NAME",
                        "value": "[concat('agent-', copyIndex())]"
                    }
                ],
                "resources": {
                    "requests": {
                        "cpu": "[parameters('numberOfCpuCores')]",
                        "memoryInGB": "[parameters('numberOfMemoryGigabytes')]"
                    }
                }
            }
        }]
    }
}

At the start of the resource definition, we specify that we want multiple copies of this resource, not just one. The actual number of resources is specified using a template parameter. This construct allows us to specify the number of agents that we want to run in parallel in ACI instances every time this template is redeployed. If we combine this with a complete deployment mode, the result is that agents in excess of that number get removed automatically as well. When providing the name of the ACI instance, we concatenate the name with the outcome of the function copyIndex(). This function returns an integer, specifying in which iteration of the copy loop the template currently is. This way, unique names are generated for all the resources.

As with most modern resources, the container instance takes the default resource properties in the root object and contains another object called properties that holds all the resource-specific configuration. Here we first have to specify the imageRegistryCredentials. These are the details needed to connect ACI to the ACR for pulling the images. I am using ARM template syntax to automatically fetch and insert the values, without ever looking at them.

The next interesting property is the restartPolicy. The value Always instructs ACI to automatically restart my image whenever it completes running, whether that is with an error or successfully. This way, whenever the agent has run a single job and the container completes, it gets restarted within seconds.

In the second properties object, a reference to the container image, the environment variables and the container resources are specified. The values here should be self-explanatory, and of course the complete ARM template with all parameters and other plumbing is available on the GitHub repository.

With the ARM template done and ready to go, let’s create another pipeline – now using the following YAML.

trigger:
  branches:
    include: 
      - master
  paths:
    include: 
      - aci
    
pool:
  name: Azure Pipelines
  vmImage: windows-2019

workspace:
    clean: all

steps:
  - task: AzureResourceGroupDeployment@2
    displayName: 'Create or update ACI instances'
    inputs:
      azureSubscription: 'RG-BuildAgents'
      resourceGroupName: 'BuildAgents'
      location: 'West Europe'
      csmFile: 'aci/aci.json'
      csmParametersFile: 'aci/aci.henry.json'
      deploymentMode: Complete
      overrideParameters: '-azdoToken "$(AZDO_TOKEN)"'

Nothing new in here really. This pipeline just executes the ARM template against another Azure resource group. Running this pipeline succeeds and a few minutes later, I have the following ACI instances.
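
The result can also be checked from the command line; for example, a quick query with the Azure CLI (assuming the resource group name BuildAgents used by the pipeline above) lists the container groups:

# list the container groups created by the deployment
az container list --resource-group BuildAgents --output table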

And that means I have the following agents in my pool.

All in all: it works! When scaling the number of agents up or down, the agent names stay the same, and the number of offline agents in my pool is at most the highest number of agents I have scaled up to at any point in time.
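
Re-running the pipeline with a different value for numberOfAgents is all it takes to scale. The same deployment can also be done ad hoc from PowerShell; a sketch, assuming the resource group and file names used above and a placeholder PAT token:

# after changing numberOfAgents in aci.henry.json, re-run the deployment in complete mode
New-AzureRmResourceGroupDeployment `
  -ResourceGroupName "BuildAgents" `
  -Mode Complete `
  -TemplateFile .\aci\aci.json `
  -TemplateParameterFile .\aci\aci.henry.json `
  -azdoToken "<PAT>"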

And with this, I have created a flexible, easy-to-scale, setup for running Azure Pipelines jobs in isolation, in containers!

Downsides

Of course, this solution is not perfect and a number of downsides remain. I have found the following:

  1. More work
  2. No caching

More work. The largest drawback that I have found with this approach is that it is more work to set up. Creating a virtual machine that hosts one or more private agents can be done in a few hours, while the above has taken me well over a day to figure out (granted, I was new to containers).

No caching. So slower again. With every job running in isolation, many of the caching benefits that come with using a private agent in a virtual machine are gone again. Now these can be overcome by building a more elaborate image, including more of the different SDKs and tools by default. But still, complete sources have to be downloaded every time and there will be no intermediate results being cached for free.

If you find more downsides, have identified a flaw, or like this approach, please do let me know!

But still, this approach works for me and I will continue to use it for a while. Also, I have learned a thing or two about working with containers, yay!

If you have read any of my blogs before, or know me only a little bit, you know I am a huge fan of ARM templates for Azure Resource Manager. However, every now and then I run into some piece of infrastructure that I would like to set up for my application, only to find out that it is not supported by ARM templates. Examples are Cosmos DB databases and collections. Having to createIfNotExists() these was always a pain to code and it also mixes the responsibility of resource allocation up with business logic. But no more! As part of all the #MSBuild news, the following came in:

As of right now, you can specify the creation of a CosmosDB database and collection using ARM templates. To create a CosmosDB database for use with the SQL API, you can now use the following template:

{
    "type": "Microsoft.DocumentDB/databaseAccounts/apis/databases",
    "name": "accountName/sql/databaseName",
    "apiVersion": "2016-03-31",
    "properties": {
        "resource": {
            "id": "databaseName"
        },
        "options": {
            "throughput": 400
        }
    }
}

After setting up a database, it is time to add a few containers. In this case I already provisioned throughput at the database level, so I can add as many containers as I need without additional cost. But, let’s start with just one:

{
    "type": "Microsoft.DocumentDb/databaseAccounts/apis/databases/containers",
    "name": "accountName/sql/databasename/containername",
    "apiVersion": "2016-03-31",
    "dependsOn": [ 
        "[resourceId('Microsoft.DocumentDB/databaseAccounts/apis/databases/accountName/sql/databaseName')]"
    ],
    "properties":
    {
        "resource":{
            "id":  "containerName",
            "partitionKey": {
                "paths": [
                    "/PartitionKey"
                ],
                "kind": "Hash"
            },
            "indexingPolicy": {
                "indexingMode": "consistent",
                "includedPaths": [{
                        "path": "/*",
                        "indexes": [
                            {
                                "kind": "Range",
                                "dataType": "number",
                                "precision": -1
                            },
                            {
                                "kind": "Hash",
                                "dataType": "string",
                                "precision": -1
                            }
                        ]
                    }
                ]
            }
        }
    }
}

I can not only create the container and specify the (now mandatory) partition key, but also specify custom indexing policies. Putting this together with the template that I already had for creating a CosmosDB account, I can now automatically create all the dependencies for my application using the following ARM template:

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "discriminator": {
      "type": "string",
      "minLength": 1
    }
  },
  "variables": {
      "accountName": "[concat(parameters('discriminator'), '-adc-demo')]",
      "databaseName": "myDatabase",
      "usersContainerName": "users",
      "customersContainerName": "customers"
  },
  "resources": [
    {
      "type": "Microsoft.DocumentDB/databaseAccounts",
      "name": "[variables('accountName')]",
      "apiVersion": "2016-03-31",
      "location": "[resourceGroup().location]",
      "kind": "GlobalDocumentDB", 
      "properties": {
        "databaseAccountOfferType": "Standard",
        "consistencyPolicy": {
          "defaultConsistencyLevel": "Session",
          "maxIntervalInSeconds": 5,
          "maxStalenessPrefix": 100
        },
        "name": "[variables('accountName')]"
      }
    },
    {
      "type": "Microsoft.DocumentDB/databaseAccounts/apis/databases", 
      "name": "[concat(variables('accountName'), '/sql/', variables('databaseName'))]", 
      "apiVersion": "2016-03-31",
      "dependsOn": [
        "[resourceId('Microsoft.DocumentDB/databaseAccounts/', variables('accountName'))]"
      ], 
      "properties": { 
        "resource": { 
          "id": "[variables('databaseName')]"
        },
        "options": { 
          "throughput": "400" 
        } 
      }
    },
    { 
      "type": "Microsoft.DocumentDb/databaseAccounts/apis/databases/containers",
      "name": "[concat(variables('accountName'), '/sql/', variables('databasename'), '/', variables('usersContainerName'))]", 
      "apiVersion": "2016-03-31", 
      "dependsOn": [
        "[resourceId('Microsoft.DocumentDB/databaseAccounts/apis/databases', variables('accountName'), 'sql', variables('databaseName'))]"
      ], 
      "properties": { 
        "resource": {
          "id": "[variables('usersContainerName')]", 
          "partitionKey": {
            "paths": [
               "/CustomerId"
            ],
             "kind": "Hash"
          }, 
          "indexingPolicy": {
            "indexingMode": "consistent", 
            "includedPaths": [
              { 
                "path": "/*", 
                "indexes": [
                  { "kind": "Range", 
                    "dataType": "number", 
                    "precision": -1
                  },
                  { 
                    "kind": "Hash", 
                    "dataType": "string", 
                    "precision": -1
                  }
                ]
              }
            ]
          }
        } 
      }
     },
     { 
       "type": "Microsoft.DocumentDb/databaseAccounts/apis/databases/containers",
       "name": "[concat(variables('accountName'), '/sql/', variables('databasename'), '/', variables('customersContainerName'))]", 
       "apiVersion": "2016-03-31", 
       "dependsOn": [
         "[resourceId('Microsoft.DocumentDB/databaseAccounts/apis/databases', variables('accountName'), 'sql', variables('databaseName'))]"
       ], 
       "properties": { 
         "resource": {
           "id": "[variables('customersContainerName')]", 
           "partitionKey": {
             "paths": [
                "/City"
             ],
              "kind": "Hash"
           }, 
           "indexingPolicy": {
             "indexingMode": "consistent", 
             "includedPaths": [
               { 
                 "path": "/*", 
                 "indexes": [
                   { "kind": "Range", 
                     "dataType": "number", 
                     "precision": -1
                   },
                   { 
                     "kind": "Hash", 
                     "dataType": "string", 
                     "precision": -1
                   }
                 ]
               }
             ]
           }
         } 
       }
      }
  ]
}
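
Deploying this template works the same as for any other ARM template; for example, from PowerShell (the resource group name and file name below are placeholders):

# deploy the CosmosDB account, database and containers in one go
New-AzureRmResourceGroupDeployment `
  -ResourceGroupName "cosmos-demo" `
  -TemplateFile .\cosmosdb.json `
  -discriminator "hb"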

I hope you enjoy CosmosDB database and collection support just as much as I do, happy coding!

In an earlier post on provisioning a Let’s Encrypt SSL certificate to a Web App, I touched upon the subject of creating an RBAC Role Assignment using an ARM template. In that post I said that I wasn’t able to provision a Role Assignment to just a single resource (as opposed to a whole resource group). This week I found out that this was due to an error on my side. The template for provisioning a Role Assignment for just a single resource differs from that for provisioning one for a whole resource group.

Here is the correct JSON for provisioning a Role Assignment to a single resource:

    {
      "type": "Microsoft.Web/sites/providers/roleAssignments",
      "apiVersion": "2015-07-01",
      "name": "[concat(parameters('appServiceName'), '/Microsoft.Authorization/', parameters('appServiceContributerRoleGuid'))]",
      "dependsOn": [
        "[resourceId('Microsoft.Web/Sites', parameters('appServiceName'))]"
      ],
      "properties": {
        "roleDefinitionId": "[concat('/subscriptions/', subscription().subscriptionId, '/providers/Microsoft.Authorization/roleDefinitions/', 'b24988ac-6180-42a0-ab88-20f7382dd24c')]",
        "principalId": "[parameters('appServiceContributorObjectId')]"
      }
    },

As Ohad correctly points out in the comments, the appServiceContributerRoleGuid should be a unique GUID generated by you. It does not refer back to the GUID of any predefined role.
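
For illustration, such a GUID can be generated once with PowerShell and then pasted into your parameters file (the variable name below just mirrors the template parameter):

# generate a unique GUID to use as the role assignment name
$appServiceContributerRoleGuid = (New-Guid).ToString()
$appServiceContributerRoleGuid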

In contrast, below is the JSON for provisioning a Role Assignment for a resource group as a whole. To provision a roleAssignment for a single resource, we do not set a more specific scope but leave it out completely. Instead, the roleAssignment has to be nested within the resource it applies to. This is visible when comparing the type, name and scope properties of both definitions.

    {
      "type": "Microsoft.Authorization/roleAssignments",
      "apiVersion": "2015-07-01",
      "name": "[parameters('appServiceContributerRoleName')]",
      "dependsOn": [
        "[resourceId('Microsoft.Web/Sites', parameters('appServiceName'))]"
      ],
      "properties": {
        "roleDefinitionId": "[concat('/subscriptions/', subscription().subscriptionId, '/providers/Microsoft.Authorization/roleDefinitions/', 'b24988ac-6180-42a0-ab88-20f7382dd24c')]",
        "principalId": "[parameters('appServiceContributorObjectId')]",
        "scope": "[concat(subscription().id, '/resourceGroups/', resourceGroup().name)]"
      }
    }

In my previous post (secret management part 1: using Azure Key Vault and Azure Managed Identity) I showed an example of storing secrets (keys, passwords or certificates) in an Azure Key Vault and how to retrieve them securely. Now, this approach has one downside, and that is the indirection via the Key Vault.

In the previous implementation, a key for accessing service X is created and stored in the Key Vault. After that, service Y, which needs the key, authenticates to the Azure Active Directory to access the Key Vault, retrieves the secret and uses it to access service X. Why can’t we just access service X directly after authenticating to the Azure Active Directory, as shown below?

In this approach we completely remove the need for Azure Key Vault, reducing the amount of hassle. Another benefit is that we are no longer creating extra secrets, which means we also cannot lose them. Just another security benefit. Now let’s build an example and see how this works.

Infrastructure

Again we start by creating an ARM template to deploy our infrastructure. This time we are using a feature of the Azure SQL DB Server to appoint an AAD identity as an administrator on that server, as shown in the following snippet.

{
  "type": "administrators",
  "name": "activeDirectory",
  "apiVersion": "2014-04-01-preview",
  "location": "[resourceGroup().location]",
  "properties": {
    "login": "MandatorySettingValueHasNoFunctionalImpact", 
    "administratorType": "ActiveDirectory",
    "sid": "[reference(concat(resourceId('Microsoft.Web/sites', parameters('appServiceName')),'/providers/Microsoft.ManagedIdentity/Identities/default'), '2015-08-31-preview').principalId]",
    "tenantId": "[subscription().tenantid]"
  },
  "dependsOn": [
    "[concat('Microsoft.Sql/servers/', parameters('sqlServerName'))]"
  ]
}

We are using the same approach as earlier, but now to set the objectId for the AAD admin of the Azure SQL DB Server. One thing that is also important: the ‘login’ property is just a placeholder for the principal’s name. Since we do not know it, we can set it to anything we want. If we ever change the user through the portal (which we shouldn’t), this property will reflect the actual username.

Again, the full template can be found on GitHub.

Code

With the infrastructure in place, let’s write some passwordless code to have our App Service access the created Azure SQL DB:

if (isManagedIdentity)
{
  var azureServiceTokenProvider = new AzureServiceTokenProvider();
  var accessToken = await azureServiceTokenProvider.GetAccessTokenAsync("https://database.windows.net/");

  var builder = new SqlConnectionStringBuilder
  {
    DataSource = ConfigurationManager.AppSettings["databaseServerName"] + ".database.windows.net",
    InitialCatalog = ConfigurationManager.AppSettings["databaseName"],
    ConnectTimeout = 30
  };

  if (accessToken == null)
  {
    ViewBag.Secret = "Failed to acuire the token to the database.";
  }
  else
  {
    using (var connection = new SqlConnection(builder.ConnectionString))
    {
      connection.AccessToken = accessToken;
      connection.Open();

      ViewBag.Secret = "Connected to the database!";
    }
  }
}

First we request a token, specifying “https://database.windows.net/” as the resource we want to use the token for. Next we start building a connection string, just as we would normally do. However, we leave out anything related to authentication. Next (and this is only available in .NET Framework 4.6.1 or higher), just before opening the SQL connection, we set the acquired token on the connection object. From there on, we can work as we normally would.

Again, it’s that simple! The code is, yet again available on GitHub.

Supported services

Unfortunately, you cannot use this approach for every service you want to call; you are dependent on the service supporting it. A full list of services that support token-based application authentication is available on MSDN. Also, you can support this way of authentication on your own services. Especially when you are moving to a microservices architecture, this can save you a lot of work and management of secrets.

Last week I received a follow-up question from a fellow developer about a presentation I did regarding Azure Key Vault and Azure Managed Identity. In this presentation I claimed, and quickly showed, how you can use these two offerings to store all the passwords, keys and certificates you need for your ASP.NET application in a secure storage (the Key Vault) and also avoid the problem of just getting another, new password to access that Key Vault.

I have written a small ASP.NET application that reads just one very secure secret from an Azure Key Vault and displays it on the screen. Let’s dive into the infrastructure and code to make this work!

Infrastructure

Whenever we want our code to run in Azure, we need to have some infrastructure it runs on. For a web application, your infrastructure will often contain an Azure App Service Plan and an Azure App Service. We are going to create these using an ARM template. We use the same ARM template to also create the Key Vault and provide an identity to our App Service. The ARM template that delivers these components can be found on GitHub. Deploying this template, would result in the following:

The Azure subscription you are deploying this infrastructure to is backed by an Azure Active Directory. This directory is the basis for all identity & access management within the subscription. This relation also links the Key Vault to that same AAD. This relation allows us to create access policies on the Key Vault that describe what operations (if any) any user in that directory can perform on the Key Vault.

Applications can also be registered in an AAD and we can thus give them access to the Key Vault. However, how would an application authenticate itself to the AAD? This is where Managed Identity comes in. Managed Identity creates a service principal (application) in that same Active Directory that is backing the subscription. At runtime your Azure App Service will be provided with environment variables that allow you to authenticate without the use of passwords.

For more information about ARM templates, see the information on MSDN. However, there are two important parts of my template that I want to share. First the part that enables the Managed Identity on the App Service:

{
  "name": "[parameters('appServiceName')]",
  "type": "Microsoft.Web/sites",
  …,
  "identity": {
    "type": "SystemAssigned"
  }
}

Secondly, we have to give this identity, which is yet to be created, access to the Key Vault. We do this by specifying an access policy on the Key Vault. Be sure to declare a dependsOn on the App Service, so you will only reference the identity after it is created:

{
  "type": "Microsoft.KeyVault/vaults",
  "name": "[parameters('keyVaultName')]",
  …,
  "properties": {
    "enabledForTemplateDeployment": false,
    "tenantId": "[subscription().tenantId]",
    "accessPolicies": [
       {
        "tenantId": "[subscription().tenantId]",
        "objectId": "[reference(concat(resourceId('Microsoft.Web/sites', parameters('appServiceName')),'/providers/Microsoft.ManagedIdentity/Identities/default'), '2015-08-31-preview').principalId]",
        "permissions": {
          "secrets": [ "get" ]
        }
      }
    ]
  }
}

Here I am using some magic (that I just copy/pasted from MSDN) to refer back to my earlier deployed App Service managed identity, retrieve the principalId, and use that to create an access policy for that identity.

That is all, so let’s deploy the templates. Normally you would set up continuous deployment using Azure Pipelines, but for this quick demo I used PowerShell:

Login-AzureRmAccount
Select-AzureRmSubscription -SubscriptionId " xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx"
New-AzureRmResourceGroup "keyvault-managedidentity" -Location "West-Europe"
New-AzureRmResourceGroupDeployment -TemplateFile .\keyvault-managedidentity.json -TemplateParameterFile .\keyvault-managedidentity.parameters.json -ResourceGroupName keyvault-managedidentity

Now with the infrastructure in place, let’s add the password that we want to protect to the Key Vault. There are many, many ways to do this, but let’s use PowerShell again:

$password = Read-Host 'What is your password?' -AsSecureString
Set-AzureKeyVaultSecret -VaultName demo4847 -Name password -SecretValue $password

Do not be alarmed if you get an access denied error. This is most likely because you still have to give yourself access to the Key Vault. By default no-one has access, not even the subscription owners. Let’s fix that with the following command:

Set-AzureRmKeyVaultAccessPolicy -ResourceGroupName "keyvault-managedidentity" -VaultName demo4847 -ObjectId xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -PermissionsToSecrets list,get,set
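
If you do not know the ObjectId to pass in, one way to look it up is shown below; the user principal name is a placeholder:

# look up the object id of your own user in the Azure Active Directory backing the subscription
(Get-AzureRmADUser -UserPrincipalName "you@yourtenant.onmicrosoft.com").Id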

Code

With the infrastructure in place, let’s write the application that accesses this secret. I have created a simple ASP.NET MVC application and edited the Home view to contain the following main body. Again, the code is also on GitHub:

A secret

@if (ViewBag.IsManagedIdentity) {
    Got a secret, can you keep it? (...) If I show you then I know you won't tell what I said: @ViewBag.Secret
} else {
    Running locally, so no secret to tell
}

Now to supply the requested values, I have added the following code to the HomeController:

public async Task<ActionResult> Index()
{
  var isManagedIdentity = Environment.GetEnvironmentVariable("MSI_ENDPOINT") != null
    && Environment.GetEnvironmentVariable("MSI_SECRET") != null;

  ViewBag.IsManagedIdentity = isManagedIdentity;

  if (isManagedIdentity)
  {
    var azureServiceTokenProvider = new AzureServiceTokenProvider();
    var keyVaultClient = new KeyVaultClient(new KeyVaultClient.AuthenticationCallback(azureServiceTokenProvider.KeyVaultTokenCallback));
    var secret = await keyVaultClient.GetSecretAsync("https://demo4847.vault.azure.net/secrets/password");

    ViewBag.Secret = secret.Value;
  }

  return View();
}

First I check if we are running in an Azure App Service with Managed Identity enabled. This looks a bit hacky, but it is actually the recommended approach. Next, if running as an MI, I use the AzureServiceTokenProvider (NuGet package: Microsoft.Azure.Services.AppAuthentication) to retrieve an AAD token. In turn I use that token to instantiate a KeyVaultClient (NuGet package: Microsoft.Azure.KeyVault) and use it to retrieve the secret.

That’s it!

Want to know more?

I hope to write two more blogs on this subject soon: one about using system-to-system authentication and authorization without storing extra secrets in the Key Vault, and one about Config Builders, a new development for .NET Core 2.0 and .NET Framework 4.7.1 or higher.

Ever wished you would receive a simple heads-up when an Azure deployment fails? Ever troubleshot an issue and looked for the button “Tell me when this happens again”? Well, I just found it.

Yesterday I stumbled across a feature that is new to me (*) and just amazing: Azure activity log alerts. A feature to notify me when something specific happens.

With the introduction of the Azure Resource Manager model, the activity log was also introduced. The activity log is an audit trail of all events that happen within your Azure subscription, either user-initiated or originating in Azure itself. This is a tremendously powerful feature in itself; however, it has become even more powerful now. With Azure activity log alerts you can create rules that automatically trigger and notify you when an event is emitted that you find interesting.

In this blog post I will detail two scenarios where activity log alerts can help you out.

(*) It seems this feature was already launched in May this year, according to this Channel9 video

Example: Manage authorizations

Let’s say you are working with a large team on a large project or on a series of related projects. One thing that you might want to keep tabs on is people creating new authorizations. So let’s see if we can quickly set something up to send me an e-mail whenever this happens.

  1. Let’s start by spinning up the monitoring blade in the Azure portal.
  2. In the monitoring blade the activity log automatically opens up. Here we can look through past events and see what has happened and why. Since we are looking to get proactively informed about any creation events, let’s navigate to Alerts:
  3. In the top of the blade, choose Add activity log alert and the following dialog will open:
  4. Here there are a number of things we have to fill out. As the name and description, “A new authorization is created” covers what we are about to do. Select your subscription and the resource group where you want to place this alert. This is not the resource group that the alert concerns; it is where the alert itself lives. As event category we pick “Administrative” and as resource type “Role assignment.” The latter resets all other dropdowns, so we only have to select an operation name. Let’s pick “Create role assignment.”
  5. After selecting what we want to be alerted about, let’s decide how we want to be alerted. This is done via an action group: a group of one or more actions that are grouped under one name and can be reused. Let’s name our action group “StandardActionGroup” and add an e-mail address. Giving us a final result as follows:
  6. Now let’s authorize a new user on a resource:
  7. And hurray, we are notified by e-mail:

Example: Streaming Analytics hiccup

So you have an Azure resource that has some issues. Every now and then it gets into a faulted state or just stops working. Often you will find that this is nicely put into the activity log. For example, I have a Streaming Analytics job that faults every now and then. Let’s see how we can get Azure to “tell me when this happens again.”

  1. Go to the activity log of the resource with an error
  2. Open the details of the Warning and find the link to Add activity log alert

  3. The blade to open a new alert is added, with everything prefilled to capture just that specific event. In essence allowing you to ask Azure to tell you ‘if it happens again’

Can we automate that?

Finally, as you can see in the image below, every activity log alert is a resource in itself. This means you can see them when you list a resource group and that you can create them automatically using ARM templates, for example as part of your continuous delivery practice.

E-mail sucks, I want to create automated responses

Also possible. You can have a webhook called as part of an action group. This way you can easily hook up an Azure Function to immediately remedy an issue, for example.