One of the things that makes Azure DevOps so great is the REST API that comes with it. This API allows you to do almost all the things that you can do through the interface. Unfortunately, it is sometimes a bit behind in functionality compared to the interface. Especially in edge cases or when looking at the newest features, support has sometimes not lit up in the REST API yet. Or the functionality is available, but it is not yet documented.

One example where I ran into this was the new Environments that can be used for supporting YAML pipelines. If you are working with tens or hundreds of pipelines, automation is key to doing so effectively, so I needed that API!

To work with environments, three types of operations need to be available:

  1. Management (get, create, update, delete) of environments themselves;
  2. Management (get, add, remove) of user permissions on those environments;
  3. Management (get, add, remove) of checks on those environments. Checks are rules that are enforced on every deployment that goes into that environment.

The first type of operation has recently been made available in the preview of the next version of the API. However, managing user permissions or checks is not yet documented. For a recent project, I went ahead with reverse engineering these calls. In this post I will share how I reverse engineered managing user permissions on environments.

Disclaimer: this is all reverse engineered, so no guarantees whatsoever.

Tip: The approach outlined here works for many of the newer functionalities added to Azure DevOps, which often use calls to URLs that start with _apis and that have been quite stable in my experience.

Managing user permissions

Finding the call for listing user permissions was rather straightforward. To find the API call, I went through the following steps:

  1. Open the details of an environment and navigate to the security settings (visible on the left in the screenshot below);
  2. Open the developer tools, go to the network tab, filter the list down to XHR requests only and refresh the page (visible on the right in the screenshot below).

In the list of executed XHR requests, I selected the request that returns the different user permissions. I found this request by first looking at the request below it (roledefinitions), but quickly saw that that one only lists the different roles and their names, descriptions and meaning. Inspecting the results visible on the far right shows the active permissions as JSON. I marked the corresponding sections left and right with different colors for ease of reading.

Inspecting the URL that was being called for this result shows that the 1 at the end corresponds to the id of the environment, as it is visible in the URL of the previous screenshot. The guid in front of the environment id took a bit more investigation, but after looking around for a bit, it turned out to be the id of the project (formerly Team Project) that the environment is in. From here, the call for listing the current user permissions on any environment can be generalized.
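
A minimal PowerShell sketch of that generalized call; the securityroles URL is my reconstruction of the undocumented endpoint, and the organization name, project id, environment id and PAT are placeholders:

# List user permissions on an environment (undocumented endpoint, reconstructed from the network trace)
$organization  = "my-organization"                       # your Azure DevOps organization
$projectId     = "00000000-0000-0000-0000-000000000000"  # the project id
$environmentId = 1                                       # the environment id, visible in the URL
$pat           = "<personal-access-token>"

$headers = @{
    Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$pat"))
}

$url = "https://dev.azure.com/$organization/_apis/securityroles/scopes/distributedtask.environmentreferencerole/roleassignments/resources/${projectId}_$environmentId"

Invoke-RestMethod -Uri $url -Headers $headers -Method Get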


If you are not familiar with your project id(s), you can find those using a GET call to the core Projects API, as sketched below.
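
A minimal sketch, reusing the $organization and $headers from the previous snippet:

# List all projects and their ids in the organization
Invoke-RestMethod -Uri "https://dev.azure.com/$organization/_apis/projects?api-version=5.1" -Headers $headers -Method Get |
    Select-Object -ExpandProperty value |
    Select-Object id, name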

Adding a user permission

Now that we can view the current set of permissions, let’s see if we can add a new user permission. To get the details of this operation, I did the following:

  1. Cleared the recent list of captured network operations;
  2. Made a change to the list on the left (note that there are no XHR requests being made);
  3. Pressed the save button in the user interface. This results in the following:

In this second screenshot we see that a PUT request has been made to the same URL as before.
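
A sketch of what that request looks like, reusing $url and $headers from above; the userId and roleName property names are as I observed them, and the guid is a placeholder:

# Add (or update) a user permission on an environment (reconstructed from the browser's PUT)
$assignment = @{
    userId   = "00000000-0000-0000-0000-000000000000"   # the user's id, see below how to find it
    roleName = "User"                                    # Administrator, Reader or User
}
$body = ConvertTo-Json -InputObject @($assignment)       # the endpoint expects an array of assignments

Invoke-RestMethod -Uri $url -Headers $headers -Method Put -Body $body -ContentType "application/json"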


This shows that adding permissions can be done by PUTting an entry to the same URL as we have seen before. The valid values for the roleName property are Administrator, Reader and User. (They can be retrieved and verified using the roledefinitions call we discovered earlier.) But what do we put in for the user id? To find the user id, we have to do two things.

  1. Look the user up using the Graph API;
  2. Decode the user descriptor into the correct guid.

The graph API can be accessed through a GET call, yielding the following response:

    "subjectKind": "user",
    "directoryAlias": "henry",
    "domain": "c570bc0b-9ef3-4b15-98fc-9d7ca9b22afe",
    "principalName": "",
    "mailAddress": "",
    "origin": "aad",
    "originId": "186167cb-63ab-4ef9-a221-0398c9ab6bba",
    "displayName": "Henry Been",
    "_links": {
      "self": {
        "href": ""
      "memberships": {
        "href": ""
      "membershipState": {
        "href": ""
      "storageKey": {
        "href": ""
      "avatar": {
         "href": ""
    "url": "",
    "descriptor": "aad.NjBhYWMwNTMtNjkzNy03ZTA3LTlhM2YtMjk2MjAyYTNkZmZm"

From this response we take the descriptor, strip off the aad. prefix and BASE64 decode the remainder. This yields the guid we need.
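
A small sketch of both steps in PowerShell; the graph users endpoint on vssps.dev.azure.com is the documented one, and the filter on displayName is just an illustration:

# 1. Look the user up using the Graph API (note the vssps host)
$users = Invoke-RestMethod -Uri "https://vssps.dev.azure.com/$organization/_apis/graph/users?api-version=5.1-preview.1" -Headers $headers -Method Get
$descriptor = ($users.value | Where-Object { $_.displayName -eq "Henry Been" }).descriptor

# 2. Strip off the prefix up to and including the dot and BASE64 decode the remainder
$encoded = $descriptor.Substring($descriptor.IndexOf('.') + 1)
$encoded = $encoded.PadRight($encoded.Length + ((4 - $encoded.Length % 4) % 4), '=')   # re-add base64 padding if needed
$userId  = [Text.Encoding]::ASCII.GetString([Convert]::FromBase64String($encoded))
$userId   # for the descriptor above this yields 60aac053-6937-7e07-9a3f-296202a3dfff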

Note: updating an entry is done the same way; the PUT operation acts as an upsert.

Deleting a user permission

Deleting user permissions can be done by making two changes (see the sketch after this list):

  1. Sending a PATCH operation instead of a PUT;
  2. Leaving out the roleName.
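
A sketch of such a call, reusing the $url, $headers and user id from the snippets above:

# Remove a user permission from an environment: PATCH the same URL with only the userId
$body = ConvertTo-Json -InputObject @(@{ userId = "00000000-0000-0000-0000-000000000000" })

Invoke-RestMethod -Uri $url -Headers $headers -Method Patch -Body $body -ContentType "application/json"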

Happy coding!

When the number of YAML pipelines you work with in Azure DevOps increases, you might find the need for centralizing some parts of the configuration variables for your pipeline(s). Maybe some configuration that is shared between multiple application components or even some values that are shared between multiple teams or managed by a central team.

To make this happen, you might be tempted to do either one of the following:

  1. Copy the configuration variables to every individual pipeline. The disadvantage is that you are now copying these values around; if one of them changes, this means a lot of work and the risk of missing one or more of the necessary updates.
  2. Use variable groups, which you know from the Classic Build and Release definitions, to manage central configuration. But if you do this, you lose some of the benefits of pipelines-as-code again.

Luckily there is now an alternative available, by combining some of the new YAML Pipelines features, namely variable templates and repository resources. In this post I want to share how I built a solution where configuration variables were centralized in one repository and used them from YAML pipelines in other repositories.

Let’s start by assuming that you have a number of pipelines, that all look somewhat like this:

pool:
  name: 'Azure Pipelines'
  vmImage: windows-latest

variables:
- name: allCompanyVariable
  value: someValue
- name: allComponentsVariable
  value: someValue

steps:
- script: |
    echo $(allCompanyVariable)
    echo $(allComponentsVariable)

Of course, the repetition is in these two variables and we want to move them out into some kind of central configuration. To do this, I created a new repository named Shared-Configuration. In this repository I added a directory configuration with two files, all-company.yml and my-department.yml, containing configuration values that need to be shared with multiple pipelines across multiple repositories.

These files are very straightforward and look like this:

variables:
  allCompanyVariable: someCompanyWideThingy
  foo: bar

To use these values, we have to update the dependent pipelines. First we have to add a repository resource declaration. Here we specify that we want to pull another Git repository into the scope of our builds and want to be able to reference files from it. We do this by adding the following YAML at the top of our pipeline:
resources:
  repositories:
    - repository: sharedConfigurationRepository
      type: git
      name: Shared-Configuration

This means that the repository Shared-Configuration is pulled into the scope of our pipeline and can be referenced using the identifier sharedConfigurationRepository. With the repository in scope, we can reference variable template files that are in it as follows:

variables:
- template: configuration/all-company.yml@sharedConfigurationRepository
- template: configuration/my-department.yml@sharedConfigurationRepository

Here we again declare variables, but instead of specifying key/value pairs we are now pulling in all variables from the referenced files. To do this, the full path to the file has to be specified, along with the identifier of the repository that holds this file. If the @-sign and the identifier are omitted, the path is assumed to be in the same repository as the pipeline definition.

Putting all of this together, the following syntax can be used for pulling variables defined in other, shared repositories into your YAML pipeline:

pool:
  name: 'Azure Pipelines'
  vmImage: windows-latest

resources:
  repositories:
    - repository: sharedConfigurationRepository
      type: git
      name: Shared-Configuration

variables:
- template: configuration/all-company.yml@sharedConfigurationRepository
- template: configuration/my-department.yml@sharedConfigurationRepository

steps:
- script: |
    echo $(allCompanyVariable)
    echo foo value is $(foo)


Happy coding!


Many of my clients of lately have been working with naming conventions for all kinds of things. Some examples are Service Principals, Projects, AAD Groups, Azure resource groups, etc etc.

It seems that naming conventions have become an accepted -or even recommended- approach within many organizations. For some reason small groups of administrators enforce rules that might benefit them, but are holding thousands of other users back. And while I see a little merit in naming conventions from their point of view, I doubt that it is worth the trade-off. In this post I want to share some of the drawbacks of using naming conventions I have encountered.

Maybe we should reconsider doing this naming conventions thing?

Names become impossible to remember, work with or pronounce

A given Azure application can quickly span a number of components. An average resource group I work with probably has anywhere between ten and twenty resources in it. If three of these resources were databases, it would be great if the team could refer to them using meaningful names. That is really hard if they are called 3820-db-39820, 3820-db-399454 and 3820-db-730244. It makes any meaningful conversation impossible. Just imagine being called about the 39820 database: how do you even know what that is and what it does?

Having a customers database, a users database and an events database, it would be great to just name them customers, users and events. It makes any conversation about them much easier, removes noise when looking things up in source code or configuration, and the work of the development team becomes much more fun. Imagine joining a team that runs five components with on average fifteen resources each: not pleasant at all.

I know it is a bloody database

And while we are on the topic of names like 3820-db-39820, everyone already knows it is a database. The team that created the database only deals with databases, so dûh! And the team itself can see it right next to the name:

User interfaces cannot deal with your overly long names

Another customer of mine had a naming convention for Azure resource groups. In my opinion quite ridiculous, since every resource group is already in a subscription and those can in turn be organized into management groups: a great way to mimic your organizational structure and to see what a given resource group is about. So no need for calling them {businessunit}_{department}_{project}_{team}_{freetext}, really. But of course some admins still do, delivering the following interface:

Trust me, this is no fun when you have to work with your resources the whole day. And this happens with many types of naming conventions. Here is another example, now using AAD groups that follow naming conventions.

Examples like this can be found for many naming conventions in many different tools. Tools are simply not designed for displaying long, weird names that are supposed to encode all kinds of information. If you add in a bit more duplication of information, like resource type and resource location, it becomes even worse.

How do you cope with changes?

Let’s assume two departments in your organization get merged. You now have two options:

  • Do you rename hundreds of resources?
  • Or do you leave hundreds of resources with deceiving names?

Your pick!

A hint: in many tools and systems you cannot change the name of a resource after creation.

Behavior gets attached

Now let’s switch from things that are just annoying to things that can be potentially dangerous. Just for fun, ever tried calling your new App Service not appsrv_{meaningfullname} but db_{meaningfullname}? I bet there will be one or two administrative scripts breaking soon after.

Another problem I recently encountered is that of conflicting conventions. At one customer, all Azure resource groups were prefixed with a certain identifier for the team name, let's say team-{number}. For example, team-0125 and team-5578. This had been going on for a while and more and more dependencies were taken on that convention. One team, for example, allowed requesting new pre-configured databases and then automatically added them to the correct resource group based on the team number. A second team scanned all resources and calculated internal cost allocations based on the name of the resource group, etc etc. A few months after establishing the convention and adding all this behavior on top of it, a new off-the-shelf application was purchased. The team that bought this application had only one request though, and that is whether some of the resource groups it should target could start with module-.


Implicit assumptions all around

One thing I have learned not to underestimate is the amount of reverse-engineering going on within organizations. If I name my resource groups {projectnumber}-{somethingusefull}, and I don't tell anyone that the first part is a project number, or folks don't listen, all kinds of assumptions can start to arise. Imagine that there are also cost centers that most of the time have the same number as the project they belong to.

Mixing in some attached behavior and confusion will quickly lead to errors. The things that can go wrong when automation teams start their work on the assumption that the first part of any resource group name is the cost center…

There are better approaches

The real problem, in my opinion, is that the names of resources, groups or things are not meant for encoding all sorts of information. And in reality, you don’t have to either.

More and more systems now provide the means for storing extra information with a resource. For example, Azure supports adding tags to your resources. It is as simple as adding key/value pairs with descriptive, well-formatted, non-abbreviated names. With recent changes to the Azure Portal, you can now even have them rendered in your lists. As an added benefit, tags are also easy to remove, add, rename and correct, giving administrators all the information they need without burdening users with long names:

Active Directory, for example, supports extension attributes. And of course, if tags are not supported, demand that they be added to the system you are using. Try to push for the correct solution, instead of trying to work around the issue.

As a conclusion: I have learned that naming conventions have downsides, even though they seem to be an accepted practice within many companies. While they may bring value, I really hope this can serve as a reminder for myself and a warning for others when they think it is smart to introduce one. Let's try not to be that person who forces hundreds or thousands of colleagues into some structure that really hinders them, only for our own convenience.

And if we really have to use naming conventions, can we please put the least significant part of the naming convention first? {team}-{department}-{businessUnit} will at least solve most of the everyday problems of the impacted users.

One of the elements of ARM templates that is often overlooked is the capability to write your own functions. The syntax for writing functions in JSON can be a bit cumbersome, especially when compared to full programming languages, but they can really help to make your ARM templates more readable.

Here is an example that I use in my own ARM templates:

"functions": [
      "namespace": "hb",
      "members": {
        "createKeyVaultReference": {
          "parameters": [
              "name": "keyVaultName",
              "type": "string"
              "name": "secretName",
              "type": "string"
          "output": {
            "type": "string",
            "value": "[concat('@Microsoft.KeyVault(SecretUri=https://', parameters('keyVaultName'), '', parameters('secretName'), '/)')]"

In my templates I frequently use the @Microsoft.KeyVault syntax for AppSettings to reference settings in the Key Vault. It is a very secure and convenient way of working with application secrets. The only downside is that you have to remember the complete syntax for this notation every single time and have to remember not to forget the trailing slash. That last thing is a mistake that I see frequently. Using a function like this, we can encode that knowledge in one location and reuse it throughout our template.

After the declaration above, we can invoke this function by prefixing the function name with the name of the function namespace and a dot. So calling the function declared above requires an invocation of hb.createKeyVaultReference:

    "name": "appsettings",
    "type": "config",
    "apiVersion": "2015-08-01",
    "dependsOn": [
    "properties": {
        "someSetting": "[hb.createKeyVaultReference(variables('keyVaultName'), 'someSetting')]"

Here the clutter of concatenating the different parts of the @Microsoft.KeyVault reference string is removed and the knowledge of how to build that string is moved into one single location, ready for reuse by anyone.



A while ago I wrote about how to retrieve configuration settings for an Azure Function App from the Key Vault. That approach had the benefit that settings are loaded early enough in the start-up procedure to also be available for function bindings.

Unfortunately, this approach also has a number of downsides:

  • Lately, I have been running into intermittent issues with this approach. In some cases the settings failed to load from the Key Vault in time, resulting in start-up errors of the Function App. These are not only annoying, but can also be hard to troubleshoot.
  • The workaround of only loading the settings when a Managed Identity is available, and not when running locally, is not so clean. It also makes for a difference in behavior between running locally and deployed: a possible source of issues.
  • The app configuration in the portal (or better: in your ARM template) does not list all the settings the application uses. One or more of them are side-loaded from the Key Vault. This makes it less transparent for new members of the team which settings the application really needs.

Luckily, a new approach to loading secrets from an Azure Key Vault has become available. As a setting value, you can now specify a reference to a Key Vault secret directly.
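
A sketch of that reference syntax, with my-vault and my-secret as placeholder names:

@Microsoft.KeyVault(SecretUri=https://my-vault.vault.azure.net/secrets/my-secret/)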


Please note the slash at the end! This slash is mandatory to reference the latest version of the secret. Optionally, you can specify a specific version after the slash. Leaving the slash out completely means the reference will not work.

Now whenever this syntax is encountered by the App Service runtime as a setting value, it will automatically use your App Service's Managed Identity to retrieve the secret from the Key Vault (just as my previous solution did in user code). In the portal you can even track the status of this look-up, where it is displayed as successful or not:

Of course this syntax can also be used in ARM templates and I have moved almost all my applications over to this approach now. An example of the syntax for ARM templates is as follows:

  "apiVersion": "2015-08-01",
  "name": "appsettings",
  "type": "config",
  "dependsOn": [
  "properties": {
    "AuditDbConnectionString": "[concat('@Microsoft.KeyVault(SecretUri=https://', variables('keyVaultName'), '', variables('secretName'), '/)')]"

I hope you enjoy this new feature as much as I do and happy coding! 🙂

There is no place like home, and also: there is no place like production. Production is the only location where your code is really put to the test by your users. This insight has helped developers to accept that we have to go to production fast and often. But we also want to do so in a responsible way. Common tactics for safely going to production often are for example blue-green deployments and the use of feature flags.

While helping to limit the impact of mistakes, the downside of these two approaches is that they both run only one version of your code. Either by using a different binary or by using a different code path, one or the other implementation is executed. But what if we could execute both the old and the new code in parallel and compare the results? In this post I will show you how to run two algorithms in parallel and compare the results, using a library called Scientist, which is available on NuGet.

The use case

To experiment a bit with this, I created a straightforward site for calculating the nth position in the Fibonacci sequence, as you can see below.

I started out with the simplest implementation that seemed to meet the requirements. In other words, the implementation behind this calculation uses recursion, as shown below.

public class RecursiveFibonacciCalculator : IRecursiveFibonacciCalculator
{
  public Task<int> CalculateAsync(int position)
  {
    return Task.FromResult(InnerCalculate(position));
  }

  private int InnerCalculate(int position)
  {
    if (position < 1)
      return 0;
    if (position == 1)
      return 1;

    return InnerCalculate(position - 1) + InnerCalculate(position - 2);
  }
}

While this is a very clean and to-the-point implementation code-wise, the performance is -to say the least- up for improvement. So I took a few more minutes and came up with an implementation that I believe is also correct, but much more performant, namely the following:

public class LinearFibonacciCalculator : ILinearFibonacciCalculator
{
  public Task<int> CalculateAsync(int position)
  {
    // Guard against positions below 1, for which the array below would be too small
    if (position < 1)
      return Task.FromResult(0);

    var results = new int[position + 1];

    results[0] = 0;
    results[1] = 1;

    for (var i = 2; i <= position; i++)
    {
      results[i] = results[i - 1] + results[i - 2];
    }

    return Task.FromResult(results[position]);
  }
}

However, just swapping implementations and releasing didn’t feel good to me, so I thought: how about running an experiment on this?

The experiment

With the experiment that I am going to run, I want to achieve the following goals:

  • On every user request, run both my recursive and my linear implementation.
  • To the user I want to return the recursive implementation, which I know to be correct
  • While doing this, I want to record:
    • If the linear implementation yields the same results
    • Any performance differences.

To do this, I installed the Scientist NuGet package and added the code shown below as my new implementation.

public async Task OnPost()
{
  HasResult = true;
  Position = FibonacciInput.Position;

  Result = await Scientist.ScienceAsync<int>("fibonacci-implementation", experiment =>
  {
    experiment.Use(async () => await _recursiveFibonacciCalculator.CalculateAsync(Position));
    experiment.Try(async () => await _linearFibonacciCalculator.CalculateAsync(Position));

    experiment.AddContext("Position", Position);
  });
}
This code calls into the Scientist functionality and sets up an experiment with the name fibonacci-implementation that should return an int. The configuration of the experiment is done using the calls to Use(..) and Try(..)

Use(..): The use method is called with a lambda that should execute the known, trusted implementation of the code that you are experimenting with.

Try(..): The try method is called with another lambda, but now the one with the new, not yet verified implementation of the algorithm.

Both the Use(..) and Try(..) methods accept a sync lambda as well, but I use async/await here on purpose. The advantage of using an async lambda is that both implementations will be executed in parallel, thus reducing the duration of the web server call. The final thing I do, with the call to the AddContext(..) method, is adding a named value to the experiment. I can use this context property-bag later on to interpret the results and to pin down scenarios in which the new implementation is lacking.

Processing the runs

While the code above takes care of running two implementations of the Fibonacci sequence in parallel, I am not working with the results yet – so let’s change that. Results can be redirected to an implementation of the IResultPublisher interface that ships with Scientist by assigning an instance to the static ResultPublisher property as I do in my StartUp class.

var resultsPublisher = new ExperimentResultPublisher();
Scientist.ResultPublisher = resultsPublisher;

In the ExperimentResultPublisher class, I have added the code below.

public class ExperimentResultPublisher : IResultPublisher, IExperimentResultsGetter
{
  public LastResults LastResults { get; private set; }
  public OverallResults OverallResults { get; } = new OverallResults();

  public Task Publish<T, TClean>(Result<T, TClean> result)
  {
    if (result.ExperimentName == "fibonacci-implementation")
    {
      LastResults = new LastResults(!result.Mismatched, result.Control.Duration, result.Candidates.Single().Duration);
    }

    return Task.CompletedTask;
  }
}
For all instances of the fibonacci-implementation experiment, I am saving the results of the last observation. Observation is the Scientist term for a single execution of the experiment. Once I have moved the results over to my own class LastResults, I add these last results to another class of my own, OverallResults, which calculates the minimum, maximum and average for each algorithm.

The LastResults and OverallResults properties are part of the IExperimentResultsGetter interface, which I later on inject in my Razor page.


All of the above, combined with some HTML, gave me the following results after a number of experiments.

I hope you can see here how you can take this forward and extract more meaningful information from this type of experimentation. One thing that I would highly recommend is finding all observations where the existing and new implementation do not match and logging a critical error from your application.

Just imagine how you can have your users iterate and verify all your test cases, without them ever knowing.

I am a fan of private agents when working with Azure Pipelines, compared to the hosted Azure Pipelines agents that are also available. In my experience the hosted agents can have a delay before they start and sometimes feel slow in comparison with dedicated resources. Of course this is just an opinion. For this reason, I have been running my private agents on a virtual machine for years. Unfortunately, this solution is not perfect and has a big downside as well: no isolation between jobs.

No isolation between different jobs that are executed on the same agent means that all files left over from an earlier job are available to any job currently running. It also means that it is possible to pollute NuGet caches, change files on the system, etc etc. And of course, running virtual machines in general is cumbersome due to patching, updates and all the operational risks and responsibilities.

So it was time to adopt one of the biggest revolutions in IT: containers. In this blog I will share how I created a Docker container image that hosts an Azure Pipelines agent and how to run a number of those images within Azure Container Instances. As a starting point I have taken the approach that Microsoft has laid down in its documentation, but I have made a number of tweaks and made my solution more complete. Doing so, I wanted to achieve the following:

  1. Have my container instances execute only a single job and then terminate and restart, ensuring nothing from a running job will become available to the next job.
  2. Have a flexible number of containers running that I can change frequently and with a single push on a button.
  3. Have a source code repository for my container image build, including a pipelines.yml that allows me to build and publish new container images on a weekly schedule.
  4. Have a pipeline that I can use to roll-out 1..n agents to Azure Container Instances – depending on the amount I need at that time.
  5. Do not have the PAT token that is needed for (de)registering agents available to anyone using the agent image.
  6. Automatically handle the registration of agents, as soon as a new container instance becomes available.

Besides the step-by-step instructions below I have also uploaded the complete working solution to GitHub at

Let’s go!

Creating the container image

My journey started with reading the Microsoft documentation on creating a Windows container image with a Pipelines agent. I found two downsides to this approach for my use case. First, the PAT token that is used for (de)registering the agent is available in the container during its complete lifetime. This means that everyone executing jobs on that agent can pick up the PAT token and abuse it. Secondly, the agent is downloaded and unpacked at runtime. This means that the actual agent is slow to spin up.

To work around these downsides I started with splitting the Microsoft provided script into two parts, starting with a file called Build.ps1 as shown below.

param (
    $AZDO_URL,
    $AZDO_TOKEN
)

# local file name used here for the downloaded agent package
$agentPackage = "agent.zip"

if (-not $(Test-Path $agentPackage -PathType Leaf))
{
    Write-Host "1. Determining matching Azure Pipelines agent..." -ForegroundColor Cyan

    $base64AuthInfo = [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$AZDO_TOKEN"))
    $package = Invoke-RestMethod -Headers @{Authorization=("Basic $base64AuthInfo")} "$AZDO_URL/_apis/distributedtask/packages/agent?platform=win-x64&`$top=1"
    $packageUrl = $package[0].Value.downloadUrl
    Write-Host "Package URL: $packageUrl"

    Write-Host "2. Downloading the Azure Pipelines agent..." -ForegroundColor Cyan

    $wc = New-Object System.Net.WebClient
    $wc.DownloadFile($packageUrl, "$(Get-Location)\$agentPackage")
}
else {
    Write-Host "1-2. Skipping downloading the agent, found an agent package right here" -ForegroundColor Cyan
}

Write-Host "3. Unzipping the Azure Pipelines agent" -ForegroundColor Cyan
Expand-Archive -Path $agentPackage -DestinationPath "agent"

Write-Host "4. Building the image" -ForegroundColor Cyan
docker build -t docker-windows-agent:latest .

Write-Host "5. Cleaning up" -ForegroundColor Cyan
Remove-Item "agent" -Recurse

The script downloads the latest agent, if it does not exist yet, and unzips that file. Once the agent is in place, the docker image is built using the call to docker build. Once completed, the unpacked folder is removed - just to keep things tidy. This clean-up also allows for rerunning the script from the same directory multiple times without warnings or errors. A fast feedback loop is also the reason I test for the existence of the agent package before downloading it.

The next file to create is the Dockerfile. The changes here are minimal. As you can see I also copy over the agent binaries, so I do not have to download these anymore when the container runs.

# Base image: a Windows Server Core image (the exact image and tag below are an assumption)
FROM mcr.microsoft.com/windows/servercore:ltsc2019
WORKDIR c:/azdo/work

WORKDIR c:/azdo/agent
COPY agent .

WORKDIR c:/azdo
COPY Start-Up.ps1 .

CMD powershell c:/azdo/Start-Up.ps1

First we ensure that the directory c:\azdo\work exists by setting it as the working directory. Next we move to the directory that will contain the agent files and copy those over. Finally, we move one directory up and copy the Start-Up script over. To run the container image, a call into that script is made. So, let’s explore Start-Up.ps1 next.

if (-not (Test-Path Env:AZDO_URL)) {
  Write-Error "error: missing AZDO_URL environment variable"
  exit 1
}

if (-not (Test-Path Env:AZDO_TOKEN)) {
  Write-Error "error: missing AZDO_TOKEN environment variable"
  exit 1
}

if (-not (Test-Path Env:AZDO_POOL)) {
  Write-Error "error: missing AZDO_POOL environment variable"
  exit 1
}

if (-not (Test-Path Env:AZDO_AGENT_NAME)) {
  Write-Error "error: missing AZDO_AGENT_NAME environment variable"
  exit 1
}


Set-Location c:\azdo\agent

Write-Host "1. Configuring Azure Pipelines agent..." -ForegroundColor Cyan

.\config.cmd --unattended `
  --agent "${Env:AZDO_AGENT_NAME}" `
  --url "${Env:AZDO_URL}" `
  --auth PAT `
  --token "${Env:AZDO_TOKEN}" `
  --pool "${Env:AZDO_POOL}" `
  --work "c:\azdo\work" `
  --replace

Remove-Item Env:AZDO_TOKEN

Write-Host "2. Running Azure Pipelines agent..." -ForegroundColor Cyan

.\run.cmd --once

The script first checks for the existence of four mandatory environment variables. I will provide these later on from Azure Container Instances, where we are going to run the image. Since the PAT token is still in an environment variable at this point, we are setting another variable that ensures this environment variable is not listed or exposed by the agent, even though we will unset it later on. From here, the configuration of the agent is started with a series of command-line arguments that allow for a headless registration of the agent with the correct agent pool.

It is good to know that I am using a non-random name on purpose. This allows me to re-use the same agent name -per Azure Container Instances instance- which prevents an ever-increasing list of offline agents in my pool. This is also the reason I have to add the --replace argument. Omitting this would cause the registration to fail.

Finally, we run the agent, specifying the --once argument. This argument makes the agent pick up only a single job and terminate once that job is complete. Since this is the final command in the PowerShell script, completing the job also terminates the script. And since this is the only CMD specified in the Dockerfile, terminating the script also terminates the container.

This ensures that my container image will execute only one job ever, ensuring that the side-effects of any job cannot propagate to the next job.

Once these files exist, it is time to execute the following from the PowerShell command-line.

PS C:\src\docker-windows-agent> .\Build.ps1 -AZDO_URL **sorry** -AZDO_TOKEN **sorry**
1-2. Skipping downloading the agent, found an agent package right here
3. Unzipping the Azure Pipelines agent
4. Building the image
Sending build context to Docker daemon  542.8MB
Step 1/7 : FROM
 ---> 782a75e44953
Step 2/7 : WORKDIR c:/azdo/work
 ---> Using cache
 ---> 24bddc56cd65
Step 3/7 : WORKDIR c:/azdo/agent
 ---> Using cache
 ---> 03357f1f229b
Step 4/7 : COPY agent .
 ---> Using cache
 ---> 110cdaa0a167
Step 5/7 : WORKDIR c:/azdo
 ---> Using cache
 ---> 8d86d801c615
Step 6/7 : COPY Start-Up.ps1 .
 ---> Using cache
 ---> 96244870a14c
Step 7/7 : CMD powershell Start-Up.ps1
 ---> Running in f60c3a726def
Removing intermediate container f60c3a726def
 ---> bba29f908219
Successfully built bba29f908219
Successfully tagged docker-windows-agent:latest
5. Cleaning up


docker run -e AZDO_URL=**sorry** -e AZDO_POOL=SelfWindows -e AZDO_TOKEN=**sorry** -e AZDO_AGENT_NAME=Agent007 -t docker-windows-agent:latest

1. Configuring Azure Pipelines agent...

  ___                      ______ _            _ _
 / _ \                     | ___ (_)          | (_)
/ /_\ \_____   _ _ __ ___  | |_/ /_ _ __   ___| |_ _ __   ___  ___
| | | |/ /| |_| | | |  __/ | |   | | |_) |  __/ | | | | |  __/\__ \
\_| |_/___|\__,_|_|  \___| \_|   |_| .__/ \___|_|_|_| |_|\___||___/
                                   | |
        agent v2.163.1             |_|          (commit 0a6d874)

>> Connect:

Connecting to server ...

>> Register Agent:

Scanning for tool capabilities.
Connecting to the server.
Successfully added the agent
Testing agent connection.
2019-12-29 09:45:39Z: Settings Saved.
2. Running Azure Pipelines agent...
Scanning for tool capabilities.
Connecting to the server.
2019-12-29 09:45:47Z: Listening for Jobs
2019-12-29 09:45:50Z: Running job: Agent job 1
2019-12-29 09:47:19Z: Job Agent job 1 completed with result: Success

This shows that the container image can be built and that running it allows me to execute jobs on the agent. Let's move on to creating the infrastructure within Azure that is needed for running the images.

Creating the Azure container registry

As I usually do, I created a quick ARM template for provisioning the Azure Container Registry. Below is the resource part.

    "name": "[variables('acrName')]",
    "type": "Microsoft.ContainerRegistry/registries",
    "apiVersion": "2017-10-01",
    "location": "[resourceGroup().location]",
    "sku": {
        "name": "Basic"
    "properties": {
        "adminUserEnabled": true

Creating a container registry is fairly straightforward: specifying a sku and enabling the creation of an administrative account is enough. Once this is done, we roll this template out to a resource group called Tools, tag the image against the new registry, authenticate and push the image.

PS C:\src\docker-windows-agent> New-AzureRmResourceGroupDeployment -ResourceGroupName Tools -TemplateFile .\acr.json -TemplateParameterFile .\acr.henry.json

DeploymentName          : acr
ResourceGroupName       : Tools
ProvisioningState       : Succeeded
Timestamp               : 29/12/2019 20:07:25
Mode                    : Incremental
TemplateLink            :
Parameters              :
                          Name             Type                       Value
                          ===============  =========================  ==========
                          discriminator    String                     hb

Outputs                 :
DeploymentDebugLogLevel :

PS C:\src\docker-windows-agent> az acr login --name hbazdoagentacr
Unable to get AAD authorization tokens with message: An error occurred: CONNECTIVITY_REFRESH_TOKEN_ERROR
Access to registry '' was denied. Response code: 401. Please try running 'az login' again to refresh permissions.
Unable to get admin user credentials with message: The resource with name 'hbazdoagentacr' and type 'Microsoft.ContainerRegistry/registries' could not be found in subscription 'DevTest (a314c0b2-589c-4c47-a565-f34f64be939b)'.
Username: hbazdoagentacr
Login Succeeded

PS C:\src\docker-windows-agent> docker tag docker-windows-agent

PS C:\src\docker-windows-agent> docker push
The push refers to repository []
cb08be0defff: Pushed
1ed2626efafb: Pushed
f6c3f9abc3b8: Pushed
770769339f15: Pushed
8c014918cbca: Pushed
c57badbbe459: Pushed
963f095be1ff: Skipped foreign layer
c4d02418787d: Skipped foreign layer
latest: digest: sha256:7972be969ce98ee8c841acb31b5fbee423e1cd15787a90ada082b24942240da6 size: 2355

Now that all the scripts and templates are proven, let’s automate this through a pipeline. The following YAML is enough to build and publish the container.

trigger:
  branches:
    include:
      - master
  paths:
    include:
      - agent

pool:
  name: Azure Pipelines
  vmImage: windows-2019

workspace:
  clean: all

steps:
  - task: PowerShell@2
    displayName: 'PowerShell Script'
    inputs:
      targetType: filePath
      workingDirectory: agent
      filePath: ./agent/Build.ps1
      arguments: -AZDO_URL -AZDO_TOKEN $(AZDO_TOKEN)

  - task: Docker@2
    displayName: Build and push image to container registry
    inputs:
      command: buildAndPush
      containerRegistry: BuildAgentsAcr
      repository: docker-windows-agent
      Dockerfile: agent/Dockerfile
      tags: latest
      buildContext: agent

To make the above work, two more changes are needed:

  1. The Build.ps1 needs to be changed a bit, just remove steps 4 and 5
  2. A service connection to the ACR has to be made with the name BuildAgentsAcr

With the container image available in an ACR and a repeatable process to get updates out, it is time to create the Azure Container Instances that are going to run the image.

Creating the Azure container instance(s)

Again, I am using an ARM template for creating the Azure Container Instances.

    "copy": {
        "name": "acrCopy", 
        "count": "[parameters('numberOfAgents')]" 
    "name": "[concat(variables('aciName'), copyIndex())]",
    "type": "Microsoft.ContainerInstance/containerGroups",
    "apiVersion": "2018-10-01",
    "location": "[resourceGroup().location]",
    "properties": {
        "imageRegistryCredentials": [{
            "server": "[reference(resourceId('Microsoft.ContainerRegistry/registries',variables('acrName'))).loginServer]",
            "username": "[listCredentials(resourceId('Microsoft.ContainerRegistry/registries',variables('acrName')),'2017-03-01').username]",
            "password": "[listCredentials(resourceId('Microsoft.ContainerRegistry/registries',variables('acrName')),'2017-03-01').passwords[0].value]"
        "osType": "Windows", 
        "restartPolicy": "Always",
        "containers": [{
            "name": "[concat('agent-', copyIndex())]",
            "properties": {
                "image": "[parameters('containerImageName')]",
                "environmentVariables": [
                        "name": "AZDO_URL",
                        "value": "[parameters('azdoUrl')]"
                        "name": "AZDO_TOKEN",
                        "secureValue": "[parameters('azdoToken')]"
                        "name": "AZDO_POOL",
                        "value": "[parameters('azdoPool')]"
                        "name": "AZDO_AGENT_NAME",
                        "value": "[concat('agent-', copyIndex())]"
                "resources": {
                    "requests": {
                        "cpu": "[parameters('numberOfCpuCores')]",
                        "memoryInGB": "[parameters('numberOfMemoryGigabytes')]"

At the start of the resource definition, we specify that we want multiple copies of this resource, not just one. The actual number of resources is specified using a template parameter. This construct allows us to specify the number of agents that we want to run in parallel in ACI instances every time this template is redeployed. If we combine this with the complete deployment mode, the result is that agents in excess of that number get removed automatically as well. When providing the name of the ACI instance, we concatenate the name with the outcome of the function copyIndex(). This function returns an integer, specifying in which iteration of the copy loop the template currently is. This way, unique names are generated for all the resources.

As with most modern resources, the container instance takes the default resource properties in the root object and contains another object called properties that holds all the resource-specific configuration. Here we first have to specify the imageRegistryCredentials. These are the details needed to connect ACI to the ACR for pulling the images. I am using ARM template syntax to automatically fetch and insert the values, without ever looking at them.

The next interesting property is the restartPolicy. The value Always instructs ACI to automatically restart my image whenever it completes running, whether that is with an error or successfully. This way, whenever the agent has run a single job and the container completes, it gets restarted within seconds.

In the second properties object, a reference to the container image, the environment variables and the container resources are specified. The values here should be self-explanatory, and of course the complete ARM template with all parameters and other plumbing is available on the GitHub repository.

With the ARM template done and ready to go, let’s create another pipeline – now using the following YAML.

trigger:
  branches:
    include:
      - master
  paths:
    include:
      - aci

pool:
  name: Azure Pipelines
  vmImage: windows-2019

workspace:
  clean: all

steps:
  - task: AzureResourceGroupDeployment@2
    displayName: 'Create or update ACI instances'
    inputs:
      azureSubscription: 'RG-BuildAgents'
      resourceGroupName: 'BuildAgents'
      location: 'West Europe'
      csmFile: 'aci/aci.json'
      csmParametersFile: 'aci/aci.henry.json'
      deploymentMode: Complete
      overrideParameters: '-azdoToken "$(AZDO_TOKEN)"'

Nothing new in here really. This pipeline just executes the ARM template against another Azure resource group. Running this pipeline succeeds and a few minutes later, I have the following ACI instances.

And that means I have the following agents in my pool.

All-in-all: It works! Here you can see that when scaling the number of agents up or down, the names of the agents stay the same and that I will get a number of offline agents that is at most the number of agents to which I have scaled up at a certain point in time.

And with this, I have created a flexible, easy-to-scale, setup for running Azure Pipelines jobs in isolation, in containers!


Of course, this solution is not perfect and a number of downsides remain. I have found the following:

  1. More work
  2. No caching

More work. The largest drawback that I have found with this approach is that it is more work to set up. Creating a virtual machine that hosts one or more private agents can be done in a few hours, while the above has taken me well over a day to figure out (granted, I was new to containers).

No caching, so slower again. With every job running in isolation, many of the caching benefits that come with using a private agent in a virtual machine are gone. These can be overcome by building a more elaborate image, including more of the different SDKs and tools by default - but still, complete sources have to be downloaded every time and there will be no intermediate results cached for free.

If you find more downsides, have identified a flaw, or like this approach, please do let me know!

But still, this approach works for me and I will continue to use it for a while. Also, I have learned a thing or two about working with containers, yay!

This year there is an Azure advent calendar, organized by Gregor Suttie and Richard Hooper, both fellow Microsoft MVPs. In total 75 sessions will be published, all covering an interesting Azure topic. Of course I jumped at the opportunity to get aboard, and fortunately I was just in time to be added to the list. As always, I am happy to share with the community and delighted that I am allowed to do so. In this case, my contribution is on the topic of Logic Apps and can be found on the calendar's YouTube channel.

When I started preparing for my session and looked over the calendar again, I found that there were two sessions on Logic Apps planned. Next to my own, there was also a session by Simon Waight. After a bit of coordination and a few e-mails back and forth, we divided up the topics. In the recording by Simon you will find:

  • An introduction to Logic Apps, connectors and actions
  • An introduction to pay per use environments and Integration Service Environments
  • Some best practices
  • A step by step guide on building your first Logic App

In my session I will expand upon this with the following subjects:

  • The introduction of a real world use case where I am working with Logic Apps
  • Infrastructure as Code for Logic Apps, so you can practice continuous deployment for Logic Apps
  • A short introduction on building your own Logic Apps.

I hope you enjoy the recording, which will go live sometime today on the calendar's YouTube channel.

Merry Christmas!

This week I had the privilege of attending the Update Conference in Prague.

It was a great opportunity to visit this beautiful city and to attend talks given by some very bright people from all around the world. I had a great time, am happy I was invited and would be delighted to return next year.

As requested by some of the attendees, I am sharing my slides for both presentations here.

  1. Infrastructure as Code: Azure Resource Manager – inside out
  2. Secure deployments: Keeping your Application Secrets Private

Thank you for attending and I hope you enjoyed it just as much as I did!

When I visited my own website a few days ago, I was shocked at how long it has been since I posted here. The reason for this is that I have been working on two large projects, next to my regular work, for several months now. A few weeks ago the first one came to completion and I am really proud to say that my training Introduction to Azure DevOps for A Cloud Guru has been launched.

I think this has been the most ambitious side project I have started and completed in a while. And while I really enjoyed the work on this project, I also found it challenging. A great deal of things I had to learn on the fly, and often I found I had to spend a tremendous amount of time on something that was seemingly easy. For this reason, I thought it wise to recap what I have learned. And by sharing it, I hope others might benefit from my experiences as well.

1. Yes, it is work!

I think I have spent roughly one day per week for five months or so on this online course. My estimate is that this equates to roughly between 100 and 150 hours of work for 3 hours of video material. I have no idea how this compares to other authors of online training material and maybe this is where I find out I completely suck. Also, my guesstimate may be completely wrong, maybe due to the time spent thinking about a video while going for a run, or by taking a break and losing track of time. The point, however, is that this is a lot of time that you have to invest. This ratio of work hours to video hours shows that after a day of hard work, you have completed maybe twenty minutes of video or even less.

For me it is clear that you cannot do this type of work with a 'twenty minutes here and half an hour there' attitude. You will have to make serious room in your calendar and embrace the fact that you will spend at least one day a week on the project. If you cannot do that, progress will be too slow and your project might slowly fade into oblivion.

2. Find something small to start with

When I first came into contact with A Cloud Guru, I was strongly advised not to start working on a large project right away but start with a smaller project first. For me this meant the opportunity to create a more use-case focused, ACG Project style video. This is a 25-minute video that takes the viewer along in creating a cool project or showing a specific capability in a single video. This allowed me to practice and improve a lot of the skills that I would need when creating a complete course.

The advantage of creating and completing something smaller first, compared to splitting a larger project into parts, is that you are getting the full experience. You will have to write a project proposal, align your style of working with that of your editor, write a script, record, get feedback, start over, create a final recording, get more feedback and finally go through the steps of completing a project: adding a description, link to related resources, putting the sources online, etc.

Having this complete experience for a smaller project before moving on to a larger project really helped me.

3. Quality, quality, quality and … some more quality

Creating courses or videos for a commercial platform is paid work. This means that there is a clear expectation that the work you perform will be of a certain quality. For example, I got a RØDE Podcaster delivered to my home to help with sound quality. However, just using a high-quality microphone was not enough. I edited every audio sample in both Audacity and Camtasia to ensure that the audio did not contain any background sound or hum that would distract students. Also, all the videos were recorded on a large screen configured to use a resolution of only 1080p. Every single time, all excess windows needed to be closed or a re-record was necessary. Every time, excess browser tabs needed to be closed or a re-record was necessary. Browser windows and terminals needed the correct level of zoom, or a re-record was necessary. Favorites removed, browser cache cleared to ensure there were no pop-ups of previous entries, etc etc.

Conclusion? Producing high quality content entails a tremendous amount of details that you have to keep in mind every single time. Be prepared for that!

4. Be ready for (constructive) criticism

I feel really lucky in how my interactions with my editors at ACG went. Being on opposite sides of the world, most of the communication was offline through documents or spreadsheets and yet somehow, they managed to make all feedback feel friendly and constructive. In the occasional video call there was always time for a few minutes of pleasantries, before we got down to business. But yes, there were things to talk about. Feedback and criticism were frequent and very strict. I have re-edited some videos five or six times to meet the standards that I was supposed to meet. Especially on the quality of audio and video, there was a clear expectation of quality and that was rigorously verified by people who definitely had more experience than me.

All in all, I can say I have learned a lot recording these videos. But if you want to do this, be prepared and open to feedback.

5. Slides are cheap, demos are hard!

This is really a topic on its own and I think I will write more about it later. But if there is one thing I have learned, it is that demos are hard. Doing a demo on stage is hard, but much more forgiving than doing one in a video. When presenting, no-one minds if your mouse cursor is floating a bit around, searching for that button. When doing a live demo, it is cool to see someone debug a typo in a command on the fly. When presenting, you can make a minor mistake, correct it and then explain what went wrong and how to handle that. The required level of perfection is not as high as for a recorded demo. And then there is the sound! I found it impossible to record the video and audio for my demos in one go and have developed my own approach to it, which I will write about some other time.

In my experience, if you must record five minutes of demo, it might take four to five times as long as recording a five-minute slide presentation.

6. A race against time

While recording your project, your subject might be changing. For example, in the time I was creating my training on Azure DevOps multi-stage YAML builds were introduced, the user interface for Test Plans was changed and several smaller features that I showed in my demos were removed, renamed or moved to another location. Honestly, there are parts that I have recorded multiple times, due to the changes in Azure DevOps. Want more honesty? By the time the course went public, it was still outdated at some points. And yes, I know that I will have to update my course to include multi-stage YAML when it goes out of preview.

The point is, you will have to invest enough time every week in your project to ensure that your work on the course is not overtaken by changes from the vendor. Software development, and cloud in particular, is changing at such a rate that you will have to plan for incoming changes and know how to adapt. Also, circling back: taking a 'yes, this is work' attitude will help you spend enough time on your project, shortening its duration and decreasing the chance of being overtaken by changes.


If you ever go down the path of creating an online training, I would recommend keeping the above in mind. Along with one final tip: make sure you enjoy doing it. One thing that I do know for sure now is that if I was not enjoying my work on this training and I had to do it next to my other work, I would never have finished it.

Oh by the way, more details on that second large project? That will have to wait a few more months I’m afraid.