Multiple Placement Group Value of NodeType does not match with the value of VMSS

If you’re getting the following error:

Multiple Placement Group Value of NodeType does not match with the value of VMSS

while deploying a VMSS for a Service Fabric cluster with multi-AZ (availability zones) enabled in a region that doesn’t support it (for instance, West Central US; here’s a list of regions that do), then here’s the change you need to make in your template:

{
  "parameters": {
    "azCount": { // usually is either 0 or 3
      "type": "int"
    }
  },
  "variables": {
    "azVar": { // produces [ "1", "2", "3" ]
      "copy": [
        {
          "name": "azCopy",
          "count": "[parameters('azCount')]",
          "input": "[string(copyIndex('azCopy', 1))]"
        }
      ]
    },
    "azEnabled": "[greater(length(variables('azVar').azCopy), 0)]"
  },
  "resources": [
    {
      "name": "myNode",
      "type": "Microsoft.Compute/virtualMachineScaleSets",
      "apiVersion": "2021-07-01",
      "location": "westcentralus",
      "zones": "[if(variables('azEnabled'), variables('azVar').azCopy, json('null'))]",
      "properties": {
        "singlePlacementGroup": "[if(variables('azEnabled'), variables('azEnabled'), json('null'))]",
        "zoneBalance": "[if(variables('azEnabled'), variables('azEnabled'), json('null'))]"
      }
    }
  ]
}

This means that neither of the properties can be set to false, even though the ARM schema for the latest (as of the time of writing) API version mentions this only for zoneBalance and not for singlePlacementGroup.
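To make the template's logic easier to follow, here is a minimal Python sketch that mirrors its expressions. It is only an illustration and not part of the deployment: the function name `vmss_zone_settings` is made up, and `None` stands in for `json('null')`.

```python
def vmss_zone_settings(az_count):
    """Mirror the template expressions: zone properties are either set to true or omitted."""
    # Same as the azVar copy loop: copyIndex starts at 1 and the values are strings.
    zones = [str(i) for i in range(1, az_count + 1)]
    az_enabled = len(zones) > 0
    return {
        "zones": zones if az_enabled else None,
        # Either True or None (null), never False, as in the template.
        "singlePlacementGroup": True if az_enabled else None,
        "zoneBalance": True if az_enabled else None,
    }


print(vmss_zone_settings(0))
print(vmss_zone_settings(3))
```

With `azCount` set to 0 all three values come out as null, so the same template can be deployed to a region without availability zones.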

Unfortunately neither of these docs mention that, yet:

Happy deployment, folks!

Posted in Infrastructure

Reliable and scalable infrastructure: Secrets

This is a series of posts:

  1. Introduction
  2. Principles
  3. Layers
  4. Traffic
  5. Secrets (this post)

In the previous post we discussed probably the most important aspect of running a service: handling live traffic. Without it, it’s not a service but a bunch of robots wasting your time and money.

Now let’s discuss the next most important aspect: secrets. Perhaps a service can run without any, but only if it’s a static website. And even a static website needs an SSL certificate, so… Any real-world application needs to write its data somewhere, e.g. to a database, or read another application’s data, e.g. from a web service. So it needs to authorize, so it needs a secret (whether it’s a password or a certificate), so it needs to keep that secret somewhere and access it from there somehow.

As mentioned above, the main dimension for secrets is the type (or kind): passwords and certificates. One can even combine them by requiring a passphrase to access the private key of a certificate.

Going forward we’ll be discussing certificates, since they’re the primary authorization mechanism employed by modern web applications in the cloud.

What really matters is the difference in the mechanisms used to store and access one or the other. For example, there is a whole subsystem called PKI (public key infrastructure) for certificates, while there is basically nothing built in for plain-text passwords, such as those used to access AAD applications.

Another important dimension for secrets is regionality: whether a secret is unique to each individual region where your service is hosted, shared by a group of regions (let’s say North America, Europe, Asia), or shared by all regions (which effectively makes it global).

Similarly to the principle of least privilege, the idea is to scope a secret down as much as technically feasible, ideally to a single region. All secrets should be regional as long as they can be, for example SSL certificates. The opposite would be a certificate used to encrypt JWTs (aka JWE). Since the encryption in this case is symmetric, the same certificate must be used to decrypt the payload, which makes such a certificate a global secret.

There is no single obvious reason to prefer one strategy over another; each has its pros and cons, such as:

  • The growing number of secrets increases the overall cost of maintenance.
  • You’ll need a secrets inventory, which then must be kept up to date. Otherwise it defeats the purpose of having one.
  • More certificates means expirations happen more often, so you’ll need to keep an eye on every one of them.
  • On the other hand, a breach in one region would not automatically mean a breach in another or, worst of all, in every region. This decides whether your whole production environment can be taken over or not.
    • If this ever happens, you want to be able to shut the attacked region down, fail over the traffic, and handle the incident without affecting the customers.

Too many certificates that expire too often mean a hell of a lot of thumbprints to update in the configuration, right? Wrong. What should one do instead? Switch to validation by subject name and issuer (or SNI, for short).

In this case the service (or the underlying compute) trusts the root certificate and, subsequently, all certificates issued under the umbrella of this trusted root. As a result, it doesn’t matter how often a certificate expires or what its thumbprint is: the service continues to use and trust it regardless.
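The difference between the two validation strategies can be sketched in a few lines of Python. This is a deliberately simplified model: certificates are plain records, all names and values are made up, and a real implementation would also verify the signature chain up to the trusted root with a proper crypto library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    thumbprint: str


def trusted_by_thumbprint(cert, pinned_thumbprints):
    """Pinning: every renewal changes the thumbprint and requires a config update."""
    return cert.thumbprint in pinned_thumbprints


def trusted_by_subject_and_issuer(cert, expected_subject, trusted_issuers):
    """Subject name + issuer: a renewed certificate is trusted without config changes."""
    return cert.subject == expected_subject and cert.issuer in trusted_issuers


# A hypothetical renewed certificate: same subject and issuer, new thumbprint.
renewed = Cert("CN=service.example.com", "CN=Example Root CA", "BBBB2222")

print(trusted_by_thumbprint(renewed, {"AAAA1111"}))
print(trusted_by_subject_and_issuer(renewed, "CN=service.example.com", {"CN=Example Root CA"}))
```

The first check fails until someone updates the pinned thumbprint, while the second one keeps passing across renewals.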

One important nuance, though, is the recommendation to renew (aka roll) the certificate in advance, well before it expires. This way you give yourself enough time to handle any errors that might occur during the renewal before the certificate expires and causes an outage.
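A renewal check along these lines might look as follows. The 30-day window and the function name `should_renew` are assumptions for illustration only; pick a window that matches how long a failed renewal takes you to fix.

```python
from datetime import datetime, timedelta, timezone


def should_renew(not_after, now, renew_before=timedelta(days=30)):
    """Return True once the certificate has entered its renewal window."""
    return now >= not_after - renew_before


expires = datetime(2022, 6, 30, tzinfo=timezone.utc)
print(should_renew(expires, datetime(2022, 5, 1, tzinfo=timezone.utc)))   # 60 days left
print(should_renew(expires, datetime(2022, 6, 10, tzinfo=timezone.utc)))  # 20 days left
```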

The last but not least aspect to discuss is the separation of secrets delivery from secrets consumption. It’s less methodological and more technological and practical, yet it still provides important advantages. How exactly? A naïve implementation of consuming a secret involves fetching it first. But is that really necessary? Can we do better? Yes, we can.

In order to follow the Single Responsibility Principle (SRP, for short) and encapsulate each function, we can split them into two:

  1. Fetch a secret (in this case, a certificate) from a remote location, such as Azure Key Vault, and install it into a local store. For the code that does that, it doesn’t matter how and when the secret will be consumed; its role ends here.
  2. Read the secret from the local store. For the code that does that, it doesn’t matter how and when the secret was fetched; its role starts here.

Practically speaking, it means that these two operations can be performed not just on two different timelines but by two different applications, written by different people, using different platforms and/or programming languages. Basically, this is the microservices architecture applied to secrets.
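Here is a minimal Python sketch of that split. Everything in it is a stand-in: a temporary directory plays the role of the local store, and `fake_key_vault` replaces the real remote call. The point is only that the two functions share nothing but the store.

```python
import tempfile
from pathlib import Path


def deliver_secret(fetch_remote, store_dir, name):
    """Delivery: fetch the secret remotely and install it into the local store."""
    path = Path(store_dir) / name
    path.write_bytes(fetch_remote(name))
    return path


def consume_secret(store_dir, name):
    """Consumption: read the secret from the local store, knowing nothing about its origin."""
    return (Path(store_dir) / name).read_bytes()


def fake_key_vault(name):
    # Hypothetical stand-in for a remote call, e.g. to Azure Key Vault.
    return f"certificate-bytes-for-{name}".encode()


with tempfile.TemporaryDirectory() as store:
    deliver_secret(fake_key_vault, store, "service-cert")
    print(consume_secret(store, "service-cert"))
```

In practice the two halves would live in different processes, for example an agent that refreshes the store on a schedule and a service that only ever reads from it.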

P.S. I’d like to thank and acknowledge Andrey Fedyashov, my fellow colleague at Microsoft and friend, who shamed me (I mean encouraged) into finishing this series.

Posted in Infrastructure

How to get Tenant ID from Subscription ID in Azure using MSAL

This is a series of blog posts:

  • Part 1: using PowerShell
  • Part 2: using ADAL
  • Part 3: using MSAL

First you need to install the AAD client NuGet package. Note that this is MSAL, the modern and recommended way to communicate with AAD.

<PackageReference Include="Microsoft.Identity.Client" Version="4.36.2" />

Then use one of its helper methods:

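using System;
using System.Net.Http;
using System.Text.RegularExpressions;
using Microsoft.Identity.Client;
using Microsoft.Identity.Client.Instance;

// `subscription` (the subscription id) and `cancellationToken` are expected to be defined by the caller.
var hostName = "management.azure.com";
var apiVersion = "2020-08-01";
var requestUrl = $"https://{hostName}/subscriptions/{subscription}?api-version={apiVersion}";
var httpClient = new HttpClient();
var response = await httpClient.GetAsync(requestUrl, cancellationToken);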

var authenticationParameters = WwwAuthenticateParameters.CreateFromResponseHeaders(response.Headers);

var authorizationHeaderRegex = new Regex(@"https://.+/(.+)/?", RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
var match = authorizationHeaderRegex.Match(authenticationParameters.Authority);
var tenantString = match.Success ? match.Groups[1].Value : null;

if (!Guid.TryParse(tenantString, out var tenantId))
{
    throw new InvalidOperationException($"Received tenant id '{tenantString}' is not valid guid");
}

Console.WriteLine(tenantId);

The helper is not async and lets you write less code. You still need to parse the tenant id out of the authorization uri, though.
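The parsing step itself is language-agnostic. As an illustration only (this is Python, not part of MSAL, and the function name is made up), here is a sketch that takes the last path segment of the authority and validates it as a GUID, tolerating a trailing slash:

```python
import uuid
from urllib.parse import urlsplit


def tenant_id_from_authority(authority):
    """Return the tenant id (the last path segment of the authority URI) as a UUID."""
    segments = [s for s in urlsplit(authority).path.split("/") if s]
    if not segments:
        raise ValueError(f"No tenant segment in authority '{authority}'")
    try:
        return uuid.UUID(segments[-1])
    except ValueError:
        raise ValueError(f"Received tenant id '{segments[-1]}' is not a valid GUID") from None


print(tenant_id_from_authority("https://login.windows.net/e0a3d130-92db-4546-9813-45dd621f8379/"))
```

Splitting the path instead of using a greedy pattern avoids capturing a trailing slash together with the tenant id.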

You can find the code here: https://dotnetfiddle.net/Wyh9vs.


However, after I contributed to the library, starting with version 4.37.0 parsing with Regex is no longer needed:

using System;
using System.Net.Http;
using Microsoft.Identity.Client;

// `subscription` (the subscription id) and `cancellationToken` are expected to be defined by the caller.
var hostName = "management.azure.com";
var apiVersion = "2020-08-01";
var requestUrl = $"https://{hostName}/subscriptions/{subscription}?api-version={apiVersion}";
var httpClient = new HttpClient();
var response = await httpClient.GetAsync(requestUrl, cancellationToken);

var authenticationParameters = WwwAuthenticateParameters.CreateFromResponseHeaders(response.Headers);
var tenantId = authenticationParameters.GetTenantId();

Console.WriteLine(tenantId);

You can find the updated, shorter code here: https://dotnetfiddle.net/EYkWAg.

Posted in Programming

How to get Tenant ID from Subscription ID in Azure using ADAL

This is a series of blog posts:

  • Part 1: using PowerShell
  • Part 2: using ADAL
  • Part 3: using MSAL

In the previous part we did this using a script; this time we’ll do it using C#.

First you need to install the AAD client NuGet package. Note that this is ADAL, which is now legacy and has been put into maintenance mode.

<PackageReference Include="Microsoft.IdentityModel.Clients.ActiveDirectory" Version="5.2.9" />

Then use one of its helper methods:

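using System;
using System.Net.Http;
using System.Text.RegularExpressions;
using Microsoft.IdentityModel.Clients.ActiveDirectory;

// `subscription` (the subscription id) and `cancellationToken` are expected to be defined by the caller.
var hostName = "management.azure.com";
var apiVersion = "2020-08-01";
var requestUrl = $"https://{hostName}/subscriptions/{subscription}?api-version={apiVersion}";
var httpClient = new HttpClient();
var response = await httpClient.GetAsync(requestUrl, cancellationToken);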

var authenticationParameters = await AuthenticationParameters.CreateFromUnauthorizedResponseAsync(response);

var authorizationHeaderRegex = new Regex(@"https://.+/(.+)/?", RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
var match = authorizationHeaderRegex.Match(authenticationParameters.Authority);
var tenantString = match.Success ? match.Groups[1].Value : null;

if (!Guid.TryParse(tenantString, out var tenantId))
{
    throw new InvalidOperationException($"Received tenant id '{tenantString}' is not valid guid");
}

Console.WriteLine(tenantId);

You can find the code here: https://dotnetfiddle.net/M7paDG.

One of the drawbacks is that the helper method is async without a real need to be: underneath it calls another async helper which reads the content of the response but then it doesn’t use the content.

So you can write a little more code yourself without the penalty of making it async:

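using System;
using System.Net.Http;
using System.Text.RegularExpressions;
using Microsoft.IdentityModel.Clients.ActiveDirectory;

// `subscription` (the subscription id) and `cancellationToken` are expected to be defined by the caller.
var hostName = "management.azure.com";
var apiVersion = "2020-08-01";
var requestUrl = $"https://{hostName}/subscriptions/{subscription}?api-version={apiVersion}";
var httpClient = new HttpClient();
var response = await httpClient.GetAsync(requestUrl, cancellationToken);
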
var authenticationParameters = AuthenticationParameters.CreateFromResponseAuthenticateHeader(response.Headers.WwwAuthenticate.ToString());

var authorizationHeaderRegex = new Regex(@"https://.+/(.+)/?", RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
var match = authorizationHeaderRegex.Match(authenticationParameters.Authority);
var tenantString = match.Success ? match.Groups[1].Value : null;

if (!Guid.TryParse(tenantString, out var tenantId))
{
	throw new InvalidOperationException($"Received tenant id '{tenantString}' is not valid guid");
}

Console.WriteLine(tenantId);

You can find the code here: https://dotnetfiddle.net/kagSAK.

Posted in Programming

How to get Tenant ID from Subscription ID in Azure using PowerShell

This is a series of blog posts:

  • Part 1: using PowerShell
  • Part 2: using ADAL
  • Part 3: using MSAL

In order to do this, you’ll need to:

  1. Call this Azure Resource Manager API without authentication (I suggest always using the latest stable API version)
  2. Inspect the WWW-Authenticate header
  3. Parse the tenant id out of the authorization uri

Here’s a sample header value:

Bearer authorization_uri="https://login.windows.net/e0a3d130-92db-4546-9813-45dd621f8379", error="invalid_token", error_description="The authentication failed because of missing 'Authorization' header."
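Before the PowerShell versions, here is the parsing step in isolation as a small Python sketch. It works on a header string only and makes no network call; the function name and the exact pattern are my own choices for illustration.

```python
import re

# The tenant id is the last path segment of the authorization_uri value.
AUTH_URI = re.compile(r'authorization_uri="https://[^"/]+/([^"/]+)/?"')


def tenant_from_www_authenticate(header):
    """Extract the tenant id from a WWW-Authenticate header value, or return None."""
    match = AUTH_URI.search(header)
    return match.group(1) if match else None


sample = (
    'Bearer authorization_uri="https://login.windows.net/e0a3d130-92db-4546-9813-45dd621f8379", '
    'error="invalid_token", '
    'error_description="The authentication failed because of missing \'Authorization\' header."'
)
print(tenant_from_www_authenticate(sample))
```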

Here’s how to extract Tenant ID using PowerShell for Windows:

$hostName = 'management.azure.com'
$apiVersion = '2020-08-01'
$url = "https://$hostName/subscriptions/$subscription/?api-version=$apiVersion"
$response = try { Invoke-RestMethod -Method GET $url } catch [System.Net.WebException] { $_.Exception.Response }
$header = $response.Headers['WWW-Authenticate']
$match = $header | Select-String -Pattern 'Bearer authorization_uri="https://.+/(.+?)"'
$tenantId = $match.Matches[0].Groups[1].Value
$tenantId 

And here’s the same using PowerShell Core (note the different exception type being caught):

$hostName = 'management.azure.com'
$apiVersion = '2020-08-01'
$url = "https://$hostName/subscriptions/$subscription/?api-version=$apiVersion"
$response = try { Invoke-RestMethod -Method GET $url } catch [System.Net.Http.HttpRequestException] { $_.Exception.Response }
$header = $response.Headers.WwwAuthenticate
$match = $header | Select-String -Pattern 'Bearer authorization_uri="https://.+/(.+?)"'
$tenantId = $match.Matches[0].Groups[1].Value
$tenantId 

When called with the subscription id c3c0a359-4420-4f84-8925-f642e2717296, it will output e0a3d130-92db-4546-9813-45dd621f8379.

That’s it, folks!

Posted in Programming

Carnation Anapa Winery, vol 3, day 153: corking

Today I’m bottling my wine. I got a 6-gallon carboy that went down to about 5 during the initial testing.

In the first batch I bottled 10 bottles. Each contains about 15g of water where I diluted about 0.8g of potassium metabisulfite total. In the second – 9 more.

Posted in Winemaking

Following circular nested profile path identified

If you’re getting the following error:

Circular nested profile definitions are not allowed. Following circular nested profile path identified: example.trafficmanager.net -> example.trafficmanager.net.

then you very likely have an ARM template like this:

{
  "type": "Microsoft.Network/trafficManagerProfiles/nestedEndpoints",
  "apiVersion": "[variables('tmApiVersion')]",
  "name": "[concat(variables('tmName'), '/', parameters('location'))]",
  "properties": {
    "endpointStatus": "Enabled",
    "targetResourceId": "[resourceId('Microsoft.Network/trafficManagerProfiles', variables('tmName'))]",
    "weight": 1,
    "minChildEndpoints": 1,
    "geoMapping": [
      "GEO-NA"
    ]
  }
}

This means you created a Traffic Manager profile with Geographic traffic routing whose nested endpoint references the profile itself. Hence the error.
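To illustrate the kind of check behind this error, here is a Python sketch that finds a circular path in a map of profiles to their nested profiles. It is my own simplified model, not how Traffic Manager implements the validation.

```python
def find_cycle(nested):
    """Return the first circular path in a {profile: [nested profiles]} map, or None."""

    def visit(profile, path):
        if profile in path:
            # Close the loop: report the path from the first occurrence back to itself.
            return path[path.index(profile):] + [profile]
        for child in nested.get(profile, []):
            cycle = visit(child, path + [profile])
            if cycle:
                return cycle
        return None

    for start in nested:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


# The template above: the nested endpoint targets its own parent profile.
profiles = {"example.trafficmanager.net": ["example.trafficmanager.net"]}
print(" -> ".join(find_cycle(profiles)))
```

The fix in the template is to point targetResourceId at a different (child) profile rather than at the profile that owns the nested endpoint.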

Posted in Infrastructure

How to get secret from Key Vault using PowerShell and Managed Identity

First you need to acquire a token using Managed Identity by calling the local IMDS endpoint:

$audience = 'https://vault.azure.net'
$token = Invoke-RestMethod -Method GET -Uri "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=$audience" -Headers @{ 'Metadata' = 'true' }

Note that the audience must match the service you’re calling; for example, it differs from the one used when calling ARM.

Then call Key Vault REST API to get the secret:

$secret = "https://$vaultName.vault.azure.net/secrets/$secretName/?api-version=7.0"
$auth = "$($token.token_type) $($token.access_token)"
Invoke-RestMethod -Method GET -Uri $secret -Headers @{ 'Authorization' = $auth }
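For comparison, here is how the same two requests are put together, sketched in Python. The sketch only builds the URLs and headers and does not send anything, so it can be run anywhere; the vault and secret names are placeholders, and the function names are made up.

```python
from urllib.parse import urlencode

IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"
IMDS_HEADERS = {"Metadata": "true"}  # required by IMDS on every request


def imds_token_url(audience, api_version="2018-02-01"):
    """URL of the local IMDS endpoint that issues a token for the given audience."""
    query = urlencode({"api-version": api_version, "resource": audience})
    return f"{IMDS_TOKEN_ENDPOINT}?{query}"


def secret_url(vault_name, secret_name, api_version="7.0"):
    """URL of the Key Vault REST API call that returns the secret."""
    return f"https://{vault_name}.vault.azure.net/secrets/{secret_name}/?api-version={api_version}"


def authorization_header(token):
    """Build the Authorization header from the token response returned by IMDS."""
    return {"Authorization": f"{token['token_type']} {token['access_token']}"}


print(imds_token_url("https://vault.azure.net"))
print(secret_url("my-vault", "my-secret"))
print(authorization_header({"token_type": "Bearer", "access_token": "<access token>"}))
```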

That’s it, folks!

Posted in Programming

Reliable and scalable infrastructure: Traffic

This is a series of posts:

  1. Introduction
  2. Principles
  3. Layers
  4. Traffic (this post)
  5. Secrets

Now you have multiple environments, each consisting of multiple data centers, each consisting of multiple scale units. How do you wire them all up together to be well prepared for a disaster?

There are various kinds of services (stateless and stateful), and so are the patterns of traffic (inbound and outbound) they serve. I’m lucky enough to work mostly with stateless services that serve inbound traffic. That is, there is no state per se, and the data to be processed is the HTTP requests coming from users over the internet. Namely, I work on an ARM resource provider (RP) for Azure Device Update (ADU). Thus below I’ll explain how to use Azure Traffic Manager (ATM, or colloquially just TM) to route traffic to this kind of service. Other kinds might require a different model.

The model I’m proposing here is rooted in two aspects described earlier:

  • Each data center has multiple scale units
  • Each data center has its failover pair

First, the reliability of a data center. TM works just fine routing traffic to a single scale unit in a data center while being ready for a second one to be stood up and added to the rotation. This is thanks to the probes that run periodically (the interval is easily configurable), check each endpoint, and mark it active (or not). The priority mode suits this option best: the first endpoint would have priority 10 and the second would have 20. The numbers are arbitrary, but you get the idea. Endpoints with a higher priority number kick in only when those with a lower number are down.

Normally, if you have just one cluster up and running, the second endpoint will be always inactive and traffic will be always served by the cluster behind the first endpoint. In case of emergency, if you have to delete that cluster, you create another one. Its DNS/IP are known in advance and already preconfigured on the TM profile. This way you won’t need to do anything and it’ll start serving traffic immediately.
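The selection rule of the priority mode can be modeled in a few lines of Python. This is a toy model with made-up endpoint names; the `healthy` flag stands in for the result of TM’s probes.

```python
def pick_endpoint(endpoints):
    """Return the healthy endpoint with the lowest priority number, or None if all are down."""
    healthy = [e for e in endpoints if e["healthy"]]
    return min(healthy, key=lambda e: e["priority"], default=None)


endpoints = [
    {"name": "cluster-1", "priority": 10, "healthy": False},  # deleted in an emergency
    {"name": "cluster-2", "priority": 20, "healthy": True},   # the replacement, preconfigured in advance
]
print(pick_endpoint(endpoints)["name"])
```

As soon as the probes mark the second endpoint as healthy, it starts receiving traffic without any change to the profile.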

Another option is to have two endpoints and two clusters always up, running, and serving traffic. It’s needed when technical limitations or other considerations make one cluster not enough. In this case the weighted mode with the same weight for both clusters works well.

You’ve secured the reliability of a single region: one cluster goes down, another takes its place and continues to serve traffic. Now let’s shift the focus and see what happens if not just one cluster but the whole region goes down. This is unlikely to happen unless a really bad deployment takes place, more likely one of your own than not.

Azure has grouped regions into so-called failover pairs. This means that, by contract, there won’t be a deployment to both regions of a pair simultaneously, so at least one will stay healthy. For you, it means that you can have another TM profile with two endpoints in the priority mode:

  1. The first endpoint is in Region A, e.g. West US with DNS westus.service.example.com
  2. The second endpoint is in Region B, e.g. East US with DNS eastus.service.example.com

If Region A is completely down, which would happen only when all clusters in that region became unavailable for some reason, only then will traffic be routed to Region B. This has its own complications, such as increased latency and increased load on what’s now a single region with doubled traffic, which again increases latency. But serving customers slowly is better than not serving them at all.
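The two levels of failover, clusters within a region and regions within a pair, can be sketched as nested priority profiles. The Python below is a simplified model with made-up cluster names: it ignores probing details and settings such as the minimum number of healthy child endpoints.

```python
def resolve(profile):
    """Walk priority-mode profiles and return the name of the first healthy cluster, or None."""
    for endpoint in sorted(profile["endpoints"], key=lambda e: e["priority"]):
        if "endpoints" in endpoint:
            # A nested (regional) profile: it is usable only if one of its clusters is healthy.
            target = resolve(endpoint)
            if target:
                return target
        elif endpoint["healthy"]:
            return endpoint["name"]
    return None


west = {"name": "westus.service.example.com", "priority": 10, "endpoints": [
    {"name": "westus-cluster-1", "priority": 10, "healthy": False},
    {"name": "westus-cluster-2", "priority": 20, "healthy": False},
]}
east = {"name": "eastus.service.example.com", "priority": 20, "endpoints": [
    {"name": "eastus-cluster-1", "priority": 10, "healthy": True},
]}
service = {"name": "service.example.com", "endpoints": [west, east]}

print(resolve(service))
```

Traffic reaches East US only because every cluster in West US is down; bring either of them back and the region takes over again.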

Posted in Infrastructure

How to assign permissions for user-assigned managed identity on multiple subscriptions in bulk

First get the subscriptions you want to assign permissions on:

$subs = Get-AzSubscription |? { $_.Name.Contains("NorthAmerica") }

Then get the client id of the identity you want to assign permissions for:

$id = Get-AzUserAssignedIdentity -ResourceGroupName my-shared-prod-westus2 `
                                 -Name my-shared-prod-westus2-id

Now perform the actual permissions assignment:

$subs |% { New-AzRoleAssignment -Scope "/subscriptions/$($_.Id)" `
                                -RoleDefinitionName "Contributor" `
                                -ApplicationId $id.ClientId }
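If you’d like to double-check which scopes the loop is going to touch before running it, the filtering and scope-building logic is easy to model. Here is a small Python sketch with made-up subscription names and ids; it performs no Azure calls.

```python
def role_assignment_scopes(subscriptions, name_filter):
    """Return the subscription-level scopes for subscriptions whose name contains the filter."""
    return [f"/subscriptions/{s['id']}" for s in subscriptions if name_filter in s["name"]]


# Hypothetical subscriptions, for illustration only.
subscriptions = [
    {"name": "MyService-NorthAmerica-Prod", "id": "00000000-0000-0000-0000-000000000001"},
    {"name": "MyService-Europe-Prod", "id": "00000000-0000-0000-0000-000000000002"},
]
for scope in role_assignment_scopes(subscriptions, "NorthAmerica"):
    print(scope)
```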

That’s it, folks!

Posted in Programming