Building secure applications using modern authentication (part 3)

This post is a part of a series.

Whether you are building something as complex as a SCIM connector or a mobile application, or just a simple SPA (single-page application), chances are you want to share your application with customers or other partners. If so, there are a few security features you should be aware of.

Publisher Verification

In part 2 I discussed the concept of consent and I briefly mentioned that organizations can control how and who can consent to the various permissions by updating the tenant settings. Well, those settings can be found here:

With the setting above, organizations can prevent users from consenting to share their data, and they can also configure a workflow that allows users to request consent from their administrators.

With the settings below, organizations can allow end users to consent only to specific permissions they deem low impact, and only for applications from verified publishers.

Microsoft recommends restricting user consent to “allow users to consent only for apps from verified publishers, and only for permissions you select.” This is due to known risks associated with the abuse of application permissions that have either been forgotten by organizations or simply not secured well enough.

But what are verified publishers? Applications associated with a verified publisher give end users an added trust factor, because becoming a verified publisher means the partner has a valid MPN (Microsoft Partner Network) account that has been verified as a legitimate business. Once the process is completed, any consent prompts presented to users for applications associated with that MPN account will show the blue verified publisher badge.

And as you can see above, it can also expedite the consent process if the tenant has the recommended settings that allow end users to consent only to applications from a verified publisher.

App Gallery

The App Gallery is a “catalog of thousands of apps that make it easy to deploy and configure single sign-on (SSO) and automated user provisioning.” The nice thing about the App Gallery is that you can make your app available to all your customers in a secure way via Enterprise Apps. Although offering SSO for your application is highly recommended for security reasons, you don’t need to enable both SSO and user provisioning from the start; it can be one or the other, so you can begin with some features and add others later.

Keep in mind there is a review process associated with publishing your app via the App Gallery; you can find the checklist and the steps here.

In part 4 of the series I’ll cover a few additional security recommendations for your custom applications.

Building secure applications using modern authentication (part 2)

This post is a part of a series.

Azure Active Directory, which is the Identity Provider (IdP) or OpenID Provider (OP) behind Azure and Office 365, supports OpenID Connect (OIDC) and OAuth 2.0, for authentication and authorization, respectively. It does this via Application Registrations and Service Principals (Enterprise Applications), which in turn are assigned permissions (scopes) for a variety of APIs, including Microsoft Graph API, as well as custom APIs exposed by applications on AAD. 
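
If you want to see what the OP exposes for a given tenant, the OIDC discovery document lists its endpoints and capabilities. Below is a minimal sketch in Python (the tenant name is a hypothetical placeholder, and the requests package is assumed):

```python
import requests

# Fetch the tenant's OpenID Connect discovery document (replace the placeholder
# with your own tenant ID or domain).
tenant = "contoso.onmicrosoft.com"
metadata = requests.get(
    f"https://login.microsoftonline.com/{tenant}/v2.0/.well-known/openid-configuration",
    timeout=10,
).json()

print(metadata["authorization_endpoint"])  # where authorization requests go
print(metadata["token_endpoint"])          # where tokens are issued
print(metadata["scopes_supported"])        # e.g. openid, profile, email, offline_access
```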

OAuth and OIDC are supported both for applications and services where AAD is the IdP/OP and within the AAD tenant itself, and they work in the same manner. I point this out because, unfortunately, that’s not the case for all IdPs. Some IdPs support OAuth for registered applications, but not for tenant/organization-level operations, such as unlocking a user or resetting MFA.

Azure AD offers different types of permissions (scopes) for the various flows, as described in part 1 of this series. As we talk about permissions, please keep in mind that organizations can determine how and who can consent to the various permissions by updating the tenant settings. More about this in part 3 of the series.

There are different types of permissions mostly because they are meant for different types of applications and processes. Here is a quick summary of the types of permissions, their intended use, the consent required, and the effective permissions.

Interactive Application (signed-in user)

For interactive applications with a signed-in user, the application gets access on behalf of that user; this is the case for mobile apps, web apps, and SPAs (single-page applications). These interactive applications should use delegated permissions, since they act as the signed-in user when making calls to the API.

By default, users can consent to delegated permissions; however, admins have to consent to higher-privileged permissions and whenever permissions are granted on behalf of all users. Consent is usually requested automatically the first time a user accesses an application that requires OAuth-protected permissions, or when the application explicitly requests consent. It can also happen if the permissions have changed, if the user or an admin revoked consent, or if the application uses incremental consent to ask for some permissions now and request more later as needed, perhaps for optional features. Incremental consent is a great way to abide by the principle of least privilege.
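
To make the incremental consent idea more concrete, here is a minimal sketch using MSAL Python; the client ID, tenant, and scopes are placeholders, not values from this post:

```python
import msal

# Public client application for an interactive app (placeholders below).
app = msal.PublicClientApplication(
    client_id="00000000-0000-0000-0000-000000000000",
    authority="https://login.microsoftonline.com/contoso.onmicrosoft.com",
)

# Initial sign-in asks only for the minimal delegated permission the app needs.
result = app.acquire_token_interactive(scopes=["User.Read"])

# Later, when the user opts into an optional feature, request the extra scope;
# the user is prompted to consent only to the newly requested permission.
mail_result = app.acquire_token_interactive(scopes=["Mail.Read"])
```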

The effective permissions of interactive applications are the intersection of the delegated permissions assigned to the application and the permissions the user has been granted within the system, which prevents elevation of privilege.

Background Service or Daemon Process

For background services or daemon processes, the application can only sign in as itself, i.e. as a service principal (SPN). These applications require application permissions since there is no signed-in user, and only administrators can consent to application permissions. The application calls the API as the SPN with its associated credential, which can be a secret or a certificate. Those credentials should be stored in a secure vault, such as Azure Key Vault (AKV). The great thing about using AKV is that you can use a managed identity to access the vault where the secret is kept, only when needed. Internally, managed identities are service principals that can be locked down so they are only usable by specific Azure resources. Additionally, there are no credentials in the code, and Azure takes care of rolling the credentials that are used. When the managed identity is deleted, the corresponding service principal is automatically removed. Permissions for managed identities are assigned via PowerShell (New-AzureADServiceAppRoleAssignment) or the CLI, not the portal.
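
As a rough sketch of what that can look like in code, the daemon below uses its managed identity to read the client secret from Key Vault and then acquires an app-only token with MSAL. The vault, secret, client ID, and tenant names are hypothetical, and the azure-identity, azure-keyvault-secrets, and msal packages are assumed:

```python
import msal
from azure.identity import ManagedIdentityCredential
from azure.keyvault.secrets import SecretClient

# The managed identity reads the app's client secret from Key Vault, so no
# credential ever lives in code or configuration (names are placeholders).
vault = SecretClient(
    vault_url="https://contoso-vault.vault.azure.net",
    credential=ManagedIdentityCredential(),
)
client_secret = vault.get_secret("daemon-app-secret").value

# Confidential client: the daemon signs in as its own service principal.
app = msal.ConfidentialClientApplication(
    client_id="00000000-0000-0000-0000-000000000000",
    authority="https://login.microsoftonline.com/contoso.onmicrosoft.com",
    client_credential=client_secret,
)

# .default requests the application permissions that were granted and consented
# to for this application.
token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
```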

The effective permissions of these applications are the full application permissions that were granted and consented to for this application, since there is no associated user signed-in.

NOTE: Granting application permissions to interactive applications can significantly increase risk, because it can inadvertently elevate privileges for a signed-in user and circumvent any permission guardrails directly associated with that user. For example, the Mail.Read permission, when assigned as a delegated permission, “Allows the app to read the signed-in user’s mailbox”, but when assigned as an application permission, it “Allows the app to read mail in all mailboxes without a signed-in user”.

The permissions referenced above are assigned via the Application Registration menu, within the API permissions blade:

Even Exchange?

And in case you are wondering: yes, even the mail endpoints can be accessed using OAuth. For additional details, please reference the links below:

MSAL

One of the best benefits AAD offers is MSAL. The Microsoft Authentication Library (MSAL) is a set of libraries that authenticate and authorize users and applications. They are OAuth 2.0 and OpenID Connect libraries built to handle the protocol-level details for developers. They stay up to date with the latest security updates, and they cache and refresh tokens automatically so developers don’t have to worry about token expiration within custom applications. Basically, MSAL gives developers a safe head start with OAuth 2.0 and OIDC for custom applications.
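
As a small illustration of the “cache and refresh” part, here is a sketch with MSAL Python (the client ID is a placeholder): acquire_token_silent returns a cached token, refreshes it when it has expired, and the app only falls back to an interactive prompt when neither is possible.

```python
import msal

# A serializable token cache lets the app persist tokens between runs.
cache = msal.SerializableTokenCache()
app = msal.PublicClientApplication(
    client_id="00000000-0000-0000-0000-000000000000",
    authority="https://login.microsoftonline.com/common",
    token_cache=cache,
)

accounts = app.get_accounts()
result = app.acquire_token_silent(["User.Read"], account=accounts[0]) if accounts else None
if not result:
    # Only prompt when there is no cached or refreshable token.
    result = app.acquire_token_interactive(scopes=["User.Read"])
```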

MSAL supports CAE (Continuous Access Evaluation), a newer feature that allows tokens to be revoked as needed based on specific risks, events (e.g. a user is disabled), or policy updates (e.g. a new location). This feature allows tokens to have a longer lifetime because they can be revoked whenever an event dictates that access must be removed, and MSAL will proactively refresh the tokens as needed. So, not only is your application safer, it’s also more efficient.

MSAL also supports PIM and Conditional Access, including authentication context, which allows you to protect specific sensitive resources within your custom application. For an example of how Conditional Access works, please reference my previous blogs (Restrict downloads for sensitive (confidential) documents to only compliant devices and Passwordless Azure VM SSH login using FIDO2 security keys).

In part 3 of the series I’ll cover the App Gallery and the concept of publisher verification.

Building secure applications using modern authentication (part 1)

TL;DR – You don’t need to disable MFA for users in the name of “automation”. Basic authentication is considered legacy authentication because there are safer options available. Keep reading to learn about OAuth, OIDC, modern authentication and how to use the valet key to create secure applications.

As scary as it sounds, I have worked with too many third-party tools (even security tools!) that rely on basic authentication to integrate services. Any IdP whose only option for authenticating automation apps is an API key that never expires is not offering you the safest option available.

With the upcoming deprecation of basic auth for Exchange, I figured this is a good time to talk about modern authentication, why it is a safer option, and how Microsoft makes it easier to implement. That’s the topic of the posts in this series:

Let’s start with a quick summary of the basics…

What is OAuth and OIDC?

OAuth is an authorization (authz) framework that was developed to allow application clients to be delegated specific access to services or resources. 

The client gets access via an access token with a specific lifetime that is granted with specific permissions, referred to as scopes, which are included in the claims (contents) within the access token. These access tokens are granted by an authorization server based on the approval of the owner of that resource or service. 

OpenID Connect, or OIDC, is the authentication (authn) profile built on top of OAuth to authenticate and obtain information about the end user, which is stored in an id_token once the user authenticates.

Is OIDC the same as SAML? No, it is not the same, but it is similar because it is used for federated authentication. If you are familiar with SAML, the Identity Provider (IdP) would be referred to as the OpenID Provider (OP) for OIDC and the Service Provider (SP) would be referred to as the Relying Party (RP) for OIDC.  And what you normally see within the assertion in SAML, you will see in the id_token in OIDC and the SAML attributes are the user claims in OIDC. SAML is mostly used for websites, whereas OIDC is mostly used for APIs, machine-to-machine, and mobile applications, so far.

In summary:

  • OAuth is for authorization through an access token.
  • OIDC is for authentication through an id token.

Both id_tokens and access tokens are represented as JSON Web Tokens (JWTs). Also, please keep in mind there is another type of token, a refresh token, that allows the clients to get a new access token after expiration. 

Note: When referring to OAuth on this page we are specifically referring to OAuth 2.0 and above. 

Why OAuth?

One word, granularity.

The typical analogy used to describe OAuth is valet parking, where the valet key is the OAuth token: it can’t do everything the regular key can do, only what the valet needs in order to park the car. The valet key cannot open the trunk or the glove box, two places where you may keep valuables you don’t want the valet to access.

Those scopes (permissions) that we mentioned above are included in the claims within the token. Therefore, the actions allowed using that token are limited to what is specifically noted in the scopes. For example, the JWT below can only perform actions permitted by these scopes: “Group.Read.All”, “User.Read.All”, and/or “AccessReview.Read.All”. Additionally, these access tokens have a predetermined lifetime, as you can see from the expiration time (‘exp’) claim below. Consequently, the permissions in that token are only valid for the stated amount of time.

I used jwt.ms below to decode the token, so you can see the claims included.

Some of the claims in this token are explained in detail below; to simplify the discussion I am focusing only on the most relevant ones. The areas highlighted in red are (1) the expiration and (2) the specific permissions associated with that access token.
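
If you prefer to inspect a token locally instead of pasting it into jwt.ms, a small sketch using the PyJWT library (my choice; any JWT library works) would look like this:

```python
import jwt  # PyJWT

def show_token_claims(access_token: str) -> None:
    # Decode locally without validating the signature, purely to inspect the
    # claims the way jwt.ms does. Never skip signature validation when actually
    # authorizing a request.
    claims = jwt.decode(access_token, options={"verify_signature": False})
    print("expires at (epoch seconds):", claims.get("exp"))
    # Delegated permissions appear in the 'scp' claim; application permissions
    # appear in 'roles'.
    print("permissions:", claims.get("scp") or claims.get("roles"))
```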

This makes it much easier to deliver effective applications while abiding by the principle of least privilege.

Different flows for different requirements
Requirements

Client Types – OAuth defines two client types based on whether they can keep a secret or not. 

  • Confidential clients can keep a secret because they have a safe process to store secrets. For example, machine-to-machine, web application with a secure backend, etc.
  • Public clients cannot keep a secret. For example, mobile applications, SPAs, etc.

The type of client used for the connection combined with other factors determines the OAuth flow that is recommended for the solution, as explained below.

Flows

Legacy: Authorization Code – This flow is similar to the PKCE flow explained below, but it does not include the code verifier and code challenge. It should only be used by confidential clients because it is susceptible to authorization code injection. OAuth 2.1 will require PKCE for all OAuth clients using the authorization code flow.

Authorization Code with PKCE (Proof Key for Code Exchange) – PKCE (pronounced ‘pixie’) is an improvement over the legacy Authorization Code grant type described above, created because vulnerabilities were discovered: specifically, a malicious application could intercept the authorization code and exchange it for an access token. This is the recommended flow for public clients. Originally it was intended for mobile applications, but it is now the recommended flow for browser apps as well. See below for a detailed description of this flow:

Authorization Code with PKCE
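
To show what the “proof key” actually is, here is a minimal sketch of the PKCE pieces from RFC 7636 (the variable names are mine, not from the diagram above): the client generates a random code_verifier, sends its SHA-256 hash as the code_challenge on the authorization request, and later proves possession by sending the verifier on the token request.

```python
import base64
import hashlib
import secrets

# Random, high-entropy code_verifier.
code_verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()

# code_challenge = BASE64URL(SHA256(code_verifier)), sent with the /authorize
# request along with code_challenge_method=S256.
code_challenge = (
    base64.urlsafe_b64encode(hashlib.sha256(code_verifier.encode()).digest())
    .rstrip(b"=")
    .decode()
)

# The code_verifier itself is sent later with the /token request, so an attacker
# who only intercepted the authorization code cannot redeem it.
print(code_challenge)
```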

Client Credentials – This flow allows the client to get a token without the context of an end user. In other words, you just need a client id and a client secret. This flow is recommended only for confidential clients where there is no end user, i.e. machine-to-machine. Additionally, any secrets and certificates used must be stored in a secure vault, such as Azure Key Vault, and the credentials should be rotated. Keep in mind there are safer options within this same OAuth flow for machine-to-machine authentication and authorization in the various clouds, i.e. MSIs (managed identities) in Azure AD, IAM roles in AWS, etc., and you should use those where possible. They are safer because the system generates and rotates the secret automatically; a developer doesn’t even need to know the secret.
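
For reference, this is roughly what the client credentials grant looks like on the wire against the Azure AD v2.0 token endpoint. The tenant, client ID, and secret are placeholders, and in practice you would use MSAL or a managed identity rather than hand-rolling the request:

```python
import requests

tenant = "contoso.onmicrosoft.com"
response = requests.post(
    f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "00000000-0000-0000-0000-000000000000",
        "client_secret": "<retrieved-from-a-vault>",
        "scope": "https://graph.microsoft.com/.default",
    },
    timeout=10,
)
access_token = response.json().get("access_token")
```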

Legacy: Implicit Flow – When this flow initially became available there were no other options for implementing cross-domain requests in a secure manner; however, that is no longer the case. There are now secure alternatives, such as Authorization Code with PKCE, which is the recommended option. This flow is expected to be removed in OAuth 2.1.

Legacy: Password Grant or Resource Owner Password Flow – It’s just a way for the client to exchange a username and the user’s password for an access token from the authorization server. Given the obvious security risks associated with exposing a user’s credentials to a client, the IETF states that “the resource owner password credentials grant MUST NOT be used”. This flow is expected to be removed in OAuth 2.1.

Note: Some flows above have been marked Legacy because the upcoming OAuth 2.1 release will not support those. 

In part 2 of the series I’ll discuss the various types of scopes or permissions and the wonderful MSAL.

Restrict downloads for sensitive (confidential) documents to only compliant devices

TL;DR – Yes, you can restrict file access within a folder. Keep reading to see how you can restrict downloads or other actions for specific files to only allow certain access from compliant devices.

This specific scenario came up during a session and I wanted to document and share how it is possible. The question was whether you could restrict downloads for specific files, not just at the folder level, to only allow the download of those files on compliant devices. Yes, it’s possible!

It’s a real better together story, where the following services work together to deliver:

  • Microsoft 365 Information Protection – for the sensitivity label configuration.
  • Azure Active Directory Conditional Access – for the App Control within the session controls to enforce the policy on specific cloud apps and/or specific users. After detecting the signals, AAD forwards the evaluation to Defender for Cloud Apps.
  • Microsoft Defender for Cloud Apps (previously MCAS) – for the conditional access policy that evaluates whether the sensitivity-labeled document is being downloaded from a compliant device and, if not, blocks it.
  • Microsoft Endpoint Manager / Intune – for the compliance policies that determine if the device is compliant. Intune passes information about device compliance to Azure AD.
  • Microsoft Defender for Endpoint – also helps, because my compliance policies require devices to be at or under a specific risk score.
Information Protection Configuration

On this tenant I just have these three sensitivity labels with Highly Confidential – Project – Falcon being the highest level.

Azure Active Directory Conditional Access Configuration

This is the conditional access policy that will trigger the evaluation:

It will trigger for these specific cloud apps:

And it will then pass the baton over to Defender for Cloud Apps (previously MCAS) by selecting to “Use Conditional Access App Control” within Session controls.

Defender for Cloud Apps Configuration

In Defender for Cloud Apps (previously MCAS), I have a Conditional access policy:

And the following settings were used to configure this policy:

  • Session control type: Control file download (with inspection)
  • Filters: Device Tag does not equal Intune Compliant
  • Filters: Sensitivity label equals Highly Confidential – Project – Falcon

Additionally, under Actions, I selected Block and notify, and I also customized the message the user will see.

Finally, I configured an additional alert to my email, which can be an administrator email, if needed.

I should also note that once you have completed the setup for the conditional access policy, you’ll notice these apps will slowly start showing in Defender for Cloud Apps under “Connected apps“, specifically under “Conditional Access App Control apps“.

Microsoft Endpoint Manager Configuration

Intune manages my endpoints, and as you can see I have some devices that are compliant and some that are not, based on the compliance policies applied to them.

The compliance policies also require the device to be at or under a specified machine risk score:

I have connected Microsoft Endpoint Manager to Defender for Endpoint as shown below:

Defender for Endpoint Configuration

From the Defender for Endpoint side, I have also connected it to Intune to share device information:

Results

The final result is that any end user who tries to download files labeled Highly Confidential – Project – Falcon from a non-compliant device will be blocked, while downloads of other files in the same folder are still allowed, as you can see in this video:

The document that was blocked from downloading was labeled Highly Confidential – Project – Falcon and that is why the user is not allowed to download it to a non-compliant device.

Finally, since I configured an alert email, I also received this alert:

This is just one very specific scenario; there is so much more that can be modified and tweaked depending on the requirements. However, I hope it gives you an idea of what is possible and inspires you to create your own scenario.

Guest Access Reviews

TL;DR – A super simple way to review all guests with access to a tenant.

In certain scenarios guests from other tenants have to be invited to the enterprise tenant, i.e. B2B. However, good security practices dictate that guests should be reviewed to ensure users do not keep access when it is no longer needed. The documentation offers a few great options, including the ability to review all Microsoft 365 groups, which allows admins to manage guest access to those groups. However, I’ve worked with organizations that needed to review generic guest access to the tenant, access that might not be associated with a specific group, and they also needed the ability to automatically remove those guests from the tenant.

In this scenario, I just had to create a new dynamic group, which I creatively called “All Guests”, using the dynamic membership rule ‘(user.userType -eq “Guest”)‘.
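
If you prefer to script it, the same group can be created through Microsoft Graph. The sketch below assumes an access token with the Group.ReadWrite.All permission, and the mailNickname is an arbitrary placeholder:

```python
import requests

graph_token = "<access token with Group.ReadWrite.All>"  # assumed to exist

payload = {
    "displayName": "All Guests",
    "mailEnabled": False,
    "mailNickname": "allguests",
    "securityEnabled": True,
    "groupTypes": ["DynamicMembership"],
    "membershipRule": '(user.userType -eq "Guest")',
    "membershipRuleProcessingState": "On",
}

resp = requests.post(
    "https://graph.microsoft.com/v1.0/groups",
    headers={"Authorization": f"Bearer {graph_token}"},
    json=payload,
    timeout=10,
)
resp.raise_for_status()
```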

Then I created an Access Review for the new ‘All Guests’ dynamic group. The key here is to ensure the scope is “Guest users only”, because the next option will not be available otherwise.

Then under “Upon completion settings” I chose the option to “Block user from signing-in for 30 days, then remove user from the tenant“.

Note: This option is only available when the scope is “Guest users only”.

When this option is selected, any users who are denied during the review process are updated with ‘Block sign in’ set to ‘Yes’, and that remains in place for 30 days. At the end of the 30 days, any users for whom administrators have not changed the ‘Block sign in’ value, i.e. those whose sign-in is still blocked, are removed from the tenant.

Reviewers still follow the standard procedure to provide feedback, by accessing the MyAccess URL and approving or denying the access.

Once the access review period is over or the review is manually stopped, the system automatically applies the changes as shown below:

Since the access review included the options above, at the end of the review period we can then see those denied users have been disabled, with ‘Block sign in’ set to ‘Yes’.

After 30 days those users are then automatically removed from the tenant, unless the value of this setting is manually updated by the administrators.

This is a very easy and efficient way to remove those guest users that no longer require access to a tenant.

Passwordless Azure VM SSH login using FIDO2 security keys (Part 3)

This post is part of a series.

Finally, we can see this passwordless SSH login in action. There are two scenarios depicted below: the first is what happens when the user tries to log in from a compliant device; the second is what happens when the user tries to log in from a noncompliant device. Both scenarios show passwordless authentication; however, one is stopped from connecting to the server due to the noncompliant device.

Note: I am demonstrating this flow from devices which have Azure CLI already installed.

(1) SSH login from a compliant device

The authentication flow in the video above is the following:

  1. Try to ssh to the VM “az ssh vm --ip ##.##.##.##“, which fails because I don’t have a token yet, since I haven’t authenticated.
  2. Type “az login” to authenticate, which triggers the new browser window to open.
  3. I chose my user ‘ChristieC’ and I am prompted for the login. The default authentication method for this user is the FIDO2 security key login.
  4. I am presented with the prompt to enter the PIN to unlock the security key (user verification) and I touch the key (user presence) to allow it to proceed.
  5. Now that I am authenticated, I try to ssh again “az ssh vm --ip ##.##.##.##“, which triggers a new browser window to open to verify the Conditional Access policy requirements, as shown in the message “Your device is required to be managed to access this resource“.
  6. Once my device is verified to be compliant and I accept the certificate, then I’ve successfully connected to the server.
(2) SSH login from a noncompliant device

But what happens if I try to login with the same user, but from a device that is not compliant?

As you can see above, I can still authenticate using the FIDO2 security key; however, I cannot ssh to the server from the noncompliant device due to the Conditional Access policy in place. I can then choose to follow the steps to bring that device into compliance.

In summary, this demo shows a passwordless SSH login using FIDO2 security keys as well as some Conditional Access policies to check device compliance.

Passwordless Azure VM SSH login using FIDO2 security keys (Part 2)

This post is a part of a series.

I’ve chosen the scenario where SSH login is only allowed to the user if they are connecting from a compliant device, so I need a Conditional Access policy to enforce that restriction.

Conditional Access Policy

Under cloud apps, I selected “Azure Linux VM Sign-in” and “Azure Windows VM Sign-in”. The demo will just show Linux, but either one will work.

And then I selected to grant access only when the two conditions selected are met:

VM details

The users have been assigned either “Virtual Machine Administrator Login” or “Virtual Machine User Login” roles.

Additionally, this VM was provisioned to allow SSH using Azure AD credentials.

In part 3 of this series we’ll see the passwordless SSH login in action.

Passwordless Azure VM SSH login using FIDO2 security keys (Part 1)

TL;DR – Passwordless ssh to Azure VMs using FIDO2 security keys.

There are many great articles and documents on passwordless authentication, many of them are linked here. However, I wanted to focus on passwordless SSH login using FIDO2 security keys. And then I figured, why not go a little further and show this in action with some Conditional Access policies to check device compliance for good measure. That’s the topic of the posts in this series:

A super fast FIDO history recap…

The original FIDO standards are U2F (Universal Second Factor), the MFA ‘flavor’ of FIDO, and UAF (Universal Authentication Framework), an earlier passwordless ‘flavor’. The new and improved (and fully passwordless) standard is FIDO2, which uses the WebAuthn and CTAP2 protocols. The most important detail is that all of them are based on public key cryptography and are strongly resistant to phishing; FIDO2 is the passwordless protocol used here.

How does FIDO2 work?

It’s important to show how FIDO2 works because that’s what makes it phishing resistant. I only highlight the most important steps below; feel free to read more about it at the FIDO Alliance website.

The diagram above depicts the login authentication flow for FIDO2 passwordless authentication. Prior to this point the user has already registered the security key with the relying party, which is why the authenticator already has the private key and the relying party has the corresponding public key. The authentication flow steps are as follows:

  1. User requests access to the relying party.
  2. The relying party presents a challenge and its web origin.
  3. The browser derives the relying party ID (RP ID) from the web origin value. In the case of Azure AD, the challenge passed is a nonce (‘number used once’). This prevents token replay attacks, because the nonce can only be used once.
  4. The authenticator finds that unique pair for that specific user and that specific relying party and then prompts the user for the verification (pin or biometric). This prevents man-in-the-middle (MitM) attacks because the attacker won’t have the same web origin.
  5. Once the user verification succeeds, the authenticator uses that private key to sign the challenge (nonce) and sends the authenticator data back to the browser, which sends it back to the relying party.
  6. The relying party then validates that the challenge was signed by the correct private key and that the nonce hasn’t been used yet and then it grants the id_token.

Note: WebAuthn is the protocol used between the browser and the relying party API, and CTAP is the protocol between the authenticator and the browser.

Keep in mind the authenticator generates a unique key pair per user per relying party (web origin). So, I can use the same authenticator for two different users for the same relying party, but each has a unique key pair associated with it. And I can also store key pairs for different relying parties in the same authenticator. Additionally, the private key is kept only in the authenticator and is never passed through the channels; the same goes for the PIN or biometric used to unlock the authenticator. This authentication process protects against phishing, man-in-the-middle, and token replay attacks.

Azure AD Setup

For this demo the only authentication method we need enabled is FIDO2 Security Key:

Although not required, it is highly recommended to restrict the specific security keys that can be registered by the users on the tenant. This is especially important for those organizations that have very specific compliance requirements. This is a fantastic security feature which unfortunately is not available with other identity providers. Some AAGUID values can be found here: Yubico, AuthnTrend. You can also find the AAGUID value from the authenticator registration information as noted here.

Please reference the Microsoft documentation for a current list of FIDO2 security keys that are known to be compatible with Azure AD passwordless sign-in.

In part 2 of this series I’ll go over the Conditional Access policy to only allow access from a compliant device.

Federating AWS with Azure AD

TL;DR – For an enterprise level authentication and authorization solution, federate AWS single-accounts with Azure AD.

Security best practices dictate that AWS root accounts should be used only on rare occasions. All root accounts should enable MFA, remove any access keys, and set up monitoring to alert in case the root account is used. For day-to-day work users should access their AWS services with their IAM users and the best practice is to federate that access with a reliable identity provider (IdP), such as Azure AD.

There are two main options to federate authentication for AWS accounts. In this blog I will show you the two options and I’ll explain why I prefer one over the other.

(1) AWS SSO

The first option is to federate AWS SSO. This is configured with the AWS SSO instance within the AWS Organization. As a reminder, AWS Organizations allows administrators to manage several AWS accounts. The single sign-on integration is done between AWS SSO and the Azure tenant. With this configuration, the users in Azure AD are assigned to the AWS SSO enterprise application, so they are not assigned to a specific AWS account. The assignment of users to the specific permission sets is done within AWS SSO. Those permission sets are what determine the user’s specific role(s) within the specific AWS accounts.

And this is the end-user experience when federating AWS SSO:

From MyApps, the user clicks on the AWS SSO enterprise application that was assigned to them in Azure AD, then they are presented with an AWS SSO menu of accounts and roles that were assigned to them via AWS SSO, which they can then click on to access the account with that specific role.

Please keep in mind the following details when using this setup:

  • Users and groups have to exist locally in AWS SSO, so this solution will provision users to AWS SSO when they are assigned to the AWS SSO enterprise application.
  • In a similar manner, users are disabled (not deleted) when they are removed from the AWS SSO enterprise application.
  • Since the roles are assigned within AWS SSO, Azure AD is not aware of which roles are assigned to which users. This becomes important if you need specific Conditional Access policies or specific access reviews and/or access packages within Identity Governance.
  • Supports SP and IdP initiated login, since the users exist locally on AWS SSO.
(2) AWS Single-Account Access

The second option is to federate the AWS Single Account. This is configured with each individual AWS account. The integration is done between the AWS account and the Azure tenant. Therefore, when the users in Azure AD are assigned to the AWS account enterprise application, they are assigned to a specific AWS account. Azure AD is fully aware of the specific account the users are assigned to as well as the specific AWS roles they are assigned to.

And this is the end-user experience when federating a single account:

From MyApps, the user clicks on the specific AWS single-account enterprise application that was assigned to them in Azure AD, then they are presented with the option of the roles that were assigned to them for that account, which they can then select to access the account with that specific role.

Please keep in mind the following details when using this setup:

  • Users and groups do NOT exist locally on AWS. That’s right, users and groups do not need to be provisioned or deprovisioned in AWS.
  • The provisioning configuration ensures roles created in AWS are synchronized to Azure AD, so they can be assigned to users.
  • Azure AD is fully aware of which roles are assigned to which users for specific accounts.
  • This configuration allows implementation of Conditional Access policies for the specific AWS accounts.
  • Only supports IdP initiated login, since the users do not exist locally in AWS.
  • To ensure AWS CloudTrail data accuracy, add the source identity attribute to identify the user responsible for AWS actions performed while assuming IAM roles.
  • When CLI access is required, temporary credentials can be generated using the STS AssumeRoleWithSAML call (aws sts assume-role-with-saml in the CLI); see the sketch after this list. The credentials last as long as the session is valid (default is 12 hours).
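
Here is a rough sketch of that exchange with boto3; the role and identity provider ARNs are placeholders, and the SAML assertion is assumed to have been captured (base64 encoded) from the Azure AD sign-in response:

```python
import boto3

saml_assertion = "<base64-encoded SAML assertion from the Azure AD response>"

# STS has a global endpoint; the region here is only needed to build the client.
sts = boto3.client("sts", region_name="us-east-1")

resp = sts.assume_role_with_saml(
    RoleArn="arn:aws:iam::123456789012:role/AzureAD-Admins",
    PrincipalArn="arn:aws:iam::123456789012:saml-provider/AzureAD",
    SAMLAssertion=saml_assertion,
    DurationSeconds=3600,  # bounded by the role's maximum session duration
)

creds = resp["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration
```
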
Drumroll, please…

By now you probably guessed which option I lean towards. The AWS Single-Account access configuration should be selected for enterprises that have specific compliance requirements that include identity governance or any organization that wants to implement a zero trust model, since “least privileged access” is at the foundation.

There are several benefits to this configuration.

  • The lack of users in AWS means that users or their entitlements do not have to be removed in AWS when employees are terminated. 
  • The configuration allows the tracking of the specific AWS roles within Azure AD, which means access packages* can be created and then automatically assigned or be made available to be requested by users with their appropriate approvals.
  • Those access packages can also have associated access reviews* to ensure access is removed when no longer needed.
  • Specific Conditional Access* policies can be created for the specific AWS accounts. For example, you may require access to production AWS accounts only from compliant devices, but maybe the rules are not as tight for development AWS accounts.
  • Having one central identity governance solution means organizations have the ability to meet compliance, auditing, and reporting requirements. This also means that the principles of least privilege and segregation of duties* can realistically be enforced.

Some organizations have tried to manage AWS accounts with AWS SSO and implement some level of identity governance using security groups. However, as the organizations grow, it becomes unnecessarily complex and challenging to meet compliance requirements for the reasons described in detail in my previous post.

* More on those topics in follow-up posts.

Roles vs Groups

TL;DR – For an enterprise level solution that authorizes user access, use application roles as much as possible instead of security groups.

One of the principles of the zero trust model is “least privileged access”, which is very well documented. However, I am focusing on a specific area that sometimes goes unnoticed until it’s too late and it becomes painful to correct.

This is my attempt to warn security architects and anyone planning to implement a zero trust model to ensure they pay attention to this detail because it can be costly to correct later on.

First, some basics.

In general terms authentication (Authn) is the process of verifying a user is who they say they are, whereas authorization (Authz) covers what the user should have access to, the specific permissions that have been (hopefully) assigned to the user based on the job they do. This post focuses on authorization (Authz), the access the user gets once they are authenticated.

Authorization can be based on various triggers, among them group membership and/or role assignment:

  • Groups are a logical collection of users, which can have specific privileges assigned to them. Groups can be on-prem AD groups that are synchronized to the cloud, or cloud-only groups [Ref: M365 and AAD groups]. Groups are not necessarily tied to an application, but security groups can be used across multiple applications for access control purposes. They can also be nested, meaning that group 1 can be a member of group 2, and so on.
  • Roles are used to assign permissions to users and groups. Roles are specific to a function within a specific application.
Applications and services handle authorization in different ways.

A security best practice for enterprise solutions is to have applications and services rely on a central identity provider (IdP) for authentication and authorization whenever possible. This is referred to as SSO (single sign-on) or federation. Reference this quickstart for additional details.

Those applications and services that delegate authentication, and sometimes authorization, to the central IdP handle authorization in various ways:

  • Some applications and services can only handle authorization internally, so we simply need to authenticate the user and the application knows what permissions (groups/roles/privileges/responsibilities/profiles) are assigned to that user. These applications do this in various ways, including:
    • The use of LDAP calls to a directory service to check group membership.
    • Through their own internal authorization. Many enterprise applications (SAP, Salesforce, etc.) have their own authorization mechanisms within the application. That means there is no mapping between the permissions set within the application and any group in any directory, and typically this is for a good reason, as explained below. For applications/services with independent authorization, an integration solution (i.e. SCIM) is required to synchronize the assignment of permissions between the application/service and the identity management solution. This supports the ultimate goal of having a single identity governance solution to view, audit, and modify all roles and assignments that a user has access to. More on that in a follow-up post.
  • Some applications can be configured to receive and process an access token or assertion that is passed when the user logs in. These assertions or access tokens include attributes or claims that tell the relying party or service provider what specific groups and/or roles are assigned to the user that is connecting.

The applications that can consume access tokens or assertions will process those claims (roles and/or groups) and proceed with authorization based on the claims passed in the token. For example, if the claims in the access token include [roles=’Admin’], then the user receives access to a special area granted only to admins. Keep in mind that regardless of the protocol payload (SAML assertion, OIDC access token, or a Kerberos token), they are all bound by a maximum header size.
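
As a trivial sketch of what that check looks like on the application side (the ‘Admin’ role name is hypothetical, and the claims dictionary is assumed to come from a token whose signature has already been validated):

```python
def require_role(claims: dict, required_role: str = "Admin") -> None:
    # The 'roles' claim carries the application roles assigned to the caller
    # for this specific application.
    roles = claims.get("roles", [])
    if required_role not in roles:
        raise PermissionError(f"caller lacks the '{required_role}' app role")

# Example: call require_role(claims) before serving the admin-only area.
```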

As noted above these claims can include a variety of attributes, including groups or roles.

  • When groups are used, all groups that the user is a member of are typically included in the token, including groups that are not associated with the application/service. Some services allow the prioritization of groups in the claims by using custom filters per application, but the results can often be unpredictable.
  • For roles, only the roles associated with that specific application are included.

Maybe smaller organizations can initially survive using groups to determine the level of authorization to applications or services. However, larger enterprises, where users are often members of many groups, need to be concerned with token size. All tokens have a maximum allowable size due to header size restrictions. For example, Okta’s group limit is 100, and Azure/O365 restricts SAML tokens to 150 groups and JWTs to 200 groups. This also affects Kerberos tokens, which can prevent login when a user is a member of around 125 groups or more. As such, if a user is a member of 202 groups, the user will receive errors when trying to log in to various applications/services (e.g. HTTP 400).

It’s just a math problem.

Applications and services can have a multitude of permissions. At the time of writing, Salesforce has a maximum of 1500 custom profiles (the main permission set within the service), Azure has a maximum of 5000 custom roles per tenant, and AWS has a maximum of 5000 IAM roles per AWS account. Any authorization that needs to be attached to security groups will then require a security group to be created per role per account/tenant. When you take into consideration that many enterprises have hundreds of tenants and accounts, it starts to compound quite rapidly. Even if an organization consolidates, the math for a large enterprise can realistically result in hundreds of thousands of security groups, with users having to be members of hundreds of these groups, which quickly surpasses the maximums mentioned above.

This is not new to the IT industry as many large organizations have experienced the growing pains of discovering these complications as they merge or expand in size and have a desire to increase their security posture. More information can be found by searching online for “token bloat”.

In addition to token bloat, there are also known AD replication issues with large group memberships, as well as the issue of group lifetime, since groups can continue to exist long after the application is removed and may even be reused for other unintended purposes. This is why most industry experts recommend that enterprises, or any organization planning for its future, use role-based authorization instead of group membership, due to the restrictions mentioned above and the amount of information that can end up in the token.

How Azure AD makes it easier.

Unlike other IdPs, Azure AD natively handles and manages application roles for enterprise applications, including many SaaS services as well as custom applications. Therefore, when assigning users to these services, administrators can choose specific application roles, which are not mapped to any security groups.

These roles can also be included in access packages that can be assigned to or requested by end users and for which access reviews can be created.

In a follow-up post I will go into the various options that are available, as well as some of my own recommendations when architecting security solutions for identity governance.