In this post, we’re going to talk about Zero Trust Networks and Architectures. Zero Trust, like DevOps, isn’t a set of tools that you buy to suddenly “have Zero Trust”; it is also about culture, policies, and, more broadly, the organization of resources.
The whole premise of Zero Trust is to move away from conventional Perimeter Defense, which defends only the ingress and egress points and implicitly trusts all traffic within the network, towards a Defense in Depth mode of thinking, which assumes that the entire network is compromised and that no communication between hosts on the network can be trusted.
Public keys are managed using a PKI (Public Key Infrastructure), which is a way to securely distribute and validate public keys.
A PKI uses a Registration Authority to bind an identity to a public key. That way, you can be sure who owns the public key. This binding is embedded in a signed certificate.
A Certificate Authority (CA) is the most commonly used type of PKI. A CA relies on a signature chain, for which the CA is the anchor. The CA’s private key signs the client certificate, which binds the identity to the public key. To validate the signature on the signed client certificate, we use the CA’s certificate. A commonly used certificate format is X.509.
Because the entire chain of signatures lies rooted in the CA, the CA must be highly protected at all times. In the context of Zero Trust, all entities rely on the PKI to prove their identity to the network. These entities include Devices, Users, and Applications.
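The signature chain described above can be sketched as a toy model. The code below uses textbook RSA with tiny primes purely for illustration (a real CA uses 2048-bit+ keys, proper padding, and the X.509 format); the subject and key strings are made up.

```python
import hashlib

# Toy "textbook RSA" CA for illustration only: tiny primes, no padding.
p, q = 1000003, 1000033            # small primes (hypothetical CA key)
n = p * q                          # public modulus
e = 65537                          # CA public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # CA private exponent

def digest(data: bytes) -> int:
    # Hash the certificate body down to an integer below the modulus.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def ca_sign(tbs: bytes) -> int:
    # The CA's private key signs the to-be-signed certificate body.
    return pow(digest(tbs), d, n)

def verify(tbs: bytes, signature: int) -> bool:
    # Anyone holding the CA's public key (e, n) can check the binding.
    return pow(signature, e, n) == digest(tbs)

# The certificate binds an identity to a (hypothetical) public key.
tbs = b"subject=web-server-01;public_key=ab12cd34"
cert = {"tbs": tbs, "signature": ca_sign(tbs)}

assert verify(cert["tbs"], cert["signature"])       # genuine binding holds
assert not verify(b"subject=attacker;public_key=ab12cd34", cert["signature"])
```

Note that validating the signature needs only the CA’s public half, which is why every entity in the network can check certificates while the CA’s private key stays locked away.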
It is highly advised to host a private PKI instead of relying on a public one.
Variable Trust and Network Agents
Variable trust is a concept that allows the degree of trust given to an entity to vary over time. Generally speaking, a new device starts with low trust and has to slowly “gain” trust. Trust also degrades over time when an entity becomes stale.
To handle variable trust, we introduce the concept of Network Agents.
A Network Agent is a collection of information surrounding an entity, and this information influences the authorization decision.
Some examples of information in a Network Agent are:
- Agent Trust Score
- Device Trust Score
- IP Address
- Device Manufacturer
- User Roles
When making an authorization decision, it is not the individual pieces of information that are authorized, but the agent as a whole: the collection of information taken together. For example, only the right combination of User Role AND IP Address AND Geolocation can access certain data.
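A minimal sketch of that idea, with a hypothetical policy and agent attributes (the role name, network range, and geolocation codes are made up):

```python
from ipaddress import ip_address, ip_network

# Hypothetical policy: access requires the RIGHT COMBINATION of agent
# attributes, not any single attribute on its own.
POLICY = {
    "required_role": "finance",
    "allowed_network": ip_network("10.20.0.0/16"),
    "allowed_geo": {"SG"},
}

def authorize(agent: dict) -> bool:
    # Authorize the agent as a whole: every condition must hold at once.
    return (
        POLICY["required_role"] in agent["roles"]
        and ip_address(agent["ip"]) in POLICY["allowed_network"]
        and agent["geo"] in POLICY["allowed_geo"]
    )

agent = {"roles": ["finance"], "ip": "10.20.3.7", "geo": "SG"}
assert authorize(agent)
# The same user from an unexpected location is denied, even though the
# role alone would have been sufficient under perimeter thinking.
assert not authorize({**agent, "geo": "US"})
```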
A Network Agent is not for authentication, but only for authorization. Authentication can be achieved using the PKI and X.509 certificates.
Caching of Authentication results is permissible, while caching of Authorization results is not. This is because the information within the Network Agent may change rapidly or be modified by a malicious actor. Authentication data, on the other hand, does not change as frequently (passwords, biometrics).
Making Authorization Decisions
Zero Trust relies on 4 major components
- Enforcement
- Policy Engine
- Trust Engine
- Data Store
All traffic must flow through the Enforcement module, which queries the Policy Engine for a recommended answer (Allow/Deny).
The Policy Engine gives its recommendation based on calculations from the Trust Engine and the Data Store (which stores information such as the Network Agent, as well as the Policies themselves, written as Policy-as-Code [see OPA])
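The flow above can be sketched as follows. The policy store entries, the trust-score heuristics, and the thresholds are all hypothetical; they stand in for real policy-as-code and a real trust engine.

```python
POLICY_STORE = [
    # Policy-as-code held in the data store: each resource declares a
    # minimum trust score and (optionally) a required role.
    {"resource": "billing-api", "min_trust": 0.7, "role": "billing"},
    {"resource": "public-docs", "min_trust": 0.1, "role": None},
]

def trust_engine(agent: dict) -> float:
    # Toy score: patched devices and MFA raise trust, staleness lowers it.
    score = 0.5
    score += 0.3 if agent.get("device_patched") else -0.3
    score += 0.2 if agent.get("mfa") else 0.0
    return max(0.0, min(1.0, score))

def policy_engine(agent: dict, resource: str) -> str:
    # The enforcement component calls this for every request.
    policy = next(p for p in POLICY_STORE if p["resource"] == resource)
    if policy["role"] and policy["role"] not in agent.get("roles", []):
        return "Deny"
    if trust_engine(agent) < policy["min_trust"]:
        return "Deny"
    return "Allow"

agent = {"roles": ["billing"], "device_patched": True, "mfa": True}
assert policy_engine(agent, "billing-api") == "Allow"
# An unpatched device without MFA scores too low for the same resource.
assert policy_engine({"roles": ["billing"]}, "billing-api") == "Deny"
```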
Devices can be trusted by using a “Golden Image”, which is the last known operational state that is highly likely to be secure (not guaranteed, because the Golden Image could still have been compromised at a deeper level).
Secure boot is another mechanism for trusting a device, where the protection comes at a lower level, in the firmware. A public key is loaded into the firmware, which is then used to validate processes, drivers, and OS loaders from the lowest level all the way up to when the Operating System starts.
Because the Private Key is such an important concept that holds Zero Trust together, it must be highly secured. Often, the best way to store the private key is in separate, dedicated hardware, so that it can only be accessed through that hardware. Such hardware is called a Trusted Platform Module (TPM) or a Hardware Security Module (HSM). A TPM exposes an API to generate an asymmetric key pair, exposes the public key, and stores the private key in the hardware.
TPMs generate a Storage Root Key (SRK), which is an asymmetric key. Because asymmetric encryption is typically expensive to perform over large amounts of data, the SRK encrypts a symmetric key (AES), which is then used for encryption/decryption of the data. The workflow for decryption then looks like this: to decrypt the data, the SRK first decrypts (unwraps) the AES key, which is then used to decrypt the data.
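The wrap/unwrap layering can be sketched as below. To keep this runnable with the standard library alone, a SHA-256 keystream XOR stands in for AES, and a random secret stands in for the SRK; in a real TPM the SRK is an asymmetric key whose private half never leaves the chip, and this toy cipher is NOT secure.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy XOR stream cipher built from a SHA-256 keystream.
    # Stand-in for AES -- illustrative only, not secure.
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(x ^ y for x, y in zip(data, out))

srk = secrets.token_bytes(32)        # pretend Storage Root Key

# Encrypt: generate a fresh symmetric data key, encrypt the data with it,
# then wrap (encrypt) the data key under the SRK.
data_key = secrets.token_bytes(32)
ciphertext = keystream_xor(data_key, b"zero trust secrets")
wrapped_key = keystream_xor(srk, data_key)

# Decrypt, in the order described above: the SRK unwraps the data key,
# and the data key decrypts the data.
recovered_key = keystream_xor(srk, wrapped_key)
plaintext = keystream_xor(recovered_key, ciphertext)
assert plaintext == b"zero trust secrets"
```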
Platform Configuration Registers (PCRs) are an important part of TPMs, providing storage slots that hold hashes of the processes that run. It starts with the hashes of the BIOS, boot record, configuration, and so on. This sequence of hashes is used to verify that nothing malicious has been injected from the start. PCR values cannot be modified or rolled back, and the only way to subvert them is through a supply chain attack that initializes faulty PCR values.
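The way a PCR accumulates that sequence of hashes is the "extend" operation: the register can only be folded forward, never written directly. The boot-stage names below are made up, but the extend formula (new value = SHA-256 of old value concatenated with the measurement) matches how TPM PCRs work.

```python
import hashlib

def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    # PCR extend: new_value = SHA-256(old_value || measurement).
    return hashlib.sha256(pcr + measurement).digest()

pcr = bytes(32)  # PCRs start zeroed at power-on
boot_chain = [b"bios-v1.2", b"boot-record", b"kernel-config"]
for stage in boot_chain:
    pcr = pcr_extend(pcr, hashlib.sha256(stage).digest())

# Replaying the same measurements in the same order reproduces the value...
check = bytes(32)
for stage in boot_chain:
    check = pcr_extend(check, hashlib.sha256(stage).digest())
assert check == pcr

# ...but any altered stage yields a different final PCR value, exposing
# the injected component.
evil = bytes(32)
for stage in [b"bios-v1.2", b"rootkit", b"kernel-config"]:
    evil = pcr_extend(evil, hashlib.sha256(stage).digest())
assert evil != pcr
```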
Supplying private key material externally in any way is always a risk, hence TPMs make use of an Endorsement Key (EK), which is unique per TPM and exists only on the TPM. The EK is used to sign the PCR values, producing a “quote”, which is used to verify the state of the TPM.
The X.509 standard is used for device identification and authentication. It defines the format for public key certificates, and methods to validate those certificates. X.509 can be used both for validation of identity and for enabling encrypted communication.
Usually, a certificate is trusted because it has been signed by a trusted third party. When no such third party exists, the certificate is a “Self-Signed” certificate. This is usually used in testing environments, or environments with no internet connectivity. The signing of a certificate is done by the Certificate Authority (CA), and when you trust the CA, you implicitly trust all certs issued by that CA.
Images of devices and inventory should be tracked and version controlled. Regular “rotation” of the images is also recommended, as the trust of a device always degrades over time. (The more it is used and accessed, the more likely it is to be compromised.)
There are two types of Identity
- Informal Identity
- Authoritative Identity
Informal Identity is an identity created by the user to represent themselves, such as online nicknames. Informal identities cannot be trusted because they cannot be definitively tied to a user.
Authoritative Identity, on the other hand, is a piece of information that is governed and backed by a trusted entity, such as the government. Examples of authoritative identity are NRIC numbers or passport numbers.
Human interaction in authentication is always better than relying solely on digital counterparts, such as passwords. Using multiple channels and out-of-band authentication is also recommended (i.e. generating an OTP on a separate device and system).
Authentication can happen in multiple ways:
- Something you know (Passphrase)
- Something you have (Physical Token)
- Something you are (Biometrics)
A combination of all 3 is more secure than relying on just one. Most organizations typically rely only on passphrases, which is not a strong indicator for authentication.
The authentication logic and process should be separated from the application itself. Single Sign-On (SSO) technologies allow a user to authenticate once and access a variety of services.
Local Authentication Systems allow users to authenticate themselves with a trusted device, and that device can then be used to verify the person to services. These devices store the private key, and the applications the user wants to access are given the public key, to confirm that the user is in possession of the private key on the device. Moving authentication to local devices prevents or mitigates replay and MITM attacks.
Trusting code and applications requires trust in a few things:
- Trust in the people writing the code
- Trust in the process of converting the code to an application
- Trust in the deployment process of the application to the infrastructure
- Trust in the telemetry of the application
The different phases of checks are:
- Source Code
- Builds
- Distribution
- Execution
In fact, these steps tie in very closely with DevSecOps, where we talk about pre-compilation checks, post-compilation checks, pre-deployment checks, and monitoring.
Trusting the Source requires you to secure the code repository and have proper audit trails. This can be achieved by using any modern Version Control System such as git. A proper code review process before code is merged into the code base is also recommended, as are signed contributions, so that we have an audit trail of changes to the code.
Trusting the Builds requires you to check the artifacts, configurations, and processes used to build the code into an application. This can be achieved by signing the artifacts and using proper Configuration Management Systems. Even the configurations should be version controlled, with changes logged for auditing.
Trusting the Distribution requires you to secure the delivery of the application to its destination. Taking the Advanced Packaging Tool (APT) as an example: it uses hashing and signing in the process of distributing APT packages. An APT repo comes with a Release file, a Packages file, and the packages themselves. The Packages file acts as an index to the packages, together with their checksums to ensure their integrity. The Release file contains metadata about the whole repository and a checksum of the Packages file. The Release file is then signed by a trusted publisher, who distributes the public key that allows verification of the Release file. Using this chain, we can verify the Release file, the Packages file, and the packages themselves.
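The checksum chain can be sketched as below. The repository contents and file layouts are simplified stand-ins for real APT metadata, and the publisher's signature over the Release file (Release.gpg / InRelease) is omitted here.

```python
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical repository contents.
package = b"...contents of example_1.0_amd64.deb..."
packages_file = f"Package: example\nSHA256: {sha256(package)}\n".encode()
release_file = f"SHA256: {sha256(packages_file)}  Packages\n".encode()

def verify_chain(release: bytes, packages: bytes, pkg: bytes) -> bool:
    # Walk the chain downward: the Release file vouches for the Packages
    # file, and the Packages file vouches for the package itself.
    release_ok = sha256(packages) in release.decode()
    packages_ok = sha256(pkg) in packages.decode()
    return release_ok and packages_ok

assert verify_chain(release_file, packages_file, package)
assert not verify_chain(release_file, packages_file, b"tampered package")
```

Once the Release file's signature checks out against the publisher's public key, every hash below it in the chain is transitively trusted.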
We should also use an Upgrade-Only policy when distributing software, and never allow any user to downgrade to an older version of the software.
Trusting the Execution requires you to ensure that the application is secure at runtime. This can be achieved using secure coding practices, running the application in isolated environments, and actively logging and monitoring the application. The application should also be tested by a Red Team through pen-testing activities to ensure that it is secure. A bulk of these vulnerabilities can be handled by shifting security left, through the secure coding practices mentioned earlier.
All traffic flowing through a Zero Trust network should be encrypted wherever possible. There are two main ways to achieve this: TLS and IPSec.
TLS resides at the application level, and this is commonly implemented by libraries running within the application. IPSec on the other hand runs deeper in the OSI model (Layer 3/4). Being so low in the model, it typically runs at the kernel level.
Using IPSec, the traffic is definitely encrypted, as the encryption happens regardless of the application’s implementation. However, there are some drawbacks to using IPSec:
- Network support
  - Not all firewalls or applications allow IPSec traffic
- Device support
  - Not all devices support IPSec, or have IPSec enabled (Desktop vs Mobile)
  - The set of cipher suites available to IPSec is much smaller compared to TLS
- Application support
  - Applications on both ends need to be configured specifically to support IPSec, such as running an IKE daemon to facilitate the security negotiation. This adds overhead to the whole security process
A pragmatic approach would be to use TLS for client-server interactions, and IPSec for server-server interactions.
To guard against malicious traffic, we can drop all traffic by default, and only respond to packets that are pre-authenticated from a trusted client with a pre-authorized key. This can be achieved using Single Packet Authorization (SPA). An SPA packet is built by encrypting or signing a piece of data and sending it over UDP to the server; the server authenticates the packet and replies only if authentication succeeds.
An SPA packet would contain information such as:
- 16 bytes of random data
- Local username
- Local timestamp
- fwknop version
- SPA message type
- Access request
- SPA message digest (SHA-256)
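Assembling and checking such a packet can be sketched as below. The field names follow the list above, but the wire format, version string, and shared key are hypothetical; this uses a keyed HMAC-SHA-256 digest in the spirit of fwknop's SPA message digest, not fwknop's actual encoding.

```python
import hashlib
import hmac
import json
import secrets
import time

SHARED_KEY = b"pre-authorized client key"  # hypothetical pre-shared key

def build_spa_packet(username: str, access_request: str) -> bytes:
    # Assemble the SPA fields, then append a digest keyed with the
    # shared secret so the server can authenticate the packet.
    body = json.dumps({
        "rand": secrets.token_hex(16),   # 16 bytes of random data
        "user": username,                # local username
        "timestamp": int(time.time()),   # local timestamp
        "version": "3.0.0",              # fwknop-style version field
        "type": "access",                # SPA message type
        "request": access_request,       # access request, e.g. tcp/22
    }).encode()
    digest = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return body + b"|" + digest.encode()

def server_accepts(packet: bytes) -> bool:
    # The server stays silent unless the digest authenticates the packet.
    body, _, digest = packet.rpartition(b"|")
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), digest)

packet = build_spa_packet("alice", "tcp/22")
assert server_accepts(packet)
assert not server_accepts(packet.replace(b"alice", b"mallory"))
```

The random data and timestamp exist to defeat replay: a captured packet quickly becomes stale, and two requests never hash the same.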
There are many ways to secure traffic using encryption; however, some methods are preferred over others. During a TLS key exchange, the two systems use a mathematical function to generate a key that is agreed upon by both parties. Some ways of generating the keys are:
- Elliptic Curve Diffie-Hellman (ECDHE): uses an elliptic curve to agree on the key
- Diffie-Hellman (DHE): uses modular arithmetic to agree on the key
- RSA key exchange: uses the server’s public key to encrypt the session key and share it with the server
Both Diffie-Hellman implementations ensure Perfect Forward Secrecy (PFS), while RSA does not. PFS ensures that if a private key is leaked, previously encrypted messages cannot be decrypted. RSA key exchange does not provide PFS because the server’s long-term key pair directly protects the session key: leak the private key, and every recorded session can be decrypted.
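The classic modular-arithmetic Diffie-Hellman exchange can be sketched in a few lines. The prime here is a toy 32-bit value for illustration; real exchanges use 2048-bit+ groups (or elliptic curves, as in ECDHE).

```python
import secrets

P = 4294967291   # a small prime modulus (toy size, illustrative only)
G = 5            # public generator

# Each side picks an ephemeral private exponent and publishes g^x mod p.
a = secrets.randbelow(P - 2) + 1
b = secrets.randbelow(P - 2) + 1
A = pow(G, a, P)   # Alice's public value
B = pow(G, b, P)   # Bob's public value

# Both sides arrive at the same shared secret without ever sending it:
# (g^b)^a = (g^a)^b = g^(ab) mod p.
shared_alice = pow(B, a, P)
shared_bob = pow(A, b, P)
assert shared_alice == shared_bob

# PFS comes from discarding a and b after the session: no long-lived key
# can later recompute the shared secret from the public values alone.
```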
Elliptic curves have a potential security issue: concerns have been raised about the integrity of the seed values used to calculate the curve parameters. These seed values, if compromised, can result in the loss of integrity of ECDHE. Threat actors and state actors are suspected to have tampered with such seed values before.
Encryption and the application should be separated from each other, either running as a separate process, or an entirely different server.
Bulk encryption usually results in poor performance when using asymmetric encryption. We solve this by using asymmetric encryption to encrypt a symmetric key, which is then used for encryption/decryption of the actual data. This is how it works in the TPM example above.
Traffic filtering can be done in different ways:
- Host Filtering
  - The end points perform the filtering themselves. This approach is typically not recommended as the sole solution, as it sits too close to the end point.
  - When a host runs in a virtual environment, these firewalls should be placed at the hypervisor level, not at the image level.
- Bookended Filtering
  - Performs filtering on both ingress and egress traffic
- Intermediary Filtering
  - Devices other than the sender and receiver should take part in the traffic filtering process
  - This means that applications along the perimeter should also filter the traffic
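Whichever layer performs the filtering, the core of it is a default-deny match against an enumerated set of allowed flows. The flow table below is hypothetical; in practice it would be generated from a system of record so that every permitted flow is known in advance.

```python
from ipaddress import ip_address, ip_network

ALLOWED_FLOWS = [
    # (source network, destination network, destination port)
    (ip_network("10.0.1.0/24"), ip_network("10.0.2.10/32"), 443),
    (ip_network("10.0.1.0/24"), ip_network("10.0.3.0/24"), 5432),
]

def filter_packet(src: str, dst: str, dport: int) -> str:
    # Accept only flows that were enumerated ahead of time.
    for src_net, dst_net, port in ALLOWED_FLOWS:
        if (ip_address(src) in src_net
                and ip_address(dst) in dst_net
                and dport == port):
            return "ACCEPT"
    return "DROP"   # anything not explicitly enumerated is dropped

assert filter_packet("10.0.1.5", "10.0.2.10", 443) == "ACCEPT"
assert filter_packet("10.0.1.5", "10.0.2.10", 22) == "DROP"
```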
- All network flows MUST BE authenticated
- All network flows SHOULD BE encrypted
- Authentication and Encryption MUST be performed by the endpoints
- All network flows MUST be enumerated so that access can be enforced
- The strongest authentication and encryption suites SHOULD BE used
- Authentication SHOULD NOT rely on public PKI. A private PKI SHOULD BE used
- Devices SHOULD BE regularly scanned, patched and rotated