Threat Modeling, Risk Registers, and Residual Risk

A startup CTO asks you: "We just got our third enterprise prospect asking for SOC 2. Where do we even start?" The answer is not "buy a compliance platform." The answer is risk management — specifically, figuring out what could go wrong, writing it down, and deciding what to do about each one.

That process produces a risk register. The risk register drives your control selection. Your controls produce the evidence that auditors review. Everything flows from risk.

What risk management actually is

Risk management is the discipline of identifying threats, assessing how likely they are and how much damage they'd cause, and deciding on a response. That's it. Three verbs: identify, assess, respond.

It is not vulnerability scanning. Vulnerability scanning is one input to risk management — a way to discover specific technical weaknesses. But a risk register also captures process risks (no one reviews access quarterly), vendor risks (your payroll provider stores SSNs with no encryption at rest), and business risks (a key engineer leaves and no one else understands the IAM pipeline).

It is not fear-based decision making. One of the four valid risk responses is "accept" — you document the risk, acknowledge it, and move on. The goal is not to eliminate all risk. The goal is to make risk visible so the business can make informed decisions about where to spend money and engineering time.

Every major compliance framework (SOC 2, ISO 27001, NIST 800-53, HIPAA) either starts with a risk assessment or requires one. Even the frameworks that list controls expect you to select and scope them in proportion to your assessed risks. The risk register is the document that connects "what could go wrong" to "what we did about it" to "what risk is left over."

Threat modeling: asking "what could go wrong" systematically

You could brainstorm threats in a room with a whiteboard. Some teams do. The problem is coverage — you'll think of whatever the last incident reminded you of and miss entire categories.

STRIDE is a threat modeling framework from Microsoft that gives you six categories to walk through. It doesn't cover everything, but it covers enough to be the industry default for application-level threat modeling.

Category                   What it targets         Example
──────────────────────     ─────────────────────   ──────────────────────────────
S — Spoofing               Authentication          Attacker uses stolen session
                                                   token to impersonate a user

T — Tampering              Integrity               Attacker modifies API request
                                                   payload to change their role

R — Repudiation            Non-repudiation         Employee deletes audit logs
                                                   to hide unauthorized access

I — Information            Confidentiality         API returns full user objects
    Disclosure                                     including hashed passwords

D — Denial of              Availability            Attacker floods login endpoint
    Service                                        with credential-stuffing attempts

E — Elevation of           Authorization           Regular user accesses admin
    Privilege                                      panel via direct URL

STRIDE threat categories — each one targets a different security property

Concrete example: threat modeling a SaaS login flow

Suppose you're threat modeling the authentication flow for a B2B SaaS product. The flow is: user enters email and password, server validates credentials, server issues a JWT, client stores the JWT and includes it in subsequent API requests.

Walk through STRIDE:

Spoofing. An attacker who obtains a valid JWT can impersonate the user for the token's lifetime. If the token lifetime is 24 hours and there's no refresh rotation, a stolen token gives 24 hours of access. Threat: credential theft via phishing, token theft via XSS.

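Tampering. If the server accepts whatever algorithm the JWT header declares instead of pinning the one it expects, an attacker can forge a token by setting alg to none and dropping the signature. The closely related algorithm confusion attack swaps in an algorithm the server verifies incorrectly. Threat: JWT forgery via an unvalidated alg header.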

Repudiation. If the application doesn't log authentication events, a compromised account can be used without any trail. When the customer asks "who accessed my data last Tuesday," you have no answer. Threat: missing audit trail for auth events.

Information Disclosure. The login endpoint might return different error messages for "user not found" vs "wrong password," letting an attacker enumerate valid email addresses. Threat: user enumeration via login error messages.

Denial of Service. No rate limiting on the login endpoint means an attacker can attempt thousands of passwords per minute. Threat: brute-force and credential-stuffing attacks.

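Elevation of Privilege. After login, the JWT payload includes a role field. If the server trusts that role without verifying the token's signature, a user could rewrite their own role claim. Even with a valid signature, a server that never rechecks the role against the database will keep honoring a role that has since been revoked until the token expires. Threat: JWT claim manipulation and stale role claims.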

Each of these threats becomes a row in the risk register. Some will score high enough to require mitigation. Others you might accept — user enumeration, for example, is often accepted at early-stage companies because the mitigation (generic error messages) confuses legitimate users.
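To make the tampering threat from the walkthrough concrete, here is a minimal sketch of pinning the signing algorithm during JWT verification. It uses only the Python standard library and assumes the service issues HS256 tokens exclusively. The names sign_token, verify_token, and ALLOWED_ALG are hypothetical, and the sketch deliberately skips expiry and audience checks. In production you would more likely use a maintained JWT library configured with an explicit algorithm allowlist.

```python
import base64
import hashlib
import hmac
import json

ALLOWED_ALG = "HS256"  # assumption: this service only ever issues HS256 tokens


def b64url_encode(raw: bytes) -> str:
    """Encode bytes as unpadded base64url, the form JWT segments use."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_token(payload: dict, secret: bytes) -> str:
    """Build a signed token with the single algorithm this service allows."""
    header = {"alg": ALLOWED_ALG, "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def verify_token(token: str, secret: bytes) -> dict:
    """Return the payload if the token is valid, otherwise raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        signature = b64url_decode(signature_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc

    # Pin the algorithm: never let the token choose how it gets verified.
    if not isinstance(header, dict) or header.get("alg") != ALLOWED_ALG:
        raise ValueError("unexpected algorithm")

    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))
```

A token whose header says alg none is rejected before any signature logic runs, and a token whose payload was edited after signing fails the constant-time signature comparison.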

Risk registers

A risk register is a structured document — usually a spreadsheet, sometimes a database — that captures every identified risk and tracks the response. It's the single artifact that connects business objectives to security spending.

Here are the standard columns:

Risk ID — Unique identifier (e.g., RISK-001)
Title — Short name
Description — What specifically could happen
Category — Operational, technical, compliance, vendor, etc.
Likelihood — How likely is this to happen? (1-5 scale)
Impact — How bad would it be if it did? (1-5 scale)
Risk Score — Likelihood x Impact
Owner — The person accountable for this risk
Mitigation — What controls are in place or planned
Status — Open, Mitigated, Accepted, Transferred
Last Review Date — When this risk was last reassessed

The risk register is a living document. Risks get added when you threat model a new feature, reassessed quarterly, and closed when the underlying system changes. A risk register that hasn't been updated in six months is a compliance finding waiting to happen.
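A staleness check like this is easy to automate. The sketch below assumes "quarterly" means roughly 90 days and that you can export each risk's last review date. The function name, the interval constant, and the sample dates are illustrative, not taken from any particular GRC tool.

```python
from datetime import date, timedelta

# Assumption: a quarterly review cadence approximated as 90 days.
REVIEW_INTERVAL = timedelta(days=90)


def overdue_reviews(last_reviewed: dict[str, date], today: date) -> list[str]:
    """Return the IDs of risks whose last review is older than the interval."""
    return sorted(
        risk_id
        for risk_id, reviewed_on in last_reviewed.items()
        if today - reviewed_on > REVIEW_INTERVAL
    )


# Illustrative data: risk ID mapped to its Last Review Date.
sample = {
    "RISK-001": date(2026, 5, 1),
    "RISK-002": date(2026, 4, 15),
    "RISK-003": date(2026, 5, 10),
}

print(overdue_reviews(sample, today=date(2026, 7, 20)))  # ['RISK-002']
```

Running something like this on a schedule turns "we forgot to review" into a ticket instead of an audit finding.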

ID        Title                  Category     L   I   Score  Owner         Mitigation                Status      Reviewed
────────  ─────────────────────  ───────────  ──  ──  ─────  ────────────  ────────────────────────  ──────────  ──────────
RISK-001  Credential theft       Technical    4   4   16     VP Eng        MFA enforced via Okta,    Mitigated   2026-05-01
          via phishing                                                     phishing-resistant keys
                                                                           for privileged users

RISK-002  Unreviewed third-      Vendor       3   3   9      Head of       Vendor security review    Open        2026-04-15
          party data access                                  Security      process in draft, not
                                                                           yet enforced

RISK-003  Accidental exposure    Operational  2   5   10     Platform      S3 bucket policies in     Mitigated   2026-05-10
          of customer PII in                                 Lead          Terraform, CloudTrail
          S3 buckets                                                       alerting on public ACLs

Example risk register — three rows from a SaaS company

Notice RISK-001 scores highest (16) because credential theft via phishing is both likely and high-impact. RISK-003 has the highest single impact (5) but lower likelihood because the team already has Terraform-managed bucket policies. The risk score surfaces which risks deserve the most attention and budget.
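If you keep the register in code or export it from a spreadsheet, the same ranking falls out of a few lines of Python. This is a sketch, not a prescribed schema: the Risk class and rank_by_score helper are hypothetical names, and only a subset of the columns is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """One register row; fields mirror a subset of the standard columns."""
    risk_id: str
    title: str
    category: str
    likelihood: int  # 1-5 scale
    impact: int      # 1-5 scale
    owner: str
    status: str

    def __post_init__(self) -> None:
        # Reject scores outside the 1-5 scales so bad data fails loudly.
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        """Risk Score = Likelihood x Impact."""
        return self.likelihood * self.impact


REGISTER = [
    Risk("RISK-001", "Credential theft via phishing", "Technical", 4, 4, "VP Eng", "Mitigated"),
    Risk("RISK-002", "Unreviewed third-party data access", "Vendor", 3, 3, "Head of Security", "Open"),
    Risk("RISK-003", "Accidental exposure of customer PII in S3 buckets", "Operational", 2, 5, "Platform Lead", "Mitigated"),
]


def rank_by_score(register: list[Risk]) -> list[Risk]:
    """Highest-scoring risks first."""
    return sorted(register, key=lambda risk: risk.score, reverse=True)


for risk in rank_by_score(REGISTER):
    print(f"{risk.risk_id}  score={risk.score:>2}  {risk.title}")
```

The output lists RISK-001 (16) first, then RISK-003 (10), then RISK-002 (9), matching the ordering described above.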

Likelihood and impact scoring

The 1-5 scales need clear definitions or they become meaningless. Two people rating the same risk should arrive at roughly the same scores. Here's a calibration that works for most mid-stage companies:

Likelihood scale:

Score	Label	Definition
1	Rare	Could happen in theory but hasn't in our industry in years. Requires multiple unlikely conditions.
2	Unlikely	Has happened to similar companies but not commonly. Maybe once every few years.
3	Possible	Happens regularly in the industry. Reasonable to expect it within the next 12-18 months.
4	Likely	Happens frequently. We've seen near-misses or it's already happened to a direct competitor.
5	Almost Certain	Actively being exploited in the wild. It's a matter of when, not if.

Impact scale:

Score	Label	Definition
1	Negligible	Minor inconvenience. No customer data affected. Internal-only impact, resolved in hours.
2	Minor	Limited disruption. Small number of users affected. Resolved in a day. No regulatory notification required.
3	Moderate	Noticeable service disruption or limited data exposure. May require customer notification. Recovery takes days.
4	Major	Significant data breach or extended outage. Regulatory notification required. Material financial impact. Reputational damage.
5	Catastrophic	Existential threat. Mass data breach, total service failure, or regulatory action that threatens the business. Front-page news.

The risk score is Likelihood x Impact, giving you a range of 1-25. Most organizations define four risk tolerance bands: scores 1-4 are low (accept and monitor), 5-9 are medium (mitigate within the current quarter), 10-15 are high (mitigate within 30 days, escalate to leadership), and 16-25 are critical (mitigate immediately, block affected operations if necessary).
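The banding is simple enough to encode directly, which keeps two reviewers from disagreeing about where a score lands. The sketch below uses the four bands exactly as stated above; the function name risk_band is hypothetical, and your organization's thresholds may differ.

```python
def risk_band(likelihood: int, impact: int) -> str:
    """Map 1-5 likelihood and impact scores to a tolerance band."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")

    score = likelihood * impact  # ranges from 1 to 25
    if score <= 4:
        return "low"       # accept and monitor
    if score <= 9:
        return "medium"    # mitigate within the current quarter
    if score <= 15:
        return "high"      # mitigate within 30 days, escalate to leadership
    return "critical"      # mitigate immediately


print(risk_band(4, 4))  # critical
print(risk_band(2, 5))  # high
print(risk_band(3, 3))  # medium
```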

Quick check

A company discovers that their staging environment uses production database credentials. No breach has occurred, but similar misconfiguration has led to breaches at other companies. What would be the most appropriate likelihood and impact scores?

Likelihood 1, Impact 2 — it is rare and minor since no breach occurred

Likelihood 3, Impact 4 — it is possible and would cause major damage if exploited

Likelihood 5, Impact 5 — production credentials are exposed so this is already critical

Likelihood 4, Impact 1 — likely to be found but staging environments are low value

Risk responses: Accept, Mitigate, Transfer, Avoid

Once you've scored a risk, you need to decide what to do about it. There are exactly four options.

Accept. You document the risk and decide to live with it. This is not negligence — it's an informed business decision. Example: the risk of a junior engineer accidentally pushing directly to the main branch scores low because you have branch protection rules requiring PR reviews and CI checks. The residual risk (someone with admin access bypasses protections) exists but is accepted given the low likelihood and the cost of further mitigation.

Mitigate. You implement controls to reduce likelihood, impact, or both. Example: deploying MFA across all user accounts reduces the likelihood of credential theft. The risk doesn't go away: phishing-resistant MFA such as hardware keys reduces it more than SMS-based MFA, but neither eliminates it entirely. If MFA cuts the likelihood from 4 to 2 while the impact stays at 4, the risk score drops from 16 to 8, and what remains is residual risk.

Transfer. You shift the financial consequence to a third party. Example: purchasing a cyber insurance policy with $5M in breach coverage. The risk of a breach hasn't changed — the financial exposure has. Transfer doesn't reduce likelihood or technical impact. It reduces business impact by sharing the cost. Outsourcing to a SOC provider is another form of transfer — you're paying someone else to manage the operational risk.

Avoid. You eliminate the risk entirely by removing the activity or asset that creates it. Example: instead of building PCI-compliant cardholder data storage, you use Stripe and never touch card numbers. The risk of cardholder data breach drops to zero because you don't have cardholder data. Avoidance is the strongest response but often the most restrictive — you're giving up a capability to eliminate a risk.

In practice, most risks are mitigated. A smaller number are accepted (usually low-scoring risks where mitigation cost exceeds the risk). Transfer is common for catastrophic-impact risks. Avoidance is rare but powerful when it applies.

Residual risk

Here's the part most people miss: mitigation does not eliminate risk. It reduces it. What remains after mitigation is residual risk.

You deploy MFA. Credential theft likelihood drops from 4 to 2. The impact is still 4. Your residual risk score is 8, down from 16. That 8 is the risk the organization formally accepts when it signs off on the risk register.

Residual risk matters because:

Auditors ask about it. ISO 27001 explicitly requires risk owners to accept residual risk, and SOC 2 auditors commonly expect to see it documented and signed off by leadership as part of the risk assessment. A risk register that only shows "mitigated" without quantifying what remains is incomplete.
It drives layered controls. If residual risk after MFA is still too high, you add a second control — conditional access policies that block logins from unmanaged devices. Each layer reduces residual risk further. The question is always: is the residual risk now within our tolerance?
It prevents false confidence. A team that deploys MFA and marks "credential theft" as "resolved" is dangerously wrong. MFA-bypass attacks exist. Phishing kits that intercept MFA tokens are commodity tools. The control helps enormously — it does not solve the problem.
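The layering arithmetic can be sketched in a few lines. The model below is a deliberate simplification: it assumes each control subtracts a fixed number of points from likelihood or impact, with a floor of 1 because no control takes a score to zero. The MFA reduction of 2 matches the example above; the conditional access reduction of 1 is an assumed value for illustration, and the Control class and residual_score function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    name: str
    likelihood_reduction: int = 0  # points removed from the 1-5 likelihood score
    impact_reduction: int = 0      # points removed from the 1-5 impact score


def residual_score(likelihood: int, impact: int, controls: list[Control]) -> int:
    """Apply each control in turn and return the remaining risk score."""
    for control in controls:
        # Floor at 1: mitigation reduces risk, it never eliminates it.
        likelihood = max(1, likelihood - control.likelihood_reduction)
        impact = max(1, impact - control.impact_reduction)
    return likelihood * impact


MFA = Control("MFA", likelihood_reduction=2)
CONDITIONAL_ACCESS = Control("Conditional access", likelihood_reduction=1)  # assumed effect

print(residual_score(4, 4, []))                         # inherent risk: 16
print(residual_score(4, 4, [MFA]))                      # after MFA: 8
print(residual_score(4, 4, [MFA, CONDITIONAL_ACCESS]))  # after both layers: 4
```

Real controls rarely reduce risk in neat integer steps, so treat the numbers as a way to structure the conversation about tolerance, not as a measurement.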

The GRC Engineer's job

Every control has residual risk. The GRC Engineer's job is to make residual risk visible, not to eliminate all risk. When the board asks "are we secure?" the honest answer is always "here are the risks we've identified, here's what we've done about each one, and here's what remains." That transparency is what risk management actually looks like.

The cycle is continuous: identify risks, assess them, implement controls, measure residual risk, report to leadership, reassess when the environment changes. A new feature, a new vendor, a new regulation, a new threat — each one restarts the cycle for the affected risks.

Putting it together

The flow from threat model to board report looks like this:

Threat model a system (STRIDE or another framework) — produces a list of threats
Score each threat for likelihood and impact — produces risk scores
Record everything in the risk register — produces the tracking artifact
Choose a response for each risk — accept, mitigate, transfer, or avoid
Implement controls for mitigated risks — produces evidence
Calculate residual risk — what remains after controls
Get leadership sign-off on residual risk — produces the formal acceptance
Review quarterly — keeps the register current

This is the core loop of GRC. Everything else in this bootcamp — identity management, cloud IAM, endpoint security, compliance automation — feeds into this loop. You're either identifying risks, implementing controls to mitigate them, or collecting evidence to prove the controls work.

Quick check

A company uses Stripe for payment processing to avoid storing cardholder data. Six months later, they add a feature that caches the last four digits of card numbers in their own database for display purposes. What happened to their PCI-related risk?

Nothing changed — Stripe still handles all payment processing

They moved from Avoid to Mitigate — they now store partial cardholder data and need controls around it

They moved from Avoid to Transfer — Stripe still owns the risk

They moved from Avoid to Accept — last four digits are not sensitive

What comes next

The next lessons in this module cover how to build a risk register from a realistic scenario, including hands-on scoring and response selection. The frameworks in Phase 1 (SOC 2, ISO 27001, NIST) all reference risk assessment as the starting point — now you know what that actually means in practice.