Phase 1 · Risk Management Basics · Lesson 1 of 3
A startup CTO asks you: "We just got our third enterprise prospect asking for SOC 2. Where do we even start?" The answer is not "buy a compliance platform." The answer is risk management — specifically, figuring out what could go wrong, writing it down, and deciding what to do about each one.
That process produces a risk register. The risk register drives your control selection. Your controls produce the evidence that auditors review. Everything flows from risk.
Risk management is the discipline of identifying threats, assessing how likely they are and how much damage they'd cause, and deciding on a response. That's it. Three verbs: identify, assess, respond.
It is not vulnerability scanning. Vulnerability scanning is one input to risk management — a way to discover specific technical weaknesses. But a risk register also captures process risks (no one reviews access quarterly), vendor risks (your payroll provider stores SSNs with no encryption at rest), and business risks (a key engineer leaves and no one else understands the IAM pipeline).
It is not fear-based decision making. One of the four valid risk responses is "accept" — you document the risk, acknowledge it, and move on. The goal is not to eliminate all risk. The goal is to make risk visible so the business can make informed decisions about where to spend money and engineering time.
Every major compliance framework (SOC 2, ISO 27001, NIST SP 800-53, HIPAA) starts with risk assessment. For the most part, the frameworks don't dictate exactly which controls to implement. They tell you to assess your risks and implement controls proportional to those risks. The risk register is the document that connects "what could go wrong" to "what we did about it" to "how much it cost."
You could brainstorm threats in a room with a whiteboard. Some teams do. The problem is coverage — you'll think of whatever the last incident reminded you of and miss entire categories.
STRIDE is a threat modeling framework from Microsoft that gives you six categories to walk through. It doesn't cover everything, but it covers enough to be the industry default for application-level threat modeling.
| Category | What it targets | Example |
|---|---|---|
| S: Spoofing | Authentication | Attacker uses stolen session token to impersonate a user |
| T: Tampering | Integrity | Attacker modifies API request payload to change their role |
| R: Repudiation | Non-repudiation | Employee deletes audit logs to hide unauthorized access |
| I: Information Disclosure | Confidentiality | API returns full user objects including hashed passwords |
| D: Denial of Service | Availability | Attacker floods login endpoint with credential-stuffing attempts |
| E: Elevation of Privilege | Authorization | Regular user accesses admin panel via direct URL |

STRIDE threat categories: each one targets a different security property
Suppose you're threat modeling the authentication flow for a B2B SaaS product. The flow is: user enters email and password, server validates credentials, server issues a JWT, client stores the JWT and includes it in subsequent API requests.
Walk through STRIDE:
Spoofing. An attacker who obtains a valid JWT can impersonate the user for the token's lifetime. If the token lifetime is 24 hours and there's no refresh rotation, a stolen token gives 24 hours of access. Threat: credential theft via phishing, token theft via XSS.
Tampering. If the JWT is signed but the server accepts whatever algorithm the token header declares, an attacker could forge an unsigned token by setting the algorithm to "none". Threat: JWT "alg: none" forgery, a close relative of algorithm confusion attacks.
Repudiation. If the application doesn't log authentication events, a compromised account can be used without any trail. When the customer asks "who accessed my data last Tuesday," you have no answer. Threat: missing audit trail for auth events.
Information Disclosure. The login endpoint might return different error messages for "user not found" vs "wrong password," letting an attacker enumerate valid email addresses. Threat: user enumeration via login error messages.
Denial of Service. No rate limiting on the login endpoint means an attacker can attempt thousands of passwords per minute. Threat: brute-force and credential-stuffing attacks.
Elevation of Privilege. After login, the JWT payload includes a role field. If the server trusts that role claim without verifying the token's signature or checking the role against the database on each request, a user could modify their role claim. Threat: JWT claim manipulation.
Each of these threats becomes a row in the risk register. Some will score high enough to require mitigation. Others you might accept — user enumeration, for example, is often accepted at early-stage companies because the mitigation (generic error messages) confuses legitimate users.
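To make the step from threat model to register concrete, here is a rough sketch that records the six threats from the walk-through, checks that no STRIDE category was skipped, and turns each threat into an unscored register row. The class and function names and the `AUTH` ID prefix are illustrative assumptions, not part of any standard tool.

```python
from dataclasses import dataclass
from enum import Enum


class Stride(Enum):
    """Each STRIDE category paired with the security property it targets."""
    SPOOFING = "Authentication"
    TAMPERING = "Integrity"
    REPUDIATION = "Non-repudiation"
    INFORMATION_DISCLOSURE = "Confidentiality"
    DENIAL_OF_SERVICE = "Availability"
    ELEVATION_OF_PRIVILEGE = "Authorization"


@dataclass(frozen=True)
class Threat:
    category: Stride
    title: str


# The threats identified in the authentication-flow walk-through.
AUTH_FLOW_THREATS = [
    Threat(Stride.SPOOFING, "Credential theft via phishing, token theft via XSS"),
    Threat(Stride.TAMPERING, "JWT algorithm manipulation"),
    Threat(Stride.REPUDIATION, "Missing audit trail for auth events"),
    Threat(Stride.INFORMATION_DISCLOSURE, "User enumeration via login error messages"),
    Threat(Stride.DENIAL_OF_SERVICE, "Brute-force and credential-stuffing attacks"),
    Threat(Stride.ELEVATION_OF_PRIVILEGE, "JWT claim manipulation"),
]


def uncovered_categories(threats):
    """Return STRIDE categories that have no recorded threat yet."""
    seen = {t.category for t in threats}
    return [c for c in Stride if c not in seen]


def to_register_rows(threats, prefix="AUTH"):
    """Turn threats into register rows; scoring happens in a later step."""
    return [
        {
            "id": f"{prefix}-{i:03d}",
            "title": t.title,
            "stride": t.category.name,
            "property": t.category.value,
            "status": "Open",
        }
        for i, t in enumerate(threats, start=1)
    ]


if __name__ == "__main__":
    print("Uncovered:", uncovered_categories(AUTH_FLOW_THREATS))
    for row in to_register_rows(AUTH_FLOW_THREATS):
        print(row["id"], row["stride"], "-", row["title"])
```

The coverage check is what a framework buys you over whiteboard brainstorming: an empty result from `uncovered_categories` means every category was at least considered.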
A risk register is a structured document — usually a spreadsheet, sometimes a database — that captures every identified risk and tracks the response. It's the single artifact that connects business objectives to security spending.
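Here are the standard columns: ID, title, category, likelihood (L), impact (I), score, owner, mitigation, status, and last-reviewed date.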
The risk register is a living document. Risks get added when you threat model a new feature, reassessed quarterly, and closed when the underlying system changes. A risk register that hasn't been updated in six months is a compliance finding waiting to happen.
| ID | Title | Category | Likelihood (L) | Impact (I) | Score | Owner | Mitigation | Status | Reviewed |
|---|---|---|---|---|---|---|---|---|---|
| RISK-001 | Credential theft via phishing | Technical | 4 | 4 | 16 | VP Eng | MFA enforced via Okta, phishing-resistant keys for privileged users | Mitigated | 2026-05-01 |
| RISK-002 | Unreviewed third-party data access | Vendor | 3 | 3 | 9 | Head of Security | Vendor security review process in draft, not yet enforced | Open | 2026-04-15 |
| RISK-003 | Accidental exposure of customer PII in S3 buckets | Operational | 2 | 5 | 10 | Platform Lead | S3 bucket policies in Terraform, CloudTrail alerting on public ACLs | Mitigated | 2026-05-10 |

Example risk register: three rows from a SaaS company
Notice RISK-001 scores highest (16) because credential theft via phishing is both likely and high-impact. RISK-003 has the highest single impact (5) but lower likelihood because the team already has Terraform-managed bucket policies. The risk score surfaces which risks deserve the most attention and budget.
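The same three rows can be expressed as data, which makes the ranking and the "living document" requirement checkable. The sketch below is one possible shape, not a standard schema. The `Risk` class and the `by_priority` and `stale` helpers are hypothetical names, and the 90-day review window is an assumption standing in for "reassessed quarterly."

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Risk:
    risk_id: str
    title: str
    category: str
    likelihood: int  # 1-5
    impact: int      # 1-5
    owner: str
    status: str
    reviewed: date

    @property
    def score(self) -> int:
        """Risk score is likelihood multiplied by impact."""
        return self.likelihood * self.impact


REGISTER = [
    Risk("RISK-001", "Credential theft via phishing", "Technical",
         4, 4, "VP Eng", "Mitigated", date(2026, 5, 1)),
    Risk("RISK-002", "Unreviewed third-party data access", "Vendor",
         3, 3, "Head of Security", "Open", date(2026, 4, 15)),
    Risk("RISK-003", "Accidental exposure of customer PII in S3 buckets", "Operational",
         2, 5, "Platform Lead", "Mitigated", date(2026, 5, 10)),
]


def by_priority(register):
    """Highest score first, so attention and budget go to the top rows."""
    return sorted(register, key=lambda r: r.score, reverse=True)


def stale(register, today, max_age_days=90):
    """Risks whose last review is older than the review window."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in register if r.reviewed < cutoff]


if __name__ == "__main__":
    for risk in by_priority(REGISTER):
        print(risk.risk_id, risk.score, risk.title)
    # Prints RISK-001 (16), then RISK-003 (10), then RISK-002 (9).
```

Running `stale` on a schedule is a cheap way to catch rows that have drifted past their review date before an auditor does.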
The 1-5 scales need clear definitions or they become meaningless. Two people rating the same risk should arrive at roughly the same scores. Here's a calibration that works for most mid-stage companies:
Likelihood scale:
| Score | Label | Definition |
|---|---|---|
| 1 | Rare | Could happen in theory but hasn't in our industry in years. Requires multiple unlikely conditions. |
| 2 | Unlikely | Has happened to similar companies but not commonly. Maybe once every few years. |
| 3 | Possible | Happens regularly in the industry. Reasonable to expect it within the next 12-18 months. |
| 4 | Likely | Happens frequently. We've seen near-misses or it's already happened to a direct competitor. |
| 5 | Almost Certain | Actively being exploited in the wild. It's a matter of when, not if. |
Impact scale:
| Score | Label | Definition |
|---|---|---|
| 1 | Negligible | Minor inconvenience. No customer data affected. Internal-only impact, resolved in hours. |
| 2 | Minor | Limited disruption. Small number of users affected. Resolved in a day. No regulatory notification required. |
| 3 | Moderate | Noticeable service disruption or limited data exposure. May require customer notification. Recovery takes days. |
| 4 | Major | Significant data breach or extended outage. Regulatory notification required. Material financial impact. Reputational damage. |
| 5 | Catastrophic | Existential threat. Mass data breach, total service failure, or regulatory action that threatens the business. Front-page news. |
The risk score is Likelihood × Impact, giving you a range of 1-25. Most organizations define four risk tolerance bands: scores 1-4 are low (accept and monitor), 5-9 are medium (mitigate within the current quarter), 10-15 are high (mitigate within 30 days, escalate to leadership), and 16-25 are critical (mitigate immediately, block affected operations if necessary).
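The scoring rule and the four bands translate directly into code. This is a minimal sketch using the thresholds given above. The function names are illustrative, and an organization with different tolerance bands would change the cut-offs.

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Multiply likelihood by impact after checking both are on the 1-5 scale."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return likelihood * impact


def risk_band(score: int) -> str:
    """Map a 1-25 score to the tolerance band described in the text."""
    if not 1 <= score <= 25:
        raise ValueError(f"score must be between 1 and 25, got {score}")
    if score <= 4:
        return "low"       # accept and monitor
    if score <= 9:
        return "medium"    # mitigate within the current quarter
    if score <= 15:
        return "high"      # mitigate within 30 days, escalate to leadership
    return "critical"      # mitigate immediately


if __name__ == "__main__":
    for likelihood, impact in [(4, 4), (3, 3), (2, 5)]:
        score = risk_score(likelihood, impact)
        print(likelihood, impact, score, risk_band(score))
```

Applied to the example register, RISK-001 (16) lands in critical, RISK-003 (10) in high, and RISK-002 (9) in medium.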
Quick check
A company discovers that their staging environment uses production database credentials. No breach has occurred, but similar misconfiguration has led to breaches at other companies. What would be the most appropriate likelihood and impact scores?
Once you've scored a risk, you need to decide what to do about it. There are exactly four options.
Accept. You document the risk and decide to live with it. This is not negligence — it's an informed business decision. Example: the risk of a junior engineer accidentally pushing directly to the main branch scores low because you have branch protection rules requiring PR reviews and CI checks. The residual risk (someone with admin access bypasses protections) exists but is accepted given the low likelihood and the cost of further mitigation.
Mitigate. You implement controls to reduce likelihood, impact, or both. Example: deploying MFA across all user accounts reduces the likelihood of credential theft. The risk doesn't go away: phishing-resistant MFA like hardware keys reduces it more than SMS-based MFA, but neither eliminates it entirely. The risk score drops from 16 to 8 (likelihood falls from 4 to 2, impact stays at 4), and the remaining risk becomes residual risk.
Transfer. You shift the financial consequence to a third party. Example: purchasing a cyber insurance policy with $5M in breach coverage. The risk of a breach hasn't changed — the financial exposure has. Transfer doesn't reduce likelihood or technical impact. It reduces business impact by sharing the cost. Outsourcing to a SOC provider is another form of transfer — you're paying someone else to manage the operational risk.
Avoid. You eliminate the risk entirely by removing the activity or asset that creates it. Example: instead of building PCI-compliant cardholder data storage, you use Stripe and never touch card numbers. The risk of cardholder data breach drops to zero because you don't have cardholder data. Avoidance is the strongest response but often the most restrictive — you're giving up a capability to eliminate a risk.
In practice, most risks are mitigated. A smaller number are accepted (usually low-scoring risks where mitigation cost exceeds the risk). Transfer is common for catastrophic-impact risks. Avoidance is rare but powerful when it applies.
Here's the part most people miss: mitigation does not eliminate risk. It reduces it. What remains after mitigation is residual risk.
You deploy MFA. Credential theft likelihood drops from 4 to 2. The impact is still 4. Your residual risk score is 8, down from 16. That 8 is the risk the organization formally accepts when it signs off on the risk register.
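The arithmetic in that example can be written out as a small before-and-after comparison. This is only a sketch: the `Assessment` class is a hypothetical name, and the tolerance of 9 (the top of the medium band) is an assumed threshold that each organization would set for itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    likelihood: int  # 1-5
    impact: int      # 1-5

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def within_tolerance(assessment: Assessment, tolerance: int) -> bool:
    """True when the score is at or below what leadership has agreed to accept."""
    return assessment.score <= tolerance


inherent = Assessment(likelihood=4, impact=4)  # credential theft before MFA
residual = Assessment(likelihood=2, impact=4)  # after MFA: likelihood drops, impact does not

if __name__ == "__main__":
    print("Inherent score:", inherent.score)   # 16
    print("Residual score:", residual.score)   # 8
    print("Reduction:", inherent.score - residual.score)
    print("Within tolerance of 9:", within_tolerance(residual, tolerance=9))
```

Keeping both assessments on record, rather than overwriting the original score, is what lets the register show what the control bought and what remains.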
Residual risk matters because:
Auditors ask about it. SOC 2 and ISO 27001 both require organizations to document residual risk and demonstrate that leadership has formally accepted it. A risk register that only shows "mitigated" without quantifying what remains is incomplete.
It drives layered controls. If residual risk after MFA is still too high, you add a second control — conditional access policies that block logins from unmanaged devices. Each layer reduces residual risk further. The question is always: is the residual risk now within our tolerance?
It prevents false confidence. A team that deploys MFA and marks "credential theft" as "resolved" is dangerously wrong. MFA-bypass attacks exist. Phishing kits that intercept MFA tokens are commodity tools. The control helps enormously — it does not solve the problem.
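The layering idea can be sketched as a loop that keeps applying controls until the residual score is within tolerance. Everything here is illustrative: the `Control` class is hypothetical, the MFA reduction of 2 points follows the example above, and the 1-point reduction for conditional access is an assumed figure, not a measured one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    name: str
    likelihood_reduction: int  # points removed from the 1-5 likelihood score


def layer_controls(likelihood, impact, controls, tolerance):
    """Apply controls in order, stopping once the residual score is within tolerance.

    Returns a (control name, residual score) pair for each control applied.
    """
    applied = []
    for control in controls:
        if likelihood * impact <= tolerance:
            break  # already within tolerance, no further spend needed
        # Likelihood never drops below 1: no control makes a risk impossible.
        likelihood = max(1, likelihood - control.likelihood_reduction)
        applied.append((control.name, likelihood * impact))
    return applied


if __name__ == "__main__":
    controls = [Control("MFA", 2), Control("Conditional access", 1)]
    # With a strict tolerance of 4, MFA alone (score 8) is not enough.
    print(layer_controls(4, 4, controls, tolerance=4))
    # With a tolerance of 9, the loop stops after MFA.
    print(layer_controls(4, 4, controls, tolerance=9))
```

The floor at a likelihood of 1 encodes the point made above: a control reduces the risk, it does not resolve it.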
The GRC Engineer's job
Every control has residual risk. The GRC Engineer's job is to make residual risk visible, not to eliminate all risk. When the board asks "are we secure?" the honest answer is always "here are the risks we've identified, here's what we've done about each one, and here's what remains." That transparency is what risk management actually looks like.
The cycle is continuous: identify risks, assess them, implement controls, measure residual risk, report to leadership, reassess when the environment changes. A new feature, a new vendor, a new regulation, a new threat — each one restarts the cycle for the affected risks.
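The flow from threat model to board report looks like this: threat model → risk register → scoring → response and controls → residual risk → report to leadership.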
This is the core loop of GRC. Everything else in this bootcamp — identity management, cloud IAM, endpoint security, compliance automation — feeds into this loop. You're either identifying risks, implementing controls to mitigate them, or collecting evidence to prove the controls work.
Quick check
A company uses Stripe for payment processing to avoid storing cardholder data. Six months later, they add a feature that caches the last four digits of card numbers in their own database for display purposes. What happened to their PCI-related risk?
The next lessons in this module cover how to build a risk register from a realistic scenario, including hands-on scoring and response selection. The frameworks in Phase 1 (SOC 2, ISO 27001, NIST) all reference risk assessment as the starting point — now you know what that actually means in practice.