CIS Benchmarks for macOS, Windows & Linux

A device can be enrolled in MDM and running EDR and still be soft: shipped with insecure defaults, unnecessary services running, and weak settings no one changed. Hardening is the practice of configuring the operating system to reduce that attack surface, and the de facto standard for what to configure is the CIS Benchmarks. When an auditor asks "how do you know your systems are securely configured?", the answer that holds up is "we follow the CIS Benchmark, and here is our conformance score."

This lesson explains what CIS Benchmarks are, how their two levels differ, which items matter most across macOS, Windows, and Linux, and how to measure conformance — plus the operational reality that hardening too aggressively breaks things.

What CIS Benchmarks are

The Center for Internet Security publishes Benchmarks: consensus-developed, prescriptive configuration guides for operating systems, cloud platforms, browsers, databases, and more. Each benchmark is a long list of specific, testable recommendations — "ensure the firewall is enabled," "set the screen-lock timeout," "disable guest login," "require strong password policy" — with the rationale, the security impact, and the exact steps to audit and remediate each one.

Two things make them the industry reference point:

They are specific and testable. Not "harden the OS" but "set com.apple.screensaver askForPasswordDelay to 5 seconds or less," with an audit command and a remediation command. That specificity is what lets you measure conformance instead of just asserting it.
They map to frameworks. CIS Benchmarks underpin the broader CIS Controls, which are in turn mapped to SOC 2, ISO 27001, NIST 800-53, and PCI DSS. A single benchmark conformance report can therefore serve as evidence for the configuration-management and secure-baseline expectations in all of those.
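To make "specific and testable" concrete, here is a minimal Python sketch of one recommendation modeled as data plus an audit predicate. The `Recommendation` class, the `EXAMPLE-1` identifier, and the `settings` mapping are hypothetical and for illustration only; a real benchmark supplies its own item numbers and audit commands, and the sketch assumes the setting's value has already been collected from the device.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Recommendation:
    rec_id: str   # hypothetical identifier, not a real CIS item number
    title: str
    level: int    # CIS profile level: 1 or 2
    audit: Callable[[Mapping[str, object]], bool]


def screen_lock_delay_ok(settings: Mapping[str, object]) -> bool:
    """Pass only if askForPasswordDelay is set and is 5 seconds or less."""
    value = settings.get("askForPasswordDelay")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False  # missing or non-numeric counts as a failure
    return 0 <= value <= 5


SCREEN_LOCK = Recommendation(
    rec_id="EXAMPLE-1",
    title="Require a password within 5 seconds of the screen saver starting",
    level=1,
    audit=screen_lock_delay_ok,
)


def evaluate(rec: Recommendation, settings: Mapping[str, object]) -> str:
    """Return 'pass' or 'fail' for one recommendation on one device."""
    return "pass" if rec.audit(settings) else "fail"
```

Because each item reduces to a yes/no check, a device's results can be counted, which is what makes a conformance score possible.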

Level 1 vs Level 2

Every CIS Benchmark recommendation is tagged with a profile level, and understanding the difference keeps you from over-hardening.

Level 1 — practical baseline security with minimal impact on usability and functionality. These are the settings nearly every organization should apply: enable the firewall, require a password to unlock, disable unneeded sharing services, enforce automatic updates. Level 1 is the sensible default.
Level 2 — defense-in-depth for high-security environments, accepting reduced functionality or convenience. These settings (disabling features, aggressive restrictions, extensive logging) are appropriate for sensitive systems but can break workflows if applied broadly.

The practical guidance: apply Level 1 broadly across the fleet, and apply Level 2 selectively to high-sensitivity systems (servers holding regulated data, admin workstations) where the reduced convenience is an acceptable trade. Trying to push Level 2 onto every laptop is how hardening projects collapse under a pile of help-desk tickets.
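The "Level 1 broadly, Level 2 selectively" rule can be sketched as a small selection function. This is an illustration under stated assumptions: the role labels and the dictionary shape for recommendations are hypothetical, and the sketch relies on the Level 2 profile including the Level 1 items.

```python
from typing import Dict, List

# Hypothetical role labels for systems that warrant the Level 2 profile.
HIGH_SENSITIVITY_ROLES = {"regulated-data-server", "admin-workstation"}


def target_level(role: str) -> int:
    """Level 2 for high-sensitivity systems, Level 1 for everything else."""
    return 2 if role in HIGH_SENSITIVITY_ROLES else 1


def applicable(recommendations: List[Dict], role: str) -> List[Dict]:
    """Keep the items at or below the role's target level."""
    level = target_level(role)
    return [rec for rec in recommendations if rec["level"] <= level]


recs = [
    {"id": "enable-firewall", "level": 1},
    {"id": "extensive-logging", "level": 2},
]
print([r["id"] for r in applicable(recs, "sales-laptop")])
print([r["id"] for r in applicable(recs, "admin-workstation")])
```

Keeping the level decision in one place makes it easy to review which systems receive the stricter profile and why.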

STIGs, the stricter cousin

You may also hear about DISA STIGs (Security Technical Implementation Guides). STIGs are the U.S. Department of Defense's hardening standard — stricter and mandatory for DoD systems. CIS Benchmarks are the broader commercial standard and are what most SOC 2 / ISO 27001 organizations align to. If you work with federal or defense customers, expect STIGs; otherwise, CIS is the common reference.

High-value items per OS

You do not memorize benchmarks, but you should recognize the high-leverage controls each OS benchmark emphasizes — these are the ones that show up in findings.

macOS (CIS Apple macOS Benchmark)

FileVault full-disk encryption enabled
Firewall (Application Firewall) enabled, stealth mode on
Automatic security updates enabled; OS at a supported version
Screen lock with a short timeout and password required on wake
Gatekeeper and System Integrity Protection enabled; limit sharing services (screen sharing, remote login)

Windows (CIS Microsoft Windows Benchmark)

BitLocker drive encryption enabled
Windows Firewall enabled across profiles (domain/private/public)
Strong account and password policy; restrict local administrator usage
Disable SMBv1 and other legacy/insecure protocols
Audit logging configured; automatic updates enabled; Microsoft Defender features on

Linux (CIS Distribution Benchmarks — Ubuntu, RHEL, etc.)

Disable unused services and network protocols; close unneeded ports
Enforce SSH hardening (no root login, key-based auth, strong ciphers)
Configure host firewall (ufw/firewalld/iptables)
Filesystem and partition controls; restrict setuid/setgid where possible
Enable auditd / system logging; enforce strong password and account policy
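As one illustration of how a Linux item becomes a check, the sketch below tests an sshd_config text for two directives that correspond to "no root login" and "key-based auth". It is deliberately simplified and rests on assumptions: it ignores `Match` blocks and `Include` files, treats a directive that is not explicitly set as failing, and the `EXPECTED` table is a hypothetical subset rather than the benchmark's full SSH section.

```python
from typing import Dict, List

# Hypothetical subset of expected values; a real benchmark lists many more.
EXPECTED = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
}


def parse_sshd_config(text: str) -> Dict[str, str]:
    """Parse 'Keyword value' lines; keywords are case-insensitive."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip lines without a value
        key, value = parts[0].lower(), parts[1].strip()
        # Keep the first occurrence of each keyword, as sshd does.
        settings.setdefault(key, value)
    return settings


def failing_directives(text: str) -> List[str]:
    """Return the directives that are missing or not at the expected value."""
    settings = parse_sshd_config(text)
    return sorted(
        key for key, want in EXPECTED.items()
        if settings.get(key, "").lower() != want
    )


sample = "# managed file\nPermitRootLogin no\nPasswordAuthentication yes\n"
print(failing_directives(sample))
```

In practice a scanner or the benchmark's own audit command does this work; the sketch only shows why these items are easy to score.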

Notice the recurring theme across all three: encryption on, firewall on, updates on, unnecessary services off, logging on, strong account policy. That cluster is the spine of endpoint hardening regardless of platform.

Measuring conformance

The point of a prescriptive benchmark is that you can score a system against it, which turns hardening from an assertion into a metric.

CIS-CAT (the CIS Configuration Assessment Tool, including CIS-CAT Pro) scans a system against the benchmark and reports a pass/fail per item and an overall conformance score.
MDM compliance policies can enforce and report many Level 1 items directly (encryption, firewall, screen lock, OS version) — the same MDM reports from Module 3.1 double as partial CIS conformance evidence.
Configuration scanners and CSPM-style tools can check benchmark conformance at scale, including for servers and cloud instances.

The output you want is a conformance percentage and a list of failing items — per device or per device group — that you can track over time. A baseline established and then monitored ("the macOS fleet is at 94% CIS Level 1 conformance, here are the 6% of items failing and why") is exactly the kind of evidence configuration-management controls call for.
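The arithmetic behind a conformance percentage is simple enough to sketch. The function below assumes per-item pass/fail results are already available as a mapping (for example, exported from a scanner); the item names are hypothetical, and real tools may weight or group items differently.

```python
from typing import Dict, List, Tuple


def conformance(results: Dict[str, bool]) -> Tuple[float, List[str]]:
    """Return (percentage of items passing, sorted list of failing items)."""
    if not results:
        raise ValueError("no results to score")
    failing = sorted(item for item, passed in results.items() if not passed)
    score = 100.0 * (len(results) - len(failing)) / len(results)
    return round(score, 1), failing


# Hypothetical scan of one device against four Level 1 items.
scan = {
    "firewall-enabled": True,
    "disk-encryption": True,
    "screen-lock": False,
    "auto-update": True,
}
score, failing = conformance(scan)
print(f"{score}% conformant; failing: {failing}")
```

Storing the score and the failing list per scan date is what lets you show a trend rather than a single snapshot.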

Quick check

A security team announces they will apply the full CIS Level 2 Benchmark to every laptop in the company, including all developers and sales staff, by the end of the week. As the GRC Engineer, what is the most appropriate guidance?

Approve it — more hardening is always strictly better, and Level 2 everywhere maximizes security

Recommend applying Level 1 broadly across the fleet and reserving Level 2 for high-sensitivity systems, rolling out in phases with testing, because blanket Level 2 will break workflows and generate workarounds that can be worse than the original risk

Reject CIS Benchmarks entirely and let each team configure machines however they prefer

Tell them benchmarks are optional and do not affect any compliance framework

The hardening-vs-usability trade

Hardening always has a cost, and ignoring that cost is how programs fail. Disable too much and you break legitimate work; users respond by finding ways around the controls, which leaves you less secure than a moderate baseline everyone actually tolerates. The disciplined approach is:

Baseline with Level 1, applied through MDM where possible so it is enforced and reported.
Test changes on a pilot group before fleet-wide rollout — especially anything touching developer toolchains.
Exception-handle deliberately: when a system genuinely needs a benchmark item relaxed, document the exception, the rationale, and a compensating control rather than silently turning the setting off.
Monitor conformance over time and treat drift as a finding, the same way you treat MDM compliance drift.
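The last two steps, documented exceptions and drift monitoring, can be sketched together. This is an assumption-laden illustration: `ExceptionRecord` and its fields are hypothetical, and drift is defined here as an item that passed in the baseline scan, fails now, and has no documented exception.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ExceptionRecord:
    item: str                   # benchmark item being relaxed
    rationale: str              # why the system needs it relaxed
    compensating_control: str   # what offsets the added risk


def drift_findings(
    baseline: Dict[str, bool],
    current: Dict[str, bool],
    exceptions: Iterable[ExceptionRecord],
) -> List[str]:
    """Items that passed at baseline, fail now, and have no exception."""
    excepted = {record.item for record in exceptions}
    return sorted(
        item
        for item, was_passing in baseline.items()
        if was_passing
        and not current.get(item, False)  # missing from the scan counts as failing
        and item not in excepted
    )


baseline = {"firewall": True, "screen-lock": True, "remote-login-off": True}
current = {"firewall": True, "screen-lock": False, "remote-login-off": False}
approved = [
    ExceptionRecord(
        item="remote-login-off",
        rationale="Build host requires SSH access",
        compensating_control="Key-only auth, access restricted by host firewall",
    )
]
print(drift_findings(baseline, current, approved))
```

Separating documented exceptions from unexplained failures keeps the findings list focused on drift that nobody approved.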

GRC Engineer's lens

CIS conformance is one of the most reusable pieces of evidence you can produce: a single conformance report supports the secure-configuration and baseline expectations in SOC 2, ISO 27001 Annex A.8, NIST 800-53, and PCI DSS at once. Your role is to choose the right level per system, make sure the baseline is enforced and measured (not just documented in a wiki), and ensure exceptions are written down with compensating controls. When an auditor asks how you know systems are securely configured, you point at the conformance trend, not at a policy PDF no machine ever read.

What to carry forward

CIS Benchmarks turn "harden the OS" into a specific, testable, scoreable standard. Apply Level 1 broadly and Level 2 selectively, enforce what you can through MDM, measure conformance with tools like CIS-CAT, and treat the usability trade-off as a real constraint to plan around rather than an excuse to skip hardening. The conformance score is both a security metric and cross-framework evidence.

Next, you will switch from configuration to detection in practice: triaging the EDR alerts that fire when something gets past the hardened baseline.