Incident Response and Recovery

No security system is perfect. Despite the best defenses, security incidents happen. A security incident is any event that threatens the confidentiality, integrity, or availability of information or systems. Incident response is the structured process an organization follows to detect, contain, investigate, and recover from these events — and learn from them to prevent future occurrences.

Organizations with a solid incident response plan recover faster, suffer less damage, and meet legal obligations. Organizations without one scramble in chaos, lose more data, and face larger financial and reputational losses.

What Counts as a Security Incident?

Incident Type        Example
Malware infection    Ransomware encrypts all files on the company server
Data breach          Customer database exported by an unauthorized user
Account compromise   Employee credentials stolen and used to log in from another country
DDoS attack          Company website goes offline due to traffic flood
Insider threat       Disgruntled employee deletes critical project files
Phishing success     Employee clicks a link and enters credentials on a fake site
Physical breach      Unauthorized person enters the server room

The Incident Response Lifecycle

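A widely used framework for incident response comes from NIST (National Institute of Standards and Technology). NIST SP 800-61 Revision 2 groups the work into four phases: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity. This lesson uses the common six-phase breakdown, also taught by SANS, which treats containment, eradication, and recovery as separate phases.

INCIDENT RESPONSE LIFECYCLE (SIX-PHASE VIEW):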

  ┌─────────────────┐
  │   PREPARATION   │ ← Build the team, tools, and plan BEFORE incidents happen
  └────────┬────────┘
           │
  ┌────────▼────────┐
  │   DETECTION &   │ ← Identify that an incident has occurred
  │ IDENTIFICATION  │
  └────────┬────────┘
           │
  ┌────────▼────────┐
  │   CONTAINMENT   │ ← Stop the spread. Isolate affected systems.
  └────────┬────────┘
           │
  ┌────────▼────────┐
  │   ERADICATION   │ ← Remove the threat. Clean infected systems.
  └────────┬────────┘
           │
  ┌────────▼────────┐
  │    RECOVERY     │ ← Restore systems and verify they are clean.
  └────────┬────────┘
           │
  ┌────────▼────────┐
  │  POST-INCIDENT  │ ← Review what happened and how to prevent recurrence.
  │ REVIEW (LESSONS)│
  └─────────────────┘

Phase 1: Preparation

Preparation is the most important phase. An organization that prepares before an incident responds faster, makes fewer mistakes, and recovers more completely. Preparation includes:

CSIRT (Computer Security Incident Response Team) – A defined group of people, each with a specific role during an incident.
Incident Response Plan – A documented, tested plan with step-by-step procedures for different types of incidents.
Security tools – Logging, monitoring, antivirus, SIEM (Security Information and Event Management) systems.
Communication templates – Pre-written templates for notifying management, customers, law enforcement, and regulators.
Regular drills – Tabletop exercises that simulate incidents to test team readiness.

Phase 2: Detection and Identification

Detection is finding the incident. Many breaches go undetected for months. The faster detection happens, the less damage occurs. Detection sources include:

DETECTION SOURCES:

Internal Sources:
  - IDS/IPS alerts
  - Antivirus notifications
  - SIEM correlation rules triggering
  - Employee reporting suspicious activity
  - Unusual system slowdown or crashes

External Sources:
  - Notification from law enforcement
  - Third-party security researcher disclosure
  - Customer reporting inability to access account
  - Dark web monitoring service finds company data for sale

Identification Questions:
  ❓ What systems are affected?
  ❓ What data may be compromised?
  ❓ When did the incident start?
  ❓ Who is potentially responsible?
  ❓ Is the attack still ongoing?
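
One of the internal sources listed above, a SIEM correlation rule, can be illustrated with a small sketch. The event format, the threshold of 5 failures, and the 10-minute window are assumptions chosen for this example; they are not taken from any particular SIEM product.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical rule parameters; tune them to your own environment.
MAX_FAILURES = 5
WINDOW = timedelta(minutes=10)


def find_bruteforce_suspects(events):
    """Return accounts with MAX_FAILURES or more failed logins inside WINDOW.

    `events` is an iterable of (timestamp, account, success) tuples,
    an assumed log format used only for this example.
    """
    recent = defaultdict(deque)  # account -> timestamps of recent failures
    suspects = set()
    for ts, account, success in sorted(events):
        if success:
            continue
        window = recent[account]
        window.append(ts)
        # Drop failures that have fallen out of the sliding window.
        while ts - window[0] > WINDOW:
            window.popleft()
        if len(window) >= MAX_FAILURES:
            suspects.add(account)
    return suspects


start = datetime(2024, 1, 1, 9, 0)
sample = [(start + timedelta(minutes=i), "alice", False) for i in range(5)]
sample.append((start, "bob", False))
print(find_bruteforce_suspects(sample))
```

A real rule would also answer some of the identification questions, for example by recording the source address and the time of the first failure.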

Security Incident Severity Classification

Severity       Description                         Example                                Response Time
Critical (P1)  Major impact to operations or data  Ransomware on production server        Immediate, within 1 hour
High (P2)      Significant threat, partial impact  Compromised admin account detected     Within 4 hours
Medium (P3)    Contained threat, limited impact    Malware on one employee workstation    Within 24 hours
Low (P4)       Minor issue, no data compromised    Suspicious but blocked phishing email  Within 72 hours
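
The response times in the table can be turned into deadlines that a ticketing or paging tool could track. The following is a minimal sketch; the names `Severity`, `response_deadline`, and `is_overdue` are invented for this example.

```python
from datetime import datetime, timedelta
from enum import Enum


class Severity(Enum):
    # Values are the maximum response times from the table above.
    P1_CRITICAL = timedelta(hours=1)
    P2_HIGH = timedelta(hours=4)
    P3_MEDIUM = timedelta(hours=24)
    P4_LOW = timedelta(hours=72)


def response_deadline(severity, detected_at):
    """Latest time by which responders should have started work."""
    return detected_at + severity.value


def is_overdue(severity, detected_at, now):
    """True once the response window for this severity has passed."""
    return now > response_deadline(severity, detected_at)


detected = datetime(2024, 1, 1, 12, 0)
print(response_deadline(Severity.P1_CRITICAL, detected))
```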

Phase 3: Containment

Containment stops the incident from spreading further. The goal is to minimize damage while preserving evidence for investigation. Containment comes in two forms.

SHORT-TERM CONTAINMENT:
  Immediate actions to stop further spread.
  Examples:
  - Disconnect infected computer from the network
  - Block the attacker's IP address at the firewall
  - Disable the compromised user account
  - Take a memory snapshot for forensic analysis (before powering the system down, because memory contents are lost at shutdown)

LONG-TERM CONTAINMENT:
  Temporary fixes that allow operations to continue while full cleanup happens.
  Examples:
  - Move critical services to a clean backup server
  - Apply emergency security patches
  - Increase monitoring on all other systems
  - Change all administrative passwords

CONTAINMENT DIAGRAM:

BEFORE CONTAINMENT:
  Infected Server ──► Spreading to → DB Server, File Server, Email Server

AFTER CONTAINMENT:
  [ISOLATED: Infected Server] ──(network disconnected)
  DB Server ✔ Clean | File Server ✔ Clean | Email Server ✔ Clean
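
The short-term containment actions (block an IP address, isolate a host, disable an account) can be modeled as a small policy object. This is only an in-memory illustration: the class `ContainmentPolicy`, its methods, and the host and account names are hypothetical, and a real deployment would push these decisions to a firewall, switch, or identity provider.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class ContainmentPolicy:
    """In-memory model of short-term containment decisions (illustrative only)."""
    blocked_networks: list = field(default_factory=list)
    isolated_hosts: set = field(default_factory=set)
    disabled_accounts: set = field(default_factory=set)

    def block_ip(self, cidr):
        # Accepts a single address ("203.0.113.7") or a range ("203.0.113.0/24").
        self.blocked_networks.append(ipaddress.ip_network(cidr, strict=False))

    def isolate_host(self, hostname):
        self.isolated_hosts.add(hostname)

    def disable_account(self, username):
        self.disabled_accounts.add(username)

    def allows(self, src_ip, dst_host, username):
        """Return True if this connection attempt should still be permitted."""
        addr = ipaddress.ip_address(src_ip)
        if any(addr in net for net in self.blocked_networks):
            return False
        if dst_host in self.isolated_hosts:
            return False
        return username not in self.disabled_accounts


policy = ContainmentPolicy()
policy.block_ip("203.0.113.0/24")       # attacker's range (documentation addresses)
policy.isolate_host("infected-server")
policy.disable_account("jdoe")
print(policy.allows("198.51.100.5", "db-server", "alice"))
```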

Phase 4: Eradication

Eradication removes the cause of the incident — the malware, the backdoor, the compromised credentials, the vulnerable software. Simply rebooting or disconnecting a system does not eradicate the threat. The root cause must be identified and completely removed.

Remove all malware from infected systems.
Close the vulnerability that allowed entry (patch, config change).
Reset all compromised account credentials.
Remove any backdoors or rogue user accounts the attacker may have created.
Verify that no persistence mechanisms remain (startup scripts, scheduled tasks).
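
One way to check for leftover persistence mechanisms and rogue accounts is to compare what is on the system now against a known-good baseline. The sketch below assumes both inventories have already been collected into dictionaries; the category and entry names are made up for illustration.

```python
def find_unexpected_entries(baseline, current):
    """Compare persistence entries against a known-good baseline.

    Both arguments map a category (e.g. "scheduled_tasks") to a collection
    of entry names. Returns only the categories that contain entries absent
    from the baseline, each with a sorted list of the extras.
    """
    unexpected = {}
    for category, entries in current.items():
        extras = set(entries) - set(baseline.get(category, ()))
        if extras:
            unexpected[category] = sorted(extras)
    return unexpected


baseline = {
    "scheduled_tasks": {"backup_nightly"},
    "startup_scripts": {"init_network"},
    "user_accounts": {"alice", "bob"},
}
current = {
    "scheduled_tasks": {"backup_nightly", "updater_svc"},
    "startup_scripts": {"init_network"},
    "user_accounts": {"alice", "bob", "support_admin"},
}
print(find_unexpected_entries(baseline, current))
```

Anything the function reports still needs human review: an extra entry may be a legitimate change made after the baseline was taken.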

Phase 5: Recovery

Recovery restores normal operations safely. Rushing back online without proper verification risks re-infection. Recovery involves:

RECOVERY STEPS:

Step 1: Restore systems from known-clean backups (pre-incident state)
Step 2: Reinstall operating systems from trusted media if the compromise was deep
Step 3: Verify the integrity of all restored data (compare file hashes against known-good values)
Step 4: Test functionality in an isolated environment first
Step 5: Gradually reconnect to production network with enhanced monitoring
Step 6: Monitor closely for 30-60 days for signs of re-compromise
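Step 7: Notify affected users, regulators, and partners as required (legal deadlines can fall much earlier, so notification may need to start while recovery is still under way)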
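
The hash comparison in Step 3 can be sketched as follows. The manifest format (relative path mapped to an expected SHA-256 digest, recorded from the known-clean backup) is an assumption made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(manifest, restore_root):
    """Return relative paths whose restored content is missing or altered.

    `manifest` maps relative path -> expected SHA-256 hex digest
    (an assumed format for this sketch).
    """
    failures = []
    root = Path(restore_root)
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

An empty list from `verify_restore` means every file named in the manifest is present and unchanged. It says nothing about extra files that are not in the manifest, so it complements rather than replaces a malware scan.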

Phase 6: Post-Incident Review (Lessons Learned)

After recovery, the incident response team meets to conduct a thorough review. This is not about assigning blame — it is about understanding what happened and improving defenses for the future. The output is a detailed incident report and an updated response plan.

POST-INCIDENT REVIEW QUESTIONS:

Timeline:
  When did the incident start? When was it detected? How long was the gap between the two?

Root Cause:
  What vulnerability or gap allowed this to happen?

Detection:
  What indicators were missed that could have caught this earlier?

Response:
  Did the team follow the response plan effectively?
  What slowed down the response?

Prevention:
  What controls would have prevented this incident?
  What needs to change in policies, training, or technology?

Communication:
  Were stakeholders notified appropriately and on time?

OUTPUT: Updated Incident Response Plan + Technical Remediation Actions
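
The timeline questions lend themselves to simple arithmetic on timestamps. The sketch below uses a hypothetical `IncidentTimeline` record; the field names and the sample dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimeline:
    started: datetime    # earliest evidence of attacker activity
    detected: datetime   # first alert or report
    contained: datetime  # spread stopped
    recovered: datetime  # normal operations restored

    def time_to_detect(self) -> timedelta:
        return self.detected - self.started

    def time_to_contain(self) -> timedelta:
        return self.contained - self.detected

    def time_to_recover(self) -> timedelta:
        return self.recovered - self.contained


timeline = IncidentTimeline(
    started=datetime(2024, 3, 1, 2, 0),
    detected=datetime(2024, 3, 4, 9, 30),
    contained=datetime(2024, 3, 4, 11, 0),
    recovered=datetime(2024, 3, 6, 17, 0),
)
print("Time to detect:", timeline.time_to_detect())
print("Time to contain:", timeline.time_to_contain())
print("Time to recover:", timeline.time_to_recover())
```

Tracking these figures across incidents shows whether changes to the response plan are actually shortening detection and containment.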

Digital Forensics: Collecting Evidence

Digital forensics is the process of collecting, preserving, and analyzing digital evidence from an incident. Evidence must be collected properly to remain admissible in legal proceedings. The cardinal rule of forensics is to never work on the original evidence — always work on a forensic copy.

CHAIN OF CUSTODY:

Original Evidence (Hard Drive) → Forensic Image Created (exact copy)
  │
  ▼
Original Drive → Sealed, tagged, stored securely
  │
  ▼
Forensic Image → Used for investigation
  │
  ▼
Every access to evidence is documented:
  Who handled it? When? Why? Any changes?

Unbroken chain of custody = evidence admissible in court
Broken chain of custody = evidence may be rejected
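
A custody record can be sketched in code. In the example below, each log entry includes a hash of the previous entry so that later edits to the log are detectable, and the forensic image is checked against the hash recorded when it was created. The hash-chaining approach and the `CustodyLog` class are assumptions made for this illustration; they are not a legal standard, and real cases follow the procedures of the relevant jurisdiction.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CustodyLog:
    """Append-only record of who handled an evidence item (illustrative)."""
    evidence_id: str
    image_sha256: str  # hash recorded when the forensic image was made
    entries: list = field(default_factory=list)

    @staticmethod
    def _link(previous, when, handler, reason):
        payload = f"{previous}|{when.isoformat()}|{handler}|{reason}"
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def record(self, handler, reason, when=None):
        when = when or datetime.now(timezone.utc)
        previous = self.entries[-1]["entry_hash"] if self.entries else ""
        self.entries.append({
            "when": when,
            "handler": handler,
            "reason": reason,
            "entry_hash": self._link(previous, when, handler, reason),
        })

    def is_intact(self):
        """Recompute every link; an edited, reordered, or mid-log deleted entry fails."""
        previous = ""
        for e in self.entries:
            if self._link(previous, e["when"], e["handler"], e["reason"]) != e["entry_hash"]:
                return False
            previous = e["entry_hash"]
        return True

    def image_matches(self, image_bytes):
        """Check a working copy against the hash taken at acquisition."""
        return hashlib.sha256(image_bytes).hexdigest() == self.image_sha256
```

This sketch cannot detect removal of the most recent entry, which is one reason custody logs are usually also kept on paper or in a system the handlers cannot edit.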

Incident Response Plan Template Structure

Section                  Contents
Purpose and Scope        What the plan covers, which systems and data it applies to
Team Roles               Incident Commander, Technical Lead, Communications Lead, Legal Liaison
Incident Classification  Severity levels and criteria for each
Detection Procedures     How to identify and report a potential incident
Containment Playbooks    Step-by-step guides for common incident types
Communication Plan       Who gets notified, when, and through what channel
Recovery Procedures      How to restore systems safely
Legal and Regulatory     Breach notification timelines required by law
Review Process           How and when the plan is tested and updated

Incident response handles the technical side of a breach. Equally important is the organizational side — the rules, frameworks, and compliance requirements that govern how security is managed across an entire organization. That is what security policies and compliance address.
