Logging & Monitoring

What to Log

Authentication Events

User Authentication Logs

Successful logins (user ID, timestamp, source IP, user agent)
Failed login attempts (user ID, timestamp, source IP, failure reason)
Password changes and reset requests
Multi-factor authentication events (enrollment, verification, bypass)
Account lockouts and unlock events
Session creation and termination
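One way to capture the user authentication fields above consistently is to emit one structured JSON record per event. The sketch below uses only the Python standard library. The `AuthEvent` type, its field names, and the example `event_type` values are hypothetical; map them onto whatever schema your SIEM expects.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuthEvent:
    # Hypothetical schema covering the user authentication fields listed above.
    event_type: str            # e.g. "login_success", "login_failure", "mfa_verify"
    user_id: str
    source_ip: str
    user_agent: str
    failure_reason: Optional[str] = None  # populated only for failed attempts
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(event: AuthEvent) -> str:
    """Serialize an event as a single JSON line, dropping unset optional fields."""
    record = {k: v for k, v in asdict(event).items() if v is not None}
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(to_log_line(AuthEvent("login_failure", "u-1042", "203.0.113.7",
                                "Mozilla/5.0", failure_reason="invalid_password")))
```

Emitting UTC ISO-8601 timestamps keeps events from different hosts sortable in one timeline.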

Service Account Authentication

Service-to-service authentication events
API key usage and authentication
Database connection authentication
Third-party integration authentication
Scheduled task and job authentication

Administrative Actions

System Administration

User account creation, modification, and deletion
Privilege elevation and role changes
System configuration changes
Security policy modifications
Administrative user activity and commands
Backup and recovery operations

Database Administration

Schema changes and DDL operations
Data modification transactions (INSERT, UPDATE, DELETE)
Database user management
Database configuration changes
Query execution monitoring for sensitive data access

Data Access Events

Sensitive Data Access

Customer personal information access
Financial data viewing and modification
Healthcare or regulated data access
Intellectual property and trade secret access
Cross-boundary data transfers (internal to external)

File and Document Access

File downloads and uploads
Document sharing and permissions changes
Print and export operations
File deletion and modification
Access to classified or confidential documents

Configuration Changes

Infrastructure Changes

Server provisioning and deprovisioning
Network configuration modifications
Firewall rule changes
Load balancer and DNS changes
Cloud resource modifications (instances, storage, networking)

Application Configuration

Feature flag changes
Environment variable modifications
API endpoint changes
Third-party integration modifications
Security control toggles

Security Events

Threat Detection

Malware detection and quarantine
Intrusion detection system alerts
Vulnerability scanning results
Suspicious network activity
DDoS attack detection and mitigation

Access Control Violations

Unauthorized access attempts
Privilege escalation attempts
Policy violation detections
Data exfiltration attempts
Insider threat indicators

Log Retention

Retention Schedule by Log Type

Log Category	Retention Period	Storage Location	Archive Policy
Authentication Logs	2 years	Primary SIEM	Cold storage after 90 days
Administrative Actions	7 years	Primary SIEM	Cold storage after 1 year
System Logs	1 year	Centralized logging	Archive after 30 days
Application Logs	90 days	Application servers	Archive after 30 days
Security Events	7 years	Security monitoring	Cold storage after 1 year
Audit Logs	7 years	Centralized audit store	Permanent archive
Database Logs	2 years	Database audit logs	Archive after 90 days
Network Logs	90 days	Network monitoring	Archive after 30 days
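The schedule above can be encoded as data so that an archival job applies it mechanically. This is a minimal sketch assuming 365-day years and reading "after N days" as strictly greater than N. The function name and the stage labels are hypothetical. Audit Logs are left out because the table gives them a 7-year period but no age-based archive trigger.

```python
from typing import Dict, Tuple

# (retention_days, archive_after_days) transcribed from the table above,
# using 365-day years. Audit Logs are omitted: no age-based archive trigger.
RETENTION_SCHEDULE: Dict[str, Tuple[int, int]] = {
    "Authentication Logs": (2 * 365, 90),
    "Administrative Actions": (7 * 365, 365),
    "System Logs": (365, 30),
    "Application Logs": (90, 30),
    "Security Events": (7 * 365, 365),
    "Database Logs": (2 * 365, 90),
    "Network Logs": (90, 30),
}


def lifecycle_stage(category: str, age_days: int) -> str:
    """Classify a log record as 'primary', 'archive', or 'delete_eligible' by age."""
    retention_days, archive_after_days = RETENTION_SCHEDULE[category]
    if age_days > retention_days:
        return "delete_eligible"   # retention period has fully elapsed
    if age_days > archive_after_days:
        return "archive"           # move from primary storage to the archive tier
    return "primary"
```

A production job would also need to check for legal holds before deleting anything marked `delete_eligible`, since the retention justification cites litigation support.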

Retention Justification

Regulatory Requirements: SOX, HIPAA, PCI-DSS compliance needs
Legal Discovery: Potential litigation and investigation support
Security Analysis: Long-term threat hunting and analysis
Operational Support: Troubleshooting and performance analysis
Business Continuity: Historical data for business planning

Storage and Archival

Hot Storage: Frequently accessed logs for active monitoring
Warm Storage: Less frequently accessed logs with slower retrieval
Cold Storage: Long-term archival storage with minimal access costs
Geographic Distribution: Logs stored across multiple geographic regions
Encryption: All logs encrypted at rest and in transit

Alert Thresholds

Critical Alerts (Immediate Response)

Threshold: Immediate paging and 15-minute response time

Multiple failed login attempts (5+ within 10 minutes)
Successful login from unusual geographic location
Privilege escalation or administrative access outside business hours
Malware detection or virus quarantine events
Data exfiltration attempts or unusual data transfer volumes
System availability issues affecting customer-facing services

High Priority Alerts (1-hour Response)

Threshold: Email notification and 1-hour response time

Failed login attempts (3+ within 30 minutes)
Configuration changes in production systems
Database performance issues or unusual query patterns
API rate limit breaches or unusual API usage
Security scan results indicating high-severity vulnerabilities
Unauthorized access attempts to sensitive systems
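The two failed-login thresholds (5+ within 10 minutes for Critical, 3+ within 30 minutes for High) amount to sliding-window counts. The sketch below assumes attempts are counted per user ID and that events arrive in time order; both are assumptions, since the policy does not say whether counting is per account, per IP, or global. The class name is hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, Optional

CRITICAL_COUNT, CRITICAL_WINDOW = 5, timedelta(minutes=10)
HIGH_COUNT, HIGH_WINDOW = 3, timedelta(minutes=30)


class FailedLoginMonitor:
    """Tracks failed logins per user ID (assumed keying) over sliding windows."""

    def __init__(self) -> None:
        self._failures: Dict[str, Deque[datetime]] = defaultdict(deque)

    def record_failure(self, user_id: str, at: datetime) -> Optional[str]:
        """Record one failure and return 'critical', 'high', or None."""
        window = self._failures[user_id]
        window.append(at)
        # Discard attempts older than the longest window evaluated (30 minutes).
        while window and at - window[0] > HIGH_WINDOW:
            window.popleft()
        recent = sum(1 for t in window if at - t <= CRITICAL_WINDOW)
        if recent >= CRITICAL_COUNT:
            return "critical"
        if len(window) >= HIGH_COUNT:
            return "high"
        return None
```

Checking the Critical rule first ensures a burst that satisfies both thresholds is raised at the higher severity.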

Medium Priority Alerts (4-hour Response)

Threshold: Email notification and 4-hour response time

Isolated failed login attempts originating from multiple IP addresses
Unusual application error rates
Capacity utilization above 80% for extended periods
Backup operation failures
Certificate expiration warnings (within 30 days)
Security patch compliance issues
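Two of the medium-priority conditions reduce to simple checks. The sketch below flags certificates expiring within 30 days and utilization above 80% that persists. "Extended periods" is not defined in the policy, so it is modeled here as a hypothetical `min_consecutive` count of samples; the function names are also hypothetical. Already-expired certificates are deliberately excluded, on the assumption that they warrant a higher severity.

```python
from datetime import datetime, timedelta
from typing import Sequence

CERT_WARNING_WINDOW = timedelta(days=30)
CAPACITY_THRESHOLD = 0.80


def cert_expiry_warning(not_after: datetime, now: datetime) -> bool:
    """True if the certificate is still valid but expires within 30 days."""
    return now <= not_after <= now + CERT_WARNING_WINDOW


def sustained_over_threshold(samples: Sequence[float], min_consecutive: int) -> bool:
    """True if utilization exceeds 80% for min_consecutive samples in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value > CAPACITY_THRESHOLD else 0
        if run >= min_consecutive:
            return True
    return False
```

Requiring consecutive samples, rather than an average, keeps a single spike from triggering the capacity alert.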

Low Priority Alerts (24-hour Response)

Threshold: Dashboard notification and 24-hour response time

Informational events and routine operations
Performance degradation below critical thresholds
Scheduled maintenance reminders
License expiration notifications
Routine security scan results (low severity)

Paging & On-Call Rules

On-Call Rotation

Primary On-Call Engineer

Responsible for initial alert response and triage
Must acknowledge alerts within response time requirements
Has authority to escalate to secondary on-call or management
Maintains incident documentation and status updates

Secondary On-Call Engineer

Backup coverage for primary on-call
Provides technical expertise for complex incidents
Takes over if primary on-call is unavailable
Participates in incident resolution and post-incident review

Escalation Procedures

Initial Alert: Primary on-call notification
No Acknowledgment (5 minutes): Escalate to secondary on-call
No Acknowledgment (15 minutes): Escalate to engineering manager
No Resolution (1 hour): Escalate to director level
Critical Impact (2 hours): Escalate to C-level executives
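The escalation steps above depend on two clocks: time to acknowledgment and time to resolution. The sketch below computes who should have been notified at a given point. It assumes `ack_at` and `resolved_at` are minutes after the alert fired (None if not yet), and it treats "Critical Impact" as a flag supplied by the responder. Role names and function names are hypothetical.

```python
from typing import List, Optional

# (minutes after alert, role) taken from the escalation steps above.
ACK_ESCALATIONS = [(5, "secondary_on_call"), (15, "engineering_manager")]
RESOLUTION_ESCALATIONS = [(60, "director")]
CRITICAL_IMPACT_ESCALATIONS = [(120, "c_level_executive")]


def _missed(event_at: Optional[float], deadline: float) -> bool:
    """True if the event (acknowledgment or resolution) had not happened by the deadline."""
    return event_at is None or event_at > deadline


def escalation_targets(elapsed: float, ack_at: Optional[float] = None,
                       resolved_at: Optional[float] = None,
                       critical_impact: bool = False) -> List[str]:
    """Everyone who should have been notified `elapsed` minutes after the alert."""
    targets = ["primary_on_call"]
    for deadline, role in ACK_ESCALATIONS:
        if elapsed >= deadline and _missed(ack_at, deadline):
            targets.append(role)
    steps = RESOLUTION_ESCALATIONS + (CRITICAL_IMPACT_ESCALATIONS if critical_impact else [])
    for deadline, role in steps:
        if elapsed >= deadline and _missed(resolved_at, deadline):
            targets.append(role)
    return targets
```

For example, an alert acknowledged at minute 10 has already triggered the 5-minute step but not the 15-minute one.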

Paging Rules by Severity

Critical (Level 1)

Page both primary and secondary on-call simultaneously
Page engineering manager if unresolved after 30 minutes
Page director if unresolved after 1 hour
Page C-level if unresolved after 2 hours

High (Level 2)

Page primary on-call engineer
Page secondary on-call if no response in 15 minutes
Page engineering manager if unresolved after 1 hour

Medium (Level 3)

Email notification to primary on-call
Escalate to engineering manager if no response in 4 hours

Low (Level 4)

Dashboard notification
No paging required unless aggregate impact becomes significant
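The per-severity paging rules can be kept as a table that a scheduler evaluates periodically. In this sketch each step has a delay, a channel, a recipient, and the condition that cancels it ("response" or "resolution"). The `Step` type, role names, and the "escalation" channel label for Level 3 (the policy does not name a channel) are hypothetical. `due_steps` returns every step currently due, so the caller is assumed to track which notifications it has already sent. The Level 4 "aggregate impact" exception is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    after_min: int   # minutes after the alert fires
    channel: str     # "page", "email", "dashboard", or "escalation"
    recipient: str
    until: str       # cancelling condition: "response" or "resolution"


PAGING_POLICY: Dict[int, List[Step]] = {
    1: [Step(0, "page", "primary_on_call", "resolution"),
        Step(0, "page", "secondary_on_call", "resolution"),
        Step(30, "page", "engineering_manager", "resolution"),
        Step(60, "page", "director", "resolution"),
        Step(120, "page", "c_level", "resolution")],
    2: [Step(0, "page", "primary_on_call", "resolution"),
        Step(15, "page", "secondary_on_call", "response"),
        Step(60, "page", "engineering_manager", "resolution")],
    3: [Step(0, "email", "primary_on_call", "resolution"),
        Step(240, "escalation", "engineering_manager", "response")],
    4: [Step(0, "dashboard", "ops_dashboard", "resolution")],
}


def due_steps(level: int, elapsed_min: float, responded: bool, resolved: bool) -> List[Step]:
    """Steps whose delay has passed and whose cancelling condition has not occurred."""
    due = []
    for step in PAGING_POLICY[level]:
        cancelled = resolved or (step.until == "response" and responded)
        if elapsed_min >= step.after_min and not cancelled:
            due.append(step)
    return due
```

Keeping the rules as data makes it straightforward to review them against this policy during the quarterly on-call procedure review.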

Holiday and Weekend Coverage

Reduced staffing expectations with extended response times
Emergency contact procedures for critical business periods
Remote access capabilities for all on-call engineers
Automatic escalation for unacknowledged critical alerts

Log Integrity Protections

Tamper-Evident Controls

Write-Once, Read-Many (WORM) Storage

Critical security logs stored in tamper-evident format
Cryptographic hashing of log entries for integrity verification
Immutable storage for compliance and legal requirements
Regular integrity verification and validation procedures
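Cryptographic hashing of log entries is often implemented as a hash chain, where each entry's hash covers the previous hash, so editing, removing, or reordering any entry invalidates every later link. This is a minimal SHA-256 sketch using the standard library; the function names and the all-zero genesis value are illustrative choices, not something the policy specifies.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # starting value for the first link


def entry_hash(prev_hash: str, entry: Dict) -> str:
    """Hash the previous link together with a canonical JSON encoding of this entry."""
    payload = prev_hash + json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(entries: List[Dict]) -> List[str]:
    """Return the running chain hash after each entry."""
    hashes, prev = [], GENESIS
    for entry in entries:
        prev = entry_hash(prev, entry)
        hashes.append(prev)
    return hashes


def verify_chain(entries: List[Dict], hashes: List[str]) -> bool:
    """Recompute every link; any altered, missing, or reordered entry breaks the chain."""
    return len(entries) == len(hashes) and build_chain(entries) == hashes
```

A chain alone does not detect truncation of the newest entries together with their hashes, so the latest hash would typically be written to WORM storage or signed, as described under Digital Signatures.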

Digital Signatures

Cryptographic signing of log entries
Public key infrastructure for log verification
Chain of custody documentation for log evidence
Regular key rotation and certificate management

Log Access Controls

Role-Based Access

Read access based on job responsibilities and security clearance
Administrative access logging and monitoring
Separation of duties between log generation and log access
Regular access reviews and certification

Audit Trail Maintenance

All log access attempts recorded and monitored
Detailed logging of log management operations
Change tracking for log retention and archival policies
Regular audit of log access patterns and anomalies

Evidence of Control Monitoring

Control Effectiveness Monitoring

Automated Control Testing

Daily automated verification of log collection and transmission
Weekly validation of alert threshold effectiveness
Monthly assessment of log retention and archival processes
Quarterly review of on-call rotation and escalation procedures
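The daily verification of log collection can be as simple as flagging sources that have gone quiet. This sketch assumes a hypothetical `last_seen` map (source name to most recent event time) built from the log store, and it takes "daily" to mean a 24-hour silence threshold; source names are illustrative.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional


def stale_sources(expected: Iterable[str],
                  last_seen: Dict[str, datetime],
                  now: datetime,
                  max_silence: timedelta = timedelta(hours=24)) -> List[str]:
    """Return sources that never reported or have been silent longer than max_silence."""
    stale = []
    for source in sorted(expected):
        seen: Optional[datetime] = last_seen.get(source)
        if seen is None or now - seen > max_silence:
            stale.append(source)
    return stale
```

Comparing against an explicit inventory of expected sources, rather than only the sources seen recently, is what catches a host that stopped sending logs entirely.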

Manual Control Validation

Weekly manual review of critical alerts and responses
Monthly sampling of log entries for completeness and accuracy
Quarterly on-call procedure testing and simulation exercises
Annual comprehensive logging and monitoring system review

Metrics and Key Performance Indicators

System Availability

Log collection uptime and availability (99.9% target)
Alert system response time and delivery success rate
On-call acknowledgment time performance
Incident resolution time trends

Security Effectiveness

False positive rate for security alerts
Time to detection for security incidents
Coverage of critical systems in logging scope
Compliance with log retention requirements
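A few of these KPIs are straightforward arithmetic. The sketch below computes uptime against the 99.9% target, the downtime that target allows, the share of alerts triaged as false positives, and mean acknowledgment or detection time. Note that "false positive rate" here follows the common SOC usage (false positives divided by alerts fired), which differs from the statistical definition; the function names are hypothetical.

```python
from datetime import timedelta
from typing import Sequence

UPTIME_TARGET = 0.999  # 99.9% log collection availability target


def uptime_ratio(period: timedelta, downtime: timedelta) -> float:
    """Fraction of the measurement period during which collection was up."""
    return 1 - downtime / period


def downtime_budget(period: timedelta, target: float = UPTIME_TARGET) -> timedelta:
    """Maximum downtime in the period that still meets the target."""
    return period * (1 - target)


def false_positive_share(false_positives: int, total_alerts: int) -> float:
    """Fraction of fired alerts that triage marked as false positives."""
    return false_positives / total_alerts if total_alerts else 0.0


def mean_minutes(durations: Sequence[timedelta]) -> float:
    """Mean of durations (e.g. acknowledgment or detection times) in minutes."""
    if not durations:
        return 0.0
    return sum(d.total_seconds() for d in durations) / len(durations) / 60
```

For a 30-day month, the 99.9% target allows roughly 43 minutes 12 seconds of collection downtime.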

Documentation and Reporting

Operational Reports

Daily operational status reports
Weekly alert summary and response metrics
Monthly system health and performance reports
Quarterly control effectiveness assessment reports

Executive Reporting

Monthly executive dashboard with key metrics
Quarterly business impact assessment
Annual security posture improvement report
Ad-hoc reports for specific incidents or audits

Privacy Considerations

Personal Data in Logs

Data Minimization

Log only necessary personal information for security purposes
Implement data pseudonymization where technically feasible
Regular review of logged personal data for continued necessity
Privacy impact assessments for new logging requirements

Purpose Limitation

Logs collected solely for security and operational purposes
Prohibition on using logs for employee monitoring or discipline
Clear separation between security logs and HR/personnel files
Regular training on appropriate log data usage

Log Redaction and Anonymization

Automatic Redaction Rules

Email Addresses

Pattern: [EMAIL-REDACTED] for full email addresses
Pattern: user@[DOMAIN-REDACTED] for user identification
Exception: Security incident investigations with legal approval

Phone Numbers

Pattern: [PHONE-REDACTED] for full phone numbers
Pattern: ***-***-#### for partial visibility
Exception: Critical security investigations

Social Security Numbers

Pattern: [SSN-REDACTED] for full SSN
Pattern: ***-**-#### for partial visibility
Full SSNs are never logged unless required by law

Credit Card Numbers

Pattern: [CARD-REDACTED] for full card numbers
Pattern: **** **** **** #### for last four digits only
Strict PCI-DSS compliance requirements
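The redaction patterns above can be applied with regular expressions before a line is written. This is a sketch under several assumptions: phone numbers are North American format with separators, the email pattern is deliberately simplified, and a Luhn checksum (not named in the policy) is added so that long numeric IDs are not mistaken for card numbers. Set `partial=True` to produce the partial-visibility forms instead of the full redaction tokens.

```python
import re

EMAIL = re.compile(r"\b([A-Za-z0-9._%+-]+)@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
CARD = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")        # 13-19 digits, optional separators
SSN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s](\d{4})\b")  # assumed NANP format


def _luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to skip digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def _card(m: re.Match, partial: bool) -> str:
    digits = re.sub(r"\D", "", m.group(0))
    if not _luhn_ok(digits):
        return m.group(0)  # leave non-card digit runs untouched
    return f"**** **** **** {digits[-4:]}" if partial else "[CARD-REDACTED]"


def redact(text: str, partial: bool = False) -> str:
    """Apply card, SSN, phone, then email rules; cards go first as the longest digit runs."""
    text = CARD.sub(lambda m: _card(m, partial), text)
    text = SSN.sub(lambda m: f"***-**-{m.group(1)}" if partial else "[SSN-REDACTED]", text)
    text = PHONE.sub(lambda m: f"***-***-{m.group(1)}" if partial else "[PHONE-REDACTED]", text)
    text = EMAIL.sub(lambda m: f"{m.group(1)}@[DOMAIN-REDACTED]" if partial else "[EMAIL-REDACTED]", text)
    return text
```

Regex redaction is best-effort; formats it does not anticipate (international phone numbers, unusual separators) will pass through, so it complements rather than replaces data minimization at the source.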

IP Addresses

Full logging for security analysis
Geographic anonymization for analytics
Retention of full IP for security incidents only
Privacy-compliant IP address handling for EU data
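For the analytics copy of the logs, a common way to coarsen addresses is to zero the host portion while keeping the network prefix. The /24 (IPv4) and /48 (IPv6) prefix lengths below are assumptions, not values from this policy, and truncation reduces but does not necessarily eliminate identifiability. Full addresses remain in the security logs as stated above.

```python
import ipaddress

IPV4_PREFIX = 24  # assumed: zero the last octet
IPV6_PREFIX = 48  # assumed: keep only the routing prefix


def anonymize_ip(address: str) -> str:
    """Zero the host bits of an IPv4 or IPv6 address for analytics use."""
    ip = ipaddress.ip_address(address)
    prefix = IPV4_PREFIX if ip.version == 4 else IPV6_PREFIX
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)
```

`strict=False` lets `ip_network` accept an address with host bits set and mask them off, which is exactly the truncation wanted here.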

Anonymization Procedures

User Identifiers

Pseudonymous identifiers for routine operational logging
Full user identification reserved for security incidents
Regular rotation of pseudonym keys
Secure key management for re-identification when legally required

Session Identifiers

Hash-based session identifiers for general logging
Full session details retained only for security incidents
Time-limited retention of detailed session information
Privacy-compliant session tracking methodologies
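Pseudonymous user identifiers with rotating keys can be sketched with keyed HMAC-SHA256, and hash-based session identifiers with a plain SHA-256 digest of the (random, high-entropy) session token. The key ring, key IDs, and secrets below are hypothetical placeholders; real keys would live in a key management service. Because HMAC is one-way, legally required re-identification works by recomputing pseudonyms for candidate user IDs with the retained key.

```python
import hashlib
import hmac
from typing import Dict, Iterable, Optional

# Hypothetical key ring (key ID -> secret); placeholders only.
PSEUDONYM_KEYS: Dict[str, bytes] = {
    "2024-q1": b"example-secret-q1",
    "2024-q2": b"example-secret-q2",
}
ACTIVE_KEY_ID = "2024-q2"


def pseudonymize_user(user_id: str, key_id: str = ACTIVE_KEY_ID) -> str:
    """Keyed HMAC-SHA256 pseudonym, prefixed with the key ID so rotation is traceable."""
    digest = hmac.new(PSEUDONYM_KEYS[key_id], user_id.encode("utf-8"), hashlib.sha256)
    return f"{key_id}:{digest.hexdigest()[:32]}"


def session_fingerprint(session_id: str) -> str:
    """Unkeyed SHA-256 of a session token, truncated for general logging."""
    return hashlib.sha256(session_id.encode("utf-8")).hexdigest()[:16]


def reidentify(pseudonym: str, candidate_user_ids: Iterable[str]) -> Optional[str]:
    """Recompute pseudonyms under the recorded key ID and return the matching user, if any."""
    key_id, _ = pseudonym.split(":", 1)
    for uid in candidate_user_ids:
        if hmac.compare_digest(pseudonymize_user(uid, key_id), pseudonym):
            return uid
    return None
```

Rotating the active key means the same user gets unrelated pseudonyms in different key periods, which limits long-term linkability in routine logs.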

Log Data Handling Procedures

Access Restrictions

Privacy officer approval for accessing unredacted personal data
Audit trail of all personal data access in logs
Justification documentation for personal data access requests
Regular privacy compliance reviews
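The access restrictions above combine an approval gate with an audit record for every attempt, granted or not. This sketch assumes access requires a non-empty written justification and approval by a listed privacy officer who is not the requester (a separation-of-duties assumption). `AccessRequest`, `AUDIT_TRAIL`, and the example values are hypothetical; a real system would write to an append-only audit store rather than an in-memory list.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Set


@dataclass
class AccessRequest:
    requester: str
    log_scope: str                     # hypothetical: log category or query being requested
    justification: str
    approved_by: Optional[str] = None  # privacy officer who approved, if any


AUDIT_TRAIL: List[str] = []  # stand-in for an append-only audit store


def request_unredacted_access(req: AccessRequest, privacy_officers: Set[str]) -> bool:
    """Grant only with a justification and independent officer approval; audit every attempt."""
    granted = (
        bool(req.justification.strip())
        and req.approved_by in privacy_officers
        and req.approved_by != req.requester  # no self-approval
    )
    AUDIT_TRAIL.append(json.dumps({
        "at": datetime.now(timezone.utc).isoformat(),
        "requester": req.requester,
        "scope": req.log_scope,
        "justification": req.justification,
        "approved_by": req.approved_by,
        "granted": granted,
    }))
    return granted
```

Recording denied attempts as well as granted ones gives the regular privacy compliance reviews something to sample.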

Data Subject Rights

Process for responding to data subject access requests
Log data identification and retrieval procedures
Data correction and deletion capabilities
Documentation of data subject right fulfillment

Cross-Border Transfers

Privacy assessment for log data stored in different jurisdictions
Adequate protection measures for international log transfers
Compliance with GDPR, CCPA, and other applicable privacy laws
Regular review of data residency requirements

Related Documents

Security Policy
Incident Response Plan
Privacy Policy
Data Classification Policy
Access Control Policy
Vendor Risk Management Policy
Business Continuity Plan

Document Owner: Chief Information Security Officer
Review Schedule: Quarterly
Last Updated: [Current Date]
Version: 1.0
