Logging & Monitoring
What to Log
Authentication Events
User Authentication Logs
- Successful logins (user ID, timestamp, source IP, user agent)
- Failed login attempts (user ID, timestamp, source IP, failure reason)
- Password changes and reset requests
- Multi-factor authentication events (enrollment, verification, bypass)
- Account lockouts and unlock events
- Session creation and termination
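
The login fields listed above map naturally onto one structured record per event. The following is a minimal sketch, not a mandated format: the function name `build_auth_event`, the event type strings, and the JSON key names are assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone

# Hypothetical event type names; the policy does not prescribe them.
ALLOWED_EVENT_TYPES = {"login_success", "login_failure"}


def build_auth_event(event_type, user_id, source_ip,
                     user_agent=None, failure_reason=None, timestamp=None):
    """Return one authentication event as a JSON string."""
    if event_type not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    if event_type == "login_failure" and not failure_reason:
        raise ValueError("failed logins must carry a failure reason")

    # Default to the current time in UTC so records sort consistently.
    ts = timestamp or datetime.now(timezone.utc)
    record = {
        "event_type": event_type,
        "user_id": user_id,
        "timestamp": ts.isoformat(),
        "source_ip": source_ip,
    }
    if user_agent is not None:
        record["user_agent"] = user_agent
    if failure_reason is not None:
        record["failure_reason"] = failure_reason
    return json.dumps(record, sort_keys=True)
```

Emitting one JSON object per line keeps the records easy to parse downstream, although any structured format accepted by the log pipeline would serve the same purpose.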
Service Account Authentication
- Service-to-service authentication events
- API key usage and authentication
- Database connection authentication
- Third-party integration authentication
- Scheduled task and job authentication
Administrative Actions
System Administration
- User account creation, modification, and deletion
- Privilege elevation and role changes
- System configuration changes
- Security policy modifications
- Administrative user activity and commands
- Backup and recovery operations
Database Administration
- Schema changes and DDL operations
- Data modification transactions (INSERT, UPDATE, DELETE)
- Database user management
- Database configuration changes
- Query execution monitoring for sensitive data access
Data Access Events
Sensitive Data Access
- Customer personal information access
- Financial data viewing and modification
- Healthcare or regulated data access
- Intellectual property and trade secret access
- Cross-boundary data transfers (internal to external)
File and Document Access
- File downloads and uploads
- Document sharing and permissions changes
- Print and export operations
- File deletion and modification
- Access to classified or confidential documents
Configuration Changes
Infrastructure Changes
- Server provisioning and deprovisioning
- Network configuration modifications
- Firewall rule changes
- Load balancer and DNS changes
- Cloud resource modifications (instances, storage, networking)
Application Configuration
- Feature flag changes
- Environment variable modifications
- API endpoint changes
- Third-party integration modifications
- Security control toggles
Security Events
Threat Detection
- Malware detection and quarantine
- Intrusion detection system alerts
- Vulnerability scanning results
- Suspicious network activity
- DDoS attack detection and mitigation
Access Control Violations
- Unauthorized access attempts
- Privilege escalation attempts
- Policy violation detections
- Data exfiltration attempts
- Insider threat indicators
Log Retention
Retention Schedule by Log Type
| Log Category | Retention Period | Primary Storage | Archive Schedule |
|---|---|---|---|
| Authentication Logs | 2 years | Primary SIEM | Cold storage after 90 days |
| Administrative Actions | 7 years | Primary SIEM | Cold storage after 1 year |
| System Logs | 1 year | Centralized logging | Archive after 30 days |
| Application Logs | 90 days | Application servers | Archive after 30 days |
| Security Events | 7 years | Security monitoring | Cold storage after 1 year |
| Audit Logs | 7 years | Centralized audit store | Permanent archive |
| Database Logs | 2 years | Database audit logs | Archive after 90 days |
| Network Logs | 90 days | Network monitoring | Archive after 30 days |
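
The schedule above can be expressed as data so that tooling can decide where a log of a given age belongs. This is a sketch under stated assumptions: a year is treated as 365 days, the boundary days are inclusive, and the names `RETENTION` and `storage_state` are hypothetical. Audit logs are left out because the table gives them a permanent archive rather than a transition age.

```python
from datetime import date

# category -> (retention_days, archive_after_days), copied from the table.
# Assumption: 1 year = 365 days.
RETENTION = {
    "authentication": (2 * 365, 90),
    "administrative": (7 * 365, 365),
    "system": (365, 30),
    "application": (90, 30),
    "security": (7 * 365, 365),
    "database": (2 * 365, 90),
    "network": (90, 30),
}


def storage_state(category, created, today):
    """Classify a log by age as 'primary', 'archived', or 'expired'."""
    retention_days, archive_after_days = RETENTION[category]
    age_days = (today - created).days
    if age_days < 0:
        raise ValueError("creation date is in the future")
    if age_days >= retention_days:
        return "expired"
    if age_days >= archive_after_days:
        return "archived"
    return "primary"
```

For example, an application log created 45 days ago would be classified as `archived`, and one created 105 days ago as `expired`.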
Retention Justification
- Regulatory Requirements: SOX, HIPAA, PCI-DSS compliance needs
- Legal Discovery: Potential litigation and investigation support
- Security Analysis: Long-term threat hunting and analysis
- Operational Support: Troubleshooting and performance analysis
- Business Continuity: Historical data for business planning
Storage and Archival
- Hot Storage: Frequently accessed logs for active monitoring
- Warm Storage: Less frequently accessed logs with slower retrieval
- Cold Storage: Long-term archival storage with minimal access costs
- Geographic Distribution: Logs stored across multiple geographic regions
- Encryption: All log data encrypted at rest and in transit
Alert Thresholds
Critical Alerts (Immediate Response)
Threshold: Immediate paging and 15-minute response time
- Multiple failed login attempts (5+ within 10 minutes)
- Successful login from unusual geographic location
- Privilege escalation or administrative access outside business hours
- Malware detection or virus quarantine events
- Data exfiltration attempts or unusual data transfer volumes
- System availability issues affecting customer-facing services
High Priority Alerts (1-hour Response)
Threshold: Email notification and 1-hour response time
- Failed login attempts (3+ within 30 minutes)
- Configuration changes in production systems
- Database performance issues or unusual query patterns
- API rate limit breaches or unusual API usage
- Security scan results indicating high-severity vulnerabilities
- Unauthorized access attempts to sensitive systems
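
The two failed-login thresholds (5+ within 10 minutes for critical, 3+ within 30 minutes for high priority) can be evaluated with a sliding window. The sketch below assumes that failures are counted per user ID and that events arrive in timestamp order; the policy does not specify either, and the class name `FailedLoginMonitor` is hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta

# (severity, minimum failures, window), checked from most to least severe.
RULES = [
    ("critical", 5, timedelta(minutes=10)),
    ("high", 3, timedelta(minutes=30)),
]
LONGEST_WINDOW = max(window for _, _, window in RULES)


class FailedLoginMonitor:
    def __init__(self):
        self._failures = {}  # user_id -> deque of failure timestamps

    def record_failure(self, user_id, when):
        """Record a failed login; return the triggered severity or None."""
        history = self._failures.setdefault(user_id, deque())
        history.append(when)
        # Drop failures older than the longest window any rule needs.
        while history and when - history[0] > LONGEST_WINDOW:
            history.popleft()
        for severity, minimum, window in RULES:
            recent = sum(1 for t in history if when - t <= window)
            if recent >= minimum:
                return severity
        return None
```

With one failure per minute for a single user, the third failure would trigger a high priority alert and the fifth a critical one.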
Medium Priority Alerts (4-hour Response)
Threshold: Email notification and 4-hour response time
- Single failed login attempts from multiple IP addresses
- Unusual application error rates
- Capacity utilization above 80% for extended periods
- Backup operation failures
- Certificate expiration warnings (within 30 days)
- Security patch compliance issues
Low Priority Alerts (24-hour Response)
Threshold: Dashboard notification and 24-hour response time
- Informational events and routine operations
- Performance degradation below critical thresholds
- Scheduled maintenance reminders
- License expiration notifications
- Routine security scan results (low severity)
Paging & On-Call Rules
On-Call Rotation
Primary On-Call Engineer
- Responsible for initial alert response and triage
- Must acknowledge alerts within response time requirements
- Has authority to escalate to secondary on-call or management
- Maintains incident documentation and status updates
Secondary On-Call Engineer
- Backup coverage for primary on-call
- Provides technical expertise for complex incidents
- Takes over if primary on-call is unavailable
- Participates in incident resolution and post-incident review
Escalation Procedures
- Initial Alert: Primary on-call notification
- No Acknowledgment (5 minutes): Escalate to secondary on-call
- No Acknowledgment (15 minutes): Escalate to engineering manager
- No Resolution (1 hour): Escalate to director level
- Critical Impact (2 hours): Escalate to C-level executives
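
The escalation steps above can be sketched as a single function that returns who should have been notified at a given point in an incident. The function name, parameters, and role strings are illustrative assumptions; in particular, the sketch assumes the 5-minute and 15-minute steps apply only while the alert is unacknowledged, and the later steps apply until the incident is resolved.

```python
def escalation_targets(minutes_elapsed, acknowledged, resolved,
                       critical_impact=False):
    """Return the roles notified so far, in escalation order."""
    targets = ["primary on-call"]  # initial alert always goes to primary
    if resolved:
        return targets
    if not acknowledged:
        if minutes_elapsed >= 5:
            targets.append("secondary on-call")
        if minutes_elapsed >= 15:
            targets.append("engineering manager")
    if minutes_elapsed >= 60:
        targets.append("director")
    if critical_impact and minutes_elapsed >= 120:
        targets.append("C-level executives")
    return targets
```

Keeping the thresholds in one place makes it easier to compare them against the paging tool's actual configuration during the quarterly escalation review.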
Paging Rules by Severity
Critical (Level 1)
- Page both primary and secondary on-call simultaneously
- Page engineering manager if unresolved after 30 minutes
- Page director if unresolved after 1 hour
- Page C-level if unresolved after 2 hours
High (Level 2)
- Page primary on-call engineer
- Page secondary on-call if no response in 15 minutes
- Page engineering manager if unresolved after 1 hour
Medium (Level 3)
- Email notification to primary on-call
- Escalate to engineering manager if no response in 4 hours
Low (Level 4)
- Dashboard notification
- No paging required unless aggregate impact becomes significant
Holiday and Weekend Coverage
- Reduced staffing expectations with extended response times
- Emergency contact procedures for business-critical periods
- Remote access capabilities for all on-call engineers
- Automatic escalation for unacknowledged critical alerts
Log Integrity Protections
Tamper-Evident Controls
Write-Once, Read-Many (WORM) Storage
- Critical security logs stored in tamper-evident format
- Cryptographic hashing of log entries for integrity verification
- Immutable storage for compliance and legal requirements
- Regular integrity verification and validation procedures
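
One common way to make hashed log entries tamper-evident is to chain them, so that each entry's hash covers the previous entry's hash. The sketch below illustrates that idea with SHA-256; it is an assumption about how the hashing control could be implemented, not a description of the deployed system, and the names `build_chain` and `verify_chain` are hypothetical.

```python
import hashlib

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash, message):
    # prev_hash is always 64 hex characters, so concatenation is unambiguous.
    return hashlib.sha256((prev_hash + message).encode("utf-8")).hexdigest()


def build_chain(messages):
    """Link log messages so each entry commits to everything before it."""
    chain, prev_hash = [], GENESIS_HASH
    for message in messages:
        digest = _entry_hash(prev_hash, message)
        chain.append({"message": message, "prev_hash": prev_hash, "hash": digest})
        prev_hash = digest
    return chain


def verify_chain(chain):
    """Return True only if no entry was altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["message"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects modification but does not prevent it, and it cannot detect truncation of the most recent entries unless the latest hash is also stored somewhere the writer cannot change, which is the role the WORM storage plays here.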
Digital Signatures
- Cryptographic signing of log entries
- Public key infrastructure for log verification
- Chain of custody documentation for log evidence
- Regular key rotation and certificate management
Log Access Controls
Role-Based Access
- Read access based on job responsibilities and security clearance
- Administrative access logging and monitoring
- Separation of duties between log generation and log access
- Regular access reviews and certification
Audit Trail Maintenance
- All log access attempts recorded and monitored
- Detailed logging of log management operations
- Change tracking for log retention and archival policies
- Regular audit of log access patterns and anomalies
Evidence of Control Monitoring
Control Effectiveness Monitoring
Automated Control Testing
- Daily automated verification of log collection and transmission
- Weekly validation of alert threshold effectiveness
- Monthly assessment of log retention and archival processes
- Quarterly review of on-call rotation and escalation procedures
Manual Control Validation
- Weekly manual review of critical alerts and responses
- Monthly sampling of log entries for completeness and accuracy
- Quarterly on-call procedure testing and simulation exercises
- Annual comprehensive logging and monitoring system review
Metrics and Key Performance Indicators
System Availability
- Log collection uptime and availability (99.9% target)
- Alert system response time and delivery success rate
- On-call acknowledgment time performance
- Incident resolution time trends
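
To make the 99.9% log collection target concrete, it helps to convert it into a downtime budget. The helper below is a small worked calculation; the function name and the choice of a 30-day reporting period are assumptions.

```python
def allowed_downtime_minutes(target_percent, period_days):
    """Minutes of downtime permitted by an availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - target_percent) / 100.0


# A 99.9% target over a 30-day period leaves roughly 43 minutes of budget.
budget = allowed_downtime_minutes(99.9, 30)
```

Over a 30-day period there are 43,200 minutes, so a 99.9% target allows about 43.2 minutes of log collection downtime.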
Security Effectiveness
- False positive rate for security alerts
- Time to detection for security incidents
- Coverage of critical systems in logging scope
- Compliance with log retention requirements
Documentation and Reporting
Operational Reports
- Daily operational status reports
- Weekly alert summary and response metrics
- Monthly system health and performance reports
- Quarterly control effectiveness assessment reports
Executive Reporting
- Monthly executive dashboard with key metrics
- Quarterly business impact assessment
- Annual security posture improvement report
- Ad-hoc reports for specific incidents or audits
Privacy Considerations
Personal Data in Logs
Data Minimization
- Log only necessary personal information for security purposes
- Implement data pseudonymization where technically feasible
- Regular review of logged personal data for continued necessity
- Privacy impact assessments for new logging requirements
Purpose Limitation
- Logs collected solely for security and operational purposes
- Prohibition on using logs for employee monitoring or discipline
- Clear separation between security logs and HR/personnel files
- Regular training on appropriate log data usage
Log Redaction and Anonymization
Automatic Redaction Rules
Email Addresses
- Pattern: `[EMAIL-REDACTED]` for full email addresses
- Pattern: `user@[DOMAIN-REDACTED]` for user identification
- Exception: Security incident investigations with legal approval
Phone Numbers
- Pattern: `[PHONE-REDACTED]` for full phone numbers
- Pattern: `***-***-####` for partial visibility
- Exception: Critical security investigations
Social Security Numbers
- Pattern: `[SSN-REDACTED]` for full SSN
- Pattern: `***-**-####` for partial visibility
- Complete prohibition except with legal requirement
Credit Card Numbers
- Pattern: `[CARD-REDACTED]` for full card numbers
- Pattern: `**** **** **** ####` for last four digits only
- Strict PCI-DSS compliance requirements
IP Addresses
- Full logging for security analysis
- Geographic anonymization for analytics
- Retention of full IP for security incidents only
- Privacy-compliant IP address handling for EU data
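
The redaction patterns for email addresses, phone numbers, SSNs, and card numbers can be sketched with regular expressions. The expressions below are simplifying assumptions (US-style phone numbers and SSNs written with dashes, 16-digit card numbers in groups of four) and would miss other formats; the function name `redact` and its `partial` flag are hypothetical.

```python
import re

# Assumed input formats; real log data will need broader patterns.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD = re.compile(r"\b(?:\d{4}[ -]?){3}(\d{4})\b")
SSN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")
PHONE = re.compile(r"\b\d{3}-\d{3}-(\d{4})\b")


def _mask_domain(match):
    # Keep the local part, replace the domain.
    local_part = match.group(0).split("@", 1)[0]
    return local_part + "@[DOMAIN-REDACTED]"


def redact(text, partial=False):
    """Apply full redaction, or partial masking that keeps the last four digits."""
    if partial:
        text = CARD.sub(r"**** **** **** \1", text)
        text = SSN.sub(r"***-**-\1", text)
        text = PHONE.sub(r"***-***-\1", text)
        return EMAIL.sub(_mask_domain, text)
    text = CARD.sub("[CARD-REDACTED]", text)
    text = SSN.sub("[SSN-REDACTED]", text)
    text = PHONE.sub("[PHONE-REDACTED]", text)
    return EMAIL.sub("[EMAIL-REDACTED]", text)
```

Card numbers are handled first so that digit groups inside a card number are never mistaken for a phone number or SSN. Pattern matching of this kind is best treated as a safety net; avoiding the logging of such values at the source remains the stronger control.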
Anonymization Procedures
User Identifiers
- Pseudonymous identifiers for routine operational logging
- Full user identification reserved for security incidents
- Regular rotation of pseudonym keys
- Secure key management for re-identification when legally required
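
Pseudonymous identifiers with rotating keys are often built from a keyed hash. The sketch below uses HMAC-SHA256 as one possible approach; the function name, the version prefix, and the truncation to 32 hex characters are assumptions rather than requirements of this policy.

```python
import hashlib
import hmac


def pseudonymize(user_id, key, key_version):
    """Derive a stable pseudonym for user_id under the given secret key."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # The version prefix records which key produced the pseudonym, so logs
    # written before and after a rotation can be told apart.
    return f"{key_version}:{digest[:32]}"
```

Because a keyed hash is one-way, re-identification under this approach would mean recomputing pseudonyms for candidate user IDs with the retained key, or keeping a separately protected mapping table. Rotating the key deliberately breaks linkability between periods.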
Session Identifiers
- Hash-based session identifiers for general logging
- Full session details retained only for security incidents
- Time-limited retention of detailed session information
- Privacy-compliant session tracking methodologies
Log Data Handling Procedures
Access Restrictions
- Privacy officer approval for accessing unredacted personal data
- Audit trail of all personal data access in logs
- Justification documentation for personal data access requests
- Regular privacy compliance reviews
Data Subject Rights
- Process for responding to data subject access requests
- Log data identification and retrieval procedures
- Data correction and deletion capabilities
- Documentation of data subject right fulfillment
Cross-Border Transfers
- Privacy assessment for log data stored in different jurisdictions
- Adequate protection measures for international log transfers
- Compliance with GDPR, CCPA, and other applicable privacy laws
- Regular review of data residency requirements
Related Documents
- Security Policy
- Incident Response Plan
- Privacy Policy
- Data Classification Policy
- Access Control Policy
- Vendor Risk Management Policy
- Business Continuity Plan
Document Owner: Chief Information Security Officer
Review Schedule: Quarterly
Last Updated: [Current Date]
Version: 1.0