Logging & Monitoring

What to Log

Authentication Events

User Authentication Logs

  • Successful logins (user ID, timestamp, source IP, user agent)
  • Failed login attempts (user ID, timestamp, source IP, failure reason)
  • Password changes and reset requests
  • Multi-factor authentication events (enrollment, verification, bypass)
  • Account lockouts and unlock events
  • Session creation and termination
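
A login event carrying the fields listed above could be emitted as one JSON line. This is a minimal sketch: the function name `build_auth_event` and the JSON field names are illustrative assumptions, not a schema defined by this policy.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_auth_event(user_id: str, source_ip: str, user_agent: str,
                     success: bool, failure_reason: Optional[str] = None) -> str:
    """Return one JSON log line for a login attempt (field names are hypothetical)."""
    event = {
        "event_type": "login_success" if success else "login_failure",
        "user_id": user_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # always UTC
        "source_ip": source_ip,
        "user_agent": user_agent,
    }
    # A failure reason is recorded only for failed attempts.
    if not success:
        event["failure_reason"] = failure_reason or "unspecified"
    return json.dumps(event, sort_keys=True)
```

Emitting one self-describing record per event keeps the logs easy to parse downstream, whichever collector is in use.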

Service Account Authentication

  • Service-to-service authentication events
  • API key usage and authentication
  • Database connection authentication
  • Third-party integration authentication
  • Scheduled task and job authentication

Administrative Actions

System Administration

  • User account creation, modification, and deletion
  • Privilege elevation and role changes
  • System configuration changes
  • Security policy modifications
  • Administrative user activity and commands
  • Backup and recovery operations

Database Administration

  • Schema changes and DDL operations
  • Data modification transactions (INSERT, UPDATE, DELETE)
  • Database user management
  • Database configuration changes
  • Query execution monitoring for sensitive data access

Data Access Events

Sensitive Data Access

  • Customer personal information access
  • Financial data viewing and modification
  • Healthcare or regulated data access
  • Intellectual property and trade secret access
  • Cross-boundary data transfers (internal to external)

File and Document Access

  • File downloads and uploads
  • Document sharing and permissions changes
  • Print and export operations
  • File deletion and modification
  • Access to classified or confidential documents

Configuration Changes

Infrastructure Changes

  • Server provisioning and deprovisioning
  • Network configuration modifications
  • Firewall rule changes
  • Load balancer and DNS changes
  • Cloud resource modifications (instances, storage, networking)

Application Configuration

  • Feature flag changes
  • Environment variable modifications
  • API endpoint changes
  • Third-party integration modifications
  • Security control toggles

Security Events

Threat Detection

  • Malware detection and quarantine
  • Intrusion detection system alerts
  • Vulnerability scanning results
  • Suspicious network activity
  • DDoS attack detection and mitigation

Access Control Violations

  • Unauthorized access attempts
  • Privilege escalation attempts
  • Policy violation detections
  • Data exfiltration attempts
  • Insider threat indicators

Log Retention

Retention Schedule by Log Type

Log Category            | Retention Period | Storage Location        | Archive Location
Authentication Logs     | 2 years          | Primary SIEM            | Cold storage after 90 days
Administrative Actions  | 7 years          | Primary SIEM            | Cold storage after 1 year
System Logs             | 1 year           | Centralized logging     | Archive after 30 days
Application Logs        | 90 days          | Application servers     | Archive after 30 days
Security Events         | 7 years          | Security monitoring     | Cold storage after 1 year
Audit Logs              | 7 years          | Centralized audit store | Permanent archive
Database Logs           | 2 years          | Database audit logs     | Archive after 90 days
Network Logs            | 90 days          | Network monitoring      | Archive after 30 days
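
The schedule above can be expressed as a lookup table so that archive and retention dates are computed rather than tracked by hand. The sketch below assumes a year is 365 days and uses hypothetical category keys; the Audit Logs row gives no archive delay, so its archive date is left unset.

```python
from datetime import date, timedelta

# category -> (retention_days, archive_after_days)
# Assumption: one year is treated as 365 days.
# None means the schedule gives no archive delay for that category.
RETENTION = {
    "authentication": (2 * 365, 90),
    "administrative": (7 * 365, 365),
    "system": (365, 30),
    "application": (90, 30),
    "security": (7 * 365, 365),
    "audit": (7 * 365, None),
    "database": (2 * 365, 90),
    "network": (90, 30),
}


def lifecycle_dates(category: str, created: date):
    """Return (archive_on, retention_ends) for a log created on `created`."""
    retention_days, archive_after = RETENTION[category]
    archive_on = None
    if archive_after is not None:
        archive_on = created + timedelta(days=archive_after)
    return archive_on, created + timedelta(days=retention_days)
```

For example, an application log created on 1 January 2024 would move to archive on 31 January 2024 and reach the end of its retention period on 31 March 2024.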

Retention Justification

  • Regulatory Requirements: SOX, HIPAA, PCI-DSS compliance needs
  • Legal Discovery: Potential litigation and investigation support
  • Security Analysis: Long-term threat hunting and analysis
  • Operational Support: Troubleshooting and performance analysis
  • Business Continuity: Historical data for business planning

Storage and Archival

  • Hot Storage: Frequently accessed logs for active monitoring
  • Warm Storage: Less frequently accessed logs with slower retrieval
  • Cold Storage: Long-term archival storage with minimal access costs
  • Geographic Distribution: Logs stored across multiple geographic regions
  • Encryption: All log storage encrypted at rest and in transit

Alert Thresholds

Critical Alerts (Immediate Response)

Response: Immediate paging and 15-minute response time

  • Multiple failed login attempts (5+ within 10 minutes)
  • Successful login from unusual geographic location
  • Privilege escalation or administrative access outside business hours
  • Malware detection or virus quarantine events
  • Data exfiltration attempts or unusual data transfer volumes
  • System availability issues affecting customer-facing services
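
The failed-login rule above (5 or more within 10 minutes) can be evaluated with a sliding window. This sketch counts failures per user and assumes events arrive in timestamp order; both are assumptions, since the policy does not say whether the count is per user, per source IP, or global. The class name `FailedLoginMonitor` is hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class FailedLoginMonitor:
    """Flags a user once `limit` failures fall inside a sliding `window`."""

    def __init__(self, limit: int = 5, window: timedelta = timedelta(minutes=10)):
        self.limit = limit
        self.window = window
        self._failures = defaultdict(deque)  # user_id -> recent failure timestamps

    def record_failure(self, user_id: str, when: datetime) -> bool:
        """Record one failure; return True if the threshold is now met."""
        recent = self._failures[user_id]
        recent.append(when)
        # Drop failures that have aged out of the window.
        while recent and when - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.limit
```

Constructing the monitor with `limit=3` and a 30-minute window would cover a lower-severity variant of the same check.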

High Priority Alerts (1-hour Response)

Response: Email notification and 1-hour response time

  • Failed login attempts (3+ within 30 minutes)
  • Configuration changes in production systems
  • Database performance issues or unusual query patterns
  • API rate limit breaches or unusual API usage
  • Security scan results indicating high-severity vulnerabilities
  • Unauthorized access attempts to sensitive systems

Medium Priority Alerts (4-hour Response)

Response: Email notification and 4-hour response time

  • Isolated failed login attempts originating from multiple IP addresses
  • Unusual application error rates
  • Capacity utilization above 80% for extended periods
  • Backup operation failures
  • Certificate expiration warnings (within 30 days)
  • Security patch compliance issues

Low Priority Alerts (24-hour Response)

Response: Dashboard notification and 24-hour response time

  • Informational events and routine operations
  • Performance degradation that does not reach critical thresholds
  • Scheduled maintenance reminders
  • License expiration notifications
  • Routine security scan results (low severity)
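
The notification channel and response time listed for each priority in this section can be kept in one table, so that a response deadline is derived from the time an alert was raised. The names `ResponsePolicy`, `POLICIES`, and `response_deadline` are hypothetical; this is a sketch, not a prescribed implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ResponsePolicy:
    channel: str               # how the alert is delivered
    respond_within: timedelta  # maximum time to first response


# Values taken from the four priority tiers in this section.
POLICIES = {
    "critical": ResponsePolicy("page", timedelta(minutes=15)),
    "high": ResponsePolicy("email", timedelta(hours=1)),
    "medium": ResponsePolicy("email", timedelta(hours=4)),
    "low": ResponsePolicy("dashboard", timedelta(hours=24)),
}


def response_deadline(priority: str, raised_at: datetime) -> datetime:
    """Return the latest acceptable response time for an alert."""
    return raised_at + POLICIES[priority].respond_within
```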

Paging & On-Call Rules

On-Call Rotation

Primary On-Call Engineer

  • Responsible for initial alert response and triage
  • Must acknowledge alerts within response time requirements
  • Has authority to escalate to secondary on-call or management
  • Maintains incident documentation and status updates

Secondary On-Call Engineer

  • Backup coverage for primary on-call
  • Provides technical expertise for complex incidents
  • Takes over if primary on-call is unavailable
  • Participates in incident resolution and post-incident review

Escalation Procedures

  1. Initial Alert: Primary on-call notification
  2. No Acknowledgment (5 minutes): Escalate to secondary on-call
  3. No Acknowledgment (15 minutes): Escalate to engineering manager
  4. No Resolution (1 hour): Escalate to director level
  5. Critical Impact (2 hours): Escalate to C-level executives
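
The five escalation steps above can be read as a function of elapsed time and alert state. The sketch below is a literal reading of those steps; the function name and the boolean flags (`acknowledged`, `resolved`, `critical_impact`) are assumptions introduced for illustration.

```python
from datetime import timedelta


def escalation_targets(elapsed: timedelta, acknowledged: bool,
                       resolved: bool, critical_impact: bool = False) -> list:
    """Return who should have been notified `elapsed` after the initial alert."""
    targets = ["primary on-call"]  # step 1: always notified first
    if not acknowledged:
        if elapsed >= timedelta(minutes=5):    # step 2
            targets.append("secondary on-call")
        if elapsed >= timedelta(minutes=15):   # step 3
            targets.append("engineering manager")
    if not resolved:
        if elapsed >= timedelta(hours=1):      # step 4
            targets.append("director")
        if critical_impact and elapsed >= timedelta(hours=2):  # step 5
            targets.append("C-level executives")
    return targets
```

An alert acknowledged promptly but still unresolved after an hour skips the manager step under this reading, because steps 2 and 3 depend only on acknowledgment.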

Paging Rules by Severity

Critical (Level 1)

  • Page both primary and secondary on-call simultaneously
  • Page engineering manager if unresolved after 30 minutes
  • Page director if unresolved after 1 hour
  • Page C-level if unresolved after 2 hours

High (Level 2)

  • Page primary on-call engineer
  • Page secondary on-call if no response in 15 minutes
  • Page engineering manager if unresolved after 1 hour

Medium (Level 3)

  • Email notification to primary on-call
  • Escalate to engineering manager if no response in 4 hours

Low (Level 4)

  • Dashboard notification
  • No paging required unless aggregate impact becomes significant
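
The per-severity rules above lend themselves to a data-driven table, which keeps the timings reviewable in one place. In this sketch the rule tuples, the condition labels, and the function `notifications_due` are hypothetical; the Low level has no entries because it only produces a dashboard notification.

```python
from datetime import timedelta

# Each rule: (minimum elapsed time, condition, channel, target).
# Conditions: "always", "no_response" (nobody has responded), "unresolved".
PAGING_RULES = {
    "critical": [
        (timedelta(0), "always", "page", "primary on-call"),
        (timedelta(0), "always", "page", "secondary on-call"),
        (timedelta(minutes=30), "unresolved", "page", "engineering manager"),
        (timedelta(hours=1), "unresolved", "page", "director"),
        (timedelta(hours=2), "unresolved", "page", "C-level"),
    ],
    "high": [
        (timedelta(0), "always", "page", "primary on-call"),
        (timedelta(minutes=15), "no_response", "page", "secondary on-call"),
        (timedelta(hours=1), "unresolved", "page", "engineering manager"),
    ],
    "medium": [
        (timedelta(0), "always", "email", "primary on-call"),
        (timedelta(hours=4), "no_response", "escalate", "engineering manager"),
    ],
    "low": [],  # dashboard only, no paging
}


def notifications_due(severity: str, elapsed: timedelta,
                      responded: bool = False, resolved: bool = False) -> list:
    """Return (channel, target) pairs that apply at `elapsed` for this alert."""
    due = []
    for after, condition, channel, target in PAGING_RULES[severity]:
        if elapsed < after:
            continue
        if condition == "no_response" and responded:
            continue
        if condition == "unresolved" and resolved:
            continue
        due.append((channel, target))
    return due
```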

Holiday and Weekend Coverage

  • Reduced staffing expectations with extended response times
  • Emergency contact procedures for critical business hours
  • Remote access capabilities for all on-call engineers
  • Automatic escalation for unacknowledged critical alerts

Log Integrity Protections

Tamper-Evident Controls

Write-Once, Read-Many (WORM) Storage

  • Critical security logs stored in tamper-evident format
  • Cryptographic hashing of log entries for integrity verification
  • Immutable storage for compliance and legal requirements
  • Regular integrity verification and validation procedures
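
One common way to make hashed log entries tamper-evident is to chain them, so that each entry's hash covers the previous entry's hash. The policy does not prescribe this technique; the sketch below is an assumption showing how hashing and periodic verification could fit together, with hypothetical names throughout.

```python
import hashlib

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, message: str) -> str:
    return hashlib.sha256((prev_hash + message).encode("utf-8")).hexdigest()


def append_entry(chain: list, message: str) -> dict:
    """Append a log entry whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"message": message, "prev_hash": prev_hash,
             "hash": _digest(prev_hash, message)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry fails."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["message"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Chaining alone only detects changes. Preventing them still depends on the immutable storage described above, and on keeping the latest hash somewhere an attacker cannot rewrite it.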

Digital Signatures

  • Cryptographic signing of log entries
  • Public key infrastructure for log verification
  • Chain of custody documentation for log evidence
  • Regular key rotation and certificate management

Log Access Controls

Role-Based Access

  • Read access based on job responsibilities and security clearance
  • Administrative access logging and monitoring
  • Separation of duties between log generation and log access
  • Regular access reviews and certification

Audit Trail Maintenance

  • All log access attempts recorded and monitored
  • Detailed logging of log management operations
  • Change tracking for log retention and archival policies
  • Regular audit of log access patterns and anomalies

Evidence of Control Monitoring

Control Effectiveness Monitoring

Automated Control Testing

  • Daily automated verification of log collection and transmission
  • Weekly validation of alert threshold effectiveness
  • Monthly assessment of log retention and archival processes
  • Quarterly review of on-call rotation and escalation procedures

Manual Control Validation

  • Weekly manual review of critical alerts and responses
  • Monthly sampling of log entries for completeness and accuracy
  • Quarterly on-call procedure testing and simulation exercises
  • Annual comprehensive logging and monitoring system review

Metrics and Key Performance Indicators

System Availability

  • Log collection uptime and availability (99.9% target)
  • Alert system response time and delivery success rate
  • On-call acknowledgment time performance
  • Incident resolution time trends
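
The 99.9% log collection target translates into a concrete downtime budget: over a 30-day period it allows 30 × 24 × 60 × 0.001 = 43.2 minutes of downtime. A small sketch of that arithmetic, with hypothetical function names:

```python
from datetime import timedelta


def downtime_budget(period: timedelta, target: float = 0.999) -> timedelta:
    """Maximum downtime allowed in `period` while still meeting `target`."""
    return period * (1 - target)


def availability(period: timedelta, downtime: timedelta) -> float:
    """Fraction of `period` during which the service was up."""
    return 1 - downtime / period


def meets_target(period: timedelta, downtime: timedelta,
                 target: float = 0.999) -> bool:
    return availability(period, downtime) >= target
```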

Security Effectiveness

  • False positive rate for security alerts
  • Time to detection for security incidents
  • Coverage of critical systems in logging scope
  • Compliance with log retention requirements

Documentation and Reporting

Operational Reports

  • Daily operational status reports
  • Weekly alert summary and response metrics
  • Monthly system health and performance reports
  • Quarterly control effectiveness assessment reports

Executive Reporting

  • Monthly executive dashboard with key metrics
  • Quarterly business impact assessment
  • Annual security posture improvement report
  • Ad-hoc reports for specific incidents or audits

Privacy Considerations

Personal Data in Logs

Data Minimization

  • Log only the personal information necessary for security purposes
  • Implement data pseudonymization where technically feasible
  • Regular review of logged personal data for continued necessity
  • Privacy impact assessments for new logging requirements

Purpose Limitation

  • Logs collected solely for security and operational purposes
  • Prohibition on using logs for employee monitoring or discipline
  • Clear separation between security logs and HR/personnel files
  • Regular training on appropriate log data usage

Log Redaction and Anonymization

Automatic Redaction Rules

Email Addresses

  • Pattern: [EMAIL-REDACTED] for full email addresses
  • Pattern: user@[DOMAIN-REDACTED] for user identification
  • Exception: Security incident investigations with legal approval

Phone Numbers

  • Pattern: [PHONE-REDACTED] for full phone numbers
  • Pattern: ***-***-#### for partial visibility
  • Exception: Critical security investigations

Social Security Numbers

  • Pattern: [SSN-REDACTED] for full SSN
  • Pattern: ***-**-#### for partial visibility
  • Logging prohibited entirely unless legally required

Credit Card Numbers

  • Pattern: [CARD-REDACTED] for full card numbers
  • Pattern: **** **** **** #### for last four digits only
  • Strict PCI-DSS compliance requirements
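
The full-redaction placeholders above can be applied with regular expressions before a message is written to the log. The patterns below are simplified assumptions (US-style phone and SSN formats, 16-digit card numbers) and will miss other formats; they illustrate the approach rather than provide complete detection.

```python
import re

# Order matters: card numbers are replaced first so that their digit groups
# cannot be picked up by the shorter patterns that follow.
REDACTION_RULES = [
    (re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"), "[CARD-REDACTED]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN-REDACTED]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE-REDACTED]"),
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL-REDACTED]"),
]


def redact(message: str) -> str:
    """Replace each matched value with its redaction placeholder."""
    for pattern, placeholder in REDACTION_RULES:
        message = pattern.sub(placeholder, message)
    return message
```

Running redaction at the point of logging, rather than at read time, means unredacted values never reach storage.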

IP Addresses

  • Full logging for security analysis
  • Geographic anonymization for analytics
  • Retention of full IP for security incidents only
  • Privacy-compliant IP address handling for EU data
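
One possible reading of anonymizing IP addresses for analytics is to truncate them to a network prefix, which preserves coarse location while hiding the individual host. The prefix lengths below (/24 for IPv4, /48 for IPv6) are assumptions, not values set by this policy.

```python
import ipaddress


def truncate_ip(address: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an IP address, keeping only the network prefix."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets ip_network accept an address with host bits set.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)
```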

Anonymization Procedures

User Identifiers

  • Pseudonymous identifiers for routine operational logging
  • Full user identification reserved for security incidents
  • Regular rotation of pseudonym keys
  • Secure key management for re-identification when legally required
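
A keyed hash is one way to produce pseudonymous identifiers that stay stable under a given key and change when the key is rotated. This sketch uses HMAC-SHA256 as an assumed technique; the function name, the `key_id` prefix, and the 16-character truncation are illustrative choices.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, key: bytes, key_id: str) -> str:
    """Return a pseudonym for `user_id` derived with the secret `key`."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # The key_id prefix records which key generation produced the pseudonym.
    return f"{key_id}:{digest[:16]}"
```

Because HMAC is one-way, re-identification under this scheme means recomputing pseudonyms for candidate user IDs with the retained key, which is why access to old keys must be tightly controlled.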

Session Identifiers

  • Hash-based session identifiers for general logging
  • Full session details retained only for security incidents
  • Time-limited retention of detailed session information
  • Privacy-compliant session tracking methodologies

Log Data Handling Procedures

Access Restrictions

  • Privacy officer approval for accessing unredacted personal data
  • Audit trail of all personal data access in logs
  • Justification documentation for personal data access requests
  • Regular privacy compliance reviews

Data Subject Rights

  • Process for responding to data subject access requests
  • Log data identification and retrieval procedures
  • Data correction and deletion capabilities
  • Documentation of how data subject rights requests are fulfilled

Cross-Border Transfers

  • Privacy assessment for log data stored in different jurisdictions
  • Adequate protection measures for international log transfers
  • Compliance with GDPR, CCPA, and other applicable privacy laws
  • Regular review of data residency requirements

Related Documents

  • Security Policy
  • Incident Response Plan
  • Privacy Policy
  • Data Classification Policy
  • Access Control Policy
  • Vendor Risk Management Policy
  • Business Continuity Plan

Document Owner: Chief Information Security Officer
Review Schedule: Quarterly
Last Updated: [Current Date]
Version: 1.0