Lessons from the CrowdStrike outage: Why verification is the missing piece in modern security automation

Written by Stephen Ferrell, CISA, CRISC | Jul 24, 2024 3:00:00 PM

Recent events have exposed a critical weakness in the cybersecurity industry's growing embrace of automation: too little emphasis on robust change management and verification. A major outage caused by a faulty update to CrowdStrike, a widely used security tool, shows what can go wrong when automation is trusted without proper safeguards. This post explores the details of the CrowdStrike incident, analyzes its root causes, and proposes a path forward that keeps the benefits of automation while mitigating the risks.

The security landscape is constantly evolving, demanding ever-increasing vigilance from organizations of all sizes. Security professionals face the dual challenge of maintaining comprehensive defenses and streamlining operations to optimize efficiency. Automation has emerged as a powerful tool to address these challenges, promising to alleviate mundane tasks and free up security teams to focus on strategic initiatives. However, the recent CrowdStrike outage serves as a stark reminder that automation, without proper verification, can introduce significant risks.

The CrowdStrike outage: A case study in automation gone wrong

On July 19, 2024, at 04:09 UTC, CrowdStrike, a leading endpoint protection vendor, released a faulty software update that caused a widespread outage. The issue stemmed from a breakdown in the company's change management process, which allowed the faulty update to reach production environments. As a result, affected Windows endpoints running CrowdStrike required manual intervention, including rebooting and reinstalling the software. The outage hit numerous organizations, weakening their security posture and causing operational delays.

The perils of unchecked automation

The CrowdStrike incident highlights several key concerns associated with over-reliance on automation in security practices:

Increased vulnerability to errors: A bug in an automation script or the underlying software is replicated everywhere the automation runs, so a single error can spread at scale and may even open new attack vectors.
Reduced oversight: Overdependence on automation can lead to complacency and a decline in human oversight, allowing critical issues to go unnoticed until they escalate.
Limited adaptability: Automated processes may struggle to adapt to unforeseen circumstances or novel threats, potentially leading to ineffective responses.

The importance of verification: A multi-layered approach

To mitigate the risks associated with automation, security teams must prioritize robust verification processes. This requires a multi-layered approach that incorporates the following elements:

Pre-deployment testing: Implement rigorous testing procedures to identify and address bugs before deploying new security software or updates.
Real-time monitoring: Continuously monitor security tools and automation processes for anomalies or unexpected behavior.
Human-in-the-loop verification: Integrate human oversight into critical automation workflows to ensure proper execution and catch potential issues before they cause widespread disruption.
AI accelerators: Use large language models to continuously verify collected evidence and flag anomalies.
Third-party validation: Consider leveraging independent security assessments to validate the effectiveness of security tools and automation processes.
Compliance resilience management: Use a compliance management platform such as Strike Graph to build resilience into change and configuration management processes and to manage and mitigate risks.
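
To make the layering concrete, here is a minimal Python sketch of a staged rollout that combines three of the elements above: pre-deployment testing, human-in-the-loop approval, and monitoring after each stage. It is an illustration only, not a description of how CrowdStrike or Strike Graph deploy software. Every name in it (staged_rollout, RolloutResult, the stage names, and the callables passed in) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RolloutResult:
    """Outcome of a staged rollout (hypothetical structure for this sketch)."""
    completed_stages: List[str] = field(default_factory=list)
    halted_at: Optional[str] = None
    reason: Optional[str] = None


def staged_rollout(
    stages: List[str],
    pre_deploy_checks: List[Callable[[], bool]],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    approve: Callable[[str], bool],
) -> RolloutResult:
    """Deploy stage by stage, stopping at the first failed verification."""
    result = RolloutResult()

    # Layer 1: pre-deployment testing. Nothing ships if any check fails.
    if not all(check() for check in pre_deploy_checks):
        result.reason = "pre-deployment checks failed"
        return result

    for stage in stages:
        # Layer 2: human-in-the-loop. A person must approve each stage.
        if not approve(stage):
            result.halted_at = stage
            result.reason = "approval withheld"
            return result

        deploy(stage)

        # Layer 3: monitoring. Verify the stage before widening the rollout.
        if not is_healthy(stage):
            result.halted_at = stage
            result.reason = "health check failed"
            return result

        result.completed_stages.append(stage)

    return result


if __name__ == "__main__":
    deployed: List[str] = []
    outcome = staged_rollout(
        stages=["canary", "early-adopters", "everyone"],
        pre_deploy_checks=[lambda: True],
        deploy=deployed.append,
        # Simulate a fault that only shows up in the second stage.
        is_healthy=lambda stage: stage != "early-adopters",
        approve=lambda stage: True,
    )
    print(outcome)
```

The design choice worth noting is that a failed check halts the rollout instead of logging a warning and continuing, so a faulty update reaches only the stages deployed before the failure was detected.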

Segmentation: A key principle for secure automation

The concept of segmentation plays a critical role in establishing a secure automation environment. By separating security tool operation from its validation, organizations can create a system of checks and balances. This segmentation allows for independent verification of automated tasks, minimizing the risk of undetected errors or vulnerabilities within the automation itself.
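
As a rough illustration of this separation, the Python sketch below compares what an automation tool reports about itself with what an independent validator observes through a separate channel. The function name find_discrepancies, the host names, and the version strings are all hypothetical, and the sketch assumes both sides can express state as a simple host-to-version mapping.

```python
from typing import Dict, List


def find_discrepancies(reported: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """List hosts where the tool's own report and the independent check disagree.

    reported: host -> version the automation tool claims to have deployed.
    observed: host -> version collected by a separate validation process.
    """
    discrepancies = []
    # Check every host seen by either side, so a host missing from one
    # view is flagged as well as a host with mismatched versions.
    for host in sorted(set(reported) | set(observed)):
        claimed = reported.get(host)
        actual = observed.get(host)
        if claimed != actual:
            discrepancies.append(f"{host}: reported={claimed!r}, observed={actual!r}")
    return discrepancies


if __name__ == "__main__":
    tool_report = {"web-01": "1.2", "web-02": "1.2"}
    validator_view = {"web-01": "1.2", "web-02": "1.1", "db-01": "1.0"}
    for line in find_discrepancies(tool_report, validator_view):
        print(line)
```

Because the validator gathers its data independently, an error in the automation cannot also hide the evidence of that error.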

Strike Graph: A solution built on secure automation principles

Strike Graph, as a leading provider of security automation solutions, champions a balanced approach that leverages automation while prioritizing verification and human oversight. Our platform offers the following features that align with the principles outlined in this blog:

Pre-built compliance workflows: Strike Graph provides pre-built automation workflows that are rigorously tested to ensure accuracy and effectiveness.
Real-time visibility and analytics: Our platform offers comprehensive dashboards and reporting tools that allow security teams to monitor the performance of automated tasks and identify potential issues.
AI-driven verification: Our tools, developed and managed in-house, accelerate evidence gathering and evaluation.
Customizable controls: Strike Graph empowers security teams to configure automation workflows to meet their specific needs and risk tolerance while maintaining human oversight in critical decision-making.
Integration with third-party tools: Strike Graph seamlessly integrates with leading security tools and platforms, enabling robust verification and a holistic view of the security landscape.

The road ahead

The CrowdStrike outage serves as a valuable learning experience for the entire cybersecurity industry. By prioritizing verification alongside automation, security teams can harness the power of automation without compromising security posture. Strike Graph offers a solution that embodies this philosophy, empowering organizations to achieve comprehensive security with confidence. As we move forward, a commitment to secure automation practices is essential to building a more resilient security landscape.
