When Minutes Matter: The Iberian Peninsula Outage and the Future of Digital Resilience
On April 28, 2025, Spain, Portugal, and briefly some parts of France experienced what would become one of Europe’s most significant power outages in recent history. As millions across the Iberian Peninsula found themselves suddenly disconnected, a stark reality emerged: in our interconnected world, the ripple effects of major incidents extend far beyond their immediate impact zone.
Yet Another Wake-Up Call
This isn’t just another outage story. It’s a powerful reminder of what our recent survey revealed: 88% of executives expect an incident on a similar scale to the July 2024 global IT outage to occur this year. The Iberian Peninsula outage proves they’re right – it’s no longer a question of if, but when. And just last week, a computer network failure shut down the entire BART system in San Francisco, stranding 40,000 commuters.
But what makes this Iberian Peninsula incident particularly noteworthy isn’t just its scale – affecting over 60 million people – but how it exposed the intricate dependencies in our modern infrastructure. When the power grid failed, it wasn’t just lights that went out. Telecommunications dropped to 17% of normal capacity. Banking systems went offline. Transportation networks ground to a halt. This cascade of failures demonstrates why traditional approaches to reliability aren’t enough anymore.
Beyond the Myth of Perfect Prevention
There’s a common misconception in our industry that with enough redundancy and preventive measures, we can make systems failproof. This outage teaches us otherwise. As we saw during last year’s global IT outage, organizations that thrived weren’t those that tried to prevent every possible failure – they were the ones prepared to respond effectively when incidents occurred.
During that previous global incident, PagerDuty’s platform processed over 60,000 notifications per minute while maintaining our 15-second average notification time. This wasn’t luck – it was the result of systematic preparation and the right tools in place.
Real Resilience in Action
Now, you may ask: what does effective incident management look like in practice? Let’s break it down:
- Early Warning Systems Matter: The Iberian outage began with grid oscillations at 12:03 CEST, yet the system did not collapse until 30 minutes later. PagerDuty’s AIOps can help teams detect and respond to such anomalies before they cascade into major incidents:
- Using machine learning to identify patterns and potential issues
- Providing automated alert grouping to reduce noise
- Offering intelligent alert routing to the right teams
- Delivering context-rich notifications for faster resolution
- Automation is Your First Responder: During the July 2024 outage, our customers who leveraged PagerDuty’s automation capabilities saw a 1425% increase in automation usage, allowing them to handle routine tasks while human responders focused on critical decision-making. This same principle applies to power grid management and infrastructure monitoring. Our platform enables:
- Automated incident classification and prioritization
- Pre-built response playbooks for common scenarios
- Intelligent workflow automation
- Automated stakeholder communications
- Integration with more than 700 tools and services
- Coordinated Response is Critical: The Spain and Portugal incident required coordination between multiple power grid operators, emergency services, and government agencies across two countries. Our end-to-end incident management platform ensures clear communication channels and structured workflows when every second counts, thanks to:
- Real-time collaboration tools
- Structured incident command protocols
- Automated escalation policies
- Stakeholder updates and Status Pages
- Mobile-first design for on-the-go response
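To make the idea of automated alert grouping from the list above more concrete, here is a minimal sketch in Python. It is not PagerDuty’s implementation: the `Alert` and `Incident` types, the `group_alerts` function, and the five-minute window are all hypothetical, chosen only to illustrate how a burst of related alerts might be collapsed into a single incident so responders see less noise.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical alert shape, for illustration only.
    service: str
    summary: str
    received_at: datetime


@dataclass
class Incident:
    # An incident is simply a list of related alerts for one service.
    service: str
    alerts: list = field(default_factory=list)


def group_alerts(alerts, window=timedelta(minutes=5)):
    """Group alerts per service by time proximity.

    An alert joins the most recent incident for its service if it arrives
    within `window` of that incident's latest alert; otherwise it opens a
    new incident. The window length is an assumed value.
    """
    incidents = []
    latest_by_service = {}  # service name -> most recent incident
    for alert in sorted(alerts, key=lambda a: a.received_at):
        current = latest_by_service.get(alert.service)
        if (
            current is not None
            and alert.received_at - current.alerts[-1].received_at <= window
        ):
            current.alerts.append(alert)
        else:
            current = Incident(service=alert.service, alerts=[alert])
            incidents.append(current)
            latest_by_service[alert.service] = current
    return incidents
```

With this sketch, two alerts from the same service that arrive two minutes apart land in one incident, while an alert from that service half an hour later opens a new one. A production system would likely weigh more signals than service name and timing, such as alert content or past incident patterns.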
Building Tomorrow’s Resilience Today
Recent data shows that 86% of leaders recognize they’ve been prioritizing security at the expense of operational readiness. The Iberian Peninsula outage reinforces what we’ve long advocated – resilience requires a holistic approach that combines:
- Real-time monitoring and early warning systems
- Automated response capabilities
- Clear incident management protocols
- Cross-team coordination tools
- Continuous testing and improvement
As we analyze this incident, one thing becomes clear: the organizations that weather major outages most effectively are those that have invested in modern incident management capabilities. They understand that resilience isn’t about preventing every possible failure – it’s about building systems and processes that can detect, respond to, and recover from incidents quickly and effectively.
At PagerDuty, we’ve seen how organizations that embrace this approach consistently outperform during major incidents. During the July 2024 outage, our customers resolved incidents just 29% slower than on a normal day, despite a 192% spike in incident volume. This is the kind of resilience every organization needs in today’s interconnected world.
Taking Action
The Iberian power outage serves as a timely reminder that major incidents are inevitable. The question isn’t whether your organization will face a similar challenge, but how prepared you’ll be when it happens. With the right tools, processes, and mindset, you can build the resilience needed to maintain service continuity even in the face of major disruptions.
Want to learn more about preparing for outages? Check out our on-demand webinar, “Learn from Incidents to Stay Prepared for the Next Outage,” and this checklist for reviewing your operational resiliency ahead of the next outage.
Eduardo Crespo is Vice President of EMEA at PagerDuty. With extensive experience in digital operations management across Europe, the Middle East, and Africa, he helps organizations build resilient digital operations that can withstand and recover from major incidents.