In-person + Virtual
18-21 April

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2023 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Thursday, April 20 • 17:25 - 18:00
Disaster Recovery: Bringing Back Production from Scratch in Under 1 Hour Using KOps, ArgoCD and Velero - Andre Jay Marcelo-Tanner, Ada Support

This is the real-life story of an operational incident at our company, caused by a misconfiguration, in which the most reliable way to get everything working again was to rebuild our entire cluster from scratch. This was only possible because of the investments we had made in GitOps, ArgoCD, kOps and keeping our infrastructure as code. Out of nowhere, our cluster was failing in a way we had never seen before. Our standard backup and recovery methods were not working, etcd was inaccessible, and customers were down. Our last card was to recreate the entire cluster and reinstall all our services. We had practiced it many times, but this would be the first real disaster recovery. Would all our planning and migration to GitOps and ArgoCD pay off? Could things be brought back to a healthy state? How fast could we do it? In the end we managed to recreate the cluster in 51 minutes, and we learned a lot along the way. Many of the tools we had invested in did not work as expected, disaster recovery guides were outdated, and things we had never planned for occurred. We talk about the workarounds we had to employ, the work we had to do afterwards, and what we learned: how we plan to improve on this in the future.

Andre Marcelo-Tanner

Staff DevOps Engineer, Ada Support Inc.
Andre has over 22 years of experience as a software engineer, building online distributed systems on large databases and tackling hard problems at scale. In his free time, Andre enjoys running, going through his catalog of PlayStation games, and spending time with his family.

Thursday April 20, 2023 17:25 - 18:00 CEST
Hall 7, Room A | Ground Floor | Europe Complex