Enterprise Evolution | Global Azure 2021
April 17, 2021
DevOps
Take a pit stop as an example: 1950 Indianapolis 500
- Low Maturity
- 67 Seconds
2013 Melbourne
- Increase in number of people
- Automation (backup)
DevOps is the union of people, process, and products to enable continuous delivery of value to your end users.
- Engineering practice
- Agility is one aspect
- Mindset shift
Agile is a way of working to enable continuous delivery of value to your end users.
- Planning and Management process practice
- DevOps is one aspect
- Mindset shift
Agile and DevOps are the same thing (two sides of the same coin).
DevOps companies spend more time innovating and less time “keeping the lights on”. The result: better products, delivered faster, to happier customers by more engaged teams.
Epic failures
- Poor Quality (Windows Vista)
- Mismatch to customer desires (Windows 8)
How Microsoft became agile and transformed a boxed software product into a continuously delivered (CD) service
Team Foundation Server to VSTS
- Lagging in industry to leading in industry
- There cannot be a more important thing for an engineer, for a product team, than to work on the systems that drive our productivity.
- So I would, any day of the week, trade off features for our own productivity.
- I want our best engineers to work on engineering systems, so that we can later come back and build all of the new concepts we want. (Satya Nadella)
The most important thing is to spend time on productivity.
https://aka.ms/DevOpsAtMicrosoft
What Microsoft has learned so far
Be customer obsessed
- Listen to your customer
- Developer Community
- Stack Overflow [azure-devops]
- Help in your product
- Report a problem
- Make a suggestion
- Definition of Done
- Live in production, collecting telemetry supporting or diminishing the starting hypothesis.
- Collect data broadly (but carefully)
- Application Insights Analytics (project Kusto)
KPIs
- Monitor usage
- Acquisition
- Engagement
- Satisfaction
- Churn
- Feature Usage
- Monitor Velocity
- Time to Build
- Time to Self Test
- Time to Deploy
- Time to Learn
- Live Site Health
- Time to Detect
- Time to Communicate
- Time to Mitigate
- Customer Impact
- Incident Prevention Items
- Aging Live Site Problems
- SLA per Customer
- Customer Support Metrics
- scrum.org on Evidence-Based Management
- Getting the right data
- Monitoring the right data to drive behaviors in your team and in your products
- Things we don’t watch (monitoring these does not lead to positive outcomes; they are vanity metrics with no benefit)
- Original Estimate
- Completed hours
- Lines of Code
- Team Capacity
- Team burndown
- Team velocity
- Number of bugs found
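The live site health measures listed above (time to detect, communicate, and mitigate) can be computed from incident timestamps. The following is a minimal sketch, not how Microsoft computes them: the `Incident` record, its field names, and the choice to measure every interval from the start of customer impact are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; the field names are assumptions."""
    started: datetime       # when customer impact began
    detected: datetime      # when an alert fired or a report arrived
    communicated: datetime  # when customers were first notified
    mitigated: datetime     # when customer impact ended


def minutes(delta: timedelta) -> float:
    """Convert a time interval to minutes."""
    return delta.total_seconds() / 60


def live_site_health(incidents: list[Incident]) -> dict[str, float]:
    """Average time to detect, communicate, and mitigate, in minutes.

    Assumption: every interval is measured from the start of customer impact.
    """
    return {
        "time_to_detect": mean(minutes(i.detected - i.started) for i in incidents),
        "time_to_communicate": mean(minutes(i.communicated - i.started) for i in incidents),
        "time_to_mitigate": mean(minutes(i.mitigated - i.started) for i in incidents),
    }


# Made-up sample data, purely to show the call.
t0 = datetime(2021, 4, 17, 9, 0)
sample = [
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=10), t0 + timedelta(minutes=30)),
    Incident(t0, t0 + timedelta(minutes=6), t0 + timedelta(minutes=20), t0 + timedelta(minutes=50)),
]
print(live_site_health(sample))
```

Tracking these as trends per sprint, rather than as one-off numbers, fits the idea of monitoring data that drives behavior.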
Iterate over Pain (find what hurts and keep doing it a bit better)
- Find the part of your process in getting value to customers that slows you down or hurts the most.
- Make it incrementally better each sprint
- Re-evaluate and improve the next most painful
- 3-week sprints, 1-week deployment
Features delivered per year
- Deliver more value to customers
- Faster responses to customer and market changes
- Improved engineering satisfaction
- 2x productivity increase
- Maintaining enterprise rigor
- Everyone is on ONE main master branch
- Git helps with lightweight topic branching
- Tiny, continuous merging
- Code is fresh in your mind
- Release Flow: using trunk-based development to avoid merge hell
Master
- topic
- hotfix
- releases/M129
- releases/M130 (sprint)
Feature Flags
- To turn on and turn off features
- Features were turned on globally just before the keynote; it did not go well
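A feature flag is a runtime switch that turns a feature on or off without a new deployment. The sketch below is one possible shape, not the system described in the talk: the `FeatureFlags` class, its method names, and the percentage-based rollout are assumptions. The percentage rollout is included because it is a common way to avoid flipping a feature on for everyone at once.

```python
import hashlib


class FeatureFlags:
    """Hypothetical in-memory flag store; a real one would persist and audit changes."""

    def __init__(self) -> None:
        # Flag name -> percentage of users (0 to 100) who see the feature.
        self._rollout: dict[str, int] = {}

    def set_rollout(self, flag: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[flag] = percent

    def is_enabled(self, flag: str, user_id: str) -> bool:
        percent = self._rollout.get(flag, 0)  # unknown flags are off
        # Hash flag + user so each user lands in a stable bucket from 0 to 99.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent


flags = FeatureFlags()
flags.set_rollout("new-dashboard", 10)   # expose to roughly 10% of users first
if flags.is_enabled("new-dashboard", "user-42"):
    print("show new dashboard")
else:
    print("show old dashboard")
```

Because the bucket depends only on the flag name and user id, raising the percentage keeps already-enabled users enabled, and setting it back to 0 turns the feature off for everyone.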
Production First Mindset
- Be Transparent
- Automate completely
- No more “one time” commands run manually
- Every command goes in PowerShell scripts that are checked in
- Deployment to pre-production & canary is the same as deployment to production every time
- All orchestrated with Azure Pipelines
Live Site Culture
- Live site status is always the top priority
- Weekly live site review
- Root cause everything
- LSI fixes go into backlog (2 sprint rule)
- Actionable alerts
- Monthly service review
- On-call Designated Responsible Individuals (DRI)
- Customer Focused Availability Model (SLA)
- Per team / service health reports
- Your aim won’t be perfect
- Control your blast radius
- Tracking Deployments to Production (5 Rings)
- 1 Canary (internal users)
- 2 Smallest external data center
- 3 Largest external data center
- 4 International data centers
- 5 All the rest
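The five rings above limit the blast radius by exposing a change to a small audience first and widening only while things stay healthy. A minimal sketch of that control loop follows. The `roll_out` function and the `deploy` and `is_healthy` callbacks are hypothetical placeholders; in the talk the orchestration is done with Azure Pipelines, which this code does not model.

```python
from typing import Callable, Sequence

# Ring order taken from the notes above.
RINGS = (
    "canary (internal users)",
    "smallest external data center",
    "largest external data center",
    "international data centers",
    "all the rest",
)


def roll_out(
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rings: Sequence[str] = RINGS,
) -> list[str]:
    """Deploy ring by ring and stop at the first unhealthy ring.

    Returns the rings that were deployed and passed the health check.
    """
    completed: list[str] = []
    for ring in rings:
        deploy(ring)
        if not is_healthy(ring):
            # Halt here so later rings never receive the bad build.
            break
        completed.append(ring)
    return completed


# Simulated run: the second ring reports a problem, so rings 3 to 5 are untouched.
deployed: list[str] = []
passed = roll_out(deployed.append, lambda ring: ring != RINGS[1])
print("deployed to:", deployed)
print("healthy in:", passed)
```

A real health check would typically wait for a bake period and read live telemetry before allowing the next ring.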
Software delivery paradox
- Speed (innovation) vs Control (reliability)
Agile at Scale with Aligned Autonomy
- Book: Drive by Daniel H. Pink
Team Autonomy + Enterprise Alignment
- Team Structure
- Feature Teams (cross-functional team) -> Customers
- Teams
- Physical team rooms
- Cross discipline
- 10-12 people
- Self managing
- Clear charter and goals
- Intact for 12-18 months
- Own features in production
- Own deployment of features
Shift left Quality
- Testing: Shift left from integration to unit
- L0 - Requires only built binaries, no dependencies
- L1 - Adds ability to use SQL and file system. Run L0 & L1 in the pull request builds
- L2 - Test a service via REST APIs
- L3 - Full environment to test end to end
- Infrastructure as Flexible Resource
- Don’t over-think, learn how to fail fast
A journey of a thousand miles begins with a single sprint
- Evolution, not a transformation
- You are constantly adapting to business and customer demands
- Learn to feel comfortable with change, to experiment, and to try new things that you know might not be successful. Realize there are no failures, only learnings; the only failure is in not learning from your failures.