Part 2: Be not afraid of greatness. ✨ Navigating a Performance Calibration.
I’m baaaaack, and in record time. I’ve never written two back-to-back blogs so quickly. ⚡️
In the first part of this series, we discussed the importance and complexity of connecting performance and compensation. I talked you through how to create a tool of some sort to measure performance and provided an overview of how we developed our performance snapshot at Whereby.
Now, it’s time to dive into the second part of this three-parter: Calibration. This step is crucial to ensure fairness and consistency in performance assessment across the organization, and ultimately, in determining compensation. Without this, there is a real risk of allowing bias, misinformation, and poor management to not only hurt your understanding of performance, but begin to creep into compensation.
Many folks are rightfully skeptical of the connection between performance and compensation because they themselves have suffered at the hands of a poor calibration process, or a poorly implemented pay-for-performance scheme. You likely know what this looks like: a manager says “doing well” but gives reasons like, “the whole team likes them, and they go above and beyond.” What on earth does that mean? Nothing, that’s what.
Alternatively, you may have been a manager yourself watching other teams contribute less to their team’s performance, or be easier on their team’s expectations, and felt you had to respond by loosening your understanding of performance in a way that felt uncomfortable. Well, a well designed and executed calibration process should remedy those concerns too.
It is important to note that calibration is not a pay change discussion, and feedback gathered is not a calibration. These three things (pay, feedback, and calibration) are all part of the overall performance infrastructure, and they are deeply connected, but you should not be discussing pay in calibration or feedback discussions. I will talk a bit more about how to keep these separate in Part 3.
What is a calibration?
Many companies do it differently once they’re in the room, but the general premise is almost always the same: The objective of this meeting is to establish a common understanding of what great (and not great) performance looks like across various jobs and levels in your team.
In a performance calibration, success means improved clarity on performance expectations that can be applied to setting ratings and goals, managing performance, hiring new talent, and aligning talent with opportunities.
In practice that means a group of managers (often functionally close but not necessarily perfectly aligned) sit together for a few hours to discuss their team’s performance as it relates to the performance tool you’d have created in Part 1 (let’s call it the ‘Snapshot’ for ease).
I suggest these meetings happen at least twice a year, ideally four times. I like to align them with the ceremonies around goal-setting and planning on a Term (twice-yearly), Third, or Quarterly cycle.
Depending on the size and scale of management in your company, you may have one, two, or even more levels of calibration. Managers, Directors and VPs (Managers of Managers), and Execs. The way I generally like to set it up is in slightly cross-functional groups such as this:
- Product & Engineering
- Marketing, BI, & Growth
- Finance, People & Operations
- Customer Success, Support, & Sales
The ideal size of a calibration is 5–8 managers in my experience. Each level should calibrate their own direct reports, and the level above should calibrate their direct reports, “spot check” the evidence of a random selection of individuals within their sub-team, and look at the overall results and trends of their functions.
If you haven’t been involved in a calibration before, the above paragraph may feel pretty cerebral, so let me be explicit about a suggestion on how to structure calibration meetings for a business of around 100–150 people, with a maximum of 5 layers of hierarchy.
Direct Managers break into 5 calibration sessions (all of which should also include one functionally aligned People Partner).
- 🧑🏾💻 Product & Design (5 managers/15 team to be calibrated)
- 🔧Engineering (7 managers/25 team)
- 📰 Marketing, BI, & Growth (5 managers/16 team)
- 📊 Finance, People & Operations (5 managers/15 team)
- ⭐️ Customer Success, Support, & Sales (5 managers/15 team)
On timing: I would suggest around 1 hour for every 10 people, plus an additional 30 minutes of reading, discussion, and preparation time. This means the calibrations above will be around 2 to 3.5 hours each. If more time is required, the relevant managers should work together to schedule it before the next “layer” of calibration.
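The timing rule above is easy to sanity-check. Here is a minimal Python sketch, using the session sizes listed earlier (the helper name and defaults are mine, not a prescribed tool):

```python
# Rough sizing for calibration sessions, using the "1 hour per 10 people,
# plus 30 minutes of prep" rule of thumb described above.
def session_hours(team_size: int, hours_per_10: float = 1.0, prep_hours: float = 0.5) -> float:
    """Estimated calibration session length in hours."""
    return (team_size / 10) * hours_per_10 + prep_hours

sessions = {
    "Product & Design": 15,
    "Engineering": 25,
    "Marketing, BI, & Growth": 16,
    "Finance, People & Operations": 15,
    "Customer Success, Support, & Sales": 15,
}

for name, size in sessions.items():
    print(f"{name}: ~{session_hours(size):.1f}h")
```

Running this shows the largest session (Engineering, 25 people) landing at around 3 hours, which is where scheduling extra time becomes most likely.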
Managers of Managers break into 2 calibration sessions (all of which should also include one functionally aligned People Partner).
- 🧑🏾💻 Product, Design and Engineering (6 directors & VPs / 12 managers to be calibrated / 40 “sub team” to spot-check)
- 📊 Marketing, BI, Growth, Finance, People Ops, Success, and Support (7 Directors & VPs / 15 team / 46 “sub team”)
Before this meeting, the People Team should have prepared a report showing the “sub team” performance outcomes visualised by tenure, gender, level, and team. I also find it very helpful to show this either on a bell curve of performance data, or against last year’s results. This gives a good indication of where there are outliers worth inspecting.
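If your People Team builds this report from a spreadsheet export, the core of it is just counting ratings per dimension. A minimal sketch in Python, where the field names and rating labels are illustrative rather than a prescribed schema:

```python
# Aggregate "sub team" performance outcomes by a chosen dimension
# (tenure band, gender, level, or team) so the room can spot outliers.
from collections import Counter, defaultdict

def outcome_breakdown(records, dimension):
    """Count performance ratings per value of `dimension` (e.g. 'level')."""
    table = defaultdict(Counter)
    for record in records:
        table[record[dimension]][record["rating"]] += 1
    return {group: dict(counts) for group, counts in table.items()}

# Toy data standing in for a real HRIS/spreadsheet export.
team = [
    {"level": "L2", "gender": "F", "tenure_band": "<1y", "team": "Eng", "rating": "meets"},
    {"level": "L2", "gender": "M", "tenure_band": "1-3y", "team": "Eng", "rating": "exceeds"},
    {"level": "L3", "gender": "F", "tenure_band": "1-3y", "team": "Design", "rating": "meets"},
]

for dim in ("level", "gender", "tenure_band", "team"):
    print(dim, outcome_breakdown(team, dim))
```

The same breakdowns, charted per dimension, give you the bell-curve and year-over-year views mentioned above.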
On timing: In this meeting, the same calibration process as above happens, plus an additional 30 minutes in each meeting for reviewing the overall trends of the “sub team” calibration data, and also spot-checking outliers and a random selection of evidenced calibrations.
Executives hold 1 calibration session for the whole company, led by the People Leader.
- 📊 All-company (1 VP, 4 C-Suite / 12 Directors & VPs to calibrate / 113 “sub team”)
This meeting is much more focussed on the overall trends and analysis. Ideally it is a full-day or half-day session. In this meeting, the same calibration process as above happens, plus an additional 2 hours for reviewing the overall trends of the “sub team” calibration data, and spot-checking outliers and a random selection of evidenced calibrations. This is really about understanding holistic performance and discussing the quality of the outcomes, as well as any trends the executives observed when they read through the calibration evidence. I am the kind of person who likes to read every single team member’s calibration results and evidence, and I encourage other executives to do the same: this information is invaluable for understanding your team, and it takes a few evenings or a focus day.
What happens in the room
The calibration room is a sacred place. Every calibration should follow the same structure:
- 5 minutes to review an article on bias together in the room. I like this one: https://www.cultureamp.com/blog/performance-review-bias
- 10 minutes to quickly read through the calibration document and evidence (I like to use a simple Google Sheet, and I’ve linked one in the further reading below)
- A reader to read aloud the golden rules of calibration. (Outlined below)
- Follow the structure of a set of calibration exercises. (Outlined below)
- Questions, closure.
The Golden Rules of Calibration
Before discussing the calibration process, let’s go over some important talking points that should be covered before every talent calibration meeting:
- The objective of this meeting is to establish a common understanding of what great (and not great) performance looks like across various jobs and levels. Success means improved clarity on performance expectations that can be applied to setting ratings and goals, managing performance, hiring new talent, and aligning talent with opportunities.
- Real-world examples will be discussed to inform the talent calibration conversation, but the meeting’s purpose is not to set employee ratings as a committee. Ratings are ultimately decided by the leadership in your reporting line, and your evidence will be read by your manager and, ultimately, the executive team.
- Ratings may change during these discussions, but this should result from better calibration and increased clarity on performance expectations. If you feel “overruled” rather than reaching genuine clarity, respectfully challenge with data and evidence.
- The specifics discussed in this room are highly confidential and must not be shared outside this group of participants. No exceptions. I mean none.
- As a leader, you are responsible for owning and delivering any employee-facing messaging resulting from these discussions. Never “pass the buck” by saying, “the calibration room said this” or “here is what the other leader(s) thought.” This kind of behaviour is disciplinary and should not be tolerated.
- Managers often have a proximity bias when assessing their employees’ talent. In this room, be open to feedback and perspective, and consider how your own perspective may not represent the whole picture. Be open to identifying and addressing biases in the room, and be grateful for others’ help in seeing past them.
- During discussions, “show, don’t tell.” First, clarify your expectations for an employee along a specific dimension and then explain how their behaviors or accomplishments met, exceeded, or fell short of those expectations. It is the alignment between employee behaviors and your expectations that we are calibrating, so make these things clear.
- This means you should avoid common generalities when discussing performance. Instead, provide specific examples of actions or behaviors and their impact. Be wary of shortcuts like “good/bad communication,” “presence,” “visibility,” and “confidence.”
With these talking points in mind, let’s explore the calibration process!
The overall exercise structure centers around managers coming together to discuss performance expectations, share feedback, and align on what constitutes meeting, exceeding, or falling short of expectations.
There are a few different ways teams can structure the discussions. I tend to encourage the team to read through their evidence and ratings in groups of employees within the same role or level. So all Level 2 Software Engineers will be discussed in the same “block”.
- First, talk through 2–4 examples of strong and clear over-performance. If these are non-controversial, move on. Strong performance is generally easily evidenced and will be a good litmus test.
- Then, identify your “bubble population” — i.e. identify the top 1–3 employees from the more “perfectly aligned” population that are the most likely to also be overperformers. These are often employees where the managers struggled to decide where to rate.
- In the course of discussing these 3–7 employees, the group will usually identify what the important dimensions of overperformance are, and it’s typically made pretty clear from the discussion whether any ratings should change.
- Then, repeat 1–3, but for the underperforming population.
- Then, discuss outliers. This can/should include:
- Employees with high performance but low growth ratings or vice-versa.
- Employees who have overperformed cycle-over-cycle but have not yet been promoted.
- Employees who have underperformed cycle-over-cycle but have not yet exited the business.
- Then, do an aggregate bias check. Look at calibrated population by level (to see if we’re over/undervaluing more senior employees relative to junior) and similarly by gender and geography. (And other dimensions of importance.)
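That aggregate bias check can be as simple as comparing the share of top ratings across groups. A hedged sketch in Python (grouping keys and rating labels are assumptions for illustration; in practice you would run this on the full calibrated population):

```python
# Illustrative aggregate bias check: for each group within a dimension
# (level, gender, geography), compute the fraction holding the top rating.
# Large gaps between groups are a prompt for discussion, not a verdict.
def top_rating_share(records, dimension, top="exceeds"):
    """Fraction of employees in each group holding the top rating."""
    totals, tops = {}, {}
    for record in records:
        group = record[dimension]
        totals[group] = totals.get(group, 0) + 1
        tops[group] = tops.get(group, 0) + (record["rating"] == top)
    return {group: tops[group] / totals[group] for group in totals}

# Toy calibrated population.
calibrated = [
    {"level": "senior", "gender": "F", "geo": "EU", "rating": "exceeds"},
    {"level": "senior", "gender": "M", "geo": "US", "rating": "meets"},
    {"level": "junior", "gender": "F", "geo": "EU", "rating": "meets"},
    {"level": "junior", "gender": "M", "geo": "US", "rating": "meets"},
]

for dim in ("level", "gender", "geo"):
    print(dim, top_rating_share(calibrated, dim))
```

If seniors are picking up top ratings at twice the rate of juniors, that may be legitimate, but the room should be able to say why.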
An alternative way to run this is purely level-by-level, using the progression framework as the structure. In this scenario, each level is discussed in turn, with the general progression or role framework used as the basis of understanding. Start by identifying the ideal performance within the progression framework for the level, either by identifying someone in the population who is a strong performer at that level or by discussing what that evidence would need to look like, and then work backwards in order of performance through all of the team within the level, before quickly re-addressing any unresolved points.
Allow time for other managers to ask questions. Here are some of my favourites:
- Why is this person not performing at the expectations above or below? What would “better” look like?
- When looking at this team member’s contributions, are there any places they made the same mistakes twice, or worked on the same things multiple times?
- Which pieces of evidence did you struggle to validate or discover? Why do you think this is necessary for over/under/middle performance?
- Managers should be deeply interested in these conversations, and disengage at their own peril!
The reason not to do round-robin or manager-by-manager is that the sequencing of individual discussions is important. For example, when discussing top performers, covering two engineers who work on similar kinds of problems back-to-back is more helpful for identifying the characteristics of over-performance than jumping from employee to employee based on which manager wants to speak, or trying to give everybody equal time.
Remember, the challenges and questions should come from a place of trying to understand better how performance works — not telling another manager to change their rating. Each manager should be given 24–48 hours after the meeting to adjust their evidence and ratings according to their newfound appreciation for cross-functional performance.
That’s all folks. 🎉
In Part 3, we will discuss forecasting, budgeting, and applying calibrated performance to compensation. We will show you how to turn the calibrated performance inputs into a compensation ‘algorithm’ that produces repeatable, fair compensation changes. Stay tuned! 🥂
I’ve made a template here that I suggest using for calibrations. I am a big believer in avoiding software bloat and encouraging teams to build reporting and tools themselves, so I tend to prefer Google Sheets over proprietary tools. That said, Google Sheets will become very brittle above 150–200 people. https://docs.google.com/spreadsheets/d/1wb8MS4-Z_1bOV7zkK2viu_v6xZhV83dtcIkS-07fLKs/edit#gid=758067516
A huge thank you to my partner in crime, Andy Tyra, who has worked with me to draft, refine, and implement the golden rules and calibration process you see above.