Introduction
This past July 13th marked three years since Mystic Sanctuary’s ban in Pauper. On that day in 2020, inspired by the prospect of a new era of Pauper, I started tracking my matches in spreadsheets.
Toying with data is a fundamental part of what makes a competitive format fun for me, even though I’m only an amateur with very little technical knowledge. I’ve tracked my Pauper matches since the very first one, at Paupergeddon Pisa 2019; before Sanctuary’s ban, though, I was using a website. Moving my data collection to spreadsheets (many thanks to Mathonical for providing me with the right one!) allowed for a higher degree of customization and better analysis. It was also the moment when I started playing challenges on MtGO.
In this series of articles, I will show you the data that I’ve gathered between Sanctuary’s ban and the release of Commander Masters. Since I’ve always wanted to read something like this, I hope it will please some readers. I also think that attempting the quantification of some aspects of the game can deepen our understanding of it.
Before I carry on with the article, I’d like to challenge you to a little game: guess the difference in Grixis Affinity’s win rate between games where it keeps seven and games where it mulligans to six. This isn’t a trick question: I’m just curious to see the answers, because I feel this kind of number isn’t part of our common knowledge. You will discover my answer in the second article (coming up next week), where I will break down everything I have on mulligans. If you take a guess now, I will add to the second article the average distance of readers’ guesses from the value recorded in my sheets. You can answer here.
The third article is going to be about the differences between eras of Pauper and how a deck’s performance changes over time.
The fourth and final part is going to be about different kinds of tournaments. I will compare paper Pauper with MtGO, and for the online part, I will also focus on my opponents and their results and add some additional fun stats.
All charts are made with Flourish, with the generous help of PyotrPavel.
Terminology
Sometimes I see ambiguity around common terms in the MtG community, and here I’d like to avoid that. A tournament is divided into rounds; in each round, every participant plays one match, as a best of three games. Each game is divided into turns. This is how I use these words.
In these articles, I will frequently call decks “proactive” or “reactive”. I generally consider proactive those decks that can consistently kill by turn 4 if uninterrupted, and reactive all the others, although I make some exceptions: for example, I consider Mono Blue Faeries a proactive deck.
Another way to consider this concept is this: a proactive deck acts as the beatdown in the majority of its matches, while a reactive deck acts as the beatdown in the minority of its matches. This definition highlights how the concept is dependent on the contingencies of the metagame.
Data overview and sample size
The dataset consists of 3452 Pauper matches, which amount to a total of 8496 games, played with 44 different decks. I’ve tried to be very careful, but there are likely some human errors in my spreadsheets. Please understand that it isn’t easy to handle this amount of data manually, and take the results as the approximations that they are.
Most of the data that you’re going to see is deck-specific. Of course, many of the 44 decks that I’ve played will often be left out of my charts because I haven’t played them enough to get any insight. As a rule of thumb, I will show decks for which I have recorded at least 100 games, i.e. around 40 matches. 17 of the 44 decks I have played meet the 100+ games criterion. This is a visual rendition of how many matches I’ve played with each of them:
Now, let’s briefly touch on a delicate subject: sample size. There’s an article by Frank Karsten that covers it much better than I could. You can also refer to this excellent summary by Sierkovitz (open and read the thread if you’re interested):
Early format data are starting to trickle down in our feeds. Remember, small sample sizes mean larger uncertainties. Here is a cheat sheet for you to estimate how large the confidence intervals are and some boring explanations what it means.
— Sierkovitz (@Sierkovitz), September 7, 2023
As you can see, the only data that could reach statistical significance is the data from Affinity and Elves, and even that is questionable because the matches weren’t all recorded under the same conditions (i.e. the metagame changes over time). Is it over then? I don’t think so.
Luckily, when we play Magic, we aren’t trying to decipher the internal laws that govern the observed phenomena purely on the basis of outcomes. We have other powerful pieces of information. We know how many copies of each card our decks run, so when we observe a certain play pattern, we can fairly reliably predict how often it’s going to come up. And we have general knowledge that has been passed on for decades, like the Xerox principle.
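To give a concrete flavor of what “predicting how often a play pattern comes up” looks like, here’s a minimal sketch of the classic hypergeometric calculation for an opening hand. The deck size and card counts are just illustrative assumptions (a standard 60-card deck with a playset), not numbers from my spreadsheets.

```python
from math import comb

def seen_in_opener(copies: int, deck_size: int = 60, hand_size: int = 7) -> float:
    """Probability of drawing at least one copy in the opening hand:
    1 minus the chance that all seven cards miss (hypergeometric)."""
    return 1 - comb(deck_size - copies, hand_size) / comb(deck_size, hand_size)

# Chance to see at least one copy of a 4-of in the opener: ~39.9%
print(f"{seen_in_opener(4):.1%}")
```

This is exactly the kind of deck-construction knowledge that lets us sanity-check small-sample observations: if a pattern hinges on a 4-of, we already know it should show up in roughly two openers out of five.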
If we accept that Magic isn’t an exact science, our knowledge can help us make sense of data below the threshold of significance.
Competitive Magic is the art of correctly generalizing from sample sizes too small to draw real conclusions.
— Andrew Elenbogen (@Ajelenbogen) January 10, 2019
For the most part, I will write general interpretations, and only occasionally will I touch on single data points (there’s no room to do otherwise). Personally, I think that many numbers in my data are more or less believable, while some others aren’t; it will be left to the reader’s judgment to discern that. On every chart, you will be able to hover over a data point (or tap on it, if you’re reading from mobile) to see its sample size. Sometimes the sample will be measured in games, other times in matches, depending on the kind of data. Only in exceptional cases will I leave some data points out of the charts, when there is a recognizable cause that corrupted the data. For more information, you can write me on Twitter or Discord.
Here are my win rates with the selected 17 decks. All the data that follows in these articles should be read within this frame. For example, knowing that Affinity mulligans more than Mono U Faeries is useful, but we should always keep in mind that, between the two, Affinity is the one that wins the most.
One thing you can notice here is the inescapable gap between tiers. I’d say that I play Elves and Affinity at about the same level, and if you count the matches from the Mystic Sanctuary era (which aren’t reported here), I’ve played a similar number of matches with the two decks. And yet, my win rate with Affinity has been significantly higher than my win rate with Elves. This might seem trivial to you, but for a long time I operated under the illusion that specializing in a tier 2 deck could compensate for the deck’s inherent issues, while, in reality, if you spend the same time learning a tier 1 deck, you’re obviously going to achieve better results.
Sideboarding
How does a deck’s performance change after sideboarding? Generally, proactive decks get worse because their opponent brings in powerful answers to their strategy, while reactive decks get better because they can select a new package of answers that is more fit for the matchup.
There are some understandable exceptions. Despite being a reactive deck, Affinity has to endure plenty of powerful hate cards in the postboard games. The two factors roughly balance out, so that Affinity’s win rate is more or less unchanged postboard.
In the following chart, each value is obtained by subtracting the deck’s preboard win% from its postboard win%. A positive value means that the deck has won more often in postboard games, while a negative value means that it has won more often in preboard games.
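For concreteness, here’s a minimal sketch of how a value on this chart could be computed from a game log. The record format and the toy results are hypothetical, not my actual spreadsheet schema: each record is just (game number, won?), with game 1 preboard and games 2–3 postboard.

```python
# Hypothetical game records: (game_number, won). Game 1 is preboard;
# games 2 and 3 are postboard.
games = [
    (1, True), (2, False), (3, True),   # match 1: won 2-1
    (1, False), (2, True), (3, True),   # match 2: won 2-1
    (1, True), (2, True),               # match 3: won 2-0
]

def win_rate(records):
    wins = sum(1 for _, won in records if won)
    return wins / len(records)

pre = win_rate([g for g in games if g[0] == 1])    # preboard game win%
post = win_rate([g for g in games if g[0] > 1])    # postboard game win%
delta = post - pre  # positive -> the deck performed better postboard
```

Note that pre- and postboard samples have different sizes by construction (game 1 always happens, game 3 doesn’t), which is one more reason to read the chart’s hover-over sample sizes.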
Being on the play
How important is it to be on the play in Pauper? Let’s start by looking at the impact that going first has on a game.
It’s common knowledge that proactive decks benefit more than reactive decks from being on the play. This notion makes sense to me, as proactive decks usually want to kill the opponent before card quantity has the opportunity to matter, and my data seems to support this belief. It’s curious how some reactive decks have won more on the draw than on the play. Given the sample size, I would encourage you to read this merely as those decks being less favored than others by being on the play, even though I wouldn’t completely rule out the possibility that they actually prefer being on the draw.
In the following chart, each value is obtained by subtracting the deck’s game win% on the draw from its game win% on the play. Therefore, a positive value indicates that the deck has won more often on the play, while a negative value indicates that it has won more often on the draw.
So, given what we know about games, what’s the impact of the die roll on the outcome of a match? Pproteus did the math for us. Theoretically, for a game win% roughly between 30% and 70% and an on-the-play (OtP) game win% delta roughly within 30% – and these are quite extreme bounds – the OtP match win% delta can be approximated as half of the OtP game win% delta. To give you an example, if Kuldotha Red is 3.2% more likely to win a game on the play than on the draw, then it should be about 1.6% more likely to win a match after winning the die roll than after losing it.
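To see where this rule of thumb comes from, here’s a small sketch that computes exact best-of-three match win probabilities. The model and the numbers are my own illustrative assumptions, not Pproteus’ actual derivation or my recorded data: I assume the loser of each game is on the play in the next one, and I plug in a 57.5%/52.6% play/draw game win rate (a 4.9% delta around a ~55% baseline).

```python
def match_win(p_play: float, p_draw: float, on_play_g1: bool) -> float:
    """Exact best-of-three win probability, assuming the loser of each
    game chooses to be on the play in the following game."""
    def g(on_play: bool) -> float:  # single-game win probability
        return p_play if on_play else p_draw

    p1 = g(on_play_g1)
    # Won G1 -> opponent plays G2 (we draw); if we lose G2, we play G3.
    win_after_g1_win = g(False) + (1 - g(False)) * g(True)
    # Lost G1 -> we play G2; if we win G2, opponent plays G3 (we draw).
    win_after_g1_loss = g(True) * g(False)
    return p1 * win_after_g1_win + (1 - p1) * win_after_g1_loss

# Illustrative numbers: 4.9% on-the-play game delta around a ~55% win rate.
won_roll = match_win(0.575, 0.526, on_play_g1=True)
lost_roll = match_win(0.575, 0.526, on_play_g1=False)
match_delta = won_roll - lost_roll  # comes out near half the 4.9% game delta
```

Under these assumptions the match delta lands around 2.4%, close to half of the game delta, which is consistent with the approximation.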
Surprisingly, my actual data doesn’t correspond to what Pproteus predicted. Overall, during these three years, my game win% has been 4.9% higher on the play than on the draw, and my match win% has been 4.3% higher when I won the die roll compared to when I lost the die roll, while the prediction was for this last number to be around 2.5%, i.e. half of the first one.
While trying to understand why this was the case, I stumbled upon another curious fact: my G1 OtP gw% delta was 8.94%, much higher than the G2 one, 2.36%, or the G3 one, 1.23%. If we factor this difference into the prediction, we get a predicted 3.27% OtP mw% delta, which is closer to reality.
Maybe we accidentally discovered that being on the play has the highest impact on preboard games. Perhaps sideboard cards water down the effect of being on the play, because drawing them wins you the game regardless of who went first. I don’t have a fully satisfying explanation, so I’ll let you figure that out!
And this is it for this article. See you next week, with my mulligan stats!