Baking The Cake

Written 02-07-2024

Introduction

In my opinion, there is no true motivator better than a deadline. Without a solid goalpost to compare against it is easy to push things back to "eventually" and "later" -- and this is what happened with some of the features on the ADV Draft section of this website for quite some time. Fortunately, the deadline I'd needed had come to me: soon, Emerald Open 5 will open signups and what will essentially be the 2025 ADV Draft season will begin. This deadline spurred me to work on some database features and functionality I had previously teased but had put no action towards -- which I am writing this post to talk about, as I very much enjoyed the technical aspects of implementing them. The previous ADV Draft blogpost I wrote, Using a Real Oven, laid out the groundwork for what I wanted to add, and reading that post now serves as a happy reminder of how far the site has come to its arrival at Version 1.0.

Player Aliasing and the Unified Player ID

An issue that I brought up in Using a Real Oven is that of player aliasing -- namely, the fact that players have changed names over the course of the sample I have in the database. This is sometimes voluntary, but the biggest shift came between Emerald Open 3 and Emerald Open 4 with the dropping of the four-digit identifier from Discord usernames. As a result, players that have multiple tournament appearances from both before 2024 and in 2024 are likely to have results under two different aliases. I get around this by implementing something called a "United Player Alias" which unifies player results under one name. Each team reads results from the battle log, players aggregate teams that are associated with their player name, and the unified player alias aggregates results for the same player that are under different names and assigns a Unified Player ID. I keep track of these both in a separate Google Sheet and a table in the database

An example of player consolidation into a single "Unified Player ID".

This aggregation allows me to truly capture a player's results across all tournaments. The player search function on the website still uses all known aliases (you can look up "Zugu", "Zugubu" or "Zugubu Royale" and get the same result), while player profiles show just the Unified Player Alias and associated teams and results.

Ribbons / Tournament Accolades

Starting in the third generation of Pokemon games, players can get ribbons for their guys that can be seen on the summary screen for various accomplishments -- beating the game, participating in contests, and more. I thought that this would be a nice touch to add to the website to highlight player accomplishments. And, thus, the Accolades section of the player profile and the jokes that I only design ribbons were born. I believe there are four types of ADV Draft tournament: Major, Large, Small, and Invitational. Major tournaments have 128 or more players, Large tournaments have 64 or more players, Invitationals are closed-invite, and Small tournaments are those that don't fall into these other categories. With more players in a tournament, it presents more opportunity to get an impressive result, which is why I use and direct people to percentile placement as a metric to judge results. 2nd out of 64 is arguably more impressive than 1st out of 16 in the same level of competition, and the percentile placement of these results reflects this. When assigning accolades to high results, I decided to use Emerald Open 1 as a baseline and assign ribbons based on the following thresholds:

Winner Ribbon: Win The Tournament
2nd Ribbon: Place in the top 3.125%
3rd Ribbon: Place in the top 12.5%

The Emerald Open ribbons.

For Emerald Open 1, this translates into the ribbon split being 16/4/1. This is what I use for large and major tournaments today. Small tournaments follow the same set of thresholds, but only have two ribbons:

Winner Ribbon: Win The Tournament
2nd Ribbon: Place in the top 12.5%

G-Dras Draft League Ribbons.

For small tournaments that straddle the classification and have 48 or more players, I create three ribbons and follow the first set of thresholds. When I get an awkward number, like a 56 player tournament giving a split of 7/1.75/1, I tend to be more inclusive and do something like 8/3/1. Standardizing the ribbons gives a set goal to aim for, however small, and also helps streamline the creative process (I make all of these by hand!) for me.

Tournament accolades are the main ribbon players can earn, but there are a few other types -- veteran ribbons earned by lots of tournament entries, battles, or drafts, ribbons earned by helping out around the database, or reaching various markers such as a playoff berth. You can find the list of all existing accolades here.

Expert Ribbons

A few weeks ago, I implemented something that I call "Expert Ribbons" -- marks of success with a particular Pokemon that appear on a player profile. These ribbons are specially themed to each particular Pokemon, with the first batch including a total of 13, based off of average cost and perceived impact.

A list of all of the expert accolades. Click the image to open it and make it bigger.

I've very recently added a second wave of 11 ribbons into the mix with Pokemon that have slightly lower costs and usage -- the current threshold is 80% draft rate and 13 or higher average cost. I have a third wave in the works as a possibility, but this is likely all I will add as I want these ribbons to be exclusive to strong Pokemon and, again, I design all of these by hand. It was very fun to design these ribbons, though, and certainly scratched a creative itch of mine.

In order to qualify for an expert ribbon with a specific Pokemon, three criteria must be met: the player must draft it on at least 3 of their teams, the player must have placed in the top 12.5% of a tournament with it, and the player must have at or above a 55% win rate with it. Win Rate in a format with so much variance is a somewhat misleading metric, as consistent tournament placements are more important and paint a more complete picture of player performance, but I do believe a so-called "expert" of a Pokemon should win more often with it than they don't. These three requirements make it such that a player can't be experts of too many Pokemon at once due to the nature of the format -- which I like. I was originally going to allow only one person to be an expert and call them a "champion" of the Pokemon, but realized that was abusable and thus let multiple people be experts of the same Pokemon (at the time of writing, there are 5 Aerodactyl experts). Implementing these was a slight nightmare to get going at first but I was able to create a SQL proc with a cursor that iterates over the list of Pokemon eligible for an expert ribbon, and now it is re-run whenever I put a new tournament into the database to give ribbons to people who have earned them.

Player Insights

Like I mentioned in my last article, a feature I had eyes on adding for a long time were player-specific draft insights. These are preferences caught from a player in the drafting phase, and can be used to both scout against a certain player and make players more aware of their draft pools. If you're with two guys that love Regice and draft Regice early, you probably don't want to plan a draft that involves Regice! Or, if you do pick Regice, pick it early.

There are currently three types of player insights on the website: Common Pokemon, Pokemon Preference, and Preferred Draft Archetype. Common Pokemon are just that -- Pokemon that are chosen within 50% or more of a player's drafts. If someone finds themselves picking Donphan as a defensive piece in every draft, it will wind up here. Pokemon Preferences have to do with the where a Pokemon is chosen, and compares a Pokemon's ADP to where the player typically chooses it. If there is a difference of 3 or more picks, the insight appears on the website. Finally, there is the Preferred Draft Archetype, of which there are three: Ice Soup drafts that spend 93% or more of the budget within the first six picks, 8-Mon drafts that don't have a Pokemon costing below 5 points (i.e. 8 "viable" options for any given battle), and Standard drafts that encompass all other drafts. This used to be called Preferred Team Archetype. The three of these coalesce into a solid picture of how players like to draft, what they like to draft, and how important those pieces are to them.

A peek into the insights table -- player 1 drafts Dugtrio 5.6 picks earlier than usual, player 2 drafts the standard team archetype on 75% of their teams, and player 3 drafts Hypno in 60% of their drafts.

The technical implementation of this was neat to come up with, as these are stats with multiple different variables for each. The solution I came up with was to run queries on the draft log to grab each of these stats, mark each query with a specific flag (CommonMon, TeamArchetype, PickPref), and then union the queries together in a "Player Insights" table the website draws from. There are columns for the values (Val1, Val2, etc.) that hold various bits of information for each player insight -- this also helps me scale things up in case I want to add another category. For example, Val2 holds a player's total drafts for Common Pokemon, the player's average draft position for Pokemon Preference, and teams fitting the archetype for Preferred Draft Archetype. To not dilute player insights with small sample sizes, I make it such that these are only visible once a player has four or more drafts to their name.

Duo Analysis

Like I wrote in my first article, something that entices me about this data is Duo Analysis -- looking at common cores people build around, seeing if certain pairs of Pokemon complement each other or have poor synergy, and finding trends (do Celebi drafters take Aerodactyl far more as their second pick, when available?) is a cool way to analyze the data. Duo Analysis, before, was done in a clunky Google Sheet workbook that had very limited functionality. In an actual database structure, I am not dependent on Google and can run procedures to get the information I want. I have mostly the same suite of statistics as I did before: duo win rate, duo win rate differences, and the big one: Average Win Rate Difference. This statistic measures whether a duo is effective together or not. Since this does involve a cartesian product with over 80,000 combinations, I cap what's available on the website to be only duos with 10 or more drafts together -- which is manageable for the time being, cutting the dataset to around 5% of its original size (and arguably, the only 5% that actually matters).

The actual technical implementation of this was pretty easy: read the teams from the team log, find win/loss records for various duos, add to table, repeat for all duos within the cartesian product. At the time of writing this, it's still a bare-bones feature and I could certainly add on it. Something that's missing that was in the original sheet is Pair Rate, or the amount of drafts a Pokemon is part of a duo divided by its total drafts -- this is something I can add, but the table on the duos page is quite full of information already.

Rebuilding the Battle Log

The Battle Log used to be one of the simplest tables in the database. It had four columns: an identifier, player 1's entrant name, player 2's entrant name, and the victor. This was true until late in 2024 and through multiple tournament additions. However, after enough waffling about it, I finally decided I wanted to "make the lift" and add replays into the database to prevent what happened last year from affecting me later. In addition to doing this, I also added fields for team number to help me keep track of leagues which have transactions a bit easier, a flag that determines whether a win was a forfeit or not, a field that shows tournament stage ("Pools / Swiss", "Bracket", or "Redraft"), and a "Comment" field for additional notes such as specifying which battle pool a game took place in or what part of the bracket was associated with a certain battle.

The crown jewel, of course, is the replay link -- one that I can download directly from the Pokemon Showdown website and analyze. Of course, I only have replays from Emerald Open 4 and later, but at this point this composes over half of my dataset and that percentage will only grow with time. I've also created a reskin of the Pokemon Showdown replay client on my own website, using the color scheme and font the rest of the database uses. This proved to be quite the undertaking due to the nature of how Showdown has created an entirely browser-based Pokemon simulator, but I managed to get through the weeds and extract the important information for the website. I do this via creating a "shell" page that houses the replay infrastructure and then placing the relevant replay log, which is stored locally on the website, within the shell to create a functional replay page.

Rebuilding the battle log also required a dive into the older data (hopefully the last one I'll have to take...) to find out which games were forfeit. Forfeited games are treated as they never happened on the database -- they don't factor into Pokemon or player stats, and the only place they do exist is on the battle log. I believe this is a fair treatment because, of course, they never actually happened.

MrReplay and The Analyzer

Now that I had thousands of files containing battle logs, the next logical step was to analyze them. I mention this in Using a Real Oven. To do this, I introduce two new characters: MrReplay and The Analyzer.

In college, I did research on gravitational wave data. In particular, I led a project on coherence data quality work. In hindsight, I wish I could go back in time and use what I know now to help a younger Larry with this, but of course this is not possible. However, I can bring the past into the present -- and this is where MrSnippy.py comes in. MrSnippy was a script I created that pulled down relevant data from LIGO servers, trimmed it to only include important data, and then placed it into appropriate databases for analysis. I've had the same laptop since my freshman year of college, so I am in possession of the original MrSnippy and was able to repurpose it into a Pokemon Showdown replay grabber. Thus, MrReplay was born, and has been pulling replays down from the Pokemon Showdown servers for my personal logs ever since.

Next comes The Analyzer. I have a friend who used to work in college admissions and would occasionally feed us memorable quotes from essays -- one of which truly stuck with me. This prospective college student promised to build "The Analyzer," a "machine to give solutions to the world's problems". I found this to be absolutely hilarious -- and when I had a problem that involved analysis of replays, I knew exactly what to name my script. Pokemon Showdown replays work by loading an HTML page and filling it with pertinent battle information (in a very similar manner to my shell page solution described above). There is then a script that reads this log, line by line, and takes actions according to what the line says and what category the line is in. For example, a line that says |damage|p1a: Celebi|54/100 reduces Player 1's Celebi to 54/100 health, or 54%. This is why you cannot place a pipe character into a Pokemon's nickname. The Analyzer works by parsing the replay log, line by line, splitting it by the pipe character, looking at the first element at that list and then acting accordingly and storing information in variables that reflect the current status of the battle at that time. Some lines are procedural, such as those beginning with "|p1" and "|p2" establishing the players of the battle, but most have to do with the Pokemon on the field and their actions. Interestingly, Pokemon are not identified by their species and player -- rather, by given nickname, so I had to link lines to the log by player + nickname combination. This taught me that even in games as early as Generation 3, there are a lot of quirks to consider: Ghost-type Curse, Future Sight / Doom Desire, Healing Abilities, Belly Drum, Memento, and Leech Seed in particular required some thought to implement and made the test run fail a few times. Additionally, some values are missing due to the incomplete information replays provide: not all moves are known, no IVs or EVs are known, and all HP stats are recorded as percentages. Big thanks to ADV Draft friends Aeonfa and SPFunkM for helping me get this off of the ground.

Now that MrReplay and The Analyzer were created, I was able to dig into the statistics I couldn't glean just from a document: things like usage rate (how often a Pokemon is brought and revealed to a battle), direct damage dealt, damage taken, healing done, et cetera. In order to not have these stats be diluted by small sample sizes I only allow Battle Stats, as I call them, to be searchable for Pokemon who have appeared in 50 or more replays. This, of course, means Battle Stats are always going to be incomplete due to replays prior to Emerald Open 4 being missing. However, as I said before, this missing section will become a smaller and smaller percentage of the dataset as more ADV Draft is played. These stats can tell a lot of useful information about various Pokemon -- for example, Milotic and Blissey lead defensive metrics like times switched in, recovery, and damage taken. However, they can also be misleading: for example, damage done due to weather is credited to the weather setter, and this is nearly always going to be Tyranitar and its ability Sand Stream which sets a permanent Sandstorm on the field. Tyranitar deals over 250% damage per game, but is this actually indicative of getting 2.5 KOs per game? The answer, of course, is no -- it only has around 1.6 per game -- but contributes to them through sandstorm chip damage.

In the same swoop I added these stats, I added rankings among certain draft stats and almost every battle stat to clarify the data amongst qualified Pokemon, in a similar vein to how the NBA shows rankings for points, assists, and rebounds on their player pages. I've decided to also change the "common" (31.2% draft rate, or around a 95% chance that a Pokemon gets drafted in 1 out of 8 drafts) qualifier I've used in the past to make certain Pokemon "qualified" (21.1% draft rate, or around a 85% chance a Pokemon is drafted in 1 out of 8 drafts, plus appearing in 50 replays). Qualifying certain Pokemon in this manner makes it so even Pokemon with lower draft rates can be recognized in filters, granted they have the replay usage to back them up. I think these are solid bounds which capture Pokemon within the 20-30% draft rate range that do find quite a bit of modern use, such as Regirock.

The 2025 ADV Draft Circuit

A shower thought turned real, the 2025 ADV Draft circuit is the culmination of the work done to proliferate the ADV Draft format across multiple different tournament series: not just Emerald Open and the occasional one-off tournament. In late November, I typed out a small novel in my notes app (on the train, as I tend to write) and sent it to a few tournament organizers. Everyone seemed to like the idea, and with some polishing of details the ADV Draft Circuit was born.

This is very much an experimental first run of things -- the tournaments themselves are largely independent, and I am only taking the results from them to determine participants for my own tournament, so I have confidence that things will run well. I ran a simulation for a hypothetical set of participants for a 2024 Hoenn Draft Invitational, and I'm happy with the results: a mix of established draft talent qualified, and no one player (besides the auto-qualifiers) was carried hard enough by a single performance to qualify. I've run a few tournaments before but never one at the scale of what I think Revival Draft will gather -- and am very excited to see how it shakes out. The circuit, coincidentally, is going to begin at a time where I feel like a lot of the site's features are "finished" and the database is useful for grabbing answers to a variety of questions. Of course, it will still grow with each tournament entry and wacky idea I have, but at this point I am able to take a step back and be proud of what I've accomplished.

Conclusion: Pokemon WAR?

Speaking of wacky ideas, how about Pokemon WAR? A concept that I touch on in my first blogpost on ADV Draft, I believe that having a number to quantify how good a Pokemon is in most scenarios is a potentially good thing for the database to have. This, of course, brings up the question: WAR stands for Wins Above Replacement. What is, then, the "replacement" Pokemon? I'd argue that there are five "group" that draft picks live in: the obvious top-tier guys such as Metagross and Tyranitar, the supplemental pieces that form cores and get drafted nearly always such as Aerodactyl and Swampert, the middling guys that fit on some teams and provide fair value on their own, like Hypno and Blaziken, the Pokemon lower on the board that round out 7- or 8-Pokemon drafts like Swalot and Jumpluff, and then what the community endearingly refers to as "shitmons" -- those that cost 1-3 points and almost exclusively ride the bench. What, out of these tiers, is the "replacement" level? The third group? The fifth? Once I decide that, what numbers should actually feed into the calculation? Now that I have battle stats available, I can create a more nuanced figure than one that just includes draft stats, but I haven't started on any of this work yet beyond vague ideas.

I don't want to add so many features that it becomes bloated, but I do think I've achieved what I set out to do: create Baseball Reference for ADV Draft. With a variety of lookups and ways to find lots of information, I'm proud of what exists here. There's plenty I could add, like the draft input feature I talked about in Using a Real Oven, but I am at a point where I am happy to stamp what I have as Version 1.0. Thank you to anyone who's helped contribute to the site, via summaries or otherwise!