Using a Real Oven

Written 09-18-2024

Introduction

I once read a comment on a Stack Overflow solution that has stuck with me to this day. It read:

"Using Google Sheets to analyze large amounts of data is like using an Easy Bake Oven to make a cake."

This stuck to me because I consider myself adept at Sheets -- I've made hundreds of workbooks in my lifetime for various reasons, and one of my pride and joys in the form of MSheet lives there. I've sunk thousands of hours of tinkering and creating into Google Sheets. And this entire time, I was playing with an easy bake oven? I don't think this comment is 100% true, but I do think it applies to some of my Google Sheets endeavors, particularly true for my ADV Draft database. I've written about this before, and how I stored data in the sheet and had multiple links to undergo data analysis. However, the analysis within Sheets is slow, sluggish, and would sometimes crash the page -- especially for things such as the Duo Analysis and tournament filtering. I had felt the limitations of the easy bake oven and realized that keeping my ADV draft data in Google Sheets was not sustainable as more tournaments completed. It was time to cook with gas.

The Move

To pack up the faux-relational-database I had in Google Sheets and move to the actual relational database I have now, I knew that I had to restructure things a bit. It very much helps that my job for the past two years has directly involved digging through relational databases! In the workbook, the structure was a bit flowy and convoluted with formulas connecting across multiple spreadsheets while in the database I could store data in individual tables and link them through a common field. Most often, this is Pokemon name.

The sheet used "entries," combinations of player and tournament, to store information about teams and battle records. I kept this structure, creating an "entry" for each player per tournament. The entries connect to a few other foundational tables: a battle log with each player and a victor specified, a draft log with an entry for each pick every player makes, and a team log which is updated using a stored procedure which utilizes the draft log. The key field involving entries, "EntryName", is a combination of a distinct code given to each tournament (and stored in a separate tournament log) alongside the player's name. Of course, the Pokemon themselves have stats, which are stored in a separate Pokedex table. Additionally, the cost of Pokemon per tournament is stored on a separate draft board table. This is something that varies over time as trends in tournament play change and can be tracked.

Once the foundational database structure was laid out and the data was properly transferred, the sky was the limit when it came to analysis. SQL is what is called a query-based language, meaning that users ask it a question (in its syntax) and it returns a result (as a table). These results can be stored within the database as something called a view, which is what actually drives the statistical analysis I've done on the data. The mistake I made with the workbook in Google Sheets is that I coalesced foundational tables and views to make a web of formulas which slowed things down -- using a relational database structure that was designed to do this type of thing makes much more sense. The results from these views are what users see on the database's web interface.

Website Interface

Web design is something that I am wholly unfamiliar with (this entire website is built from scratch) and am building up my knowledge little by little as I work on bits and pieces of this website. I consider myself a creative person and can see the result of what I want out of a project like this with some clarity and find myself scrounging for bits and pieces of solutions and cobbling them together in practice. This is exactly what wound up happening with the database's web interface, of course, but I do think I smoothed things out in a more consistent manner than I usually do.

The web interface is still very much a work in progress, and I am consistently poking the ADV Draft community when it comes to wanted features. My main goal with this is to make something like Baseball Reference for ADV Draft, which got validated immediately as soon as I first set the database loose by someone saying "this mf making baseball reference for adv draft". At the time of writing, I've made an individual Pokemon lookup tool, an aggregate Pokemon lookup tool, and a general summary of the data I have in the database. These show individual summary pages for each Pokemon (such as Metagross, as an example) and a table of the same stats which are filterable by draft rate. As mentioned earlier, I am a dumb donkey when it comes to web design, so I opted for the simple method of the website altering the URL for specific queries and a script to load certain webpages given information in the URL (the "GET" method) as opposed to having the webpage generate results itself (the "POST" method). This also, unintentionally, allows users to share their queries directly -- albeit with a bit of an ugly URL.

I"ve compartmentalized the database into four parts in my head, based on the difficulty of logging and storing the information: there"s the draft part where teams are actually determined, the battle part where Pokemon generally prove their worth, the procedural part that contains the things that run the tournament (this is your draft board, or the tournaments themselves), and finally aggregate player stats across tournaments. I"ve got the first three on the site via lookup tools or summary tables. The last part will be a bit more difficult to implement due to the same players using different aliases across multiple tournaments, but I am working on getting it up and running. I will probably update this paragraph when that does happen.

Data Integrity

Initial reception to the database has been pretty positive, which is great! It motivates me to keep developing new features for it and better both the data collection process and database accessibility (case in point: the website). I do want to take this section to warn users of potential pitfalls that could be run into.

Firstly, the data here is manually scraped and gathered by me from a disparate set of Google Sheets and Discord servers, as tournament owners tend to create insular environments that hold all of their information. This is fine in a vacuum but can be cumbersome for me to collect -- especially in the case where tournament owners are possessive. Therefore, the data is only as accurate as my data collection efforts are. This should be 100% accurate, but there are potential misses on my end that could affect stats. For example, I had mismanaged Emerald Open 4"s data for some months and excluded one-quarter of the battle pools while double-counting another set. At the time of writing, I think there are no errors. If there are, I'll be quick to correct them.

Next, there are a lot of potential dangers with small sample size statistics. There"s been 117 drafts at the time of writing, and the Pokemon with the highest number of battles is Raikou at 541. Those with below a 100% draft rate appear in even less, of course. I have a filter that I use of a 31.2% of higher draft rate (that I call the "common" filter) -- any Pokemon here or above has a 95% probability of appearing in your average draft. I talk about the math behind it in my previous ADV Draft article. The average common Pokemon has around 150 battles to its name at this point. This is not a huge sample by any means, and the "sleeper picks" with lower draft rates have even fewer battles to draw conclusions from. Win Rate and stats based on it such as my value metrics and FrAUD Index are risky to use when it comes to Pokemon with a low amount of battles. By extension, ADP and draft-based statistics are risky to use when it comes to rarely-drafted Pokemon. Just because Magikarp has a win rate of 85.7%, does it mean it"s the best Pokemon in the format and should cost 20 points to draft next tournament? Of course not. Use discretion when pulling stats and making decisions with regards to tiering Pokemon for your own tournament or drafting them for your own team.

What's Next?

Now that the groundwork is laid out and I have both a functional relational database on my computer and a section of my website that users can use to interface with it, I have to ask myself -- what's next?

If I were to add a new field to my tables, such as replays for battles or entrant result in a particular tournament, I'll have to go back to the beginning and scrape all of the appropriate results. I've considered adding both ideas, but both will require considerable lifts to add. Each have considerable upside and difficulties associated with them. Replays, in particular, open many paths of analysis such as tracking individual Pokemon damage/recovery, common movesets (both on the Pokemon and against the Pokemon, answering questions like "how often does Breloom make the opponent bring Flying coverage?"), and usage stats (which low-tier Pokemon are used most often?). However, this would inherently be an incomplete dataset due to Pokemon replays between September 2020 and December 2023 no longer being available (this affects nearly half of my total battles to date), I have no reliable method of getting meaningful data from the JSON file replays generated, and complete teams aren't shown in the case a Pokemon does not switch in for the entire match.

With each passing tournament, the database will grow, and the dataset will become stronger. I've already started making some charts to help tournament organizers create their own draft boards, but there is further advanced analysis beyond calculating draft rate and win rate to be done. Now that I have 11 tournaments under my belt, I can start weighing results differently to bias stats like ADP towards recent events. I can refine what I've made for value metrics to get closer to a "Pokemon WAR" -- whatever that ends up meaning in the long run. My imagination is the limiting factor when it comes to my personal analysis here.

For the website, there's plenty of features to add. Like I said above, I still have to add player stats and plan to make a player lookup (akin to my individual Pokemon lookup) that generates a player profile listing tournament appearances, favorite Pokemon to draft, and best results. I also plan to add a feature that allows users to put in a mock draft and display results and suggestions (such as "are you sure you want to draft Miltank first overall?"). Nothing on my website uses the "Steal Value" and "Pick Difference" draft metrics yet, so this would be a nice introduction to those. These features, among others, will come in due time. I'm one guy developing all of this in my spare time, after all! As the community grows, I also hope that I will be able to gather more summaries for the individual Pokemon pages -- but I am anything but qualified to write them for most Pokemon. So, when it comes to "what's next," I've got plans -- I just have to cook.

Acknowledgements

I would like to thank Khaz, Goldmason, and the entire Emerald Open staff for consistent conversations and questions about the ADV Draft data that does very much help push its development. I'd also like to thank anyone who has helped me gather data, let me edit their Google Sheet document where tournament information is stored to more easily gather it, or pointed me towards new ADV Draft leagues. 2024 has been a strong year for this budding category of Pokemon draft, and by the end of the year more than half of the tournaments I have will have kicked off after Emerald Open IV on 2/29. I truly hope that I can make the database a great resource for future ADV Draft tournaments.