As a diehard fan of the Peruvian Soccer (Fútbol) Team, I’m obsessed with the South American Soccer Qualifying Tournament. Peru’s last appearance in the finals was 35 years ago, and Slate Magazine describes my support for Peru as "an act of fatalism". However, though potentially delusional, I know this is our year.
Unfortunately for Peru, per DeadSpin, this year’s South American qualifying group is deemed the toughest in the world. With such a difficult group, who qualifies always comes down to the last few games. Especially for Peru. In Peru, we have a saying: “Pray to the calculator gods” because usually whether we qualify depends on the results of other teams. “We’ll make it only if Argentina ties with Chile, AND Colombia beats Brazil by 3 goals, AND…” Everyone becomes a math expert and creates elaborate result predictions and scenarios. It’s a Peruvian tradition.
That is why every 4 years, my ritual, as we approach the end of the tournament, is to build a model in Excel to predict the score of the final few games and play around with different scenarios. Since I LOVE Get & Transform (a new set of powerful Excel 2016 features which provides easy data gathering and shaping capabilities), I thought that this time around I would build the logic exclusively using this new tool.
My predictions on how the last 2 decisive games will play out, and the results from the model.Currently, there are 2 games left to be played in South America, on October 5th and 10th. Against all odds, Peru is in good shape this time around, and currently sits in 4th place. If the standings hold, Peru will qualify.
I’m providing a link to the Excel file at the end of the post, so you can predict your own results. Note that Get & Transform is available natively as part of Excel 2016 and Office 365 subscriptions. If you have Excel 2010 or Excel 2013, you can also take advantage of these capabilities by downloading the latest Power Query for Excel add-in.
Here is roughly how I put it together.
I sourced some raw data from Wikipedia with the scores from all the previous games. Since I’m not anticipating the historical scores changing, I simply copied the results from the site into the “Source Data” tab in Excel. However, if we had more games to go and I wanted the scores to update, I could have used the “From Web” connector. Since we are only 2 games away from the end, I don’t care too much about having a live connection to the website.
Of course, this data is super messy, so the first thing I did was to bring it into “Get & Transform” using the “From Table/Range” button in the “Data Tab”. Note the ribbon might look a little different depending on which version of Excel you have.
I cleaned the data by applying several transformations in the Query Editor, including removing columns and errors, splitting columns, joining tables, adding conditional columns, etc.. The result was a clean table with just the scores for each game (see below).
First set of transformations in the Query Editor
The fun part was building the logic for Points, Goals and tie breakers exclusively using Get & Transform. BTW - I know there are multiple ways of building this in Excel, and would love to see other approaches. The next challenge for myself will be to build the logic using PowerPivot and DAX.
At a high level, the logic that I built is as follows:
Points: 3 points for a win, 1 point for a draw, 0 points for a loss
If tied on points, use overall goal difference as the tie breaker
If still tied, use overall goals scored as the tie breaker
Finally, I combined the historical results table with the score predictions table and brought the standings table back to the grid in Excel.
In the process of building this model, I also learned about the “Query Dependencies” view that quickly helped me visualize how all the steps and queries are interconnected (see below).
To access this view, while in the “Query Editor” click on the “Query Dependencies” button in the “View” Tab.
As promised, here is a link to my South American Qualifying spreadsheet for you to build your own predictions. Note that you might get prompted by a "Security Warning" in Excel, click "Enable Content" to proceed. If you want to see the details behind each of the queries, go to the “Data Tab> Get Data> Launch Query Editor”. If you want to predict your own scenarios and outcomes you can simply change the scores in the “Score Predictions” table, and click “Refresh All” in the Data Tab. This will update the standings table.
The next challenge will be to build the logic using Power Pivot and DAX.