{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "import seaborn as sns\n", "sns.set(font_scale=2)\n", "sns.set_style(\"whitegrid\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparing countries with PCA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we've looked at positions for players in general, we can compare the players from two different countries. This may help us predict the winner of a match between two countries, for example during the World Cup.\n", "\n", "We'll first compare Brazil, a perennial powerhouse, and Japan, a relative newcomer to professional football. We'll construct two datasets, one with goalkeepers and one with \"regular\" players." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"FIFA_2018.csv\", encoding=\"ISO-8859-1\", index_col=0, low_memory=False)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Acceleration Aggression Agility Balance Ball control Composure \\\n", "2 94 56 96 82 95 92 \n", "30 70 77 74 68 80 83 \n", "39 77 84 77 82 88 85 \n", "51 84 82 79 79 81 82 \n", "54 88 55 92 92 88 85 \n", "\n", " Crossing Curve Dribbling Finishing ... Sprint speed Stamina \\\n", "2 75 81 96 89 ... 90 78 \n", "30 60 61 68 38 ... 74 74 \n", "39 90 80 84 67 ... 79 81 \n", "51 86 78 82 55 ... 88 93 \n", "54 77 84 88 74 ... 77 80 \n", "\n", " Standing tackle Strength Vision Volleys Position Name \\\n", "2 24 53 80 83 FWD Neymar \n", "30 89 81 74 63 DEF Thiago Silva \n", "39 85 77 75 54 DEF Marcelo \n", "51 84 80 70 68 MID Alex Sandro \n", "54 44 61 87 75 MID Coutinho \n", "\n", " Nationality Club \n", "2 Brazil Paris Saint-Germain \n", "30 Brazil Paris Saint-Germain \n", "39 Brazil Real Madrid CF \n", "51 Brazil Juventus \n", "54 Brazil Liverpool \n", "\n", "[5 rows x 38 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "country_1 = 'Brazil'\n", "country_2 = 'Japan'\n", "\n", "D = df[df['Nationality'].isin([country_1, country_2])].copy()\n", "\n", "D.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Construct two datasets, one with goal-keepers (name it `D_gk`), and one with \"regular\" players (name it `D_reg`). The dataset with regular players should have no goal-keeping statistics, and vice versa." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# clear\n", "# Goalkeepers: keep only the goalkeeping attributes and nationality\n", "D_gk = D[D['Position'] == 'GK'].copy()\n", "D_gk = D_gk[['GK diving', 'GK handling', 'GK kicking', 'GK positioning', 'GK reflexes',\n", " 'Nationality']]\n", "\n", "# Outfield players: drop the goalkeeping attributes\n", "D_reg = D[D['Position'] != 'GK'].copy()\n", "D_reg = D_reg.drop(columns=['GK diving', 'GK handling', 'GK kicking',\n", " 'GK positioning', 'GK reflexes'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can once again subtract the mean, compute the SVD, and add the scores on the first two principal components as columns of each dataframe." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# clear\n", "# Numeric attributes only: the last four columns of D_reg are Position, Name,\n", "# Nationality and Club; the last column of D_gk is Nationality\n", "X_reg = D_reg.iloc[:, :-4].copy()\n", "X_gk = D_gk.iloc[:, :-1].copy()\n", "\n", "# Center each attribute (subtract the column means)\n", "A = X_reg - X_reg.mean()\n", "B = X_gk - X_gk.mean()\n", "\n", "U, S, Vt = np.linalg.svd(A, full_matrices=False)\n", "V = Vt.T\n", "\n", "u, s, vt = np.linalg.svd(B, full_matrices=False)\n", "v = vt.T\n", "\n", "# Scores on the first two principal components: column k of U times singular value k\n", "D_reg['pc1'] = U[:, 0] * S[0]\n", "D_reg['pc2'] = U[:, 1] * S[1]\n", "\n", "D_gk['pc1'] = u[:, 0] * s[0]\n", "D_gk['pc2'] = u[:, 1] * s[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll first compare the goalkeepers by plotting the first two principal components (use the same `lmplot` code snippet from part 1). Since there are only 5 goalkeeper attributes, we can plot all of them and see how the two countries stack up." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It appears that Brazilian goalkeepers have a clear advantage in handling, positioning, reflexes, and diving. Kicking is more evenly matched, but Brazil still appears to have the edge.\n", "\n", "Furthermore, the best Brazilian goalkeepers seem to be much better than the best Japanese goalkeepers."
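, "\n",
"\n",
"The comparisons in this notebook rest on two outputs of the SVD of the centered data: the player scores (`U[:, k] * S[k]`, stored as `pc1` and `pc2`) and the projection of each attribute's axis onto the first two principal directions (the rows of `V[:, :2]`). The cell below is a minimal sketch that computes both on a small synthetic matrix; the helper name `pca_scores_and_projections`, the toy grades, and the arrow scaling are illustrative assumptions, not the part 1 code.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def pca_scores_and_projections(X, attr_idx, scale=None):\n",
"    # Hypothetical helper (not from part 1): biplot ingredients for one position.\n",
"    # X: (n_players, n_attributes) array of grades; attr_idx: columns to draw as arrows.\n",
"    A = X - X.mean(axis=0)                          # center each attribute\n",
"    U, S, Vt = np.linalg.svd(A, full_matrices=False)\n",
"    scores = U[:, :2] * S[:2]                       # equivalent to the notebook's pc1, pc2\n",
"    arrows = Vt.T[attr_idx, :2]                     # each attribute's unit axis projected onto PC1/PC2\n",
"    if scale is None:\n",
"        scale = np.abs(scores).max()                # assumed: stretch arrows to the score range\n",
"    return scores, arrows * scale\n",
"\n",
"# Toy grades (assumed): 6 players, 4 attributes\n",
"rng = np.random.default_rng(0)\n",
"X = rng.integers(40, 95, size=(6, 4)).astype(float)\n",
"scores, arrows = pca_scores_and_projections(X, [0, 2])\n",
"print(scores.shape, arrows.shape)  # (6, 2) (2, 2)\n",
"```\n",
"\n",
"With the notebook's data, `X` could be, for example, `X_reg[D_reg['Position'] == 'FWD'].values` with `attr_idx=[2, 9, 19, 21, 24]`; the scores can then be drawn with `lmplot` (colored by nationality) and each arrow with `plt.arrow(0, 0, dx, dy)`. The signs of singular vectors are arbitrary, so a plot may come out mirrored relative to part 1."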
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's compare the forwards. First extract the dataset in which `D_reg['Position']=='FWD'`. Then plot the first two principal components and the projections for attributes [2, 9, 19, 21, 24] (column indices of `X_reg`: agility, finishing, reactions, shot power, and stamina)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From this plot, Japan appears to have many more below-average forwards than Brazil. Nearly all Japanese forwards have below-average stamina and reactions, and Brazilian forwards are more likely to have stronger finishing, agility, and shot power." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, compare the midfielders. First extract the dataset in which `D_reg['Position']=='MID'`. Then plot the first two principal components and the projections for attributes [8, 12, 14, 18, 24] (dribbling, interceptions, long passing, positioning, and stamina)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, Brazilian midfielders appear to be more skilled. Even when Japanese midfielders are strong defensive players (above average in interceptions), their defensive-minded Brazilian counterparts still do not fall below average in the other skills." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Last, compare the defenders. First extract the dataset in which `D_reg['Position']=='DEF'`. Then plot the first two principal components and the projections for attributes [1, 12, 14, 22, 24, 26] (aggression, interceptions, long passing, sliding tackle, stamina, and strength)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, Japanese defenders trail Brazilian defenders in important defensive attributes such as interceptions, sliding tackles, aggression, and long passing."
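, "\n",
"\n",
"Comparisons like these (for example, that nearly all Japanese forwards have below-average stamina) can also be checked numerically. On a biplot, a player looks below average on an attribute when their point projects negatively onto that attribute's arrow, which is a statement about the rank-2 approximation; the exact test is whether the player's centered grade is negative. The cell below is a sketch of that exact test; the helper name `share_below_average` and the toy grades are assumptions for illustration, and 'average' means the mean over all players passed in (both countries, one position).\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def share_below_average(X, nationality):\n",
"    # Hypothetical helper: fraction of each country's players below the pooled mean.\n",
"    # X: DataFrame of grades (rows = players); nationality: Series aligned with X.\n",
"    centered = X - X.mean()               # subtract the mean over all players in X\n",
"    below = (centered < 0).astype(float)  # 1.0 where a player is below average\n",
"    return below.groupby(nationality).mean()\n",
"\n",
"# Toy grades (assumed, not taken from FIFA_2018.csv)\n",
"X = pd.DataFrame({'Stamina': [80, 75, 60, 55], 'Reactions': [85, 70, 65, 72]})\n",
"nat = pd.Series(['Brazil', 'Brazil', 'Japan', 'Japan'])\n",
"print(share_below_average(X, nat))\n",
"```\n",
"\n",
"In this toy example every Japanese player is below average on both attributes, while one of the two Brazilian players is below average on reactions. With the notebook's data, the forwards could be passed as `share_below_average(X_reg[fwd], D_reg.loc[fwd, 'Nationality'])` with `fwd = D_reg['Position'] == 'FWD'`."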
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The comparison suggests that Brazil tends to have more skilled players than Japan, which is unsurprising given Brazil's decades of dominance in the sport. Although the two teams did not meet at the 2018 World Cup, Brazil finished with the better record and advanced further in the knockout stage, reaching the quarter-finals while Japan went out in the round of 16." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can now repeat the analysis for any two countries of your choice! Can principal component analysis explain any of the results from the 2018 World Cup? For example, was it obvious beforehand that France would beat Croatia in the final? Are any of the results surprising?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 2 }