{ "cells": [ { "cell_type": "markdown", "metadata": { "comet_cell_id": "b4bcefbe0a41f" }, "source": [ "Before beginning task 3, make sure to run the following cell to import all necessary packages. If you need any additional packages, add the import statement(s) to the cell below and re-run the cell before adding and running code that uses the additional packages.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "comet_cell_id": "2c7cd3e562e7" }, "outputs": [], "source": [ "# Load all necessary packages\n", "import numpy as np\n", "import sklearn as skl\n", "import six\n", "\n", "# dataset\n", "from aif360.datasets import AdultDataset\n", "\n", "# models\n", "from sklearn.linear_model.logistic import LogisticRegression \n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn.ensemble import RandomForestClassifier \n", "from sklearn.svm import SVC \n", "\n", "# metric\n", "from sklearn.metrics import accuracy_score" ] }, { "cell_type": "markdown", "metadata": { "comet_cell_id": "46991f420accb" }, "source": [ "# Tutorial 3: scikit-learn\n", "\n", "Now we show you how to train and evaluate models using scikit-learn. You will use the knowledge from this tutorial to complete Task 3, so please read thoroughly and execute the code cells in order.\n", "\n", "## Step 1: Import the dataset\n", "\n", "First we need to import the dataset we will use for training and testing our model.\n", "\n", "Below, we provide code that imports the Adult dataset. **Note: a warning may pop up when you run this cell. As long as you don't see any errors in the code, it is fine to continue.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "comet_cell_id": "936840797dfba" }, "outputs": [], "source": [ "data_orig = AdultDataset()" ] }, { "cell_type": "markdown", "metadata": { "comet_cell_id": "1f7cb8ab1c822" }, "source": [ "## Step 2: Split the dataset into train and test data\n", "\n", "Now that the dataset has been imported, we need to split the original dataset into training and test data. \n", "\n", "The code to do so is as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "comet_cell_id": "8f3e98f0712d1" }, "outputs": [], "source": [ "data_orig_train, data_orig_test = data_orig.split([0.7], shuffle=False)" ] }, { "cell_type": "markdown", "metadata": { "comet_cell_id": "8e6392efce817" }, "source": [ "## Step 3: Initialize model \n", "\n", "Next, we need to initialize our model. We can initialize a model with the default parameters (see documentation), no parameters (which initializes with default parameter values), or we can modify parameter values.\n", "\n", "For the tutorial, we use the Logistic Regression model with default hyper-parameter values; you will be able to use any of the scikit-learn models listed above, and modify hyper-parameter values, when completing the exercise. \n", "\n", "Below we provide code for initialzing the Logistic Regression model, with default hyper-parameter values. We also provide (commented) code that reminds you of how to initialize each model available during this exercise." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "comet_cell_id": "e89b66337a2a6" }, "outputs": [], "source": [ "# model is populated with default values; modifying parameters is allowed but optional\n", "model = LogisticRegression(penalty='l2', dual=False,tol=0.0001,C=1.0,\n", " fit_intercept=True,intercept_scaling=1,class_weight=None,\n", " random_state=None,solver='liblinear',max_iter=100, \n", " multi_class='warn',verbose=0,warm_start=False,\n", " n_jobs=None)\n", "\n", "#model = KNeighborsClassifier(n_neighbors=5,weights='uniform',algorithm='auto',\n", "# leaf_size=30,p=2,metric='minkowski',metric_params=None,\n", "# n_jobs=None)\n", "\n", "#model = RandomForestClassifier(n_estimators='warn',criterion='gini',max_depth=None,\n", "# min_samples_leaf=1,min_weight_fraction_leaf=0.0,\n", "# min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, \n", "# random_state=None, verbose=0, warm_start=False, class_weight=None)\n", "\n", "#model = SVC(C=1.0, kernel='rbf', degree=3, gamma='auto_deprecated', coef0=0.0, shrinking=True, \n", "# probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, \n", "# max_iter=-1, decision_function_shape='ovr', random_state=None)" ] }, { "cell_type": "markdown", "metadata": { "comet_cell_id": "7bba44eb7e995" }, "source": [ "## Step 4: Train the model\n", "\n", "After initialing the model, we train it using the training dataset. \n", "\n", "Below we provide code that prepares our dataset to be used with scikit-learn and trains the model using our prepared data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "comet_cell_id": "470c3e83a0934" }, "outputs": [], "source": [ "# prepare data for use with scikit-learn\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "scaler = StandardScaler()\n", "\n", "x_train = scaler.fit_transform(data_orig_train.features)\n", "y_train = data_orig_train.labels.ravel()\n", "\n", "\n", "model.fit(x_train, y_train)" ] }, { "cell_type": "markdown", "metadata": { "comet_cell_id": "483779469d731" }, "source": [ "## Step 5: Evaluate the model\n", "\n", "Now we're ready to evaluate your trained model with the test data using the performance metric provided by scikit-learn.\n", "\n", "Below we provide code snippets that show how to evaluate a model's performance using scikit-learn." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "comet_cell_id": "f3c98baf23fd4" }, "outputs": [ { "ename": "NameError", "evalue": "name 'lr' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mpredictions\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpredict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata_orig_test\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0maccuracy\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maccuracy_score\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata_orig_test\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlabels\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpredictions\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNameError\u001b[0m: name 'lr' is not defined" ] } ], "source": [ "x_test = scaler.fit_transform(data_orig_test.features)\n", "\n", "predictions = model.predict(x_test)\n", "accuracy = accuracy_score(data_orig_test.labels.ravel(), predictions)\n", "\n", "print ('Accuracy = ' + str(accuracy))\n" ] }, { "cell_type": "markdown", "metadata": { "comet_cell_id": "f42e810454232" }, "source": [ "# Task 3: Model evaluation with scikit-learn\n", "\n", "Your turn! Use what you learned in the above tutorial to train and evaluate models for performance, fairness, and overall quality. You will use functionality provided by scikit-learn to meet the following goals:\n", "\n", "1. **Describe a model you believe will perform the best (e.g., have the highest accuracy score).** \n", "\n", "2. **Describe a model you believe will be the most fair, regardless of performance.** \n", "\n", "3. **Describe a model you believe will best balance both performance and fairness.** \n", "\n", "Make sure you include any modifications to model hyper-parameters.\n", "\n", "**Keep in mind, training machine learning models is often a time intensive endeavor.** One way you can minimize time to finish the assignment is to minimize the times you have to, for example, train a given model to then evaluate it by putting the code that initializes and trains your model(s) in its own separate cell.\n", "\n", "\n", "## Submitting your response \n", "\n", "Once you feel you've met the above goals, go to the Evaluating ML Models Exercise Response Form to enter your responses under the section labeled 'Task 3'.\n", "\n", "If you accidentally closed your response form, check your email for the link to re-open it." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "comet_cell_id": "9d75f2fa60fd2" }, "outputs": [], "source": [ "# TODO : Use this cell to write code for completing task 3\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "comet_cell_id": "23c64e4f35267" }, "source": [ "Once you've completed this final task, make sure you're satisfied with your responses, complete the exercise feedback portion and submit the form." 
] } ], "metadata": { "comet_paths": [ [ "4c7b42fa/ML Model Eval Assignment.ipynb", 1567249528393 ], [ "008a0d50/Task_3.ipynb", 1567538778080 ], [ "db4861d2/Task_3.ipynb", 1567621567465 ] ], "comet_tracking": true, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }