# Objectives

## Overview

A key choice when training an ML model is the metric used to measure how well the model learns the signal. Such metrics are useful for comparing how well trained models generalize to new, similar data.

This choice of metric is a key component of AutoML because it defines the cost function the AutoML search will seek to optimize. In EvalML, these metrics are called **objectives**. AutoML seeks to minimize (or maximize) the objective score as it explores more pipelines and parameters, and it uses the feedback from scoring pipelines to tune the available hyperparameters and continue the search. It is therefore critical to choose an objective function that represents how the model will be applied in the intended domain of use.

EvalML supports a variety of objectives from traditional supervised ML including mean squared error for regression problems and cross entropy or area under the ROC curve for classification problems. EvalML also allows the user to define a custom objective using their domain expertise, so that AutoML can search for models which provide the most value for the user’s problem.

### Optimization vs Ranking Objectives

There are many common objectives used for evaluating model performance. However, not all of them should be used to optimize AutoMLSearch. Consider the popular objective `recall`, which is the number of true positives divided by the sum of true positives and false negatives. If the model has no false negatives, `recall` is a perfect score of 1. During automatic optimization, models can exploit this by predicting the positive label in every case, producing a completely useless but seemingly highly performant model. However, this objective is still useful for evaluating performance after a model has been trained.
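The recall exploit described above can be illustrated with a small sketch in plain Python. The `recall` and `precision` helpers and the toy labels are hypothetical stand-ins written for this illustration, not EvalML's own objective classes.

```python
def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp / (tp + fn)


def precision(y_true, y_pred):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fp)


# Toy target (assumed for illustration): 2 positives out of 10 rows.
y_true = [True, False, False, False, False, False, False, False, False, True]

# A degenerate "model" that predicts the positive label for every row.
always_positive = [True] * len(y_true)

recall_score = recall(y_true, always_positive)
precision_score = precision(y_true, always_positive)
print(recall_score)     # perfect recall, since there are no false negatives
print(precision_score)  # low precision reveals that the model is useless
```

Because the degenerate model never produces a false negative, its recall is perfect even though most of its positive predictions are wrong, which is why recall is better suited to ranking than to optimization.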

Due to this potential issue, we define two types of objectives: optimization and ranking. Optimization objectives are those that can be used within AutoMLSearch to train performant models. Ranking objectives can be used after AutoMLSearch has been run, to rank or otherwise evaluate model performance. These include all of the optimization metrics, as well as all other important metrics such as recall that are excluded from optimization.

Note that we also define a third class of objectives, non-core objectives, which are domain-specific and require additional configuration before they can be used.

## Optimization Objectives

Use the `get_optimization_objectives` function to list the objectives that can be used for optimization in AutoMLSearch for a given problem type:

```
from evalml.objectives import get_optimization_objectives
from evalml.problem_types import ProblemTypes

for objective in get_optimization_objectives(ProblemTypes.BINARY):
    print(objective.name)
```

```
MCC Binary
Log Loss Binary
Gini
AUC
Precision
F1
Balanced Accuracy Binary
Accuracy Binary
```

## Ranking Objectives

Use the `get_ranking_objectives` function to list the objectives included with EvalML for a given problem type:

```
from evalml.objectives import get_ranking_objectives
from evalml.problem_types import ProblemTypes

for objective in get_ranking_objectives(ProblemTypes.BINARY):
    print(objective.name)
```

```
MCC Binary
Log Loss Binary
Gini
AUC
Recall
Precision
F1
Balanced Accuracy Binary
Accuracy Binary
```

EvalML defines a base objective class for each problem type: `RegressionObjective`, `BinaryClassificationObjective` and `MulticlassClassificationObjective`. Every EvalML objective is a subclass of one of these.

### Binary Classification Objectives and Thresholds

All binary classification objectives have a `threshold` property. Some binary classification objectives like log loss and AUC are unaffected by the choice of binary classification threshold, because they score based on predicted probabilities or examine a range of threshold values. These metrics are defined with `score_needs_proba` set to True. For all other binary classification objectives, we can compute the optimal binary classification threshold from the predicted probabilities and the target.

```
from evalml.pipelines import BinaryClassificationPipeline
from evalml.demos import load_fraud
from evalml.objectives import F1

X, y = load_fraud(n_rows=100)
X.ww.init(
    logical_types={
        "provider": "Categorical",
        "region": "Categorical",
        "currency": "Categorical",
        "expiration_date": "Categorical",
    }
)
objective = F1()
pipeline = BinaryClassificationPipeline(
    component_graph=[
        "Imputer",
        "DateTime Featurizer",
        "One Hot Encoder",
        "Random Forest Classifier",
    ]
)
pipeline.fit(X, y)
# Score with the default threshold (None until one is set)
print(pipeline.threshold)
print(pipeline.score(X, y, objectives=[objective]))

# Optimize the threshold for F1 using the positive-class probabilities
y_pred_proba = pipeline.predict_proba(X)[True]
pipeline.threshold = objective.optimize_threshold(y_pred_proba, y)
print(pipeline.threshold)
print(pipeline.score(X, y, objectives=[objective]))
```

```
Number of Features
Boolean 1
Categorical 6
Numeric 5
Number of training examples: 100
Targets
False 91.00%
True 9.00%
Name: count, dtype: object
```

```
Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
```

```
None
OrderedDict([('F1', 1.0)])
```

```
0.37905689607742854
OrderedDict([('F1', 1.0)])
```
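To make the idea of threshold optimization concrete, here is a minimal sketch in plain Python that scans a grid of candidate thresholds and keeps the one with the best F1 score. This is only an illustration under stated assumptions: `f1_score`, `best_threshold`, the candidate grid, and the toy data are hypothetical, and the sketch is not claimed to match how `optimize_threshold` searches internally.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_threshold(y_true, y_proba, candidates):
    """Grid search: return the (threshold, score) pair with the highest F1."""
    best = None
    for threshold in candidates:
        # A row is labeled positive when its probability exceeds the threshold.
        y_pred = [p > threshold for p in y_proba]
        score = f1_score(y_true, y_pred)
        if best is None or score > best[1]:
            best = (threshold, score)
    return best


# Toy data (assumed for illustration): true labels and positive-class probabilities.
y_true = [False, False, True, False, True, True]
y_proba = [0.10, 0.35, 0.40, 0.45, 0.70, 0.90]

candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
threshold, score = best_threshold(y_true, y_proba, candidates)
default_score = f1_score(y_true, [p > 0.5 for p in y_proba])

print(threshold, score)  # tuned threshold and its F1
print(default_score)     # F1 at the conventional 0.5 cutoff
```

On this toy data the tuned threshold scores higher than the 0.5 cutoff. In the fraud example above, F1 is already 1.0 on the training data before tuning, so the optimized threshold cannot improve the reported score there.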

## Custom Objectives

Often, the objective function is very specific to the use case or business problem. Choosing the right objective to optimize requires thinking through the decisions or actions that will be taken using the model, and assigning a cost or benefit to taking each one correctly or incorrectly based on known outcomes in the training data.

Once you have determined the objective for your business, you can provide that to EvalML to optimize by defining a custom objective function.

### Defining a Custom Objective Function

To create a custom objective class, we must define several elements:

- `name`: The printable name of this objective.
- `objective_function`: This function takes the predictions, true labels, and an optional reference to the inputs, and returns a score of how well the model performed.
- `greater_is_better`: `True` if a higher `objective_function` value represents a better solution, and otherwise `False`.
- `score_needs_proba`: Only for classification objectives. `True` if the objective is intended to function with predicted probabilities as opposed to predicted values (example: cross entropy for classifiers).
- `decision_function`: Only for binary classification objectives. This function takes predicted probabilities that were output from the model and a binary classification threshold, and returns predicted values.
- `perfect_score`: The score achieved by a perfect model on this objective.
- `expected_range`: The expected range of values we want this objective to output, which doesn’t necessarily have to be equal to the possible range of values. For example, our expected R2 range is `[-1, 1]`, although the actual range is `(-inf, 1]`.
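As a rough illustration of how `greater_is_better` and `perfect_score` might be consumed when comparing models, the sketch below ranks scores in plain Python. The helpers `rank_scores` and `distance_from_perfect`, the pipeline names, and the scores are all hypothetical; they are not part of the EvalML API.

```python
def rank_scores(scores, greater_is_better):
    """Sort (pipeline_name, score) pairs from best to worst for an objective."""
    return sorted(scores, key=lambda pair: pair[1], reverse=greater_is_better)


def distance_from_perfect(score, perfect_score):
    """Absolute gap between a score and the objective's perfect score."""
    return abs(perfect_score - score)


# Hypothetical scores for a cost-style objective, where lower is better
# (greater_is_better = False, perfect_score = 0.0).
cost_scores = [("pipeline_a", 0.30), ("pipeline_b", 0.10), ("pipeline_c", 0.25)]
ranked_by_cost = rank_scores(cost_scores, greater_is_better=False)

# Hypothetical scores for an accuracy-style objective, where higher is better.
accuracy_scores = [("pipeline_a", 0.81), ("pipeline_b", 0.77), ("pipeline_c", 0.90)]
ranked_by_accuracy = rank_scores(accuracy_scores, greater_is_better=True)

print(ranked_by_cost[0])      # best pipeline under the cost-style objective
print(ranked_by_accuracy[0])  # best pipeline under the accuracy-style objective
print(distance_from_perfect(ranked_by_cost[0][1], perfect_score=0.0))
```

The same list of candidates is ordered in opposite directions depending on the flag, which is why a custom objective has to declare it explicitly.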

### Example: Fraud Detection

To give a concrete example, let’s look at how the fraud detection objective function is built.

```
from evalml.objectives.binary_classification_objective import (
    BinaryClassificationObjective,
)
import pandas as pd


class FraudCost(BinaryClassificationObjective):
    """Score the percentage of the total processed transaction amount that is lost to fraud."""

    name = "Fraud Cost"
    greater_is_better = False
    score_needs_proba = False
    perfect_score = 0.0

    def __init__(
        self,
        retry_percentage=0.5,
        interchange_fee=0.02,
        fraud_payout_percentage=1.0,
        amount_col="amount",
    ):
        """Create an instance of FraudCost.

        Args:
            retry_percentage (float): Percentage of customers who will retry a transaction if it
                is declined. Between 0 and 1. Defaults to 0.5.
            interchange_fee (float): How much of each successful transaction you can collect.
                Between 0 and 1. Defaults to 0.02.
            fraud_payout_percentage (float): Percentage of fraud you will not be able to collect.
                Between 0 and 1. Defaults to 1.0.
            amount_col (str): Name of the column in the data that contains the amount.
                Defaults to "amount".
        """
        self.retry_percentage = retry_percentage
        self.interchange_fee = interchange_fee
        self.fraud_payout_percentage = fraud_payout_percentage
        self.amount_col = amount_col

    def decision_function(self, ypred_proba, threshold=0.0, X=None):
        """Determine if a transaction is fraud given predicted probabilities, threshold, and dataframe with transaction amount.

        Args:
            ypred_proba (pd.Series): Predicted probabilities
            threshold (float): Dollar threshold to determine if transaction is fraud
            X (pd.DataFrame): Dataframe containing transaction amount

        Returns:
            pd.Series: Series of predicted fraud labels using X and threshold
        """
        if not isinstance(X, pd.DataFrame):
            X = pd.DataFrame(X)
        if not isinstance(ypred_proba, pd.Series):
            ypred_proba = pd.Series(ypred_proba)
        # weight each probability by the transaction amount before thresholding
        transformed_probs = ypred_proba.values * X[self.amount_col]
        return transformed_probs > threshold

    def objective_function(self, y_true, y_predicted, X):
        """Calculate the amount lost to fraud as a fraction of the total transaction amount, given predictions, true values, and dataframe with transaction amount.

        Args:
            y_true (pd.Series): True fraud labels
            y_predicted (pd.Series): Predicted fraud labels
            X (pd.DataFrame): Dataframe with transaction amounts

        Returns:
            float: Amount lost to fraud divided by the total transaction amount
        """
        if not isinstance(X, pd.DataFrame):
            X = pd.DataFrame(X)
        if not isinstance(y_predicted, pd.Series):
            y_predicted = pd.Series(y_predicted)
        if not isinstance(y_true, pd.Series):
            y_true = pd.Series(y_true)

        # extract transaction amounts using the amount column in the user's data
        try:
            transaction_amount = X[self.amount_col]
        except KeyError:
            raise ValueError("`{}` is not a valid column in X.".format(self.amount_col))

        # amount paid if transaction is fraud
        fraud_cost = transaction_amount * self.fraud_payout_percentage

        # interchange fees forgone when a declined transaction is not retried
        interchange_cost = (
            transaction_amount * (1 - self.retry_percentage) * self.interchange_fee
        )

        # calculate cost of missing fraudulent transactions
        false_negatives = (y_true & ~y_predicted) * fraud_cost

        # calculate money lost from fees
        false_positives = (~y_true & y_predicted) * interchange_cost

        loss = false_negatives.sum() + false_positives.sum()
        loss_per_total_processed = loss / transaction_amount.sum()
        return loss_per_total_processed
```
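The arithmetic behind this objective can be traced by hand on a few rows. The sketch below re-implements the same decision rule and cost formula in plain Python, without EvalML or pandas, so each term is easy to follow. The function names `flag_fraud` and `fraud_cost` and the four toy transactions are assumptions made for this illustration.

```python
def flag_fraud(y_proba, amounts, threshold):
    """Flag a transaction when probability * amount exceeds the dollar threshold."""
    return [p * a > threshold for p, a in zip(y_proba, amounts)]


def fraud_cost(
    y_true,
    y_pred,
    amounts,
    retry_percentage=0.5,
    interchange_fee=0.02,
    fraud_payout_percentage=1.0,
):
    """Return the money lost as a fraction of the total transaction amount."""
    loss = 0.0
    for truth, pred, amount in zip(y_true, y_pred, amounts):
        if truth and not pred:
            # False negative: missed fraud, so the payout is lost.
            loss += amount * fraud_payout_percentage
        elif pred and not truth:
            # False positive: legitimate transaction declined, fee lost
            # for the share of customers who do not retry.
            loss += amount * (1 - retry_percentage) * interchange_fee
    return loss / sum(amounts)


# Toy transactions (assumed for illustration).
amounts = [100.0, 50.0, 200.0, 150.0]
y_true = [True, False, False, True]
y_proba = [0.9, 0.8, 0.1, 0.2]

# probability * amount = 90, 40, 20, 30, so a 35 dollar threshold flags the first two rows.
y_pred = flag_fraud(y_proba, amounts, threshold=35.0)
cost = fraud_cost(y_true, y_pred, amounts)

print(y_pred)
print(cost)  # (150 missed fraud + 0.5 lost fees) / 500 total, roughly 0.301
```

In this toy case the single missed fraudulent transaction dominates the loss, while the wrongly declined legitimate transaction costs only a small fee, which reflects the asymmetry the objective is designed to capture.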