evalml.automl.AutoMLSearch.__init__

AutoMLSearch.__init__(problem_type=None, objective='auto', max_pipelines=None, max_time=None, patience=None, tolerance=None, data_split=None, allowed_pipelines=None, allowed_model_families=None, start_iteration_callback=None, add_result_callback=None, additional_objectives=None, random_state=0, n_jobs=-1, tuner_class=None, verbose=True, optimize_thresholds=False, _max_batches=None)

Automated pipeline search

Parameters
  • problem_type (str or ProblemTypes) – Choice of ‘regression’, ‘binary’, or ‘multiclass’, depending on the desired problem type.

  • objective (str or ObjectiveBase) – The objective to optimize for. When set to ‘auto’, chooses LogLossBinary for binary classification problems, LogLossMulticlass for multiclass classification problems, and R2 for regression problems. Defaults to ‘auto’.

  • max_pipelines (int) – Maximum number of pipelines to search. If neither max_pipelines nor max_time is set, max_pipelines defaults to 5.

  • max_time (int or str) – Maximum time to search for pipelines. No new pipeline search is started once this duration has elapsed. An integer is interpreted as seconds; a string can specify the time in seconds, minutes, or hours.

  • patience (int) – Number of iterations without improvement after which the search stops early. Must be positive. If None, early stopping is disabled. Defaults to None.

  • tolerance (float) – Minimum percentage difference in score that counts as an improvement for early stopping. Only applicable if patience is not None. Defaults to None.

  • allowed_pipelines (list(class)) – A list of PipelineBase subclasses indicating the pipelines allowed in the search. The default of None indicates all pipelines for this problem type are allowed. Setting this field will cause allowed_model_families to be ignored.

  • allowed_model_families (list(str, ModelFamily)) – The model families to search. The default of None searches over all model families. To see the options, run evalml.pipelines.components.utils.allowed_model_families("binary"), replacing ‘binary’ with ‘multiclass’ or ‘regression’ to match the problem type. Note that if allowed_pipelines is provided, this parameter is ignored.

  • data_split (sklearn.model_selection.BaseCrossValidator) – Data splitting method to use. Defaults to StratifiedKFold.

  • tuner_class – The tuner class to use. Defaults to the scikit-optimize tuner.

  • start_iteration_callback (callable) – Function called before each pipeline training iteration. It is passed three parameters: pipeline_class, parameters, and the AutoMLSearch object.

  • add_result_callback (callable) – Function called after each pipeline training iteration. It is passed three parameters: a dictionary containing the training results for the new pipeline, an untrained_pipeline containing the parameters used during training, and the AutoMLSearch object.

  • additional_objectives (list) – Custom set of objectives to score on. If not empty, overrides the default objectives for the problem type.

  • random_state (int, np.random.RandomState) – The random seed/state. Defaults to 0.

  • n_jobs (int or None) – Integer describing the level of parallelism used for pipelines. None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) CPUs are used. Defaults to -1.

  • verbose (boolean) – If True, turns verbosity on. Defaults to True.

  • _max_batches (int) – The maximum number of batches of pipelines to search. The max_time and max_pipelines parameters take precedence in stopping the search.
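The objective parameter's ‘auto’ setting amounts to a lookup from problem type to a default objective. The sketch below illustrates that mapping; resolve_objective and DEFAULT_OBJECTIVES are hypothetical names, objectives are represented as plain strings, and this is not evalml's implementation.

```python
# Hypothetical sketch of how objective='auto' resolves per problem type.
# Objective names are plain strings here, not evalml ObjectiveBase instances.
DEFAULT_OBJECTIVES = {
    "binary": "LogLossBinary",
    "multiclass": "LogLossMulticlass",
    "regression": "R2",
}


def resolve_objective(problem_type: str, objective: str = "auto") -> str:
    """Return the objective name to optimize for the given problem type."""
    if objective != "auto":
        # An explicitly chosen objective is used unchanged.
        return objective
    try:
        return DEFAULT_OBJECTIVES[problem_type]
    except KeyError:
        raise ValueError(f"Unknown problem type: {problem_type!r}") from None


print(resolve_objective("binary"))                   # LogLossBinary
print(resolve_objective("regression"))               # R2
print(resolve_objective("regression", objective="MSE"))  # MSE
```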
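The max_time description says a string can give the time in seconds, minutes, or hours, but does not list the accepted formats. The following is a minimal sketch of such a conversion, assuming strings of the form "<number> <unit>" (for example "5 minutes" or "1 hour"); the function name max_time_to_seconds and the set of unit spellings are assumptions, and evalml's own parser may accept different formats.

```python
import re

# Hypothetical sketch: convert a max_time value to seconds.
# ASSUMPTION: strings look like "<number> <unit>", e.g. "30 seconds",
# "5 minutes", "1 hour". The exact formats evalml accepts may differ.
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
}


def max_time_to_seconds(max_time):
    """Return max_time in seconds; integers are already seconds."""
    if isinstance(max_time, int):
        return max_time
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", max_time)
    if match is None:
        raise ValueError(f"Cannot parse max_time: {max_time!r}")
    value, unit = match.groups()
    unit = unit.lower()
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"Unknown time unit: {unit!r}")
    return float(value) * _UNIT_SECONDS[unit]


print(max_time_to_seconds(120))          # 120
print(max_time_to_seconds("5 minutes"))  # 300.0
print(max_time_to_seconds("1 hour"))     # 3600.0
```

Whatever the format, the documented behavior is the same: once the elapsed time exceeds this budget, no new pipeline search is started.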
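The patience and tolerance descriptions leave the comparison rule implicit. The sketch below shows one way the two could interact, assuming scores are checked in search order and that tolerance is a fraction (0.01 meaning 1%) of the best score so far. The function should_stop_early is hypothetical and is not evalml's code.

```python
# Hypothetical sketch of patience/tolerance early stopping (not evalml's code).
# ASSUMPTIONS: scores are in search order; tolerance is a relative fraction
# of the best score so far (0.01 == 1%).
def should_stop_early(scores, patience, tolerance=0.0, greater_is_better=True):
    """Return True once `patience` iterations pass without a significant improvement."""
    if patience is None or not scores:
        return False  # early stopping disabled, or nothing to compare yet
    if patience <= 0:
        raise ValueError("patience must be positive")
    best = scores[0]
    without_improvement = 0
    for score in scores[1:]:
        improved = score > best if greater_is_better else score < best
        # Relative change versus the best score; avoid dividing by zero.
        change = abs(score - best) / abs(best) if best != 0 else abs(score - best)
        if improved and change > tolerance:
            best = score
            without_improvement = 0
        else:
            without_improvement += 1
        if without_improvement >= patience:
            return True
    return False


scores = [0.70, 0.75, 0.751, 0.74, 0.749]
print(should_stop_early(scores, patience=3, tolerance=0.01))  # True
print(should_stop_early(scores, patience=3, tolerance=0.0))   # False
```

With a 1% tolerance, the step from 0.75 to 0.751 is too small to reset the counter, so three non-improving iterations accumulate. With a tolerance of 0, the same step counts as an improvement and the search continues.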
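The two callback parameters only fix the argument order. Below are example callbacks that follow that order and record what they receive. The function names and the log list are hypothetical, and the code makes no assumption about the keys inside the results dictionary.

```python
# Hypothetical callbacks following the documented argument order:
#   start_iteration_callback(pipeline_class, parameters, automl)
#   add_result_callback(results, untrained_pipeline, automl)
log = []


def on_start_iteration(pipeline_class, parameters, automl):
    """Record which pipeline class is about to be trained, and with what parameters."""
    name = getattr(pipeline_class, "__name__", str(pipeline_class))
    log.append(("start", name, dict(parameters)))


def on_add_result(results, untrained_pipeline, automl):
    """Record the results dictionary for the pipeline that just finished training."""
    log.append(("result", results))
```

These functions would then be passed to the constructor as AutoMLSearch(problem_type='binary', start_iteration_callback=on_start_iteration, add_result_callback=on_add_result).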
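The n_jobs rule can be checked with a little arithmetic. With 8 CPUs, n_jobs=-1 gives 8 + 1 - 1 = 8 workers and n_jobs=-2 gives 8 + 1 - 2 = 7. The helper effective_n_jobs below is hypothetical. Treating 0 as invalid and clamping the result to at least 1 are assumptions borrowed from joblib's convention, not statements from this docstring.

```python
import os


def effective_n_jobs(n_jobs, n_cpus=None):
    """Number of parallel workers implied by the documented n_jobs rules (hypothetical helper)."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs is None or n_jobs == 1:
        return 1  # None and 1 are equivalent: no parallelism
    if n_jobs == 0:
        # ASSUMPTION (joblib convention): 0 workers is meaningless.
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 uses all CPUs; below -1, (n_cpus + 1 + n_jobs) are used.
        # ASSUMPTION (joblib convention): never go below one worker.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


print(effective_n_jobs(-1, n_cpus=8))  # 8
print(effective_n_jobs(-2, n_cpus=8))  # 7
print(effective_n_jobs(None, n_cpus=8))  # 1
```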