evalml.automl.make_data_splitter

evalml.automl.make_data_splitter(X, y, problem_type, problem_configuration=None, n_splits=3, shuffle=True, random_state=0)

Given the training data and ML problem parameters, compute a data splitting method to use during AutoML search.

Parameters
  • X (pd.DataFrame, ww.DataTable) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, ww.DataColumn) – The target training data of length [n_samples].

  • problem_type (ProblemType) – The type of machine learning problem.

  • problem_configuration (dict, None) – Additional parameters needed to configure the search. For example, in time series problems, values should be passed in for the gap and max_delay variables.

  • n_splits (int, None) – The number of CV splits, if applicable. Defaults to 3.

  • shuffle (bool) – Whether to shuffle the data before splitting, if applicable. Defaults to True.

  • random_state (int, np.random.RandomState) – The random seed/state. Defaults to 0.
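As a rough sketch of the problem_configuration argument described above for time series problems: the dictionary below assumes the keys are named "gap" and "max_delay", matching the variable names in the parameter description. The values are arbitrary illustrations, not recommended settings.

```python
# Hypothetical configuration for a time series problem.
# Assumption: the dict keys mirror the variable names mentioned in the
# parameter description ("gap" and "max_delay"); the values are made up.
problem_configuration = {
    "gap": 0,        # illustrative value only
    "max_delay": 2,  # illustrative value only
}

# This dict would be passed as the problem_configuration argument of
# make_data_splitter alongside X, y, and a time series problem_type.
print(sorted(problem_configuration))
```

For non-time-series problems the default of None is presumably sufficient, since the description only names time series as a case that needs extra values.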

Returns

The data splitting method to use during AutoML search.

Return type

sklearn.model_selection.BaseCrossValidator
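Because the return type is a sklearn.model_selection.BaseCrossValidator, the returned object can be consumed through the usual scikit-learn split() protocol. The sketch below uses scikit-learn's KFold purely as a stand-in for the returned splitter; which concrete splitter make_data_splitter picks for a given problem type is not stated here, so treat that choice as an assumption. The toy arrays are also made up for illustration.

```python
import numpy as np
from sklearn.model_selection import BaseCrossValidator, KFold

# Toy data: 9 samples, 2 features (illustrative only).
X = np.arange(18).reshape(9, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0])

# Stand-in for the object returned by make_data_splitter.
# Assumption: KFold is used only because it is a BaseCrossValidator;
# it is not necessarily what evalml returns.
splitter = KFold(n_splits=3, shuffle=True, random_state=0)
assert isinstance(splitter, BaseCrossValidator)

fold_sizes = []
for train_idx, valid_idx in splitter.split(X, y):
    # Each iteration yields integer index arrays for one CV fold.
    X_train, X_valid = X[train_idx], X[valid_idx]
    y_train, y_valid = y[train_idx], y[valid_idx]
    fold_sizes.append((len(train_idx), len(valid_idx)))

print(fold_sizes)
```

Any object honoring this interface also exposes get_n_splits(), which reports how many train/validation pairs split() will yield.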