evalml.preprocessing.split_data

evalml.preprocessing.split_data(X, y, problem_type, problem_configuration=None, test_size=0.2, random_state=0)

Splits data into train and test sets.

Parameters
  • X (ww.DataTable, pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].

  • y (ww.DataColumn, pd.Series, or np.ndarray) – Target data of length [n_samples].

  • problem_type (str or ProblemTypes) – Type of supervised learning problem. See evalml.problem_types.problemtype.all_problem_types for a full list.

  • problem_configuration (dict, None) – Additional parameters needed to configure the search. For example, in time series problems, values should be passed in for the gap and max_delay variables.

  • test_size (float) – Fraction of data points to include in the test set, given as a value between 0 and 1. Defaults to 0.2 (20%).

  • random_state (int, np.random.RandomState) – Seed for the random number generator. Defaults to 0.

Returns

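Feature and target data, each split into train and test sets. The four values follow the order of the return type: train features, test features, train target, test target.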

Return type

ww.DataTable, ww.DataTable, ww.DataColumn, ww.DataColumn
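
Example

The following is a minimal sketch of the contract described above, not evalml's implementation. It uses only the Python standard library so that it runs without evalml installed. The helper name sketch_split is hypothetical, and the always-shuffle behavior is an assumption: the real function may split differently depending on problem_type (for instance for time series problems) and returns Woodwork structures rather than lists.

```python
import random


def sketch_split(X, y, test_size=0.2, random_state=0):
    """Hypothetical stand-in that mimics the documented return order.

    Not evalml's implementation: it always shuffles and ignores
    problem_type and problem_configuration.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")

    # Shuffle row indices with a seeded generator so the split is reproducible.
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)

    # test_size is a fraction of the rows, not a row count.
    n_test = round(len(indices) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]

    # Same order as the documented return type: features first, then targets.
    return X_train, X_test, y_train, y_test


# Ten samples with two features each, and a binary target.
X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]

X_train, X_test, y_train, y_test = sketch_split(X, y, test_size=0.2, random_state=0)
print(len(X_train), len(X_test))  # 8 2
```

With evalml installed, the corresponding call based on the signature above would be split_data(X, y, problem_type="binary", test_size=0.2, random_state=0), unpacked into the same four names.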