`openml.extensions.sklearn`.SklearnExtension

class openml.extensions.sklearn.SklearnExtension

Connects scikit-learn to OpenML-Python. Estimators used with this extension must be scikit-learn compatible, i.e., they must be subclasses of sklearn.base.BaseEstimator.

classmethod can_handle_flow(flow: OpenMLFlow) → bool

Check whether a given flow describes a scikit-learn estimator.

This is done by parsing the external_version field.

Parameters:

flow: OpenMLFlow

Returns:

bool
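The description above only says that the check parses the external_version field. The following minimal sketch illustrates that idea, assuming the field holds comma-separated package==version pins such as openml==0.14.1,sklearn==1.3.0. The helpers parse_external_version and looks_like_sklearn_flow are hypothetical and not part of OpenML-Python; the real method may apply different rules.

```python
from __future__ import annotations


def parse_external_version(external_version: str) -> dict[str, str]:
    """Split a comma-separated 'package==version' string into a dict."""
    packages = {}
    for part in external_version.split(","):
        name, sep, version = part.strip().partition("==")
        if sep:  # skip fragments that are not 'name==version' pins
            packages[name] = version
    return packages


def looks_like_sklearn_flow(external_version: str | None) -> bool:
    """Return True if the (assumed) external_version string pins sklearn."""
    if not external_version:
        return False
    return "sklearn" in parse_external_version(external_version)


print(looks_like_sklearn_flow("openml==0.14.1,sklearn==1.3.0"))  # True
print(looks_like_sklearn_flow("openml==0.14.1,torch==2.0.1"))    # False
```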

classmethod can_handle_model(model: Any) → bool

Check whether a model is an instance of sklearn.base.BaseEstimator.

Parameters:

model: Any

Returns:

bool

check_if_model_fitted(model: Any) → bool

Check whether the model has already been fitted (trained).

Parameters:

model: Any

Returns:

bool
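How the fitted check is performed is not stated here. The sketch below illustrates the scikit-learn convention that fit() stores learned state in attributes ending with an underscore (for example coef_), which scikit-learn's own check_is_fitted relies on by default. Whether this extension uses exactly this heuristic is an assumption; ToyEstimator and is_fitted are hypothetical names.

```python
from __future__ import annotations


class ToyEstimator:
    """Hypothetical stand-in for a scikit-learn estimator."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # hyperparameter: no trailing underscore

    def fit(self, X, y):
        # Learned state follows the trailing-underscore convention.
        self.mean_ = sum(y) / len(y)
        return self


def is_fitted(model) -> bool:
    """Return True if the model has any attribute ending in '_' (not dunder)."""
    return any(
        name.endswith("_") and not name.startswith("__")
        for name in vars(model)
    )


model = ToyEstimator()
print(is_fitted(model))                          # False
print(is_fitted(model.fit([[0], [1]], [0, 2])))  # True
```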

create_setup_string(model: Any) → str

Create a string which can be used to reinstantiate the given model.

Parameters:

model: Any

Returns:

str

flow_to_model(flow: OpenMLFlow, initialize_with_defaults: bool = False, strict_version: bool = True) → Any

Initialize a scikit-learn model from a flow.

Parameters:

flow: mixed: The object to deserialize (can be a flow object, or any serialized parameter value that is accepted by …).
initialize_with_defaults: bool, optional (default=False): If True, the hyperparameter values stored in the flow are ignored and a model with its default hyperparameter values is returned.
strict_version: bool, optional (default=True): Whether to fail if the version requirements are not fulfilled.

Returns:

mixed
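To illustrate what the two flags change, here is a toy sketch that rebuilds an object from a plain dictionary instead of an OpenMLFlow. The flow layout (class_name, version, parameters), the REGISTRY lookup, the exact-match version check and the choice to downgrade a mismatch to a warning when strict_version=False are all hypothetical simplifications; the real method operates on OpenMLFlow objects.

```python
from __future__ import annotations

import warnings
from dataclasses import dataclass


@dataclass
class ToySVC:
    """Hypothetical estimator with default hyperparameters."""
    C: float = 1.0
    kernel: str = "rbf"


# Hypothetical lookup table and installed version, standing in for imports.
REGISTRY = {"ToySVC": ToySVC}
INSTALLED_VERSION = "1.3.0"


def toy_flow_to_model(flow: dict, initialize_with_defaults: bool = False,
                      strict_version: bool = True):
    """Rebuild a model from a minimal, hypothetical flow dictionary."""
    if flow["version"] != INSTALLED_VERSION:
        message = f"flow needs {flow['version']}, found {INSTALLED_VERSION}"
        if strict_version:
            raise ValueError(message)
        warnings.warn(message)  # non-strict: continue despite the mismatch
    cls = REGISTRY[flow["class_name"]]
    if initialize_with_defaults:
        return cls()  # ignore the stored hyperparameter values
    return cls(**flow["parameters"])


flow = {"class_name": "ToySVC", "version": "1.3.0",
        "parameters": {"C": 10.0, "kernel": "linear"}}
print(toy_flow_to_model(flow))  # ToySVC(C=10.0, kernel='linear')
print(toy_flow_to_model(flow, initialize_with_defaults=True))  # ToySVC(C=1.0, kernel='rbf')
```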

get_version_information() → list[str]

List versions of libraries required by the flow.

The libraries listed are Python, scikit-learn, NumPy and SciPy.

Returns:

List
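A minimal sketch of collecting such version information with the standard library is shown below. The name==version output format, the distribution names and the not-installed placeholder are assumptions made for illustration; the strings returned by get_version_information may be formatted differently.

```python
from __future__ import annotations

import platform
from importlib import metadata


def collect_versions(dists: tuple[str, ...] = ("scikit-learn", "numpy", "scipy")) -> list[str]:
    """Return 'name==version' strings for Python and the given distributions."""
    versions = [f"Python=={platform.python_version()}"]
    for dist in dists:
        try:
            versions.append(f"{dist}=={metadata.version(dist)}")
        except metadata.PackageNotFoundError:
            versions.append(f"{dist}==not-installed")  # keep the slot visible
    return versions


print(collect_versions())
```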

instantiate_model_from_hpo_class(model: Any, trace_iteration: OpenMLTraceIteration) → Any

Instantiate the base estimator that the hyperparameter optimization model searches over, using the hyperparameter settings of one trace iteration.

Parameters:

model: Any: A hyperparameter optimization model that defines the model to be instantiated.
trace_iteration: OpenMLTraceIteration: The trace iteration describing the hyperparameter settings to instantiate.

Returns:

Any
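The sketch below shows the general idea with toy stand-ins: take the search object's base estimator and apply the hyperparameters recorded in one trace iteration. It assumes trace parameters are keyed as parameter_<name> with JSON-encoded values; ToyForest, ToySearch and instantiate_from_trace are hypothetical. A real scikit-learn estimator would be updated with set_params rather than setattr, and copying the base estimator instead of modifying it in place is a choice of this sketch.

```python
from __future__ import annotations

import copy
import json
from dataclasses import dataclass


@dataclass
class ToyForest:
    """Hypothetical base estimator."""
    n_estimators: int = 100
    max_depth: int | None = None


@dataclass
class ToySearch:
    """Hypothetical stand-in for a search object such as GridSearchCV."""
    estimator: ToyForest


PREFIX = "parameter_"  # assumed prefix of trace parameter names


def instantiate_from_trace(search: ToySearch, trace_parameters: dict[str, str]) -> ToyForest:
    """Copy the base estimator and apply one trace iteration's settings."""
    params = {
        name[len(PREFIX):]: json.loads(value)  # strip prefix, decode JSON
        for name, value in trace_parameters.items()
        if name.startswith(PREFIX)
    }
    base = copy.deepcopy(search.estimator)  # leave the search object untouched
    for name, value in params.items():
        setattr(base, name, value)  # a real estimator would use set_params
    return base


search = ToySearch(estimator=ToyForest())
iteration = {"parameter_n_estimators": "500", "parameter_max_depth": "8"}
print(instantiate_from_trace(search, iteration))  # ToyForest(n_estimators=500, max_depth=8)
```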

is_estimator(model: Any) → bool

Check whether the given model is a scikit-learn estimator.

This function is only required for backwards compatibility and will be removed in the near future.

Parameters:

model: Any

Returns:

bool

model_to_flow(model: Any) → OpenMLFlow

Transform a scikit-learn model to a flow for uploading it to OpenML.

Parameters:

model: Any

Returns:

OpenMLFlow

obtain_parameter_values(flow: OpenMLFlow, model: Any = None) → list[dict[str, Any]]

Extract all parameter settings required for the flow from the model.

If no explicit model is provided, the parameters will be extracted from flow.model instead.

Parameters:

flow: OpenMLFlow: OpenMLFlow object containing flow ids, i.e., it must have been downloaded from the server.
model: Any, optional (default=None): The model from which to obtain the parameter values. Must match the flow signature. If None, use the model specified in OpenMLFlow.model.

Returns:

list: A list of dicts, where each dict has the following entries:
- oml:name: str: The OpenML parameter name.
- oml:value: mixed: A representation of the parameter value.
- oml:component: int: The id of the flow to which the parameter belongs.
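The following sketch shows how the documented return structure can be built by walking a nested flow, attributing each parameter to the id of the (sub)flow it belongs to. The dictionary layout used for the flow, the flow ids 7001 and 7002, and the JSON encoding of oml:value are hypothetical; the real method reads this information from an OpenMLFlow and the model.

```python
from __future__ import annotations

import json


def collect_parameter_values(flow: dict) -> list[dict[str, object]]:
    """Walk a hypothetical nested flow dict and emit one entry per parameter."""
    entries = []
    for name, value in flow["parameters"].items():
        entries.append({
            "oml:name": name,
            "oml:value": json.dumps(value),  # assumed JSON representation
            "oml:component": flow["flow_id"],
        })
    for sub_flow in flow.get("components", {}).values():
        entries.extend(collect_parameter_values(sub_flow))  # recurse into steps
    return entries


pipeline_flow = {
    "flow_id": 7001,  # hypothetical id of the outer pipeline flow
    "parameters": {"memory": None},
    "components": {
        "svc": {"flow_id": 7002, "parameters": {"C": 1.0, "kernel": "rbf"}},
    },
}
for entry in collect_parameter_values(pipeline_flow):
    print(entry)
```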

seed_model(model: Any, seed: int | None = None) → Any

Set the random state of all the unseeded components of a model and return the seeded model.

Required so that all seed information can be uploaded to OpenML for reproducible results.

Components that are already seeded keep their seed. In this case, only integer seeds are allowed; an exception is raised if a RandomState object was used as the seed.

Parameters:

model: sklearn model: The model to be seeded.
seed: int, optional (default=None): The seed used to initialize the RandomState. Unseeded subcomponents are seeded with a random number drawn from this RandomState.

Returns:

Any
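A minimal sketch of the seeding rule described above, operating on a flat dictionary of parameter names like those returned by scikit-learn's get_params(deep=True) (e.g. svc__random_state). It uses Python's random.Random rather than a NumPy RandomState, and the seed range MAX_SEED is an assumption; seed_params is a hypothetical helper, not the library's implementation.

```python
from __future__ import annotations

import random

MAX_SEED = 2**31 - 1  # assumed upper bound (exclusive) for generated seeds


def seed_params(params: dict[str, object], seed: int | None = None) -> dict[str, object]:
    """Return a copy of params with every unset '*random_state' filled in."""
    rng = random.Random(seed)
    seeded = dict(params)
    for name in sorted(seeded):  # fixed order so results do not depend on insertion
        if not name.endswith("random_state"):
            continue
        value = seeded[name]
        if value is None:
            seeded[name] = rng.randrange(MAX_SEED)  # seed an unseeded component
        elif not isinstance(value, int):
            raise ValueError(f"{name} is seeded with a non-integer value: {value!r}")
    return seeded


params = {"pca__random_state": None, "svc__random_state": 42, "svc__C": 1.0}
print(seed_params(params, seed=1))
```

Existing integer seeds are left untouched, and calling the helper twice with the same seed yields the same assignments, which is the reproducibility property the description asks for.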

classmethod trim_flow_name(long_name: str, extra_trim_length: int = 100, _outer: bool = True) → str

Shorten a generated scikit-learn flow name, aiming for at most extra_trim_length characters.

Flow names are assumed to follow the structure (model_selection)? (pipeline)? (steps)+ and are shortened to sklearn.(selection.)?(pipeline.)?(steps)+. For example (whitespace and newlines added for readability):

sklearn.pipeline.Pipeline(
    columntransformer=sklearn.compose._column_transformer.ColumnTransformer(
        numeric=sklearn.pipeline.Pipeline(
            imputer=sklearn.preprocessing.imputation.Imputer,
            standardscaler=sklearn.preprocessing.data.StandardScaler),
        nominal=sklearn.pipeline.Pipeline(
            simpleimputer=sklearn.impute.SimpleImputer,
            onehotencoder=sklearn.preprocessing._encoders.OneHotEncoder)),
    variancethreshold=sklearn.feature_selection.variance_threshold.VarianceThreshold,
    svc=sklearn.svm.classes.SVC)

-> sklearn.Pipeline(ColumnTransformer,VarianceThreshold,SVC)

Parameters:

long_name: str: The full flow name generated by the scikit-learn extension.
extra_trim_length: int (default=100): If the trimmed name would exceed extra_trim_length characters, the short name is trimmed further to reduce its length. The result is not guaranteed to stay within extra_trim_length characters.
_outer: bool (default=True): For internal use only. Specifies whether this is the outermost call; recursive calls pass False.

Returns:

str
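The core of the shortening can be sketched as follows: keep the outer class name and replace each top-level step with its bare class name. This simplified, hypothetical simple_trim handles only the pipeline case from the example above; it ignores the model-selection prefix and the extra_trim_length logic of the real method.

```python
from __future__ import annotations


def split_top_level(text: str) -> list[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def class_name(dotted: str) -> str:
    """'sklearn.svm.classes.SVC(...)' -> 'SVC'."""
    return dotted.split("(", 1)[0].rsplit(".", 1)[-1]


def simple_trim(long_name: str) -> str:
    """Keep the outer class and the class names of its direct steps."""
    name = "".join(long_name.split())  # drop whitespace and newlines
    if "(" not in name:
        return "sklearn." + class_name(name)
    head, body = name.split("(", 1)
    body = body[:-1]  # remove the outer closing parenthesis
    steps = [class_name(part.split("=", 1)[-1]) for part in split_top_level(body)]
    return f"sklearn.{class_name(head)}({','.join(steps)})"


example = """
sklearn.pipeline.Pipeline(
    columntransformer=sklearn.compose._column_transformer.ColumnTransformer(
        numeric=sklearn.pipeline.Pipeline(
            imputer=sklearn.preprocessing.imputation.Imputer,
            standardscaler=sklearn.preprocessing.data.StandardScaler),
        nominal=sklearn.pipeline.Pipeline(
            simpleimputer=sklearn.impute.SimpleImputer,
            onehotencoder=sklearn.preprocessing._encoders.OneHotEncoder)),
    variancethreshold=sklearn.feature_selection.variance_threshold.VarianceThreshold,
    svc=sklearn.svm.classes.SVC)
"""
print(simple_trim(example))  # sklearn.Pipeline(ColumnTransformer,VarianceThreshold,SVC)
```

Splitting only at depth-zero commas is what keeps the nested ColumnTransformer sub-pipelines from being mistaken for top-level steps.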