openml.extensions.sklearn.SklearnExtension

class openml.extensions.sklearn.SklearnExtension

Connect scikit-learn to OpenML-Python. Estimators used with this extension must be scikit-learn compatible, i.e., they must be subclasses of sklearn.base.BaseEstimator.

classmethod can_handle_flow(flow: OpenMLFlow) → bool

Check whether a given flow describes a scikit-learn estimator.

This is done by parsing the external_version field.

Parameters:
flow: OpenMLFlow
Returns:
bool
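The check works on the flow's external_version string. The sketch below is a rough, hypothetical illustration of that kind of parsing, assuming external_version is a comma-separated list of package==version entries (for example "openml==0.14.2,sklearn==1.3.0"). The helper name is made up and is not part of OpenML-Python, and the real rule used by can_handle_flow may differ.

```python
def looks_like_sklearn_flow(external_version: str) -> bool:
    """Hypothetical helper: True if external_version pins scikit-learn.

    Assumes the format "pkg1==v1,pkg2==v2,...". This is an illustration,
    not the exact rule used by SklearnExtension.can_handle_flow.
    """
    for entry in external_version.split(","):
        # Split each "package==version" pair and compare only the package name.
        name, sep, _version = entry.strip().partition("==")
        if sep and name == "sklearn":
            return True
    return False


print(looks_like_sklearn_flow("openml==0.14.2,sklearn==1.3.0"))  # True
print(looks_like_sklearn_flow("openml==0.14.2,torch==2.1.0"))    # False
```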
classmethod can_handle_model(model: Any) → bool

Check whether a model is an instance of sklearn.base.BaseEstimator.

Parameters:
model: Any
Returns:
bool
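The documented check amounts to an isinstance test against BaseEstimator. A minimal sketch using scikit-learn directly; the helper name is hypothetical and the code mirrors the described behavior rather than calling OpenML-Python.

```python
from typing import Any

from sklearn.base import BaseEstimator
from sklearn.linear_model import LogisticRegression


def is_sklearn_model(model: Any) -> bool:
    # Any subclass of BaseEstimator counts, including pipelines and meta-estimators.
    return isinstance(model, BaseEstimator)


print(is_sklearn_model(LogisticRegression()))  # True
print(is_sklearn_model("not a model"))         # False
```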
check_if_model_fitted(model: Any) → bool

Return True if the model has already been fitted (trained), and False otherwise.

Parameters:
model: Any
Returns:
bool
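scikit-learn exposes a fitted-state check through sklearn.utils.validation.check_is_fitted, which raises NotFittedError for an unfitted estimator. The sketch below wraps it into a True/False helper; the helper name is hypothetical, and whether SklearnExtension relies on exactly this call internally is an assumption.

```python
from typing import Any

from sklearn.exceptions import NotFittedError
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import check_is_fitted


def model_is_fitted(model: Any) -> bool:
    """Hypothetical helper returning True/False instead of raising."""
    try:
        check_is_fitted(model)
    except NotFittedError:
        return False
    return True


clf = DecisionTreeClassifier()
print(model_is_fitted(clf))  # False
clf.fit([[0.0], [1.0]], [0, 1])
print(model_is_fitted(clf))  # True
```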
create_setup_string(model: Any) → str

Create a string which can be used to reinstantiate the given model.

Parameters:
model: Any
Returns:
str
flow_to_model(flow: OpenMLFlow, initialize_with_defaults: bool = False, strict_version: bool = True) → Any

Initializes a sklearn model based on a flow.

Parameters:
flow: mixed

The object to deserialize (a flow object, or any serialized parameter value that is accepted by …).

initialize_with_defaults: bool, optional (default=False)

If True, the hyperparameter values stored in the flow are ignored and the model is instantiated with its default hyperparameter values.

strict_version: bool, optional (default=True)

Whether to fail if version requirements are not fulfilled.

Returns:
mixed
get_version_information() → List[str]

List versions of libraries required by the flow.

Libraries listed are Python, scikit-learn, numpy and scipy.

Returns:
List
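A minimal sketch of collecting the same four version numbers with the standard library, assuming the PyPI distribution names scikit-learn, numpy and scipy. The helper name and the "name==version" output format are assumptions for illustration; the strings returned by get_version_information itself may be formatted differently.

```python
import platform
from importlib import metadata
from typing import List


def collect_version_information() -> List[str]:
    """Hypothetical helper listing Python, scikit-learn, NumPy and SciPy versions."""
    versions = [f"python=={platform.python_version()}"]
    # Look up installed distributions by their PyPI names.
    for dist in ("scikit-learn", "numpy", "scipy"):
        try:
            versions.append(f"{dist}=={metadata.version(dist)}")
        except metadata.PackageNotFoundError:
            versions.append(f"{dist}==<not installed>")
    return versions


print(collect_version_information())
```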
instantiate_model_from_hpo_class(model: Any, trace_iteration: OpenMLTraceIteration) → Any

Instantiate a base_estimator which can be searched over by the hyperparameter optimization model.

Parameters:
model: Any

A hyperparameter optimization model which defines the model to be instantiated.

trace_iteration: OpenMLTraceIteration

The trace iteration describing the hyperparameter settings to instantiate.

Returns:
Any
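For a search object such as GridSearchCV, this roughly means taking the estimator being searched over and applying the hyperparameters recorded in one trace iteration. The sketch below assumes those hyperparameters are already available as a plain dict, standing in for the OpenMLTraceIteration argument, and uses only scikit-learn; the helper name is hypothetical and this is not the extension's implementation.

```python
from typing import Any, Dict

from sklearn.base import clone
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def instantiate_from_search(search: Any, params: Dict[str, Any]) -> Any:
    """Hypothetical helper: build the base estimator with one iteration's settings."""
    # clone() returns an unfitted copy, so the search object itself is left unchanged.
    base = clone(search.estimator)
    return base.set_params(**params)


search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1.0, 10.0]})
# Plain dict standing in for the settings stored in one OpenMLTraceIteration.
model = instantiate_from_search(search, {"C": 10.0})
print(model.get_params()["C"])  # 10.0
```

Cloning is a design choice of this sketch: it keeps the original search object untouched so it can still be fitted or inspected afterwards.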
is_estimator(model: Any) → bool

Check whether the given model is a scikit-learn estimator.

This function is only required for backwards compatibility and will be removed in the near future.

Parameters:
model: Any
Returns:
bool
model_to_flow(model: Any) → OpenMLFlow

Transform a scikit-learn model to a flow for uploading it to OpenML.

Parameters:
model: Any
Returns:
OpenMLFlow
obtain_parameter_values(flow: OpenMLFlow, model: Any | None = None) → List[Dict[str, Any]]

Extracts all parameter settings required for the flow from the model.

If no explicit model is provided, the parameters will be extracted from flow.model instead.

Parameters:
flow: OpenMLFlow

OpenMLFlow object (containing flow ids, i.e., it has to be downloaded from the server)

model: Any, optional (default=None)

The model from which to obtain the parameter values. Must match the flow signature. If None, use the model specified in OpenMLFlow.model.

Returns:
list

A list of dicts, where each dict has the following entries:

- oml:name : str: The OpenML parameter name
- oml:value : mixed: A representation of the parameter value
- oml:component : int: The flow id to which the parameter belongs
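The shape of the returned list can be illustrated with a small, hypothetical helper. It assumes a flat parameter dict belonging to a single flow and JSON-encodes each value; the real method also walks nested components and attaches each component's own flow id, and its value encoding may differ. The flow id 12345 is made up.

```python
import json
from typing import Any, Dict, List


def to_oml_parameter_list(params: Dict[str, Any], flow_id: int) -> List[Dict[str, Any]]:
    """Hypothetical helper mirroring the documented return shape for one flow."""
    return [
        {"oml:name": name, "oml:value": json.dumps(value), "oml:component": flow_id}
        for name, value in sorted(params.items())
    ]


# 12345 is a made-up flow id used only for illustration.
for entry in to_oml_parameter_list({"max_depth": 3, "criterion": "gini"}, flow_id=12345):
    print(entry)
```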

seed_model(model: Any, seed: int | None = None) → Any

Set the random state of all the unseeded components of a model and return the seeded model.

Required so that all seed information can be uploaded to OpenML for reproducible results.

Models that are already seeded keep their seed. In this case, only integer seeds are allowed; an exception is raised if a RandomState object was used as the seed.

Parameters:
model: sklearn model

The model to be seeded

seed: int, optional (default=None)

The seed to initialize the RandomState with. Unseeded subcomponents will be seeded with a random number from the RandomState.

Returns:
Any
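The seeding rule described above can be sketched on a flat, get_params()-style dict. This is a hypothetical illustration, not the extension's code: it uses Python's random.Random in place of NumPy's RandomState, treats keys named random_state or ending in __random_state as seeds, and picks an arbitrary upper bound for the drawn integers.

```python
import random
from typing import Any, Dict, Optional


def seed_params(params: Dict[str, Any], seed: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical sketch: fill unseeded random_state entries, keep integer seeds.

    Any other non-None seed value (e.g. a RandomState object) is rejected.
    """
    rng = random.Random(seed)
    seeded = dict(params)
    # Sorted keys make the assignment order, and thus the result, deterministic.
    for key in sorted(seeded):
        if key != "random_state" and not key.endswith("__random_state"):
            continue
        value = seeded[key]
        if value is None:
            seeded[key] = rng.randint(0, 2**16)  # arbitrary upper bound for this sketch
        elif not isinstance(value, int):
            raise ValueError(f"{key} must be None or an int, got {type(value).__name__}")
    return seeded


params = {"random_state": None, "clf__random_state": 7, "clf__max_depth": 3}
print(seed_params(params, seed=42))
```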
classmethod trim_flow_name(long_name: str, extra_trim_length: int = 100, _outer: bool = True) → str

Shorten a generated sklearn flow name, trying to keep it to at most extra_trim_length characters.

Flows are assumed to have the naming structure (model_selection)? (pipeline)? (steps)+ and are shortened to sklearn.(selection.)?(pipeline.)?(steps)+. For example (whitespace and newlines added for readability):

sklearn.pipeline.Pipeline(
    columntransformer=sklearn.compose._column_transformer.ColumnTransformer(
        numeric=sklearn.pipeline.Pipeline(
            imputer=sklearn.preprocessing.imputation.Imputer,
            standardscaler=sklearn.preprocessing.data.StandardScaler),
        nominal=sklearn.pipeline.Pipeline(
            simpleimputer=sklearn.impute.SimpleImputer,
            onehotencoder=sklearn.preprocessing._encoders.OneHotEncoder)),
    variancethreshold=sklearn.feature_selection.variance_threshold.VarianceThreshold,
    svc=sklearn.svm.classes.SVC)

-> sklearn.Pipeline(ColumnTransformer,VarianceThreshold,SVC)

Parameters:
long_name: str

The full flow name generated by the scikit-learn extension.

extra_trim_length: int (default=100)

If the trimmed name would exceed extra_trim_length characters, additional trimming of the short name is performed. This reduces the produced short name length. There is no guarantee the end result will not exceed extra_trim_length.

_outer: bool (default=True)

For internal use only. Indicates whether this is the outermost call; recursive calls pass False.

Returns:
str
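The pipeline example above can be reproduced with a simplified, hypothetical sketch that keeps the outer class and the class names of its direct children only. It does not handle the model_selection prefix or the extra_trim_length trimming, and the function names are made up; it is not the extension's implementation.

```python
from typing import List


def split_top_level(args: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts: List[str] = []
    depth, start = 0, 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(args[start:i])
            start = i + 1
    parts.append(args[start:])
    return parts


def short_class_name(component: str) -> str:
    """Map 'step=sklearn.a.b.Cls(...)' to 'Cls'."""
    head = component.split("(", 1)[0]   # drop any nested components
    qualified = head.split("=", 1)[-1]  # drop the step name, if present
    return qualified.rsplit(".", 1)[-1]  # keep only the class name


def shorten_one_level(long_name: str) -> str:
    """Hypothetical one-level version of the documented shortening."""
    name = "".join(long_name.split())  # remove whitespace and newlines
    if "(" not in name:
        return "sklearn." + short_class_name(name)
    outer, inner = name.split("(", 1)
    inner = inner[: inner.rfind(")")]  # strip the outer closing parenthesis
    children = ",".join(short_class_name(part) for part in split_top_level(inner))
    return f"sklearn.{short_class_name(outer)}({children})"


example = (
    "sklearn.pipeline.Pipeline("
    "columntransformer=sklearn.compose._column_transformer.ColumnTransformer("
    "numeric=sklearn.pipeline.Pipeline("
    "imputer=sklearn.preprocessing.imputation.Imputer,"
    "standardscaler=sklearn.preprocessing.data.StandardScaler),"
    "nominal=sklearn.pipeline.Pipeline("
    "simpleimputer=sklearn.impute.SimpleImputer,"
    "onehotencoder=sklearn.preprocessing._encoders.OneHotEncoder)),"
    "variancethreshold=sklearn.feature_selection.variance_threshold.VarianceThreshold,"
    "svc=sklearn.svm.classes.SVC)"
)
print(shorten_one_level(example))  # sklearn.Pipeline(ColumnTransformer,VarianceThreshold,SVC)
```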