openml.extensions.sklearn.SklearnExtension

class openml.extensions.sklearn.SklearnExtension

Connect scikit-learn to OpenML-Python.

classmethod can_handle_flow(flow: 'OpenMLFlow') → bool

Check whether a given flow describes a scikit-learn estimator.

This is done by parsing the external_version field.

Parameters
flow: OpenMLFlow
Returns
bool
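The check on external_version can be pictured with a small standalone sketch. It assumes the field is a comma-separated string of `package==version` entries; the helper name `looks_like_sklearn_flow` is hypothetical and is not part of OpenML-Python.

```python
from typing import Optional


def looks_like_sklearn_flow(external_version: Optional[str]) -> bool:
    """Return True if any entry of the version string names sklearn.

    Assumption: entries look like "package==version" and are separated
    by commas, e.g. "openml==0.10.2,sklearn==0.21.2".
    """
    if not external_version:
        return False
    entries = [entry.strip() for entry in external_version.split(",")]
    return any(entry.startswith("sklearn==") for entry in entries)


# Illustrative version strings (made up for this sketch).
handled = looks_like_sklearn_flow("openml==0.10.2,sklearn==0.21.2")
rejected = looks_like_sklearn_flow("torch==1.0.0")
```

Matching whole entries rather than searching for the substring "sklearn" avoids accepting packages whose names merely contain it.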
classmethod can_handle_model(model: Any) → bool

Check whether a model is an instance of sklearn.base.BaseEstimator.

Parameters
model: Any
Returns
bool
create_setup_string(self, model: Any) → str

Create a string which can be used to reinstantiate the given model.

Parameters
model: Any
Returns
str
flow_to_model(self, flow: 'OpenMLFlow', initialize_with_defaults: bool = False, strict_version: bool = True) → Any

Initialize a scikit-learn model based on a flow.

Parameters
flow: mixed

the object to deserialize (can be flow object, or any serialized parameter value that is accepted by)

initialize_with_defaults: bool, optional (default=False)

If this flag is set, the hyperparameter values of flows will be ignored and a flow with its defaults is returned.

strict_version: bool, default=True

Whether to fail if version requirements are not fulfilled.

Returns
mixed
get_version_information(self) → List[str]

List versions of libraries required by the flow.

Libraries listed are Python, scikit-learn, numpy and scipy.

Returns
List
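As a rough illustration of what such a list could contain, the sketch below collects the interpreter version and the installed versions of the three libraries named above. The `Name_version` string format and the helper name `list_versions` are assumptions made for this example, not the extension's documented output.

```python
import sys
from importlib import metadata
from typing import List, Sequence


def list_versions(
    packages: Sequence[str] = ("scikit-learn", "numpy", "scipy"),
) -> List[str]:
    """Return one 'name_version' string for Python and each package."""
    major, minor, micro = sys.version_info[:3]
    versions = [f"Python_{major}.{minor}.{micro}"]
    for package in packages:
        try:
            versions.append(f"{package}_{metadata.version(package)}")
        except metadata.PackageNotFoundError:
            # Keep the list aligned even when a package is missing.
            versions.append(f"{package}_not-installed")
    return versions


versions = list_versions()
```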
instantiate_model_from_hpo_class(self, model: Any, trace_iteration: openml.runs.trace.OpenMLTraceIteration) → Any

Instantiate a base_estimator which can be searched over by the hyperparameter optimization model.

Parameters
model: Any

A hyperparameter optimization model which defines the model to be instantiated.

trace_iteration: OpenMLTraceIteration

Describes the hyperparameter settings to instantiate.

Returns
Any
is_estimator(self, model: Any) → bool

Check whether the given model is a scikit-learn estimator.

This function is only required for backwards compatibility and will be removed in the near future.

Parameters
model: Any
Returns
bool
model_to_flow(self, model: Any) → 'OpenMLFlow'

Transform a scikit-learn model to a flow for uploading it to OpenML.

Parameters
model: Any
Returns
OpenMLFlow
obtain_parameter_values(self, flow: 'OpenMLFlow', model: Any = None) → List[Dict[str, Any]]

Extracts all parameter settings required for the flow from the model.

If no explicit model is provided, the parameters will be extracted from flow.model instead.

Parameters
flow: OpenMLFlow

OpenMLFlow object (containing flow ids, i.e., it has to be downloaded from the server)

model: Any, optional (default=None)

The model from which to obtain the parameter values. Must match the flow signature. If None, use the model specified in OpenMLFlow.model.

Returns
list

A list of dicts, where each dict has the following entries:

- oml:name : str: The OpenML parameter name
- oml:value : mixed: A representation of the parameter value
- oml:component : int: The flow id to which the parameter belongs
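To make the shape of the return value concrete, here is a minimal standalone sketch that builds such a list from a plain parameter dict. Using JSON as the "representation" of each value is an assumption of this example, and both the helper `describe_parameters` and the flow id 1234 are hypothetical.

```python
import json
from typing import Any, Dict, List


def describe_parameters(params: Dict[str, Any], flow_id: int) -> List[Dict[str, Any]]:
    """Build one dict per parameter using the three documented keys."""
    return [
        {
            "oml:name": name,
            # Assumption: values are stored as JSON-encoded strings.
            "oml:value": json.dumps(value),
            "oml:component": flow_id,
        }
        for name, value in sorted(params.items())
    ]


# Made-up hyperparameters and flow id, for illustration only.
settings = describe_parameters({"max_depth": 3, "criterion": "gini"}, flow_id=1234)
```

With a nested model, each subcomponent's parameters would carry the flow id of that subcomponent, which is why the flow must have been downloaded from the server first.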

seed_model(self, model: Any, seed: Union[int, NoneType] = None) → Any

Set the random state of all the unseeded components of a model and return the seeded model.

Required so that all seed information can be uploaded to OpenML for reproducible results.

Models that are already seeded keep their seed. In this case, only integer seeds are allowed (an exception is raised if a RandomState was used as the seed).

Parameters
model: sklearn model

The model to be seeded

seed: int, optional (default=None)

The seed to initialize the RandomState with. Unseeded subcomponents will be seeded with a random number from the RandomState.

Returns
Any
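The seeding rule described above can be sketched without scikit-learn by treating a model as a flat dict of parameters, in the style of the `step__param` names that `get_params()` produces. This is an illustration under that assumption, not the extension's implementation: `seed_parameters` is a hypothetical helper, and the standard library's `random.Random` stands in for a NumPy RandomState.

```python
import random
from typing import Any, Dict, Optional


def seed_parameters(params: Dict[str, Any], seed: Optional[int] = None) -> Dict[str, Any]:
    """Return a copy of params with every unset random_state filled in."""
    rng = random.Random(seed)  # stand-in for numpy.random.RandomState
    seeded = dict(params)
    # Sort the names so the same seed always yields the same assignment.
    for name in sorted(seeded):
        if name != "random_state" and not name.endswith("__random_state"):
            continue
        value = seeded[name]
        if value is None:
            # Unseeded component: draw a fresh integer seed.
            seeded[name] = rng.randint(0, 2**32 - 1)
        elif not isinstance(value, int):
            # Already seeded, but not with a plain integer.
            raise ValueError(f"{name} must be an integer seed, got {type(value).__name__}")
    return seeded


# Made-up parameters: one unseeded component, one already seeded.
example = {"random_state": None, "svc__random_state": 7, "svc__C": 1.0}
seeded_example = seed_parameters(example, seed=1)
```

Drawing the per-component seeds from a single generator means one top-level seed is enough to reproduce the whole configuration.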
classmethod trim_flow_name(long_name: str, extra_trim_length: int = 100, _outer: bool = True) → str

Shorten a generated sklearn flow name, aiming for at most extra_trim_length characters.

Flows are assumed to have the following naming structure:

    (model_selection)? (pipeline)? (steps)+

and will be shortened to:

    sklearn.(selection.)?(pipeline.)?(steps)+

For example (white space and newlines added for readability):

    sklearn.pipeline.Pipeline(
        columntransformer=sklearn.compose._column_transformer.ColumnTransformer(
            numeric=sklearn.pipeline.Pipeline(
                imputer=sklearn.preprocessing.imputation.Imputer,
                standardscaler=sklearn.preprocessing.data.StandardScaler),
            nominal=sklearn.pipeline.Pipeline(
                simpleimputer=sklearn.impute.SimpleImputer,
                onehotencoder=sklearn.preprocessing._encoders.OneHotEncoder)),
        variancethreshold=sklearn.feature_selection.variance_threshold.VarianceThreshold,
        svc=sklearn.svm.classes.SVC)

    -> sklearn.Pipeline(ColumnTransformer,VarianceThreshold,SVC)

Parameters
long_name: str

The full flow name generated by the scikit-learn extension.

extra_trim_length: int (default=100)

If the trimmed name would exceed extra_trim_length characters, additional trimming of the short name is performed. This reduces the produced short name length. There is no guarantee the end result will not exceed extra_trim_length.

_outer: bool (default=True)

For internal use only. Specifies if the function is called recursively.

Returns
str
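The pipeline example above can be reproduced with a simplified standalone sketch. It only handles a single outer pipeline whose steps are written as `name=dotted.path.Class(...)`, and it ignores the model-selection prefix and the extra_trim_length logic; the helper names are hypothetical and this is not the method's actual implementation.

```python
from typing import List


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for index, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "," and depth == 0:
            parts.append(text[start:index])
            start = index + 1
    parts.append(text[start:])
    return parts


def class_name(component: str) -> str:
    """Keep only the class name of 'pkg.module.Class(...)'."""
    dotted_path = component.split("(", 1)[0]
    return dotted_path.rsplit(".", 1)[-1]


def shorten_pipeline(long_name: str) -> str:
    """Reduce a pipeline flow name to its outer class and step classes."""
    head, _, rest = long_name.partition("(")
    body = rest[:-1]  # drop the closing parenthesis of the outer pipeline
    steps = [part.split("=", 1)[1] for part in split_top_level(body)]
    return "sklearn.{}({})".format(
        class_name(head), ",".join(class_name(step) for step in steps)
    )


long_name = (
    "sklearn.pipeline.Pipeline("
    "columntransformer=sklearn.compose._column_transformer.ColumnTransformer("
    "numeric=sklearn.pipeline.Pipeline("
    "imputer=sklearn.preprocessing.imputation.Imputer,"
    "standardscaler=sklearn.preprocessing.data.StandardScaler),"
    "nominal=sklearn.pipeline.Pipeline("
    "simpleimputer=sklearn.impute.SimpleImputer,"
    "onehotencoder=sklearn.preprocessing._encoders.OneHotEncoder)),"
    "variancethreshold=sklearn.feature_selection.variance_threshold.VarianceThreshold,"
    "svc=sklearn.svm.classes.SVC)"
)
short_name = shorten_pipeline(long_name)
```

Tracking parenthesis depth is what lets the nested ColumnTransformer collapse to a single step instead of being split at its inner commas.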