openml.extensions.sklearn.SklearnExtension
- class openml.extensions.sklearn.SklearnExtension
Connect scikit-learn to OpenML-Python. Estimators that use this extension must be scikit-learn compatible, i.e. they must be subclasses of sklearn.base.BaseEstimator.
- classmethod can_handle_flow(flow: OpenMLFlow) → bool
Check whether a given flow describes a scikit-learn estimator.
This is done by parsing the external_version field.
- Parameters:
- flow : OpenMLFlow
- Returns:
- bool
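As a rough illustration of the external_version check, the sketch below parses a comma-separated list of package==version pins and reports whether sklearn is among them. The pin format (e.g. "openml==0.14.2,sklearn==1.3.0") and the helper name looks_like_sklearn_flow are assumptions for illustration, not OpenML-Python's actual parser.

```python
def looks_like_sklearn_flow(external_version: str) -> bool:
    """Hypothetical helper: does an external_version string pin sklearn?

    Assumes a comma-separated "package==version" format; this is an
    illustrative assumption, not the library's parser.
    """
    pinned = {}
    for entry in external_version.split(","):
        name, sep, version = entry.strip().partition("==")
        if sep:  # ignore entries without a "==" pin
            pinned[name] = version
    return "sklearn" in pinned


print(looks_like_sklearn_flow("openml==0.14.2,sklearn==1.3.0"))  # True
print(looks_like_sklearn_flow("openml==0.14.2,torch==2.0.0"))    # False
```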
- classmethod can_handle_model(model: Any) → bool
Check whether a model is an instance of sklearn.base.BaseEstimator.
- Parameters:
- model : Any
- Returns:
- bool
- check_if_model_fitted(model: Any) → bool
Return True if the model has already been fitted (trained), and False otherwise.
- Parameters:
- model : Any
- Returns:
- bool
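scikit-learn marks learned state with a trailing underscore (for example coef_), and its check_is_fitted utility uses that naming convention as its default test. The stdlib-only sketch below applies the same heuristic to a hypothetical ToyEstimator; whether check_if_model_fitted performs exactly this check is an assumption.

```python
from typing import Any, List


def is_fitted(model: Any) -> bool:
    """Heuristic: a model counts as fitted if it has an attribute ending in "_"
    (but not a dunder name), following the scikit-learn naming convention."""
    return any(
        name.endswith("_") and not name.startswith("__") for name in vars(model)
    )


class ToyEstimator:
    """Hypothetical estimator used only for this illustration."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # hyperparameter: no trailing underscore

    def fit(self, xs: List[float], ys: List[float]) -> "ToyEstimator":
        # Learned state gets a trailing underscore, as in scikit-learn.
        self.mean_ = sum(ys) / len(ys)
        return self


model = ToyEstimator()
print(is_fitted(model))  # False
model.fit([1.0, 2.0], [2.0, 4.0])
print(is_fitted(model))  # True
```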
- create_setup_string(model: Any) → str
Create a string which can be used to reinstantiate the given model.
- Parameters:
- model : Any
- Returns:
- str
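One way to picture a setup string is as an expression that rebuilds the model. The sketch below uses the repr of a hypothetical dataclass-based ToyScaler and evaluates it in a restricted namespace; it illustrates round-tripping only and does not describe the string OpenML-Python actually produces. eval should never be applied to untrusted input.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ToyScaler:
    """Hypothetical estimator; the dataclass repr lists every hyperparameter."""

    with_mean: bool = True
    with_std: bool = True


def setup_string(model: Any) -> str:
    # The repr doubles as a constructor call, e.g. "ToyScaler(with_mean=False, with_std=True)".
    return repr(model)


def reinstantiate(setup: str, namespace: Dict[str, Any]) -> Any:
    # Evaluate with no builtins; only the classes in `namespace` are visible.
    # Never do this with untrusted strings.
    return eval(setup, {"__builtins__": {}}, namespace)


original = ToyScaler(with_mean=False)
text = setup_string(original)
print(text)  # ToyScaler(with_mean=False, with_std=True)
print(reinstantiate(text, {"ToyScaler": ToyScaler}) == original)  # True
```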
- flow_to_model(flow: OpenMLFlow, initialize_with_defaults: bool = False, strict_version: bool = True) → Any
Initialize a scikit-learn model based on a flow.
- Parameters:
- flow : mixed
The object to deserialize (can be a flow object, or any serialized parameter value that is accepted by)
- initialize_with_defaults : bool, optional (default=False)
If this flag is set, the hyperparameter values stored in the flow are ignored and a model with its default hyperparameter values is returned.
- strict_version : bool, default=True
Whether to fail if version requirements are not fulfilled.
- Returns:
- mixed
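The effect of initialize_with_defaults can be sketched with a hypothetical ToySVC class and a plain dict of stored hyperparameters. The sketch assumes stored values are JSON-encoded strings; that encoding, the helper toy_flow_to_model, and ToySVC are illustrative assumptions, not the library's deserializer.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Type


@dataclass
class ToySVC:
    """Hypothetical stand-in for a scikit-learn estimator."""

    C: float = 1.0
    kernel: str = "rbf"


def toy_flow_to_model(
    cls: Type[Any],
    stored_parameters: Dict[str, str],
    initialize_with_defaults: bool = False,
) -> Any:
    if initialize_with_defaults:
        # Ignore the stored hyperparameters; use the class defaults.
        return cls()
    # Decode each stored (assumed JSON-encoded) value and pass it to the constructor.
    kwargs = {name: json.loads(raw) for name, raw in stored_parameters.items()}
    return cls(**kwargs)


stored = {"C": "10.0", "kernel": '"linear"'}
print(toy_flow_to_model(ToySVC, stored))  # ToySVC(C=10.0, kernel='linear')
print(toy_flow_to_model(ToySVC, stored, initialize_with_defaults=True))  # ToySVC(C=1.0, kernel='rbf')
```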
- get_version_information() → List[str]
List versions of libraries required by the flow.
Libraries listed are Python, scikit-learn, numpy and scipy.
- Returns:
- List
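Comparable information can be collected with only the standard library, using platform.python_version and importlib.metadata.version, as in the sketch below. The "name==version" output format, the helper name version_information and the handling of missing packages are assumptions; the exact strings returned by get_version_information may differ.

```python
import platform
from importlib import metadata
from typing import List, Sequence


def version_information(
    distributions: Sequence[str] = ("scikit-learn", "numpy", "scipy"),
) -> List[str]:
    """Hypothetical helper listing Python plus the given distributions' versions."""
    info = [f"Python=={platform.python_version()}"]
    for dist in distributions:
        try:
            info.append(f"{dist}=={metadata.version(dist)}")
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing.
            info.append(f"{dist}==<not installed>")
    return info


for line in version_information():
    print(line)
```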
- instantiate_model_from_hpo_class(model: Any, trace_iteration: OpenMLTraceIteration) → Any
Instantiate a base_estimator which can be searched over by the hyperparameter optimization model.
- Parameters:
- model : Any
A hyperparameter optimization model which defines the model to be instantiated.
- trace_iteration : OpenMLTraceIteration
Describes the hyperparameter settings to instantiate.
- Returns:
- Any
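The idea can be sketched with hypothetical ToyTree and ToyGridSearch dataclasses: copy the wrapper's base estimator, then apply the hyperparameter values recorded for one trace iteration. Representing the trace iteration as a dict of JSON-encoded strings is an assumption; the real OpenMLTraceIteration object is not modeled here.

```python
import copy
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToyTree:
    """Hypothetical base_estimator."""

    max_depth: int = 3
    min_samples_leaf: int = 1


@dataclass
class ToyGridSearch:
    """Hypothetical HPO wrapper: a base estimator plus the grid searched over."""

    estimator: Any
    param_grid: Dict[str, List[Any]]


def instantiate_from_trace(hpo: ToyGridSearch, iteration: Dict[str, str]) -> Any:
    # Work on a copy so the wrapper's own base estimator stays untouched.
    model = copy.deepcopy(hpo.estimator)
    for name, raw_value in iteration.items():
        if name not in hpo.param_grid:
            raise ValueError(f"{name!r} was not part of the search space")
        setattr(model, name, json.loads(raw_value))
    return model


search = ToyGridSearch(ToyTree(), {"max_depth": [3, 5, 8]})
print(instantiate_from_trace(search, {"max_depth": "5"}))  # ToyTree(max_depth=5, min_samples_leaf=1)
```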
- is_estimator(model: Any) → bool
Check whether the given model is a scikit-learn estimator.
This function is only required for backwards compatibility and will be removed in the near future.
- Parameters:
- model : Any
- Returns:
- bool
- model_to_flow(model: Any) → OpenMLFlow
Transform a scikit-learn model to a flow for uploading it to OpenML.
- Parameters:
- model : Any
- Returns:
- OpenMLFlow
- obtain_parameter_values(flow: OpenMLFlow, model: Any | None = None) → List[Dict[str, Any]]
Extract all parameter settings required for the flow from the model.
If no explicit model is provided, the parameters are extracted from flow.model instead.
- Parameters:
- flow : OpenMLFlow
OpenMLFlow object that contains flow ids, i.e., it has to be downloaded from the server.
- model : Any, optional (default=None)
The model from which to obtain the parameter values. Must match the flow signature. If None, use the model specified in OpenMLFlow.model.
- Returns:
- list
A list of dicts, where each dict has the following entries:
  - oml:name : str: The OpenML parameter name
  - oml:value : mixed: A representation of the parameter value
  - oml:component : int: The id of the flow to which the parameter belongs
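The shape of the returned list can be illustrated with a nested dict standing in for a downloaded flow and its components. Everything here is hypothetical: the dict layout, the flow ids 100 and 101, the helper collect_parameter_values, and the choice of json.dumps as the "representation" of each value are assumptions made for the example.

```python
import json
from typing import Any, Dict, List


def collect_parameter_values(flow: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Walk a hypothetical nested flow dict and emit one entry per parameter."""
    entries = []
    for name, value in flow["parameters"].items():
        entries.append(
            {
                "oml:name": name,
                "oml:value": json.dumps(value),  # assumed representation
                "oml:component": flow["flow_id"],  # id of the owning (sub)flow
            }
        )
    # Recurse into sub-flows so their parameters carry their own flow id.
    for component in flow.get("components", {}).values():
        entries.extend(collect_parameter_values(component))
    return entries


pipeline_flow = {
    "flow_id": 100,
    "parameters": {"memory": None},
    "components": {
        "svc": {"flow_id": 101, "parameters": {"C": 1.0, "kernel": "rbf"}},
    },
}
for entry in collect_parameter_values(pipeline_flow):
    print(entry)
```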
- seed_model(model: Any, seed: int | None = None) → Any
Set the random state of all unseeded components of a model and return the seeded model.
This is required so that all seed information can be uploaded to OpenML for reproducible results.
Models that are already seeded keep their seed. In this case, only integer seeds are allowed (an exception is raised if a RandomState instance was used as the seed).
- Parameters:
- model : sklearn model
The model to be seeded
- seed : int
The seed to initialize the RandomState with. Unseeded subcomponents will be seeded with a random number from the RandomState.
- Returns:
- Any
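The seeding rule described above can be sketched with the standard library's random.Random standing in for numpy's RandomState, treating a model as a hypothetical mapping from component names to hyperparameter dicts. Only components that expose a random_state entry are touched; the helper seed_components, the draw range and the component layout are assumptions for illustration.

```python
import random
from typing import Any, Dict, Optional

Components = Dict[str, Dict[str, Any]]


def seed_components(components: Components, seed: Optional[int] = None) -> Components:
    """Return a copy in which every unseeded random_state is filled in."""
    rng = random.Random(seed)  # one generator drives all unseeded components
    seeded: Components = {}
    for name in sorted(components):  # fixed order keeps the draws reproducible
        params = dict(components[name])
        if "random_state" in params:
            state = params["random_state"]
            if state is None:
                params["random_state"] = rng.randrange(2**31 - 1)
            elif not isinstance(state, int):
                # Mirrors the documented rule: existing seeds must be integers.
                raise ValueError(f"{name}: only integer seeds can be kept")
        seeded[name] = params
    return seeded


model = {
    "forest": {"n_estimators": 100, "random_state": None},
    "scaler": {"with_mean": True},
    "svc": {"random_state": 7},
}
print(seed_components(model, seed=42))
```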
- classmethod trim_flow_name(long_name: str, extra_trim_length: int = 100, _outer: bool = True) → str
Shorten a generated sklearn flow name; the target length is governed by extra_trim_length. Flows are assumed to have the following naming structure:
(model_selection)? (pipeline)? (steps)+
and will be shortened to:
sklearn.(selection.)?(pipeline.)?(steps)+
e.g. (white spaces and newlines added for readability)
    sklearn.pipeline.Pipeline(
        columntransformer=sklearn.compose._column_transformer.ColumnTransformer(
            numeric=sklearn.pipeline.Pipeline(
                imputer=sklearn.preprocessing.imputation.Imputer,
                standardscaler=sklearn.preprocessing.data.StandardScaler),
            nominal=sklearn.pipeline.Pipeline(
                simpleimputer=sklearn.impute.SimpleImputer,
                onehotencoder=sklearn.preprocessing._encoders.OneHotEncoder)),
        variancethreshold=sklearn.feature_selection.variance_threshold.VarianceThreshold,
        svc=sklearn.svm.classes.SVC)
    ->
    sklearn.Pipeline(ColumnTransformer,VarianceThreshold,SVC)
- Parameters:
- long_name : str
The full flow name generated by the scikit-learn extension.
- extra_trim_length : int (default=100)
If the trimmed name would exceed extra_trim_length characters, additional trimming of the short name is performed. This reduces the produced short name length. There is no guarantee the end result will not exceed extra_trim_length.
- _outer : bool (default=True)
For internal use only. Specifies if the function is called recursively.
- Returns:
- str
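The Pipeline case in the trim_flow_name example can be reproduced by a simplified sketch that splits the top-level steps while tracking parenthesis depth and keeps only each step's class name. This hypothetical short_pipeline_name ignores model-selection wrappers and extra_trim_length, so it is not a substitute for trim_flow_name.

```python
from typing import List


def split_top_level(arguments: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, char in enumerate(arguments):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "," and depth == 0:
            parts.append(arguments[start:i])
            start = i + 1
    parts.append(arguments[start:])
    return parts


def short_pipeline_name(long_name: str) -> str:
    """Hypothetical simplification: sklearn.<Outer>(<StepClass>,...)."""
    long_name = "".join(long_name.split())  # drop readability whitespace
    outer, has_args, rest = long_name.partition("(")
    outer_class = outer.rsplit(".", 1)[-1]
    if not has_args:
        return f"sklearn.{outer_class}"
    steps = split_top_level(rest[:-1])  # rest[:-1] drops the outer ")"
    step_classes = []
    for step in steps:
        _, _, class_path = step.partition("=")  # e.g. "svc=sklearn.svm.classes.SVC"
        step_classes.append(class_path.split("(")[0].rsplit(".", 1)[-1])
    return f"sklearn.{outer_class}({','.join(step_classes)})"


name = (
    "sklearn.pipeline.Pipeline("
    "columntransformer=sklearn.compose._column_transformer.ColumnTransformer("
    "numeric=sklearn.pipeline.Pipeline(imputer=sklearn.preprocessing.imputation.Imputer,"
    "standardscaler=sklearn.preprocessing.data.StandardScaler),"
    "nominal=sklearn.pipeline.Pipeline(simpleimputer=sklearn.impute.SimpleImputer,"
    "onehotencoder=sklearn.preprocessing._encoders.OneHotEncoder)),"
    "variancethreshold=sklearn.feature_selection.variance_threshold.VarianceThreshold,"
    "svc=sklearn.svm.classes.SVC)"
)
print(short_pipeline_name(name))  # sklearn.Pipeline(ColumnTransformer,VarianceThreshold,SVC)
```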