mleko.dataset.data_schema
Module for DataSchema class, used for storing type information about the dataset.
Module Contents
Classes
- DataSchema – Class for storing type information about the dataset.
Attributes
- mleko.dataset.data_schema.logger
The logger for the module.
- mleko.dataset.data_schema.DataType
Type alias for the feature data types: numerical, categorical, boolean, datetime, and timedelta.
- class mleko.dataset.data_schema.DataSchema(numerical: list[str] | tuple[str, ...] | tuple[()] = (), categorical: list[str] | tuple[str, ...] | tuple[()] = (), boolean: list[str] | tuple[str, ...] | tuple[()] = (), datetime: list[str] | tuple[str, ...] | tuple[()] = (), timedelta: list[str] | tuple[str, ...] | tuple[()] = ())
DataSchema class for storing type information about the dataset.
Initialize DataSchema with the given features.
- Parameters:
numerical (list[str] | tuple[str, ...] | tuple[()]) – List of numerical features.
categorical (list[str] | tuple[str, ...] | tuple[()]) – List of categorical features.
boolean (list[str] | tuple[str, ...] | tuple[()]) – List of boolean features.
datetime (list[str] | tuple[str, ...] | tuple[()]) – List of datetime features.
timedelta (list[str] | tuple[str, ...] | tuple[()]) – List of timedelta features.
- Raises:
ValueError – If feature names are not unique across all types.
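The uniqueness rule can be illustrated with a small sketch. It assumes a hypothetical plain-dict layout that maps each type name to its feature names, and a hypothetical helper `validate_unique`. Neither is part of mleko; they only mirror the documented contract that no feature name may be declared more than once.

```python
from collections import Counter


def validate_unique(features_by_type: dict[str, list[str]]) -> None:
    """Raise ValueError if any feature name is listed more than once (hypothetical helper)."""
    # Count every name across all type buckets at once.
    counts = Counter(name for names in features_by_type.values() for name in names)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Feature names must be unique across types: {duplicates}")


# Valid: every name belongs to exactly one type.
validate_unique({"numerical": ["age", "income"], "categorical": ["city"]})

# Invalid: "age" is declared both numerical and categorical.
try:
    validate_unique({"numerical": ["age"], "categorical": ["age"]})
except ValueError as err:
    print(err)  # Feature names must be unique across types: ['age']
```

The corresponding call against the real class would be `DataSchema(numerical=["age"], categorical=["age"])`. According to the documentation above, that call raises `ValueError`.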
- get_features(types: list[DataType] | tuple[DataType, ...] | tuple[()] = ()) → list[str]
Get the features of the given types.
If no types are specified, all features are returned.
- Parameters:
types (list[DataType] | tuple[DataType, ...] | tuple[()]) – Data types whose features should be returned.
- Returns:
Names of the features of the given types.
- Return type:
list[str]
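The selection rule for `get_features` can be sketched over the same kind of hypothetical dict (type name → feature names). This sketch makes several assumptions:

- The `DataType` literal is modeled on the five constructor parameters.
- Output lists types in the order requested, with features in insertion order.

The ordering mleko actually uses is not stated above.

```python
from typing import Literal, Sequence

# Assumption: DataType modeled on the five DataSchema constructor parameters.
DataType = Literal["numerical", "categorical", "boolean", "datetime", "timedelta"]


def get_features(
    features_by_type: dict[str, list[str]],
    types: Sequence[DataType] = (),
) -> list[str]:
    """Return feature names of the requested types, or every feature if `types` is empty."""
    selected = types if types else list(features_by_type)
    return [name for t in selected for name in features_by_type.get(t, [])]


schema = {"numerical": ["age", "income"], "categorical": ["city"], "boolean": ["is_active"]}
print(get_features(schema))                            # ['age', 'income', 'city', 'is_active']
print(get_features(schema, ["numerical", "boolean"]))  # ['age', 'income', 'is_active']
```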
- get_type(feature: str) → DataType
Get the type of a given feature.
- Parameters:
feature (str) – Feature name.
- Raises:
ValueError – If the feature is not found in the schema.
- Returns:
Feature data type.
- Return type:
DataType
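A lookup with the documented `get_type` behavior might look like the sketch below. It uses a hypothetical dict stand-in, and the `DataType` literal is an assumption modeled on the constructor parameters. The sketch scans each type bucket and raises `ValueError` for unknown names, matching the "Raises" entry above.

```python
from typing import Literal, cast

# Assumption: DataType modeled on the five DataSchema constructor parameters.
DataType = Literal["numerical", "categorical", "boolean", "datetime", "timedelta"]


def get_type(features_by_type: dict[str, list[str]], feature: str) -> DataType:
    """Return the type bucket that contains `feature` (hypothetical helper)."""
    for dtype, names in features_by_type.items():
        if feature in names:
            return cast(DataType, dtype)
    # Documented behavior: unknown features raise ValueError.
    raise ValueError(f"Feature {feature!r} not found in the schema")


schema = {"numerical": ["age"], "datetime": ["signup_date"]}
print(get_type(schema, "signup_date"))  # datetime
```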
- drop_features(features: set[str] | list[str] | tuple[str, ...] | tuple[()]) → DataSchema
Drop the given features from the DataSchema.
- Parameters:
features (set[str] | list[str] | tuple[str, ...] | tuple[()]) – Names of the features to drop.
- Return type:
DataSchema
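Here is a minimal sketch of dropping several features at once, again over a hypothetical dict layout. It makes two choices that the documentation above does not settle:

- It returns a new mapping rather than modifying the schema in place.
- It silently skips names that are not present.

Whether `DataSchema.drop_features` behaves the same way is not stated above.

```python
from collections.abc import Iterable


def drop_features(
    features_by_type: dict[str, list[str]], features: Iterable[str]
) -> dict[str, list[str]]:
    """Return a copy of the mapping without the given feature names (hypothetical helper)."""
    to_drop = set(features)  # accepts a set, list, or tuple, like the documented parameter
    return {
        dtype: [name for name in names if name not in to_drop]
        for dtype, names in features_by_type.items()
    }


schema = {"numerical": ["age", "income"], "categorical": ["city"]}
print(drop_features(schema, {"income", "city"}))  # {'numerical': ['age'], 'categorical': []}
```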
- add_feature(feature: str, dtype: DataType) → DataSchema
Add a feature to the DataSchema.
- Parameters:
feature (str) – Feature name.
dtype (DataType) – Feature data type.
- Raises:
ValueError – If the feature is already present in the schema.
- Return type:
DataSchema
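The duplicate check for `add_feature` can be sketched as follows. The dict stand-in and the helper are hypothetical, and `DataType` is assumed from the constructor parameters. Returning a fresh mapping is a choice of this sketch; the documentation above only says a `DataSchema` is returned.

```python
from typing import Literal

# Assumption: DataType modeled on the five DataSchema constructor parameters.
DataType = Literal["numerical", "categorical", "boolean", "datetime", "timedelta"]


def add_feature(
    features_by_type: dict[str, list[str]], feature: str, dtype: DataType
) -> dict[str, list[str]]:
    """Return a copy of the mapping with `feature` added under `dtype` (hypothetical helper)."""
    # Documented behavior: adding an existing feature raises ValueError.
    if any(feature in names for names in features_by_type.values()):
        raise ValueError(f"Feature {feature!r} is already present in the schema")
    updated = {t: list(names) for t, names in features_by_type.items()}
    updated.setdefault(dtype, []).append(feature)
    return updated


schema = {"numerical": ["age"]}
print(add_feature(schema, "is_active", "boolean"))  # {'numerical': ['age'], 'boolean': ['is_active']}
```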
- change_feature_type(feature: str, dtype: DataType) → DataSchema
Change the type of a feature in the DataSchema.
- Parameters:
feature (str) – Feature name.
dtype (DataType) – Feature data type.
- Raises:
ValueError – If the feature is not present in the schema.
- Return type:
DataSchema
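A typical use for `change_feature_type` is a numeric-looking code column that should really be treated as categorical. The sketch below models the move on a hypothetical dict stand-in: it removes the feature from its current bucket and appends it to the new one. The `zip_code` column and the helper are illustrative only, and `DataType` is assumed from the constructor parameters.

```python
from typing import Literal

# Assumption: DataType modeled on the five DataSchema constructor parameters.
DataType = Literal["numerical", "categorical", "boolean", "datetime", "timedelta"]


def change_feature_type(
    features_by_type: dict[str, list[str]], feature: str, dtype: DataType
) -> dict[str, list[str]]:
    """Return a copy of the mapping with `feature` moved to `dtype` (hypothetical helper)."""
    # Documented behavior: changing an unknown feature raises ValueError.
    if not any(feature in names for names in features_by_type.values()):
        raise ValueError(f"Feature {feature!r} is not present in the schema")
    # Remove the feature from its current bucket, then append it to the new one.
    updated = {t: [n for n in names if n != feature] for t, names in features_by_type.items()}
    updated.setdefault(dtype, []).append(feature)
    return updated


schema = {"numerical": ["zip_code", "age"], "categorical": []}
print(change_feature_type(schema, "zip_code", "categorical"))
# {'numerical': ['age'], 'categorical': ['zip_code']}
```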
- to_dict() → dict[str, list[str]]
Return the dict representation of DataSchema.
- Returns:
Dict representation of DataSchema.
- Return type:
dict[str, list[str]]
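Because `to_dict` returns plain `dict[str, list[str]]` data, the standard `json` module can serialize it. For example, you could store a schema next to a cached dataset. The dict literal below stands in for a `to_dict()` result. Keying it by the five type names is an assumption, not something stated above.

```python
import json

# Stand-in for DataSchema.to_dict(); the key names are assumed, not documented above.
schema_dict: dict[str, list[str]] = {
    "numerical": ["age", "income"],
    "categorical": ["city"],
    "boolean": [],
    "datetime": ["signup_date"],
    "timedelta": [],
}

payload = json.dumps(schema_dict, sort_keys=True)  # plain str/list data serializes directly
restored: dict[str, list[str]] = json.loads(payload)
print(restored == schema_dict)  # True
```

If the keys do match the constructor parameters, `DataSchema(**restored)` would rebuild an equivalent schema. Check the key names against your installed mleko version before relying on this.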
- copy() → DataSchema
Create a copy of this DataSchema.
- Returns:
A copy of this DataSchema.
- Return type:
DataSchema
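A copy is useful when one pipeline step needs a modified schema while the original stays untouched. The sketch below shows the difference between sharing and isolating nested lists on a hypothetical dict stand-in. That `DataSchema.copy` gives deep-copy-style isolation is an assumption to confirm in your installed version.

```python
import copy

original: dict[str, list[str]] = {"numerical": ["age"], "categorical": ["city"]}

shallow = dict(original)        # new dict, but the inner lists are shared
deep = copy.deepcopy(original)  # new dict and new inner lists

shallow["numerical"].append("income")  # also mutates original["numerical"]
deep["categorical"].append("country")  # original is unaffected

print(original)  # {'numerical': ['age', 'income'], 'categorical': ['city']}
```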