mleko.dataset.data_schema

Module for the DataSchema class, which stores type information about a dataset.

Module Contents

Classes

DataSchema

DataSchema class for storing type information about the dataset.

Attributes

logger

The logger for the module.

DataType

Type alias for the data types.

mleko.dataset.data_schema.logger

The logger for the module.

mleko.dataset.data_schema.DataType

Type alias for the data types.

class mleko.dataset.data_schema.DataSchema(numerical: list[str] | tuple[str, ...] | tuple[()] = (), categorical: list[str] | tuple[str, ...] | tuple[()] = (), boolean: list[str] | tuple[str, ...] | tuple[()] = (), datetime: list[str] | tuple[str, ...] | tuple[()] = (), timedelta: list[str] | tuple[str, ...] | tuple[()] = ())

DataSchema class for storing type information about the dataset.

Initialize DataSchema with the given features.

Parameters:
  • numerical (list[str] | tuple[str, ...] | tuple[()]) – Names of the numerical features.

  • categorical (list[str] | tuple[str, ...] | tuple[()]) – Names of the categorical features.

  • boolean (list[str] | tuple[str, ...] | tuple[()]) – Names of the boolean features.

  • datetime (list[str] | tuple[str, ...] | tuple[()]) – Names of the datetime features.

  • timedelta (list[str] | tuple[str, ...] | tuple[()]) – Names of the timedelta features.

Raises:

ValueError – If feature names are not unique across all types.
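The uniqueness rule can be illustrated without the library. The sketch below is not mleko's implementation: `find_duplicate_features` and `validate_unique` are hypothetical helpers that mimic the documented check on plain lists of feature names.

```python
from collections import Counter


def find_duplicate_features(**features_by_type):
    """Return the feature names that appear more than once across all types."""
    counts = Counter(
        name for names in features_by_type.values() for name in names
    )
    return sorted(name for name, count in counts.items() if count > 1)


def validate_unique(**features_by_type):
    """Raise ValueError if any feature name is listed more than once."""
    duplicates = find_duplicate_features(**features_by_type)
    if duplicates:
        raise ValueError(f"Feature names must be unique across types: {duplicates}")


# Passes silently: every name belongs to exactly one type.
validate_unique(numerical=["age", "income"], categorical=["city"])
```

A name listed under two types, for example `"id"` in both `numerical` and `categorical`, is the kind of input the documented `ValueError` guards against.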

get_features(types: list[DataType] | tuple[DataType, ...] | tuple[()] = ()) -> list[str]

Get the features of the given types.

If no type is specified, all features are returned.

Parameters:

types (list[DataType] | tuple[DataType, ...] | tuple[()]) – Data types whose features should be returned.

Returns:

List of features of the given types.

Return type:

list[str]
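The filtering behaviour described above can be sketched on a plain dictionary. This is a stand-in, not the library code: the type names are assumed to mirror the constructor parameters, and the ordering of the result is a choice of the sketch rather than documented behaviour.

```python
# Hypothetical schema: assumed type name -> feature names.
SCHEMA = {
    "numerical": ["age", "income"],
    "categorical": ["city"],
    "boolean": ["is_active"],
    "datetime": [],
    "timedelta": [],
}


def get_features(schema, types=()):
    """Return the features of the requested types, or all features if none are given."""
    selected = types if types else schema.keys()
    return [name for dtype in selected for name in schema[dtype]]


all_features = get_features(SCHEMA)
non_numeric = get_features(SCHEMA, ["categorical", "boolean"])
```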

get_type(feature: str) -> DataType

Get the type of a given feature.

Parameters:

feature (str) – Feature name.

Raises:

ValueError – If feature is not found in the schema.

Returns:

Feature data type.

Return type:

DataType
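A reverse lookup of this kind can be sketched as follows. The dictionary layout and the error message are assumptions made for illustration; only the `ValueError` on an unknown feature comes from the description above.

```python
def get_type(schema, feature):
    """Return the type under which the feature is listed in the schema dict."""
    for dtype, names in schema.items():
        if feature in names:
            return dtype
    raise ValueError(f"Feature {feature!r} not found in the schema")


# Hypothetical schema: assumed type name -> feature names.
schema = {"numerical": ["age"], "categorical": ["city"]}
city_type = get_type(schema, "city")
```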

drop_features(features: set[str] | list[str] | tuple[str, ...] | tuple[()]) -> DataSchema

Drop one or more features from the DataSchema.

Parameters:

features (set[str] | list[str] | tuple[str, ...] | tuple[()]) – Names of the features to be dropped.

Return type:

DataSchema
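Because the parameter accepts a set, list, or tuple, a sketch of the operation can normalise the input to a set first. The function below is a hypothetical stand-in that works on a plain dictionary and returns a new one. Whether the real method modifies the schema in place, and how it treats names that are not in the schema, is not stated here, so both choices are assumptions of the sketch.

```python
def drop_features(schema, features):
    """Return a new schema dict without the given feature names."""
    to_drop = set(features)  # accepts a set, list, or tuple of names
    return {
        dtype: [name for name in names if name not in to_drop]
        for dtype, names in schema.items()
    }


# Hypothetical schema: assumed type name -> feature names.
schema = {"numerical": ["age", "income"], "categorical": ["city"]}
trimmed = drop_features(schema, ("income", "city"))
```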

add_feature(feature: str, dtype: DataType) -> DataSchema

Add a feature to the DataSchema.

Parameters:
  • feature (str) – Feature name.

  • dtype (DataType) – Feature data type.

Raises:

ValueError – If feature is already present in the schema.

Return type:

DataSchema

change_feature_type(feature: str, dtype: DataType) -> DataSchema

Change the type of a feature in the DataSchema.

Parameters:
  • feature (str) – Feature name.

  • dtype (DataType) – Feature data type.

Raises:

ValueError – If feature is not present in the schema.

Return type:

DataSchema
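Conceptually, changing a feature's type amounts to removing it from its current type and adding it under the new one. The sketch below models that on a plain dictionary. It is a hypothetical helper, not the library method, and it returns a copy rather than mutating its input, which is a choice of the sketch.

```python
def change_feature_type(schema, feature, dtype):
    """Return a new schema dict with the feature moved to the given type."""
    current = next(
        (t for t, names in schema.items() if feature in names), None
    )
    if current is None:
        raise ValueError(f"Feature {feature!r} is not present in the schema")
    # Copy the lists so the input schema is left untouched.
    updated = {t: list(names) for t, names in schema.items()}
    updated[current].remove(feature)
    updated[dtype].append(feature)
    return updated


# Hypothetical case: a zip code stored as a number is better treated as a category.
schema = {"numerical": ["zip_code", "age"], "categorical": ["city"]}
updated = change_feature_type(schema, "zip_code", "categorical")
```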

to_dict() -> dict[str, list[str]]

Return the dict representation of DataSchema.

Returns:

Dict representation of DataSchema.

Return type:

dict[str, list[str]]

copy() -> DataSchema

Create a copy of this DataSchema.

Returns:

A copy of this DataSchema.

Return type:

DataSchema