Changelog
v4.3.0 (2024-06-08)
Features
model: Add check for fitted model in LGBMModel fingerprint. (f6a0933)
Bug Fixes
v4.2.0 (2024-05-21)
Features
transformer: Update ExpressionTransformer to use TypedDict instead of tuples. (3950abd)
v4.1.0 (2024-05-18)
Features
tuning: Add support for enqueuing trials in OptunaTuner. (9e0b6b2)
data splitting: Add support for stratification on multiple features in the RandomSplitter. (d745434)
transformer: Add metadata option for the ExpressionTransformer that allows for creation of meta features not tracked in the DataSchema. (f16ea8b)
transformer: Add ExpressionTransformer for creating features using the vaex expression system. (c0faf74)
v4.0.0 (2024-05-09)
BREAKING CHANGES
Features
exporter: Add LocalManifest support for LocalExporter, which simplifies caching logic and enables S3 manifest translations. (2199ff0)
exporter: Add support for multiple data export using LocalExporter. (ff988b6)
data source: Add support for reading manifest files from S3 buckets in S3Ingester. (9c68a9b)
pipeline: Add disable_cache parameter to Pipeline execution. (da1e31a)
Bug Fixes
Code Refactoring
data source: Extract shared S3 logic to utils, which can then be used by S3Exporter. (97a7974)
v3.2.0 (2024-04-18)
Features
tuning: Add support for RDSStorage using the OptunaTuner (cc06ddd)
Bug Fixes
v3.1.0 (2024-04-12)
Features
v3.0.0 (2024-04-05)
BREAKING CHANGES
model: Update LGBMModel to use dependency injection; it now expects a lightgbm.LGBMModel as argument. (7250f34)
Bug Fixes
v2.2.0 (2024-03-22)
Features
filter: Add ImblearnResamplingFilter which is a wrapper for imblearn over- and under-samplers. (77a3d7d)
filter: Add ExpressionFilter and base class for simple DataFrame filtering using vaex expressions. (dc679ff)
cache: Add disable_cache argument to all cached functions to completely bypass all caching functionality. (fbdfc5d)
Documentation
Update CHANGELOG.md format to include missing categories. (d97b32c)
v2.1.0 (2024-02-24)
Features
Update Titanic dataset to mleko 2.0 API. (62bf991)
tuning: Add optuna-dashboard support to OptunaTuner including automatically generated experiment notes. (29d81c2)
transformer: Improve flexibility of LabelEncoderTransformer by adding optional null encoding and manual dictionary mapping. (f7b30a9)
Set cache_directory as optional argument, with custom default locations. (08e8777)
Bug Fixes
data cleaning: Fix meta_columns not being forcefully cast to correct data type in CSVToVaexConverter. (b42b9ed)
Documentation
Update year in Copyright in README.md (#192) (eeb56e1)
Tests
Fix test cases generating cache directory outside temporary directory. (ba57fbf)
v2.0.0 (2024-02-07)
BREAKING CHANGES
pipeline: Refactor PipelineStep to use TypedDict for both inputs and outputs. (2eb623c)
Features
Bug Fixes
Code Refactoring
Documentation
Refactor mleko package documentation to format bullet list correctly. (76ee895)
Continuous Integration
v1.2.6 (2024-01-25)
Bug Fixes
Bump patch release. (ff5f94e)
v1.2.5 (2024-01-25)
Bug Fixes
Fix CHANGELOG.md template location (141c9b7)
v1.2.4 (2024-01-25)
Bug Fixes
Trigger patch release. (7269dca)
Build
semantic versioning: Update CHANGELOG.md template and semantic versioning logic. (1727e09)
v1.2.3 (2024-01-25)
Bug Fixes
Remove coverage from workflow (09eb09d)
v1.2.2 (2024-01-25)
Bug Fixes
Switch to trusted publishing (e84712d)
v1.2.1 (2024-01-25)
Bug Fixes
Experiment with semantic versioning (0942196)
Build
v1.2.0 (2023-10-09)
Features
data source: Add support for pattern matching in *Ingester and add LocalManifest to index fetched files. (75974a4)
Bug Fixes
logging: Fix LGBM logging routing to correct log level. (0e5fa77)
Style
Build
Bump gitpython to resolve CVE-2023-41040 and CVE-2023-40590. (79627bd)
v1.1.0 (2023-09-27)
Features
tuning: Add hyperparameter tuning functionality, initially including OptunaTuner. (be38c07)
Tests
tuning: Add test cases for TuneStep. (d811c7d)
v1.0.0 (2023-09-20)
BREAKING CHANGES
Improve README.md with more up to date information. (b388b59)
Features
transformer: Add DataSchema API to transformers fit, transform and fit_transform. (e053c85)
Documentation
Add example notebook for Titanic dataset. (e651af9)
v0.8.1 (2023-09-07)
Bug Fixes
config: Fix readthedocs build to only generate html. (13fc207)
v0.8.0 (2023-09-06)
Features
model: Add LGBMModel along with base class which can be extended for all types of future models. (b47a241)
Add DataSchema which tracks dataset features throughout the pipeline and methods. (e03bd2c)
feature selection: Update BaseFeatureSelector and children to use the fit, transform and fit_transform pattern. (62e4dd1)
transformer: Add fit, transform and fit_transform to all Transformers, along with API and caching simplifications. (5cc4ebc)
cache: Add CacheHandler which allows customization of read/write functions for each cached return value individually. (609e084)
Bug Fixes
feature selection: Add DataSchema as partial return from all fit methods in feature selectors. (ebf2484)
Code Refactoring
cache: Replace disable_cache with a check if cache_size=0 for LRUCacheMixin. (cfd7592)
v0.7.0 (2023-07-11)
Features
Bug Fixes
data cleaning: Switched to HDF5 as file format for faster I/O and better SageMaker support. (61f9e42)
v0.6.1 (2023-06-30)
Bug Fixes
Build
config: Switch mypy for pyright and update configuration. (5631aed)
v0.6.0 (2023-06-26)
Features
v0.5.0 (2023-06-17)
Features
transformer: Add MinMaxScalerTransformer for normalizing numerical features. (9b26c00)
transformer: Add MaxAbsScalerTransformer that scales numerical features. (1fd2a93)
transformer: Add CompositeTransformer for chaining together multiple transformers sequentially. (006d741)
transformer: Add LabelEncoderTransformer for ordinal encoding. (41a4c45)
transformer: Add FrequencyEncoderTransformer along with support for pipeline. (465e6db)
Code Refactoring
Switch to tqdm.auto to prevent breaking in Jupyter notebooks. (dc139cf)
Tests
Now _get_local_filenames returns a sorted list of filenames to ensure stability. (774e8eb)
v0.4.2 (2023-06-11)
Performance improvements
Optimize VarianceFeatureSelector when threshold is 0. (906dde3)
Code Refactoring
Remove pandas dependency. (40e264c)
Continuous Integration
semantic versioning: Add more sections to changelog based on conventional commit categories. (e5b1594)
v0.4.1 (2023-06-04)
Bug Fixes
v0.4.0 (2023-06-03)
Features
feature selection: Add a feature selector that filters out invariant features. (798c261)
feature selection: Add PearsonCorrelationFeatureSelector which drops highly correlated features. (66e5cd2)
feature selection: Add CompositeFeatureSelector, for chaining multiple feature selection steps on the same DataFrame. (3d75079)
feature selection: Add standard deviation feature selector. (c56177b)
feature selection: Add missing rate feature selector. (d5ba8b5)
Bug Fixes
Fix typeguard breaking changes causing build to fail. (66c6a8e)
Code Refactoring
v0.3.1 (2023-05-21)
Bug Fixes
:bug: Added notes to pipeline step docstrings. (d94f899)
Code Refactoring
data source: :bug: Added note to the KaggleDataSource init docstring. (d5f12d3)
Continuous Integration
:rocket: Removed semantic PR workflow and updated test workflow to not run on release commits. (8138745)
v0.3.0 (2023-05-21)
Features
new notes (#54) (21239f7)
Bug Fixes
Continuous Integration
:rocket: Updated release to only trigger if the commit message does not contain chore(release). (c9f3f3f)
v0.2.0 (2023-05-21)
Features
add data splitting step (#53) (a668b1a)
Documentation
v0.1.3 (2023-05-13)
Bug Fixes
cache: :bug: Cache modules exposed in subpackage init. (fd65e9d)
v0.1.2 (2023-05-13)
Bug Fixes
Documentation
:memo: Fixed sphinx-autoapi build warnings. (040963a)
v0.1.0 (2023-05-12)
Features
data source: :sparkles: Add KaggleDataSource to download the dataset from Kaggle by providing a destination directory, owner slug, dataset slug, and necessary API credentials. (3fa07b6)
Bug Fixes
cache: :bug: Fixed test by not testing it… (e3a0ce9)
cache: :bug: Try logging using assert to fix GH issue (5e247ec)
cache: :bug: Attempting to fix test case failing in GH actions. (4892591)
cache: :bug: LRUCacheMixin now relies on file modification time instead of access time due to system limitations. (127d657)
:bug: Fixed docstrings for private methods in KaggleDataSource and removed xdoctest from build steps (bb55cf5)