mleko: Streamlining Machine Learning Pipelines in Python#
Simplify and accelerate your machine learning development with mleko. Designed with modularity and customization in mind, it integrates seamlessly into your existing workflows. Its robust caching system optimizes performance by reusing stored results, taking you from data ingestion to finalized models efficiently.
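To make the caching idea concrete, here is a minimal sketch of parameter-fingerprinted result caching in plain Python. It is not mleko's implementation: the `cached_step` helper, its arguments, and the file naming scheme are all hypothetical and only illustrate the general technique of skipping a step whose inputs have not changed.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def cached_step(cache_dir, step_name, params, compute):
    """Run compute() once per (step_name, params); reuse the stored result afterwards.

    Hypothetical helper for illustration only, not part of mleko.
    Returns a (result, cache_hit) tuple.
    """
    # Fingerprint the step name and parameters; any change produces a new cache key.
    payload = json.dumps({"step": step_name, "params": params}, sort_keys=True)
    key = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    cache_file = Path(cache_dir) / f"{step_name}_{key}.pkl"

    if cache_file.exists():
        # Cache hit: load the earlier result instead of recomputing it.
        return pickle.loads(cache_file.read_bytes()), True

    result = compute()
    cache_file.write_bytes(pickle.dumps(result))
    return result, False


calls = []  # records how many times the expensive function actually runs


def expensive_sum():
    calls.append(1)
    return sum(range(1000))


with tempfile.TemporaryDirectory() as tmp:
    first, first_hit = cached_step(tmp, "sum", {"n": 1000}, expensive_sum)
    second, second_hit = cached_step(tmp, "sum", {"n": 1000}, expensive_sum)
```

The second call finds the pickled result on disk, so `expensive_sum` runs only once. Changing `params` would change the fingerprint and trigger a fresh computation.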
Features#
mleko is engineered to address the end-to-end needs of machine learning pipelines, providing robust, scalable solutions for data science challenges:
Ingest: Seamlessly integrates with data sources like AWS S3 and Kaggle, offering hassle-free data ingestion and compatibility.
Export: Supports exporting data to various formats and platforms, locally or in the cloud, to ensure that your data is accessible and shareable.
Convert: Specializes in data format transformations, prominently featuring high-performance conversions from CSV to Vaex DataFrame, to make your data pipeline-ready.
Split: Employs sophisticated data partitioning algorithms, allowing you to segment DataFrames into train, test, and validation sets for effective model training and evaluation.
Filter: Provides a suite of filtering techniques, such as resampling or simple expression-based filtering, enabling you to focus on the most relevant data.
Feature Selection: Equipped with a suite of feature selection techniques, mleko improves model performance by focusing on the most impactful variables.
Transformation: Facilitates data manipulations such as Frequency Encoding and Standardization, ensuring that your data conforms to the prerequisites of the machine learning algorithms.
Model: Provides a core set of functionalities for machine learning models, including in-built support for hyperparameter tuning, thereby streamlining the path from data to deployable model.
Pipeline: Unifies the entire workflow into an intuitive directed acyclic graph (DAG) architecture, promoting reproducibility and reducing iteration time and time-to-market for machine learning models.
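As a rough illustration of what a DAG-shaped workflow means, the sketch below orders a set of steps with the standard library's `graphlib`. The step names simply mirror the feature list and their dependencies are assumed for the example; this is not mleko's pipeline API.

```python
from graphlib import TopologicalSorter

# Hypothetical steps; each key maps to the set of steps it depends on.
dependencies = {
    "ingest": set(),
    "convert": {"ingest"},
    "split": {"convert"},
    "filter": {"split"},
    "feature_selection": {"filter"},
    "transformation": {"feature_selection"},
    "model": {"transformation"},
}


def execution_order(graph):
    """Return the steps in an order where every dependency runs before its dependents."""
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dependencies)
print(" -> ".join(order))
```

Because every step declares its inputs explicitly, the order of execution is derived from the graph rather than from the order in which the steps were written down, which is what makes such a workflow reproducible.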
By integrating these features, mleko serves as a comprehensive toolkit for machine learning practitioners looking to build robust models efficiently.
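For readers unfamiliar with the two transformations named above, the following sketch shows what Frequency Encoding and Standardization compute, using only the Python standard library. The function names and the toy values are made up for illustration and do not reflect mleko's own classes or signatures.

```python
from collections import Counter
from statistics import fmean, pstdev


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Toy data, invented for this example.
categories = ["S", "C", "S", "Q", "S", "C", "S", "S"]
numbers = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

encoded = frequency_encode(categories)
scaled = standardize(numbers)
```

In a real pipeline the frequencies, mean, and standard deviation would normally be estimated on the training split only and then applied unchanged to the validation and test splits.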
Installation#
You can install mleko via pip from PyPI:
$ pip install mleko
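One way to confirm that the installation succeeded is to query the package metadata from Python. This sketch relies only on the standard library's `importlib.metadata`; the `installed_version` helper is a hypothetical name introduced here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


mleko_version = installed_version("mleko")
if mleko_version:
    print(f"mleko version: {mleko_version}")
else:
    print("mleko is not installed in this environment")
```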
Usage & Examples#
See the documentation for more information or check out the usage examples on well-known datasets like the Titanic Dataset.
Issues#
If you encounter any problems, please file an issue along with a detailed description.
Contributing#
We are open to, and grateful for, any contributions made by the community. To learn more, see the Contributor Guide.
Release History#
See our changelog.
Acknowledgements#
The development of mleko was influenced by the existing work of the following individuals:
Felipe Breve Siola (fsiola)
Sai Ma (metanouvelle)
Ahmet Anil Pala (aanilpala)
Their insights and contributions provided a solid foundation for this library. We appreciate their effort and recognize the contributions that led to the creation of mleko.
License#
Copyright © 2024 Klarna Bank AB
For license details, see the LICENSE file in the root of this project.