mleko.cache.fingerprinters.csv_fingerprinter
This module contains a fingerprinter for CSV files, supporting both gzipped and raw CSV files.
Module Contents
Classes
CSVFingerprinter | A fingerprinter for CSV files supporting Gzipped and raw CSV files.
Attributes
logger | The logger for the module.
- mleko.cache.fingerprinters.csv_fingerprinter.logger
The logger for the module.
- class mleko.cache.fingerprinters.csv_fingerprinter.CSVFingerprinter(n_rows: int = 1000)
Bases: mleko.cache.fingerprinters.base_fingerprinter.BaseFingerprinter
A fingerprinter for CSV files supporting Gzipped and raw CSV files.
Initialize the CSVFingerprinter.
Warning
The fingerprint is generated by reading only the first n_rows of each CSV file. For files longer than n_rows, the remaining rows are not read, so the fingerprint identifies only the sampled rows rather than the entire file, and two files that share the same first n_rows may receive the same fingerprint.
- Parameters:
n_rows (int) – The number of rows to sample from each CSV file for fingerprinting.
Examples
>>> fingerprinter = CSVFingerprinter(n_rows=1000)
>>> fingerprinter.fingerprint(["data.csv", "data2.csv"])
"fingerprint"
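The limitation described in the warning can be illustrated with a small standalone sketch. It does not use mleko and is not the library's implementation: `sample_fingerprint` is a hypothetical helper that hashes the first `n_rows` lines of a file with SHA-256 and, as a simplifying assumption, counts the header line as a row.

```python
import hashlib
import itertools
import tempfile
from pathlib import Path


def sample_fingerprint(path: Path, n_rows: int) -> str:
    """Hash only the first n_rows lines of a file (hypothetical helper, not mleko's code)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # islice stops after n_rows lines, so later lines never reach the hash.
        for line in itertools.islice(handle, n_rows):
            digest.update(line)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "a.csv"
    second = Path(tmp) / "b.csv"
    first.write_text("id,value\n1,10\n2,20\n3,30\n")
    second.write_text("id,value\n1,10\n2,20\n3,99\n")  # differs only on the last line

    # Sampling three lines skips the differing line, so the digests collide.
    same_prefix = sample_fingerprint(first, n_rows=3) == sample_fingerprint(second, n_rows=3)
    # Sampling all four lines includes the difference, so the digests diverge.
    full_read = sample_fingerprint(first, n_rows=4) == sample_fingerprint(second, n_rows=4)
```

In this sketch `same_prefix` is true and `full_read` is false, which is the trade-off the warning describes: a smaller sample is cheaper to read but can miss changes further down the file.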
- fingerprint(data: list[str] | list[pathlib.Path]) → str
Generate a fingerprint for the given list of CSV files.
The currently supported file types are .csv, .gz, and .csv.gz.
- Parameters:
data (list[str] | list[pathlib.Path]) – A list of file paths to CSV files.
- Returns:
The fingerprint as a hexadecimal string.
- Return type:
str
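One way a list of files could be reduced to a single hexadecimal string is to fold per-file digests into one hash. The sketch below is an assumption for illustration only: `combine_fingerprints` and `name_digest` are hypothetical names, the paths are sorted so that input order does not matter, and the source does not state whether mleko combines fingerprints this way or treats order as significant.

```python
import hashlib
from pathlib import Path
from typing import Callable, Sequence, Union


def combine_fingerprints(
    data: Sequence[Union[str, Path]],
    per_file: Callable[[Path], str],
) -> str:
    """Fold per-file digests into one hex string (hypothetical helper)."""
    combined = hashlib.sha256()
    # Normalise str entries to Path and sort, an assumption that makes
    # the result independent of the order in which files are listed.
    for path in sorted(Path(item) for item in data):
        combined.update(per_file(path).encode("utf-8"))
    return combined.hexdigest()


def name_digest(path: Path) -> str:
    """Toy per-file fingerprint that hashes only the file name, to keep the sketch free of I/O."""
    return hashlib.sha256(path.name.encode("utf-8")).hexdigest()


from_strings = combine_fingerprints(["data.csv", "data2.csv"], name_digest)
from_paths = combine_fingerprints([Path("data2.csv"), Path("data.csv")], name_digest)
```

Here `from_strings` and `from_paths` are equal, showing that string and `pathlib.Path` inputs are handled uniformly in this sketch. A real per-file function would read file contents rather than the file name.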
- _fingerprint_csv_file(file_path: pathlib.Path) → str
Generate a fingerprint for a single CSV file.
- Parameters:
file_path (pathlib.Path) – The file path to a CSV file.
- Raises:
ValueError – If the file is of an unsupported type.
- Returns:
The fingerprint as a hexadecimal string.
- Return type:
str
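The behaviour documented for a single file (raw and gzipped input accepted, `ValueError` for anything else) might be sketched as follows. This is a hypothetical stand-in, not mleko's code: `fingerprint_csv_sketch` is an invented name, the dispatch on the file-name suffix and the use of SHA-256 over the first `n_rows` lines are assumptions.

```python
import gzip
import hashlib
import itertools
import tempfile
from pathlib import Path


def fingerprint_csv_sketch(file_path: Path, n_rows: int = 1000) -> str:
    """Hash the first n_rows lines of a raw or gzipped CSV (hypothetical, not mleko's code)."""
    name = file_path.name.lower()
    if name.endswith(".gz"):  # covers both .gz and .csv.gz
        opener = gzip.open
    elif name.endswith(".csv"):
        opener = open
    else:
        raise ValueError(f"Unsupported file type: {file_path}")

    digest = hashlib.sha256()
    with opener(file_path, "rb") as handle:
        # gzip.open yields decompressed lines, so both branches hash CSV text.
        for line in itertools.islice(handle, n_rows):
            digest.update(line)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "data.csv"
    packed = Path(tmp) / "data.csv.gz"
    content = b"id,value\n1,10\n2,20\n"
    raw.write_bytes(content)
    with gzip.open(packed, "wb") as handle:
        handle.write(content)

    raw_digest = fingerprint_csv_sketch(raw)
    packed_digest = fingerprint_csv_sketch(packed)

    try:
        fingerprint_csv_sketch(Path(tmp) / "data.parquet")
        rejected = False
    except ValueError:
        rejected = True
```

Because this sketch hashes decompressed content, `raw_digest` and `packed_digest` match for identical data. That is a property of the sketch only; the source does not say whether the library's fingerprints agree across compressed and uncompressed copies.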