preprocessing
boxcox(method='mle')
Applies the Box-Cox transformation to numeric columns in a panel DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method | str | The method used to determine the lambda parameter of the Box-Cox transformation. Supported methods: | 'mle' |
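
To make the role of the lambda parameter concrete, here is a minimal standard-library sketch of the Box-Cox transform and a grid search over its log-likelihood. The helper names (`boxcox_transform`, `boxcox_loglik`, `boxcox_mle_grid`) are hypothetical and not part of this module, and the grid search is a simple stand-in for whatever optimizer the library uses for `method='mle'`.

```python
import math


def boxcox_transform(xs, lmbda):
    """Box-Cox transform of strictly positive values for a fixed lambda."""
    if any(x <= 0 for x in xs):
        raise ValueError("Box-Cox requires strictly positive values")
    if lmbda == 0:
        return [math.log(x) for x in xs]
    return [(x ** lmbda - 1.0) / lmbda for x in xs]


def boxcox_loglik(xs, lmbda):
    """Profile log-likelihood of lambda (up to an additive constant)."""
    n = len(xs)
    ys = boxcox_transform(xs, lmbda)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    return (lmbda - 1.0) * sum(math.log(x) for x in xs) - 0.5 * n * math.log(var)


def boxcox_mle_grid(xs, grid):
    """Pick the lambda on a candidate grid with the highest log-likelihood."""
    return max(grid, key=lambda lmbda: boxcox_loglik(xs, lmbda))


values = [1.0, 2.0, 4.0, 8.0]
best = boxcox_mle_grid(values, [-1.0, -0.5, 0.0, 0.5, 1.0])
transformed = boxcox_transform(values, best)
```

The transform is only defined for strictly positive inputs, which is why the sketch raises on zero or negative values.
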
coerce_dtypes(schema)
Coerces the column datatypes of a DataFrame using the provided schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | Mapping[str, DataType] | A dictionary-like object mapping column names to the desired data types. | required |
detrend(method='linear')
Removes mean or linear trend from numeric columns in a panel DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method | str | If | 'linear' |
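
As an illustration of the two behaviours named in the summary (removing a mean or a linear trend), the sketch below works on a single series with the standard library. The function names are hypothetical, and fitting the line by ordinary least squares against the positions 0, 1, 2, ... is an assumption rather than a statement about the library's internals.

```python
def detrend_mean(ys):
    """Subtract the series mean from every observation."""
    mean = sum(ys) / len(ys)
    return [y - mean for y in ys]


def detrend_linear(ys):
    """Subtract an ordinary-least-squares line fitted against 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations to fit a line")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(ys)]


residuals = detrend_linear([1.0, 3.0, 5.0, 7.0])  # a perfectly linear series
```
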
diff(order, sp=1, fill_strategy=None)
Difference time-series in panel data given order and seasonal period.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
order | int | The order to difference. | required |
sp | int | Seasonal periodicity. | 1 |
fill_strategy | Optional[str] | Strategy to fill nulls by. Nulls are not filled if None. Supported strategies include: ["backward", "forward", "mean", "zero"]. | None |
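
The interaction between `order` and `sp` can be sketched on a single series as follows. This assumes that differencing subtracts the value `sp` steps earlier and that the operation is repeated `order` times; the helper name `difference` is hypothetical, and the library applies the operation per entity on a panel rather than on a plain list.

```python
def difference(xs, order, sp=1):
    """Apply x[t] - x[t - sp] to a list, `order` times; gaps become None."""
    out = list(xs)
    for _ in range(order):
        out = [
            None
            if i < sp or out[i] is None or out[i - sp] is None
            else out[i] - out[i - sp]
            for i in range(len(out))
        ]
    return out


squares = [1, 4, 9, 16, 25]
first = difference(squares, order=1)   # one leading null
second = difference(squares, order=2)  # two leading nulls
seasonal = difference([1, 2, 3, 4, 5, 6], order=1, sp=2)
```

Each pass introduces `sp` leading nulls, which is where a non-null `fill_strategy` would come into play.
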
impute(method)
Performs missing value imputation on numeric columns of a DataFrame grouped by entity.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method | Union[str, int, float] | The imputation method to use. Supported methods are: | required |
lag(lags, fill_strategy=None)
Applies lag transformation to a LazyFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lags | List[int] | A list of lag values to apply. | required |
fill_strategy | Optional[str] | Strategy to fill nulls by. Nulls are not filled if None. Supported strategies include: ["backward", "forward", "mean", "zero"]. | None |
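
The sketch below shows what lagging a single series looks like and how the four listed fill strategies could treat the resulting nulls. The helpers `lag_series` and `fill_nulls` are hypothetical names, and the exact semantics of each strategy in the library are assumed from their names.

```python
def lag_series(xs, k):
    """Shift a list forward by k steps, padding the start with None."""
    k = min(k, len(xs))
    return [None] * k + list(xs[: len(xs) - k])


def fill_nulls(xs, strategy):
    """Fill None entries using one of the four listed strategy names."""
    if strategy == "zero":
        return [0 if x is None else x for x in xs]
    if strategy == "mean":
        known = [x for x in xs if x is not None]
        mean = sum(known) / len(known)
        return [mean if x is None else x for x in xs]
    if strategy == "forward":
        out, last = [], None
        for x in xs:
            last = x if x is not None else last
            out.append(last)
        return out
    if strategy == "backward":
        return fill_nulls(list(xs)[::-1], "forward")[::-1]
    raise ValueError(f"unsupported strategy: {strategy!r}")


series = [1, 2, 3, 4]
lagged = {f"lag_{k}": lag_series(series, k) for k in [1, 2]}
backfilled = fill_nulls(lagged["lag_1"], "backward")
```

Note that a forward fill cannot repair the leading nulls created by a lag, because there is no earlier value to carry forward.
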
one_hot_encode(drop_first=False)
Encode categorical features as a one-hot numeric array.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
drop_first | bool | Drop the first one hot feature. | False |
Raises:
Type | Description |
---|---|
ValueError | if X passed into |
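
A small sketch of the effect of `drop_first` on a single categorical column. The function `one_hot` is a hypothetical helper, and ordering the categories alphabetically to decide which one is "first" is an assumption made for this example.

```python
def one_hot(values, drop_first=False):
    """Return a dict of indicator columns, one per category."""
    categories = sorted(set(values))
    if drop_first:
        # Dropping one category avoids perfectly collinear indicator columns.
        categories = categories[1:]
    return {c: [1 if v == c else 0 for v in values] for c in categories}


colors = ["a", "b", "a", "c"]
full = one_hot(colors)
reduced = one_hot(colors, drop_first=True)
```
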
reindex(drop_duplicates=False)
Reindexes the entity and time columns to have every possible combination of (entity, time).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
drop_duplicates | bool | Defaults to False. If True, duplicates are dropped before reindexing. | False |
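
The idea of "every possible combination of (entity, time)" can be sketched as a Cartesian product over the distinct entities and timestamps, with missing pairs filled by nulls. The row layout `(entity, time, value)` and the name `reindex_rows` are hypothetical choices for this example.

```python
from itertools import product


def reindex_rows(rows):
    """Expand (entity, time, value) rows to the full entity x time grid."""
    lookup = {(entity, time): value for entity, time, value in rows}
    entities = sorted({entity for entity, _, _ in rows})
    times = sorted({time for _, time, _ in rows})
    return [(e, t, lookup.get((e, t))) for e, t in product(entities, times)]


rows = [("a", 1, 10), ("a", 2, 11), ("b", 2, 20)]
full_grid = reindex_rows(rows)  # ("b", 1) is added with a null value
```
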
resample(freq, agg_method, impute_method)
Resamples and transforms a DataFrame using the specified frequency, aggregation method, and imputation method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
freq | str | Offset alias supported by Polars. | required |
agg_method | str | The aggregation method to use for resampling. Supported values are 'sum', 'mean', and 'median'. | required |
impute_method | Union[str, int, float] | The method used for imputing missing values. If a string, supported values are 'ffill' (forward fill) and 'bfill' (backward fill). If an int or float, missing values will be filled with the provided value. | required |
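
The aggregate-then-impute flow described above can be sketched for one series resampled to calendar months. This is a standard-library illustration only: the helper `resample_monthly` is hypothetical, the monthly bucket is hard-coded instead of being parsed from a Polars offset alias, and the library itself works on panel DataFrames.

```python
from datetime import date
from statistics import mean, median

AGGREGATORS = {"sum": sum, "mean": mean, "median": median}


def _fill(values, reverse=False):
    """Carry the last seen value forward (or backward when reverse=True)."""
    seq = list(values)[::-1] if reverse else list(values)
    last, out = None, []
    for v in seq:
        last = v if v is not None else last
        out.append(last)
    return out[::-1] if reverse else out


def resample_monthly(rows, agg_method, impute_method):
    """Aggregate (date, value) rows per month, then impute empty months."""
    buckets = {}
    for day, value in rows:
        buckets.setdefault((day.year, day.month), []).append(value)
    (year, month), last = min(buckets), max(buckets)
    months = []
    while (year, month) <= last:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    agg = AGGREGATORS[agg_method]
    values = [agg(buckets[m]) if m in buckets else None for m in months]
    if isinstance(impute_method, (int, float)):
        values = [impute_method if v is None else v for v in values]
    elif impute_method == "ffill":
        values = _fill(values)
    elif impute_method == "bfill":
        values = _fill(values, reverse=True)
    else:
        raise ValueError(f"unsupported impute_method: {impute_method!r}")
    return list(zip(months, values))


rows = [(date(2023, 1, 5), 1.0), (date(2023, 1, 20), 3.0), (date(2023, 3, 2), 10.0)]
monthly_mean = resample_monthly(rows, "mean", "ffill")
monthly_sum = resample_monthly(rows, "sum", 0)
```

February has no observations in this example, so it is the month that the imputation step fills.
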
roll(window_sizes, stats, freq)
Performs rolling window calculations on specified columns of a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_sizes | List[int] | A list of integers representing the window sizes for the rolling calculations. | required |
stats | List[Literal['mean', 'min', 'max', 'mlm', 'sum', 'std', 'cv']] | A list of statistical measures to calculate for each rolling window. Supported values are: | required |
freq | str | Offset alias supported by Polars. | required |
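
A sketch of how one statistic is computed for each combination of window size and stat, on a single series. The helper `rolling` and the output naming scheme are hypothetical, and only `'mean'`, `'min'`, `'max'`, and `'sum'` are implemented because their meaning is unambiguous from the names alone.

```python
ROLLING_STATS = {
    "mean": lambda window: sum(window) / len(window),
    "min": min,
    "max": max,
    "sum": sum,
}


def rolling(xs, window_size, stat):
    """Trailing-window statistic; positions without a full window are None."""
    func = ROLLING_STATS[stat]
    return [
        None if i + 1 < window_size else func(xs[i + 1 - window_size : i + 1])
        for i in range(len(xs))
    ]


series = [1, 2, 3, 4, 5]
features = {
    f"rolling_{stat}_{size}": rolling(series, size, stat)
    for size in [2, 3]
    for stat in ["mean", "max"]
}
```
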
scale(use_mean=True, use_std=True, rescale_bool=False)
Performs scaling and rescaling operations on the numeric columns of a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
use_mean | bool | Whether to subtract the mean from the numeric columns. Defaults to True. | True |
use_std | bool | Whether to divide the numeric columns by the standard deviation. Defaults to True. | True |
rescale_bool | bool | Whether to rescale boolean columns to the range [-1, 1]. Defaults to False. | False |
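
The three flags can be illustrated on plain lists as below. The helper names are hypothetical, and using the sample standard deviation (`statistics.stdev`) is an assumption, since this section does not say which estimator the library uses.

```python
from statistics import mean, stdev


def scale_values(xs, use_mean=True, use_std=True):
    """Optionally centre by the mean and divide by the standard deviation."""
    mu = mean(xs) if use_mean else 0.0
    sigma = stdev(xs) if use_std else 1.0
    return [(x - mu) / sigma for x in xs]


def rescale_bool_values(flags):
    """Map False/True to -1/1 so booleans span the range [-1, 1]."""
    return [1 if flag else -1 for flag in flags]


standardized = scale_values([1.0, 2.0, 3.0])
centred_only = scale_values([1.0, 2.0, 3.0], use_std=False)
signs = rescale_bool_values([True, False, True])
```
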
time_to_arange(eager=False)
Coerces the time column into an arange per entity.
Assumes evenly spaced time-series and homogeneous start dates.
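
One way to picture this is replacing each entity's timestamps with the positions 0, 1, 2, ... in order of appearance. The sketch below assumes rows of the form `(entity, time, value)` that are already sorted by time within each entity; the function name `to_arange` is hypothetical.

```python
def to_arange(rows):
    """Replace each entity's time values with a 0-based running index."""
    counters = {}
    out = []
    for entity, _time, value in rows:
        index = counters.get(entity, 0)
        out.append((entity, index, value))
        counters[entity] = index + 1
    return out


rows = [
    ("a", "2023-01-01", 10),
    ("a", "2023-01-02", 11),
    ("b", "2023-01-01", 20),
    ("b", "2023-01-02", 21),
]
indexed = to_arange(rows)
```

The two stated assumptions matter here: if series were unevenly spaced or started on different dates, the same integer would no longer refer to the same point in time across entities.
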
trim(direction='both')
Trims time-series in panel to have the same start or end dates as the shortest time-series.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
direction | Literal['both', 'left', 'right'] | Defaults to "both". If "left", trims from the start date of the shortest time-series; if "right", trims up to the end date of the shortest time-series; otherwise "both" trims between the start and end dates of the shortest time-series. | 'both' |
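
The three directions can be sketched as follows. This example assumes one particular reading of the description: the cut-off dates are the latest start and the earliest end across all entities, so that every trimmed series shares the same window. The helper `trim_panel` and the dict-of-lists layout are hypothetical.

```python
def trim_panel(series, direction="both"):
    """Trim each entity's (time, value) list to a shared window.

    `series` maps entity -> list of (time, value) pairs sorted by time.
    """
    start = max(obs[0][0] for obs in series.values())  # latest start
    end = min(obs[-1][0] for obs in series.values())   # earliest end

    def keep(t):
        if direction == "left":
            return t >= start
        if direction == "right":
            return t <= end
        return start <= t <= end

    return {
        entity: [(t, v) for t, v in obs if keep(t)]
        for entity, obs in series.items()
    }


panel = {
    "a": [(1, 10), (2, 11), (3, 12), (4, 13)],
    "b": [(2, 20), (3, 21)],
}
both = trim_panel(panel, "both")
left = trim_panel(panel, "left")
right = trim_panel(panel, "right")
```
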
yeojohnson(brack=(-2, 2))
Applies the Yeo-Johnson transformation to numeric columns in a panel DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
brack | 2-tuple | The starting interval for a downhill bracket search with optimize.brent. Note that this is in most cases not critical; the final result is allowed to be outside this bracket. | (-2, 2) |
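
For reference, the Yeo-Johnson transform for a fixed lambda can be written in a few lines. This sketch covers only the transform itself; the search for lambda that `brack` seeds is not reproduced here, and the function name `yeojohnson_transform` is hypothetical.

```python
import math


def yeojohnson_transform(x, lmbda):
    """Yeo-Johnson transform of a single value for a fixed lambda."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)


# With lambda = 1 the transform is the identity on both sides of zero.
identity = [yeojohnson_transform(x, 1.0) for x in [-3.0, 0.0, 3.0]]
```

Unlike Box-Cox, the transform accepts zero and negative inputs, which is why it has separate branches for each sign.
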