feature_extraction
FeatureExtractor
absolute_energy()
Compute the absolute energy of a time series.
Returns:
Type  Description 

An expression of the output


absolute_maximum()
Compute the absolute maximum of a time series.
Returns:
Type  Description 

An expression of the output


absolute_sum_of_changes()
Compute the absolute sum of changes of a time series.
Returns:
Type  Description 

An expression of the output


autocorrelation(n_lags)
Calculate the autocorrelation for a specified lag. The autocorrelation measures the linear dependence between a timeseries and a lagged version of itself.
Parameters:
Name  Type  Description  Default 

n_lags 
int

The lag at which to calculate the autocorrelation. Must be a nonnegative integer. 
required 
Returns:
Type  Description 

An expression of the output
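
As a rough illustration of what a lag autocorrelation measures, here is a plain-Python sketch. The helper name `autocorrelation_sketch` is hypothetical, and the normalisation (population variance, covariance averaged over the n - lag overlapping pairs) is an assumption borrowed from a common convention; the library's own implementation may differ.

```python
from statistics import fmean


def autocorrelation_sketch(x, n_lags):
    """Autocorrelation of a sequence at a single lag (illustrative only)."""
    if n_lags == 0:
        return 1.0
    n = len(x)
    mean = fmean(x)
    # Population variance; a constant series would make this zero.
    var = sum((v - mean) ** 2 for v in x) / n
    # Average product of deviations over the n - n_lags overlapping pairs.
    cov = sum(
        (x[i] - mean) * (x[i + n_lags] - mean) for i in range(n - n_lags)
    ) / (n - n_lags)
    return cov / var


print(autocorrelation_sketch([1, 2, 3, 4, 5], 1))  # 0.5
```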


benford_correlation()
Returns the correlation between the first digit distribution of the input time series and the Newcomb-Benford's Law distribution.
Returns:
Type  Description 

An expression of the output


binned_entropy(bin_count=10)
Calculates the entropy of a binned histogram for a given time series. It is highly recommended that you impute the time series before calling this.
Parameters:
Name  Type  Description  Default 

bin_count 
int

The number of bins to use in the histogram. Default is 10. 
10

Returns:
Type  Description 

An expression of the output
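
A minimal plain-Python sketch of a binned entropy, assuming equal-width bins between the minimum and maximum and the natural logarithm. Both choices, and the name `binned_entropy_sketch`, are assumptions made for illustration rather than a description of the library's code.

```python
import math


def binned_entropy_sketch(x, bin_count=10):
    """Shannon entropy of an equal-width histogram (illustrative only)."""
    lo, hi = min(x), max(x)
    counts = [0] * bin_count
    if hi == lo:
        # Degenerate case: every value falls into one bin.
        counts[0] = len(x)
    else:
        width = (hi - lo) / bin_count
        for v in x:
            # The maximum value is placed in the last bin.
            idx = min(int((v - lo) / width), bin_count - 1)
            counts[idx] += 1
    probs = [c / len(x) for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)


print(binned_entropy_sketch([0, 0, 1, 1], bin_count=2))  # ln(2)
```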


c3(n_lags)
Measure of nonlinearity in the time series using c3 statistics.
Parameters:
Name  Type  Description  Default 

n_lags 
int

The lag that should be used in the calculation of the feature. 
required 
Returns:
Type  Description 

An expression of the output


change_quantiles(q_low, q_high, is_abs=True)
First fixes a corridor given by the quantiles ql and qh of the distribution of x. Then calculates the average, absolute value of consecutive changes of the series x inside this corridor.
Parameters:
Name  Type  Description  Default 

q_low 
float

The lower quantile of the corridor. Must be less than 
required 
q_high 
float

The upper quantile of the corridor. Must be greater than 
required 
is_abs 
bool

If True, takes absolute difference. 
True

Returns:
Type  Description 

An expression of the output


cid_ce(normalize=False)
Computes estimate of timeseries complexity[^1].
A more complex time series has more peaks and valleys. This feature is calculated by:
Parameters:
Name  Type  Description  Default 

normalize 
bool

If True, z-normalizes the timeseries before computing the feature. Default is False. 
False

Returns:
Type  Description 

An expression of the output
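
A commonly cited definition of this complexity estimate is the square root of the sum of squared differences between consecutive values. The sketch below assumes that definition and a population standard deviation for the optional z-normalization; `cid_ce_sketch` is a hypothetical name, not part of the library.

```python
import math
from statistics import fmean, pstdev


def cid_ce_sketch(x, normalize=False):
    """sqrt(sum((x[i+1] - x[i])^2)), optionally on z-normalized data."""
    values = list(x)
    if normalize:
        mean, std = fmean(values), pstdev(values)
        if std == 0:
            return 0.0  # a constant series has no complexity
        values = [(v - mean) / std for v in values]
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(values, values[1:])))


print(cid_ce_sketch([1, 2, 1, 2]))  # sqrt(3)
```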


count_above(threshold=0.0)
Calculate the percentage of values above or equal to a threshold.
Parameters:
Name  Type  Description  Default 

threshold 
float

The threshold value for comparison. 
0.0

Returns:
Type  Description 

An expression of the output


count_above_mean()
Count the number of values that are above the mean.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

An expression of the output


count_below(threshold=0.0)
Calculate the percentage of values below or equal to a threshold.
Parameters:
Name  Type  Description  Default 

threshold 
float

The threshold value for comparison. 
0.0

Returns:
Type  Description 

An expression of the output


count_below_mean()
Count the number of values that are below the mean.
Returns:
Type  Description 

An expression of the output


cusum(threshold, warmup_period, drift=0.0)
Cumulative sum (CUSUM) filter to detect abrupt changes in data.
The CUSUM filter is a quality control method, designed to detect a shift in the mean value of the measured quantity away from a target value.
The general formula for the CUSUM filter can be found here: https://en.wikipedia.org/wiki/CUSUM
And the original paper that introduces it can be found here: https://www.tandfonline.com/doi/abs/10.1080/00401706.1961.10489922
Parameters:
Name  Type  Description  Default 

threshold 
float

The threshold for the change (x_{t+1} - x_t) to be counted 
required 
warmup_period 
int

The number of observations which are used to estimate the mean and standard deviation of the data. 
required 
drift 
float

The drift coefficient for the CUSUM filter. Default value is 0. 
0.0

Returns:
Type  Description 

An expression of the output


detrend(method='linear')
Detrends the time series by either removing a fitted linear regression or by removing the mean. This assumes that data is in order.
Parameters:
Name  Type  Description  Default 

method 
DetrendMethod

Either 
'linear'

Returns:
Type  Description 

An expression representing detrended column


energy_ratios(n_chunks=10)
Calculates the sum of squares for each of n_chunks equally segmented parts of the timeseries, expressed as a ratio of the sum of squares over the whole series.
All ratios for all chunks will be returned at once.
Parameters:
Name  Type  Description  Default 

n_chunks 
int

The number of equally segmented parts to divide the timeseries into. Default is 10. 
10

Returns:
Type  Description 

An expression of the output


first_location_of_maximum()
Returns the first location of the maximum value of x. The position is calculated relatively to the length of x.
Returns:
Type  Description 

An expression of the output


first_location_of_minimum()
Returns the first location of the minimum value of x. The position is calculated relatively to the length of x.
Returns:
Type  Description 

An expression of the output


frac_diff(d, min_weight=None, window_size=None)
Compute the fractional differential of a time series.
This particular functionality is referenced in Advances in Financial Machine Learning by Marcos Lopez de Prado (2018).
For feature creation purposes, it is suggested to use the minimum value of d that makes the time series stationary. This can be found by running the Augmented Dickey-Fuller test on the differenced series for different values of d and selecting the smallest value for which the series is stationary.
Parameters:
Name  Type  Description  Default 

d 
float

The fractional order of the differencing operator. 
required 
min_weight 
float

The minimum weight to use for calculations. If specified, the window size is computed from this value and not needed. 
None

window_size 
int

The window size of the fractional differencing operator. If specified, the minimum weight is not needed. 
None
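
To make the roles of d, min_weight and window_size concrete, here is a plain-Python sketch of fixed-width fractional differencing. It uses the standard weight recursion w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k. The helper names, the requirement that exactly one of window_size and min_weight is given, and the use of None for the warm-up positions are all assumptions of this sketch, not documented behaviour of the library.

```python
def frac_diff_weights(d, window_size=None, min_weight=None):
    """Weights of the fractional differencing operator (illustrative only)."""
    if (window_size is None) == (min_weight is None):
        raise ValueError("specify exactly one of window_size or min_weight")
    if min_weight is not None and min_weight <= 0:
        raise ValueError("min_weight must be positive")
    weights = [1.0]
    k = 1
    while window_size is None or k < window_size:
        w = -weights[-1] * (d - k + 1) / k
        # With min_weight, stop as soon as a weight becomes negligible.
        if min_weight is not None and abs(w) < min_weight:
            break
        weights.append(w)
        k += 1
    return weights


def frac_diff_sketch(x, d, window_size=None, min_weight=None):
    """Apply the weights as a rolling dot product over past values."""
    w = frac_diff_weights(d, window_size, min_weight)
    out = []
    for t in range(len(x)):
        if t < len(w) - 1:
            out.append(None)  # not enough history for a full window
        else:
            out.append(sum(w[k] * x[t - k] for k in range(len(w))))
    return out


print(frac_diff_weights(0.5, window_size=3))        # [1.0, -0.5, -0.125]
print(frac_diff_sketch([1, 3, 6], d=1, window_size=2))  # [None, 2.0, 3.0]
```

With d = 1 and a window of two, the weights are [1, -1], so the sketch reduces to an ordinary first difference.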

harmonic_mean()
Returns the harmonic mean of the expression
Returns:
Type  Description 

An expression of the output


has_duplicate()
Check if the timeseries contains any duplicate values.
Returns:
Type  Description 

An expression of the output


has_duplicate_max()
Check if the timeseries contains any duplicate values equal to its maximum value.
Returns:
Type  Description 

An expression of the output


has_duplicate_min()
Check if the timeseries contains duplicate values equal to its minimum value.
Returns:
Type  Description 

An expression of the output


index_mass_quantile(q)
Calculates the relative index i of time series x where q% of the mass of x lies left of i. For example for q = 50% this feature calculator will return the mass center of the time series.
Parameters:
Name  Type  Description  Default 

q 
float

The quantile. 
required 
Returns:
Type  Description 

An expression of the output
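
A plain-Python sketch of the idea, assuming that "mass" means the absolute value of each point and that q is given as a fraction between 0 and 1. `index_mass_quantile_sketch` is a hypothetical helper used only for illustration.

```python
def index_mass_quantile_sketch(x, q):
    """Relative index at which the cumulative |x| mass first reaches q."""
    abs_x = [abs(v) for v in x]
    total = sum(abs_x)
    if total == 0:
        return None  # no mass to locate
    running = 0.0
    for i, v in enumerate(abs_x):
        running += v
        if running / total >= q:
            # Positions are 1-based and relative to the series length.
            return (i + 1) / len(abs_x)
    return 1.0


print(index_mass_quantile_sketch([1, 1, 1, 1], 0.5))   # 0.5
print(index_mass_quantile_sketch([0, 0, 0, 10], 0.5))  # 1.0
```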


large_standard_deviation(ratio=0.25)
Checks if the timeseries has a large standard deviation: std(x) > r * (max(X) - min(X)), where r is the ratio parameter.
As a heuristic, the standard deviation should be a fourth of the range of the values.
Parameters:
Name  Type  Description  Default 

ratio 
float

The ratio of the interval to compare with. 
0.25

Returns:
Type  Description 

An expression of the output
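
The check can be sketched in a few lines of plain Python. The population standard deviation is used here as an assumption, since the degrees of freedom are not stated above, and `large_standard_deviation_sketch` is a hypothetical name.

```python
from statistics import pstdev


def large_standard_deviation_sketch(x, ratio=0.25):
    """True when std(x) exceeds ratio times the range of x."""
    return pstdev(x) > ratio * (max(x) - min(x))


# std([0, 10]) is 5 and the range is 10.
print(large_standard_deviation_sketch([0, 10]))             # True  (5 > 2.5)
print(large_standard_deviation_sketch([0, 10], ratio=0.6))  # False (5 > 6 fails)
```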


last_location_of_maximum()
Returns the last location of the maximum value of x. The position is calculated relatively to the length of x.
Returns:
Type  Description 

An expression of the output


last_location_of_minimum()
Returns the last location of the minimum value of x. The position is calculated relatively to the length of x.
Returns:
Type  Description 

An expression of the output


lempel_ziv_complexity(threshold, as_ratio=True)
Calculate a complexity estimate based on the Lempel-Ziv compression algorithm. The implementation here is currently a Rust rewrite of Lilian Besson's code. Instead of returning the complexity value, we return a ratio w.r.t. the length of the input series. If null is encountered, it will be interpreted as 0 in the bit sequence.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
threshold 
Union[float, Expr]

Either a number, or an expression representing a comparable quantity. If x > threshold, then it will be binarized as 1 and 0 otherwise. 
required 
as_ratio 
bool

If true, return the complexity divided by length of sequence 
True

Returns:
Type  Description 

Expr


Reference
https://github.com/Naereen/LempelZiv_Complexity/tree/master
https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv_complexity
linear_trend()
Compute the slope, intercept, and RSS of the linear trend.
Returns:
Type  Description 

An expression of the output


longest_losing_streak()
Returns the longest losing streak of the time series. A loss is counted when (x_{t+1} - x_t) <= 0.
Returns:
Type  Description 

An expression of the output


longest_streak_above(threshold)
Returns the longest streak of changes >= threshold of the time series. A change is counted when (x_{t+1} - x_t) >= threshold. Note that the streaks here are about the changes for consecutive values in the time series, not the individual values.
Parameters:
Name  Type  Description  Default 

threshold 
float

The threshold value for comparison. 
required 
Returns:
Type  Description 

An expression of the output
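
Because the streak is defined on consecutive changes rather than on the values themselves, a small plain-Python sketch may help. `longest_streak_above_sketch` is a hypothetical helper that follows the description above and is not the library's code.

```python
def longest_streak_above_sketch(x, threshold):
    """Longest run of consecutive changes x[t+1] - x[t] >= threshold."""
    longest = current = 0
    for prev, nxt in zip(x, x[1:]):
        if nxt - prev >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # the streak is broken
    return longest


# Changes are 1, 2, -1, 2, 3, 4, -1; three in a row are >= 2.
print(longest_streak_above_sketch([1, 2, 4, 3, 5, 8, 12, 11], 2))  # 3
```

With a threshold of 0, the same loop counts the longest winning streak.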


longest_streak_above_mean()
Returns the length of the longest consecutive subsequence in x that is greater than the mean of x.
Returns:
Type  Description 

An expression of the output


longest_streak_below(threshold)
Returns the longest streak of changes <= threshold of the time series. A change is counted when (x_{t+1} - x_t) <= threshold. Note that the streaks here are about the changes for consecutive values in the time series, not the individual values.
Parameters:
Name  Type  Description  Default 

threshold 
float

The threshold value for comparison. 
required 
Returns:
Type  Description 

An expression of the output


longest_streak_below_mean()
Returns the length of the longest consecutive subsequence in x that is smaller than the mean of x.
Returns:
Type  Description 

An expression of the output


longest_winning_streak()
Returns the longest winning streak of the time series. A win is counted when (x_{t+1} - x_t) >= 0.
Returns:
Type  Description 

An expression of the output


max_abs_change()
Compute the maximum absolute change from X_t to X_t+1.
Returns:
Type  Description 

An expression of the output


mean_abs_change()
Compute mean absolute change.
Returns:
Type  Description 

An expression of the output


mean_change()
Compute mean change.
Returns:
Type  Description 

An expression of the output


mean_n_absolute_max(n_maxima)
Calculates the arithmetic mean of the n absolute maximum values of the time series.
Parameters:
Name  Type  Description  Default 

n_maxima 
int

The number of maxima to consider. 
required 
Returns:
Type  Description 

An expression of the output


mean_second_derivative_central()
Returns the mean value of a central approximation of the second derivative.
Returns:
Type  Description 

An expression of the output


number_crossings(crossing_value=0.0)
Calculates the number of crossings of x on m, where m is the crossing value.
A crossing is defined as two sequential values where the first value is lower than m and the next is greater, or vice versa. If you set m to zero, you will get the number of zero crossings.
Parameters:
Name  Type  Description  Default 

crossing_value 
float

The crossing value. Defaults to 0.0. 
0.0

Returns:
Type  Description 

An expression of the output
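
A plain-Python sketch of the crossing count. It marks each value as above or not above m and counts the positions where that flag flips; treating values equal to m as "not above" is an assumption of this sketch, and `number_crossings_sketch` is a hypothetical name.

```python
def number_crossings_sketch(x, crossing_value=0.0):
    """Count how often consecutive values switch sides of crossing_value."""
    above = [v > crossing_value for v in x]
    return sum(a != b for a, b in zip(above, above[1:]))


print(number_crossings_sketch([-1, 1, -1, 1]))        # 3
print(number_crossings_sketch([0.5, 1.5, 0.5], 1.0))  # 2
```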


number_peaks(support)
Calculates the number of peaks of at least support n in the time series x. A peak of support n is defined as a subsequence of x where a value occurs, which is bigger than its n neighbours to the left and to the right.
Hence in the sequence
x = [3, 0, 0, 4, 0, 0, 13]
4 is a peak of support 1 and 2 because in the subsequences
[0, 4, 0] [0, 0, 4, 0, 0]
4 is still the highest value. Here, 4 is not a peak of support 3 because 13 is the 3rd neighbour to the right of 4 and it is bigger than 4.
Parameters:
Name  Type  Description  Default 

support 
int

Support of the peak 
required 
Returns:
Type  Description 

An expression of the output
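
The worked example above can be reproduced with a short plain-Python sketch. `number_peaks_sketch` is a hypothetical helper that applies the definition literally (a value must be strictly bigger than each of its `support` neighbours on both sides); it is not the library's implementation.

```python
def number_peaks_sketch(x, support):
    """Count values strictly bigger than their `support` neighbours on each side."""
    count = 0
    # Only positions with `support` neighbours on both sides can be peaks.
    for i in range(support, len(x) - support):
        if all(
            x[i] > x[i - k] and x[i] > x[i + k]
            for k in range(1, support + 1)
        ):
            count += 1
    return count


x = [3, 0, 0, 4, 0, 0, 13]
print(number_peaks_sketch(x, 1))  # 1
print(number_peaks_sketch(x, 2))  # 1
print(number_peaks_sketch(x, 3))  # 0
```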


percent_reoccurring_points()
Returns the percentage of non-unique data points in the time series. Non-unique data points are those that occur more than once in the time series.
The percentage is calculated as follows:
# of data points occurring more than once / # of all data points
This means the ratio is normalized to the number of data points in the time series, in contrast to the percent_reoccurring_values function.
Returns:
Type  Description 

An expression of the output


percent_reoccurring_values()
Returns the percentage of values that are present in the time series more than once.
The percentage is calculated as follows:
len(different values occurring more than once) / len(different values)
This means the percentage is normalized to the number of unique values in the time series, in contrast to the percent_reoccurring_points function.
Returns:
Type  Description 

An expression of the output
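
The difference between the two normalizations is easiest to see side by side. The two helpers below are hypothetical plain-Python sketches of the formulas quoted above, not the library's functions.

```python
from collections import Counter


def percent_reoccurring_points_sketch(x):
    """Data points whose value occurs more than once / all data points."""
    counts = Counter(x)
    repeated_points = sum(c for c in counts.values() if c > 1)
    return repeated_points / len(x)


def percent_reoccurring_values_sketch(x):
    """Distinct values occurring more than once / all distinct values."""
    counts = Counter(x)
    repeated_values = sum(1 for c in counts.values() if c > 1)
    return repeated_values / len(counts)


x = [2, 2, 2, 2, 1]
print(percent_reoccurring_points_sketch(x))  # 0.8 (4 of 5 points)
print(percent_reoccurring_values_sketch(x))  # 0.5 (1 of 2 distinct values)
```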


permutation_entropy(tau=1, n_dims=3, base=math.e)
Computes permutation entropy. It is recommended that users should impute the time series before calling this.
Parameters:
Name  Type  Description  Default 

tau 
int

The embedding time delay which controls the number of time periods between elements of each of the new column vectors. The recommended value is 1. 
1

n_dims 
int, > 1

The embedding dimension which controls the length of each of the new column vectors. The recommended range is 3-7. 
3

base 
float

The base for log in the entropy computation 
e

Returns:
Type  Description 

An expression of the output
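
To show how tau, n_dims and base interact, here is a plain-Python sketch of permutation entropy based on ordinal patterns. The result is not normalized, which is an assumption of this sketch, and `permutation_entropy_sketch` is a hypothetical name rather than the library's function.

```python
import math
from collections import Counter


def permutation_entropy_sketch(x, tau=1, n_dims=3, base=math.e):
    """Entropy of the ordinal-pattern distribution of embedded vectors."""
    n_vectors = len(x) - (n_dims - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen tau and n_dims")
    patterns = Counter()
    for start in range(n_vectors):
        # Embedded vector: n_dims values spaced tau steps apart.
        window = [x[start + k * tau] for k in range(n_dims)]
        # Its ordinal pattern is the index order that sorts the window.
        pattern = tuple(sorted(range(n_dims), key=window.__getitem__))
        patterns[pattern] += 1
    probs = [c / n_vectors for c in patterns.values()]
    return -sum(p * math.log(p, base) for p in probs)


# Five embedded vectors with pattern counts 2, 2 and 1.
print(permutation_entropy_sketch([4, 7, 9, 10, 6, 11, 3], base=2))
```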


range_change(percentage=True)
Returns the range (max - min) over mean of the time series.
Parameters:
Name  Type  Description  Default 

percentage 
bool

Compute the percentage if set to True 
True

Returns:
Type  Description 

An expression of the output


range_count(lower, upper, closed='left')
Counts the values of the input expression that lie between lower and upper. With the default closed='left', lower is inclusive and upper is exclusive.
Parameters:
Name  Type  Description  Default 

lower 
float

The lower bound, inclusive 
required 
upper 
float

The upper bound, exclusive 
required 
closed 
ClosedInterval

Whether or not the boundaries should be included/excluded 
'left'

Returns:
Type  Description 

An expression of the output
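
A plain-Python sketch of how the closed parameter changes the count. The four accepted values ('left', 'right', 'both', 'none') are assumed to follow the Polars closed-interval convention, and `range_count_sketch` is a hypothetical helper.

```python
def range_count_sketch(x, lower, upper, closed="left"):
    """Count values between lower and upper for a given interval closure."""
    if closed not in ("left", "right", "both", "none"):
        raise ValueError(f"unsupported closed value: {closed!r}")
    include_lower = closed in ("left", "both")
    include_upper = closed in ("right", "both")
    count = 0
    for v in x:
        above = v >= lower if include_lower else v > lower
        below = v <= upper if include_upper else v < upper
        if above and below:
            count += 1
    return count


x = [1, 2, 3, 4, 5]
print(range_count_sketch(x, 2, 4))                 # 2 (values 2 and 3)
print(range_count_sketch(x, 2, 4, closed="both"))  # 3 (values 2, 3 and 4)
```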


range_over_mean()
Returns the range (max - min) over mean of the time series.
Returns:
Type  Description 

An expression of the output


ratio_beyond_r_sigma(ratio=0.25)
Returns the ratio of values in the series that are more than r * std away from the mean, on either side.
Parameters:
Name  Type  Description  Default 

ratio 
float

The scaling factor for std 
0.25

Returns:
Type  Description 

An expression of the output


ratio_n_unique_to_length()
Calculate the ratio of the number of unique values to the length of the timeseries.
Returns:
Type  Description 

An expression of the output


root_mean_square()
Calculate the root mean square.
Returns:
Type  Description 

An expression of the output


streak_length_stats(above, threshold)
Returns some statistics of the length of the streaks of the time series. Note that the streaks here are about the changes for consecutive values in the time series, not the individual values.
The statistics include: min length, max length, average length, std of length, 10th-percentile length, median length, 90th-percentile length, and mode of the length. If the input is a Series, a dictionary will be returned. If the input is an expression, the expression will evaluate to a struct with the fields ordered by the statistics.
Parameters:
Name  Type  Description  Default 

above 
bool

Above (>=) or below (<=) the given threshold 
required 
threshold 
float

The threshold for the change (x_{t+1} - x_t) to be counted 
required 
Returns:
Type  Description 

An expression of the output


sum_reoccurring_points()
Returns the sum of all data points that are present in the time series more than once.
For example, sum_reoccurring_points(pl.Series([2, 2, 2, 2, 1])) returns 8, as 2 is a reoccurring value, so all 2's are summed up.
This is in contrast to the sum_reoccurring_values function, where each reoccurring value is only counted once.
Returns:
Type  Description 

An expression of the output


sum_reoccurring_values()
Returns the sum of all values that are present in the time series more than once.
For example, sum_reoccurring_values(pl.Series([2, 2, 2, 2, 1])) returns 2, as 2 is a reoccurring value, so it is summed up with all other reoccurring values (there are none), so the result is 2.
This is in contrast to the sum_reoccurring_points function, where each reoccurring value is counted as often as it is present in the data.
Returns:
Type  Description 

An expression of the output
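
The two sums can be compared on the example series with a short plain-Python sketch. The helper names are hypothetical and operate on ordinary lists rather than on a pl.Series.

```python
from collections import Counter


def sum_reoccurring_points_sketch(x):
    """Sum every data point whose value occurs more than once."""
    counts = Counter(x)
    return sum(value * c for value, c in counts.items() if c > 1)


def sum_reoccurring_values_sketch(x):
    """Sum each distinct value that occurs more than once, counted once."""
    counts = Counter(x)
    return sum(value for value, c in counts.items() if c > 1)


x = [2, 2, 2, 2, 1]
print(sum_reoccurring_points_sketch(x))  # 8
print(sum_reoccurring_values_sketch(x))  # 2
```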


symmetry_looking(ratio=0.25)
Check if the distribution of x looks symmetric.
A distribution is considered symmetric if: | mean(X) - median(X) | < ratio * (max(X) - min(X))
Parameters:
Name  Type  Description  Default 

ratio 
float

Multiplier on distance between max and min. 
0.25

Returns:
Type  Description 

An expression of the output
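
The symmetry criterion translates directly into plain Python. `symmetry_looking_sketch` is a hypothetical helper written only to illustrate the inequality above.

```python
from statistics import fmean, median


def symmetry_looking_sketch(x, ratio=0.25):
    """True when |mean - median| is small relative to the range of x."""
    return abs(fmean(x) - median(x)) < ratio * (max(x) - min(x))


skewed = [0, 0, 0, 0, 0, 0, 0, 0, 0, 100]  # mean 10, median 0, range 100
print(symmetry_looking_sketch([1, 2, 3, 4, 5]))     # True
print(symmetry_looking_sketch(skewed, ratio=0.05))  # False (10 < 5 fails)
```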


time_reversal_asymmetry_statistic(n_lags)
Returns the time reversal asymmetry statistic.
Parameters:
Name  Type  Description  Default 

n_lags 
int

The lag that should be used in the calculation of the feature. 
required 
Returns:
Type  Description 

An expression of the output


var_gt_std(ddof=1)
Is the variance >= std? In other words, is var >= 1?
Parameters:
Name  Type  Description  Default 

ddof 
int

Delta Degrees of Freedom used when computing var/std. 
1

Returns:
Type  Description 

An expression of the output


variation_coefficient()
Calculate the coefficient of variation (CV).
Returns:
Type  Description 

An expression of the output


absolute_energy(x)
Compute the absolute energy of a time series.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

float  Expr


absolute_maximum(x)
Compute the absolute maximum of a time series.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

float  Expr


absolute_sum_of_changes(x)
Compute the absolute sum of changes of a time series.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

float  Expr


approximate_entropy(x, run_length, filtering_level, scale_by_std=True)
Approximate sample entropies of a time series given the filtering level. This only works for Series input right now.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
run_length 
int

Length of compared run of data. This is 
required 
filtering_level 
float

Filtering level, must be positive. This is 
required 
scale_by_std 
bool

Whether to scale filter level by std of data. In most applications, this is the default behavior, but not in some other cases. 
True

Returns:
Type  Description 

float


augmented_dickey_fuller(x, n_lags)
Calculates the Augmented Dickey-Fuller (ADF) test statistic. This only works for Series input right now.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
n_lags 
int

The number of lags to include in the test. 
required 
Returns:
Type  Description 

float


autocorrelation(x, n_lags)
Calculate the autocorrelation for a specified lag.
The autocorrelation measures the linear dependence between a timeseries and a lagged version of itself.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
n_lags 
int

The lag at which to calculate the autocorrelation. Must be a nonnegative integer. 
required 
Returns:
Type  Description 

float  Expr

Autocorrelation at the given lag. Returns None if the lag is less than 0. 
autoregressive_coefficients(x, n_lags)
Computes coefficients for an AR(n_lags) process. This only works for Series input right now. Caution: any null value in the Series will be replaced by 0!
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
n_lags 
int

The number of lags in the autoregressive process. 
required 
Returns:
Type  Description 

list of float


benford_correlation(x)
Returns the correlation between the first digit distribution of the input time series and the Newcomb-Benford's Law distribution.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

float  Expr


binned_entropy(x, bin_count=10)
Calculates the entropy of a binned histogram for a given time series.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
bin_count 
int

The number of bins to use in the histogram. Default is 10. 
10

Returns:
Type  Description 

float  Expr


c3(x, n_lags)
Measure of nonlinearity in the time series using c3 statistics.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
n_lags 
int

The lag that should be used in the calculation of the feature. 
required 
Returns:
Type  Description 

float  Expr


change_quantiles(x, q_low, q_high, is_abs)
First fixes a corridor given by the quantiles ql and qh of the distribution of x. It will return a list of changes coming from consecutive values that both lie within the quantile range. The user may optionally get the absolute value of the changes, and compute stats from these changes. If q_low >= q_high, it will return null.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

A single timeseries. 
required 
q_low 
float

The lower quantile of the corridor. Must be less than 
required 
q_high 
float

The upper quantile of the corridor. Must be greater than 
required 
is_abs 
bool

If True, takes absolute difference. 
required 
Returns:
Type  Description 

list of float  Expr


cid_ce(x, normalize=False)
Computes estimate of timeseries complexity[^1].
A more complex time series has more peaks and valleys. This feature is calculated by:
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

A single timeseries. 
required 
normalize 
bool

If True, z-normalizes the timeseries before computing the feature. Default is False. 
False

Returns:
Type  Description 

float  Expr


count_above(x, threshold=0.0)
Calculate the percentage of values above or equal to a threshold.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
threshold 
float

The threshold value for comparison. 
0.0

Returns:
Type  Description 

float  Expr


count_above_mean(x)
Count the number of values that are above the mean.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

int  Expr


count_below(x, threshold=0.0)
Calculate the percentage of values below or equal to a threshold.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
threshold 
float

The threshold value for comparison. 
0.0

Returns:
Type  Description 

float  Expr


count_below_mean(x)
Count the number of values that are below the mean.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

int  Expr


cwt_coefficients(x, widths=(2, 5, 10, 20), n_coefficients=14)
Calculates a Continuous wavelet transform for the Ricker wavelet.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
widths 
Sequence[int]

The widths of the Ricker wavelet to use for the CWT. Default is (2, 5, 10, 20). 
(2, 5, 10, 20)

n_coefficients 
int

The number of CWT coefficients to return. Default is 14. 
14

Returns:
Type  Description 

list of float


energy_ratios(x, n_chunks=10)
Calculates the sum of squares for each of n_chunks equally segmented parts of the timeseries, expressed as a ratio of the sum of squares over the whole series.
E.g. if n_chunks = 10, values are [0, 1, 2, 3, .. , 999], the first chunk will be [0, .. , 99].
Parameters:
Name  Type  Description  Default 

x 
list of float

The timeseries to be segmented and analyzed. 
required 
n_chunks 
int

The number of equally segmented parts to divide the timeseries into. Default is 10. 
10

Returns:
Type  Description 

list of float  Expr
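
A plain-Python sketch of the chunked energy ratios. When the length is not divisible by n_chunks, this sketch gives the first chunks one extra element each; that rule and the name `energy_ratios_sketch` are assumptions made for illustration, not the library's documented behaviour.

```python
def energy_ratios_sketch(x, n_chunks=10):
    """Sum of squares of each chunk divided by the total sum of squares."""
    total = sum(v * v for v in x)  # assumed non-zero
    base, extra = divmod(len(x), n_chunks)
    ratios, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks absorb the remainder, one element each.
        stop = start + base + (1 if i < extra else 0)
        ratios.append(sum(v * v for v in x[start:stop]) / total)
        start = stop
    return ratios


print(energy_ratios_sketch([1, 1, 2, 2], n_chunks=2))  # [0.2, 0.8]
```

For values [0, 1, ..., 999] and n_chunks = 10, the first ratio covers the chunk [0, ..., 99], matching the example above.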


fft_coefficients(x)
Calculates Fourier coefficients and phase angles of the 1D discrete Fourier Transform. This only works for Series input right now.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input time series. 
required 
n_threads 
int

Number of threads to use. If None, uses all threads available. Defaults to None. 
required 
Returns:
Type  Description 

dict of list of floats  Expr


first_location_of_maximum(x)
Returns the first location of the maximum value of x. The position is calculated relatively to the length of x.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

float  Expr


first_location_of_minimum(x)
Returns the first location of the minimum value of x. The position is calculated relatively to the length of x.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

float  Expr


fourier_entropy(x, n_bins=10)
Calculate the Fourier entropy of a time series. This only works for Series input right now.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
n_bins 
int

The number of bins to use for the entropy calculation. Default is 10. 
10

Returns:
Type  Description 

float


friedrich_coefficients(x, polynomial_order=3, n_quantiles=30)
Calculate the Friedrich coefficients of a time series.
Parameters:
Name  Type  Description  Default 

x 
TIME_SERIES_T

The time series to calculate the Friedrich coefficients of. 
required 
polynomial_order 
int

The order of the polynomial to fit to the quantile means. Default is 3. 
3

n_quantiles 
int

The number of quantiles to use for the calculation. Default is 30. 
30

Returns:
Type  Description 

list of float


harmonic_mean(x)
Returns the harmonic mean of the time series.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input time series. 
required 
Returns:
Type  Description 

float  Expr


has_duplicate(x)
Check if the timeseries contains any duplicate values.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

bool  Expr


has_duplicate_max(x)
Check if the timeseries contains any duplicate values equal to its maximum value.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

bool  Expr


has_duplicate_min(x)
Check if the timeseries contains duplicate values equal to its minimum value.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

bool  Expr


index_mass_quantile(x, q)
Calculates the relative index i of time series x where q% of the mass of x lies left of i. For example for q = 50% this feature calculator will return the mass center of the time series.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
q 
float

The quantile. 
required 
Returns:
Type  Description 

float  Expr


large_standard_deviation(x, ratio=0.25)
Checks if the timeseries has a large standard deviation: std(x) > r * (max(X) - min(X)), where r is the ratio parameter.
As a heuristic, the standard deviation should be a fourth of the range of the values.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
ratio 
float

The ratio of the interval to compare with. 
0.25

Returns:
Type  Description 

bool  Expr


last_location_of_maximum(x)
Returns the last location of the maximum value of x. The position is calculated relatively to the length of x.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

float  Expr


last_location_of_minimum(x)
Returns the last location of the minimum value of x. The position is calculated relatively to the length of x.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

float  Expr


lempel_ziv_complexity(x, threshold, as_ratio=True)
Calculate a complexity estimate based on the Lempel-Ziv compression algorithm. The implementation here is currently a Rust rewrite of Lilian Besson's code. See the reference section below. Instead of returning the complexity value, we return a ratio w.r.t. the length of the input series. If null is encountered, it will be interpreted as 0 in the bit sequence.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
threshold 
Union[float, Expr]

Either a number, or an expression representing a comparable quantity. If x > threshold, then it will be binarized as 1 and 0 otherwise. If x is eager, then threshold must be eager as well. 
required 
as_ratio 
bool

If true, return the complexity / length of sequence 
True

Returns:
Type  Description 

float


Reference
https://github.com/Naereen/LempelZiv_Complexity/tree/master
https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv_complexity
linear_trend(x)
Compute the slope, intercept, and RSS of the linear trend.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

Mapping[str, float]  Expr


longest_losing_streak(x)
Returns the longest losing streak of the time series. A loss is counted when (x_{t+1} - x_t) <= 0.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input time series. 
required 
Returns:
Type  Description 

float  Expr


longest_streak_above(x, threshold)
Returns the longest streak of changes >= threshold of the time series. A change is counted when (x_{t+1} - x_t) >= threshold. Note that the streaks here are about the changes for consecutive values in the time series, not the individual values.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input time series. 
required 
threshold 
float

The threshold value for comparison. 
required 
Returns:
Type  Description 

float  Expr


longest_streak_above_mean(x)
Returns the length of the longest consecutive subsequence in x that is > mean of x. If all values in x are null, 0 will be returned. Note: this does not measure consecutive changes in time series, only counts the streak based on the original time series, not the differences.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

int  Expr


longest_streak_below(x, threshold)
Returns the longest streak of changes <= threshold of the time series. A change is counted when (x_{t+1} - x_t) <= threshold. Note that the streaks here are about the changes for consecutive values in the time series, not the individual values.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input time series. 
required 
threshold 
float

The threshold value for comparison. 
required 
Returns:
Type  Description 

float  Expr


longest_streak_below_mean(x)
Returns the length of the longest consecutive subsequence in x that is < mean of x. If all values in x are null, 0 will be returned. Note: this does not measure consecutive changes in time series, only counts the streak based on the original time series, not the differences.
Parameters:
Name  Type  Description  Default 

x 
Expr  Series

Input timeseries. 
required 
Returns:
Type  Description 

int | Expr


longest_winning_streak(x)
Returns the longest winning streak of the time series. A win is counted when (x_{t+1} - x_t) >= 0.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input time series. 
required 
Returns:
Type  Description 

float | Expr


max_abs_change(x)
Compute the maximum absolute change from X_t to X_t+1.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

A single timeseries. 
required 
Returns:
Type  Description 

float | Expr


mean_abs_change(x)
Compute the mean absolute change between consecutive values of the time series.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

A single timeseries. 
required 
Returns:
Type  Description 

float | Expr


mean_change(x)
Compute the mean change between consecutive values of the time series.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

A single timeseries. 
required 
Returns:
Type  Description 

float | Expr
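The three change-based features max_abs_change, mean_abs_change and mean_change all operate on the differences between consecutive values. A minimal pure-Python sketch of that reading follows; the *_sketch names are hypothetical and this is not the library's implementation.

```python
def consecutive_changes(x):
    """Differences x[t+1] - x[t] for every pair of consecutive values."""
    return [b - a for a, b in zip(x, x[1:])]


def max_abs_change_sketch(x):
    return max(abs(d) for d in consecutive_changes(x))


def mean_abs_change_sketch(x):
    diffs = consecutive_changes(x)
    return sum(abs(d) for d in diffs) / len(diffs)


def mean_change_sketch(x):
    diffs = consecutive_changes(x)
    return sum(diffs) / len(diffs)


series = [1, 4, 2, 7]  # changes: 3, -2, 5
print(max_abs_change_sketch(series), mean_abs_change_sketch(series), mean_change_sketch(series))
```

Because the signed changes telescope, the mean change only depends on the first and last values: (x[-1] - x[0]) / (len(x) - 1).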


mean_n_absolute_max(x, n_maxima)
Calculates the arithmetic mean of the n absolute maximum values of the time series.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input timeseries. 
required 
n_maxima 
int

The number of maxima to consider. 
required 
Returns:
Type  Description 

float | Expr
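One way to read "the n absolute maximum values" is: take absolute values, keep the n_maxima largest, and average them. The sketch below illustrates that reading in pure Python; the function name is hypothetical, and averaging over fewer values when the series is shorter than n_maxima is an assumption.

```python
def mean_n_absolute_max_sketch(x, n_maxima):
    """Mean of the n_maxima largest absolute values in x."""
    if n_maxima <= 0:
        raise ValueError("n_maxima must be a positive integer")
    # Sort absolute values in descending order and keep the top n_maxima.
    top = sorted((abs(v) for v in x), reverse=True)[:n_maxima]
    return sum(top) / len(top)


print(mean_n_absolute_max_sketch([-9, 1, 5, -3], n_maxima=2))
```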


mean_second_derivative_central(x)
Returns the mean value of a central approximation of the second derivative.
Parameters:
Name  Type  Description  Default 

x 
Series

A time series to calculate the feature of. 
required 
Returns:
Type  Description 

Series
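The description does not spell out the approximation. The sketch below assumes the central-difference form 0.5 * (x[i+2] - 2 * x[i+1] + x[i]) averaged over all valid i, which is the convention used by tsfresh's feature of the same name; whether this library uses exactly the same form is an assumption. The function name is hypothetical.

```python
def mean_second_derivative_central_sketch(x):
    """Mean of 0.5 * (x[i+2] - 2*x[i+1] + x[i]) over all valid i (assumed form)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points")
    terms = [0.5 * (x[i + 2] - 2 * x[i + 1] + x[i]) for i in range(n - 2)]
    return sum(terms) / len(terms)


squares = [0, 1, 4, 9, 16]  # second differences are all 2
print(mean_second_derivative_central_sketch(squares))
```

Under this assumed form the sum telescopes, so the result equals (x[-1] - x[-2] - x[1] + x[0]) / (2 * (n - 2)).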


number_crossings(x, crossing_value=0.0)
Calculates the number of crossings of x on m, where m is the crossing value.
A crossing is defined as two sequential values where the first value is lower than m and the next is greater, or vice versa. If you set m to zero, you will get the number of zero crossings.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

A single timeseries. 
required 
crossing_value 
float

The crossing value. Defaults to 0.0. 
0.0

Returns:
Type  Description 

float | Expr
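The crossing definition can be sketched in pure Python as below. The function name is hypothetical, and the handling of values exactly equal to the crossing value (treated here as "not above") is an assumption that the description leaves open.

```python
def number_crossings_sketch(x, crossing_value=0.0):
    """Count sign changes of (x - crossing_value) between consecutive values."""
    # Assumption: a value equal to crossing_value counts as "not above".
    above = [v > crossing_value for v in x]
    return sum(a != b for a, b in zip(above, above[1:]))


print(number_crossings_sketch([-1, 2, -3, 4, 5]))
print(number_crossings_sketch([1, 5, 2, 4], crossing_value=3))
```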


number_cwt_peaks(x, max_width=5)
Number of different peaks in x.
To estimate the number of peaks, x is smoothed by a Ricker wavelet for widths ranging from 1 to max_width. This feature calculator returns the number of peaks that occur at enough width scales and with a sufficiently high signal-to-noise ratio (SNR).
Parameters:
Name  Type  Description  Default 

x 
Series

A single timeseries. 
required 
max_width 
int

maximum width to consider 
5

Returns:
Type  Description 

float


number_peaks(x, support)
Calculates the number of peaks of at least support n in the time series x. A peak of support n is defined as a subsequence of x where a value occurs, which is bigger than its n neighbours to the left and to the right.
Hence in the sequence
x = [3, 0, 0, 4, 0, 0, 13]
4 is a peak of support 1 and 2 because in the subsequences
[0, 4, 0] [0, 0, 4, 0, 0]
4 is still the highest value. Here, 4 is not a peak of support 3 because 13 is the 3rd neighbour to the right of 4 and it is bigger than 4.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input timeseries. 
required 
support 
int

Support of the peak 
required 
Returns:
Type  Description 

int | Expr
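The worked example above can be checked with a direct pure-Python reading of the definition. This is a sketch for illustration, not the library's implementation, and the function name is hypothetical.

```python
def number_peaks_sketch(x, support):
    """Count values strictly greater than their `support` neighbours on each side."""
    count = 0
    # Only positions with `support` neighbours on both sides can be peaks.
    for i in range(support, len(x) - support):
        left = x[i - support:i]
        right = x[i + 1:i + support + 1]
        if all(x[i] > v for v in left) and all(x[i] > v for v in right):
            count += 1
    return count


x = [3, 0, 0, 4, 0, 0, 13]
for support in (1, 2, 3):
    print(support, number_peaks_sketch(x, support))
```

For the example series, 4 is counted for support 1 and 2 but not for support 3, matching the description.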


percent_reoccurring_points(x)
Returns the percentage of non-unique data points in the time series. Non-unique data points are those that occur more than once in the time series.
The percentage is calculated as follows:
# of data points occurring more than once / # of all data points
This means the ratio is normalized to the number of data points in the time series, in contrast to the
percent_reoccurring_values
function.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input timeseries. 
required 
Returns:
Type  Description 

float


percent_reoccurring_values(x)
Returns the percentage of values that are present in the time series more than once.
The percentage is calculated as follows:
# (distinct values occurring more than once) / # of distinct values
This means the percentage is normalized to the number of unique values in the time series, in contrast to the
percent_reoccurring_points
function.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input timeseries. 
required 
Returns:
Type  Description 

float | Expr
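The difference between percent_reoccurring_points and percent_reoccurring_values is only the normalization. The pure-Python sketch below follows the two formulas as stated; the *_sketch names are hypothetical and this is not the library's implementation.

```python
from collections import Counter


def percent_reoccurring_points_sketch(x):
    """Data points whose value occurs more than once / all data points."""
    counts = Counter(x)
    return sum(c for c in counts.values() if c > 1) / len(x)


def percent_reoccurring_values_sketch(x):
    """Distinct values occurring more than once / all distinct values."""
    counts = Counter(x)
    return sum(1 for c in counts.values() if c > 1) / len(counts)


series = [2, 2, 2, 2, 1]
print(percent_reoccurring_points_sketch(series))  # 4 of 5 points repeat
print(percent_reoccurring_values_sketch(series))  # 1 of 2 distinct values repeats
```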


permutation_entropy(x, tau=1, n_dims=3, base=math.e)
Computes permutation entropy.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input timeseries. 
required 
tau 
int

The embedding time delay which controls the number of time periods between elements of each of the new column vectors. 
1

n_dims 
int, > 1

The embedding dimension which controls the length of each of the new column vectors 
3

base 
float

The base for log in the entropy computation 
e

Returns:
Type  Description 

float | Expr
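To make the roles of tau, n_dims and base concrete, here is a pure-Python sketch of the usual permutation entropy construction: build delay-embedded vectors, map each to its ordinal pattern, and take the Shannon entropy of the pattern frequencies. It is an illustration under that standard definition, not the library's implementation; the function name is hypothetical, and details such as tie handling or normalization may differ.

```python
import math
from collections import Counter


def permutation_entropy_sketch(x, tau=1, n_dims=3, base=math.e):
    """Shannon entropy of ordinal patterns of delay-embedded vectors."""
    n_vectors = len(x) - (n_dims - 1) * tau
    patterns = Counter()
    for start in range(n_vectors):
        # Embedded vector: n_dims values spaced tau steps apart.
        window = [x[start + k * tau] for k in range(n_dims)]
        # Ordinal pattern: indices that would sort the window.
        pattern = tuple(sorted(range(n_dims), key=window.__getitem__))
        patterns[pattern] += 1
    total = sum(patterns.values())
    return -sum((c / total) * math.log(c / total, base) for c in patterns.values())


print(permutation_entropy_sketch([4, 7, 9, 10, 6, 11, 3], tau=1, n_dims=3, base=2))
```

A strictly increasing series produces a single ordinal pattern and therefore an entropy of zero.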


range_change(x, percentage=True)
Returns the maximum value range. If percentage is true, will compute (max - min) / min, which only makes sense when x is always positive.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input time series. 
required 
percentage 
bool

compute the percentage if set to True 
True

Returns:
Type  Description 

float | Expr
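A pure-Python sketch of the two modes follows. The function name is hypothetical, and returning the plain range max - min when percentage is False is an assumption based on the phrase "maximum value range".

```python
def range_change_sketch(x, percentage=True):
    """(max - min) / min if percentage is True, else max - min (assumed)."""
    lo, hi = min(x), max(x)
    if percentage:
        # Only meaningful when all values are positive, as noted above.
        return (hi - lo) / lo
    return hi - lo


print(range_change_sketch([2, 5, 8]))
print(range_change_sketch([2, 5, 8], percentage=False))
```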


range_count(x, lower, upper, closed='left')
Counts the values of the input that lie between lower and upper. With the default closed='left', lower is inclusive and upper is exclusive.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input timeseries. 
required 
lower 
float

The lower bound, inclusive 
required 
upper 
float

The upper bound, exclusive 
required 
closed 
ClosedInterval

Whether or not the boundaries should be included/excluded 
'left'

Returns:
Type  Description 

int | Expr
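The effect of the closed parameter can be illustrated with a pure-Python sketch. It assumes the four interval options 'left', 'right', 'both' and 'none' that Polars uses for ClosedInterval; the function name is hypothetical and this is not the library's implementation.

```python
def range_count_sketch(x, lower, upper, closed="left"):
    """Count values of x inside the interval defined by lower, upper and closed."""
    if closed not in ("left", "right", "both", "none"):
        raise ValueError(f"unsupported closed value: {closed!r}")
    include_lower = closed in ("left", "both")
    include_upper = closed in ("right", "both")

    def inside(v):
        above = v >= lower if include_lower else v > lower
        below = v <= upper if include_upper else v < upper
        return above and below

    return sum(1 for v in x if inside(v))


values = [1, 2, 3, 4, 5]
for closed in ("left", "right", "both", "none"):
    print(closed, range_count_sketch(values, 2, 4, closed=closed))
```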


range_over_mean(x)
Returns the range (max - min) over mean of the time series.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input time series. 
required 
Returns:
Type  Description 

float | Expr


ratio_beyond_r_sigma(x, ratio=0.25)
Returns the ratio of values in the series that are more than r * std away from the mean, on either side.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input timeseries. 
required 
ratio 
float

The scaling factor for std 
0.25

Returns:
Type  Description 

float | Expr
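As a concrete reading of "beyond r * std from the mean", the pure-Python sketch below counts values whose distance from the mean exceeds ratio times the standard deviation. The function name is hypothetical, and the use of the population standard deviation (ddof = 0) is an assumption; the library may use a different ddof.

```python
import statistics


def ratio_beyond_r_sigma_sketch(x, ratio=0.25):
    """Share of values farther than ratio * std from the mean."""
    mean = statistics.fmean(x)
    std = statistics.pstdev(x)  # assumption: population std (ddof = 0)
    return sum(1 for v in x if abs(v - mean) > ratio * std) / len(x)


series = [0, 0, 0, 0, 10]  # mean 2, population std 4
print(ratio_beyond_r_sigma_sketch(series, ratio=1.0))
print(ratio_beyond_r_sigma_sketch(series, ratio=0.25))
```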


ratio_n_unique_to_length(x)
Calculate the ratio of the number of unique values to the length of the timeseries.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input timeseries. 
required 
Returns:
Type  Description 

float | Expr


root_mean_square(x)
Calculate the root mean square.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input timeseries. 
required 
Returns:
Type  Description 

float | Expr


sample_entropy(x, ratio=0.2, m=2)
Calculate the sample entropy of a time series. This only works for Series input right now.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

The input time series. 
required 
ratio 
float

The tolerance parameter. Default is 0.2. 
0.2

m 
int

Length of a run of data. Most common run length is 2. 
2

Returns:
Type  Description 

float | Expr


spkt_welch_density(x, n_coeffs=None)
This estimates the cross power spectral density of the time series x at different frequencies. This only works for Series input right now.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

The input time series. 
required 
n_coeffs 
Optional[int]

The number of coefficients you want to take. If none, will take all, which will be a list as long as the input time series. 
None

Returns:
Type  Description 

list of floats


streak_length_stats(x, above, threshold)
Returns some statistics of the length of the streaks of the time series. Note that the streaks here are about the changes for consecutive values in the time series, not the individual values.
The statistics include: min length, max length, average length, std of length, 10th-percentile length, median length, 90th-percentile length, and mode of the length. If input is Series, a dictionary will be returned. If input is an expression, the expression will evaluate to a struct with the fields ordered by the statistics.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input time series. 
required 
above 
bool

Above (>=) or below (<=) the given threshold 
required 
threshold 
float

The threshold for the change (x_{t+1} - x_t) to be counted 
required 
Returns:
Type  Description 

float | Expr


sum_reoccurring_points(x)
Returns the sum of all data points that are present in the time series more than once.
For example, sum_reoccurring_points(pl.Series([2, 2, 2, 2, 1]))
returns 8, as 2 is a reoccurring value, so all 2's
are summed up.
This is in contrast to the sum_reoccurring_values
function, where each reoccurring value is only counted once.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input timeseries. 
required 
Returns:
Type  Description 

float | Expr


sum_reoccurring_values(x)
Returns the sum of all values that are present in the time series more than once.
For example, sum_reoccurring_values(pl.Series([2, 2, 2, 2, 1]))
returns 2, as 2 is a reoccurring value, so it is
summed up with all other reoccurring values (there are none), so the result is 2.
This is in contrast to the sum_reoccurring_points
function, where each reoccurring value is counted as often as it is present in the data.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input timeseries. 
required 
Returns:
Type  Description 

float | Expr
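The contrast between sum_reoccurring_points and sum_reoccurring_values, using the same example series as above, can be reproduced with a pure-Python sketch. The *_sketch names are hypothetical and the code only mirrors the stated definitions, not the library's implementation.

```python
from collections import Counter


def sum_reoccurring_points_sketch(x):
    """Sum every data point whose value occurs more than once."""
    counts = Counter(x)
    return sum(v * c for v, c in counts.items() if c > 1)


def sum_reoccurring_values_sketch(x):
    """Sum each distinct value that occurs more than once, counted once."""
    counts = Counter(x)
    return sum(v for v, c in counts.items() if c > 1)


series = [2, 2, 2, 2, 1]
print(sum_reoccurring_points_sketch(series))  # all four 2's are summed
print(sum_reoccurring_values_sketch(series))  # the value 2 is counted once
```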


symmetry_looking(x, ratio=0.25)
Check if the distribution of x looks symmetric.
A distribution is considered symmetric if: | mean(X) - median(X) | < ratio * (max(X) - min(X))
Parameters:
Name  Type  Description  Default 

x 
Series

Input timeseries. 
required 
ratio 
float

Multiplier on distance between max and min. 
0.25

Returns:
Type  Description 

bool | Expr
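The symmetry criterion translates directly into a short pure-Python check. The function name is hypothetical and this is a sketch of the stated inequality, not the library's implementation.

```python
import statistics


def symmetry_looking_sketch(x, ratio=0.25):
    """True if |mean - median| < ratio * (max - min)."""
    mean = statistics.fmean(x)
    median = statistics.median(x)
    return abs(mean - median) < ratio * (max(x) - min(x))


skewed = [1, 1, 1, 1, 100]  # mean 20.8, median 1, range 99
print(symmetry_looking_sketch([1, 2, 3, 4, 5]))
print(symmetry_looking_sketch(skewed, ratio=0.25))
print(symmetry_looking_sketch(skewed, ratio=0.1))
```

The example shows that the outcome depends on ratio: the skewed series passes with the default 0.25 but fails with the stricter 0.1.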


time_reversal_asymmetry_statistic(x, n_lags)
Returns the time reversal asymmetry statistic.
Parameters:
Name  Type  Description  Default 

x 
Series

Input timeseries. 
required 
n_lags 
int

The lag that should be used in the calculation of the feature. 
required 
Returns:
Type  Description 

float | Expr
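The description does not give the formula. The sketch below assumes the form used by tsfresh's feature of the same name, the mean over t of x[t + 2*lag]^2 * x[t + lag] - x[t + lag] * x[t]^2; whether this library matches it term for term is an assumption. The function name and the choice to return 0.0 for series that are too short are likewise hypothetical.

```python
def time_reversal_asymmetry_sketch(x, n_lags):
    """Mean of x[t+2l]^2 * x[t+l] - x[t+l] * x[t]^2 over valid t (assumed form)."""
    n = len(x)
    if n <= 2 * n_lags:
        return 0.0  # assumption: too short to form any term
    terms = [
        x[t + 2 * n_lags] ** 2 * x[t + n_lags] - x[t + n_lags] * x[t] ** 2
        for t in range(n - 2 * n_lags)
    ]
    return sum(terms) / len(terms)


# Terms for lag 1: 3^2*2 - 2*1^2 = 16 and 4^2*3 - 3*2^2 = 36.
print(time_reversal_asymmetry_sketch([1, 2, 3, 4], n_lags=1))
```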


var_gt_std(x, ddof=1)
Is the variance >= std? Since std is the square root of the variance, for a non-constant series this is equivalent to asking whether var >= 1.
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input time series. 
required 
ddof 
int

Delta Degrees of Freedom used when computing var. 
1

Returns:
Type  Description 

bool | Expr


variation_coefficient(x)
Calculate the coefficient of variation (CV).
Parameters:
Name  Type  Description  Default 

x 
Expr | Series

Input time series. 
required 
Returns:
Type  Description 

float | Expr
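The coefficient of variation is the standard deviation divided by the mean. The pure-Python sketch below illustrates that ratio; the function name is hypothetical, and using the sample standard deviation (ddof = 1) is an assumption, since the description does not state which ddof the library applies.

```python
import statistics


def variation_coefficient_sketch(x):
    """Standard deviation divided by the mean (sample std assumed)."""
    mean = statistics.fmean(x)
    if mean == 0:
        raise ZeroDivisionError("CV is undefined when the mean is zero")
    return statistics.stdev(x) / mean  # assumption: ddof = 1


print(variation_coefficient_sketch([2, 4, 6]))  # std 2, mean 4
```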

