preprocessing package

Submodules

preprocessing.preprocessors

preprocessing.preprocessors.add_technical_indicator(df)

Calculate technical indicators using the stockstats package. Adds MACD, RSI, CCI, and ADX indicators to the dataframe.

Parameters:

df (DataFrame) – pandas DataFrame containing stock data.

Return type:

DataFrame

Returns:

pandas DataFrame with added technical indicators.
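
To illustrate what one of these indicator columns contains, the sketch below computes MACD per ticker with plain pandas instead of stockstats. The function name `macd_sketch`, the 12/26 spans, and the use of the `adjcp` column as the price input are assumptions made for this example; the values stockstats produces in the actual function may differ.

```python
import pandas as pd


def macd_sketch(df: pd.DataFrame, fast: int = 12, slow: int = 26) -> pd.DataFrame:
    """Hypothetical helper: add a 'macd' column (fast EMA minus slow EMA) per ticker."""
    out = df.sort_values(["tic", "datadate"]).copy()

    def _macd(prices: pd.Series) -> pd.Series:
        # adjust=False seeds each EMA with the first observation.
        fast_ema = prices.ewm(span=fast, adjust=False).mean()
        slow_ema = prices.ewm(span=slow, adjust=False).mean()
        return fast_ema - slow_ema

    out["macd"] = out.groupby("tic")["adjcp"].transform(_macd)
    return out.sort_values(["datadate", "tic"], ignore_index=True)


# Toy data: one steadily rising ticker and one flat ticker.
dates = [d.strftime("%Y-%m-%d") for d in pd.date_range("2020-01-01", periods=30)]
toy = pd.DataFrame({
    "datadate": dates * 2,
    "tic": ["UP"] * 30 + ["FLAT"] * 30,
    "adjcp": [float(i) for i in range(1, 31)] + [5.0] * 30,
})
with_macd = macd_sketch(toy)
```

A rising series yields a positive MACD because the fast average tracks recent prices more closely than the slow one; a flat series yields zero.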

preprocessing.preprocessors.add_turbulence(df)

Add turbulence index to the dataframe based on precalculated turbulence.

Parameters:

df (DataFrame) – pandas DataFrame containing stock data.

Return type:

DataFrame

Returns:

pandas DataFrame with added turbulence index.
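
One way to attach a precalculated turbulence series is a join on the date column. The sketch below assumes the turbulence values arrive as a separate DataFrame with `datadate` and `turbulence` columns; the helper name `attach_turbulence` and the toy values are hypothetical, and the documented function takes only `df`.

```python
import pandas as pd


def attach_turbulence(df: pd.DataFrame, turbulence: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: join one turbulence value per date onto every ticker row."""
    merged = df.merge(turbulence, on="datadate", how="left")
    return merged.sort_values(["datadate", "tic"], ignore_index=True)


# Toy inputs; the numbers are placeholders, not real market data.
prices = pd.DataFrame({
    "datadate": ["2020-01-02", "2020-01-02", "2020-01-03", "2020-01-03"],
    "tic": ["MSFT", "AAPL", "MSFT", "AAPL"],
    "adjcp": [160.6, 75.1, 158.6, 74.4],
})
turbulence = pd.DataFrame({
    "datadate": ["2020-01-02", "2020-01-03"],
    "turbulence": [0.0, 1.7],
})
with_turbulence = attach_turbulence(prices, turbulence)
```

Because turbulence is a market-wide figure, every ticker on a given date receives the same value.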

preprocessing.preprocessors.calculate_turbulence(df)

Calculate the turbulence index based on historical stock prices. Uses the covariance matrix of historical prices to calculate turbulence.

Parameters:

df (DataFrame) – pandas DataFrame containing stock data with columns [‘datadate’, ‘tic’, ‘adjcp’].

Return type:

DataFrame

Returns:

pandas DataFrame containing turbulence index.
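
A covariance-based turbulence index is commonly computed as the squared Mahalanobis distance between the current day's prices and their historical mean. The sketch below follows that idea under several assumptions: the name `turbulence_sketch`, the `warmup` parameter, the zero placeholder during warm-up, and the use of a pseudo-inverse are all choices made for this example and are not confirmed by the documented function.

```python
import numpy as np
import pandas as pd


def turbulence_sketch(df: pd.DataFrame, warmup: int = 3) -> pd.DataFrame:
    """Hypothetical helper: one turbulence value per date from ['datadate', 'tic', 'adjcp']."""
    # One row per date, one column per ticker.
    wide = df.pivot(index="datadate", columns="tic", values="adjcp").sort_index()
    # Assumption: report 0.0 until enough history exists for a covariance estimate.
    values = [0.0] * min(warmup, len(wide))
    for i in range(warmup, len(wide)):
        history = wide.iloc[:i]
        deviation = (wide.iloc[i] - history.mean()).to_numpy()
        cov = history.cov().to_numpy()
        # The pseudo-inverse keeps the sketch usable if the covariance is singular.
        values.append(float(deviation @ np.linalg.pinv(cov) @ deviation))
    return pd.DataFrame({"datadate": wide.index, "turbulence": values})


# Toy prices for two tickers over five dates; the last date moves sharply.
dates = ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]
toy = pd.DataFrame({
    "datadate": dates * 2,
    "tic": ["A"] * 5 + ["B"] * 5,
    "adjcp": [10.0, 11.0, 10.5, 11.5, 15.0, 20.0, 19.5, 20.5, 20.0, 14.0],
})
result = turbulence_sketch(toy)
```

A large value means the current prices sit far from their historical pattern once the usual co-movement between tickers is taken into account.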

preprocessing.preprocessors.data_split(df, start, end)

Splits the dataset into a subset based on a date range.

Parameters:
  • df (DataFrame) – pandas DataFrame containing the data.

  • start (str) – Start date (inclusive) as a string (e.g., ‘2022-01-01’).

  • end (str) – End date (exclusive) as a string (e.g., ‘2023-01-01’).

Return type:

DataFrame

Returns:

Filtered pandas DataFrame sorted by ‘datadate’ and ‘tic’.
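
The half-open date range described above (start inclusive, end exclusive) can be sketched with a boolean mask. This example assumes `datadate` holds ISO-formatted date strings, so that string comparison matches chronological order; the helper name `split_by_date` and the toy rows are hypothetical.

```python
import pandas as pd


def split_by_date(df: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Hypothetical helper: keep rows with start <= datadate < end, sorted by date and ticker."""
    mask = (df["datadate"] >= start) & (df["datadate"] < end)
    return df[mask].sort_values(["datadate", "tic"], ignore_index=True)


# Toy rows; the 2023-01-01 row falls on the exclusive end date and is dropped.
prices = pd.DataFrame({
    "datadate": ["2022-12-30", "2022-01-03", "2023-01-01", "2022-01-03"],
    "tic": ["MSFT", "MSFT", "AAPL", "AAPL"],
    "adjcp": [239.8, 334.7, 130.0, 182.0],
})
train = split_by_date(prices, "2022-01-01", "2023-01-01")
```

Making the end date exclusive lets consecutive splits share a boundary without duplicating rows: the end of one split can be reused as the start of the next.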

preprocessing.preprocessors.load_dataset(file_name)

Load a CSV dataset from a file path and return it as a pandas DataFrame.

Parameters:

file_name (str) – Path to the CSV file.

Return type:

DataFrame

Returns:

pandas DataFrame containing the data from the CSV file.
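
A loader of this kind is typically a thin wrapper around `pandas.read_csv`. In the sketch below the helper name `load_csv` and the column layout are assumptions; an in-memory buffer stands in for a file on disk so the example runs without any external data.

```python
import io

import pandas as pd


def load_csv(file_name) -> pd.DataFrame:
    """Hypothetical helper: read a CSV from a path (or file-like object) into a DataFrame."""
    return pd.read_csv(file_name)


# Toy CSV content; a real call would pass a path string instead of a buffer.
csv_text = "datadate,tic,adjcp\n2020-01-02,AAPL,75.1\n2020-01-02,MSFT,160.6\n"
frame = load_csv(io.StringIO(csv_text))
```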

preprocessing.preprocessors.preprocess_data()

Data preprocessing pipeline that loads, filters, and processes stock data.
  • Loads the dataset.

  • Filters data after 2009.

  • Calculates adjusted prices.

  • Adds technical indicators.

  • Fills missing values.

Return type:

DataFrame

Returns:

pandas DataFrame with preprocessed data.
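
Two of the pipeline steps, the date filter and the missing-value fill, can be sketched in isolation. Everything specific here is an assumption: the helper name `filter_and_fill`, the cutoff `"2009-01-01"` on ISO date strings, the per-ticker backfill strategy, and the `macd` column used as the example indicator. The adjusted-price and indicator steps are left out because their details are not given above.

```python
import numpy as np
import pandas as pd


def filter_and_fill(df: pd.DataFrame, first_date: str = "2009-01-01",
                    columns=("macd",)) -> pd.DataFrame:
    """Hypothetical helper: drop rows before first_date, then backfill gaps per ticker."""
    out = df[df["datadate"] >= first_date].sort_values(["tic", "datadate"]).copy()
    for col in columns:
        # Backfill within each ticker so values never leak across tickers.
        out[col] = out.groupby("tic")[col].transform(lambda s: s.bfill())
    return out.sort_values(["datadate", "tic"], ignore_index=True)


# Toy rows with one pre-2009 date and two missing indicator values.
raw = pd.DataFrame({
    "datadate": ["2008-12-31", "2009-01-02", "2009-01-05", "2009-01-02", "2009-01-05"],
    "tic": ["A", "A", "A", "B", "B"],
    "macd": [0.1, np.nan, 0.3, 0.5, np.nan],
})
clean = filter_and_fill(raw)
```

Backfilling cannot repair a gap at the end of a ticker's history, so the last row for ticker B stays missing in this toy data; a real pipeline would need a policy for that case.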

Module contents