preprocessing package
Submodules
preprocessing.preprocessors
- preprocessing.preprocessors.add_technical_indicator(df)
Calculate technical indicators using the stockstats package. Adds MACD, RSI, CCI, and ADX indicators to the dataframe.
- Parameters:
df (DataFrame) – pandas DataFrame containing stock data.
- Return type:
DataFrame
- Returns:
pandas DataFrame with added technical indicators.
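To illustrate what adding an indicator column per ticker looks like, here is a minimal pandas-only sketch for MACD. It does not call stockstats, so its values may differ from what add_technical_indicator produces. The function name add_macd_sketch, the 12/26 spans, and the ‘datadate’, ‘tic’, and ‘adjcp’ column names are assumptions made for the example.

```python
import pandas as pd


def add_macd_sketch(df, price_col="adjcp"):
    """Hypothetical helper: append a 'macd' column computed per ticker."""
    # Sort so each ticker's prices are in chronological order.
    out = df.sort_values(["tic", "datadate"]).copy()
    grouped = out.groupby("tic")[price_col]
    # MACD line = fast EMA minus slow EMA (12 and 26 periods assumed).
    fast = grouped.transform(lambda s: s.ewm(span=12, adjust=False).mean())
    slow = grouped.transform(lambda s: s.ewm(span=26, adjust=False).mean())
    out["macd"] = fast - slow
    return out.reset_index(drop=True)


prices = pd.DataFrame({
    "datadate": ["2022-01-03", "2022-01-04", "2022-01-05"],
    "tic": ["AAA", "AAA", "AAA"],
    "adjcp": [10.0, 11.0, 12.0],
})
with_macd = add_macd_sketch(prices)
```

Grouping by ‘tic’ keeps one ticker's history from leaking into another's moving averages.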
- preprocessing.preprocessors.add_turbulence(df)
Add turbulence index to the dataframe based on precalculated turbulence.
- Parameters:
df (DataFrame) – pandas DataFrame containing stock data.
- Return type:
DataFrame
- Returns:
pandas DataFrame with added turbulence index.
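One plausible way to attach a precalculated turbulence index is a left join on the date column. The sketch below assumes the turbulence values arrive as a separate DataFrame with ‘datadate’ and ‘turbulence’ columns; the function name add_turbulence_sketch and the second argument are hypothetical and not part of the documented signature.

```python
import pandas as pd


def add_turbulence_sketch(df, turbulence):
    """Hypothetical helper: join one turbulence value per date onto every ticker row."""
    # Left join keeps every stock row, even for dates without a turbulence value.
    merged = df.merge(turbulence, on="datadate", how="left")
    return merged.sort_values(["datadate", "tic"]).reset_index(drop=True)


stocks = pd.DataFrame({
    "datadate": ["2022-01-03", "2022-01-03", "2022-01-04", "2022-01-04"],
    "tic": ["BBB", "AAA", "BBB", "AAA"],
    "adjcp": [20.0, 10.0, 21.0, 11.0],
})
index_values = pd.DataFrame({
    "datadate": ["2022-01-03", "2022-01-04"],
    "turbulence": [0.0, 1.5],
})
combined = add_turbulence_sketch(stocks, index_values)
```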
- preprocessing.preprocessors.calculate_turbulence(df)
Calculate the turbulence index based on historical stock prices. Uses the covariance matrix of historical prices to calculate turbulence.
- Parameters:
df (DataFrame) – pandas DataFrame containing stock data with columns [‘datadate’, ‘tic’, ‘adjcp’].
- Return type:
DataFrame
- Returns:
pandas DataFrame containing turbulence index.
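A covariance-based turbulence index is commonly computed as a Mahalanobis-style distance between the current prices and their historical mean. The sketch below shows that idea under stated assumptions: the lookback length, the zero values during the warm-up period, and the use of a pseudo-inverse are illustrative choices, and turbulence_sketch is a hypothetical name rather than the package's implementation.

```python
import numpy as np
import pandas as pd


def turbulence_sketch(df, lookback=3):
    """Hypothetical helper: one turbulence value per date from ['datadate', 'tic', 'adjcp']."""
    # One row per date, one column per ticker.
    prices = df.pivot(index="datadate", columns="tic", values="adjcp").sort_index()
    dates = prices.index
    # Assumption: dates without enough history get a turbulence of 0.
    values = [0.0] * min(lookback, len(dates))
    for i in range(lookback, len(dates)):
        hist = prices.iloc[:i]  # all rows strictly before the current date
        deviation = (prices.iloc[i] - hist.mean()).to_numpy()
        # Pseudo-inverse avoids failure when the covariance matrix is singular.
        cov_inv = np.linalg.pinv(hist.cov().to_numpy())
        values.append(float(deviation @ cov_inv @ deviation))
    return pd.DataFrame({"datadate": dates, "turbulence": values})


sample = pd.DataFrame({
    "datadate": ["2022-01-03", "2022-01-04", "2022-01-05", "2022-01-06"],
    "tic": ["AAA"] * 4,
    "adjcp": [1.0, 2.0, 3.0, 6.0],
})
result = turbulence_sketch(sample)
```

In this single-ticker sample the historical mean is 2 and the sample variance is 1, so the last date's turbulence is (6 − 2)² / 1 = 16.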
- preprocessing.preprocessors.data_split(df, start, end)
Splits the dataset into a subset based on a date range.
- Parameters:
df (DataFrame) – pandas DataFrame containing the data.
start (str) – Start date (inclusive) as a string (e.g., ‘2022-01-01’).
end (str) – End date (exclusive) as a string (e.g., ‘2023-01-01’).
- Return type:
DataFrame
- Returns:
Filtered pandas DataFrame sorted by ‘datadate’ and ‘tic’.
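The half-open date range described above (start inclusive, end exclusive) can be sketched with a boolean mask. This assumes ‘datadate’ holds ISO-formatted strings such as ‘2022-01-01’, which compare correctly as text; data_split_sketch is a hypothetical stand-in, not the package's code.

```python
import pandas as pd


def data_split_sketch(df, start, end):
    """Hypothetical helper: keep rows with start <= datadate < end, sorted."""
    mask = (df["datadate"] >= start) & (df["datadate"] < end)
    subset = df[mask].sort_values(["datadate", "tic"])
    return subset.reset_index(drop=True)


frame = pd.DataFrame({
    "datadate": ["2022-12-31", "2022-01-01", "2021-12-31", "2023-01-01"],
    "tic": ["BBB", "AAA", "AAA", "AAA"],
})
year_2022 = data_split_sketch(frame, "2022-01-01", "2023-01-01")
```

Because the end date is exclusive, consecutive splits such as training and validation windows can share a boundary date without overlapping.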
- preprocessing.preprocessors.load_dataset(file_name)
Load a CSV dataset from a file path and return as a pandas DataFrame.
- Parameters:
file_name (str) – Path to the CSV file.
- Return type:
DataFrame
- Returns:
pandas DataFrame containing the data from the CSV file.
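Loading a CSV into a DataFrame is typically a thin wrapper around pandas.read_csv. The following is a minimal sketch on that assumption; load_dataset_sketch and the temporary file are purely illustrative.

```python
import os
import tempfile

import pandas as pd


def load_dataset_sketch(file_name):
    """Hypothetical helper: read a CSV file into a DataFrame."""
    return pd.read_csv(file_name)


# Write a tiny CSV to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "prices.csv")
    with open(csv_path, "w") as handle:
        handle.write("datadate,tic,adjcp\n2022-01-03,AAA,10.0\n")
    loaded = load_dataset_sketch(csv_path)
```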
- preprocessing.preprocessors.preprocess_data()
Data preprocessing pipeline that loads, filters, and processes stock data:
  - Loads the dataset.
  - Filters data after 2009.
  - Calculates adjusted prices.
  - Adds technical indicators.
  - Fills missing values.
- Return type:
DataFrame
- Returns:
pandas DataFrame with preprocessed data.
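Two of the pipeline steps above, the date filter and the missing-value fill, can be sketched with pandas alone. The sketch assumes an inclusive cutoff of ‘2009-01-01’, ISO-formatted ‘datadate’ strings, and a per-ticker backward fill; none of these details are confirmed by the documentation. The adjusted-price and indicator steps are omitted because their formulas are not given here, and preprocess_sketch is a hypothetical name.

```python
import pandas as pd


def preprocess_sketch(raw, cutoff="2009-01-01"):
    """Hypothetical helper: filter by date, then back-fill gaps within each ticker."""
    # Assumption: "after 2009" means on or after the cutoff date.
    data = raw[raw["datadate"] >= cutoff].copy()
    data = data.sort_values(["tic", "datadate"])
    # Adjusted prices and technical indicators would be added at this point.
    value_cols = [col for col in data.columns if col != "tic"]
    # Assumption: a missing value is filled from the next row of the same ticker.
    data[value_cols] = data.groupby("tic")[value_cols].bfill()
    return data.reset_index(drop=True)


raw_prices = pd.DataFrame({
    "datadate": ["2008-12-31", "2009-01-02", "2009-01-05"],
    "tic": ["AAA", "AAA", "AAA"],
    "adjcp": [1.0, None, 3.0],
})
clean = preprocess_sketch(raw_prices)
```

Filling within each ticker rather than across the whole frame prevents one stock's price from being copied into another stock's gap.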