Features summary

Generating a set of features

import pandas as pd
from spoef.feature_generation import feature_generation
from spoef.utils import count_occurences_features

# Generating the data
data = pd.DataFrame([
    ['John', 1, '2021-01-03', 1000, 1000],
    ['John', 1, '2021-02-03', 1000, 2000],
    ['John', 1, '2021-03-03', -3000, -1000],
    ['Jane', 0, '2021-01-03', 1000, 1000],
    ['Jane', 0, '2021-02-03', 5000, 6000],
    ['Jane', 0, '2021-03-03', 2000, 8000],
    ],
    columns=['name', 'label', 'date', 'transaction', 'balance'],
)
# Convert the date column from strings to datetime objects.
data["date"] = pd.to_datetime(data["date"], format="%Y-%m-%d")


# Setting up which features to generate.
list_featuretypes = ["Basic", "FourierComplete", "FourierNLargest", "WaveletComplete", "WaveletBasic"]


# Generating features over 1 quarter.

# For the transactions:
transaction_features_quarterly = feature_generation(
    data=data[["name", "date", "transaction"]],
    grouper="name",
    combine_fill_method="transaction",
    time_window='quarter',
    list_featuretypes=list_featuretypes,
    observation_length=1
)

Then, the summary

overview = count_occurences_features(transaction_features_quarterly, print_head=5)

This returns a DataFrame of counts broken down by datatype, time window, feature type and several other details, which gives an overview of the features that were generated.
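
For a quick cross-check that does not depend on spoef, similar counts can be approximated with plain pandas. The sketch below is only an illustration: the `features` frame is a hand-made stand-in for the output of `feature_generation`, and its column names (`feature_a` and so on) are hypothetical, not the names spoef actually produces.

```python
import pandas as pd

# Hand-made stand-in for a generated feature table. The column names are
# hypothetical and do not reflect spoef's real naming scheme.
features = pd.DataFrame({
    "name": ["Jane", "John"],
    "feature_a": [1.0, 2.0],
    "feature_b": [3.0, 4.0],
    "feature_c": [5, 6],
})

# Count how many columns there are per dtype, a rough analogue of the
# per-datatype counts in the summary.
dtype_counts = features.dtypes.astype(str).value_counts()

# Count the feature columns, excluding the grouper column "name".
n_rows, n_cols = features.shape
n_feature_cols = n_cols - 1

print(dtype_counts)
print(f"{n_rows} rows, {n_feature_cols} feature columns")
```

Replacing `features` with `transaction_features_quarterly` should give a rough idea of the table's size and dtypes, though it will not reproduce the breakdown by time window or feature type.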