Feature engineering library that helps you keep track of feature dependencies, documentation and schema

Itayazolay, updated 🕥 2022-01-21 13:35:15

featureclass

Installation

Using pip

```bash
pip install featureclass
```

Motivation

This library helps define a featureclass.
featureclass is inspired by dataclass and is meant to provide an alternative way to define feature engineering classes.

I have noticed that the below code is pretty common when doing feature engineering:

```python
from statistics import variance
from math import sqrt


class MyFeatures:
    def calc_all(self, datapoint):
        out = {}
        out['var'] = self.calc_var(datapoint)
        out['stdev'] = self.calc_stdev(out['var'])
        return out

    def calc_var(self, data) -> float:
        return variance(data)

    def calc_stdev(self, var) -> float:
        return sqrt(var)
```

This type of implementation has a few problems:
1. Dependencies between features are implicit
2. No simple schema
3. No documentation for features
4. Each feature is declared twice: once as a function and once as a dict key

This is why I created this library.
I turned the above code into this:
```python
from featureclass import feature, featureclass, feature_names, feature_annotations, asdict, as_dataclass
from statistics import variance
from math import sqrt


@featureclass
class MyFeatures:
    def __init__(self, datapoint):
        self.datapoint = datapoint

    @feature()
    def var(self) -> float:
        """Calc variance"""
        return variance(self.datapoint)

    @feature()
    def stdev(self) -> float:
        """Calc stdev"""
        return sqrt(self.var)


print(feature_names(MyFeatures))  # ('var', 'stdev')
print(feature_annotations(MyFeatures))  # {'var': float, 'stdev': float}
print(asdict(MyFeatures([1,2,3,4,5])))  # {'var': 2.5, 'stdev': 1.5811388300841898}
print(as_dataclass(MyFeatures([1,2,3,4,5])))  # MyFeatures(stdev=1.5811388300841898, var=2.5)
```
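To give a feel for how a schema can fall out of the class definition itself, here is a rough sketch using only the standard library. It is not featureclass's implementation: `toy_feature_names`, `toy_feature_annotations`, `toy_asdict` and `ToyFeatures` are hypothetical names made up for this illustration, and `functools.cached_property` stands in for the `feature` decorator.

```python
import inspect
from functools import cached_property
from math import sqrt
from statistics import variance


def toy_feature_names(cls):
    """Names of all cached_property attributes, in definition order."""
    return tuple(
        name for name, attr in vars(cls).items()
        if isinstance(attr, cached_property)
    )


def toy_feature_annotations(cls):
    """Map each feature name to the return annotation of its function."""
    return {
        name: inspect.signature(vars(cls)[name].func).return_annotation
        for name in toy_feature_names(cls)
    }


def toy_asdict(obj):
    """Evaluate every feature on an instance and collect the results."""
    return {name: getattr(obj, name) for name in toy_feature_names(type(obj))}


class ToyFeatures:
    def __init__(self, datapoint):
        self.datapoint = datapoint

    @cached_property
    def var(self) -> float:
        """Calc variance"""
        return variance(self.datapoint)

    @cached_property
    def stdev(self) -> float:
        """Calc stdev"""
        return sqrt(self.var)


print(toy_feature_names(ToyFeatures))            # ('var', 'stdev')
print(toy_feature_annotations(ToyFeatures))      # {'var': <class 'float'>, 'stdev': <class 'float'>}
print(toy_asdict(ToyFeatures([1, 2, 3, 4, 5])))  # {'var': 2.5, 'stdev': 1.5811388300841898}
```

The point of the sketch is that each feature is declared exactly once, as a method, and the names, types and docstrings are all read back from that single declaration.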

The feature decorator uses cached_property to cache each feature calculation,
so each feature is calculated only once per datapoint.
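The caching behaviour can be seen with `functools.cached_property` on its own. This is a small sketch under the assumption that the `feature` decorator behaves like a plain `cached_property` in this respect; `CachedFeatures` and the `var_calls` counter are hypothetical and exist only to make the number of computations visible.

```python
from functools import cached_property
from math import sqrt
from statistics import variance


class CachedFeatures:
    """Toy class using functools.cached_property directly (not featureclass)."""

    def __init__(self, datapoint):
        self.datapoint = datapoint
        self.var_calls = 0  # counts how often the variance is actually computed

    @cached_property
    def var(self) -> float:
        self.var_calls += 1
        return variance(self.datapoint)

    @cached_property
    def stdev(self) -> float:
        # Reading self.var reuses the cached value if it already exists.
        return sqrt(self.var)


features = CachedFeatures([1, 2, 3, 4, 5])
features.var
features.stdev
features.var
print(features.var_calls)  # 1

other = CachedFeatures([2, 4, 6])
other.stdev
print(other.var_calls)  # 1
```

The cache lives on the instance, so a feature that several other features depend on is computed once for each datapoint object, and a new object starts with an empty cache.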

Releases

0.3.0 2022-01-19 12:11:28

What's Changed

  • rename asDict to as_dict and asDataclass to as_dataclass by @Itayazolay in https://github.com/Itayazolay/featureclass/pull/4

Full Changelog: https://github.com/Itayazolay/featureclass/compare/0.2.1...0.3.0

0.2.1 2022-01-11 19:26:12

What's Changed

  • docs, formatting and tests by @Itayazolay in https://github.com/Itayazolay/featureclass/pull/1
  • fine-tune mypy type ignore by @Itayazolay in https://github.com/Itayazolay/featureclass/pull/3

New Contributors

  • @Itayazolay made their first contribution in https://github.com/Itayazolay/featureclass/pull/1

Full Changelog: https://github.com/Itayazolay/featureclass/compare/0.2.0...0.2.1

0.2.0 2022-01-09 20:20:44

  • fix cache
  • added asDict and asDataclass

Full Changelog: https://github.com/Itayazolay/featureclass/compare/0.1.0...0.2.0

0.1.0 2022-01-09 12:17:25