Python library for automated dataset normalization

alteryx, updated 🕥 2023-02-10 23:08:41

AutoNormalize

AutoNormalize is a Python library for automated data table normalization. It lets you build an EntitySet from a single denormalized table and generate features for machine learning using Featuretools.

Getting Started

Install

    pip install featuretools[autonormalize]

Uninstall

    pip uninstall autonormalize

API Reference

auto_entityset

    auto_entityset(df, accuracy=0.98, index=None, name=None, time_index=None)

Creates a normalized EntitySet from a dataframe.

Arguments:

  • df (pd.DataFrame) : the dataframe containing the data

  • accuracy (0 < float <= 1.00; default = 0.98) : the accuracy threshold required to conclude that a dependency holds (i.e. with accuracy = 0.98, the dependency LHS --> RHS must hold for at least 98% of the rows)

  • index (str, optional) : name of the column intended as the index of df

  • name (str, optional) : the name of the created EntitySet

  • time_index (str, optional) : name of the time column in the dataframe

Returns:

  • entityset (ft.EntitySet) : created entity set
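To make the accuracy threshold concrete, here is a minimal pure-Python sketch of one common way to score an approximate dependency: within each group of rows sharing the same LHS values, keep the most frequent RHS value and count the rest as violations. This is an assumption for illustration, not AutoNormalize's own implementation; the helper name `dependency_accuracy` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def dependency_accuracy(rows, lhs, rhs):
    """Return the fraction of rows consistent with the dependency lhs -> rhs.

    Hypothetical helper: for each distinct LHS value, rows carrying the most
    common RHS value are counted as holding the dependency.
    """
    if not rows:
        return 1.0  # an empty table violates nothing
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[col] for col in lhs)
        groups[key][row[rhs]] += 1
    kept = sum(max(counts.values()) for counts in groups.values())
    return kept / len(rows)


# Toy data: one misspelled city breaks zip -> city for a single row.
rows = [
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambrige"},
    {"zip": "10001", "city": "New York"},
]

score = dependency_accuracy(rows, ["zip"], "city")
holds = score >= 0.98  # compare against the default accuracy threshold
```

With three of four rows consistent, the score falls below 0.98, so under this measure the dependency would be rejected at the default threshold and accepted only with a lower `accuracy`.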

find_dependencies

    find_dependencies(df, accuracy=0.98, index=None)

Finds dependencies within the dataframe using the DFD search algorithm.

Returns:

  • dependencies (Dependencies) : the dependencies found in the data within the constraints provided
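For intuition about what a dependency search produces, the sketch below brute-forces exact dependencies with small left-hand sides. It is deliberately naive and is not the DFD algorithm; the function names, the `max_lhs` limit, and the dictionary result format are assumptions made for this example and do not reflect the library's `Dependencies` object.

```python
from itertools import combinations


def exactly_holds(rows, lhs, rhs):
    """Return True if every distinct lhs value maps to a single rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def naive_dependencies(rows, columns, max_lhs=2):
    """Map each column to the minimal column sets (up to max_lhs) that determine it."""
    found = {col: [] for col in columns}
    for rhs in columns:
        others = [col for col in columns if col != rhs]
        for size in range(1, max_lhs + 1):
            for lhs in combinations(others, size):
                # Skip supersets of a dependency that was already found.
                if any(set(prev) <= set(lhs) for prev in found[rhs]):
                    continue
                if exactly_holds(rows, lhs, rhs):
                    found[rhs].append(lhs)
    return found


# Toy table: two different customers share the name "Ann".
columns = ["order_id", "customer_id", "customer_name"]
rows = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ann"},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ann"},
    {"order_id": 3, "customer_id": 11, "customer_name": "Bob"},
    {"order_id": 4, "customer_id": 12, "customer_name": "Ann"},
]

deps = naive_dependencies(rows, columns)
```

Here `customer_id` determines `customer_name` but not the other way around, and nothing determines `order_id`. The brute-force search grows exponentially with the number of columns, which is the cost a dedicated search algorithm is meant to reduce.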

normalize_dataframe

    normalize_dataframe(df, dependencies)

Normalizes the dataframe based on the given dependencies. Keys for the newly created DataFrames can only be columns that are strings, ints, or categories. Keys are chosen according to the following priority:

  1. has the shortest length (fewest attributes)
  2. has "id" in some form in the name of an attribute
  3. has the attribute furthest to the left in the table

Returns:

  • new_dfs (list[pd.DataFrame]) : list of new dataframes
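The key priority above can be sketched as a sort key over candidate keys. This is an illustrative reading of the three rules, not the library's code: `choose_key` is a hypothetical helper, "shortest" is taken to mean the fewest attributes, and the "id" check is a simple case-insensitive substring test, which may differ from how AutoNormalize detects id-like names.

```python
def choose_key(candidates, columns):
    """Pick one candidate key using the three-step priority (hypothetical helper).

    candidates: list of tuples of column names
    columns: column names in their left-to-right table order
    """
    def priority(key):
        has_id = any("id" in attr.lower() for attr in key)
        leftmost = min(columns.index(attr) for attr in key)
        # Tuples compare element by element: fewer attributes first,
        # then keys with an id-like name, then the leftmost attribute.
        return (len(key), not has_id, leftmost)

    return min(candidates, key=priority)


columns = ["name", "customer_id", "email"]

# Rule 1 removes the two-attribute key; rule 2 prefers the id-like column.
first = choose_key([("name", "email"), ("email",), ("customer_id",)], columns)

# Neither candidate looks like an id, so rule 3 picks the leftmost column.
second = choose_key([("email",), ("name",)], columns)
```

Note that a plain substring test also matches names such as "width", so a real implementation would likely need a stricter check.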


make_entityset

    make_entityset(df, dependencies, name=None, time_index=None)

Creates a normalized EntitySet from a dataframe based on the given dependencies. Keys are chosen in the same fashion as for normalize_dataframe, and a new index will be created if any key has more than a single attribute.

Returns:

  • entityset (ft.EntitySet) : created EntitySet
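The sketch below illustrates, in plain Python, what splitting a table on a dependency can look like and why a composite key calls for a new index. It is an assumption-laden illustration rather than the library's implementation: `split_on_key`, the `_index` naming scheme, and the toy data are all hypothetical.

```python
def split_on_key(rows, key, dependents):
    """Split rows into (child_rows, parent_rows, index_name) for key -> dependents.

    For a composite key, a surrogate integer index replaces the key columns
    in the child table (hypothetical naming: joined key names + "_index").
    """
    composite = len(key) > 1
    index_name = "_".join(key) + "_index" if composite else key[0]
    parents = {}
    child_rows = []
    for row in rows:
        key_values = tuple(row[col] for col in key)
        if key_values not in parents:
            parent = {col: row[col] for col in list(key) + list(dependents)}
            if composite:
                parent[index_name] = len(parents)  # next surrogate id
            parents[key_values] = parent
        parent = parents[key_values]
        # The child keeps everything except the moved columns.
        child = {
            col: value
            for col, value in row.items()
            if col not in dependents and not (composite and col in key)
        }
        child[index_name] = parent[index_name]
        child_rows.append(child)
    return child_rows, list(parents.values()), index_name


# Toy data: (city, state) together determine region.
rows = [
    {"order": 1, "city": "Springfield", "state": "IL", "region": "midwest"},
    {"order": 2, "city": "Springfield", "state": "MA", "region": "northeast"},
    {"order": 3, "city": "Springfield", "state": "IL", "region": "midwest"},
]

child, parent, index_name = split_on_key(rows, ("city", "state"), ("region",))
```

After the split, the child table references each (city, state) pair through a single integer column, which is what allows a one-column relationship between the two tables.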


normalize_entityset

    normalize_entityset(es, accuracy=0.98)

Returns a new normalized EntitySet from an EntitySet with a single entity.

Arguments:

  • es (ft.EntitySet) : EntitySet with a single entity to normalize

Returns:

  • new_es (ft.EntitySet) : new normalized EntitySet


Built at Alteryx Innovation Labs

Issues

Bump ipython from 7.16.3 to 8.10.0

opened on 2023-02-10 23:08:40 by dependabot[bot]

Bumps ipython from 7.16.3 to 8.10.0.

Release notes

Sourced from ipython's releases.

See https://pypi.org/project/ipython/

We do not use GitHub release anymore. Please see PyPI https://pypi.org/project/ipython/

Commits
  • 15ea1ed release 8.10.0
  • 560ad10 DOC: Update what's new for 8.10 (#13939)
  • 7557ade DOC: Update what's new for 8.10
  • 385d693 Merge pull request from GHSA-29gw-9793-fvw7
  • e548ee2 Swallow potential exceptions from showtraceback() (#13934)
  • 0694b08 MAINT: mock slowest test. (#13885)
  • 8655912 MAINT: mock slowest test.
  • a011765 Isolate the attack tests with setUp and tearDown methods
  • c7a9470 Add some regression tests for this change
  • fd34cf5 Swallow potential exceptions from showtraceback()

Updates

opened on 2022-12-15 15:56:33 by gsheni

Bump nbconvert from 6.4.5 to 6.5.1

opened on 2022-08-23 18:57:33 by dependabot[bot]

Bumps nbconvert from 6.4.5 to 6.5.1.

Release notes

Sourced from nbconvert's releases.

Release 6.5.1

No release notes provided.

6.5.0

Full Changelog: https://github.com/jupyter/nbconvert/compare/6.4.5...6.5


Add autonormalize to conda forge

opened on 2022-03-10 18:29:38 by gsheni

AutoNormalize should be available for download via conda-forge.

Documentation on how to contribute a package: https://conda-forge.org/docs/maintainer/adding_pkgs.html

Example PR of contributing a package: https://github.com/conda-forge/staged-recipes/pull/16033

Does autonormalize lose typing on input dataframe

opened on 2022-03-08 14:40:19 by dvreed77

If the input dataframe has been initialized with logical types, does autonormalize lose this typing information on the output dataframes?

Steps to check

  1. Create an input dataframe
  2. Add logical types to some columns
  3. Check if output dataframes have the same logical types

Add release.md that follows Featuretools, Woodwork release process

opened on 2021-11-22 17:38:10 by gsheni
  • We should add a release.md file that follows our other libraries
  • https://github.com/alteryx/woodwork/blob/main/release.md
  • https://github.com/alteryx/featuretools/blob/main/release.md
  • Some parts of this will apply to autonormalize and some parts will not

Releases

v2.0.1 2022-04-25 22:23:23

v2.0.1 Apr 25, 2022

  • Changes
    • Remove python-dateutil dependency requirement (#48)
  • Testing Changes
    • Add ``test_version.py`` and release notes updated CI check (#49)

Thanks to the following people for contributing to this release: @rwedge, @thehomebrewnerd

v2.0.0 2022-03-14 18:40:22

v2.0.0 Mar 14, 2022

  • Fixes
    • Fix compatibility issues with featuretools (#41)
  • Changes
    • Rename normalize_entity to normalize_entityset (#41)

Thanks to the following people for contributing to this release: @dvreed77

v1.0.2 2022-01-07 21:40:30

v1.0.1 Jan 7, 2022

  • Documentation Changes
    • Update release notes and release format #37
    • Updated sphinx documentation and guides #35
  • Testing Changes
    • Updated tests to work with featuretools 1.0 #35

Thanks to the following people for contributing to this release: @gsheni, @tuethan1999

Release 2019-08-16 19:13:54

AutoNormalize 2019-08-15 18:25:28

EntitySet Normalization 2019-08-15 16:05:19
