Generate and Visualize Data Lineage from query history

tokern, updated 🕥 2023-03-23 18:36:35

Tokern Lineage Engine

CircleCI codecov PyPI image image

Tokern Lineage Engine is fast and easy to use application to collect, visualize and analyze column-level data lineage in databases, data warehouses and data lakes in AWS and GCP.

Tokern Lineage helps you browse column-level data lineage * visually using kedro-viz * analyze lineage graphs programmatically using the powerful networkx graph library

Resources

  • Demo of Tokern Lineage App

data-lineage

Quick Start

Install a demo of using Docker and Docker Compose

Download the docker-compose file from Github repository.

# in a new directory run
wget https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml
# or run
curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/tokern-lineage-engine.yml -o docker-compose.yml

Run docker-compose

docker-compose up -d

Check that the containers are running.

docker ps
CONTAINER ID   IMAGE                                    CREATED        STATUS       PORTS                    NAMES
3f4e77845b81   tokern/data-lineage-viz:latest   ...   4 hours ago    Up 4 hours   0.0.0.0:8000->80/tcp     tokern-data-lineage-visualizer
1e1ce4efd792   tokern/data-lineage:latest       ...   5 days ago     Up 5 days                             tokern-data-lineage
38be15bedd39   tokern/demodb:latest             ...   2 weeks ago    Up 2 weeks                            tokern-demodb

Try out Tokern Lineage App

Head to http://localhost:8000/ to open the Tokern Lineage app

Install Tokern Lineage Engine

# in a new directory run
wget https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/tokern-lineage-engine.yml
# or run
curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml -o tokern-lineage-engine.yml

Run docker-compose

docker-compose up -d

If you want to use an external Postgres database, change the following parameters in tokern-lineage-engine.yml:

  • CATALOG_HOST
  • CATALOG_USER
  • CATALOG_PASSWORD
  • CATALOG_DB

You can also override default values using environement variables.

CATALOG_HOST=... CATALOG_USER=... CATALOG_PASSWORD=... CATALOG_DB=... docker-compose -f ... up -d

For more advanced usage of environment variables with docker-compose, refer to docker-compose docs

Pro-tip

If you want to connect to a database in the host machine, set

CATALOG_HOST: host.docker.internal # For mac or windows
#OR
CATALOG_HOST: 172.17.0.1 # Linux

Supported Technologies

  • Postgres
  • AWS Redshift
  • Snowflake

Coming Soon

  • SparkSQL
  • Presto

Documentation

For advanced usage, please refer to data-lineage documentation

Survey

Please take this survey if you are a user or considering using data-lineage. Responses will help us prioritize features better.

Issues

arm64 support added and enhanced python version support

opened on 2023-03-23 18:36:34 by yjagdale None

Support ARM docker builds

opened on 2023-03-21 15:57:58 by hendrix04

This can't currently be run on ARM hardware without being in emulation mode. Given Mac M1 and AWS EC2 Graviton, it would be nice to have ARM support.

Update python base image

opened on 2023-03-21 15:56:46 by hendrix04

python:3.8.1-slim is old and has 21 security vulnerabilites.

At least consider swapping to python:3.8-slim.

chore(deps): Bump ipython from 7.31.1 to 8.10.0

opened on 2023-02-11 00:37:46 by dependabot[bot]

Bumps ipython from 7.31.1 to 8.10.0.

Commits
  • 15ea1ed release 8.10.0
  • 560ad10 DOC: Update what's new for 8.10 (#13939)
  • 7557ade DOC: Update what's new for 8.10
  • 385d693 Merge pull request from GHSA-29gw-9793-fvw7
  • e548ee2 Swallow potential exceptions from showtraceback() (#13934)
  • 0694b08 MAINT: mock slowest test. (#13885)
  • 8655912 MAINT: mock slowest test.
  • a011765 Isolate the attack tests with setUp and tearDown methods
  • c7a9470 Add some regression tests for this change
  • fd34cf5 Swallow potential exceptions from showtraceback()
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/tokern/data-lineage/network/alerts).

chore(deps): Bump cryptography from 3.4.7 to 39.0.1

opened on 2023-02-08 01:38:46 by dependabot[bot]

Bumps cryptography from 3.4.7 to 39.0.1.

Changelog

Sourced from cryptography's changelog.

39.0.1 - 2023-02-07


* **SECURITY ISSUE** - Fixed a bug where ``Cipher.update_into`` accepted Python
  buffer protocol objects, but allowed immutable buffers. **CVE-2023-23931**
* Updated Windows, macOS, and Linux wheels to be compiled with OpenSSL 3.0.8.

.. _v39-0-0:

39.0.0 - 2023-01-01

  • BACKWARDS INCOMPATIBLE: Support for OpenSSL 1.1.0 has been removed. Users on older version of OpenSSL will need to upgrade.
  • BACKWARDS INCOMPATIBLE: Dropped support for LibreSSL < 3.5. The new minimum LibreSSL version is 3.5.0. Going forward our policy is to support versions of LibreSSL that are available in versions of OpenBSD that are still receiving security support.
  • BACKWARDS INCOMPATIBLE: Removed the encode_point and from_encoded_point methods on :class:~cryptography.hazmat.primitives.asymmetric.ec.EllipticCurvePublicNumbers, which had been deprecated for several years. :meth:~cryptography.hazmat.primitives.asymmetric.ec.EllipticCurvePublicKey.public_bytes and :meth:~cryptography.hazmat.primitives.asymmetric.ec.EllipticCurvePublicKey.from_encoded_point should be used instead.
  • BACKWARDS INCOMPATIBLE: Support for using MD5 or SHA1 in :class:~cryptography.x509.CertificateBuilder, other X.509 builders, and PKCS7 has been removed.
  • BACKWARDS INCOMPATIBLE: Dropped support for macOS 10.10 and 10.11, macOS users must upgrade to 10.12 or newer.
  • ANNOUNCEMENT: The next version of cryptography (40.0) will change the way we link OpenSSL. This will only impact users who build cryptography from source (i.e., not from a wheel), and specify their own version of OpenSSL. For those users, the CFLAGS, LDFLAGS, INCLUDE, LIB, and CRYPTOGRAPHY_SUPPRESS_LINK_FLAGS environment variables will no longer be respected. Instead, users will need to configure their builds as documented here_.
  • Added support for :ref:disabling the legacy provider in OpenSSL 3.0.x<legacy-provider>.
  • Added support for disabling RSA key validation checks when loading RSA keys via :func:~cryptography.hazmat.primitives.serialization.load_pem_private_key, :func:~cryptography.hazmat.primitives.serialization.load_der_private_key, and :meth:~cryptography.hazmat.primitives.asymmetric.rsa.RSAPrivateNumbers.private_key. This speeds up key loading but is :term:unsafe if you are loading potentially attacker supplied keys.
  • Significantly improved performance for :class:~cryptography.hazmat.primitives.ciphers.aead.ChaCha20Poly1305

... (truncated)

Commits


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/tokern/data-lineage/network/alerts).

chore(deps): Bump oauthlib from 3.1.1 to 3.2.2

opened on 2023-02-06 21:00:13 by dependabot[bot]

Bumps oauthlib from 3.1.1 to 3.2.2.

Release notes

Sourced from oauthlib's releases.

3.2.2

OAuth2.0 Provider:

  • CVE-2022-36087

3.2.1

In short

OAuth2.0 Provider:

  • #803 : Metadata endpoint support of non-HTTPS

OAuth1.0:

  • #818 : Allow IPv6 being parsed by signature

General:

  • Improved and fixed documentation warnings.
  • Cosmetic changes based on isort

What's Changed

New Contributors

Full Changelog: https://github.com/oauthlib/oauthlib/compare/v3.2.0...v3.2.1

3.2.0

Changelog

OAuth2.0 Client:

  • #795: Add Device Authorization Flow for Web Application
  • #786: Add PKCE support for Client
  • #783: Fallback to none in case of wrong expires_at format.

OAuth2.0 Provider:

  • #790: Add support for CORS to metadata endpoint.
  • #791: Add support for CORS to token endpoint.
  • #787: Remove comma after Bearer in WWW-Authenticate

OAuth2.0 Provider - OIDC:

  • #755: Call save_token in Hybrid code flow

... (truncated)

Changelog

Sourced from oauthlib's changelog.

3.2.2 (2022-10-17)

OAuth2.0 Provider:

  • CVE-2022-36087

3.2.1 (2022-09-09)

OAuth2.0 Provider:

  • #803: Metadata endpoint support of non-HTTPS

OAuth1.0:

  • #818: Allow IPv6 being parsed by signature

General:

  • Improved and fixed documentation warnings.
  • Cosmetic changes based on isort

3.2.0 (2022-01-29)

OAuth2.0 Client:

  • #795: Add Device Authorization Flow for Web Application
  • #786: Add PKCE support for Client
  • #783: Fallback to none in case of wrong expires_at format.

OAuth2.0 Provider:

  • #790: Add support for CORS to metadata endpoint.
  • #791: Add support for CORS to token endpoint.
  • #787: Remove comma after Bearer in WWW-Authenticate

OAuth2.0 Provider - OIDC:

  • #755: Call save_token in Hybrid code flow
  • #751: OIDC add support of refreshing ID Tokens with refresh_id_token
  • #751: The RefreshTokenGrant modifiers now take the same arguments as the AuthorizationCodeGrant modifiers (token, token_handler, request).

General:

  • Added Python 3.9, 3.10, 3.11
  • Improve Travis & Coverage
Commits


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/tokern/data-lineage/network/alerts).

Releases

v0.8.5 2021-10-13 18:06:37

v0.8.5 (2021-10-13)

Chore

  • Prepare release 0.8.5
  • update dbcat to 0.7.2

v0.8.4 2021-10-13 09:56:08

v0.8.4 (2021-10-13)

Chore

  • Prepare release 0.8.4

Fix

  • Accept gunicorn parameters from environment variable

v0.8.3 2021-08-19 04:32:54

v0.8.3 (2021-08-19)

Chore

  • Pin some dependencies. Update dbcat to 0.7.1

v0.8.2 2021-08-17 17:02:36

v0.8.2 (2021-08-17)

Chore

  • Update docker version of deploy container

v0.8.0 2021-07-29 18:01:48

v0.8.0 (2021-07-29)

Feature

  • Update dbcat version to 0.6.1

Fix

  • Pick up latest viz docker for reconnection fix

v0.7.8 2021-07-17 06:22:40

v0.7.8 (2021-07-17)

Chore

  • Prepare release 0.7.8

Fix

  • Fix update_source semantics and errors
  • Connection lost errors on macos (#64)
Tokern

Automate Data Engineering Tasks with Column-Level Data Lineage

GitHub Repository Homepage

data-lineage data-governance python postgresql jupyter