Interpret-Text builds on Interpret, an open-source Python package for training interpretable models and helping to explain black-box machine learning systems. We have added extensions to support text models.
This repository contains an SDK and example Jupyter notebooks to showcase its use.
Interpret-Text incorporates community-developed interpretability techniques for NLP models and a visualization dashboard to view the results. Users can run their experiments across multiple state-of-the-art explainers and easily perform comparative analysis on them. Using these tools, users will be able to explain their machine learning models globally on each label or locally for each document. In particular, this open-source toolkit:

1. Actively incorporates innovative text interpretability techniques, and allows the community to further expand its offerings
2. Creates a common API across the integrated libraries
3. Provides an interactive visualization dashboard to empower its users to gain insights into their data
Developers/Data Scientists: Having all of the interpretability techniques in one place makes it easy for data scientists to experiment with different interpretability techniques and explain their model in a scalable and seamless manner. The set of rich interactive visualizations allows developers and data scientists to train and deploy more transparent machine learning models instead of wasting time and effort on generating customized visualizations, addressing scalability issues by optimizing third-party interpretability techniques, and adopting/operationalizing interpretability techniques.
Business Executives: The core logic and visualizations help raise awareness among those involved in developing AI applications, allow them to audit model predictions for potential unfairness, and help establish a strong governance framework around the use of AI applications.
Machine Learning Interpretability Researchers: Interpret's extension hooks make it easy to extend, so interpretability researchers who want to add their own techniques can easily add them to the community repository and compare them with state-of-the-art and proven interpretability techniques and/or other community techniques.
This repository uses Anaconda to simplify package and environment management.
To set up on your local machine:
Currently this repository only provides support for the text classification scenario.
The following is a list of the explainers available in this repository:

* Classical Text Explainer - (Default: Bag-of-words with Logistic Regression)
* Unified Information Explainer
* Introspective Rationale Explainer
| | Classical Text Explainer | Unified Information Explainer | Introspective Rationale Explainer |
|---------------|---------|:-------------------:|:----------------------------:|
| Input model support | Scikit-learn linear models and tree-based models | PyTorch | PyTorch |
| Explain BERT | No | Yes | Yes |
| Explain RNN | No | No | Yes |
| NLP pipeline support | Handles text pre-processing, encoding, training, hyperparameter tuning | Uses BERT tokenizer however user needs to supply trained/fine-tuned BERT model, and samples of trained data | Generator and predictor modules handle the required text pre-processing. |
| Sample notebook | Classical Text Explainer Sample Notebook | Unified Information Explainer Sample Notebook | Introspective Rationale Explainer Sample Notebook |
The ClassicalTextExplainer extends text interpretability to classical machine learning models. This notebook provides a step by step walkthrough of operationalizing the ClassicalTextExplainer in an ML pipeline.
The ClassicalTextExplainer serves as a high level wrapper for the entire NLP pipeline, by natively handling the text preprocessing, encoding, training and hyperparameter optimization process. This allows the user to supply the dataset in text form without need for any external processing, with the entire text pipeline process being handled by the explainer under the hood.
In its default configuration, the preprocessing pipeline uses a 1-gram bag-of-words encoder implemented by sklearn's `CountVectorizer`. The utilities file contains the finer details of the preprocessing steps in the default pipeline.
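To make the default encoding concrete, here is a minimal pure-Python sketch of what a 1-gram bag-of-words encoder computes. It is an illustration only: it does not use sklearn's count vectorizer or the package's own encoder, and the regex tokenization and the helper names `tokenize`, `build_vocabulary` and `encode` are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token in the corpus to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(text, vocabulary):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


corpus = ["the movie was great", "the plot was thin"]
vocab = build_vocabulary(corpus)
print(vocab)
print(encode("the great great movie", vocab))
```

Each position in the vector corresponds to one vocabulary token, which is what later lets a weight on that position be read as an importance for that word.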
The ClassicalTextExplainer natively supports 2 families of models:
In the absence of a user supplied model, the ClassicalTextExplainer defaults to sklearn's logistic regression. In addition to the above mentioned models, any model that follows the same API layout and is compatible with sparse representations as input will also be supported. Apart from Logistic regression, we have successfully tested the framework with LightGBM and Random Forests as well.
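As a rough illustration of what "the same API layout" can mean in practice: fitted scikit-learn linear models expose their learned weights as `coef_`, while tree-based models expose `feature_importances_`. The sketch below checks for those two attributes. The function name `importance_attribute` and the stand-in classes are hypothetical, and this is not necessarily how the package itself inspects a model.

```python
def importance_attribute(model):
    """Return the name of the attribute that holds per-feature importances.

    Linear scikit-learn models expose `coef_`; tree-based models expose
    `feature_importances_`. Anything else is treated as unsupported here.
    """
    for name in ("coef_", "feature_importances_"):
        if hasattr(model, name):
            return name
    raise TypeError(f"{type(model).__name__} exposes no importance attribute")


# Stand-in classes (hypothetical) that mimic fitted scikit-learn estimators.
class FakeLinearModel:
    coef_ = [[0.4, -1.2, 0.0]]


class FakeTreeModel:
    feature_importances_ = [0.7, 0.2, 0.1]


print(importance_attribute(FakeLinearModel()))
print(importance_attribute(FakeTreeModel()))
```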
The ClassicalTextExplainer has been designed with explicit intent of being modular and extensible.
The API allows for users to swap out nearly every component including the preprocessor, tokenizer, model and training routine with varying levels of difficulty. The API is composed such that a modified explainer would still be able to leverage the rest of the tooling implemented within the package.
The text encoding and decoding components are both closely tied to each other. Should the user wish to use a custom encoding process, it has to come paired with its own custom decoding process.
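The coupling between encoding and decoding can be sketched as follows. `ToyEncoder` is a hypothetical class written for this example, not the package's encoder interface; the point is only that the decoder must invert exactly the mapping the encoder built, which is why the two have to be replaced together.

```python
class ToyEncoder:
    """Hypothetical encoder that keeps the mapping needed to decode again."""

    def __init__(self, documents):
        tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
        self.token_to_index = {tok: i for i, tok in enumerate(tokens)}
        # The decoder is the exact inverse of the encoder's mapping.
        self.index_to_token = {i: tok for tok, i in self.token_to_index.items()}

    def encode(self, text):
        """Turn text into a list of vocabulary indices, skipping unknown tokens."""
        return [
            self.token_to_index[tok]
            for tok in text.lower().split()
            if tok in self.token_to_index
        ]

    def decode(self, indices):
        """Turn vocabulary indices back into tokens."""
        return [self.index_to_token[i] for i in indices]


encoder = ToyEncoder(["good plot", "bad acting"])
ids = encoder.encode("Good acting overall")
print(ids, encoder.decode(ids))
```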
The ClassicalTextExplainer offers a pain-free API to surface explanations inherent to supported models. The natively supported linear models, such as linear regression and logistic regression, are considered to be glass-box explainers. A glass-box explainer is a model that is innately explainable, where the user can fully observe and dissect the process the model follows in making a prediction. Linear models such as logistic regression, and ensemble methods such as random forests, can be considered to fall under the umbrella of glass-box explainers. Neural networks and kernel-based models are usually not considered glass-box models.
By default, the ClassicalTextExplainer leverages this inherent explainability by exposing weights and importances over encoded tokens as explanations over each word in a document. In practice, these can be accessed through the visualization dashboard or the explanation object.
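A minimal sketch of the idea, assuming a model whose per-token weights are already available as a dictionary: each word in a document is paired with the weight of its encoded token. The weights below are made up, and `explain_document` is a hypothetical helper rather than a function from the package.

```python
def explain_document(text, token_weights):
    """Pair each word in the document with the weight of its encoded token.

    Words that the encoder never saw get an importance of 0.0.
    """
    return [(word, token_weights.get(word.lower(), 0.0)) for word in text.split()]


# Made-up weights for a "positive review" label, for illustration only.
weights = {"great": 1.8, "boring": -2.1, "movie": 0.1}
explanation = explain_document("A great movie not boring", weights)

# Rank words by the magnitude of their importance, largest first.
ranked = sorted(explanation, key=lambda pair: abs(pair[1]), reverse=True)
print(ranked[:2])
```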
The explanations provided by the aforementioned glass-box methods serve as direct proxies for weights and parameters in the model, which make the final prediction. This allows us to have high confidence in the correctness of the explanation and strong belief in humans being able to understand the internal configuration of the trained machine learning model.
If the user supplies a custom model, the nature of their model's explainability (glass-box, grey-box, black-box) will carry over to the importances produced by the explainer as well.
The UnifiedInformationExplainer uses an information-based measure to provide unified and coherent explanations on the intermediate layers of deep NLP models. While this model can explain various deep NLP models, we only implement text interpretability for BERT here. This notebook provides an example of how to load and preprocess data and retrieve explanations for all the layers of BERT - the transformer layers, pooler, and classification layer.
The UnifiedInformationExplainer handles the required text pre-processing. Each sentence is tokenized using the BERT Tokenizer.
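For readers unfamiliar with BERT tokenization: after basic splitting, BERT's tokenizer breaks each word into WordPiece sub-tokens by greedy longest-match-first lookup, marking continuation pieces with `##`. The sketch below implements that lookup for a single word. The tiny vocabulary is made up, and the function is a simplified illustration, not the tokenizer the explainer calls.

```python
def wordpiece(word, vocabulary, unknown="[UNK]"):
    """Split one lowercase word into pieces by greedy longest-match-first.

    Pieces that continue a word carry the "##" prefix, as in BERT vocabularies.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocabulary:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unknown]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Tiny made-up vocabulary, for illustration only.
vocab = {"interpret", "##able", "##ability", "un", "##interpret"}
print(wordpiece("interpretability", vocab))
print(wordpiece("uninterpretable", vocab))
print(wordpiece("xyz", vocab))
```

Because one word can become several pieces, explanations over BERT inputs are naturally expressed per sub-token rather than per whitespace-separated word.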
The UnifiedInformationExplainer only supports BERT at this time. A user will need to supply a trained or fine-tuned BERT model, the training dataset (or a subset if it is too large) and the sentence or text to be explained. Future work can extend this implementation to support RNNs and LSTMs.
The IntrospectiveRationaleExplainer uses a generator-predictor framework to produce a comprehensive subset of text input features or rationales that are relevant for the classification task. This introspective model predicts the labels and incorporates the outcome into the rationale selection process. The outcome is a hard or soft selection of rationales (words that have useful information for the classification task) and anti-rationales (words that do not appear to have useful information).
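The difference between a soft and a hard selection can be sketched with a few made-up per-token scores. This is only an illustration of the two output styles: the scores, the fixed threshold and the function names are assumptions, and the real model learns its selection jointly with the predictor rather than applying a hand-picked cutoff.

```python
def soft_selection(scores):
    """Keep every token but attach its relevance score as a weight."""
    return list(scores.items())


def hard_selection(scores, threshold=0.5):
    """Split tokens into rationales and anti-rationales with a fixed cutoff."""
    rationales = [tok for tok, s in scores.items() if s >= threshold]
    anti_rationales = [tok for tok, s in scores.items() if s < threshold]
    return rationales, anti_rationales


# Made-up per-token scores standing in for a generator's output.
scores = {"the": 0.05, "acting": 0.62, "was": 0.10, "superb": 0.97}
print(soft_selection(scores))
print(hard_selection(scores))
```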
This notebook provides an example of how to use the introspective rationale generator.
The IntrospectiveRationaleExplainer has generator and predictor modules that handle the required text pre-processing.
The IntrospectiveRationaleExplainer is designed to be modular and extensible. The API currently has support for RNN and BERT models. There are three different sets of modules that have been implemented in this API:
* Explain a BERT model (BERT is used in the generator and predictor modules),
* Explain an RNN model (RNNs are used in the generator and predictor modules), and
* Explain an RNN model with BERT as the generator (RNNs are used in the predictor module and BERT is used in the generator module)
The user can also explain a custom model. In this case, the user will have to provide the pre-processor, predictor and generator modules to the API.
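The three built-in combinations listed above can be summarized as a small lookup table. The configuration keys and the `ModuleSet` class are hypothetical names chosen for this sketch; the package may identify these configurations differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleSet:
    """Which architecture fills the generator and predictor roles."""
    generator: str
    predictor: str


# Hypothetical keys; the package may name these configurations differently.
MODULE_SETS = {
    "bert": ModuleSet(generator="BERT", predictor="BERT"),
    "rnn": ModuleSet(generator="RNN", predictor="RNN"),
    "bert_rnn": ModuleSet(generator="BERT", predictor="RNN"),
}


def describe(name):
    """Summarize which architecture fills each role for a configuration."""
    modules = MODULE_SETS[name]
    return f"{name}: generator={modules.generator}, predictor={modules.predictor}"


for key in MODULE_SETS:
    print(describe(key))
```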
Train your model in a Jupyter notebook running on your local machine: For sample code on pre-processing and training, see nlp-recipes or our sample notebook.
Call the explainer: To initialize the explainers, you will need to pass either:
the dataset or
To initialize the `UnifiedInformationExplainer`, pass the model, the dataset you used to train the model, the CUDA device and the BERT target layer.
```python
from interpret_text.unified_information import UnifiedInformationExplainer

interpreter_unified = UnifiedInformationExplainer(model, train_dataset, device, target_layer)
```
If you intend to use the `ClassicalTextExplainer` with our default Logistic Regression model, you can simply call the fit function with your dataset.
```python
from sklearn.preprocessing import LabelEncoder
from interpret_text.classical import ClassicalTextExplainer

# With no arguments, the explainer uses its default pipeline and model.
explainer = ClassicalTextExplainer()
label_encoder = LabelEncoder()
# X_train holds the documents in text form and y_train their labels.
classifier, best_params = explainer.fit(X_train, y_train)
```

Instead, if you want to use the `ClassicalTextExplainer` with your own sklearn model, you will need to initialize `ClassicalTextExplainer` with your model, preprocessor and the range of hyperparameters.

```python
from sklearn.preprocessing import LabelEncoder
from interpret_text.classical import ClassicalTextExplainer
from interpret_text.common.utils_classical import get_important_words, BOWEncoder

# Hyperparameter values to search over when fitting the model.
HYPERPARAM_RANGE = {
    "solver": ["saga"],
    "multi_class": ["multinomial"],
    "C": [10 ** 4],
}
preprocessor = BOWEncoder()
# `model` is your own sklearn model instance.
explainer = ClassicalTextExplainer(preprocessor, model, HYPERPARAM_RANGE)
```
Get the local feature importance values: use the following function calls to explain an individual instance or a group of instances.
```python
# Explain the first instance in the test set.
local_explanation = explainer.explain_local(x_test[0])
sorted_local_importance_names = local_explanation.get_ranked_local_names()
sorted_local_importance_values = local_explanation.get_ranked_local_values()
```
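Once the ranked names and values are available, a common next step is to pair them up and keep only the strongest few. The helper below is a sketch that assumes both are flat lists of equal length; the actual return shape may differ (for example, one list per class), and the sample data is made up.

```python
def top_features(names, values, k=3):
    """Return the k (name, value) pairs with the largest absolute importance."""
    pairs = sorted(zip(names, values), key=lambda p: abs(p[1]), reverse=True)
    return pairs[:k]


# Made-up data in the shape assumed above: one flat list of names, one of values.
names = ["terrible", "plot", "the", "loved"]
values = [-0.9, 0.05, 0.0, 0.7]
print(top_features(names, values, k=2))
```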
1. To use the visualization dashboard, import the `ExplanationDashboard` object from the package.

```python
from interpret_text.widget import ExplanationDashboard
```
2. When initializing the ExplanationDashboard, you need to pass the local explanation object that is returned by our explainer.
```python
ExplanationDashboard(local_explanation)
```
Note: if you are not using one of our explainers, you need to create your own explanation object by passing the feature importance values:

```python
from interpret_text.explanation.explanation import _create_local_explanation

local_explanation = _create_local_explanation(
    classification=True,
    text_explanation=True,
    local_importance_values=feature_importance_values,
    method=name_of_model,
    model_task="classification",
    features=parsed_sentence_list,
    classes=list_of_classes,
)
```
The dashboard visualizes the local feature importances of the document with an interactive bar chart and a text area that highlights and underlines important words in your document. Words with positive feature importance contributed to classifying the document toward the label indicated on the dashboard; words with negative feature importance contributed against it. The cap on the number of important words is the total number of words with non-zero feature importances. Hovering over either the bars in the chart or the highlighted/underlined words reveals a tooltip with the numerical feature importance. In the chart tooltip, the context of the word shows the words before and after it, so users can differentiate between occurrences of the same word.
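The filtering and tooltip behavior described above can be approximated in a few lines. This is a sketch of the described behavior, not the dashboard's implementation: the function name, the row layout and the sample importances are all assumptions.

```python
def important_words(words, importances):
    """Keep only words with non-zero importance, each with its neighbors.

    The neighbor context mirrors the tooltip idea described above: it lets a
    reader tell apart repeated occurrences of the same word.
    """
    rows = []
    for i, (word, value) in enumerate(zip(words, importances)):
        if value == 0:
            continue  # zero-importance words are not shown
        before = words[i - 1] if i > 0 else ""
        after = words[i + 1] if i + 1 < len(words) else ""
        direction = "toward label" if value > 0 else "against label"
        rows.append({
            "word": word,
            "context": f"{before} {word} {after}".strip(),
            "importance": value,
            "direction": direction,
        })
    return rows


words = ["not", "good", "but", "not", "bad"]
importances = [-0.4, 0.6, 0.0, 0.3, -0.5]  # made-up values
rows = important_words(words, importances)
for row in rows:
    print(row)
```

Here the two occurrences of "not" get different contexts ("not good" and "but not bad"), which is what makes them distinguishable in a tooltip.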
We welcome contributions and suggestions! Most contributions require you to agree to the GitHub Developer Certificate of Origin (DCO). For details, please visit https://probot.github.io/apps/dco/.
The Developer Certificate of Origin (DCO) is a lightweight way for contributors to certify that they wrote or otherwise have the right to submit the code they are contributing to the project. Here is the full text of the DCO, reformatted for readability:

```
By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or

(b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or

(c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.

(d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.
```

Contributors sign-off that they adhere to these requirements by adding a Signed-off-by line to commit messages.

```
This is my commit message

Signed-off-by: Random J Developer <random@developer.example.org>
```

Git even has a -s command line option to append this automatically to your commit message:

```
$ git commit -s -m 'This is my commit message'
```

When you submit a pull request, a DCO bot will automatically determine whether you need to certify. Simply follow the instructions provided by the bot.
This project has adopted the GitHub Community Guidelines.
Security issues and bugs should be reported privately, via email, to the Microsoft Security Response Center (MSRC) at [email protected]. You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Further information, including the MSRC PGP key, can be found in the Security TechCenter.
Hi! I have used `!pip install 'interpret_text'`, but still cannot import it. What is the problem?
Bumps json5 from 1.0.1 to 1.0.2.
Bumps json5 from 2.1.3 to 2.2.3.
Bumps express from 4.17.1 to 4.18.2.
Bumps qs from 6.5.2 to 6.5.3.
Initial release of the Interpret-Text repository. Includes:

- Classical Text Explainer - (Default: Bag-of-words with Logistic Regression)
- Unified Information Explainer
- Introspective Rationale Explainer
- Dashboard for visualization