Code for the paper "What does BERT know about books, movies and music? Probing BERT for Conversational Recommendation" at RecSys'20

Guzpenha, updated 🕥 2022-06-22 02:35:34

Probing LMs for Conversational Recommendation

In our paper "What does BERT know about books, movies and music? Probing BERT for Conversational Recommendation" we devise probing tasks to evaluate the knowledge that language models (LMs) already store in their parameters. We probe LMs (without any fine-tuning) for three types of knowledge: genre, search and recommendation.

Running probes

  1. Clone repo and install rec_probing in a python (>=3.6) virtual env:
    ```
    git clone https://github.com/Guzpenha/ConvRecProbingBERT.git
    cd ConvRecProbingBERT

    python3 -m venv env
    source env/bin/activate
    pip install -r requirements.txt

    cd rec_probing
    pip install -e .
    ```

  1. Download the required datasets and preprocess them with the provided scripts: run `./download_data.sh`, then `./run_datasets_creation.sh`.

This will download and preprocess a few datasets:

| | Recommendation | Search | Conversational Recommendation |
|-------------|-------------|------------|------------|
| Movies | ML25M: 25m movie ratings | Reviews crawled from IMDB | Conversations crawled from /r/moviesuggestions/ |
| Books | GoodReads: 200m book interactions | Reviews from GoodReads | Conversations crawled from /r/booksuggestions/ |
| Music | Amazon-Music: 2.3m ratings/reviews | Reviews from Amazon-Music | Conversations crawled from /r/musicsuggestions/ |

The scripts also download category information for the items of the three domains.

  1. Use our python scripts to run probes:

```
# Search and Recommendation
python run_probes.py \
    --task $TASK \
    --probe_type ${PROBE_TYPE} \
    --input_folder $REPO_DIR/data/${PROBE_TYPE}/ \
    --output_folder $REPO_DIR/data/output_data/probes/ \
    --number_queries $NUMBER_PROBE_QUERIES \
    --number_candidates 5 \
    --batch_size 64 \
    --probe_technique ${PROBE_TECHNIQUE} \
    --bert_model 'bert-base-cased'
```

Where PROBE_TYPE can be ['recommendation', 'search'], PROBE_TECHNIQUE can be ['mean-sim', 'cls-sim', 'nsp'] and TASK can be ['ml25m', 'gr', 'music'] for the domains of movies, books and music respectively.
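The technique names hint at how a query and a candidate item are scored against each other. As a rough illustration, and not the repository's implementation, the sketch below assumes that `mean-sim` compares mean-pooled token vectors with cosine similarity and that `cls-sim` compares only the first (`[CLS]`) token vector. The toy two-dimensional vectors and the helper names `mean_pool`, `cosine`, `sentence_vector` and `rank_candidates` are made up for the example; a real probe would take the token vectors from the language model.

```python
import math


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sentence_vector(token_vectors, technique):
    """Assumed reading of the technique names; not taken from the repo code."""
    if technique == "mean-sim":
        return mean_pool(token_vectors)
    if technique == "cls-sim":
        return token_vectors[0]  # first position stands in for [CLS]
    raise ValueError(f"unsupported technique: {technique}")


def rank_candidates(query_tokens, candidates, technique):
    """Return candidate names sorted by decreasing similarity to the query."""
    query_vec = sentence_vector(query_tokens, technique)
    scored = [
        (cosine(query_vec, sentence_vector(tokens, technique)), name)
        for name, tokens in candidates.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy token vectors (hypothetical values, one inner list per token).
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "item_a": [[1.0, 0.0], [1.0, 0.2]],
    "item_b": [[0.0, 1.0], [1.0, 0.0]],
}

ranking_mean = rank_candidates(query, candidates, "mean-sim")
ranking_cls = rank_candidates(query, candidates, "cls-sim")
```

With these toy vectors the two techniques disagree on the ranking, which is why the probe technique is worth treating as a separate experimental variable.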

```
# Genres
python run_mlm_probe.py \
    --task $TASK \
    --input_folder $REPO_DIR/data/recommendation/ \
    --output_folder $REPO_DIR/data/output_data/probes/ \
    --number_queries $NUMBER_PROBE_QUERIES \
    --batch_size 32 \
    --sentence_type ${SENTENCE_TYPE} \
    --bert_model 'roberta-large'
```

Where SENTENCE_TYPE can be ['no-item', 'type-I', 'type-II'] and TASK can be ['ml25m', 'gr', 'music'] for the domains of movies, books and music respectively.
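To cover every combination of the options listed in this section, the command lines can be generated programmatically. The sketch below only assembles argument lists from the flags and values shown in this README and executes nothing. The function names and the example values passed for `repo_dir` and `number_queries` are assumptions for illustration.

```python
from itertools import product

# Option values as listed in this README.
TASKS = ["ml25m", "gr", "music"]
PROBE_TYPES = ["recommendation", "search"]
PROBE_TECHNIQUES = ["mean-sim", "cls-sim", "nsp"]
SENTENCE_TYPES = ["no-item", "type-I", "type-II"]


def sim_probe_commands(repo_dir, number_queries):
    """Build one run_probes.py argument list per task/type/technique combination."""
    commands = []
    for task, probe_type, technique in product(TASKS, PROBE_TYPES, PROBE_TECHNIQUES):
        commands.append([
            "python", "run_probes.py",
            "--task", task,
            "--probe_type", probe_type,
            "--input_folder", f"{repo_dir}/data/{probe_type}/",
            "--output_folder", f"{repo_dir}/data/output_data/probes/",
            "--number_queries", str(number_queries),
            "--number_candidates", "5",
            "--batch_size", "64",
            "--probe_technique", technique,
            "--bert_model", "bert-base-cased",
        ])
    return commands


def genre_probe_commands(repo_dir, number_queries):
    """Build one run_mlm_probe.py argument list per task/sentence-type combination."""
    commands = []
    for task, sentence_type in product(TASKS, SENTENCE_TYPES):
        commands.append([
            "python", "run_mlm_probe.py",
            "--task", task,
            "--input_folder", f"{repo_dir}/data/recommendation/",
            "--output_folder", f"{repo_dir}/data/output_data/probes/",
            "--number_queries", str(number_queries),
            "--batch_size", "32",
            "--sentence_type", sentence_type,
            "--bert_model", "roberta-large",
        ])
    return commands


# Example values; replace them with your own checkout path and query budget.
sim_commands = sim_probe_commands("/path/to/ConvRecProbingBERT", 100)
genre_commands = genre_probe_commands("/path/to/ConvRecProbingBERT", 100)
```

Each argument list can then be passed to `subprocess.run(cmd, check=True)` from the directory that contains the probe scripts.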

Running response ranking for reddit conv. recommendation data

To reproduce the results in Table 7 of the paper, which reports the models' conversation response ranking results on the Reddit conversational recommendation data, use:

    cd list_wise_reformer
    pip install -e .
    cd list_wise_reformer/scripts
    ./run_all_dialogue_baselines.sh

Ignore the fact that the package is named list_wise_reformer: it contains several baselines for dialogue, search and recommendation, including a prototype of a list-wise Reformer model.

Infusing knowledge

We interleave the probing tasks with the response ranking task by creating a dataset in which half of the instances come from each task. We create this dataset with the script rec_probing/rec_probing/scripts/generate_data_for_mt.py, and then train the model on it with the response ranking script described above.
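The mixing step can be pictured as follows. This is a minimal sketch and not the logic of `generate_data_for_mt.py`: it assumes that each task's instances are available as a Python list and that "half from each task" means sampling the same number of instances from both. The function name `mix_half_and_half`, the `seed` parameter and the toy instances are hypothetical.

```python
import random


def mix_half_and_half(probe_instances, ranking_instances, total, seed=42):
    """Sample total/2 instances from each task and shuffle them together."""
    if total % 2 != 0:
        raise ValueError("total must be even to split it in two equal halves")
    half = total // 2
    if half > len(probe_instances) or half > len(ranking_instances):
        raise ValueError("not enough instances in one of the tasks")
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    mixed = rng.sample(probe_instances, half) + rng.sample(ranking_instances, half)
    rng.shuffle(mixed)  # interleave so batches contain both tasks
    return mixed


# Toy instances tagged with the task they come from.
probes = [("probe", i) for i in range(10)]
ranking = [("ranking", i) for i in range(10)]
mixed = mix_half_and_half(probes, ranking, total=8)
```

Shuffling after concatenation is one simple way to interleave the tasks; alternating strictly between them would be another option.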

Experiments with ReDial Data

We use the same framework as for the other tasks. The only difference is that we need to create the adversarial test data, for which we use the script data/genereate_adversarial_test.py.

Reference

@inproceedings{10.1145/3383313.3412249,
  author = {Penha, Gustavo and Hauff, Claudia},
  title = {What Does BERT Know about Books, Movies and Music? Probing BERT for Conversational Recommendation},
  year = {2020},
  isbn = {9781450375832},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3383313.3412249},
  doi = {10.1145/3383313.3412249},
  booktitle = {Fourteenth ACM Conference on Recommender Systems},
  pages = {388–397},
  numpages = {10},
  keywords = {conversational search, probing, conversational recommendation},
  location = {Virtual Event, Brazil},
  series = {RecSys '20}
}

Issues

Bump numpy from 1.16.2 to 1.22.0 in /list_wise_reformer/list_wise_reformer/models/BERT4Rec-VAE-Pytorch

opened on 2022-06-22 02:35:33 by dependabot[bot]

Bumps numpy from 1.16.2 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

  • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
  • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
  • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
  • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
  • A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/Guzpenha/ConvRecProbingBERT/network/alerts).

Bump numpy from 1.18.2 to 1.22.0

opened on 2022-06-22 02:32:46 by dependabot[bot]

Bumps numpy from 1.18.2 to 1.22.0.


Bump ipython from 7.12.0 to 7.16.3

opened on 2022-01-21 20:33:19 by dependabot[bot]

Bumps ipython from 7.12.0 to 7.16.3.

Gustavo Penha

Researcher - IR - RecSys - ML - NLP. https://linktr.ee/guzpenha