Bosch solution to CHAMPS Kaggle competition

boschresearch, updated 🕥 2023-02-11 01:06:25

Hello!

Below you can find an outline of how to reproduce our solution for the CHAMPS competition. If you run into any trouble with the setup/code or have any questions, please contact us at [email protected]

Copyright 2019 Robert Bosch GmbH

Code authors: Zico Kolter, Shaojie Bai, Devin Wilmott, Mordechai Kornbluth, Jonathan Mailoa, part of Bosch Research (CR).

Archive Contents

  • config/ : Configuration files
  • data/ : Raw data
  • models/ : Saved models
  • processed/ : Processed data
  • src/ : Source code for preprocessing, training, and predicting.
  • submission/ : Directory for the actual predictions

Hardware (The following specs were used to create the original solution)

The models were trained on a variety of machines, each running a Linux OS:

  • 5 machines had 4 GPUs, each an NVIDIA GeForce RTX 2080 Ti
  • 2 machines had 1 GPU, an NVIDIA Tesla V100 with 32 GB memory
  • 6 machines had 1 GPU, an NVIDIA Tesla V100 with 16 GB memory

Software

  • Python 3.5+
  • CUDA 10.1
  • NVIDIA APEX (Only available through the repo at this phase)

Python packages are detailed separately in requirements.txt.

Note: Though listed in requirements.txt, rdkit is not available with pip. We strongly suggest installing rdkit via conda:

conda install -c rdkit rdkit

Data Setup

We use only the train.csv, test.csv, and structures.csv files of the competition. They should be (unzipped and) placed in the data/ directory. All of the commands below are executed from the src/ directory.
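As a quick sanity check before preprocessing, the sketch below verifies that the three competition files are present and readable. The function name `missing_data_files` is illustrative and not part of the repository, and the default path `../data` assumes the script is run from src/ as described above.

```python
import os

# The three competition files named above.
REQUIRED_FILES = ("train.csv", "test.csv", "structures.csv")


def missing_data_files(data_dir="../data"):
    """Return the required files that are absent or unreadable in data_dir."""
    missing = []
    for name in REQUIRED_FILES:
        path = os.path.join(data_dir, name)
        # A file counts as usable only if it exists and can be read.
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical usage from src/; adjust the path if your layout differs.
    problems = missing_data_files()
    if problems:
        print("Missing or unreadable:", ", ".join(problems))
    else:
        print("All data files found.")
```

An unreadable file reported here is a hint that the permissions need to be adjusted before running the preprocessing scripts.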

Data Processing

  1. cd src/
  2. python pipeline_pre.py 1 (This could take 1-2 hours)
  3. python pipeline_pre.py 2

(You may need to change the permissions of the .csv files via chmod before the two scripts above can use them.)

Model Build - There are three options to produce the solution.

While in src/:

  1. Very fast prediction: predictor.py fast to use the precomputed results for ensembling.
  2. Ordinary prediction: predictor.py to use the precomputed checkpoints for predicting and ensembling.
  3. Re-train models: train.py to train a new model from scratch. See train.py -h for allowed arguments, and the config files for each model for the arguments used.

The config/models.json file contains the following important keys:

  • names: List of the names we will ensemble
  • output file: The name of the ensembled output file
  • num atom types, bond types, triplet types, quad types: These are arguments to pass to the GraphTransformer instantiator. Note that in the default setting, quadruplet information is not used by GTs.
  • model_dir: The directory in models/ associated with each model. Each directory must have 1) graph_transformer.py with a GraphTransformer class (and any modules it needs); 2) config file with the kwargs to instantiate the GraphTransformer class; 3) [MODEL_NAME].ckpt that can be loaded via load_state_dict(torch.load('[MODEL_NAME].ckpt').state_dict()) (to avoid PyTorch version conflict).
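The three requirements for a model directory can be checked with a small helper such as the one below. This is a sketch only: `missing_model_files` is a hypothetical name, and `config_name` is a placeholder because the exact config file name is not stated above.

```python
import os


def missing_model_files(model_dir, model_name, config_name="config"):
    """List the expected entries that are absent from a model directory.

    config_name is a placeholder; substitute the real config file name
    used in your copy of models/.
    """
    expected = [
        "graph_transformer.py",   # defines the GraphTransformer class
        config_name,              # kwargs for instantiating the class
        model_name + ".ckpt",     # saved checkpoint
    ]
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(model_dir, name))
    ]
```

An empty list means all three entries were found; anything else names what still has to be added before the model can be ensembled.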

Notes on (Pre-trained) Model Loading

All pretrained models are stored in models/. However, different models may have slightly different architectures (e.g., some GT models are followed by a 2-layer grouped residual network, while others have only one residual block). The training script (train.py), when run without the --debug flag, automatically creates a log folder in CHAMPS-GT/ that contains the code for the GT used. When loading the model, use the graph_transformer.py in that log folder (instead of the default one in src/).

Notes on Model Training

When trained from scratch, the default parameters should lead to a model achieving a score of around -3.06 to -3.07. Using the --debug flag prevents the program from creating a log folder.

Notes on Saving Memory

What if you get a CUDA out of memory error? We suggest a few solutions:

  • If you have a multi-GPU machine, use the --multi_gpu flag, and tune the --gpu0_bsz flag (which controls the minibatch size passed to GPU device 0). For instance, on a 4-GPU machine, you can run python train.py [...] --batch_size 47 --multi_gpu --gpu0_bsz 11, which assigns a batch size of 12 to each of GPUs 1, 2, and 3 and a batch size of 11 to GPU 0.
  • Use the --fp16 option, which applies NVIDIA APEX's mixed precision training.
  • Use the --batch_chunk option, which chunks a larger batch into a few smaller (equal) shares. The gradients from the smaller minibatches accumulate, so the effective batch size is still the same as --batch_size.
  • Use fewer --n_layer, or a smaller --batch_size :P
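The arithmetic behind --gpu0_bsz and --batch_chunk can be illustrated with a small sketch. It is not the repository's code: the function names are hypothetical, the even split of the remainder across the other GPUs is an assumption, and a scalar least-squares model stands in for the network to show that accumulating gradients over equal chunks reproduces the full-batch gradient.

```python
def per_gpu_batch_sizes(batch_size, n_gpu, gpu0_bsz):
    """Give GPU 0 gpu0_bsz samples and share the rest among the other GPUs.

    Assumption: the remainder is divided as evenly as possible; the
    repository's actual balancing logic may differ.
    """
    if n_gpu < 2:
        raise ValueError("need at least two GPUs to split a batch")
    base, extra = divmod(batch_size - gpu0_bsz, n_gpu - 1)
    return [gpu0_bsz] + [base + (1 if i < extra else 0) for i in range(n_gpu - 1)]


def mean_gradient(xs, ys, w):
    """Gradient of the mean of 0.5 * (w * x - y) ** 2 with respect to w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_gradient(xs, ys, w, n_chunks):
    """Sum per-chunk mean gradients, each scaled by 1 / n_chunks.

    Assumes len(xs) is divisible by n_chunks (equal shares).
    """
    size = len(xs) // n_chunks
    total = 0.0
    for i in range(n_chunks):
        lo, hi = i * size, (i + 1) * size
        total += mean_gradient(xs[lo:hi], ys[lo:hi], w) / n_chunks
    return total


# The example from the text: 47 samples, 4 GPUs, 11 on GPU 0.
print(per_gpu_batch_sizes(47, 4, 11))  # [11, 12, 12, 12]
```

Because each chunk has the same size, averaging the chunk gradients gives the same update as one pass over the whole batch, while only one chunk has to fit in memory at a time.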

Issues

Bump ipython from 7.6.1 to 8.10.0

opened on 2023-02-11 01:06:21 by dependabot[bot]

Bumps ipython from 7.6.1 to 8.10.0.

Release notes

Sourced from ipython's releases.

See https://pypi.org/project/ipython/

We do not use GitHub release anymore. Please see PyPI https://pypi.org/project/ipython/

7.9.0

No release notes provided.

7.8.0

No release notes provided.

7.7.0

No release notes provided.

Commits
  • 15ea1ed release 8.10.0
  • 560ad10 DOC: Update what's new for 8.10 (#13939)
  • 7557ade DOC: Update what's new for 8.10
  • 385d693 Merge pull request from GHSA-29gw-9793-fvw7
  • e548ee2 Swallow potential exceptions from showtraceback() (#13934)
  • 0694b08 MAINT: mock slowest test. (#13885)
  • 8655912 MAINT: mock slowest test.
  • a011765 Isolate the attack tests with setUp and tearDown methods
  • c7a9470 Add some regression tests for this change
  • fd34cf5 Swallow potential exceptions from showtraceback()
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
  • `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
  • `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
  • `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/boschresearch/BCAI_kaggle_CHAMPS/network/alerts).

Bump numpy from 1.16.4 to 1.22.0

opened on 2022-06-21 22:49:18 by dependabot[bot]

Bumps numpy from 1.16.4 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

  • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
  • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
  • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
  • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
  • A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)


Bump notebook from 6.1.5 to 6.4.12

opened on 2022-06-16 23:28:46 by dependabot[bot]

Bumps notebook from 6.1.5 to 6.4.12.


src/pipeline_pre.py 2, line 554: embedding lookup raises a key-not-found error

opened on 2020-05-31 09:54:15 by lkfo415579

Command: python pipeline_pre.py 2

It seems there is a new type of bond in the test data that never appears in the train data.

Which embedding index should it take? 0 (the "None" index)?

Only one key seems to trigger the error: "1JHN-H_1.0_1.0_1.0_0.0-N_2.0_2.0_1.0_1.0" in bonds['type_2']

```
Loading data...
Sorting...
Adding embeddings and scaling...
Loading test data...

File "/home/kaggle/BCAI_kaggle_CHAMPS/src/pipeline_pre.py", line 554, in
    bonds["type_index_" + str(t)] = bonds["type_" + str(t)].apply(lambda x : embeddings[('bond',t)][x])

KeyError: '1JHN-H_1.0_1.0_1.0_0.0-N_2.0_2.0_1.0_1.0'
```
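One possible workaround, in line with the question above, is to fall back to a default index when a type string was never seen in training. The sketch below is an assumption, not a fix confirmed by the maintainers: the helper `lookup_with_default` and the toy table are hypothetical, and whether index 0 is a suitable "unknown" index in this pipeline is unverified.

```python
def lookup_with_default(table, key, default=0):
    """Return table[key], or default if the key was never seen in training."""
    return table.get(key, default)


# Toy stand-in for embeddings[('bond', t)]; the real table is built by
# pipeline_pre.py and its contents are not shown in this report.
bond_table = {"1JHC": 1, "1JHN": 2}
unseen = "1JHN-H_1.0_1.0_1.0_0.0-N_2.0_2.0_1.0_1.0"

print(lookup_with_default(bond_table, "1JHC"))  # 1
print(lookup_with_default(bond_table, unseen))  # 0
```

If the table is a plain dict, the same idea applied to the failing line would be to replace `embeddings[('bond',t)][x]` with `embeddings[('bond',t)].get(x, 0)`.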
