Hello!
Below you can find an outline of how to reproduce our solution for the CHAMPS competition. If you run into any trouble with the setup/code or have any questions, please contact us at [email protected]
Copyright 2019 Robert Bosch GmbH
Code authors: Zico Kolter, Shaojie Bai, Devin Wilmott, Mordechai Kornbluth, Jonathan Mailoa, part of Bosch Research (CR).
`config/`
: Configuration files

`data/`
: Raw data

`models/`
: Saved models

`processed/`
: Processed data

`src/`
: Source code for preprocessing, training, and predicting

`submission/`
: Directory for the actual predictions

The models were trained on a variety of machines, each running a Linux OS:
- 5 machines had 4 GPUs, each an NVIDIA GeForce RTX 2080 Ti
- 2 machines had 1 NVIDIA Tesla V100 GPU with 32 GB memory
- 6 machines had 1 NVIDIA Tesla V100 GPU with 16 GB memory
Python packages are detailed separately in `requirements.txt`.

Note: Though listed in `requirements.txt`, `rdkit` is not available with `pip`. We strongly suggest installing `rdkit` via conda:

```sh
conda install -c rdkit rdkit
```
We use only the `train.csv`, `test.csv`, and `structures.csv` files of the competition. They should be (unzipped and) placed in the `data/` directory. All of the commands below are executed from the `src/` directory.
```sh
cd src/
python pipeline_pre.py 1   # this could take 1-2 hours
python pipeline_pre.py 2
```

(You may need to change the permissions on the `.csv` files via `chmod` to use the two scripts above.)
While in `src/`:

1. Very fast prediction: `predictor.py fast` to use the precomputed results for ensembling.
2. Ordinary prediction: `predictor.py` to use the precomputed checkpoints for predicting and ensembling.
3. Re-train models: `train.py` to train a new model from scratch. See `train.py -h` for the allowed arguments, and the `config` file of each model for the arguments used.
The `config/models.json` file contains the following important keys:

`model_dir`
: The directory in `models/` associated with each model. Each directory must have:
  1) `graph_transformer.py` with a `GraphTransformer` class (and any modules it needs);
  2) a `config` file with the kwargs to instantiate the `GraphTransformer` class;
  3) `[MODEL_NAME].ckpt` that can be loaded via `load_state_dict(torch.load('[MODEL_NAME].ckpt').state_dict())` (to avoid PyTorch version conflicts).

All pretrained models are stored in `models/`. However, different models may have slightly different architectures (e.g., some GT models are followed by a 2-layer grouped residual network, while others have only one residual block). The training script (`train.py`), when started without the `--debug` flag, automatically creates a log folder in `CHAMPS-GT/` that contains the code for the GT used. When loading the model, use the `graph_transformer.py` in that log folder (instead of the default one in `src/`).
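As a rough illustration of how these pieces fit together (all names, paths, and kwargs below are made up for the example; the real values live in `config/models.json` and each model directory's `config` file):

```python
import json

# Stand-in for config/models.json; the entry name and directory are hypothetical
models_json = json.loads("""
{
  "example_model": {"model_dir": "models/example_model"}
}
""")

# Stand-in for a model directory's config file: the kwargs used to
# instantiate the GraphTransformer class (values are invented)
kwargs = json.loads('{"n_layer": 14, "d_model": 600, "dropout": 0.03}')

for name, entry in models_json.items():
    # The checkpoint is expected at [model_dir]/[MODEL_NAME].ckpt
    ckpt_path = f"{entry['model_dir']}/{name}.ckpt"
    print(name, ckpt_path, kwargs["n_layer"])
```

The actual loading step would then pass `kwargs` to the `GraphTransformer` constructor and call `load_state_dict` on the object stored in the checkpoint, as described above.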
When trained from scratch, the default parameters should lead to a model achieving a score of around -3.06 to -3.07. Using the `--debug` flag prevents the program from creating a log folder.
What if you get a `CUDA out of memory` error? We suggest a few solutions:

- If you have a multi-GPU machine, use the `--multi_gpu` flag and tune the `--gpu0_bsz` flag (which controls the minibatch size passed to GPU device 0). For instance, on a 4-GPU machine you can run `python train.py [...] --batch_size 47 --multi_gpu --gpu0_bsz 11`, which assigns a batch size of 12 to GPUs 1, 2, and 3 and a batch size of 11 to GPU 0.
- Use the `--fp16` option, which applies NVIDIA Apex's mixed-precision training.
- Use the `--batch_chunk` option, which splits a larger batch into a few smaller (equal) chunks. The gradients from the smaller minibatches accumulate, so the effective batch size is still the same as `--batch_size`.
- Use fewer `--n_layer`, or a smaller `--batch_size` :P
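The arithmetic behind the first suggestion can be sketched as follows (an illustrative helper, not code from this repository; we assume the remainder of the batch is split evenly across the non-zero GPUs):

```python
def per_gpu_batch_sizes(batch_size, n_gpu, gpu0_bsz):
    """Split a minibatch so GPU 0 gets gpu0_bsz and the rest is
    divided evenly among the remaining GPUs (illustrative only)."""
    rest = batch_size - gpu0_bsz
    per, leftover = divmod(rest, n_gpu - 1)
    sizes = [gpu0_bsz] + [per] * (n_gpu - 1)
    for i in range(leftover):   # hand out any remainder one GPU at a time
        sizes[1 + i] += 1
    return sizes

print(per_gpu_batch_sizes(47, 4, 11))  # → [11, 12, 12, 12]
```

This reproduces the example above: with `--batch_size 47` and `--gpu0_bsz 11` on 4 GPUs, GPU 0 takes 11 samples and GPUs 1-3 take 12 each.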
Command: `python pipeline_pre.py 2`

It seems there is a new type of bond in the test data that never appears in the train data. What embedding index should it take? Should it use 0 (the "None" index)? Only one special key seems to trigger the error: `1JHN-H_1.0_1.0_1.0_0.0-N_2.0_2.0_1.0_1.0` in `bonds['type_2']`.

```
Loading data...
Sorting...
Adding embeddings and scaling...
Loading test data...
  File "/home/kaggle/BCAI_kaggle_CHAMPS/src/pipeline_pre.py", line 554, in
KeyError: '1JHN-H_1.0_1.0_1.0_0.0-N_2.0_2.0_1.0_1.0'
```
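One common workaround for unseen categorical keys (a sketch of the general technique, not the repository's actual fix) is to reserve index 0 as a fall-back/"None" embedding and map any type not seen in training to it:

```python
# Hypothetical bond-type vocabulary built from the train data;
# the real strings come from pipeline_pre.py's bond typing
train_types = ["typeA", "typeB", "typeC"]
type_to_idx = {t: i + 1 for i, t in enumerate(sorted(set(train_types)))}

def bond_type_index(t):
    # index 0 is reserved for types that appear only in the test data
    return type_to_idx.get(t, 0)

print(bond_type_index("typeB"))  # a known type → 2
print(bond_type_index("1JHN-H_1.0_1.0_1.0_0.0-N_2.0_2.0_1.0_1.0"))  # unseen → 0
```

Whether index 0 is the right choice here depends on how the embedding table in this codebase was sized and initialized, so treat this only as a pattern.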