AWS Neuron is a software development kit (SDK) that enables high-performance deep learning acceleration on AWS Inferentia and Trainium, AWS's custom-designed machine learning accelerators. With Neuron, you can develop, profile, and deploy high-performance machine learning workloads on accelerated EC2 instances such as Inf1 and Trn1.
Neuron includes a compiler, a runtime driver, and debug and profiling utilities with a TensorBoard plugin for visualization. It is pre-integrated into popular machine learning frameworks such as PyTorch, TensorFlow, and MXNet to provide a seamless machine learning acceleration workflow.
For full documentation, including user guides, how-tos, and tutorials, see the Neuron SDK documentation.
If none of the GitHub and online resources answer your question, check out the AWS Neuron support forum.
When using `optimum-neuron`, the following command leads to a compilation error:
python examples/summarization/run_summarization.py --model_name_or_path facebook/bart-base --dataset_name cnn_dailymail --dataset_config_name 3.0.0 --do_train --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --max_source_length 1024 --max_target_length 128 --overwrite_output_dir --output_dir test_bart_base --bf16 --label_smoothing_factor 0.001
Stack trace:
```
To fix the error:
1. Raise the error with the AWS Neuron team.
2. If the error is already fixed, you can retry compilation by passing --retry_failed_compilation in NEURON_CC_FLAGS
2023-03-20 14:14:58.418509: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at tpu_execute_op.cc:266 : INTERNAL: neuronx-cc compilation failed.
0%| | 2/861339 [00:02<270:21:19, 1.13s/it$
2023-03-20 14:14:58.551489: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] StackTrace:
2023-03-20 14:14:58.551531: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] Begin stack trace
2023-03-20 14:14:58.551536: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] tsl::CurrentStackTrace()
2023-03-20 14:14:58.551546: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] xla::util::ReportComputationError(tsl::Status const&, absl::lts_20220623::Span
```
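The compiler message above suggests retrying by passing `--retry_failed_compilation` in `NEURON_CC_FLAGS`. A minimal sketch of setting that variable from Python before the training script starts is given here. The helper name `append_neuron_cc_flag` is hypothetical; only the flag and the variable name come from the log.

```python
import os


def append_neuron_cc_flag(flag, env=None):
    """Append `flag` to NEURON_CC_FLAGS unless it is already present.

    Hypothetical helper: only the variable name and the flag itself are
    taken from the compiler message in the stack trace.
    """
    env = os.environ if env is None else env
    flags = env.get("NEURON_CC_FLAGS", "").split()
    if flag not in flags:
        flags.append(flag)
    env["NEURON_CC_FLAGS"] = " ".join(flags)
    return env["NEURON_CC_FLAGS"]


# A plain dict stands in for os.environ so the example has no side effects.
demo_env = {}
first = append_neuron_cc_flag("--retry_failed_compilation", demo_env)
second = append_neuron_cc_flag("--retry_failed_compilation", demo_env)
print(first)
```

Calling the helper twice leaves a single copy of the flag, so it is safe to run at the top of a script that may be re-executed. The same effect can be had in the shell by exporting the variable before launching `run_summarization.py`.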
Current observations:
* Without the `--label_smoothing_factor 0.001` argument, it works.
More on this issue here => https://github.com/huggingface/optimum-neuron/issues/19
OS Ubuntu 20.04.5 LTS
dmlc-tvm 1.13.0.0+0
neuron-cc 1.13.5.0+7dcf000a6
torch 1.12.1
torch-neuron 1.12.1.2.5.8.0
torchvision 0.13.1
transformers 4.26.1
aws-neuronx-dkms/unknown,now 2.6.33.0 amd64 [installed,upgradable to: 2.7.33.0]
aws-neuronx-tools/unknown,now 2.6.1.0 amd64 [installed,upgradable to: 2.8.2.0]
Short version:
Could not export Python function call 'XSoftmax'. Remove calls to Python functions before export. Did you forget to add @script or @script_method annotation?
```python
import copy
from pathlib import Path

from transformers import AutoModelForSequenceClassification

from optimum.exporters.neuron import export, validate_model_outputs
from optimum.exporters.neuron.model_configs import DebertaNeuronConfig

model_id = "hf-internal-testing/tiny-random-DebertaModel"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
reference_model = copy.deepcopy(model)
neuron_config = DebertaNeuronConfig(
    config=model.config, task="sequence-classification", batch_size=2, sequence_length=18
)
output_path = Path("model.pt")

# Export (trace and compile) the model; this is the step that fails.
export(
    model=model,
    config=neuron_config,
    output=output_path,
    auto_cast="none",
)

# Compare the exported model against the CPU reference several times.
run = 10
value_failures = []
for _ in range(run):
    validate_model_outputs(
        config=neuron_config,
        reference_model=reference_model,
        neuron_model_path=output_path,
        neuron_named_outputs=["logits"],
    )
```
It seems that tracing failed for DeBERTa's masked softmax because it is a custom autograd function. Since the neuron-cc compiler is for inference and has no need to support the backward pass, we could replace it with:
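Hacky fix
```python
import torch


class XSoftmax:
    @staticmethod
    def apply(input, mask, dim):
        # Positions where the mask is 0 are excluded from the softmax.
        rmask = ~(mask.to(torch.bool))
        # Fill masked positions with the smallest representable value.
        output = input.masked_fill(rmask, torch.tensor(torch.finfo(input.dtype).min))
        output = torch.softmax(output, dim)
        # Zero out the masked positions in the result.
        output.masked_fill_(rmask, 0)
        return output
```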
`XDropout` has the same issue, but I guess with `model.eval()` it would be fine.
Possible solution
If it makes sense, we could add a model patcher in `optimum-neuron` for models with custom autograd functions. Or does the Annapurna team have a better suggestion?
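To make the "model patcher" idea concrete, here is a minimal sketch of a context manager that swaps an attribute for the duration of an export and restores it afterwards. The name `patch_attribute` and the stand-in `fake_modeling` object are hypothetical; this is not an existing `optimum-neuron` API.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def patch_attribute(owner, name, replacement):
    """Temporarily replace `owner.<name>`, restoring the original on exit."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield owner
    finally:
        # Restore even if the export raises, so later code sees the original.
        setattr(owner, name, original)


# Stand-in for a modeling module that defines a custom autograd function.
fake_modeling = SimpleNamespace(XSoftmax="autograd version")

with patch_attribute(fake_modeling, "XSoftmax", "traceable version"):
    inside = fake_modeling.XSoftmax  # the export call would go here
after = fake_modeling.XSoftmax

print(inside, "/", after)
```

Scoping the patch to the export keeps training code, which still needs the backward pass, on the original implementation.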
System information
OS Ubuntu 20.04.5 LTS
aws-neuronx-runtime-discovery 2.9
libneuronxla 0.5.101
neuronx-cc 2.4.0.21+b7621be18
neuronx-hwm 2.4.0.1+90172456c
torch 1.13.1
torch-neuronx 1.13.0.1.4.0
torch-xla 1.13.0+torchneuron3
torchvision 0.14.1
transformers 4.26.1
aws-neuronx-collectives/unknown,now 2.11.47.0-36959342f amd64 [installed]
aws-neuronx-dkms/unknown,now 2.7.15.0 amd64 [installed,upgradable to: 2.7.33.0]
aws-neuronx-runtime-lib/unknown,now 2.11.43.0-45b29be2c amd64 [installed]
aws-neuronx-tools/unknown,now 2.7.2.0 amd64 [installed,upgradable to: 2.8.2.0]
Issue 1: ConvBERT - tracing failed on INF2 with neuronx-cc
(`1e-5`) difference on INF2 compared to PyTorch on CPU
* XLM
E AssertionError: xlm, sequence-classification -> The maximum absolute difference between the output of the reference model and the Neuron exported model is not within the set tolerance 0.0001:
E - logits: max diff = 0.09034809470176697
* Flaubert
E AssertionError: flaubert, sequence-classification -> The maximum absolute difference between the output of the reference model and the Neuron exported model is not within the set tolerance 0.0001:
E - logits: max diff = 0.04413249343633652
[Reproduction]
```python
import copy
from pathlib import Path

from transformers import AutoModelForSequenceClassification

from optimum.exporters.neuron import export, validate_model_outputs
from optimum.exporters.neuron.model_configs import (
    BertNeuronConfig,
    DistilBertNeuronConfig,
    FlaubertNeuronConfig,
    XLMNeuronConfig,
)

model_id = "YituTech/conv-bert-base"

model = AutoModelForSequenceClassification.from_pretrained(model_id)
reference_model = copy.deepcopy(model)
neuron_config = BertNeuronConfig(
    config=model.config, task="sequence-classification", batch_size=2, sequence_length=18
)

output_path = Path("model.pt")

export(
    model=model,
    config=neuron_config,
    output=output_path,
)

run = 10
value_failures = []
for _ in range(run):
    validate_model_outputs(
        config=neuron_config,
        reference_model=reference_model,
        neuron_model_path=output_path,
        neuron_named_outputs=["logits"],
    )
```
For Flaubert and XLM, to get the large-difference warning you need to decrease the ATOL in the `optimum.exporters.neuron` exporter, as I set it quite high for the moment to keep it silent:
c.f. https://github.com/huggingface/optimum-neuron/blob/3a8d6d1771cfda60fb659691a6df8808ee300033/optimum/neuron/exporter/model_configs.py#L45-L46
https://github.com/huggingface/optimum-neuron/blob/3a8d6d1771cfda60fb659691a6df8808ee300033/optimum/neuron/exporter/model_configs.py#L57-L58
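The assertion messages above compare a maximum absolute difference against an absolute tolerance (ATOL). A small pure-Python sketch of that kind of check follows; the function names and the logits are made up for illustration and this is not the `validate_model_outputs` implementation.

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def within_tolerance(reference, candidate, atol):
    """True when every element differs by at most `atol`."""
    return max_abs_diff(reference, candidate) <= atol


# Made-up logits, chosen only to illustrate the check.
cpu_logits = [0.25, -1.5, 0.75]
neuron_logits = [0.25, -1.4375, 0.75]

diff = max_abs_diff(cpu_logits, neuron_logits)
strict = within_tolerance(cpu_logits, neuron_logits, atol=1e-4)
loose = within_tolerance(cpu_logits, neuron_logits, atol=0.1)
print(diff, strict, loose)
```

With a loose ATOL the mismatch passes silently, which is why the tolerance has to be lowered to reproduce the Flaubert and XLM failures.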
I'm trying to compile a GPT2 model and got an error message. The model compiled successfully with TorchScript. I believe the error is rooted in a broadcasting issue.
Here's what I ran:
```python
import torch
import torch_neuron  # registers the torch.neuron namespace
from transformers import GPT2Tokenizer, GPT2Model

# model_path is not defined in the original report; it points at the GPT-2 checkpoint.
tokenizer = GPT2Tokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
model = GPT2Model.from_pretrained(model_path, torchscript=True, return_dict=False)
model.eval()

neuroncore_pipeline_cores = 4
sequence_size = 1024

inputs = tokenizer(["This is a test"], return_tensors="pt", max_length=sequence_size, padding="max_length")
neuron_pipeline_model = torch.neuron.trace(
    model,
    example_inputs=inputs["input_ids"],
    compiler_args=["--neuroncore-pipeline-cores", str(neuroncore_pipeline_cores)],
)
```
Error message:
```
INFO:Neuron:There are 2 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/compiler/neuron-cc/neuron-cc-ops/neuron-cc-ops-pytorch.html)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 2444, fused = 2433, percent fused = 99.55%
WARNING:tensorflow:From /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/torch_neuron/ops/aten.py:2277: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where
INFO:Neuron:Compiler args type is
...
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: *********
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: An Internal Compiler Error has occurred
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: *********
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]:
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Error message: Traceback (most recent call last):
[bt] (8) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x8edf26) [0x7f1c32443f26]
[bt] (7) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x8f2392) [0x7f1c32448392]
[bt] (6) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x8e8bd1) [0x7f1c3243ebd1]
[bt] (5) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x8e673b) [0x7f1c3243c73b]
[bt] (4) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x8ebf11) [0x7f1c32441f11]
[bt] (3) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x9afb50) [0x7f1c32505b50]
[bt] (2) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0xaa6840) [0x7f1c325fc840]
[bt] (1) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0xac629b) [0x7f1c3261c29b]
[bt] (0) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x5edf75) [0x7f1c32143f75]
File "/opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.13.0.0/AL2_x86_64/generic-flavor/src/topi/include/topi/broadcast.h", line 32
TVMError: Check failed: output_shape.size() >= t->shape.size() (0 vs. 1) : Not a broadcast, output dimensionality smaller than input.
output: []
vs
input: Tensor(shape=[1], op.name=dram://placeholder15)
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]:
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Error class: TVMError
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Error location: Unknown
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Command line: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /tmp/tmpi916968s/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpi916968s/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 1024], "int64"], "1:0": [[], "int64"], "2:0": [[1, 1024, 1024], "float32"], "3:0": [[1, 1024, 1024], "float32"]}, "outputs": ["aten_view/Reshape:0", "GPT2Block_90/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_90/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_110/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_110/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_112/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_112/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_114/GPT2Attention_5/aten_permute_1/transpose:0", 
"GPT2Block_114/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_2/transpose:0"]}' --neuroncore-pipeline-cores 4 --verbose 35
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]:
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Internal details:
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/CommandDriver.py", line 224, in neuroncc.driver.CommandDriver.CommandDriver.run
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/commands/CompileCommand.py", line 576, in neuroncc.driver.commands.CompileCommand.CompileCommand.run
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/commands/CompileCommand.py", line 554, in neuroncc.driver.commands.CompileCommand.CompileCommand.runPipeline
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/commands/CompileCommand.py", line 558, in neuroncc.driver.commands.CompileCommand.CompileCommand.runPipeline
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/Job.py", line 289, in neuroncc.driver.Job.SingleInputJob.run
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/Job.py", line 289, in neuroncc.driver.Job.SingleInputJob.run
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/Job.py", line 289, in neuroncc.driver.Job.SingleInputJob.run
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/jobs/Frontend.py", line 427, in neuroncc.driver.jobs.Frontend.Frontend.runSingleInput
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/jobs/Frontend.py", line 377, in neuroncc.driver.jobs.Frontend.Frontend.runTVMFrontend
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/jobs/Frontend.py", line 378, in neuroncc.driver.jobs.Frontend.Frontend.runTVMFrontend
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/jobs/Frontend.py", line 382, in neuroncc.driver.jobs.Frontend.Frontend.runTVMFrontend
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/relay/build_module.py", line 731, in build_graph
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: func = pre_partitioning_passes(func, params)
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/relay/build_module.py", line 282, in pre_partitioning_passes
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: func = ir_pass.fold_constant(func)
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/relay/ir_pass.py", line 707, in fold_constant
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: return _ir_pass.FoldConstant(expr)
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/_ffi/_ctypes/function.py", line 190, in call
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: raise get_last_ffi_error()
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]:
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Version information:
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Neuron Compiler version 1.13.5.0+7dcf000a6
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]:
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: HWM version 1.13.0.0-0
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: NEFF version Dynamic
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: TVM version 1.13.0.0+0
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: NumPy version 1.18.5
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: MXNet not available
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: TF not available
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]:
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Artifacts stored in: /tmp/tmpi916968s
Compiler status ERROR
INFO:Neuron:Compile command returned: 1 WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$956; falling back to native python function call ERROR:Neuron:neuron-cc failed with the following command line call: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /tmp/tmpi916968s/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpi916968s/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 1024], "int64"], "1:0": [[], "int64"], "2:0": [[1, 1024, 1024], "float32"], "3:0": [[1, 1024, 1024], "float32"]}, "outputs": ["aten_view/Reshape:0", "GPT2Block_90/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_90/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_110/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_110/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_112/GPT2Attention_5/aten_permute_1/transpose:0", 
"GPT2Block_112/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_114/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_114/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_2/transpose:0"]}' --neuroncore-pipeline-cores 4 --verbose 35 Traceback (most recent call last): File "/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/torch_neuron/convert.py", line 392, in op_converter item, inputs, compiler_workdir=sg_workdir, **kwargs) File "/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/torch_neuron/decorators.py", line 229, in trace 'neuron-cc failed with the following command line call:\n{}'.format(command)) 
subprocess.SubprocessError: neuron-cc failed with the following command line call: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /tmp/tmpi916968s/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpi916968s/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 1024], "int64"], "1:0": [[], "int64"], "2:0": [[1, 1024, 1024], "float32"], "3:0": [[1, 1024, 1024], "float32"]}, "outputs": ["aten_view/Reshape:0", "GPT2Block_90/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_90/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_110/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_110/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_112/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_112/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_114/GPT2Attention_5/aten_permute_1/transpose:0", 
"GPT2Block_114/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_2/transpose:0"]}' --neuroncore-pipeline-cores 4 --verbose 35 INFO:Neuron:Number of arithmetic operators (post-compilation) before = 2444, compiled = 0, percent compiled = 0.0% INFO:Neuron:The neuron partitioner created 1 sub-graphs INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0% INFO:Neuron:Compiled these operators (and operator counts) to Neuron: INFO:Neuron:Not compiled operators (and operator counts) to Neuron: INFO:Neuron: => aten::Int: 557 [supported] INFO:Neuron: => aten::ScalarImplicit: 25 [supported] INFO:Neuron: => aten::add: 98 
[supported]
INFO:Neuron: => aten::addmm: 96 [supported]
INFO:Neuron: => aten::arange: 1 [supported]
INFO:Neuron: => aten::contiguous: 24 [supported]
INFO:Neuron: => aten::div: 24 [supported]
INFO:Neuron: => aten::dropout: 73 [supported]
INFO:Neuron: => aten::embedding: 2 [not supported]
INFO:Neuron: => aten::full: 48 [supported]
INFO:Neuron: => aten::layer_norm: 49 [supported]
INFO:Neuron: => aten::matmul: 48 [supported]
INFO:Neuron: => aten::mul: 96 [supported]
INFO:Neuron: => aten::permute: 96 [supported]
INFO:Neuron: => aten::pow: 48 [supported]
INFO:Neuron: => aten::size: 555 [supported]
INFO:Neuron: => aten::slice: 96 [supported]
INFO:Neuron: => aten::softmax: 24 [supported]
INFO:Neuron: => aten::split: 24 [supported]
INFO:Neuron: => aten::sub: 24 [supported]
INFO:Neuron: => aten::tanh: 24 [supported]
INFO:Neuron: => aten::to: 72 [supported]
INFO:Neuron: => aten::transpose: 24 [supported]
INFO:Neuron: => aten::unsqueeze: 1 [supported]
INFO:Neuron: => aten::view: 291 [supported]
INFO:Neuron: => aten::where: 24 [not supported]
RuntimeError Traceback (most recent call last)
~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/torch_neuron/convert.py in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, single_fusion_ratio_threshold, _neuron_trace, compiler_args, optimizations, verbose, kwargs)
    193 logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line()))
    194 neuron_graph = cu.compile_fused_operators(neuron_graph, compile_kwargs)
--> 195 cu.stats_post_compiler(neuron_graph)
    196
    197 # Wrap the compiled version of the model in a script module. Note that this is
~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/torch_neuron/convert.py in stats_post_compiler(self, neuron_graph)
    501 if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
    502 raise RuntimeError(
--> 503 "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
    504
    505 if percent_operations_compiled < 50.0:
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
```
Based on the broadcasting error message and some experiments, I narrowed it down to a few lines of code in `transformers.models.gpt2.modeling_gpt2.GPT2Attention._attn`, where we divide by a broadcast tensor:
if self.scale_attn_weights:
    attn_weights = attn_weights / torch.full(
        [], value.size(-1) ** 0.5, dtype=attn_weights.dtype, device=attn_weights.device
    )
Changing the divisor into a scalar or vector using `torch.full_like` seemed to work.
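Under the usual broadcasting rule the division is legal, since a rank-0 shape broadcasts against any other shape. That is presumably why TorchScript tracing succeeds. The pure-Python sketch below implements that rule, and adds a function that only mirrors the rank comparison quoted in the `TVMError` message (`0 vs. 1`). Both function names are hypothetical, and this is not the TVM implementation.

```python
def broadcast_shape(a, b):
    """Result shape of broadcasting shapes `a` and `b` (NumPy/PyTorch rule)."""
    result = []
    # Walk both shapes from the trailing dimension, padding the shorter with 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


def passes_rank_check(output_shape, input_shape):
    """Mirror of the comparison in the log: output rank >= input rank."""
    return len(output_shape) >= len(input_shape)


attn_shape = (2, 2, 512)
with_rank0 = broadcast_shape(attn_shape, ())     # divisor built with torch.full([], ...)
with_vector = broadcast_shape(attn_shape, (1,))  # divisor of shape [1]
log_case = passes_rank_check((), (1,))           # "output: []" vs "input: shape=[1]"
print(with_rank0, with_vector, log_case)
```

Both divisor shapes broadcast to the full attention shape, while the logged case (an empty output shape against an input of shape `[1]`) fails the rank comparison. This is consistent with the observation that avoiding the rank-0 tensor works around the error.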
A minimal reproducible example in my environment:
def test_divide(inputs):
    inputs = inputs / torch.full(
        [], 2, dtype=inputs.dtype, device=inputs.device
    )
    return inputs

neuron_pipeline_model = torch.neuron.trace(
    test_divide,
    example_inputs=torch.ones(2, 2, 512),
    verbose="debug",
)
Tracing with torchscript worked fine:
traced_model = torch.jit.trace(test_divide, torch.ones(2,2,512))
My environment is a SageMaker notebook instance:
aws-neuron-dkms.noarch 2.2.8.0-dkms @neuron
aws-neuron-runtime-base.x86_64 1.6.21.0-1 @neuron
aws-neuron-tools.x86_64 2.0.790.0-1 @neuron
tensorflow-model-server-neuronx.x86_64 2.8.0.2.5.6.0-0 @neuron
torch 1.12.1
torch-neuron 1.12.1.2.5.8.0
torcheia 1.0.0
torchvision 0.13.1
neuron-cc 1.13.5.0+7dcf000a6
I compared the throughput of Inferentia (inf1.xlarge) and an NVIDIA T4 GPU (g4dn.xlarge) with MobileNetV2, ResNet50, and ResNet101, and here are the results.
ResNet101 : 481
T4 GPU throughput (inferences /sec)
We tested GPU on TensorFlow 2.3 models, and compiled the same models with tensorflow-neuron 2.3.4 to run on Inferentia. We compiled the models with Neuron batch size 4 and fast-math=none. When running the tests, each model occupied the entire device - for Inferentia a single model occupied all 4 NeuronCores. We spawned a separate process for each NeuronCore, resulting in 4 processes.
We found that while Inferentia showed similar performance for ResNet and MobileNetV2, the performance of GPU varied greatly depending on the model, ranging from 175 inferences per second for ResNet101 to 718 inferences per second for MobileNetV2.
What could have caused Inferentia to outperform GPU by a large margin on ResNet, while underperforming on MobileNetV2?
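For reference, the setup described above (one process per NeuronCore, Neuron batch size 4) yields a device-level figure by summing per-process throughput. A minimal sketch of that arithmetic follows; the function names are hypothetical and the batch counts and timings are placeholders, not measurements from this post.

```python
def throughput(batch_size, batches_completed, elapsed_seconds):
    """Inferences per second for one worker process."""
    return batch_size * batches_completed / elapsed_seconds


def aggregate_throughput(per_process):
    """Total inferences per second across independent worker processes."""
    return sum(per_process)


# Placeholder numbers: 4 processes, each finishing 250 batches of 4 in 10 s.
workers = [
    throughput(batch_size=4, batches_completed=250, elapsed_seconds=10.0)
    for _ in range(4)
]
total = aggregate_throughput(workers)
print(total)
```

When comparing against a GPU, it helps to state whether the reported number is per process or aggregated, and which batch size each side used.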
Could you please update the torch module version to 1.13.1 in the next release in order to close a critical security flaw?
CVE-2022-45907 | CWE-77
Arbitrary Code Execution: torch is vulnerable to arbitrary code execution. The vulnerability exists in `annotations.py` because the existing `eval` call is not properly validated, which allows an attacker to take over an existing account and execute malicious code on the system.
P.S. Please suggest any timelines for that.
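Until a release ships with the requested version, an environment can be checked against the 1.13.1 minimum with a small comparison like the sketch below. The helper names are hypothetical, and the parser only looks at the leading numeric release, ignoring local suffixes such as `+torchneuron3`.

```python
import re


def release_tuple(version):
    """Leading numeric release of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_at_least(version, minimum):
    """True when `version` is not older than `minimum`."""
    return release_tuple(version) >= release_tuple(minimum)


# Version strings taken from the environment listings in these reports.
for installed in ("1.12.1", "1.13.1", "1.13.0+torchneuron3"):
    print(installed, is_at_least(installed, "1.13.1"))
```

For anything beyond a quick check, a dedicated version-parsing library is a better choice than this regular expression, since it handles pre-releases and other edge cases.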
This release adds support for EC2 Inf2 instances, introduces initial inference support with TensorFlow 2.x Neuron (`tensorflow-neuronx`) on Trn1 and Inf2, and introduces minor enhancements and bug fixes.
This release introduces the following:

What’s New | Details
-- | --
Support for EC2 Inf2 instances | Inference support for Inf2 instances in PyTorch Neuron (torch-neuronx). Inference support for Inf2 instances in TensorFlow 2.x Neuron (tensorflow-neuronx). Overall documentation update to include Inf2 instances.
TensorFlow 2.x Neuron (tensorflow-neuronx) support | This release introduces initial inference support with TensorFlow 2.x Neuron (tensorflow-neuronx) on Trn1 and Inf2.
New Neuron GitHub samples | New sample scripts for deploying LLM models with transformers-neuronx under the aws-neuron-samples GitHub repository. New sample scripts for deploying models with torch-neuronx under the aws-neuron-samples GitHub repository.
Minor enhancements and bug fixes | See Neuron Components Release Notes.
Release included packages | See Release Content.

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
This release introduces new capabilities and libraries, as well as features and tools that improves usability. This release introduces the following:
What’s New | Details
-- | --
PyTorch 1.13 | Support for PyTorch 1.13 in PyTorch Neuron (torch-neuronx). For resources see PyTorch Neuron
PyTorch DistributedDataParallel (DDP) API | Support for the PyTorch DistributedDataParallel (DDP) API in PyTorch Neuron (torch-neuronx). For resources on how to use the PyTorch DDP API with Neuron, please check the Distributed Data Parallel Training Tutorial.
Inference support in torch-neuronx | For more details please visit the `pytorch-neuronx-main` page. You can also try the Neuron Inference samples at https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx in the aws-neuron-samples GitHub repo.
Neuron Custom C++ Operators [Experimental] | Initial support for Neuron Custom C++ Operators [Experimental]. With Neuron Custom C++ Operators (“CustomOps”) you can now write CustomOps that run on NeuronCore-v2 chips. For more resources please check the Neuron Custom C++ Operators [Experimental] section.
transformers-neuronx [Experimental] | transformers-neuronx is a new library enabling LLM model inference. It contains models that are checkpoint-compatible with HuggingFace Transformers, and currently supports Transformer Decoder models like GPT2, GPT-J and OPT. Please check the aws-neuron-samples repository
Neuron sysfs filesystem | The Neuron sysfs filesystem exposes Neuron Devices under /sys/devices/virtual/neuron_device, providing visibility into the Neuron Driver and Runtime at the system level. With a few simple CLI operations, such as reading or writing a sysfs file, you can get information such as Neuron Runtime status, memory usage, Driver info etc. For resources about the Neuron sysfs filesystem visit the Neuron Sysfs User Guide.
TFLOPS support in Neuron System Tools | Neuron System Tools now also report the model's actual TFLOPs rate in both neuron-monitor and neuron-top. More details can be found in the Neuron Tools documentation.
New sample scripts for training | This release adds multiple new sample scripts for training models with torch-neuronx. Please check the aws-neuron-samples repository
New sample scripts for inference | This release adds multiple new sample scripts for deploying models with torch-neuronx. Please check the aws-neuron-samples repository
Neuron GitHub samples repository for Amazon EKS | A new AWS Neuron GitHub samples repository for Amazon EKS. Please check the aws-neuron-samples repository
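The sysfs entry above describes reading plain files to inspect the driver and runtime. As an illustration, here is a minimal sketch that walks the directory and collects whatever readable files it finds. Only the root path comes from the release notes; the layout below it is not assumed, and the helper name `read_sysfs_tree` is hypothetical. The Neuron Sysfs User Guide is the authoritative source for the actual file names.

```python
import os

# Root path as given in the release notes; everything below it is discovered
# at run time rather than assumed.
NEURON_SYSFS_ROOT = "/sys/devices/virtual/neuron_device"


def read_sysfs_tree(root=NEURON_SYSFS_ROOT, max_bytes=4096):
    """Return {relative_path: text} for every readable file under root.

    Returns an empty dict when root does not exist, for example on a host
    without the Neuron driver installed.
    """
    values = {}
    # os.walk does not follow directory symlinks by default, which avoids
    # loops that sysfs symlinks could otherwise cause.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read(max_bytes)
            except OSError:
                # Some sysfs files are write-only or need elevated privileges.
                continue
            text = data.decode("utf-8", errors="replace").strip()
            values[os.path.relpath(path, root)] = text
    return values


if __name__ == "__main__":
    for rel_path, text in sorted(read_sysfs_tree().items()):
        print(f"{rel_path}: {text}")
```

On an instance without Neuron devices the script prints nothing, which makes it safe to run as a quick probe.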
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
This release introduces support for PyTorch 1.12 and introduces PyTorch Neuron (torch-neuronx) profiling through the Neuron Plugin for TensorBoard. PyTorch Neuron (torch-neuronx) users can now profile their models through the following TensorBoard views:
Operator Framework View
Operator HLO View
Operator Trace View
This release introduces support for the LAMB optimizer in FP32 mode, and adds support for capturing snapshots of inputs, outputs and graph HLO for debugging.
In addition, this release introduces the support of new operators and resolves issues that improve stability for Trn1 customers.
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
This is a major release which introduces new features and resolves issues that improve stability for Inf1 customers.
Component | New in this release
-- | --
PyTorch Neuron (torch-neuron) | PyTorch 1.12 support<br>Python 3.8 support<br>LSTM support on Inf1<br>R-CNN support on Inf1<br>Support for new API for core placement<br>Support for improved logging<br>Improved torch_neuron.trace() performance when using large graphs<br>Reduced host memory usage of loaded models in libtorchneuron.so<br>Additional operators support
TensorFlow Neuron (tensorflow-neuron) | tf-neuron-auto-multicore tool to enable automatic data parallel on multiple NeuronCores.<br>Experimental support for tracing models larger than 2GB using extract-weights flag (TF2.x only), see TensorFlow 2.x (tensorflow-neuron) Tracing API<br>tfn.auto_multicore Python API to enable automatic data parallel (TF2.x only)
This Neuron release is the last release that will include torch-neuron versions 1.7 and 1.8, and that will include tensorflow-neuron versions 2.5 and 2.6.
In addition, this release introduces changes to the Neuron packaging and installation instructions for Inf1 customers, see Introducing Neuron packaging and installation changes for Inf1 customers for more information.
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
This release introduces new features and resolves issues that improve stability. The release introduces a "memory utilization breakdown" feature in both the Neuron Monitor and Neuron Top system tools. The release adds support for the "NeuronCore Based Scheduling" capability to the Neuron Kubernetes Scheduler and introduces support for new operators in Neuron Compiler and PyTorch Neuron. This release also introduces eight (8) additional samples of model fine-tuning using PyTorch Neuron. The new samples can be found in the AWS Neuron Samples GitHub repository.
This Neuron 2.x release extends Neuron 1.x and adds support for the new AWS Trainium powered Amazon EC2 Trn1 instances. With this release, you can now run deep learning training workloads on Trn1 instances to save training costs by up to 50% over equivalent GPU-based EC2 instances, while getting the highest training performance in AWS cloud for popular NLP models.
Supported instances: Trn1
Supported Frameworks: PyTorch Neuron (torch-neuronx)
Supported Data-types
FP32, BF16
Supported Rounding Modes
Stochastic Rounding (SR)
Round Nearest ties to Even (RNE)
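To make the difference between the two rounding modes concrete, here is a minimal pure-Python sketch of FP32 to BF16 conversion under each mode. It only illustrates the arithmetic and is not how Neuron hardware or the compiler implements rounding; the function names are hypothetical, and NaN handling is deliberately omitted.

```python
import random
import struct


def f32_to_bits(x):
    """Reinterpret a float as its 32-bit IEEE 754 single-precision pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def to_bf16_rne(x):
    """Round to BF16 (the top 16 bits of FP32) with round-nearest-ties-to-even."""
    bits = f32_to_bits(x)
    upper, lower = bits >> 16, bits & 0xFFFF
    # Round up when above the halfway point, or exactly halfway with an odd
    # kept mantissa, so that ties land on an even value.
    if lower > 0x8000 or (lower == 0x8000 and (upper & 1)):
        upper += 1
    return bits_to_f32(upper << 16)


def to_bf16_sr(x, rng):
    """Round to BF16 stochastically: round up with probability lower / 2**16."""
    bits = f32_to_bits(x)
    upper, lower = bits >> 16, bits & 0xFFFF
    if rng.randrange(0x10000) < lower:
        upper += 1
    return bits_to_f32(upper << 16)


if __name__ == "__main__":
    x = 1.0 + 2.0 ** -8  # exactly halfway between two neighbouring BF16 values
    rng = random.Random(0)
    samples = [to_bf16_sr(x, rng) for _ in range(10000)]
    print("RNE:", to_bf16_rne(x))
    print("SR mean:", sum(samples) / len(samples))
```

RNE always maps a given input to the same BF16 value, while stochastic rounding returns one of the two neighbouring values at random so that the average over many roundings stays close to the original FP32 value.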
Supported Automatic Casting Methods
Neuron automatic casting of FP32 tensors / weights / operations to BF16 - Default mode
PyTorch automatic casting
Full BF16 automatic casting (via XLA_USE_BF16=1 environment variable)
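As a small illustration of the last option, the sketch below sets the `XLA_USE_BF16` variable named above from Python. The helper name `enable_full_bf16` is hypothetical, and setting the variable before the framework is imported is an assumption about when it is read; exporting it in the shell that launches the training script is an equally valid approach.

```python
import os


def enable_full_bf16(env=None):
    """Set XLA_USE_BF16=1 in the given mapping (os.environ by default)."""
    target = os.environ if env is None else env
    target["XLA_USE_BF16"] = "1"
    return target


if __name__ == "__main__":
    # Assumption: the variable should be in place before the XLA-based
    # framework is imported, so call this at the very top of the script.
    enable_full_bf16()
    print(os.environ["XLA_USE_BF16"])
```

Passing a plain dict instead of `os.environ` is convenient when building the environment for a subprocess that runs the training job.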