Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services

aws-neuron, updated 🕥 2023-03-21 17:48:48

neuron

AWS Neuron

Neuron SDK Overview

AWS Neuron is a software development kit (SDK) enabling high-performance deep learning acceleration using AWS Inferentia and Trainium, AWS's custom designed machine learning accelerators. With Neuron, you can develop, profile, and deploy high-performance machine learning workloads on top of accelerated EC2 instances, e.g. Inf1 and Trn1.

Neuron includes a compiler, a runtime driver, and debug and profiling utilities with a TensorBoard plugin for visualization. It is pre-integrated into popular machine learning frameworks such as PyTorch, TensorFlow and MXNet to provide a seamless machine learning acceleration workflow.

Neuron SDK’s documentation

For full documentation, including the user guide, how-tos and tutorials, see Neuron SDK’s documentation.

Support

If none of the GitHub and online resources answers your question, check out the AWS Neuron support forum.

Issues

[Optimum Neuron] Compilation error when using a label smoothing factor

opened on 2023-03-20 14:31:23 by michaelbenayoun

When using optimum-neuron, the following command leads to a compilation error:

python examples/summarization/run_summarization.py --model_name_or_path facebook/bart-base --dataset_name cnn_dailymail --dataset_config_name 3.0.0 --do_train --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --max_source_length 1024 --max_target_length 128 --overwrite_output_dir --output_dir test_bart_base --bf16 --label_smoothing_factor 0.001

Stack trace:

```
To fix the error:
1. Raise the error with the AWS Neuron team.
2. If the error is already fixed, you can retry compilation by passing --retry_failed_compilation in NEURON_CC_FLAGS
2023-03-20 14:14:58.418509: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at tpu_execute_op.cc:266 : INTERNAL: neuronx-cc compilation failed.
0%| | 2/861339 [00:02<270:21:19, 1.13s/it$
2023-03-20 14:14:58.551489: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] StackTrace:
2023-03-20 14:14:58.551531: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] Begin stack trace
2023-03-20 14:14:58.551536: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] tsl::CurrentStackTrace()
2023-03-20 14:14:58.551546: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] xla::util::ReportComputationError(tsl::Status const&, absl::lts_20220623::Span, absl::lts_20220623::Span)
2023-03-20 14:14:58.551553: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] xla::XrtComputationClient::ExecuteComputation(xla::ComputationClient::Computation const&, absl::lts_2$
220623::Span const>, std::string const&, xla::ComputationClient::ExecuteComputationOptions const&)
2023-03-20 14:14:58.551562: E tensorflow/compiler/xla/xla_client/xla_util.cc:90]
2023-03-20 14:14:58.551567: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] xla::util::MultiWait::Complete(std::function const&)
2023-03-20 14:14:58.551573: E tensorflow/compiler/xla/xla_client/xla_util.cc:90]
2023-03-20 14:14:58.551578: E tensorflow/compiler/xla/xla_client/xla_util.cc:90]
2023-03-20 14:14:58.551585: E tensorflow/compiler/xla/xla_client/xla_util.cc:90]
2023-03-20 14:14:58.551593: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] clone
2023-03-20 14:14:58.551608: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] End stack trace
2023-03-20 14:14:58.551613: E tensorflow/compiler/xla/xla_client/xla_util.cc:90]
2023-03-20 14:14:58.551622: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] Status: INTERNAL: From /job:localservice/replica:0/task:0:
2023-03-20 14:14:58.551627: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] 2 root error(s) found.
2023-03-20 14:14:58.551632: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] (0) INTERNAL: neuronx-cc compilation failed.
2023-03-20 14:14:58.551637: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] [[{{node XRTExecute}}]]
2023-03-20 14:14:58.551643: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] [[XRTExecute_G15]]
2023-03-20 14:14:58.551648: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] (1) INTERNAL: neuronx-cc compilation failed.
2023-03-20 14:14:58.551656: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] [[{{node XRTExecute}}]]
2023-03-20 14:14:58.551660: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] 0 successful operations.
2023-03-20 14:14:58.551669: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] 0 derived errors ignored.
2023-03-20 14:14:58.551673: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] Recent warning and error logs:
2023-03-20 14:14:58.551682: E tensorflow/compiler/xla/xla_client/xla_util.cc:90] OP_REQUIRES failed at tpu_execute_op.cc:266 : INTERNAL: neuronx-cc compilation failed.
```

Current observations:

  • It fails at the second step of training
  • The same command line works on GPU so it seems to be neuron-dependent
  • When removing the --label_smoothing_factor 0.001 argument, it works

More on this issue here => https://github.com/huggingface/optimum-neuron/issues/19
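For context on what the --label_smoothing_factor flag changes numerically, here is a minimal sketch of a common label-smoothed cross-entropy formulation, written in plain Python for a single prediction. It is an illustration under stated assumptions, not the code path used by run_summarization.py or by the compiler; the function name label_smoothed_nll and its arguments are hypothetical.

```python
import math


def label_smoothed_nll(logits, target, epsilon):
    """Label-smoothed cross entropy for one prediction (hypothetical helper).

    logits:  list of raw scores, one per class
    target:  index of the correct class
    epsilon: smoothing factor (the value passed as the label smoothing factor)
    """
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]

    # Ordinary negative log-likelihood of the target class.
    nll = -log_probs[target]
    # Extra term introduced by smoothing: mean negative log-probability over all classes.
    smoothed = -sum(log_probs) / len(log_probs)

    return (1.0 - epsilon) * nll + epsilon * smoothed


plain = label_smoothed_nll([1.0, 2.0, 3.0], target=2, epsilon=0.0)
smooth = label_smoothed_nll([1.0, 2.0, 3.0], target=2, epsilon=0.001)
```

With epsilon set to 0 the result reduces to the ordinary negative log-likelihood. Since the run works without the flag, the additional reduction over the class dimension is a plausible, though unconfirmed, place to look.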

[Hugging Face] neuron compiler fails on tracing DeBERTa v1 and v2 models on INF1

opened on 2023-03-19 20:22:47 by JingyaHuang
  • System information
    • OS: Ubuntu 20.04.5 LTS
    • dmlc-tvm 1.13.0.0+0
    • neuron-cc 1.13.5.0+7dcf000a6
    • torch 1.12.1
    • torch-neuron 1.12.1.2.5.8.0
    • torchvision 0.13.1
    • transformers 4.26.1
    • aws-neuronx-dkms/unknown,now 2.6.33.0 amd64 [installed,upgradable to: 2.7.33.0]
    • aws-neuronx-tools/unknown,now 2.6.1.0 amd64 [installed,upgradable to: 2.8.2.0]
Error log ``` Some weights of DebertaForSequenceClassification were not initialized from the model checkpoint at hf-internal-testing/tiny-random-DebertaModel and are newly initialized: ['pooler.dense.weight', 'classifier.weight', 'pooler.dense.bias', 'classifier.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/models/deberta/modeling_deberta.py:664: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. scale = torch.sqrt(torch.tensor(query_layer.size(-1), dtype=torch.float) * scale_factor) /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/models/deberta/modeling_deberta.py:664: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). scale = torch.sqrt(torch.tensor(query_layer.size(-1), dtype=torch.float) * scale_factor) /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/models/deberta/modeling_deberta.py:121: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. output = input.masked_fill(rmask, torch.tensor(torch.finfo(input.dtype).min)) WARNING:Neuron: Found Python static code known as PythonOp. 
Static methods are not torch operators and hence aren't compilable by torch_neuron which causes graph partitioning and limits performance optimizations by neuron. WARNING:Neuron: Your PythonOp is in: /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/models/deberta/modeling_deberta.py(678): forward WARNING:Neuron: Consider replacing this static code with a non-pythonOp code to potentially improve performance. WARNING:Neuron: Found Python static code known as PythonOp. Static methods are not torch operators and hence aren't compilable by torch_neuron which causes graph partitioning and limits performance optimizations by neuron. WARNING:Neuron: Your PythonOp is in: /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/models/deberta/modeling_deberta.py(678): forward WARNING:Neuron: Consider replacing this static code with a non-pythonOp code to potentially improve performance. WARNING:Neuron: Found Python static code known as PythonOp. Static methods are not torch operators and hence aren't compilable by torch_neuron which causes graph partitioning and limits performance optimizations by neuron. WARNING:Neuron: Your PythonOp is in: /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/models/deberta/modeling_deberta.py(678): forward WARNING:Neuron: Consider replacing this static code with a non-pythonOp code to potentially improve performance. WARNING:Neuron: Found Python static code known as PythonOp. Static methods are not torch operators and hence aren't compilable by torch_neuron which causes graph partitioning and limits performance optimizations by neuron. WARNING:Neuron: Your PythonOp is in: /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/models/deberta/modeling_deberta.py(678): forward WARNING:Neuron: Consider replacing this static code with a non-pythonOp code to potentially improve performance. 
WARNING:Neuron: Found Python static code known as PythonOp. Static methods are not torch operators and hence aren't compilable by torch_neuron which causes graph partitioning and limits performance optimizations by neuron. WARNING:Neuron: Your PythonOp is in: /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/models/deberta/modeling_deberta.py(678): forward WARNING:Neuron: Consider replacing this static code with a non-pythonOp code to potentially improve performance. INFO:Neuron:There are 8 ops of 2 different types in the TorchScript that are not compiled by neuron-cc: prim::PythonOp, aten::embedding, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/compiler/neuron-cc/neuron-cc-ops/neuron-cc-ops-pytorch.html) INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 416, fused = 413, percent fused = 99.28% WARNING:tensorflow:From /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuron/ops/aten.py:2277: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where INFO:Neuron:Compiler args type is value is ['--fast-math', 'none'] INFO:Neuron:Compiling function _NeuronGraph$294 with neuron-cc INFO:Neuron:Compiling with command line: '/home/ubuntu/Install/aws_neuron_venv_pytorch/bin/neuron-cc compile /tmp/tmpnzrrbpjs/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpnzrrbpjs/graph_def.neff --io-config {"inputs": {"0:0": [[2, 18], "int64"]}, "outputs": ["DebertaModel_4/DebertaEmbeddings_2/aten_slice_3/StridedSlice:0"]} --fast-math none --verbose 35' . 
Compiler status PASS INFO:Neuron:Compiler args type is value is ['--fast-math', 'none'] INFO:Neuron:Compiling function _NeuronGraph$295 with neuron-cc INFO:Neuron:Compiling with command line: '/home/ubuntu/Install/aws_neuron_venv_pytorch/bin/neuron-cc compile /tmp/tmptvy9tiit/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmptvy9tiit/graph_def.neff --io-config {"inputs": {"0:0": [[2, 18, 32], "float32"], "inp.1:0": [[1, 18, 32], "float32"], "inp:0": [[2, 18, 32], "float32"], "3:0": [[2, 18], "int64"]}, "outputs": ["DebertaModel_4/DebertaEmbeddings_2/aten_mul/mul:0", "DebertaModel_4/DebertaEncoder_3/DebertaLayer_24/DebertaAttention_3/DisentangledSelfAttention_2/aten_chunk/split:2", "DebertaModel_4/DebertaEncoder_3/DebertaLayer_24/DebertaAttention_3/DisentangledSelfAttention_2/aten_matmul/MatMul:0"]} --fast-math none --verbose 35' . Compiler status PASS INFO:Neuron:Compiler args type is value is ['--fast-math', 'none'] INFO:Neuron:Compiling function _NeuronGraph$296 with neuron-cc INFO:Neuron:Compiling with command line: '/home/ubuntu/Install/aws_neuron_venv_pytorch/bin/neuron-cc compile /tmp/tmpm6w1k8ko/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpm6w1k8ko/graph_def.neff --io-config {"inputs": {"0:0": [[2, 18], "int64"]}, "outputs": ["DebertaModel_4/DebertaEncoder_3/aten_to/Cast:0"]} --fast-math none --verbose 35' . 
Compiler status PASS INFO:Neuron:Compiler args type is value is ['--fast-math', 'none'] INFO:Neuron:Compiling function _NeuronGraph$297 with neuron-cc INFO:Neuron:Compiling with command line: '/home/ubuntu/Install/aws_neuron_venv_pytorch/bin/neuron-cc compile /tmp/tmp9gswgsn5/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp9gswgsn5/graph_def.neff --io-config {"inputs": {"0:0": [[2, 4, 18, 8], "float32"], "1:0": [[2, 4, 18, 18], "float32"], "2:0": [[2, 18, 32], "float32"]}, "outputs": ["DebertaModel_4/DebertaEncoder_3/DebertaLayer_24/DebertaOutput_5/DebertaLayerNorm_7/aten_add_1/add:0", "DebertaModel_4/DebertaEncoder_3/DebertaLayer_25/DebertaAttention_3/DisentangledSelfAttention_2/aten_chunk/split:2", "DebertaModel_4/DebertaEncoder_3/DebertaLayer_25/DebertaAttention_3/DisentangledSelfAttention_2/aten_matmul/MatMul:0"]} --fast-math none --verbose 35' . Compiler status PASS INFO:Neuron:Compiler args type is value is ['--fast-math', 'none'] INFO:Neuron:Compiling function _NeuronGraph$298 with neuron-cc INFO:Neuron:Compiling with command line: '/home/ubuntu/Install/aws_neuron_venv_pytorch/bin/neuron-cc compile /tmp/tmpn8e551ay/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpn8e551ay/graph_def.neff --io-config {"inputs": {"0:0": [[2, 4, 18, 8], "float32"], "1:0": [[2, 4, 18, 18], "float32"], "2:0": [[2, 18, 32], "float32"]}, "outputs": ["DebertaModel_4/DebertaEncoder_3/DebertaLayer_25/DebertaOutput_5/DebertaLayerNorm_7/aten_add_1/add:0", "DebertaModel_4/DebertaEncoder_3/DebertaLayer_26/DebertaAttention_3/DisentangledSelfAttention_2/aten_chunk/split:2", "DebertaModel_4/DebertaEncoder_3/DebertaLayer_26/DebertaAttention_3/DisentangledSelfAttention_2/aten_matmul/MatMul:0"]} --fast-math none --verbose 35' . 
Compiler status PASS INFO:Neuron:Compiler args type is value is ['--fast-math', 'none'] INFO:Neuron:Compiling function _NeuronGraph$299 with neuron-cc INFO:Neuron:Compiling with command line: '/home/ubuntu/Install/aws_neuron_venv_pytorch/bin/neuron-cc compile /tmp/tmpogvmifyw/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpogvmifyw/graph_def.neff --io-config {"inputs": {"0:0": [[2, 4, 18, 8], "float32"], "1:0": [[2, 4, 18, 18], "float32"], "2:0": [[2, 18, 32], "float32"]}, "outputs": ["DebertaModel_4/DebertaEncoder_3/DebertaLayer_26/DebertaOutput_5/DebertaLayerNorm_7/aten_add_1/add:0", "DebertaModel_4/DebertaEncoder_3/DebertaLayer_27/DebertaAttention_3/DisentangledSelfAttention_2/aten_chunk/split:2", "DebertaModel_4/DebertaEncoder_3/DebertaLayer_27/DebertaAttention_3/DisentangledSelfAttention_2/aten_matmul/MatMul:0"]} --fast-math none --verbose 35' . Compiler status PASS INFO:Neuron:Compiler args type is value is ['--fast-math', 'none'] INFO:Neuron:Compiling function _NeuronGraph$300 with neuron-cc INFO:Neuron:Compiling with command line: '/home/ubuntu/Install/aws_neuron_venv_pytorch/bin/neuron-cc compile /tmp/tmpqjn427zi/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpqjn427zi/graph_def.neff --io-config {"inputs": {"0:0": [[2, 4, 18, 8], "float32"], "1:0": [[2, 4, 18, 18], "float32"], "2:0": [[2, 18, 32], "float32"]}, "outputs": ["DebertaModel_4/DebertaEncoder_3/DebertaLayer_27/DebertaOutput_5/DebertaLayerNorm_7/aten_add_1/add:0", "DebertaModel_4/DebertaEncoder_3/DebertaLayer_28/DebertaAttention_3/DisentangledSelfAttention_2/aten_chunk/split:2", "DebertaModel_4/DebertaEncoder_3/DebertaLayer_28/DebertaAttention_3/DisentangledSelfAttention_2/aten_matmul/MatMul:0"]} --fast-math none --verbose 35' . 
Compiler status PASS INFO:Neuron:Compiler args type is value is ['--fast-math', 'none'] INFO:Neuron:Compiling function _NeuronGraph$301 with neuron-cc INFO:Neuron:Compiling with command line: '/home/ubuntu/Install/aws_neuron_venv_pytorch/bin/neuron-cc compile /tmp/tmpilyduutw/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpilyduutw/graph_def.neff --io-config {"inputs": {"0:0": [[2, 4, 18, 8], "float32"], "1:0": [[2, 4, 18, 18], "float32"], "2:0": [[2, 18, 32], "float32"]}, "outputs": ["Linear_7/aten_linear/Add:0"]} --fast-math none --verbose 35' . Compiler status PASS INFO:Neuron:Number of arithmetic operators (post-compilation) before = 416, compiled = 413, percent compiled = 99.28% INFO:Neuron:The neuron partitioner created 8 sub-graphs INFO:Neuron:Neuron successfully compiled 8 sub-graphs, Total fused subgraphs = 8, Percent of model sub-graphs successfully compiled = 100.0% INFO:Neuron:Compiled these operators (and operator counts) to Neuron: INFO:Neuron: => aten::Int: 41 INFO:Neuron: => aten::add: 42 INFO:Neuron: => aten::add_: 2 INFO:Neuron: => aten::chunk: 5 INFO:Neuron: => aten::contiguous: 5 INFO:Neuron: => aten::detach: 10 INFO:Neuron: => aten::div: 16 INFO:Neuron: => aten::gelu: 6 INFO:Neuron: => aten::linear: 22 INFO:Neuron: => aten::matmul: 10 INFO:Neuron: => aten::mean: 22 INFO:Neuron: => aten::mul: 18 INFO:Neuron: => aten::permute: 20 INFO:Neuron: => aten::pow: 11 INFO:Neuron: => aten::select: 1 INFO:Neuron: => aten::size: 46 INFO:Neuron: => aten::slice: 13 INFO:Neuron: => aten::sqrt: 16 INFO:Neuron: => aten::squeeze: 1 INFO:Neuron: => aten::sub: 22 INFO:Neuron: => aten::to: 35 INFO:Neuron: => aten::transpose: 5 INFO:Neuron: => aten::unsqueeze: 24 INFO:Neuron: => aten::view: 20 INFO:Neuron:Not compiled operators (and operator counts) to Neuron: INFO:Neuron: => aten::embedding: 3 [not supported] Traceback (most recent call last): File "test_min.py", line 18, in export( File 
"/home/ubuntu/optimum-neuron/optimum/neuron/exporter/convert.py", line 150, in export export_neuron(model, config, output) File "/home/ubuntu/optimum-neuron/optimum/neuron/exporter/convert.py", line 271, in export_neuron neuron_model.save(output) File "/home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/jit/_script.py", line 714, in save return self._c.save(str(f), **kwargs) RuntimeError: Could not export Python function call 'XSoftmax'. Remove calls to Python functions before export. Did you forget to add @script or @script_method annotation? If this is a nn.ModuleList, add it to __constants__: /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuron/native_ops/prim.py(46): PythonOp /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuron/graph.py(342): __call__ /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuron/graph.py(209): run_op /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuron/graph.py(198): __call__ /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuron/runtime.py(69): forward /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py(1118): _slow_forward /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py(1130): _call_impl /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/jit/_trace.py(967): trace_module /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/jit/_trace.py(750): trace /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuron/tensorboard.py(324): tb_parse /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuron/tensorboard.py(550): tb_graph /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuron/decorators.py(491): 
maybe_generate_tb_graph_def /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuron/convert.py(552): maybe_determine_names_from_tensorboard /home/ubuntu/Install/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuron/convert.py(211): trace /home/ubuntu/optimum-neuron/optimum/neuron/exporter/convert.py(270): export_neuron /home/ubuntu/optimum-neuron/optimum/neuron/exporter/convert.py(150): export test_min.py(18): ```

Short version: Could not export Python function call 'XSoftmax'. Remove calls to Python functions before export. Did you forget to add @script or @script_method annotation?

  • Reproduction

```python
import copy
import os
from pathlib import Path

from transformers import AutoModelForSequenceClassification

from optimum.exporters.neuron import export, validate_model_outputs
from optimum.exporters.neuron.model_configs import DebertaNeuronConfig

model_id = "hf-internal-testing/tiny-random-DebertaModel"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
reference_model = copy.deepcopy(model)
neuron_config = DebertaNeuronConfig(
    config=model.config, task="sequence-classification", batch_size=2, sequence_length=18
)
output_path = Path("model.pt")

export(
    model=model,
    config=neuron_config,
    output=output_path,
    auto_cast="none",
)

run = 10
value_failures = []
for _ in range(run):
    validate_model_outputs(
        config=neuron_config,
        reference_model=reference_model,
        neuron_model_path=output_path,
        neuron_named_outputs=["logits"],
    )
```

Tracing appears to fail on DeBERTa’s masked softmax (XSoftmax) because it is a custom autograd function. Since the neuron-cc compiler targets inference and does not need to support the backward pass, XSoftmax could be replaced with:

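Hacky fix

```python
import torch


class XSoftmax:
    @staticmethod
    def apply(input, mask, dim):
        # Invert the mask: True marks positions to exclude from the softmax.
        rmask = ~(mask.to(torch.bool))

        # Fill excluded positions with the dtype minimum before the softmax.
        output = input.masked_fill(rmask, torch.tensor(torch.finfo(input.dtype).min))
        output = torch.softmax(output, dim)
        # Explicitly zero the excluded positions in the result.
        output.masked_fill_(rmask, 0)

        return output
```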

XDropout has the same issue, but I guess it would be fine with model.eval().

Possible solution

If it makes sense, we could add a model patcher to optimum-neuron for models with custom autograd functions. Or does the Annapurna team have a better suggestion?
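One way such a patcher could look is a context manager that temporarily swaps an attribute and restores it afterwards, so the traceable replacement is only active during export. The sketch below is only an illustration: patch_attribute is a hypothetical helper, and fake_modeling is a stand-in object rather than the real transformers DeBERTa module.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def patch_attribute(owner, name, replacement):
    """Temporarily replace owner.<name> with replacement (hypothetical helper)."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield
    finally:
        # Always restore the original, even if the export raises.
        setattr(owner, name, original)


# Stand-in for a modeling module that exposes a custom autograd function.
fake_modeling = SimpleNamespace(XSoftmax="autograd version")

with patch_attribute(fake_modeling, "XSoftmax", "traceable version"):
    inside = fake_modeling.XSoftmax  # the export call would go here

after = fake_modeling.XSoftmax
```

Restoring the attribute in a finally block keeps the original implementation available for training code that still needs the backward pass.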

[Hugging Face] neuronx compiler unusual behaviors on ConvBERT / XLM / FlauBERT inference

opened on 2023-03-18 15:43:09 by JingyaHuang
  • System information
    • OS: Ubuntu 20.04.5 LTS
    • aws-neuronx-runtime-discovery 2.9
    • libneuronxla 0.5.101
    • neuronx-cc 2.4.0.21+b7621be18
    • neuronx-hwm 2.4.0.1+90172456c
    • torch 1.13.1
    • torch-neuronx 1.13.0.1.4.0
    • torch-xla 1.13.0+torchneuron3
    • torchvision 0.14.1
    • transformers 4.26.1
    • aws-neuronx-collectives/unknown,now 2.11.47.0-36959342f amd64 [installed]
    • aws-neuronx-dkms/unknown,now 2.7.15.0 amd64 [installed,upgradable to: 2.7.33.0]
    • aws-neuronx-runtime-lib/unknown,now 2.11.43.0-45b29be2c amd64 [installed]
    • aws-neuronx-tools/unknown,now 2.7.2.0 amd64 [installed,upgradable to: 2.8.2.0]

  • Issue 1: ConvBERT - tracing failed on INF2 with neuronx-cc

Error log
``` 03/01/2023 01:28:13 PM ERROR 59059 [Tensorizer]: Transformation error on operator: _dot.1235 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: *************************************************************** 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: An Internal Compiler Error has occurred 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: *************************************************************** 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: Error message: 13824i_0+768i_1+k 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: Error class: AssertionError 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: Error location: Unknown 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: Command line: /home/ubuntu/aws_neuron_venv_pytorch/bin/neuronx-cc compile /tmp/tmptflduq47/model --framework XLA --target trn1 --output /tmp/tmptflduq47/graph.neff 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: Internal details: 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/driver/CommandDriver.py", line 235, in neuronxcc.driver.CommandDriver.CommandDriver.run 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/driver/commands/CompileCommand.py", line 1014, in neuronxcc.driver.commands.CompileCommand.CompileCommand.run 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/driver/commands/CompileCommand.py", line 965, in neuronxcc.driver.commands.CompileCommand.CompileCommand.runPipeline 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/driver/commands/CompileCommand.py", line 990, in neuronxcc.driver.commands.CompileCommand.CompileCommand.runPipeline 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/driver/commands/CompileCommand.py", line 994, in neuronxcc.driver.commands.CompileCommand.CompileCommand.runPipeline 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File 
"neuronxcc/driver/Job.py", line 300, in neuronxcc.driver.Job.SingleInputJob.run 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/driver/Job.py", line 326, in neuronxcc.driver.Job.SingleInputJob.runOnState 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/driver/Pipeline.py", line 30, in neuronxcc.driver.Pipeline.Pipeline.runSingleInput 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/driver/Job.py", line 300, in neuronxcc.driver.Job.SingleInputJob.run 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/driver/Job.py", line 326, in neuronxcc.driver.Job.SingleInputJob.runOnState 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/driver/jobs/Frontend.py", line 591, in neuronxcc.driver.jobs.Frontend.Frontend.runSingleInput 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/driver/jobs/Frontend.py", line 387, in neuronxcc.driver.jobs.Frontend.Frontend.runXLAFrontend 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/Frontend.py", line 168, in neuronxcc.starfish.penguin.Frontend.tensorizeXla 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/Frontend.py", line 243, in neuronxcc.starfish.penguin.Frontend.tensorizeXlaImpl 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/Frontend.py", line 244, in neuronxcc.starfish.penguin.Frontend.tensorizeXlaImpl 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/Frontend.py", line 266, in neuronxcc.starfish.penguin.Frontend.tensorizeXlaImpl 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/Compile.py", line 183, in neuronxcc.starfish.penguin.Compile.compile_cu 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/Compile.py", line 185, in neuronxcc.starfish.penguin.Compile.compile_cu 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File 
"neuronxcc/starfish/penguin/Compile.py", line 195, in neuronxcc.starfish.penguin.Compile.compile_cu 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 432, in neuronxcc.starfish.penguin.DotTransform.PassManager.transformFunction 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 442, in neuronxcc.starfish.penguin.DotTransform.PassManager.transformFunction 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 152, in neuronxcc.starfish.penguin.DotTransform.DotTransform.run_or_rollback 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 196, in neuronxcc.starfish.penguin.DotTransform.DotTransform.run_with_exception_handling 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 178, in neuronxcc.starfish.penguin.DotTransform.DotTransform.run_with_exception_handling 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 208, in neuronxcc.starfish.penguin.DotTransform.DotTransform.timed_run_ 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 210, in neuronxcc.starfish.penguin.DotTransform.DotTransform.timed_run_ 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 211, in neuronxcc.starfish.penguin.DotTransform.DotTransform.timed_run_ 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 381, in neuronxcc.starfish.penguin.DotTransform.IterativeTransform.run_ 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 382, in neuronxcc.starfish.penguin.DotTransform.IterativeTransform.run_ 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File 
"neuronxcc/starfish/penguin/DotTransform.py", line 396, in neuronxcc.starfish.penguin.DotTransform.IterativeTransform.iterate 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 334, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transformFunction 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 329, in neuronxcc.starfish.penguin.DotTransform.DotTransform.runTransforms 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 318, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transformStmts 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 133, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transform 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 366, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transformBasicBlock 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 369, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transformBasicBlock 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 133, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transform 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 356, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transformStmt 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 133, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transform 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 356, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transformStmt 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File 
"neuronxcc/starfish/penguin/DotTransform.py", line 133, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transform 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 356, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transformStmt 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 133, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transform 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 356, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transformStmt 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/DotTransform.py", line 133, in neuronxcc.starfish.penguin.DotTransform.DotTransform.transform 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/targets/tonga/passes/CommuteConcat.py", line 770, in neuronxcc.starfish.penguin.targets.tonga.passes.CommuteConcat.CommuteConcat.transformTensorContractOp 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/targets/tonga/passes/CommuteConcat.py", line 703, in neuronxcc.starfish.penguin.targets.tonga.passes.CommuteConcat.should_rewrite_access_optimize_tc_operand 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/targets/tonga/passes/CommuteConcat.py", line 446, in neuronxcc.starfish.penguin.targets.tonga.passes.CommuteConcat.extract_weight_contract_dim 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: File "neuronxcc/starfish/penguin/ir/AffineExpr.py", line 74, in neuronxcc.starfish.penguin.ir.AffineExpr.AsTrivialExpr 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: Version information: 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: NeuronX Compiler version 2.4.0.21+b7621be18 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: 03/01/2023 01:28:13 PM ERROR 59059 
[neuronx-cc]: HWM version 2.4.0.1-90172456c 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: NEFF version Dynamic 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: TVM not available 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: NumPy version 1.20.3 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: MXNet not available 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: 03/01/2023 01:28:13 PM ERROR 59059 [neuronx-cc]: Artifacts stored in: /home/ubuntu/optimum-neuron/neuronxcc-n8epgszg ```
  • Issue 2: XLM and FlauBERT show a larger-than-usual output difference (the usual is around 1e-5) on Inf2 compared to PyTorch on CPU
    • XLM

E AssertionError: xlm, sequence-classification -> The maximum absolute difference between the output of the reference model and the Neuron exported model is not within the set tolerance 0.0001:
E - logits: max diff = 0.09034809470176697

    • FlauBERT

E AssertionError: flaubert, sequence-classification -> The maximum absolute difference between the output of the reference model and the Neuron exported model is not within the set tolerance 0.0001:
E - logits: max diff = 0.04413249343633652

[Reproduction]

```python
import copy
from pathlib import Path

from transformers import AutoModelForSequenceClassification

from optimum.exporters.neuron import export, validate_model_outputs
from optimum.exporters.neuron.model_configs import (
    BertNeuronConfig,
    DistilBertNeuronConfig,
    FlaubertNeuronConfig,
    XLMNeuronConfig,
)

# Pick one model_id and switch neuron_config below to the matching class.
model_id = "YituTech/conv-bert-base"
# model_id = "hf-internal-testing/tiny-random-flaubert"
# model_id = "hf-internal-testing/tiny-random-XLMModel"

model = AutoModelForSequenceClassification.from_pretrained(model_id)
reference_model = copy.deepcopy(model)
neuron_config = BertNeuronConfig(config=model.config, task="sequence-classification", batch_size=2, sequence_length=18)
# neuron_config = FlaubertNeuronConfig(config=model.config, task="sequence-classification", batch_size=2, sequence_length=18)
# neuron_config = XLMNeuronConfig(config=model.config, task="sequence-classification", batch_size=2, sequence_length=18)

output_path = Path("model.pt")

export(
    model=model,
    config=neuron_config,
    output=output_path,
)

# Validation
run = 10
value_failures = []
for _ in range(run):
    validate_model_outputs(
        config=neuron_config,
        reference_model=reference_model,
        neuron_model_path=output_path,
        neuron_named_outputs=["logits"],
    )
```

For flaubert and xlm, the large-difference warning only shows up after lowering the ATOL in the optimum.exporters.neuron exporter; I set it quite high for the moment to keep the check silent:

cf.
https://github.com/huggingface/optimum-neuron/blob/3a8d6d1771cfda60fb659691a6df8808ee300033/optimum/neuron/exporter/model_configs.py#L45-L46
https://github.com/huggingface/optimum-neuron/blob/3a8d6d1771cfda60fb659691a6df8808ee300033/optimum/neuron/exporter/model_configs.py#L57-L58
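The assertion messages in this issue compare a maximum absolute difference against a tolerance. A rough sketch of that kind of check in plain Python follows. `max_abs_diff` and `within_atol` are hypothetical helpers written for illustration, not the optimum-neuron implementation, and the logits are made-up values.

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def within_atol(reference, candidate, atol):
    """Return (ok, diff), where ok is True when diff <= atol."""
    diff = max_abs_diff(reference, candidate)
    return diff <= atol, diff


# Made-up logits, only to exercise the check (not measured values).
ref_logits = [0.12, -0.50, 0.33]
neuron_logits = [0.12, -0.45, 0.33]

for atol in (1e-4, 1e-1):
    ok, diff = within_atol(ref_logits, neuron_logits, atol)
    print(f"atol={atol}: max diff={diff:.4f}, within tolerance={ok}")
```

With a loose tolerance the same outputs pass silently, which is why the difference for flaubert and xlm only surfaces once the ATOL is lowered.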

Internal Compiler Error when compiling GPT2

opened on 2023-03-17 17:23:16 by gnawpaul

I'm trying to compile a GPT2 model and got an error message. The model compiled successfully in torchscript. I believe that it's rooted in some broadcasting issue.

Here's what I ran:
```python
import torch
import torch_neuron  # registers the torch.neuron namespace
from transformers import GPT2Tokenizer, GPT2Model

# model_path is not given in this report; it points to the GPT2 checkpoint being compiled.
tokenizer = GPT2Tokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
model = GPT2Model.from_pretrained(model_path, torchscript=True, return_dict=False)
model.eval()

neuroncore_pipeline_cores = 4
sequence_size = 1024

inputs = tokenizer(["This is a test"], return_tensors="pt", max_length=sequence_size, padding='max_length')
neuron_pipeline_model = torch.neuron.trace(
    model,
    example_inputs=inputs['input_ids'],
    compiler_args=['--neuroncore-pipeline-cores', str(neuroncore_pipeline_cores)],
)
```

Error message: ``` INFO:Neuron:There are 2 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/compiler/neuron-cc/neuron-cc-ops/neuron-cc-ops-pytorch.html) INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 2444, fused = 2433, percent fused = 99.55%

WARNING:tensorflow:From /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/torch_neuron/ops/aten.py:2277: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where

INFO:Neuron:Compiler args type is value is ['--neuroncore-pipeline-cores', '4'] INFO:Neuron:Compiling function _NeuronGraph$956 with neuron-cc INFO:Neuron:Compiling with command line: '/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /tmp/tmpi916968s/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpi916968s/graph_def.neff --io-config {"inputs": {"0:0": [[1, 1024], "int64"], "1:0": [[], "int64"], "2:0": [[1, 1024, 1024], "float32"], "3:0": [[1, 1024, 1024], "float32"]}, "outputs": ["aten_view/Reshape:0", "GPT2Block_90/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_90/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_110/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_110/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_112/GPT2Attention_5/aten_permute_1/transpose:0", 
"GPT2Block_112/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_114/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_114/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_2/transpose:0"]} --neuroncore-pipeline-cores 4 --verbose 35'

... 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: ********* 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: An Internal Compiler Error has occurred 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: ********* 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Error message: Traceback (most recent call last): [bt] (8) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x8edf26) [0x7f1c32443f26] [bt] (7) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x8f2392) [0x7f1c32448392] [bt] (6) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x8e8bd1) [0x7f1c3243ebd1] [bt] (5) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x8e673b) [0x7f1c3243c73b] [bt] (4) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x8ebf11) [0x7f1c32441f11] [bt] (3) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x9afb50) [0x7f1c32505b50] [bt] (2) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0xaa6840) [0x7f1c325fc840] [bt] (1) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0xac629b) [0x7f1c3261c29b] [bt] (0) /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/libtvm.so(+0x5edf75) [0x7f1c32143f75] File "/opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.13.0.0/AL2_x86_64/generic-flavor/src/topi/include/topi/broadcast.h", line 32 TVMError: Check failed: output_shape.size() >= t->shape.size() (0 vs. 1) : Not a broadcast, output dimensionality smaller than input. 
output: [] vs input: Tensor(shape=[1], op.name=dram://placeholder15) 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Error class: TVMError 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Error location: Unknown 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Command line: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /tmp/tmpi916968s/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpi916968s/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 1024], "int64"], "1:0": [[], "int64"], "2:0": [[1, 1024, 1024], "float32"], "3:0": [[1, 1024, 1024], "float32"]}, "outputs": ["aten_view/Reshape:0", "GPT2Block_90/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_90/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_110/GPT2Attention_5/aten_permute_1/transpose:0", 
"GPT2Block_110/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_112/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_112/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_114/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_114/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_2/transpose:0"]}' --neuroncore-pipeline-cores 4 --verbose 35 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Internal details: 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/CommandDriver.py", line 224, in neuroncc.driver.CommandDriver.CommandDriver.run 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File 
"neuroncc/driver/commands/CompileCommand.py", line 576, in neuroncc.driver.commands.CompileCommand.CompileCommand.run 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/commands/CompileCommand.py", line 554, in neuroncc.driver.commands.CompileCommand.CompileCommand.runPipeline 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/commands/CompileCommand.py", line 558, in neuroncc.driver.commands.CompileCommand.CompileCommand.runPipeline 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/Job.py", line 289, in neuroncc.driver.Job.SingleInputJob.run 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/Job.py", line 289, in neuroncc.driver.Job.SingleInputJob.run 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/Job.py", line 289, in neuroncc.driver.Job.SingleInputJob.run 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/jobs/Frontend.py", line 427, in neuroncc.driver.jobs.Frontend.Frontend.runSingleInput 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/jobs/Frontend.py", line 377, in neuroncc.driver.jobs.Frontend.Frontend.runTVMFrontend 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/jobs/Frontend.py", line 378, in neuroncc.driver.jobs.Frontend.Frontend.runTVMFrontend 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "neuroncc/driver/jobs/Frontend.py", line 382, in neuroncc.driver.jobs.Frontend.Frontend.runTVMFrontend 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/relay/build_module.py", line 731, in build_graph 03/17/2023 12:00:36 AM 
ERROR 12345 [neuron-cc]: func = pre_partitioning_passes(func, params) 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/relay/build_module.py", line 282, in pre_partitioning_passes 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: func = ir_pass.fold_constant(func) 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/relay/ir_pass.py", line 707, in fold_constant 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: return _ir_pass.FoldConstant(expr) 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: File "/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/tvm/_ffi/_ctypes/function.py", line 190, in call 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: raise get_last_ffi_error() 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Version information: 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Neuron Compiler version 1.13.5.0+7dcf000a6 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]:
03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: HWM version 1.13.0.0-0 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: NEFF version Dynamic 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: TVM version 1.13.0.0+0 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: NumPy version 1.18.5 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: MXNet not available 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: TF not available 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: 03/17/2023 12:00:36 AM ERROR 12345 [neuron-cc]: Artifacts stored in: /tmp/tmpi916968s

Compiler status ERROR

INFO:Neuron:Compile command returned: 1 WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$956; falling back to native python function call ERROR:Neuron:neuron-cc failed with the following command line call: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /tmp/tmpi916968s/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpi916968s/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 1024], "int64"], "1:0": [[], "int64"], "2:0": [[1, 1024, 1024], "float32"], "3:0": [[1, 1024, 1024], "float32"]}, "outputs": ["aten_view/Reshape:0", "GPT2Block_90/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_90/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_110/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_110/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_112/GPT2Attention_5/aten_permute_1/transpose:0", 
"GPT2Block_112/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_114/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_114/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_2/transpose:0"]}' --neuroncore-pipeline-cores 4 --verbose 35 Traceback (most recent call last): File "/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/torch_neuron/convert.py", line 392, in op_converter item, inputs, compiler_workdir=sg_workdir, **kwargs) File "/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/torch_neuron/decorators.py", line 229, in trace 'neuron-cc failed with the following command line call:\n{}'.format(command)) 
subprocess.SubprocessError: neuron-cc failed with the following command line call: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /tmp/tmpi916968s/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpi916968s/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 1024], "int64"], "1:0": [[], "int64"], "2:0": [[1, 1024, 1024], "float32"], "3:0": [[1, 1024, 1024], "float32"]}, "outputs": ["aten_view/Reshape:0", "GPT2Block_90/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_90/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_92/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_94/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_96/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_98/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_100/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_102/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_104/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_106/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_108/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_110/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_110/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_112/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_112/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_114/GPT2Attention_5/aten_permute_1/transpose:0", 
"GPT2Block_114/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_116/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_118/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_120/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_122/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_124/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_126/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_128/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_130/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_132/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_134/GPT2Attention_5/aten_permute_2/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_1/transpose:0", "GPT2Block_136/GPT2Attention_5/aten_permute_2/transpose:0"]}' --neuroncore-pipeline-cores 4 --verbose 35 INFO:Neuron:Number of arithmetic operators (post-compilation) before = 2444, compiled = 0, percent compiled = 0.0% INFO:Neuron:The neuron partitioner created 1 sub-graphs INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0% INFO:Neuron:Compiled these operators (and operator counts) to Neuron: INFO:Neuron:Not compiled operators (and operator counts) to Neuron: INFO:Neuron: => aten::Int: 557 [supported] INFO:Neuron: => aten::ScalarImplicit: 25 [supported] INFO:Neuron: => aten::add: 98 
[supported] INFO:Neuron: => aten::addmm: 96 [supported] INFO:Neuron: => aten::arange: 1 [supported] INFO:Neuron: => aten::contiguous: 24 [supported] INFO:Neuron: => aten::div: 24 [supported] INFO:Neuron: => aten::dropout: 73 [supported] INFO:Neuron: => aten::embedding: 2 [not supported] INFO:Neuron: => aten::full: 48 [supported] INFO:Neuron: => aten::layer_norm: 49 [supported] INFO:Neuron: => aten::matmul: 48 [supported] INFO:Neuron: => aten::mul: 96 [supported] INFO:Neuron: => aten::permute: 96 [supported] INFO:Neuron: => aten::pow: 48 [supported] INFO:Neuron: => aten::size: 555 [supported] INFO:Neuron: => aten::slice: 96 [supported] INFO:Neuron: => aten::softmax: 24 [supported] INFO:Neuron: => aten::split: 24 [supported] INFO:Neuron: => aten::sub: 24 [supported] INFO:Neuron: => aten::tanh: 24 [supported] INFO:Neuron: => aten::to: 72 [supported] INFO:Neuron: => aten::transpose: 24 [supported] INFO:Neuron: => aten::unsqueeze: 1 [supported] INFO:Neuron: => aten::view: 291 [supported] INFO:Neuron: => aten::where: 24 [not supported]


RuntimeError Traceback (most recent call last) in

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/torch_neuron/convert.py in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, single_fusion_ratio_threshold, _neuron_trace, compiler_args, optimizations, verbose, kwargs) 193 logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line())) 194 neuron_graph = cu.compile_fused_operators(neuron_graph, compile_kwargs) --> 195 cu.stats_post_compiler(neuron_graph) 196 197 # Wrap the compiled version of the model in a script module. Note that this is

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/torch_neuron/convert.py in stats_post_compiler(self, neuron_graph) 501 if succesful_compilations == 0 and not self.allow_no_ops_on_neuron: 502 raise RuntimeError( --> 503 "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!") 504 505 if percent_operations_compiled < 50.0:

RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace! ```

Based on the broadcasting error message and some experiments, I narrowed it down to a few lines in transformers.models.gpt2.modeling_gpt2.GPT2Attention._attn, where the attention weights are divided by a zero-dimensional tensor that has to be broadcast:

    if self.scale_attn_weights:
        attn_weights = attn_weights / torch.full(
            [], value.size(-1) ** 0.5, dtype=attn_weights.dtype, device=attn_weights.device
        )

Changing the divisor into a scalar, or into a tensor built with torch.full_like, seemed to work. A minimal reproducible example in my environment:

    def test_divide(inputs):
        inputs = inputs / torch.full([], 2, dtype=inputs.dtype, device=inputs.device)
        return inputs

    neuron_pipeline_model = torch.neuron.trace(
        test_divide,
        example_inputs=torch.ones(2, 2, 512),
        verbose="debug",
    )

Tracing with TorchScript worked fine:

    traced_model = torch.jit.trace(test_divide, torch.ones(2, 2, 512))
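The check quoted in the TVMError (`output_shape.size() >= t->shape.size()`) can be mimicked in plain Python to see why an empty output shape against a shape-[1] input is rejected. This is only an illustrative sketch: `check_broadcast` is a hypothetical helper, not part of TVM or Neuron, and it assumes NumPy-style right-aligned broadcasting.

```python
def check_broadcast(output_shape, input_shape):
    """Hypothetical helper mirroring the rank check from the error message."""
    if len(output_shape) < len(input_shape):
        raise ValueError(
            "Not a broadcast, output dimensionality smaller than input: "
            f"output {list(output_shape)} vs input {list(input_shape)}"
        )
    # Right-align the shapes; each input dim must be 1 or equal to the output dim.
    offset = len(output_shape) - len(input_shape)
    for i, dim in enumerate(input_shape):
        out_dim = output_shape[offset + i]
        if dim != 1 and dim != out_dim:
            raise ValueError(
                f"dimension {i} of input ({dim}) cannot broadcast to {out_dim}"
            )
    return True


# A zero-dimensional divisor (shape []) broadcasts to the repro's input shape.
print(check_broadcast([2, 2, 512], []))

# The failing case from the log: output [] vs input of shape [1].
try:
    check_broadcast([], [1])
except ValueError as err:
    print(err)
```

Under these assumptions the failure is about rank alone: the shape-[1] input has more dimensions than the shape-[] output. Why the compiler's constant folding ends up with that pair of shapes is not something this sketch explains.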

My environment is a SageMaker notebook instance:

aws-neuron-dkms.noarch 2.2.8.0-dkms @neuron
aws-neuron-runtime-base.x86_64 1.6.21.0-1 @neuron
aws-neuron-tools.x86_64 2.0.790.0-1 @neuron
tensorflow-model-server-neuronx.x86_64 2.8.0.2.5.6.0-0 @neuron

torch 1.12.1
torch-neuron 1.12.1.2.5.8.0
torcheia 1.0.0
torchvision 0.13.1
neuron-cc 1.13.5.0+7dcf000a6

Inferentia outperforms GPU on ResNet while showing worse performance on MobileNet

opened on 2023-03-17 05:07:58 by pEAceATLast53

I compared the throughput of Inferentia (inf1.xlarge) and an NVIDIA T4 GPU (g4dn.xlarge) on MobileNetV2, ResNet50, and ResNet101. Here are the results.

  • Inferentia throughput (inferences / sec)
    • MobileNetV2 : 557
    • ResNet50 : 555
    • ResNet101 : 481

  • T4 GPU throughput (inferences / sec)
    • MobileNetV2 : 718
    • ResNet50 : 314
    • ResNet101 : 175

We tested GPU on TensorFlow 2.3 models, and compiled the same models with tensorflow-neuron 2.3.4 to run on Inferentia. We compiled the models with Neuron batch size 4 and fast-math=none. When running the tests, each model occupied the entire device - for Inferentia a single model occupied all 4 NeuronCores. We spawned a separate process for each NeuronCore, resulting in 4 processes.

We found that while Inferentia showed similar performance for ResNet and MobileNetV2, the performance of GPU varied greatly depending on the model, ranging from 175 inferences per second for ResNet101 to 718 inferences per second for MobileNetV2.
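The ratios behind that observation can be worked out directly from the figures reported above. The snippet below is just arithmetic on those six numbers; the variable names are illustrative.

```python
# Throughput in inferences/sec, copied from the measurements reported above.
inferentia = {"MobileNetV2": 557, "ResNet50": 555, "ResNet101": 481}
t4_gpu = {"MobileNetV2": 718, "ResNet50": 314, "ResNet101": 175}

# A ratio above 1 means Inferentia is faster than the T4 for that model.
ratios = {name: inferentia[name] / t4_gpu[name] for name in inferentia}

for name, ratio in ratios.items():
    print(f"{name}: Inferentia/T4 = {ratio:.2f}x")
```

This gives roughly 0.78x for MobileNetV2, 1.77x for ResNet50, and 2.75x for ResNet101. Inferentia throughput drops by only about 14% from MobileNetV2 to ResNet101, while T4 throughput drops by about 76%.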

What could have caused Inferentia to outperform GPU by a large margin on ResNet, while underperforming on MobileNetV2?

Update torch module version to 1.13.1

opened on 2023-03-02 17:23:28 by YuryShchanouskiTR

Could you please update the torch module version to 1.13.1 for the next release in order to close a critical security flaw?

CVE-2022-45907 | CWE-77 Arbitrary Code Execution: torch is vulnerable to arbitrary code execution. The vulnerability exists in annotations.py because the input passed to eval is not properly validated, which allows an attacker to take over an existing account and execute malicious code on the system.

P.S. Please suggest any timelines for that.
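For the upgrade requested in this issue, one quick way to see whether an environment already has torch at 1.13.1 or later is to compare the installed version against that release. The sketch below is an assumption-laden illustration: `parse_release` and `is_patched` are hypothetical helpers, only the leading numeric part of the version string is compared, and pre-release or vendor suffixes are ignored.

```python
import re
from importlib import metadata

PATCHED = (1, 13, 1)  # the version requested in the issue above


def parse_release(version):
    """Extract the leading numeric release, e.g. '1.12.1+cpu' -> (1, 12, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. '1.13') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_patched(version):
    """True when the numeric release is at least PATCHED."""
    return parse_release(version) >= PATCHED


try:
    installed = metadata.version("torch")
    print(f"torch {installed}: patched={is_patched(installed)}")
except metadata.PackageNotFoundError:
    print("torch is not installed in this environment")
```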

Releases

Neuron SDK Release - February 24, 2023 2023-02-24 23:49:30

What’s New

This release adds support for EC2 Inf2 instances, introduces initial inference support with TensorFlow 2.x Neuron (tensorflow-neuronx) on Trn1 and Inf2, and introduces minor enhancements and bug fixes.

This release introduces the following:

What’s New | Details
-- | --
Support for EC2 Inf2 instances | Inference support for Inf2 instances in PyTorch Neuron (torch-neuronx); inference support for Inf2 instances in TensorFlow 2.x Neuron (tensorflow-neuronx); overall documentation update to include Inf2 instances
TensorFlow 2.x Neuron (tensorflow-neuronx) support | This release introduces initial inference support with TensorFlow 2.x Neuron (tensorflow-neuronx) on Trn1 and Inf2
New Neuron GitHub samples | New sample scripts for deploying LLM models with transformers-neuronx under the aws-neuron-samples GitHub repository; new sample scripts for deploying models with torch-neuronx under the aws-neuron-samples GitHub repository
Minor enhancements and bug fixes | See Neuron Components Release Notes
Release included packages | See Release Content

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

What’s New

Neuron SDK Release - February 08, 2023 2023-02-09 03:13:04

This release introduces new capabilities and libraries, as well as features and tools that improve usability. This release introduces the following:

What’s New | Details
-- | --
PyTorch 1.13 | Support for PyTorch 1.13 in PyTorch Neuron (torch-neuronx). For resources, see PyTorch Neuron.
PyTorch DistributedDataParallel (DDP) API | Support for the PyTorch DistributedDataParallel (DDP) API in PyTorch Neuron (torch-neuronx). For resources on how to use the PyTorch DDP API with Neuron, see the Distributed Data Parallel Training Tutorial.
Inference support in torch-neuronx | For more details, visit the pytorch-neuronx-main page. You can also try the Neuron inference samples at https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx in the aws-neuron-samples GitHub repo.
Neuron Custom C++ Operators [Experimental] | Initial support for Neuron Custom C++ Operators [Experimental]. With Neuron Custom C++ Operators (“CustomOps”) you can now write CustomOps that run on NeuronCore-v2 chips. For more resources, see the Neuron Custom C++ Operators [Experimental] section.
transformers-neuronx [Experimental] | transformers-neuronx is a new library enabling LLM inference. It contains models that are checkpoint-compatible with HuggingFace Transformers, and currently supports Transformer decoder models like GPT2, GPT-J and OPT. See the aws-neuron-samples repository.
Neuron sysfs filesystem | The Neuron sysfs filesystem exposes Neuron Devices under /sys/devices/virtual/neuron_device, providing visibility into the Neuron Driver and Runtime at the system level. With simple command-line operations such as reading or writing a sysfs file, you can get information such as Neuron Runtime status, memory usage, and Driver info. For resources about the Neuron sysfs filesystem, see the Neuron Sysfs User Guide.
TFLOPS support in Neuron System Tools | Neuron System Tools now also report the model's actual TFLOPS rate in both neuron-monitor and neuron-top. More details can be found in the Neuron Tools documentation.
New sample scripts for training | This release adds multiple new sample scripts for training models with torch-neuronx. See the aws-neuron-samples repository.
New sample scripts for inference | This release adds multiple new sample scripts for deploying models with torch-neuronx. See the aws-neuron-samples repository.
Neuron GitHub samples repository for Amazon EKS | A new AWS Neuron GitHub samples repository for Amazon EKS. See the aws-neuron-samples repository.
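Because the Neuron sysfs entries are ordinary files, they can be read with standard file I/O. The sketch below walks the directory named in the table and collects whatever readable files it finds. The layout beneath `/sys/devices/virtual/neuron_device` is not described here, so the code makes no assumption about file names; `read_sysfs_tree` is a hypothetical helper written for this example, not part of any Neuron package.

```python
import os

# Path taken from the release notes above.
NEURON_SYSFS_ROOT = "/sys/devices/virtual/neuron_device"


def read_sysfs_tree(root: str, max_bytes: int = 4096) -> dict:
    """Return {relative_path: text} for every readable file under root.

    os.walk does not follow directory symlinks by default, which avoids
    looping through the back-references that sysfs trees usually contain.
    A missing root simply yields an empty dict.
    """
    values = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    raw = fh.read(max_bytes)
            except OSError:
                # Write-only entries or permission errors are skipped.
                continue
            text = raw.decode("utf-8", errors="replace").strip()
            values[os.path.relpath(path, root)] = text
    return values


if __name__ == "__main__":
    for rel_path, text in sorted(read_sysfs_tree(NEURON_SYSFS_ROOT).items()):
        print(f"{rel_path}: {text}")
```

On a machine without the Neuron driver the directory does not exist and the script prints nothing. The Neuron Sysfs User Guide is the place to look for the meaning of individual entries.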

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

Neuron SDK Release - December 12, 2022 2022-12-13 21:58:55

This release introduces support for PyTorch 1.12, and introduces PyTorch Neuron (torch-neuronx) profiling through the Neuron Plugin for TensorBoard. PyTorch Neuron (torch-neuronx) users can now profile their models through the following TensorBoard views:

Operator Framework View

Operator HLO View

Operator Trace View

This release introduces support for the LAMB optimizer in FP32 mode, and adds support for capturing snapshots of inputs, outputs and graph HLO for debugging.

In addition, this release introduces support for new operators and resolves issues to improve stability for Trn1 customers.

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

Neuron SDK Release - November 23, 2022 2022-11-23 22:22:55

This is a major release which introduces new features and resolves issues that improve stability for Inf1 customers.

Component | New in this release
-- | --
PyTorch Neuron (torch-neuron) | PyTorch 1.12 support; Python 3.8 support; LSTM support on Inf1; R-CNN support on Inf1; support for new API for core placement; support for improved logging; improved torch_neuron.trace() performance when using large graphs; reduced host memory usage of loaded models in libtorchneuron.so; additional operators support
TensorFlow Neuron (tensorflow-neuron) | tf-neuron-auto-multicore tool to enable automatic data parallel on multiple NeuronCores; experimental support for tracing models larger than 2GB using extract-weights flag (TF2.x only), see TensorFlow 2.x (tensorflow-neuron) Tracing API; tfn.auto_multicore Python API to enable automatic data parallel (TF2.x only)

This Neuron release is the last release that will include torch-neuron versions 1.7 and 1.8, and that will include tensorflow-neuron versions 2.5 and 2.6.

In addition, this release introduces changes to the Neuron packaging and installation instructions for Inf1 customers, see Introducing Neuron packaging and installation changes for Inf1 customers for more information.

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

Neuron SDK Release - October 27, 2022 2022-10-28 01:57:58

This release introduces new features and resolves issues that improve stability. The release introduces a "memory utilization breakdown" feature in both the Neuron Monitor and Neuron Top system tools. The release also adds the "NeuronCore Based Scheduling" capability to the Neuron Kubernetes Scheduler and introduces support for new operators in the Neuron Compiler and PyTorch Neuron. In addition, this release introduces eight (8) additional samples of model fine-tuning using PyTorch Neuron. The new samples can be found in the AWS Neuron Samples GitHub repository.

Neuron SDK Release - October 10, 2022 2022-10-10 19:24:27

This Neuron 2.x release extends Neuron 1.x and adds support for the new AWS Trainium powered Amazon EC2 Trn1 instances. With this release, you can now run deep learning training workloads on Trn1 instances to save training costs by up to 50% over equivalent GPU-based EC2 instances, while getting the highest training performance in AWS cloud for popular NLP models.

New features and capabilities

Supported instances: Trn1

Supported Frameworks: PyTorch Neuron (torch-neuronx)

Supported Data-types

    FP32, BF16

Supported Rounding Modes

    Stochastic Rounding (SR)

    Round Nearest ties to Even (RNE)

Supported Automatic Casting Methods

    Neuron automatic casting of FP32 tensors / weights / operations to BF16 - Default mode

    PyTorch automatic casting

    Full BF16 automatic casting (via XLA_USE_BF16=1 environment variable)
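The two rounding modes listed above differ only in how the 16 low-order bits of an FP32 value are discarded when it is narrowed to BF16. The following pure-Python sketch illustrates that difference on the bit level. It is an illustration of the general technique, not Neuron's implementation: the function names are made up for this example, and NaN, infinity and overflow handling are omitted.

```python
import random
import struct


def f32_to_bits(x: float) -> int:
    """Return the IEEE-754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def to_bf16_rne(x: float) -> float:
    """Round x to BF16 precision using round-nearest, ties-to-even (RNE).

    BF16 keeps the upper 16 bits of the FP32 pattern. Adding 0x7FFF plus
    the lowest kept bit before truncating rounds to nearest and sends
    exact ties to the value whose kept mantissa is even.
    """
    bits = f32_to_bits(x)
    bias = 0x7FFF + ((bits >> 16) & 1)
    return bits_to_f32(((bits + bias) >> 16) << 16)


def to_bf16_sr(x: float, rng: random.Random) -> float:
    """Round x to BF16 precision using stochastic rounding (SR).

    Adding uniform 16-bit noise before truncating rounds up with a
    probability equal to the discarded fraction, so the result is
    unbiased on average.
    """
    bits = f32_to_bits(x)
    noise = rng.getrandbits(16)
    return bits_to_f32(((bits + noise) >> 16) << 16)


if __name__ == "__main__":
    x = 1.0 + 2.0 ** -8  # exactly halfway between two BF16 neighbours
    rng = random.Random(0)
    samples = [to_bf16_sr(x, rng) for _ in range(10_000)]
    print("input:", x)
    print("RNE:  ", to_bf16_rne(x))
    print("SR mean over 10000 draws:", sum(samples) / len(samples))
```

RNE always returns the same value for a given input, while SR returns one of the two neighbouring BF16 values at random. Averaged over many draws the SR result approaches the original FP32 value, which is why it is often preferred when many small updates are accumulated in low precision.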

More Info