import itertools

import numpy as np
import onnx
import onnxruntime as ort
# Map ONNX Runtime tensor type strings (as reported by NodeArg.type)
# to the numpy dtypes used when building input feeds.
ORT_TYPES_TO_NP_TYPES = {
    'tensor(float16)': np.float16,
    'tensor(float)': np.float32,
    'tensor(uint8)': np.uint8,
}
def attributeproto_fp16_to_fp32(attr):
    """Rewrite a TensorProto's payload from float16 to float32 in place.

    Assumes the tensor values live in raw_data (true for the weights this
    converter targets); tensors stored in the typed *_data fields are not
    handled here.
    """
    float32_list = np.frombuffer(attr.raw_data, dtype=np.float16)
    attr.data_type = onnx.TensorProto.FLOAT
    attr.raw_data = float32_list.astype(np.float32).tobytes()
def convert_fp16_to_fp32(model):
    """Upcast every float16 tensor in the model to float32 and return the
    serialized model bytes, for CPU execution where fp16 kernels are slow
    or unavailable."""
    # Initializers (weights).
    for i in model.graph.initializer:
        if i.data_type == onnx.TensorProto.FLOAT16:
            attributeproto_fp16_to_fp32(i)
    # Graph input/output signatures.
    for i in itertools.chain(model.graph.input, model.graph.output):
        if i.type.tensor_type.elem_type == onnx.TensorProto.FLOAT16:
            i.type.tensor_type.elem_type = onnx.TensorProto.FLOAT
    for i in model.graph.node:
        # Retarget Cast-to-float16 nodes (attribute[0] is the 'to' attribute)
        # so they cast to float32 instead.
        if i.op_type == 'Cast' and i.attribute[0].i == onnx.TensorProto.FLOAT16:
            i.attribute[0].i = onnx.TensorProto.FLOAT
        # Convert float16 tensors embedded in node attributes, e.g. the
        # 'value' of a Constant node. (Checking the attribute type is the
        # reliable test; hasattr(a, 't') is always True on protobuf messages.)
        for a in i.attribute:
            if a.type == onnx.AttributeProto.TENSOR and a.t.data_type == onnx.TensorProto.FLOAT16:
                attributeproto_fp16_to_fp32(a.t)
    return model.SerializeToString()
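
# A minimal sketch of using convert_fp16_to_fp32 on its own to persist an
# FP32 copy of a model; the file names below are hypothetical, not part of
# this module:
#
#     with open('model_fp32.onnx', 'wb') as f:
#         f.write(convert_fp16_to_fp32(onnx.load('model_fp16.onnx')))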
def make_onnx_cpu_runner(model_path):
    """Load an ONNX model, upcast it to float32, and build a CPU-only
    inference session."""
    options = ort.SessionOptions()
    options.intra_op_num_threads = 4
    options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    model_data = convert_fp16_to_fp32(onnx.load(model_path))
    return ort.InferenceSession(model_data, options, providers=['CPUExecutionProvider'])
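
# A minimal usage sketch (assumptions: 'model_fp16.onnx' is a hypothetical
# model path, and a random feed is enough to smoke-test the session; neither
# comes from this module).
if __name__ == '__main__':
    runner = make_onnx_cpu_runner('model_fp16.onnx')
    # Feed random data shaped like the first declared input, substituting 1
    # for any symbolic (dynamic) dimensions.
    inp = runner.get_inputs()[0]
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    x = np.random.rand(*shape).astype(ORT_TYPES_TO_NP_TYPES[inp.type])
    outputs = runner.run(None, {inp.name: x})
    print([o.shape for o in outputs])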