openpilot_comma

openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for over 200 supported car makes and models.
10 KiB

Raw Blame History

kernel driver: AppleH11ANEInterface
  requires entitlement: com.apple.ane.iokit-user-access
	compiler is run in ANE_ProgramCreate_gated

2 helper processes:
  /usr/libexec/aned
  ANECompilerService

Espresso:
	Contains ANECompilerEngine

AppleNeuralEngine: Objective-C interface called by Espresso
  ANEServices: communication with the device
  ANECompiler: compile plist into hwx file
		com.apple.ANECompilerService.allow in AppleNeuralEngine?
		Called from ANECompilerService.xpc in AppleNeuralEngine.framework

== Model Flow ==

    Keras/ONNX model
          |
          |   1_build
          | (coremltools, open source)
          v
      CoreML model
          |
          |   TODO: automate this
          |   Grabbed plist from lldbing ANECompilerService during 1_build
          | (Espresso)
          v
       net.plist
          |
          |   2_compile
          | (AppleNeuralEngine, ANECompiler)
          v
       model.hwx
          |
          |   3_run
          | (AppleNeuralEngine, ANEServices, AppleH11ANEInterface)
          v
 <run on neural engine>

TODO: Write a nice plist grabber
DONE: Write a call to the compiler with plist+weights
DONE: Write an hwx runner

== Tracing the Compiler ==

ANECCompileProcedure
  ZinAneCreateIr
    ZinParseNeuronUnit
  ZinAneCoreCompile
    ZinAneCodeGeneration
      ZinIrCodegenHandleKernels
      ZinIrTargetH13::CodegenTds
        ZinIrCacheHintTable
        ZinIrCodegenHandleTds_v7
          ZinIrCodegenHandleTdsMakeList<7u>
            ZinAneInstruction
            ZinAneTd<7u>::HandleEngineLayer
              ZinAneInstruction::HandleTdHeader
              HandleNELayer<7u>
                ZinAneInstruction::HandleCommonConfig
                  ZinAneInstruction::HandleCommonConfigCommonOpcodes
        ZinIrCodegenHandleTds<7u>
          0x1bb93ae00 <-- this is the store of the first byte in the hwx
          CalculateSizeInBytesFromRegCount (x*4+4)
            0xf   @ 0x128-0x168  (base 0x1003047b0)
            0x1b  @ 0x16c-0x1dc
            0x11  @ 0x1e0-0x228
            0x3   @ 0x22c-0x23c
            0x4   @ 0x240-0x254
            0x6   @ 0x258-0x274(end)
          AddReloc (this is gold! x4 goes in the hwx)
          ZinAneTd<7u>::HandleEngineLayer

rbreak ^ZinAneInstruction*

weeee ZinIrRegBitPrintOutDebug_7u_
print (void)debugregs(0, 0x0000000100211030+8, 3)

== min.plist == 

Types: GOC, Conv, Broadcast, ScaledElementWise, Reshape, InputView, Neuron, Concat


ops have length 0x300, seems like one basic op repeated

header 0x0-0x1c

u32 0x1c = next op offset
u16 0x20 = output address?

== section break 0x2c (weights) ==
reloc 0x2c-0x74 = K2DBE6976FEB616E6867A2E3853FC37D0F101C4C51BA4A80C103359643338C0C1_ne_0
                  K2DBE6976FEB616E6867A2E3853FC37D0F101C4C51BA4A80C103359643338C0C1_ne_1

16 output channel parallel:
u32[16] 0x34-0x74 = 0x80 | 1 if used
u32[16] 0x74-0xB4 = <channel data offset>
u32[16] 0xB4-0xF4 = <channel data length>

== section break 0x128 (conv) ==
u16 0x128 = InputWidth
u16 0x12A = InputHeight
u16 0x12C = InputDepth

u32 0x130 = (OutputType * 0x10) | InputType

u32 0x134 = InputChannels
u32 0x138 = OutputChannels

u16 0x13C = OutputWidth
u16 0x13E = OutputHeight
u16 0x140 = OutputDepth

u16 0x144 = 0xa000 | (KernelHeight * 0x20) | KernelWidth
u16 0x146 = 0x5000 | (PadTop * 0x40) | (PadLeft * 2)

u16 0x14C = BatchSize
u32 0x150 = OutputHeight?

== section break 0x16c (input) ==
reloc 0x16c-0x174 = image

u32 0x178 = InputRowStride
u32 0x17C = InputPlaneStride
u32 0x180 = InputDepthStride
u32 0x184 = InputBatchStride

u8 0x1A7 = InputInterleave

== section break 0x1e0 ==
u8 0x1E5 = InputInterleave

u32 0x1F4 = InputChannels * 0x10
u32 0x1F8 = InputDepth * InputChannels * 0x10

u8 0x211 = OutputInterleave

u32 0x220 = OutputChannels * 0x10
u32 0x224 = OutputDepth * OutputChannels * 0x10

== section break 0x22c (scaling) ==
u16 0x230 = BiasScalar
u16 0x232 = ScaleScalar

== section break 0x240 ==
u8 0x240 = 0x80 | KernelType
u8 0x241 = 4 * hasbatch
u16 0x246 = 0x10 | 2 * neuron?

== section break 0x258 (output) ==
reloc 0x258-0x25c = probs@output/src

u32 0x260 = OutputRowStride
u32 0x264 = OutputPlaneStride
u32 0x268 = OutputDepthStride
u32 0x26C = OutputBatchStride

u8 0x273 = OutputInterleave

== Zin Constants ==

kZinIrOpCodeConv = 0?
kZinIrOpCodePool = 1
kZinIrOpCodeElementWiseOp = 6
kZinIrOpCodeConcat = 8
kZinIrOpCodeFlattenComposite
kZinIrOpCodeNEConvOp
kZinIrOpCodeTranspose

 0: CONV
 1: POOL
 2: SCALE_BIAS
 3: TERNARY_DYNAMIC_GOC
 4: BINARY_DYNAMIC_GOC
 5: ACTIVATION
 6: EW
 7: SCALED_EW
 8: CONCAT
 9: SPLIT
10: COPY
11: FLATTEN
12: UNFLATTEN
13: CROSS_CORRELATION
14: KERNEL_RASTERIZER
15: ARG_MIN_MAX
16: MATRIX_MULT
17: BROADCAST
18: FLATTEN_COMPOSITE
19: UNFLATTEN_COMPOSITE
20: KERNEL_RASTERIZER_COMPOSITE
21: CROSS_CORRELATION_COMPOSITE
22: LIVE_IN
23: CONST_IN
24: LIVE_OUT
25: REDUCTION
26: ALIAS
27: Typecast
28: RESHAPE
29: VIEW
30: TRANSPOSE
31: SPACE_TO_BATCH
32: BATCH_TO_SPACE
33: SOFTMAX
34: INSTANCE_NORM
35: L2_NORM
36: MINMAX_NORM
37: LRN
38: COST_VOLUME
39: PIXEL_SHUFFLE
40: FPS
41: RS
42: PEFUSED_ELEMENTWISE
43: PEFUSED_POOL
44: PEFUSED_GOC
45: NEFUSED_CONV
46: NEFUSED_POOL
47: NEFUSED_EW
48: NEFUSED_BYPASS

# guessing from the hwx
kZinTensorFormatUInt8 = 0
kZinTensorFormatInt8 = 1
kZinTensorFormatFloat16 = 2
kZinTensorFormatInvalid

Zin (plist format) ---(ZinAneCoreCompile)---> Mir (hwx format)?
  ZinAneCodeGeneration?

ZinIrStatus GetKernelFormat<6u>(ZinKernelFormat param_1,ane_ne_kernel_cfg_kernel_fmt *param_2)
  List of allowed numbers

NeuronTypes (changes the LUT):
  Tanh
  Log2
  Exp2
  Sign = ZinMirActivationV7::GetSignLut
  ...many more in ANECompiler

Investigate:
  ZinMirActivationV7::PrintLut(ZinMirActivationV7 *this,ane_nonlinear_lut_v7up_t *param_1

 0: NONE
 1: RELU
 2: SIGMOID
 3: SIGMOID_HIGH_PRECISION
 4: TANH
 5: CLAMPED_RELU
 6: PRELU
 7: DIRAC
 8: INT
 9: FRAC
10: SQRT
11: RSQRT
12: INV
13: SQR
14: LOG2
15: EXP2
16: ELU
17: SIGN
18: EQUAL_ZERO
19: NON_ZERO
20: LESS_THAN_ZERO
21: LESS_EQUAL_ZERO
22: GREATER_EQUAL_ZERO
23: GREATER_THAN_ZERO
24: CUSTOM_LUT
25: C_DIM_CONCAT
26: C_DIM_STRIDED_CONCAT
27: H_DIM_CONCAT
28: W_DIM_CONCAT
29: D_DIM_CONCAT
30: N_DIM_CONCAT
31: H_DIM_STRIDED_CONCAT

CacheHint
0: ALLOC
1: NOALLOC
2: DROP
3: DEPRI

conv kinds
0: regular
1: channelwise
2: grouped

== plist exploration ==

Float16 -> UInt8 for output works, Float32 doesn't
Same for input
All weights must be float

Index 0: model.espresso.weights @ 192 is weights
Index 1: net.additional.weights @ 0 is bias

Float16 -> Float32 for bias works

It's possible the compiler is Float32 -> Float16 converting, and the engine only supports Float16 + UInt8

== call to the compiler (in dmesg!) ==

[54476.282258]: H11ANEIn: ANE_ProgramCreate_gated:, ZinComputeProgramMake, get Mcache size: 0x0
[54476.282259]: H11ANEIn: ANE_ProgramCreate_gated:,Program Identifier:ANEC v1
zin_ane_compiler v4.2.1
	-t h13
	--fdram-allocator=ffreuse
	--fdram-tensor-priority=sizethenliverange
	--fl2-allocator=ffreuse
	--fl3-allocator=ffreuse
	--fl2-cache-mode=resident
	--fsignature=ident
	--memcache-strategy=
[54476.282262]: 	--memcache-size=4194304
	--fspatial-split=disabled
	--fkernel-rewind=enabled
	--Wl-undefined=fvmlib
	-i /Library/Caches/com.apple.aned/tmp/Python/DB7E897E7F4D5D27501A998428B6D3863AFD96CEA82DAF2207A75394E6BAC44C/37C083FF396EB5948979EE20FD0457483E4ACE840AD23391A129BB83CFBC9C63/net.plist
	-o /Library/Caches/com.apple.aned/20A2411/Python/C9981871BC59572E74AFA3014B183EA37567EE9A2A08328446CE4A2B754E109D/37C083FF396EB5948979EE20FD0457483E4ACE840AD23391A129BB83CFBC9C63/model.hwx.tmp

== ANECCompile (in ANECompiler framework) ==
  ANECCompile(__CFDictionary *param_1, __CFDictionary *param_2, unsigned long param_3)

param_1:
{   
    InputNetworks =     (
                {
            NetworkPlistName = "net.plist";
            NetworkPlistPath = "/Library/Caches/com.apple.aned/tmp/run/A2ACB9D5AA31B301563A4F62885BA379E62B0E1240E95C6902A93900FE0A9B54/37C083FF396EB5948979EE20FD0457483E4ACE840AD23391A129BB83CFBC9C63/";
        }
    );
    OutputFileName = "model.hwx.tmp";
    OutputFilePath = "/Library/Caches/com.apple.aned/20A2411/run/E68910CD1994681121EEDAFAE1BC524AA8E84CF80C42AFC0C7DE2C082C58BDFD/37C083FF396EB5948979EE20FD0457483E4ACE840AD23391A129BB83CFBC9C63/";
}

param_2:
{
    TargetArchitecture = h13;
}

== Backtrace of device open ==

  * frame #0: 0x00000001a68fac54 ANEServices`H11ANEDeviceOpen
    frame #1: 0x00000001a78405b8 AppleNeuralEngine`__29-[_ANEDeviceController start]_block_invoke + 436
    frame #2: 0x0000000193c84420 libdispatch.dylib`_dispatch_client_callout + 20
    frame #3: 0x0000000193c92a98 libdispatch.dylib`_dispatch_lane_barrier_sync_invoke_and_complete + 60
    frame #4: 0x00000001a78403e8 AppleNeuralEngine`-[_ANEDeviceController start] + 136
    ...
    frame #23: 0x00000001a64a4f38 Espresso`Espresso::ANERuntimeEngine::compiler::build_segment(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, Espresso::net_compiler_segment_based::segment_t const&) + 2080
    ...
    frame #31: 0x000000019ab6099c CoreML`-[MLNeuralNetworkEngine rebuildPlan:] + 1640

== Backtrace of run? ==

  * frame #0: 0x00000001a68f9108 ANEServices`H11ANEProgramProcessRequestDirect
    frame #1: 0x00000001a7839694 AppleNeuralEngine`-[_ANEProgramForEvaluation processRequest:qos:qIndex:modelStringID:options:error:] + 1904
    frame #2: 0x00000001a7843ba4 AppleNeuralEngine`-[_ANEClient doEvaluateDirectWithModel:options:request:qos:error:] + 1236
    frame #3: 0x00000001a7842034 AppleNeuralEngine`-[_ANEClient evaluateWithModel:options:request:qos:error:] + 104
    frame #4: 0x00000001a64a2988 Espresso`Espresso::ANERuntimeEngine::compiler::__forward_segment(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, Espresso::net_compiler_segment_based::segment_t const&) + 2008
    frame #5: 0x00000001a6414548 Espresso`Espresso::net_compiler_segment_based::__forward(std::__1::shared_ptr<Espresso::abstract_batch> const&) + 992
    frame #6: 0x00000001a67e2e3c Espresso`EspressoLight::espresso_plan::dispatch_task_on_compute_batch(std::__1::shared_ptr<Espresso::abstract_batch> const&, std::__1::shared_ptr<EspressoLight::plan_task_t> const&) + 612
    frame #7: 0x00000001a67ebab0 Espresso`EspressoLight::espresso_plan::execute_sync() + 356
    frame #8: 0x00000001a67f26fc Espresso`espresso_plan_execute_sync + 120
    frame #9: 0x000000019ab674b8 CoreML`-[MLNeuralNetworkEngine executePlan:error:] + 136
    frame #10: 0x000000019ab6799c CoreML`-[MLNeuralNetworkEngine evaluateInputs:bufferIndex:options:error:] + 368
10 KiB Raw Blame History

10 KiB

Raw Blame History