* mse err from 0.028070712 -> 5.8073703e-05
* build with weights fixup
* need thneed lib also
* don't break for binaries
* static analysis says i need init
* check the bias
* load_dlc_weights
* nicer scons
* tested scons
* fix static
* pylint issue
* new ref
* a few more asserts
Co-authored-by: Harald Schafer <harald.the.engineer@gmail.com>
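
Note on the `mse err` item in the list above: a minimal sketch of how a mean squared error between a reference model output and the output of a rebuilt model can be computed. The function name `mse` and the flat float lists are assumptions for illustration, not the code from these commits.

```python
def mse(reference, candidate):
    """Mean squared error between two equal-length sequences of floats."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    if len(reference) == 0:
        raise ValueError("outputs must be non-empty")
    # Average of the squared element-wise differences.
    total = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    return total / len(reference)


if __name__ == "__main__":
    # Hypothetical values, only to show the call shape.
    ref_out = [0.10, 0.20, 0.30]
    new_out = [0.11, 0.19, 0.30]
    print(mse(ref_out, new_out))
```

A lower value means the rebuilt model tracks the reference more closely. What counts as an acceptable threshold is a project decision and is not stated here.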
* one conv with defines
* add conv
* building works on C3
* this is num_outputs too, process replay is so useful
Co-authored-by: Comma Device <device@comma.ai>
* Added wide cam vipc client and bigmodel transform logic
* Added wide_frame to ModelState, should still work normally
* Refactored image input into addImage method, should still work normally
* Updated thneed/compile.cc
* Bigmodel, untested: 44f83118-b375-4d4c-ae12-2017124f0cf4/200
* Have to initialize extra buffer in SNPEModel
* Default parameter value in the wrong place I think
* Move USE_EXTRA to SConscript
* New model: 6c34d59a-acc3-4877-84bd-904c10745ba6/250
* move use extra check to runtime, not on C2
* this is always true
* more C2 checks
* log if frames are out of sync
* more logging on no frame
* store in pointer
* print sof
* add sync logic
* log based on sof difference as well
* keep both models
* fewer assumptions
* define above thneed
* typo
* simplify
* no need for second client if main is already wide
* more comments update
* no optional reference
* more logging to debug lags
* add to release files
* both defines
* New model: 6831a77f-2574-4bfb-8077-79b0972a2771/950
* Path offset no longer relevant
* Remove duplicate execute
* Moved bigmodel back to big_supercombo.dlc
* add wide vipc stream
* Tici must be tici
* Needs state too
* add wide cam support to model replay
* handle syncing better
* ugh, c2
* print that
* handle ecam lag
* skip first one
* so close
* update refs
Co-authored-by: mitchellgoffpc <mitchellgoffpc@gmail.com>
Co-authored-by: Harald Schafer <harald.the.engineer@gmail.com>
Co-authored-by: Adeeb Shihadeh <adeebshihadeh@gmail.com>
Co-authored-by: Comma Device <device@comma.ai>
* add thneed optimizer
* local work group opt
* kernels and final mods
* release files
* build system touchups
* fix kernel path, rand inputs for self test
* broken since extra is gone
* update model replay ref
Co-authored-by: Comma Device <device@comma.ai>
* completely untested
* it builds now
* bug fixes, save 1ms
* using a kernel to copy works
* more sane API to loadyuv
Co-authored-by: Comma Device <device@comma.ai>
* use cstring instead of string.h
* use cstdio instead of stdio.h
* remove inttypes.h
* use cstdlib instead of stdlib.h
* use cstdint instead of stdint.h
* #include <cstddef>
* cstdlib
* use cmath
* remove stddef.h
* use cassert
* use csignal
* use ctime
* use cerrno
* rebase master
* start thneed load/save
* compiling
* fix loading
* build thneed model in scons
* don't hardcode /data/openpilot
* release files
* those too
* support for loading/saving binary kernels
* save binaries out of band from the json
* make binary a command line flag to the compiler
* need include assert
* fix shadowed common in SConscript
* cleanup run.h
* hmm, the recurrent buffer wasn't 0ed
* ugh, unique ptr
* remove power constraint, refactor record
* Revert "remove power constraint, refactor record"
This reverts commit bb6fa52db6df59cd9d6420a6f630430e35af8a5e.
* print on thneed stop
* fingers crossed for this one
* recorded
* just curious
* okay okay, pass tests?
* cleanups
* refactor wait
Co-authored-by: Comma Device <device@comma.ai>
Co-authored-by: Adeeb Shihadeh <adeebshihadeh@gmail.com>
* enable Wunused, first pass
* unused stuff in snpe model
* these are used on phone
* handle sigint and sigterm in modeld
* fix phone build
* camera qcom
* QCOM build works
* delete unused camerad vars
Co-authored-by: Comma Device <device@comma.ai>
* remove the clCreateProgramWithSource interceptor
* that's old code, thneed is better
* label them thneed_, we shouldn't need to touch CL for anything not SNPE related
* bigmodel
* more debug print
* debugging bigmodel
* remove the tanh, debugging
* print images/buffers
* disassemble the command queues
* decompiler
* dump the shaders
* full disasm
* support patching kernel and fixing convolution_horizontal_reduced_reads_1x1
* microbenchmark
* 42 GFLOPS, 1 GB/s
* gemm benchmark
* 75 GFLOPS vs 42 GFLOPS
* 115 GFLOPS
* oops, never mind
* gemm image is slow
* this is pretty hopeless
* gemm image gets 62 GFLOPS
* this is addictive and still a waste of time
* cleanup cleanup
* that hook was dumb
* tabbing
* more tabbing
Co-authored-by: Comma Device <device@comma.ai>
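
Note on the GFLOPS items in the list above: a minimal sketch of how a throughput figure is usually derived from a timed GEMM, assuming the common convention of 2*M*N*K floating point operations per multiply. The function name, matrix sizes, and timing below are hypothetical and are not the benchmark from these commits.

```python
def gemm_gflops(m, n, k, seconds):
    """Throughput in GFLOPS for an (m x k) @ (k x n) matrix multiply."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    # Each output element needs k multiplies and k adds: 2*m*n*k operations.
    flops = 2.0 * m * n * k
    return flops / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical measurement: a 1024^3 GEMM that took 50 ms.
    print(gemm_gflops(1024, 1024, 1024, 0.05))
```

The elapsed time should cover only kernel execution (for example from device profiling events), otherwise queueing and transfer overhead lowers the reported number.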
* thneed runs the model
* thneed is doing the hooking
* set kernel args
* thneeding the buffers
* print the images well
* thneeds with better buffers
* includes
* disasm adreno
* parse packets
* disasm works
* disasm better
* more thneeding
* much thneeding
* much more thneeding
* thneed works i think
* thneed is patient
* thneed works
* 7.7%
* gpuobj sync
* yay, it mallocs now
* cleaning it up, Thneed
* sync objs and set power
* thneed needs inputs and outputs
* thneed in modeld
* special modeld runs
* can't thneed the DSP
* test is weird
* thneed modeld uses 6.4% CPU
* add thneed to release
* move to debug
* delete some junk from the pr
* always track the timestamp
* timestamp hacks in thneed
* create a new command queue
* fix timestamp
* pretty much back to what we had, you can't use SNPE with thneed
* improve thneed test
* disable save log
Co-authored-by: Comma Device <device@comma.ai>