* delete unused stuff
* remove CL interceptor from thneed since we don't use SNPE anymore
* remove dead files from release
* that's removed
* oops, didn't save
old-commit-hash: 6c39382d71
* compiling, won't work yet
* running with inputs and outputs
* there's some magic chance this works
* no more dlc, include onnx
* yolo tests plz
* bump tinygrad
* files_common + delete dlc
* tinygrad_repo -> tinygrad
* pre commit config
* llops needed
* extra in files_common
* bump tinygrad
* fix indent
* tinygrad/nn/__init__
* tinygrad_repo
* bump tinygrad repo
* bump tinygrad
* bump with native_exp, match maybe
* native_explog is argument
* pyopencl no cache
* 5% chance this matches
* work in float32?
* bump tinygrad
* fix build
* no __init__
* fix recip
* dumb hack
* adding thneed PC support
* fix pc segfault
* pc thneed is working
* to_image
* prints stuff with debug=2
* it sort of works
* copy host ptr is simpler
* bug fix
* build on c3
* this correct?
* reenable float16
* fix private, fixup copy_inputs internal
* bump tinygrad and update ref commit
* fix OPTWG on PC
* maybe fix non determinism
* revert model replay ref commit
* comments, init zeroed out buffers
* upd ref commit
* bump tinygrad to fix initial image
* try this ref
Co-authored-by: Comma Device <device@comma.ai>
old-commit-hash: 40d6f4b65c
* pc thneed prereqs
* ugh, out of date
* that can stay private
* memcpy here is fine in SNPE variant
* release files
* thneed docs don't work anymore. they didn't look too useful
Co-authored-by: Comma Device <device@comma.ai>
old-commit-hash: b6e355a933
* add thneed optimizer
* local work group opt
* kernels and final mods
* release files
* build system touchups
* fix kernel path, rand inputs for self test
* broken since extra is gone
* update model replay ref
Co-authored-by: Comma Device <device@comma.ai>
old-commit-hash: 90beaebefb
* completely untested
* it builds now
* bug fixes, save 1ms
* using a kernel to copy works
* more sane API to loadyuv
Co-authored-by: Comma Device <device@comma.ai>
old-commit-hash: 83ff9ca331
* use cstring instead of string.h
* use cstdio instead of stdio.h
* remove inttypes.h
* use cstdlib instead of stdlib.h
* use cstdint instead of stdint.h
* #include <cstddef>
* cstdlib
* use cmath
* remove stddef.h
* use cassert
* use csignal
* use ctime
* use cerrno
* rebase master
old-commit-hash: c53cb5d570
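
For reference, the header swap in the commits above looks roughly like this in practice. This is a minimal sketch, not code from this repo: `parse_u32` is a hypothetical helper that exists only to show the `<c...>` headers and `std::` names standing in for the `.h` ones.

```cpp
#include <cassert>   // instead of assert.h
#include <cerrno>    // instead of errno.h
#include <cstdint>   // instead of stdint.h
#include <cstdlib>   // instead of stdlib.h

// Hypothetical helper: parse a base-10 string into a uint32_t.
// Returns false on empty input, trailing junk, or out-of-range values.
bool parse_u32(const char *s, std::uint32_t *out) {
  assert(s != nullptr && out != nullptr);
  errno = 0;  // errno stays a macro, so it takes no std:: prefix
  char *end = nullptr;
  unsigned long v = std::strtoul(s, &end, 10);
  if (errno != 0 || end == s || *end != '\0' || v > UINT32_MAX) {
    return false;
  }
  *out = static_cast<std::uint32_t>(v);
  return true;
}
```

The C++ headers declare the same functions in namespace `std`, so call sites become `std::strtoul` and so on; macros such as `errno`, `assert` and `UINT32_MAX` are unchanged.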
* start thneed load/save
* compiling
* fix loading
* build thneed model in scons
* don't hardcode /data/openpilot
* release files
* those too
* support for loading/saving binary kernels
* save binaries out of band from the json
* make binary a command line flag to the compiler
* need include assert
* fix shadowed common in SConscript
* cleanup run.h
* hmm, the recurrent buffer wasn't zeroed
* ugh, unique ptr
* remove power constraint, refactor record
* Revert "remove power constraint, refactor record"
This reverts commit bb6fa52db6df59cd9d6420a6f630430e35af8a5e.
* print on thneed stop
* fingers crossed for this one
* recorded
* just curious
* okay okay, pass tests?
* cleanups
* refactor wait
Co-authored-by: Comma Device <device@comma.ai>
Co-authored-by: Adeeb Shihadeh <adeebshihadeh@gmail.com>
old-commit-hash: 59fac9fdc6
* bigmodel
* more debug print
* debugging bigmodel
* remove the tanh, debugging
* print images/buffers
* disassemble the command queues
* decompiler
* dump the shaders
* full disasm
* support patching kernel and fixing convolution_horizontal_reduced_reads_1x1
* microbenchmark
* 42 GFLOPS, 1 GB/s
* gemm benchmark
* 75 GFLOPS vs 42 GFLOPS
* 115 GFLOPS
* oops, never mind
* gemm image is slow
* this is pretty hopeless
* gemm image gets 62 GFLOPS
* this is addictive and still a waste of time
* cleanup cleanup
* that hook was dumb
* tabbing
* more tabbing
Co-authored-by: Comma Device <device@comma.ai>
old-commit-hash: 78a352a8ca
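
The GFLOPS numbers in the benchmark commits above are presumably the usual 2*M*N*K floating point operations divided by wall time. Here is a sketch of that arithmetic, assuming that convention. The naive CPU loop is only a stand-in for the GPU kernels, and every name here is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// FLOPs for C = A * B with A (m x k) and B (k x n):
// one multiply and one add per inner-loop step.
double gemm_flops(std::size_t m, std::size_t n, std::size_t k) {
  return 2.0 * m * n * k;
}

double to_gflops(double flops, double seconds) {
  return flops / seconds / 1e9;
}

// Naive row-major GEMM, a CPU stand-in for the kernel being measured.
void gemm_naive(const std::vector<float> &a, const std::vector<float> &b,
                std::vector<float> &c,
                std::size_t m, std::size_t n, std::size_t k) {
  assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
  for (std::size_t i = 0; i < m; i++) {
    for (std::size_t j = 0; j < n; j++) {
      float acc = 0.0f;
      for (std::size_t p = 0; p < k; p++) acc += a[i * k + p] * b[p * n + j];
      c[i * n + j] = acc;
    }
  }
}

// Times one run and reports GFLOPS. A single run is noisy, and a run too
// short for the clock to resolve yields inf; a real benchmark would repeat.
double time_gemm_gflops(std::size_t m, std::size_t n, std::size_t k) {
  std::vector<float> a(m * k, 1.0f), b(k * n, 1.0f), c(m * n, 0.0f);
  auto t0 = std::chrono::steady_clock::now();
  gemm_naive(a, b, c, m, n, k);
  auto t1 = std::chrono::steady_clock::now();
  double secs = std::chrono::duration<double>(t1 - t0).count();
  return to_gflops(gemm_flops(m, n, k), secs);
}
```

With this convention, a 1024x1024x1024 GEMM is about 2.15 GFLOP of work, so 42 GFLOPS corresponds to roughly 51 ms per run.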
* thneed runs the model
* thneed is doing the hooking
* set kernel args
* thneeding the buffers
* print the images well
* thneeds with better buffers
* includes
* disasm adreno
* parse packets
* disasm works
* disasm better
* more thneeding
* much thneeding
* much more thneeding
* thneed works i think
* thneed is patient
* thneed works
* 7.7%
* gpuobj sync
* yay, it mallocs now
* cleaning it up, Thneed
* sync objs and set power
* thneed needs inputs and outputs
* thneed in modeld
* special modeld runs
* can't thneed the DSP
* test is weird
* thneed modeld uses 6.4% CPU
* add thneed to release
* move to debug
* delete some junk from the pr
* always track the timestamp
* timestamp hacks in thneed
* create a new command queue
* fix timestamp
* pretty much back to what we had, you can't use SNPE with thneed
* improve thneed test
* disable save log
Co-authored-by: Comma Device <device@comma.ai>
old-commit-hash: 302d06ee70