## Neural networks in openpilot
To view the architecture of the ONNX networks, you can use [Netron](https://netron.app/).

## Supercombo
### Supercombo input format (Full size: 837922 x float32)
* **image stream**
  * Two consecutive images (256 * 512 * 3 in RGB) recorded at 20 Hz : 393216 = 2 * 6 * 128 * 256
    * Each 256 * 512 image is represented in YUV420 with 6 channels : 6 * 128 * 256
      * Channels 0,1,2,3 represent the full-res Y channel and are represented in numpy as Y[::2, ::2], Y[::2, 1::2], Y[1::2, ::2], and Y[1::2, 1::2]
      * Channel 4 represents the half-res U channel
      * Channel 5 represents the half-res V channel
* **wide image stream**
  * Two consecutive images (256 * 512 * 3 in RGB) recorded at 20 Hz : 393216 = 2 * 6 * 128 * 256
    * Each 256 * 512 image is represented in YUV420 with 6 channels : 6 * 128 * 256
      * Channels 0,1,2,3 represent the full-res Y channel and are represented in numpy as Y[::2, ::2], Y[::2, 1::2], Y[1::2, ::2], and Y[1::2, 1::2]
      * Channel 4 represents the half-res U channel
      * Channel 5 represents the half-res V channel
* **desire**
  * One-hot encoded buffer that commands the model to execute certain actions; the bit needs to be sent for the past 5 seconds (at 20 FPS) : 100 * 8
* **traffic convention**
  * One-hot encoded vector that tells the model whether traffic is right-hand or left-hand : 2
* **feature buffer**
  * A buffer of intermediate features that gets appended to the current feature to form a 5-second temporal context (at 20 FPS) : 99 * 512
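
The 6-channel YUV420 layout described for the two image streams can be sketched in NumPy as follows. This is a minimal illustration, not the code openpilot uses: the function names `pack_yuv420_6ch` and `stack_frames` are hypothetical, and placing the older frame first when stacking is an assumption.

```python
import numpy as np


def pack_yuv420_6ch(y, u, v):
    """Pack planar YUV420 into a (6, H/2, W/2) array.

    y is the full-res luma plane (H x W); u and v are the half-res
    chroma planes (H/2 x W/2). Function name is hypothetical.
    """
    h, w = y.shape
    assert u.shape == (h // 2, w // 2)
    assert v.shape == (h // 2, w // 2)
    return np.stack([
        y[::2, ::2],    # channel 0: even rows, even columns
        y[::2, 1::2],   # channel 1: even rows, odd columns
        y[1::2, ::2],   # channel 2: odd rows, even columns
        y[1::2, 1::2],  # channel 3: odd rows, odd columns
        u,              # channel 4: half-res U
        v,              # channel 5: half-res V
    ])


def stack_frames(prev_frame, cur_frame):
    """Concatenate two packed frames along the channel axis.

    Frame order (older first) is an assumption for illustration.
    """
    return np.concatenate([prev_frame, cur_frame], axis=0)


# Example with a synthetic 256 x 512 frame
y = np.zeros((256, 512), dtype=np.uint8)
u = np.zeros((128, 256), dtype=np.uint8)
v = np.zeros((128, 256), dtype=np.uint8)
frame = pack_yuv420_6ch(y, u, v)   # shape (6, 128, 256)
pair = stack_frames(frame, frame)  # shape (12, 128, 256), 393216 values
```

The same packing would apply to both the image stream and the wide image stream, since both are listed with identical shapes.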


### Supercombo output format (Full size: XXX x float32)
Read [here](https://github.com/commaai/openpilot/blob/90af436a121164a51da9fa48d093c29f738adf6a/selfdrive/modeld/models/driving.h#L236) for more.


## Driver Monitoring Model
* The .onnx model can be run with ONNX runtimes
* The .dlc file is a pre-quantized model and only runs on Qualcomm DSPs

### Input format
* single image (W = 1440, H = 960), luminance channel (Y) only, taken from the planar YUV420 format:
  * full input size is 1440 * 960 = 1382400
  * normalized to the range 0.0 to 1.0 in float32 (ONNX runner), or left in the range 0 to 255 in uint8 (SNPE runner)
* camera calibration angles (roll, pitch, yaw) from liveCalibration: 3 x float32 inputs
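
As a rough sketch of how these two inputs could be prepared, the snippet below builds the image and calibration arrays with NumPy. The function name `prepare_dm_inputs`, the `runner` argument, the division by 255 to reach the 0.0 to 1.0 range, and the flattening of the image are all assumptions made for illustration; the actual tensor layout expected by the model is not specified here.

```python
import numpy as np

DM_W, DM_H = 1440, 960  # input width and height from the list above


def prepare_dm_inputs(y_plane, calib_rpy, runner="onnx"):
    """Build (image, calibration) inputs for the driver monitoring model.

    y_plane: uint8 luma plane of shape (960, 1440).
    calib_rpy: (roll, pitch, yaw) calibration angles.
    Hypothetical helper; scaling by 1/255 is an assumption.
    """
    y = np.asarray(y_plane, dtype=np.uint8)
    assert y.shape == (DM_H, DM_W)

    if runner == "onnx":
        # float32 in [0.0, 1.0]
        img = y.astype(np.float32).reshape(-1) / 255.0
    elif runner == "snpe":
        # uint8 in [0, 255], passed through unchanged
        img = y.reshape(-1)
    else:
        raise ValueError(f"unknown runner: {runner}")

    calib = np.asarray(calib_rpy, dtype=np.float32)
    assert calib.shape == (3,)
    return img, calib


# Example with a synthetic mid-gray frame and zero calibration
img, calib = prepare_dm_inputs(np.full((DM_H, DM_W), 128, dtype=np.uint8),
                               (0.0, 0.0, 0.0))
```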

### Output format
* 84 x float32 outputs = 2 + 41 * 2 ([parsing example](https://github.com/commaai/openpilot/blob/22ce4e17ba0d3bfcf37f8255a4dd1dc683fe0c38/selfdrive/modeld/models/dmonitoring.cc#L33))
  * for each person in the front seats (2 * 41)
    * face pose: 12 = 6 + 6
      * face orientation [pitch, yaw, roll] in camera frame: 3
      * face position [dx, dy] relative to image center: 2
      * normalized face size: 1
      * standard deviations for above outputs: 6
    * face visible probability: 1
    * eyes: 20 = (8 + 1) + (8 + 1) + 1 + 1
      * eye position and size, and their standard deviations: 8
      * eye visible probability: 1
      * eye closed probability: 1
    * wearing sunglasses probability: 1
    * face occluded probability: 1
    * touching wheel probability: 1
    * paying attention probability: 1
    * (deprecated) distracted probabilities: 2
    * using phone probability: 1
    * distracted probability: 1
  * common outputs: 2
    * poor camera vision probability: 1
    * left hand drive probability: 1
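
The 84-value layout above can be split into named fields with a small NumPy sketch. It assumes the values are stored in exactly the order listed (both per-person blocks first, then the two common outputs) and that the eye block is ordered as first eye, its visible probability, second eye, its visible probability, then the two closed probabilities. All field names and the function `parse_dm_output` are hypothetical; the linked parsing example remains the authoritative reference.

```python
import numpy as np

PER_PERSON = 41
NUM_PEOPLE = 2
OUTPUT_SIZE = NUM_PEOPLE * PER_PERSON + 2  # 84

# (field name, length), in the order the list above gives them.
# Names are hypothetical; the ordering is an assumption.
PERSON_FIELDS = [
    ("face_orientation", 3),            # pitch, yaw, roll
    ("face_position", 2),               # dx, dy
    ("face_size", 1),
    ("face_pose_std", 6),
    ("face_visible_prob", 1),
    ("eye_a", 8),                       # position, size and their stds
    ("eye_a_visible_prob", 1),
    ("eye_b", 8),
    ("eye_b_visible_prob", 1),
    ("eye_a_closed_prob", 1),
    ("eye_b_closed_prob", 1),
    ("sunglasses_prob", 1),
    ("face_occluded_prob", 1),
    ("touching_wheel_prob", 1),
    ("paying_attention_prob", 1),
    ("deprecated_distracted_probs", 2),
    ("using_phone_prob", 1),
    ("distracted_prob", 1),
]
assert sum(n for _, n in PERSON_FIELDS) == PER_PERSON


def split_person(vec):
    """Slice one 41-value block into named fields."""
    out, i = {}, 0
    for name, n in PERSON_FIELDS:
        out[name] = vec[i:i + n]
        i += n
    return out


def parse_dm_output(raw):
    """Split the flat 84-value output into per-person and common fields."""
    raw = np.asarray(raw, dtype=np.float32)
    assert raw.shape == (OUTPUT_SIZE,)
    people = [
        split_person(raw[p * PER_PERSON:(p + 1) * PER_PERSON])
        for p in range(NUM_PEOPLE)
    ]
    common = raw[NUM_PEOPLE * PER_PERSON:]
    return {
        "people": people,
        "poor_vision_prob": common[0],
        "left_hand_drive_prob": common[1],
    }
```

The sketch only slices the buffer; it does not apply any activation function, since the list above does not state whether the model emits raw values or final probabilities.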