openpilot_comma

openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for over 200 supported car makes and models.

History

Mitchell Goff 42591a19a7 New model: 224877e7-dfcf-4b28-9482-371655918fe1/950 (#23170 ) * New model: 224877e7-dfcf-4b28-9482-371655918fe1/950 * Updated model replay ref old-commit-hash: `1d119999c2`		4 years ago
..
README.md	Add DM model readme (#22801 )	4 years ago
dmonitoring_model.current	…
dmonitoring_model.onnx	…
dmonitoring_model_q.dlc	…
supercombo.dlc	New model: 224877e7-dfcf-4b28-9482-371655918fe1/950 (#23170 )	4 years ago
supercombo.onnx	New model: 224877e7-dfcf-4b28-9482-371655918fe1/950 (#23170 )	4 years ago

README.md

Unescape Escape

Neural networks in openpilot

To view the architecture of the ONNX networks, you can use netron

Supercombo

Supercombo input format (Full size: 393738 x float32)

image stream
- Two consecutive images (256 * 512 * 3 in RGB) recorded at 20 Hz : 393216 = 2 * 6 * 128 * 256
  - Each 256 * 512 image is represented in YUV420 with 6 channels : 6 * 128 * 256
    - Channels 0,1,2,3 represent the full-res Y channel and are represented in numpy as Y[::2, ::2], Y[::2, 1::2], Y[1::2, ::2], and Y[1::2, 1::2]
    - Channel 4 represents the half-res U channel
    - Channel 5 represents the half-res V channel
desire
- one-hot encoded vector to command model to execute certain actions, bit only needs to be sent for 1 frame : 8
traffic convention
- one-hot encoded vector to tell model whether traffic is right-hand or left-hand traffic : 2
recurrent state
- The recurrent state vector that is fed back into the GRU for temporal context : 512

Supercombo output format (Full size: 6472 x float32)

plan
- 5 potential desired plan predictions : 4955 = 5 * 991
  - predicted mean and standard deviation of the following values at 33 timesteps : 990 = 2 * 33 * 15
    - x,y,z position in current frame (meters)
    - x,y,z velocity in local frame (meters/s)
    - x,y,z acceleration local frame (meters/(s*s))
    - roll, pitch , yaw in current frame (radians)
    - roll, pitch , yaw rates in local frame (radians/s)
  - probability¹ of this plan hypothesis being the most likely: 1
lanelines
- 4 lanelines (outer left, left, right, and outer right): 528 = 4 * 132
  - predicted mean and standard deviation for the following values at 33 x positions : 132 = 2 * 33 * 2
    - y position in current frame (meters)
    - z position in current frame (meters)
laneline probabilties
- 2 probabilities¹ that each of the 4 lanelines exists : 8 = 4 * 2
  - deprecated probability
  - used probability
road-edges
- 2 road-edges (left and right): 264 = 2 * 132
  - predicted mean and standard deviation for the following values at 33 x positions : 132 = 2 * 33 * 2
    - y position in current frame (meters)
    - z position in current frame (meters)
leads
- 2 hypotheses for potential lead cars : 102 = 2 * 51
  - predicted mean and stadard deviation for the following values at 0,2,4,6,8,10s : 48 = 2 * 6 * 4
    - x position of lead in current frame (meters)
    - y position of lead in current frame (meters)
    - speed of lead (meters/s)
    - acceleration of lead(meters/(s*s))
  - probabilities¹ this hypothesis is the most likely hypothesis at 0s, 2s or 4s from now : 3
lead probabilities
- probability¹ that there is a lead car at 0s, 2s, 4s from now : 3 = 1 * 3
desire state
- probability¹ that the model thinks it is executing each of the 8 potential desire actions : 8
meta ²
- Various metadata about the scene : 80 = 1 + 35 + 12 + 32
  - Probability¹ that openpilot is engaged : 1
  - Probabilities¹ of various things happening between now and 2,4,6,8,10s : 35 = 5 * 7
    - Disengage of openpilot with gas pedal
    - Disengage of openpilot with brake pedal
    - Override of openpilot steering
    - 3m/(s*s) of deceleration
    - 4m/(s*s) of deceleration
    - 5m/(s*s) of deceleration
  - Probabilities¹ of left or right blinker being active at 0,2,4,6,8,10s : 12 = 6 * 2
  - Probabilities¹ that each of the 8 desires is being executed at 0,2,4,6s : 32 = 4 * 8
pose ²
- predicted mean and standard deviation of current translation and rotation rates : 12 = 2 * 6
  - x,y,z velocity in current frame (meters/s)
  - roll, pitch , yaw rates in current frame (radians/s)
recurrent state
- The recurrent state vector that is fed back into the GRU for temporal context : 512

Driver Monitoring Model

.onnx model can be run with onnx runtimes
.dlc file is a pre-quantized model and only runs on qualcomm DSPs

input format

single image (640 * 320 * 3 in RGB):
- full input size is 6 * 640/2 * 320/2 = 307200
- represented in YUV420 with 6 channels:
  - Channels 0,1,2,3 represent the full-res Y channel and are represented in numpy as Y[::2, ::2], Y[::2, 1::2], Y[1::2, ::2], and Y[1::2, 1::2]
  - Channel 4 represents the half-res U channel
  - Channel 5 represents the half-res V channel
- normalized, ranging from -1.0 to 1.0

output format

39 x float32 outputs (parsing example)
- face pose: 12 = 6 + 6
  - face orientation [pitch, yaw, roll] in camera frame: 3
  - face position [dx, dy] relative to image center: 2
  - normalized face size: 1
  - standard deviations for above outputs: 6
- face visible probability: 1
- eyes: 20 = (8 + 1) + (8 + 1) + 1 + 1
  - eye position and size, and their standard deviations: 8
  - eye visible probability: 1
  - eye closed probability: 1
- wearing sunglasses probability: 1
- poor camera vision probability: 1
- face partially out-of-frame probability: 1
- (deprecated) distracted probabilities: 2
- face covered probability: 1

All probabilities are in logits, so you need to apply sigmoid or softmax functions to get actual probabilities ↩︎
These outputs come directly from the vision blocks, they do not have access to temporal state or the desire input ↩︎

README.md Unescape Escape