openpilot_comma

openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for over 200 supported car makes and models.


Neural networks in openpilot

To view the architecture of the ONNX networks, you can open the .onnx files in netron.

Supercombo

Supercombo input format (Full size: 786954 x float32)

image stream
- Two consecutive images (256 * 512 * 3 in RGB) recorded at 20 Hz : 393216 = 2 * 6 * 128 * 256
  - Each 256 * 512 image is represented in YUV420 with 6 channels : 6 * 128 * 256
    - Channels 0, 1, 2, 3 hold the full-res Y channel, subsampled in numpy as Y[::2, ::2], Y[::2, 1::2], Y[1::2, ::2], and Y[1::2, 1::2]
    - Channel 4 represents the half-res U channel
    - Channel 5 represents the half-res V channel
wide image stream
- Two consecutive images (256 * 512 * 3 in RGB) recorded at 20 Hz : 393216 = 2 * 6 * 128 * 256
  - Each 256 * 512 image is represented in YUV420 with 6 channels : 6 * 128 * 256
    - Channels 0, 1, 2, 3 hold the full-res Y channel, subsampled in numpy as Y[::2, ::2], Y[::2, 1::2], Y[1::2, ::2], and Y[1::2, 1::2]
    - Channel 4 represents the half-res U channel
    - Channel 5 represents the half-res V channel
desire
- one-hot encoded vector that commands the model to execute a certain action; the bit only needs to be sent for 1 frame : 8
traffic convention
- one-hot encoded vector that tells the model whether traffic is right-hand or left-hand : 2
recurrent state
- The recurrent state vector that is fed back into the GRU for temporal context : 512
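
The input layout above can be sketched in numpy as follows. This is a minimal illustration under stated assumptions, not openpilot's own preprocessing: `pack_yuv420`, `random_frame`, `image_stream` and the dictionary keys are hypothetical names, random arrays stand in for camera frames already converted to YUV420, and stacking the two consecutive frames along the channel axis is an assumption.

```python
import numpy as np

H, W = 256, 512  # image height and width from the list above
rng = np.random.default_rng(0)

def pack_yuv420(y, u, v):
    """Pack a full-res Y plane (H, W) and half-res U, V planes (H/2, W/2)
    into a (6, H/2, W/2) array using the channel order listed above."""
    return np.stack([
        y[::2, ::2],    # channel 0
        y[::2, 1::2],   # channel 1
        y[1::2, ::2],   # channel 2
        y[1::2, 1::2],  # channel 3
        u,              # channel 4: half-res U
        v,              # channel 5: half-res V
    ])

def random_frame():
    # Hypothetical stand-in for one camera frame in YUV420.
    y = rng.random((H, W), dtype=np.float32)
    u = rng.random((H // 2, W // 2), dtype=np.float32)
    v = rng.random((H // 2, W // 2), dtype=np.float32)
    return y, u, v

def image_stream():
    # Two consecutive frames, 6 channels each (channel-axis stacking is assumed).
    return np.concatenate([pack_yuv420(*random_frame()) for _ in range(2)])

desire = np.zeros(8, dtype=np.float32)              # all zeros: no action commanded
traffic_convention = np.zeros(2, dtype=np.float32)
traffic_convention[0] = 1.0                         # index chosen arbitrarily here

inputs = {
    "image_stream": image_stream(),
    "wide_image_stream": image_stream(),
    "desire": desire,
    "traffic_convention": traffic_convention,
    "recurrent_state": np.zeros(512, dtype=np.float32),
}
sizes = {name: arr.size for name, arr in inputs.items()}
total = sum(sizes.values())
```

Summing the listed sizes gives 2 * 393216 + 8 + 2 + 512 = 786954 values; with a single image stream the same sum would be 393738.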

Supercombo output format (Full size: XXX x float32)

Read here for more.

Driver Monitoring Model

- The .onnx model can be run with ONNX runtimes
- The .dlc file is a pre-quantized model and only runs on Qualcomm DSPs

input format

single image (640 * 320 * 3 in RGB):
- full input size is 6 * 640/2 * 320/2 = 307200
- represented in YUV420 with 6 channels:
  - Channels 0, 1, 2, 3 hold the full-res Y channel, subsampled in numpy as Y[::2, ::2], Y[::2, 1::2], Y[1::2, ::2], and Y[1::2, 1::2]
  - Channel 4 represents the half-res U channel
  - Channel 5 represents the half-res V channel
- normalized, ranging from -1.0 to 1.0
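
As a rough sketch of how such an input could be assembled, the snippet below packs one YUV420 frame into 6 channels, scales it to the range -1.0 to 1.0, and flattens it. Several details are assumptions not stated above: the planes are taken to be uint8 with 320 rows and 640 columns, the scaling is `x / 127.5 - 1`, and `dm_input` is a hypothetical helper name.

```python
import numpy as np

def dm_input(y, u, v):
    """Build the flat driver monitoring input from uint8 YUV420 planes.

    y: (320, 640) full-res luma; u, v: (160, 320) half-res chroma.
    The uint8 -> [-1, 1] mapping below is an assumed normalization.
    """
    planes = np.stack([
        y[::2, ::2], y[::2, 1::2], y[1::2, ::2], y[1::2, 1::2],  # channels 0-3
        u,                                                        # channel 4
        v,                                                        # channel 5
    ])
    return (planes.astype(np.float32) / 127.5 - 1.0).ravel()

# Synthetic planes covering the extreme pixel values.
y = np.zeros((320, 640), dtype=np.uint8)
u = np.full((160, 320), 255, dtype=np.uint8)
v = np.full((160, 320), 128, dtype=np.uint8)

x = dm_input(y, u, v)
```

Here `x.size` matches 6 * 640/2 * 320/2 = 307200, and every value lies in [-1.0, 1.0].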

output format

39 x float32 outputs (parsing example)
- face pose: 12 = 6 + 6
  - face orientation [pitch, yaw, roll] in camera frame: 3
  - face position [dx, dy] relative to image center: 2
  - normalized face size: 1
  - standard deviations for above outputs: 6
- face visible probability: 1
- eyes: 20 = (8 + 1) + (8 + 1) + 1 + 1, with each item below output once per eye
  - eye position and size, and their standard deviations: 8
  - eye visible probability: 1
  - eye closed probability: 1
- wearing sunglasses probability: 1
- poor camera vision probability: 1
- face partially out-of-frame probability: 1
- (deprecated) distracted probabilities: 2
- face covered probability: 1
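
One way to split the 39 values into named fields is sketched below. It assumes the fields are laid out contiguously in the order listed above, with the two eyes' position and visibility blocks first and the two closed probabilities after them, as the sum (8 + 1) + (8 + 1) + 1 + 1 suggests. The field names, the left/right assignment, and `parse_dm_output` are hypothetical, and no activation (such as a sigmoid) is applied because the list does not say whether the probabilities are raw or already squashed.

```python
import numpy as np

# (name, length) pairs; the order and the names are assumptions for illustration.
DM_LAYOUT = [
    ("face_orientation", 3),       # pitch, yaw, roll in camera frame
    ("face_position", 2),          # dx, dy relative to image center
    ("face_size", 1),
    ("face_pose_std", 6),          # standard deviations for the 6 values above
    ("face_visible_prob", 1),
    ("left_eye", 8),               # position and size, plus standard deviations
    ("left_eye_visible_prob", 1),
    ("right_eye", 8),
    ("right_eye_visible_prob", 1),
    ("left_eye_closed_prob", 1),
    ("right_eye_closed_prob", 1),
    ("sunglasses_prob", 1),
    ("poor_vision_prob", 1),
    ("partial_face_prob", 1),
    ("distracted_probs", 2),       # deprecated
    ("face_covered_prob", 1),
]

def parse_dm_output(raw):
    """Slice a flat 39-element output vector into named fields."""
    raw = np.asarray(raw, dtype=np.float32)
    expected = sum(n for _, n in DM_LAYOUT)
    if raw.shape != (expected,):
        raise ValueError(f"expected {expected} values, got shape {raw.shape}")
    fields, offset = {}, 0
    for name, n in DM_LAYOUT:
        fields[name] = raw[offset:offset + n]
        offset += n
    return fields

# Dummy output where each value equals its own index, to make offsets visible.
parsed = parse_dm_output(np.arange(39))
```

With the dummy vector, `parsed["face_visible_prob"][0]` is 12.0 and `parsed["face_covered_prob"][0]` is 38.0, which makes it easy to check the assumed offsets against the real parsing code.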