Google's TPU

We document the Google TPU v2/v3 in order to support it in tinygrad without the XLA compiler.

Creating a Google Cloud TPU VM

A TPUv2-8 machine, the cheapest TPU VM, costs $4.50/hr.

gcloud alpha compute tpus tpu-vm create test --zone=us-central1-b --accelerator-type=v2-8 --version=v2-alpha
gcloud alpha compute tpus tpu-vm ssh test --zone us-central1-b
# and for when you are done
gcloud alpha compute tpus tpu-vm delete test --zone us-central1-b
gcloud alpha compute tpus tpu-vm list --zone us-central1-b

Aside from the usual VM stuff, there are 4 accelerators on the PCIe bus. (A v2-8 is 4 chips with 2 cores each.)

# lspci
00:04.0 Unassigned class [ff00]: Google, Inc. Device 0027
00:05.0 Unassigned class [ff00]: Google, Inc. Device 0027
00:06.0 Unassigned class [ff00]: Google, Inc. Device 0027
00:07.0 Unassigned class [ff00]: Google, Inc. Device 0027

They show up in /sys/class/accel (tons of files here), and the driver lives in /lib/libtpu.so. The device nodes are /dev/accel[0-3], and the driver mmaps a bunch of stuff from them. They are "ba16c7433" chips.
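As a small illustration, the four accelerators can be picked out of the lspci output programmatically. This is a minimal sketch: the helper name `find_tpu_slots` is made up here, and treating PCI device ID 0027 as "the TPU" is an assumption based only on the listing above.

```python
import re

# The lspci lines quoted above, used as sample input.
LSPCI_SAMPLE = """\
00:04.0 Unassigned class [ff00]: Google, Inc. Device 0027
00:05.0 Unassigned class [ff00]: Google, Inc. Device 0027
00:06.0 Unassigned class [ff00]: Google, Inc. Device 0027
00:07.0 Unassigned class [ff00]: Google, Inc. Device 0027
"""

# Matches "<slot> <class text>: Google, Inc. Device <4 hex digits>".
LINE_RE = re.compile(
    r"^(?P<slot>\S+) .*: Google, Inc\. Device (?P<device>[0-9a-f]{4})$"
)

def find_tpu_slots(lspci_text, device_id="0027"):
    """Return PCI slots of Google devices with the given device ID.

    ASSUMPTION: device ID 0027 identifies the TPU, as seen in the listing above.
    """
    slots = []
    for line in lspci_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("device") == device_id:
            slots.append(match.group("slot"))
    return slots

print(find_tpu_slots(LSPCI_SAMPLE))
```

On a real VM the same function could be fed the text printed by `lspci` instead of the sample string.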

We grab the minimal TPU example from TensorFlow. When the compiler runs, it produces tons of great logs in /tmp/tpu_logs.

cd tfexample
gcc -o libtpu_client libtpu_client.c -ltpu
TPU_VLOG_LEVEL=99 ./libtpu_client

From these logs, we find the "LLO Instructions". Each field below is listed as name : value (bit offset, bit width).

VLIW Instruction (322b VLIW bundle)

  spare         : 0   (0,1)
  vex_mxu       : 0   (1,1)
* 1 misc slot
  msc_targ      : 0   (2,3)
  msc_opnd      : 0   (5,3)
  msc_op        : 0   (8,5)
  msc_pred      : 31  (13,5)
* 2 matrix slots (push, pop)
  vres_dest     : 28  (18,2)
  vres_op       : 28  (20,2)
  vres_pred     : 31  (22,5)
  vex_source    : 28  (27,2)
  vex_subop     : 24  (29,3)
  vex_op        : 24  (32,3)
  vex_pred      : 31  (35,5)
* 4 vector slots (2 for load/store)
  vld_ttu       : 30  (40,1)
  vld_stride    : 24  (41,3)
  vld_offset    : 24  (44,2)
  vld_base      : 24  (46,2)
  vld_submsk    : 24  (48,3)
  vld_dest      : 0   (51,5)
  vld_op        : 0   (56,2)
  vld_pred      : 31  (58,5)
  vst_ttu       : 30  (63,1)
  vst_iar       : 30  (64,1)
  vst_value_two : 24  (65,3)
  vst_offset    : 24  (68,2)
  vst_base      : 24  (70,2)
  vst_value_one : 24  (72,3)
  vst_source    : 0   (75,5)
  vst_op        : 0   (80,5)
  vst_pred      : 31  (85,5)
* 4 vector slots (2 for ALU)
  v1_dest       : 0   (90,5)
  v1_y_vreg     : 0   (95,5)
  v1_y_src      : 0   (100,5)
  v1_x          : 0   (105,5)
  v1_op         : 0   (110,6)
  v1_pred       : 31  (116,5)
  v0_dest       : 0   (121,5)
  v0_y_vreg     : 0   (126,5)
  v0_y_src      : 0   (131,5)
  v0_x          : 0   (136,5)
  v0_op         : 0   (141,6)
  v0_pred       : 31  (147,5)
* 3 scalar registers copied into the vector units?
  vs2           : 0   (152,5)
  vs1           : 0   (157,5)
  vs0           : 0   (162,5)
* 6 immediates (16-bit each, two can be merged for 32)
  imm_5         : 0   (167,16)
  imm_4         : 0   (183,16)
  imm_3         : 0   (199,16)
  imm_2         : 0   (215,16)
  imm_1         : 0   (231,16)
  imm_0         : 0   (247,16)
* ttu? what's a ttu?
  ttu_set_btr   : 0   (263,1)
  ttu_iterate   : 0   (264,1)
  ttu_row       : 0   (265,3)
* 2 scalar slots
  s1_dest       : 0   (268,5)
  s1_y          : 0   (273,6)
  s1_x          : 0   (279,5)
  s1_op         : 0   (284,6)
  s1_pred       : 31  (290,5)
  s0_dest       : 0   (295,5)
  s0_y          : 0   (300,6)
  s0_x          : 0   (306,5)
  s0_op         : 0   (311,6)
  s0_pred       : 15  (317,5)
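To make the (offset, width) pairs concrete, here is a minimal sketch of reading and writing fields in a 322-bit bundle held as a Python integer. The helpers `get_field` and `set_field` are hypothetical, and the bit ordering is an assumption: the logs do not say whether offset 0 is the least or most significant bit, so this sketch counts from the least significant bit.

```python
BUNDLE_BITS = 322  # total width of one VLIW bundle, from the log header

# A few (offset, width) pairs copied from the table above.
FIELDS = {
    "msc_pred": (13, 5),
    "v0_op": (141, 6),
    "imm_0": (247, 16),
    "s0_op": (311, 6),
    "s0_pred": (317, 5),
}

def get_field(bundle: int, name: str) -> int:
    """Extract one field. ASSUMPTION: offsets count from the least significant bit."""
    offset, width = FIELDS[name]
    return (bundle >> offset) & ((1 << width) - 1)

def set_field(bundle: int, name: str, value: int) -> int:
    """Return a copy of bundle with the named field replaced by value."""
    offset, width = FIELDS[name]
    mask = (1 << width) - 1
    if value & ~mask:
        raise ValueError(f"{value} does not fit in {width} bits for {name}")
    return (bundle & ~(mask << offset)) | (value << offset)

# Build a bundle with two fields set, then read them back.
bundle = 0
bundle = set_field(bundle, "s0_pred", 15)
bundle = set_field(bundle, "imm_0", 0x1234)
print(get_field(bundle, "s0_pred"), hex(get_field(bundle, "imm_0")))
```

The last field, s0_pred, ends at bit 317 + 5 = 322, which matches the "322b" in the header.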

Running a Program (WIP)

Our goal is to run a program on the TPU without the driver. An strace of the driver shows it opening a device node and mapping it:

...
openat(AT_FDCWD, "/dev/accel3", O_RDWR) = 184
mmap(NULL, 27799736, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED, 184, 0) = 0x7f59a74b3000
# size is 0x1a830b8 bytes, about 27.8 MB (26.5 MiB)
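The two syscalls above can be replayed from Python as a starting point. This is only a sketch under assumptions: the helper `map_shared` is hypothetical, Python's mmap module does not expose MAP_LOCKED so that flag is omitted, and whether the device accepts this mapping without libtpu having initialized it first is untested.

```python
import mmap
import os

ACCEL_PATH = "/dev/accel3"  # device node seen in the strace above
MAP_SIZE = 27799736         # 0x1a830b8, length seen in the strace above

def map_shared(path: str, length: int) -> mmap.mmap:
    """Open path read/write and map `length` bytes from offset 0 as MAP_SHARED.

    NOTE: the traced call also passes MAP_LOCKED, which is omitted here.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        return mmap.mmap(
            fd,
            length,
            flags=mmap.MAP_SHARED,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
        )
    finally:
        # mmap duplicates the descriptor, so the mapping outlives this close.
        os.close(fd)

if __name__ == "__main__":
    if os.path.exists(ACCEL_PATH):
        region = map_shared(ACCEL_PATH, MAP_SIZE)
        print(f"mapped {len(region):#x} bytes from {ACCEL_PATH}")
        region.close()
    else:
        print(f"{ACCEL_PATH} not present; run this on the TPU VM")
```

What lives at which offset inside the mapped region is not known yet, so the sketch stops at obtaining the mapping.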