What I learned from Tesla AI Day

Mainak Chakraborty
5 min read · Aug 22, 2021

It was probably the most interesting AI event of 2021. It was exciting to see developers and engineers share the stage and explain some of the most complex technology on the market in simple terms. The event kicked off with a preface from Elon Musk, who started by establishing that Tesla is now more than just an electric car company. The core highlights were as follows:

1. Tesla Vision: Andrej Karpathy started by describing the architecture currently deployed in Tesla cars. Eight cameras around the car capture raw video, which is used to build a vector space. In training, the raw data is fed into residual network architectures, specifically RegNets, which give a good trade-off between latency and accuracy. The input to the architecture is 1280×960, 12-bit (HDR), at 36 Hz. The RegNet output is passed through a feature pyramid network, specifically a BiFPN (weighted bi-directional feature pyramid network), followed by task-specific heads (YOLO-like one-stage detectors) for multiple use cases. The use of multiple detectors, such as traffic-light detection, object detection, and lane prediction, gives the architecture its name: "HydraNets". The benefits of HydraNets are feature sharing, the ability to fine-tune individual tasks, and a feature cache that speeds up fine-tuning.

Karpathy also highlighted how Autopilot predicts in vector space rather than conventional image space. Combining all the sensor data into a single space is an incredible feat of engineering in itself: it lets the system make decisions on the combined sensor data rather than deciding per camera and merging afterwards. All camera outputs are fed into a single neural net to produce the vector space, with an extra rectification layer that corrects for per-camera calibration differences.

A video module is added on top to learn temporal features and give context in complex real-time scenarios, such as whether a nearby car is moving. It takes the multi-camera feature space together with the vehicle's kinematics as input, and a spatial RNN is trained on them. This improves robustness to temporary occlusion as well as depth and velocity perception, all achieved with video networks alone.
Representation of Tesla Vision Architecture (Source: Tesla)
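The shared-backbone-plus-heads idea behind HydraNets can be sketched in a few lines of PyTorch. This is a toy illustration, not Tesla's code: the plain conv backbone (standing in for RegNet + BiFPN), the head definitions, and all shapes are my own assumptions.

```python
import torch
import torch.nn as nn

class ToyHydraNet(nn.Module):
    """Shared backbone with multiple task-specific heads."""
    def __init__(self, feat_dim=64):
        super().__init__()
        # Shared trunk: a stand-in for the RegNet + BiFPN feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Task-specific heads, all reading the same backbone features
        # (hypothetical output channel counts).
        self.heads = nn.ModuleDict({
            "traffic_light": nn.Conv2d(feat_dim, 4, 1),  # per-cell class logits
            "object_det":    nn.Conv2d(feat_dim, 6, 1),  # box params + objectness
            "lane":          nn.Conv2d(feat_dim, 1, 1),  # lane mask logits
        })

    def forward(self, x):
        feats = self.backbone(x)  # computed once, shared by every head
        return {name: head(feats) for name, head in self.heads.items()}

net = ToyHydraNet()
out = net(torch.randn(1, 3, 96, 128))
print({k: tuple(v.shape) for k, v in out.items()})
```

Because the backbone runs once per image, its features can also be cached and reused while fine-tuning a single head, which is exactly the fine-tuning speedup the talk described.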

2. Planning and Control: Ashok Elluswamy, Director of Autopilot Software, explained how planning and control serve the core objective, transportation, while optimizing for comfort, efficiency, and safety. The key difficulty is the action space: it is non-convex, so naive optimization can get stuck in local optima, and it is also high-dimensional. Tesla's answer is a "hybrid planning system": the vector space is passed through a coarse search that outputs a convex corridor, followed by continuous optimization that produces a smooth trajectory. Elluswamy also described in detail the importance of planning not only for the ego vehicle but for other vehicles as well. Finally, he suggested using learning instead of hand-crafted heuristics (Euclidean distance, or Euclidean distance plus navigation) to guide the search toward the destination, much like the Monte Carlo Tree Search used in AlphaZero.

Final Architecture (Source: Tesla)
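The two-stage idea, coarse discrete search followed by continuous smoothing, can be illustrated with a toy 1D lane-change example. This is my own simplification, not Tesla's planner; the reference path, candidate offsets, and smoothing weight are all made up.

```python
import numpy as np

# Toy reference: a lane change (lateral offset 0 -> 1) at step 10.
ref = np.where(np.arange(20) < 10, 0.0, 1.0)

def coarse_search(candidates):
    """Stage 1: per-step discrete search over lateral offsets (the 'corridor')."""
    return np.array([min(candidates, key=lambda c: (c - r) ** 2) for r in ref])

def smooth(path, weight=4.0, iters=500, lr=0.02):
    """Stage 2: continuous optimization of corridor-fit + smoothness cost."""
    x = path.astype(float).copy()
    for _ in range(iters):
        g = 2.0 * (x - path)                       # stay near the corridor
        g[:-1] += 2.0 * weight * (x[:-1] - x[1:])  # penalize jerky steps
        g[1:]  += 2.0 * weight * (x[1:] - x[:-1])
        x -= lr * g
    return x

coarse = coarse_search([-1.0, 0.0, 1.0])
trajectory = smooth(coarse)
# The smoothed trajectory spreads the abrupt lane change over several steps.
```

The discrete stage keeps the search from settling into a bad local optimum; the continuous stage is only responsible for comfort within the corridor the search picked.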

3. Labelling: Computer vision teams generally work closely with data and annotation teams that manually label objects for detection, and these annotations are usually on 2D images. Tesla instead built 4D labelling, which labels data directly in vector space: label once, and the label appears in all cameras across many frames simultaneously. Even this was not enough, so Tesla is moving to auto-labelling. Clips from different vehicles are registered together to build an annotation of a particular region at a particular time, which means a vehicle can learn about a region it has never driven through, while occlusions are handled effectively. Edge cases are covered using simulation, which makes it possible to label data that is difficult to source and difficult to label; this is where the Cybertruck made its debut. The networks in the vehicles were trained on 371 million simulated images and half a billion cuboids.
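The "label once, project everywhere" idea reduces to pinhole-camera projection: a single 3D point in the vehicle's vector space (say, a cuboid corner) can be reprojected into every camera. A minimal sketch, where the intrinsics and camera placements are made-up numbers, not Tesla's calibration:

```python
import numpy as np

def project(point_world, R, t, K):
    """Project a 3D point into pixel coordinates for one camera."""
    p_cam = R @ point_world + t      # vector space -> camera frame
    u, v, w = K @ p_cam              # camera frame -> image plane
    return np.array([u / w, v / w])  # perspective divide

# One labelled 3D point (e.g. a cuboid corner), 10 m ahead of the car.
point = np.array([0.0, 0.0, 10.0])

# Hypothetical intrinsics shared by both cameras (f=1000 px, center 640,480).
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 480.0],
              [   0.0,    0.0,   1.0]])

# Front camera at the origin; a second camera offset 0.5 m to the side.
cams = {
    "front":       (np.eye(3), np.zeros(3)),
    "front_right": (np.eye(3), np.array([-0.5, 0.0, 0.0])),
}

# One label in vector space lands in every camera, no per-image annotation.
pixels = {name: project(point, R, t, K) for name, (R, t) in cams.items()}
```

Run the projection across all eight cameras and every frame of a clip, and one vector-space label fans out into thousands of image-space labels for free.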

4. Dojo: Ganesh Venkataramanan, Project Dojo's lead, explained the need for a super-fast mechanism for training Autopilot, which gave rise to Dojo. Dojo is a distributed compute architecture connected by a network fabric, featuring a large compute plane, extremely high bandwidth with low latency, and big networks that are partitioned and mapped across it, to name a few. It is built purely for learning, with 500,000 training nodes being assembled together: 9 petaflops of compute per tile and 36 terabytes per second of off-tile bandwidth. The tiles are connected seamlessly, with modular cooling, to build an ExaPod, billed as the fastest neural-network training system ever known. To quote Ganesh, "The D1 chip is GPU-level compute with CPU-level flexibility and twice the network-chip-level I/O bandwidth". And yet Dojo is not complete; it is still evolving and being researched.
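Taking the per-tile figure at face value, a quick back-of-envelope shows where the "exa" in ExaPod comes from. The 120-tiles-per-ExaPod count is from Tesla's AI Day slides rather than the text above, so treat this as rough arithmetic, not a benchmark:

```python
# Back-of-envelope: 9 PFLOPS of compute per training tile (stated in the talk).
# 120 tiles per ExaPod is Tesla's published AI Day figure (assumption here).
PFLOPS_PER_TILE = 9
TILES_PER_EXAPOD = 120

exapod_pflops = PFLOPS_PER_TILE * TILES_PER_EXAPOD
print(f"{exapod_pflops} PFLOPS = {exapod_pflops / 1000:.2f} exaflops")
# -> 1080 PFLOPS = 1.08 exaflops
```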

Visual Representation of Tesla Training Tile (Source: Tesla)

5. Tesla Bot: The last attraction was the concept bot, described by Musk as a general-purpose robot. It could be used for manual tasks, as construction workers, in factories, or for household chores. My take: if it goes into production, we can forget about manual labour in space, no pressure suits, no oxygen requirements. But will it ever reach production, or is it a marketing gimmick? The future 5'8" Tesla Bot will carry the Autopilot cameras and be trained on Dojo.

Tesla Bot (Source: Tesla)

All this from a person who, a few years back, openly argued in a famous podcast for the need to slow down AI. Hypocrite, or first mover? Only the future can tell.

In the Q&A session, Musk explained that the technology Tesla has built could be licensed to interested parties, and perhaps open-sourced in the distant future. Dojo is set to disrupt the market next year.

“We basically want to encourage anyone who is interested in solving real-world AI problems at either the hardware or the software level to join Tesla, or consider joining Tesla,” said Musk at the end.

As for us? Excited for torch.device("dojo").

Want to know more?