November 5, 2021

Deploying a Machine-Learned Planner for Autonomous Vehicles in San Francisco

Note: Woven Planet became Woven by Toyota on April 1, 2023. This blog was originally published on November 5, 2021 by Level 5, which is now part of Woven Planet, and was updated on December 13, 2022.

Visualized data of roads, objects, and vehicles collected from an autonomous vehicle

By Moritz Niendorf, Senior Engineering Manager; and Alan Agon, Product Manager

At Woven Planet, we believe our machine learning-first approach to motion planning for autonomous vehicles (AVs) will chart the course for the future of self-driving technology. Our Machine-Learned (ML) Planner is now the default planner operating on our autonomous vehicles driving in San Francisco and Palo Alto.

Today, we’re excited to share examples of our machine-learned motion planning system in action and insight into how we’re building it.

In with Autonomy 2.0

The motion planner acts as the decision-making center for AVs. It sits in the middle of the AV’s technology stack, leveraging input from the vehicle’s sensors to generate the best trajectory for the AV, amidst uncertainty and often contradictory rules.

An AV planner has to consistently choose the right driving behaviors to ensure safety, legality, comfort, and route efficiency. Trying to summarize the complexity of the real world with hand-crafted rules may be limiting the speed and scalability of AV development. Instead, we believe a fully machine-learned system that’s trained using human driving data will be a catalyst for the next wave of advancements in autonomous driving.

Comparison between Autonomy 1.0 and Autonomy 2.0

This ML-first approach is called Autonomy 2.0. We believe that it can overcome the challenges inherent to the traditional rule-based Autonomy 1.0 approach that requires engineers to hand-code rules for every conceivable event. You can dig deeper into Autonomy 2.0 in our white paper, but simply put, it’s based on:

  • An end-to-end, fully differentiable AV stack trainable from human demonstrations

  • Closed-loop, data-driven reactive simulation

  • Large-scale, low-cost data collection to enable scalability

Following this approach, we began applying machine learning algorithms to planning, allowing us to learn from complex driving scenarios. We’ve confirmed that continuously adding more datasets and iteratively improving algorithms and frameworks improves the performance of our AV planner.

Our fully machine-learned planner in action

Our current ML Planner takes in a representation of the environment by using our perception stack and map, and then generates a trajectory — that is, a sequence of positions, velocities, accelerations, orientations, and steering commands — using a single neural network. A lightweight rule-based secondary layer enables us to guarantee certain behaviors (e.g., always stopping a minimum distance away from a red traffic light should the ML planner choose to stop closer), allowing us to accelerate development while we continuously improve the ML Planner.
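To make the shape of that output concrete, here is a minimal sketch. The field names, the 2 m minimum gap, and the `enforce_min_stop_gap` helper are illustrative assumptions for this post, not our production interfaces:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrajectoryPoint:
    """One step of a planned trajectory: position, velocity, acceleration,
    orientation, and a steering command (fields are illustrative)."""
    x: float             # position in the driving frame (m)
    y: float              # lateral position (m)
    velocity: float       # m/s
    acceleration: float   # m/s^2
    heading: float        # orientation (rad)
    steering: float       # steering command (rad)

def enforce_min_stop_gap(trajectory: List[TrajectoryPoint],
                         dist_to_stop_line_m: float,
                         min_gap_m: float = 2.0) -> List[TrajectoryPoint]:
    """Toy version of a rule-based secondary layer: if the ML trajectory would
    carry the AV closer to a red-light stop line than min_gap_m, truncate it."""
    allowed_travel = dist_to_stop_line_m - min_gap_m
    kept, travelled, prev = [], 0.0, None
    for point in trajectory:
        if prev is not None:
            travelled += ((point.x - prev.x) ** 2 + (point.y - prev.y) ** 2) ** 0.5
        if travelled > allowed_travel:
            break
        kept.append(point)
        prev = point
    return kept
```

The real secondary layer covers more behaviors than this single stop-line check, but the idea is the same: a small set of guarantees wrapped around the ML output rather than a full rule-based planner.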

This planner has a drastically simplified architecture and doesn't require the hand-encoded prior knowledge that the old rule-based approach demands. Instead, it learns from a large number of real-world examples.

The schematic of the current iteration of the machine learning-first planner, where the ML Planner — a single neural network — is the primary trajectory generation module.

This ML Planner is now driving on public roads in San Francisco and handling a variety of complex driving scenarios.

In this video, our AV successfully:

  • Stops smoothly at a red light and behind a lead vehicle

  • Evaluates whether to drive through or stop at a yellow light

  • Accelerates and decelerates appropriately for vehicles cutting in and out of its lane

  • Nudges laterally around obstacles in its lane

Instead of investing effort to hand-code the desired behavior, our planner handles all of these scenarios well simply by being trained on a large amount of data.

We’re sure: more data leads to better performance

By powering our machine learning approach with large, curated datasets, we’re learning from expert driving data at scale to rapidly improve performance and exhibit more nuanced behaviors.

We observed this phenomenon early on when the ML Planner struggled to perform good turning behavior. Our hypothesis was that turns were underrepresented in the training data set, so we added an additional dataset targeted at turning. Sure enough, we found that the model’s performance improved significantly after adding the additional training data. As we tested the planner on new routes, we also saw the ML Planner generalize well to new environments.
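As a rough sketch of what adding targeted data can look like in practice, here is a simple rebalancing step; the scenario labels and the target fraction are hypothetical, chosen only to illustrate the idea:

```python
import random

def upsample_turns(scenarios, target_fraction=0.3, seed=0):
    """Hypothetical rebalancing step: duplicate turn scenarios until they make
    up roughly target_fraction of the training set."""
    rng = random.Random(seed)
    turns = [s for s in scenarios if s["maneuver"] == "turn"]
    others = [s for s in scenarios if s["maneuver"] != "turn"]
    if not turns:
        return scenarios
    # turns / (turns + others) ≈ target_fraction  =>  needed number of turn samples:
    needed = int(target_fraction / (1 - target_fraction) * len(others))
    extra = [rng.choice(turns) for _ in range(max(0, needed - len(turns)))]
    return others + turns + extra
```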

Our early model (BEFORE) crosses the inner lane boundary and leaves the road. After adding more turning data, the car stays centered in its lane (AFTER).

This example gave us a playbook to address performance issues by using more data rather than through time-consuming hand engineering. We’ve used this approach to tackle many scenarios on San Francisco’s streets, and as we continue to gather robust datasets from our fleet and system data, we’ll be able to continuously improve our ML Planner’s performance.

What are the components of a road-ready ML Planner?

The best practices on how to build an ML-based planner are not yet established, so we’re in uncharted territory as we experiment, learn, and quickly iterate.

Typical workflow for our ML Planning engineers

The figure above shows the typical workflow for our ML Planning engineers. It consists of an inner loop that’s fully contained within our ML training framework and an outer loop that allows us to develop and test ML models that are integrated into the AV environment and with different levels of simulation fidelity.

In order to put an ML-based planner on the road, we found it critical to develop easy-to-use tools and comprehensive metrics to enable fast development iteration. There’s a finite number of both AV engineers and hours in a day to test a planner version against every possible scenario — not to mention the cost required to continuously test and iterate. We need to be able to run a large number of simultaneous tests with different inputs and configurations at the push of a button, and then easily inspect and compare the results and dive deeper into interesting scenarios. Ultimately this method of evaluation allows engineers to review performance in aggregate using a statistical lens rather than custom, rigid test cases.

To achieve this, we built the following core components of our ML Planner:

Modeling and training framework

The ML training framework we built is flexible enough to let us build models for both perception and planning. It gives us the velocity and scale needed for our team to train complex models in hours — not days — and enables us to evaluate metrics and deploy deep learned models into the AV planner.
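To give a flavor of what such a framework supports, here is a minimal imitation-learning sketch in PyTorch. `PlannerNet`, its dimensions, and the plain MSE loss are invented for illustration and say nothing about our actual model architecture or training objective:

```python
import torch
import torch.nn as nn

class PlannerNet(nn.Module):
    """Toy stand-in for a single planning network: maps a flattened scene
    representation to a fixed-horizon trajectory (x, y, heading per step)."""
    def __init__(self, scene_dim: int = 256, horizon: int = 20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(scene_dim, 512),
            nn.ReLU(),
            nn.Linear(512, horizon * 3),
        )

    def forward(self, scene: torch.Tensor) -> torch.Tensor:
        return self.net(scene)

def imitation_step(model, optimizer, scene, expert_trajectory):
    """One imitation-learning update: regress the predicted trajectory toward
    the trajectory the human driver actually took."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(scene), expert_trajectory)
    loss.backward()
    optimizer.step()
    return loss.item()
```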

Simulation framework

Our engineering team runs thousands of large-scale simulations every day at the lowest possible cost in order to continuously test the performance of the software that powers our self-driving systems. Within our simulation framework, we can test how the AV planner will perform based on side-by-side comparisons of different versions or configurations that might have different weights, parameters, or model architectures. We can also switch between different levels of fidelity in order to evaluate the ML model in isolation, its integration with the secondary layer, or the integration of the ML Planner and the motion controller.
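A simplified picture of such a side-by-side experiment is sketched below; the planner names, fidelity options, and the `simulate` callable are placeholders for illustration only:

```python
# Hypothetical experiment configuration: two planner variants evaluated side by
# side on the same logged scenes, at a chosen simulation fidelity.
EXPERIMENTS = {
    "baseline": {"planner": "ml_planner_v12", "fidelity": "ml_model_only"},
    "candidate": {"planner": "ml_planner_v13", "fidelity": "with_secondary_layer"},
}

def run_side_by_side(scene_ids, experiments, simulate):
    """Run every variant on the same scenes so results are directly comparable.
    simulate(scene_id, config) is a stand-in for the simulation backend."""
    return {
        name: [simulate(scene_id, config) for scene_id in scene_ids]
        for name, config in experiments.items()
    }
```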

The ML Planner is reacting to a cut-in ahead of it. The green vehicle represents how the vehicle performed on the road, while the white vehicle shows how the vehicle controlled by our ML Planner reacted in simulation.

Evaluation framework

To evaluate a version of the planner in simulation, we need to predict how well the planner would perform on the road. This requires a high-quality signal that enables us to reliably predict safety-critical and comfort-related events in simulation (some of which may happen only once in several thousand miles or more) so developers can focus on improving failure cases. However, we also need to minimize false alerts from simulation so that our engineering team can focus their effort on the real issues.

Building the evaluation framework can be as difficult and complex as building the AV driving system itself. We took a three-fold approach (a sketch of what such metrics might look like follows the list):

  1. We built granular metrics for targeted datasets that allow us to compute specialized metrics for specialized scenarios, e.g., stopping at a red traffic light.

  2. We built general metrics that can be applied to large-scale datasets that allow us to measure the impact of model changes across the full breadth of driving scenarios.

  3. We built a data-driven evaluation system that scales with the performance of the planner and learns from the safety drivers' disengagements and notes. Using safety driver feedback, the system can learn when the AV does well, but also when it doesn't perform optimally.
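As a rough illustration of the first two layers of that approach, here is how a granular metric and an aggregate over it might fit together; the result fields and the 2 m threshold are assumptions, not our real metric definitions:

```python
def red_light_stop_metric(result, min_gap_m: float = 2.0):
    """Granular metric for a targeted dataset: did the AV stop at least
    min_gap_m short of the stop line at a red light? Returns None when the
    metric does not apply to the scenario."""
    if not result["red_light_present"]:
        return None
    return result["stop_distance_to_line_m"] >= min_gap_m

def pass_rate(results, metric):
    """General aggregate over a large dataset: the fraction of applicable
    scenarios that pass the given metric."""
    outcomes = [metric(r) for r in results]
    applicable = [o for o in outcomes if o is not None]
    return sum(applicable) / len(applicable) if applicable else float("nan")
```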

Result visualization and inspection framework

With the ability to simulate and evaluate the performance of the stack at scale, our visualization and introspection framework allows us to quickly visualize simulation experiments, analyze the results in aggregate, and zoom into individual metrics and scenarios.

Our engineering team is then able to review, correlate, and cross-reference failure cases with the specific models and data used for each iteration. This enables our engineers to identify failure case instances before returning to the modeling process.

Dataset curation & balancing framework

A large volume of data alone is not enough to continuously iterate and improve the AV planner. We also have to supply it with the right data to handle an ever-growing set of scenarios and edge cases (like something falling out of a truck’s flatbed in front of the AV). We test our planner against a very large data set, and leverage our evaluation framework to identify scenarios in which the planner does not perform well. Then, we add these scenarios to the training data set for the planner to improve in such cases. Much like we’re able to study simulated edge cases with this approach, we’ll eventually be able to apply learnings from specific long-tail events that occur daily via our partnership with Lyft’s rideshare fleet to inform the development of our AV planner — a core differentiator for our AV approach.
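Conceptually, that curation loop boils down to something like the sketch below; the score field and threshold are hypothetical stand-ins for the signals our evaluation framework actually produces:

```python
def mine_hard_cases(training_set, evaluation_results, score_threshold=0.8):
    """Hypothetical curation step: scenarios where the planner scored below a
    threshold in evaluation are folded back into the training set."""
    hard_cases = [r["scenario"] for r in evaluation_results if r["score"] < score_threshold]
    return training_set + hard_cases
```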

Automation framework

We heavily leverage workflow automation to increase development speed. This automation allows us to go through one full loop of the development cycle with a few commands or a single click in a GUI. We also automatically train and evaluate our model stand-alone and integrated with the stack on a regular cadence to monitor our progress, track key metrics, and detect unwanted regressions.
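The regression check at the end of those scheduled runs can be as simple as comparing against the last accepted baseline, as in this sketch (metric names and the tolerance are illustrative, and it assumes higher is better for every metric):

```python
def find_regressions(current_metrics, baseline_metrics, tolerance=0.01):
    """Flag metrics that got meaningfully worse than the last accepted baseline."""
    regressions = {}
    for name, baseline_value in baseline_metrics.items():
        current_value = current_metrics.get(name)
        if current_value is not None and current_value < baseline_value - tolerance:
            regressions[name] = {"baseline": baseline_value, "current": current_value}
    return regressions
```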

Together, all of these components enable us to rapidly iterate on the machine learning-first planner, with a fast turnaround time from modeling to evaluation, followed by introspection.

Driving ahead

Deploying this fully machine-learned motion planner on our test fleet in an environment as challenging as downtown San Francisco marks a significant proof point for our Autonomy 2.0 approach. This deployment also sets us up to leverage data at scale, and take advantage of the progress made in the machine learning field to accelerate our development.

Join us!

We’re looking for talented engineers to help us on this journey. If you’d like to join us, we’re hiring!