TVM in the Arene AI Platform of Woven Planet

January 18, 2022 | Tech Insights

Note: Woven Planet became Woven by Toyota on April 1, 2023.

[Illustration: scattered dots beside the Arene and Apache TVM logos gradually merge to form an arrow]

By Ryo Takahashi, Srushti Rashmi Shirish, Koichiro Yamaguchi, Takaaki Tagawa and Yusuke Yachide, Senior Engineer, ML Engineer, Staff Engineer, Senior Engineer and Senior Manager

Woven Planet Holdings, Inc. (Woven Planet) is pursuing a range of initiatives for future mobility solutions. Automated driving and robotics are, of course, vital projects for us. In parallel, we have been developing an Automated Mapping Platform (AMP), a vehicle programming platform called Arene, and a smart city project, Woven City. We aim to innovate mobility by leveraging the synergies among these projects. We also believe that machine learning (ML) is the key technology for all of them. Based on this belief, we aim to run ML inference in a wide variety of scenarios and on a wide variety of processors, to make our perception modules more advanced and ultimately contribute to safer vehicle automation.

To support the development of such ML applications, we have been working on a development environment called Arene AI Platform. The scope of this platform is not limited to ML training. It covers all the ML development steps, ranging from data annotation through evaluation and deployment to monitoring (e.g. drift analysis). One of these steps is inference optimization. This step boosts the inference speed of neural networks (NNs) through model compactization (e.g. quantization), graph-level optimization (e.g. kernel fusion), parallel computation (e.g. vectorization), and so on. We position Apache TVM [1] as the core component of this step.
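As an illustration of the model compactization mentioned above, the following minimal sketch shows symmetric per-tensor int8 quantization in plain Python. It is not how TVM or the Arene AI Platform implements quantization; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to the range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map the integer values back to approximate floats."""
    return [q * scale for q in quantized]


weights = [0.50, -1.27, 0.03, 0.99]  # hypothetical sample weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding moves each value by at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte instead of four, at the cost of an error bounded by half of `scale`. Whether that error is acceptable depends on the model and has to be checked by evaluation.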

Apache TVM is an open-source NN compiler supported by the Apache Software Foundation. Unlike general-purpose compilers such as GCC (GNU Compiler Collection), TVM is a domain-specific compiler: it takes a model trained in a framework such as TensorFlow or PyTorch and generates an executable optimized for a given target processor. There are many similar solutions in this field, such as Glow [2], and processor vendors also offer their own proprietary compilers. What distinguishes TVM is that it supports a large number of ML frameworks and processors and is under continuous development by ML engineers and compiler experts around the world.

Why focus on Apache TVM?

We focus on TVM because we expect it to resolve three issues, shown in Figure 1, that Woven Planet currently faces. The first issue is that the inference APIs of the proprietary solutions developed by processor vendors differ from one another (Issue 1). As mentioned above, Woven Planet needs NNs in all of its businesses, and our inference processors range from high-performance to low-power, and from CPUs to GPUs to NPUs (Neural Processing Units). In response to this demand, many vendors now offer their own proprietary optimization solutions, but their inference APIs differ slightly from each other. Consequently, we end up writing non-reusable inference code every time we support a new processor vendor. The second issue is that, because compilation procedures differ among proprietary solutions, Arene AI Platform has to maintain multiple deployment pipelines (Issue 2). NN research moves very quickly, and proprietary compilers are updated constantly. Our ML engineers therefore have to devote a huge number of person-hours to maintaining their deployment pipelines. The last issue is that proprietary compilers are mostly black boxes to us (Issue 3). To bring the latest research achievements into our products, we constantly attempt to compile new computation layers. These attempts often lead to compile errors, and with proprietary compilers it is difficult to resolve such errors other than by contacting the processor vendor. While Woven Planet is blessed with many customer-oriented partners, this problem tends to be a barrier to deploying our state-of-the-art ML models on various processors.

Woven Planet focuses on Apache TVM as a solution to these issues. Taking advantage of TVM’s great extensibility, the Arene AI Edge team is working on integrating several processors. When this effort succeeds, the component that interfaces with ML frameworks, called the frontend, will become a white box for us, and we will be able to debug and modify it as needed. Of course, we are willing to contribute to the upstream TVM repository. In turn, by frequently merging upstream releases into Arene AI Platform, we expect to be able to try out a wider variety of newer computation layers on our product processors. Application developers, for their part, can use all the processors through TVM’s inference API. In other words, the TVM interface will become a kind of Hardware Abstraction Layer (HAL). Furthermore, we are extending this compiler infrastructure into a managed service to free our ML engineers and data scientists from non-core operational work. Admittedly, several other tools could achieve the same objective. However, we believe that TVM is the closest to Arene’s vision in terms of 1) a flexible and extensible intermediate representation such as Relay [3], 2) mature community management, and 3) rich documentation.
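The idea of a single inference API acting as a hardware abstraction layer can be sketched as follows. This is only an illustration in plain Python under assumed names: `InferenceBackend`, `CpuBackend`, `NpuBackend` and `run_inference` are hypothetical and are not part of TVM or the Arene AI Platform. The toy "model" simply doubles its input so that the example stays self-contained.

```python
from typing import List, Protocol


class InferenceBackend(Protocol):
    """Hypothetical common interface that every processor target implements."""

    name: str

    def run(self, inputs: List[float]) -> List[float]:
        ...


class CpuBackend:
    name = "cpu"

    def run(self, inputs: List[float]) -> List[float]:
        # Stand-in for executing a compiled model on a CPU.
        return [2.0 * x for x in inputs]


class NpuBackend:
    name = "npu"

    def run(self, inputs: List[float]) -> List[float]:
        # Stand-in for a vendor-specific runtime hidden behind the same method.
        return [x + x for x in inputs]


def run_inference(backend: InferenceBackend, inputs: List[float]) -> List[float]:
    """Application code depends only on the shared interface, not on the vendor."""
    return backend.run(inputs)


outputs = {b.name: run_inference(b, [1.0, 2.5]) for b in (CpuBackend(), NpuBackend())}
```

With this shape, supporting a new processor means adding one backend class, while `run_inference` and everything that calls it stay unchanged.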

_{Figure 1: Unified abstraction of all the processors by Apache TVM}

Experiment

Of course, the transition shown in Figure 1 cannot be achieved for dozens of processors overnight, but Woven Planet has already begun taking steps in this direction. One example is benchmarking in AMP, which continuously generates a high-definition map to support vehicle automation on a global scale. In the pipeline shown in Figure 2, we continuously fuse anonymized vehicle probe data (e.g. RGB images) and space maps (e.g. satellite images). This is a typical batch NN inference workload, and in fact we cascade several ML models through this pipeline. One such model performs semantic segmentation of satellite images [4]. We allocate tens or hundreds of CPU instances every day to accumulate segmentation results for satellite images. This year, we optimized this semantic segmentation model with OctoML’s Auto-scheduler. As a result, we reduced inference time by 80% compared with the learning framework and by 28% compared with other inference engines, as shown in Figure 3. Of course, as many TVM fans will imagine, this optimization is still in progress, and further speedups can be expected from making full use of quantization, vectorization, and the latest Auto-scheduler. We are carrying out similar benchmarking and bottleneck identification on a company-wide scale, and TVM is playing a major role in this effort.
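For readers who prefer speedup factors, a reduction in inference time can be converted with speedup = 1 / (1 - reduction). The short calculation below applies this to the two reductions quoted above (80% and 28%); the helper name `speedup_from_reduction` is ours and purely illustrative.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional time reduction (0 <= r < 1) into a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


vs_framework = speedup_from_reduction(0.80)  # roughly 5x faster
vs_engine = speedup_from_reduction(0.28)     # roughly 1.39x faster
print(f"{vs_framework:.2f}x, {vs_engine:.2f}x")
```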

_{Figure 2: High-level overview of Automated Mapping Platform}

_{Figure 3: Benchmarking Apache TVM}

Next Steps

The application above is batch inference in the cloud, but Woven Planet is an automotive company and will naturally support real-time inference on edge devices, such as those used for ADAS/AD. For practical use of TVM at the edge, we need to follow functional safety standards such as ISO 26262. From this perspective too, TVM is friendly to the automotive industry, because the host-side compiler code and the edge-side runtime code are well decoupled, and mainstream TVM itself offers a MISRA-C/C++ compliant runtime. Of course, this is not enough to claim functional safety, but these designs and assets will be a great advantage in complying with Woven Planet’s safety standards.

Another direction is integration with our training components, such as Neural Architecture Search (NAS). In Figure 1, we pointed out the differences between the inference APIs of proprietary solutions, but the biggest difference lies in the profiler interface. In hardware-aware NAS, we usually use processor statistics (e.g. per-layer latency) obtained from each profiler. However, some vendors report them in Microsoft® Excel® files while others print them to standard output (stdout), so integration is usually difficult. Here too, TVM provides a useful abstraction in its profiler interface. We are currently integrating this profiler interface with our in-house NAS algorithm. In the future, we plan to extend this framework and then pursue hardware/software co-design as Woven Planet.
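To illustrate why a common profiler abstraction helps hardware-aware NAS, the sketch below normalizes two differently formatted per-layer latency reports into one record type. Both input formats, the layer names, and all function names are hypothetical: a CSV string stands in for a spreadsheet export, and neither format corresponds to a real vendor tool or to TVM's own profiler.

```python
import csv
import io
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayerLatency:
    layer: str
    latency_ms: float


def parse_csv_report(text: str) -> List[LayerLatency]:
    """Parse a hypothetical spreadsheet export with 'layer' and 'latency_us' columns."""
    rows = csv.DictReader(io.StringIO(text))
    return [LayerLatency(r["layer"], float(r["latency_us"]) / 1000.0) for r in rows]


def parse_stdout_report(text: str) -> List[LayerLatency]:
    """Parse hypothetical log lines such as '[prof] conv1: 0.40 ms'."""
    pattern = re.compile(r"\[prof\]\s+(\S+):\s+([0-9.]+)\s+ms")
    return [LayerLatency(m.group(1), float(m.group(2))) for m in pattern.finditer(text)]


def total_latency_ms(layers: List[LayerLatency]) -> float:
    """A NAS cost term can be computed the same way for every vendor."""
    return sum(item.latency_ms for item in layers)


csv_report = "layer,latency_us\nconv1,420\nrelu1,30\n"
stdout_report = "[prof] conv1: 0.40 ms\n[prof] relu1: 0.05 ms\n"

vendor_a = parse_csv_report(csv_report)
vendor_b = parse_stdout_report(stdout_report)
```

Once both reports share the `LayerLatency` shape, the search algorithm can score candidate architectures with `total_latency_ms` without knowing which vendor produced the numbers.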

In this article, we mainly discussed our motivation for adopting Apache TVM. At the virtual conference TVMCon 2021, Ryo Takahashi of Woven Planet presented more specific use cases, such as integration with other components of the Arene AI Platform (e.g. Kubeflow). As with last year’s event, the organizer distributes recorded presentations on a video-sharing platform, so if you are interested, please check out our presentation as well. The project also needs a variety of experts, from model compactization algorithm researchers to compiler geeks to MLOps engineers. If you are interested in our Senior Software Engineer post, please apply here.

References

[1] Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, pp. 579–594, 2018.
[2] Mingzhen Li, Yi Liu, Xiaoyan Liu, Qingxiao Sun, Xin You, Hailong Yang, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian. The Deep Learning Compiler: A Comprehensive Survey. In IEEE Transactions on Parallel & Distributed Systems, pp. 708–727, 2021.
[3] Jared Roesch, Steven Lyubomirsky, Logan Weber, Josh Pollock, Marisa Kirisame, Tianqi Chen, Zachary Tatlock. Relay: A New IR for Machine Learning Frameworks. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2018.
[4] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision, pp. 833–851, 2018.