AI Quality & Evaluation Engineer, AI product

Woven City Social Infrastructures & Platforms

Tokyo

hybrid

About Woven by Toyota

Woven by Toyota is enabling Toyota’s once-in-a-century transformation into a mobility company. Inspired by a legacy of innovating for the benefit of others, our mission is to challenge the current state of mobility through human-centric innovation — expanding what “mobility” means and how it serves society.

Our work centers on four pillars: AD/ADAS, our autonomous driving and advanced driver assist technologies; Arene, our software development platform for software-defined vehicles; Woven City, a test course for mobility; and Cloud & AI, the digital infrastructure powering our collaborative foundation. Business-critical functions empower these teams to execute, and together, we’re working toward one bold goal: a world with zero accidents and enhanced well-being for all.

=========================================================================

TEAM

Toyota is redefining what it means to move. We're challenging the current state of mobility by enhancing the movement of people, goods, information and energy. Centered around three core concepts - A Living Laboratory™, Human-Centered, and Ever Evolving City™ - Woven City serves as a test course for mobility to fulfill our purpose of well-being for all.

We do this by bringing together a diverse community of people with a shared passion for the future of mobility to co-create, develop and refine innovative products and services. This cross-section of social infrastructure, mobility, and people provides a unique opportunity for inventors, residents and visitors to interact seamlessly with new technologies throughout daily life in an environment that emulates a real city.

For more information about Woven City, please visit: https://www.woven-city.global/

WHO ARE WE LOOKING FOR?

We are looking for the first dedicated QA Engineer for our AI products: someone who understands and embraces the uncertainty and non-deterministic nature of LLM-based systems, and who is interested not only in testing quality but also in establishing quality as a sustainable system.

In this role, you will go beyond executing individual test cases. A key mission is to standardize and automate evaluation workflows so that user feedback and evaluation results continuously feed into product quality improvements. You will work closely not only with product development teams, but also with MLOps and data engineers, bringing a QA perspective to redesign, implement, and operate the entire feedback loop.

In particular, we expect you to take ownership of building the first practical quality feedback loop led by QA, even in situations where feedback from real users is limited or not yet sufficiently established.

This position reports to the function leader and offers a hybrid work arrangement. Business trips to Woven City are expected several times a month.

RESPONSIBILITIES

Define quality dimensions for AI products, including accuracy, consistency, safety, fairness, and UX
Conduct scenario-based testing, exploratory testing, and red teaming
Analyze LLM outputs to identify behavioral trends and failure patterns
Design quality evaluation processes using user feedback such as logs, ratings, and inquiries
Standardize evaluation processes and structure them in a reusable, scalable manner
Automate quality evaluation workflows
Collect and aggregate evaluation data
Establish mechanisms for continuous and recurring quality checks
Build and operate quality feedback loops in collaboration with development, MLOps, and data engineering teams
Document quality issues and risks, and communicate them clearly to relevant teams

MINIMUM QUALIFICATIONS

3+ years of experience in QA or testing with products that use LLMs
Experience in test design, including defining test perspectives and creating test cases
Ability to evaluate systems with ambiguous specifications or non-unique “correct” answers by establishing clear judgment criteria
Experience automating test or evaluation processes
Experience collaborating with development teams to drive quality improvements
Business-level proficiency in both English and Japanese

NICE TO HAVES

Experience in scenario testing and red teaming
Experience evaluating and analyzing user feedback and logs
Proactive mindset with the ability to identify problems independently
Basic understanding of natural language processing, particularly in the context of the Japanese language

=========================================================================

Important Points

・All interviews will be arranged via Google Meet, unless otherwise stated.

・The same job description is available in both English and Japanese, so we kindly ask that you apply to only one version.

・We kindly request that you submit your resume in English, if possible. However, Japanese resumes are also acceptable. Please note that, depending on the English proficiency requirements of the role, we may request an English version of your resume later in the process.

WHAT WE OFFER

・Competitive Salary - Based on experience

・Work Hours - Flexible working time

・Paid Holiday - 20 days per year (prorated)

・Sick Leave - 6 days per year (prorated)

・Holiday - Sat & Sun, Japanese National Holidays, and other days defined by our company

・Japanese Social Insurance - Health Insurance, Pension, Workers’ Comp, Unemployment Insurance, and Long-term Care Insurance

・Housing Allowance

・Retirement Benefits

・Rental Car Support

・In-house Training Program (software study/language study)

Our Commitment

・We are an equal opportunity employer and value diversity.

・Any information we receive from you will be used only in the hiring and onboarding process. Please see our privacy notice for more details.
