Chatbot Nuances
From Bard to Gemini: Iterative testing for Google
Case study spotlight
Long-term embedded qualitative research for Bard & Gemini: quick-turnaround tactical studies that added up to high-level foundational insights, conducted as part of Google AIUX, 2023-2024
What I did
I served as an embedded researcher with Google teams working on Bard and Gemini, collaborating with Google quantitative and qualitative UXRs, designers, and product managers. I built moderator guides from research briefs, moderated sessions (in person and remotely) with stakeholders and team leadership observing, and delivered report readouts to multiple branches of the development groups.
Deliverables
~20 studies over about a year: research plans, moderator guides, and text- and slide-based reports including quotes and video clips as needed.
Impact & Outcomes
Features I tested were often incorporated into the live product within weeks of testing, and many were showcased at Google I/O 2023.
Findings I helped synthesize became part of the foundational insights about the UX of conversational AI, shared across Google via internal research repositories and with the teams tasked with integrating AI into their existing product offerings.
Context
When Google released Bard, the earlier version of its conversational AI, it was moving as fast as possible to compete with OpenAI's ChatGPT. The teams working on the Bard experience needed qualitative feedback from users so they could quickly iterate through design concepts for everything from the UI to the user support that helps people explore the breadth and depth of the LLM's capabilities. The explosion of interest and attention on AI put the teams under intense pressure, and Google was running an enormous number of studies simultaneously, including pairing qualitative data with usage data to understand the rapidly evolving AI product space. Qualitative user research provided rich insights into how mental models of chatbots and perceptions of "AI" should be addressed as Bard was replaced by the even more advanced Gemini.
Project Objectives & Research Questions
Each study combined elements of discovery, to gauge quickly shifting perceptions of AI and Google's offerings, with tactical testing of specific new features. I helped answer key research questions including (study details are generalized to protect IP):
How do users expect to formulate prompts, and what kinds of support work to help them learn what Bard/Gemini is capable of?
How do users prefer answers to be displayed so that information is trustworthy, scannable, and the right length?
How do users evaluate the quality of responses, and which aspects of the interaction most affect user satisfaction?
How do users want Google's ecosystem to leverage the power of LLMs, and how can Bard/Gemini work with apps like Google Maps, Google Travel, and Google Workspace?
Process
Over the course of about 10 months, I was part of a select group from AnswerLab serving Google teams under the Google Assistant umbrella. We ran fast-paced sets of in-depth interviews (IDIs) so Google could gather direct user feedback, first for Bard and then, as the AI landscape shifted, for Gemini.
Methodology
In-depth interviews using the live versions of Bard and, later, Gemini, along with static concept mocks and interactive Figma prototypes. Both desktop and mobile experiences were included.
Duration
Each study typically took just over two weeks to complete, from a kickoff to review the research brief in week one to report delivery after ~2.5 days of sessions in week two.
Participants & Logistics
Average sample: n = 8, ages 18-65, with a mix of demographics (usually higher tech literacy, self-reported). The program sourced and managed participants through Google's internal recruiting system (UXI), with which I coordinated directly via internal chat and email. Sessions took place at Google in NYC and remotely via Meet.