
Understanding Parents' Perception of AI Avatars in Speech Therapy - Mixed Methods

Childhood Apraxia of Speech (CAS)
Prologue
Can a 3D AI avatar augment the speech therapy experience?
This case study explores parents' perception of 3D avatars used for remote speech therapy, using mixed methods (user interviews, think-aloud sessions, surveys).
Through thematic analysis, we identified the pros and cons of our application (Echo), which we synthesized into design directions.
To learn more about the design of Echo, please refer to the following-
Summary
INTRODUCTION
Limited availability of speech therapists makes speech therapy hard to access for children with motor-speech disorders.
METHODOLOGY
We implemented a 3D avatar to help users visualize lip movement and pronunciation, augmenting the learning process.
OBSERVATIONS
86% of the users reported a higher level of engagement when going through our application.
KEY DECISIONS
We pivoted from children with motor-speech disorders to their parents, because the primary demographic was sensitive to interviews.
MY ROLE
I designed the user research protocol and interviewed 10 users to identify gaps between conventional speech therapy and user expectations.
MY TEAM
1 HCAI practitioner (me), 3 HCI practitioners, 3 software developers
1 Scoping Domain Space Requirements
54
Literature Review
We conducted a literature review on childhood apraxia of speech (a motor-speech disorder), conversational AI, cognitive load via 3D avatars, and augmented reality to gauge the current state of the problem and its proposed solutions.
04
User Interviews
The initial rounds of interviews involved 2 speech therapists and 2 adults with apraxia of speech (a motor-speech disorder). This formative assessment was followed by 8 more interviews (refer to the methodology).
24
Online Survey
The survey sought quantitative insights into user habits, frequency of speech therapy, and learning preferences.
2 Defining the Problem
Identifying the domain space requirements let us pick out the gaps in contemporary solutions and synthesize our solution hypothesis.
What is CAS?
Childhood Apraxia of Speech is a congenital motor-speech disorder. It affects the child's ability to learn languages and pronounce words correctly.


Identifying the GAP
Once a week
Observed frequency of therapy
3-4 times a week
Optimal frequency of therapy
The solution we hypothesized!
A 3D model that aids the child during speech therapy on mobile devices.
Designing an intuitive and engaging remote therapy experience is crucial to the success of the application, Echo.


“How might we develop accessible therapy methods that supplement conventional speech therapy?”
3 Our Guiding Assumptions
Identifying the domain space requirements helped us synthesize the hypotheses used to assess the effectiveness of our application, Echo.
I
Hypothesis: We expect that visual cues from the 3D model will aid the users, augmenting speech therapy.
Success metric: Are users able to follow the cues provided by the avatar?
II
Hypothesis: We expect that the addition of an avatar will make speech therapy more engaging.
Success metric: Are users completing the therapy flow and willing to practice again?
4 Methodology, User Research
The formative assessment was broken down into the following steps:
User Interviews
During the interviews, parents discussed the problems their children face during speech therapy. They were also briefed on the application, Echo, and its use cases.


Think-Aloud
The interviews were followed by a think-aloud, where the parents interacted with the 3D avatar and experienced the speech therapy flow.


Post Study Survey
After the study, each parent was asked to fill out a survey with Likert-scale items, enabling us to gather quantitative and qualitative data.


Participants
We recruited 8 participants through online platforms to test the first iteration of the speech therapy application, Echo.
4
Adults diagnosed with Childhood Apraxia of Speech.
4
Parents of children diagnosed with Childhood Apraxia of Speech.
5 Problems We Faced
I
Challenge faced: Limited pool of users to test the application.
Problem generated
We realized that testing the application with children involves legal complications.
Solution synthesized
We switched to parents of children with motor-speech disorders.
II
Challenge faced: Conflicting ideas.
Problem generated
Concerns were raised about the uncanny appearance of the 3D avatar.
Solution synthesized
I resolved it by adding idle animations like body sway and blinking.
III
Challenge faced: User hesitance toward the use of AI in speech therapy.
Problem generated
Latency and out-of-sync audio output led to increased user hesitance.
Solution synthesized
I incorporated Convai to reduce latency, improving the user experience.
6 Data Analysis


The results of this study are based on qualitative coding that sheds light on the hypotheses. Transcripts generated from 4.3 hours of recordings were used to develop codes, sub-themes, and themes. Please access Apraxia-Coding to learn about insight synthesis.
We observed increased engagement, as evidenced by parents' feedback, but also learned that the avatar needs fine-tuning in latency and lip-movement animation.
Emerging Themes
15
Engagement
11
Ease of Access
09
Augmentation
Through several iterations of thematic analysis, we identified the emerging themes above and their corresponding frequencies. A priori coding highlighted other features of the application, which are described below.
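One way theme frequencies like these can be tallied from coded transcript excerpts is sketched below. This is a minimal illustration, not the procedure used in the study: the `coded_excerpts` list, the participant IDs, and both helper functions are hypothetical placeholders, and the counts they produce are not the study's data.

```python
from collections import Counter

# Hypothetical coded excerpts: (participant id, theme label).
# Placeholders for illustration only, not the study's data.
coded_excerpts = [
    ("P1", "Engagement"),
    ("P1", "Ease of Access"),
    ("P2", "Engagement"),
    ("P2", "Augmentation"),
    ("P3", "Engagement"),
]


def theme_frequencies(excerpts):
    """Count how many coded excerpts fall under each theme."""
    return Counter(theme for _, theme in excerpts)


def participants_per_theme(excerpts):
    """Count the distinct participants who raised each theme."""
    seen = {}
    for participant, theme in excerpts:
        seen.setdefault(theme, set()).add(participant)
    return {theme: len(people) for theme, people in seen.items()}


frequencies = theme_frequencies(coded_excerpts)
for theme, count in frequencies.most_common():
    print(f"{theme}: {count}")
```

Reporting the number of distinct participants alongside the raw excerpt count helps show whether a theme is widely shared or driven by one talkative participant.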
Positive Feedback
The positive feedback is associated with the themes identified as ‘effective visual cueing’ and ‘increased engagement’, both of which improve the accessibility of speech therapy.


Design Directions
The formative assessment paved the way toward design directions, guided by key insights on latency issues and avatar limitations, condensed into pain points.


7 Results
Improved animation and visual cueing - an interesting finding!
High engagement and effective cueing are not necessarily related. Even when engagement with the 3D avatar is significant, it might not translate to effective cueing. Our updated solution will work best for mild to moderate CAS.
High Engagement
Effective Cueing


How does the integration of a 3D avatar facilitate visual cueing, particularly in terms of real-time feedback, such as lip movement synchronization?
In the current iteration, cueing is limited for sounds that are not initiated by lip movement (for example, -ch and -sh) and for sounds that look the same when spoken (for example, -pa and -ma).
-ch
-sh
Sounds independent of lip movement.
-pa
-ma
Sounds with the same lip movements.
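The two cueing limitations above can be expressed as a simple check over a table of sounds. The sketch below is illustrative only: `CUE_TABLE`, its mouth-shape class names, and the lip-initiated flags are assumptions written by hand to mirror the examples in the text, not Echo's animation data or a phonetic standard.

```python
from itertools import combinations

# Hypothetical cue table: sound -> (mouth-shape class, lips initiate the sound?).
# Class names and flags are illustrative assumptions, not Echo's animation data.
CUE_TABLE = {
    "pa": ("lips_closed", True),
    "ma": ("lips_closed", True),
    "ch": ("tongue_led", False),
    "sh": ("tongue_led", False),
    "oo": ("lips_rounded", True),
}


def weak_lip_cues(table):
    """Sounds the avatar's lips cannot cue well because lips do not initiate them."""
    return sorted(sound for sound, (_, lips) in table.items() if not lips)


def lookalike_pairs(table):
    """Pairs of lip-initiated sounds that share the same mouth shape."""
    return sorted(
        (a, b)
        for a, b in combinations(sorted(table), 2)
        if table[a][1] and table[b][1] and table[a][0] == table[b][0]
    )


print(weak_lip_cues(CUE_TABLE))
print(lookalike_pairs(CUE_TABLE))
```

A check like this could flag practice words that need an extra cue (audio emphasis, a caption, or a tongue-position diagram) before they are added to a therapy flow.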


8 Recommendations, Design Directions
Through data analysis and results, we developed the following design directions as recommendations for the application, Echo.
Avatar augmentation: We observed that the user experience would benefit from a polished implementation of conversational AI that makes the avatar more interactive (functionality implemented in the latest iteration).
Child-friendly design: The initial iteration, though functional, lacked the friendliness and vibrancy preferred by children (design changes were implemented in the latest iteration).
Learning through juxtaposition: Many parents suggested using the phone's front camera so the child can compare their own lip movement with the avatar's, improving the learning experience.
Lessons We Learned
Epilogue


PARTICIPANT RECRUITMENT
Identifying a secondary demographic is vital to continuing the research process, since ideal scenarios are rarely available.
LEARNING VS. ENGAGEMENT
I learned that engagement and learning are two different aspects of the user experience. High engagement does not always translate to learning.
UNDERSTANDING THE UNCANNY
The avatar’s appearance is key to its acceptance. Uncanny or robotic appearances can deter users from interacting with it.
