Vishesh Sharma's Portfolio

Understanding Parents' Perception of AI Avatars in Speech Therapy - Mixed Methods

Childhood Apraxia of Speech (CAS)

Prologue

Can a 3D AI avatar augment the speech therapy experience?

This case study uses mixed methods (user interviews, think-aloud sessions, surveys) to explore parents' perception of 3D avatars used for remote speech therapy.

Through thematic analysis, we identified the pros and cons of our application (Echo), which we synthesized into design directions.

To learn more about the design of Echo, please refer to the following-

Summary

INTRODUCTION

The limited availability of speech therapists makes speech therapy harder to access for children with motor-speech disorders.

METHODOLOGY

We implemented a 3D avatar that helps users visualize lip movement and pronunciation, augmenting the learning process.

OBSERVATIONS

86% of users reported a higher level of engagement when using our application.

KEY DECISIONS

We pivoted from children with motor-speech disorders to parents of those children, because the primary demographic was sensitive to interviews.

MY ROLE

I designed the user research protocol and interviewed 10 users to identify gaps between conventional speech therapy and user expectations.

MY TEAM

1 HCAI practitioner (me), 3 HCI practitioners, 3 software developers

1 Scoping Domain Space Requirements

Literature Review

We conducted a literature review on childhood apraxia of speech (a motor-speech disorder), conversational AI, cognitive load via 3D avatars, and augmented reality to gauge the current state of the problem and its proposed solutions.

User Interviews

The initial rounds of interviews involved 2 speech therapists and 2 adults with apraxia of speech (a motor-speech disorder). The formative assessment was followed by 8 more interviews (refer to Methodology).

Online Survey

The survey aimed to gather quantitative insights into user habits, frequency of speech therapy, and learning preferences.

2 Defining the Problem

By identifying the domain space requirements, we were able to pick out the gaps in contemporary solutions and synthesize our solution hypothesis.

What is CAS?

Childhood Apraxia of Speech is a congenital motor-speech disorder. It affects the child's ability to learn language and pronounce words correctly.

Identifying the Gap

Observed frequency of therapy: once a week

Optimal frequency of therapy: 3-4 times a week

The solution we hypothesized!

A 3D model that aids the child when receiving speech therapy on mobile devices.

Designing an intuitive and engaging remote therapy experience is crucial to the success of the application, Echo.

“How might we develop accessible therapy methods that supplement conventional speech therapy?”

3 Our Guiding Assumptions

Identifying the domain space requirements helped us synthesize the hypotheses we used to assess the effectiveness of our application, Echo.

Hypothesis: We expect that the visual cues of the 3D model will aid users, augmenting speech therapy.

Success metric: Are users able to follow the cues provided by the avatar?

Hypothesis: We expect that the addition of an avatar would make speech therapy engaging.

Success metric: Are users completing the therapy flow, and are they willing to practice again?

4 Methodology, User Research

The formative assessment was broken down into the following steps:

User Interviews

During the interviews, parents discussed the problems their children face during speech therapy. They were also briefed on the application, Echo, and its use cases.

Think-Aloud

The interviews were followed by a think-aloud session, in which the parents interacted with the 3D avatar and experienced the speech therapy flow.

Post Study Survey

After the study, each parent was asked to fill out a survey mapped to a Likert scale, enabling us to gather quantitative and qualitative data.

Participants

We recruited 8 participants through online platforms to test the first iteration of the speech therapy application, Echo.

Adults diagnosed with Childhood Apraxia of Speech.

Parents of children diagnosed with Childhood Apraxia of Speech.

5 Problems We Faced

Challenge faced: Limited users to test the application

Problem generated

We realized that testing the application with children has legal complications.

Solution synthesized

We switched to parents of children with motor-speech disorders.

Challenge faced: Conflicting ideas.

Problem generated

Concerns were raised about the uncanny appearance of the 3D avatar.

Solution synthesized

I resolved it by adding idle animations like body sway and blinking.

Challenge faced: User hesitance toward the use of AI in speech therapy.

Problem generated

Latency and out-of-sync audio increased user hesitance.

Solution synthesized

I incorporated Convai to reduce latency, improving user experience.

6 Data Analysis

The results of this study are based on qualitative coding that sheds light on the hypotheses. Transcripts generated from 4.3 hours of recordings were used to develop codes, sub-themes, and themes. Please see Apraxia-Coding to learn about the insight synthesis.

We observed increased engagement, as evidenced by parents' feedback, but we also learned that the animations need fine-tuning in terms of latency and lip movement.

Emerging Themes

Engagement

Ease of Access

Augmentation

Through several iterations of thematic analysis, we identified the above emerging themes and their corresponding frequencies. A priori coding highlighted other features of the application, which are described below.
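As a minimal sketch of how theme frequencies can be tallied from coded transcript excerpts: the code names, the code-to-theme mapping, and the sample excerpts below are hypothetical placeholders for illustration, not data or tooling from the study.

```python
from collections import Counter

# Hypothetical code-to-theme mapping; the code names are invented for this sketch.
CODE_TO_THEME = {
    "enjoys_avatar": "Engagement",
    "wants_to_practice_again": "Engagement",
    "practice_from_home": "Ease of Access",
    "follows_lip_cues": "Augmentation",
}


def theme_frequencies(coded_excerpts):
    """Count how many coded excerpts fall under each theme."""
    counts = Counter()
    for _participant, code in coded_excerpts:
        theme = CODE_TO_THEME.get(code)
        if theme is not None:  # ignore codes not yet assigned to a theme
            counts[theme] += 1
    return counts


# Placeholder excerpts as (participant id, code) pairs, not study data.
sample_excerpts = [
    ("P1", "enjoys_avatar"),
    ("P2", "wants_to_practice_again"),
    ("P2", "practice_from_home"),
    ("P3", "follows_lip_cues"),
    ("P3", "unassigned_code"),
]

frequencies = theme_frequencies(sample_excerpts)
for theme, count in frequencies.most_common():
    print(f"{theme}: {count}")
```

Keeping the code-to-theme mapping separate from the excerpts makes it easy to re-run the tally after each iteration of the analysis, when codes are merged or moved between themes.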

Positive Feedback

The positive feedback is associated with the themes identified as ‘effective visual cueing’ and ‘increased engagement’, which improve the accessibility of speech therapy.

Design Directions

The formative assessment paved the way for design directions, guided by key insights on latency issues and avatar limitations that we condensed into pain points.

7 Results

Improved animation and visual cueing - an interesting finding!

High engagement and effective cueing are not necessarily related. Even when the 3D avatar produces significant engagement, that might not always translate into effective cueing. Our updated solution will work best for mild to moderate CAS.

High Engagement

Effective Cueing

How does the integration of a 3D avatar facilitate visual cueing, particularly in terms of real-time feedback, such as lip movement synchronization?

In the current iteration, cueing is limited for sounds that are not initiated by lip movement (for example, the -ch and -sh sounds) and for sounds that look the same when spoken (for example, the -pa and -ma sounds).

Sounds independent of lip movement: -ch, -sh

Sounds with the same lip movements: -pa, -ma
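The two cueing limitations above can be sketched as a small lookup. This is an illustrative simplification under stated assumptions, not Echo's implementation: the `VISEME_GROUP` table and the helper names `onset`, `has_lip_cue`, and `look_alike` are hypothetical, and the table only lists a few onsets that are formed at the lips.

```python
# Assumed, simplified grouping of syllable onsets by how they look on the lips.
# "bilabial" sounds are made by closing both lips, so they look alike on an avatar.
VISEME_GROUP = {
    "p": "bilabial",
    "b": "bilabial",
    "m": "bilabial",
    "f": "labiodental",
    "v": "labiodental",
}


def onset(syllable):
    """Return the leading consonant of a syllable written like '-pa' or '-sh'."""
    text = syllable.lower().lstrip("-")
    for digraph in ("ch", "sh"):
        if text.startswith(digraph):
            return digraph
    return text[:1]


def has_lip_cue(syllable):
    """True if the onset belongs to a lip-driven group in this simplified table."""
    return onset(syllable) in VISEME_GROUP


def look_alike(first, second):
    """True if both syllables start with sounds from the same lip-driven group."""
    group_a = VISEME_GROUP.get(onset(first))
    group_b = VISEME_GROUP.get(onset(second))
    return group_a is not None and group_a == group_b


for syllable in ("-ch", "-sh", "-pa", "-ma"):
    print(syllable, "lip cue available:", has_lip_cue(syllable))
print("-pa and -ma look alike:", look_alike("-pa", "-ma"))
```

A check like this could flag practice items where lip animation alone is unlikely to help, so that another kind of cue can be offered for those sounds.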

8 Recommendations, Design Directions

Through data analysis and results, we developed the following design directions as recommendations for the application, Echo.

Avatar augmentation: We observed that the user experience would benefit from a more polished implementation of conversational AI that makes the avatar more interactive (functionality implemented in the latest iteration).

Child-friendly design: The initial iteration, though functional, lacked the friendliness and poppiness preferred by children (design changes were implemented in the latest iteration).

Learning through juxtaposition: Many parents suggested leveraging the phone's front camera so that the child can compare their own lip movement with the avatar's, improving the learning experience.

Lessons We Learned

Epilogue

PARTICIPANT RECRUITMENT

Identifying a secondary demographic is vital to keeping the research process going, since ideal scenarios are rarely available.

LEARNING VS. ENGAGEMENT

I learned that engagement and learning are two different aspects of the user experience. High engagement does not always translate to learning.

UNDERSTANDING THE UNCANNY

The avatar’s appearance is key to its acceptance. Uncanny or robotic appearances can deter users from interacting with it.
