AI tools that modify speech have downsides

Carolyn Geason-Beissel/MIT SMR | Getty Images

At the annual TED conference earlier this year, the room went silent for a demonstration of a potentially revolutionary technology: a “cone of silence” tool, now in development, designed to block out ambient noise so that speakers hear only each other’s voices. Although many of us were impressed by the simulation of a “quiet table” in a crowded restaurant, I could only wonder how much of each speaker’s voice might be lost along the way.

It’s not the only technology that “cleans up” audio involving human speech. Artificial intelligence applications are being developed to soften the accents of call center employees in order to increase sales, an idea that critics have called dehumanizing. Other technologies reduce the tones of anger in upset callers’ voices, with the aim of making the experience of call center workers less difficult.

It may only be a matter of time before similar tools are used in everyday workplace interactions. Businesses are increasingly distributed and global, with employees working remotely or in satellite offices. Those who have different accents or are less proficient in English may be tempted, or even encouraged, to use these tools when speaking with colleagues or presenting at a meeting. And people who don’t want to put up with angry co-workers could tap a filter on their phone to eliminate that pesky emotion.

The potential benefits of these tools seem clear: speeding up communication, avoiding misunderstandings, sidestepping the prejudices that some people hold about different accents, and much more. But these same tools also threaten to silence crucial parts of the very voices they aim to make heard.

The challenge of fostering trust

Because I’m focused on reading the room to uncover the truth, I’m closely observing how this technological revolution might affect our ability to recognize dishonesty and what that might mean for workplaces.

This change raises significant concerns. To understand why, we must recognize the complexities of human speech. When we speak, we convey much more than just the meaning of our words. Our speech patterns, pitches, vocal tones, cadences, and other vocal mannerisms help shape how others perceive us. They can also provide insight into whether we are telling the truth.

In my book Liespotting, I shared research on several crucial auditory cues. For example, when people lie, their speech sometimes takes on a strained quality. Their vocal volume may also change, sometimes due to nervousness. People can also control their voices too much: a liar often tries to control their body, becoming unnaturally still, and their voice may follow suit, becoming more monotone. Sometimes a person’s voice also takes on a pleading quality, as if imploring the listener to believe them. All this can happen unconsciously.

The same technologies that can modify speech or filter unwanted sounds can also suppress these vital signals. They can flatten the depth of communication, turning a context-rich conversation into lifeless robotic noise and erasing the subtle nuances that make interactions human. Cues to deception may disappear in the process.


No auditory signal is informative by itself. When I train government agencies, fraud investigators, business leaders, and others, I teach them how to “baseline” people. This requires them to observe how people normally communicate and behave, subtleties included, in order to establish a reliable reference point against which to measure telling changes later.

This same process of careful observation also encourages stronger connections. Just as we can sense dishonesty, we can sense honesty. The same signals that sometimes unconsciously alert us to problems can also promote connection. When the speech we hear from someone has been partially altered by technology, we are less likely to develop trusting bonds with that person, even if we consciously know that person is using sound enhancement technology.

Audio deepfakes are already wreaking havoc in certain sectors of society. Robocalls that mimic politicians’ voices can put elections at risk. Legal issues arise as music fans use AI to simulate their favorite artists performing other artists’ songs. Tools that manipulate speech could also blur the line between truth and falsehood, to the point where we no longer know how someone actually expresses themselves.

Four tips for maintaining connection

Honesty is a fundamental principle of good leadership and a strong company culture. To signal honesty, leaders, managers, and employees at all levels must communicate authentically. We work better and create stronger relationships when we show up as our somewhat messy selves, not only in how we present ourselves but also in how we speak.

Leaders rely on the full spectrum of human communication to gauge team morale, resolve problems, and build authentic connections. Removing accents and emotional nuances, as well as slight noises in employee speech that technologies might read as background noise, could encourage homogenized dialogue that fails to capture the true situation. Employees may have less trust in their leaders, finding them disconnected, stoic or insincere, weakening the cohesion and effectiveness of the entire organization.

To meet this challenge, I recommend these actions:

Distinguish between transactional and interpersonal communication. Every day in our workplaces we have all kinds of brief exchanges with colleagues, managers, and direct reports. Usually we need to get or share information quickly, such as when something will be completed, whether someone needs more resources, or what a customer said during a phone call. But at other times we need to discuss ideas, perspectives, and experiences. Organizations should make it clear that if a conversation fits the latter description, it is interpersonal rather than simply transactional. In these cases, it is appropriate to use natural voices, without technological filtering, so that we can establish a fuller connection.


Increase in-person interactions. Trust is built faster when we hear and see each other simultaneously. We naturally pick up on body language and facial expressions, including micro-expressions — fleeting signs of emotion that can last a fraction of a second. Organizations can foster connections by encouraging real-world interactions, especially in shared spaces for in-person and hybrid work. Remote workers can gain similar trust through one-on-one video meetings focused on getting to know each other, such as peer coaching sessions.

Encourage storytelling as a company norm. When someone tells a story, they offer a full range of vocal expressions. Emphasizing personal stories and reflections can counter the homogenizing effects of AI-filtered speech. During conference calls, all-hands meetings (whether in person or via video), team discussions, and group retreats, invite staff members to share a recent experience that they think might provide insight into the organization.

Track emotional engagement. If your organization begins using technologies that filter conversations, closely monitor whether emotional engagement begins to wane by using tools or surveys to track how engaged people feel in their work. Certainly, any individual change may be a matter of correlation rather than causation. But if emotional engagement continually declines as these filtering technologies become more normalized, that could be a strong sign that action is needed.

Ultimately, businesses must keep in mind that relationships are key to building a successful “team” mentality. Relationships are built on trust, which in turn is built on feeling and perceiving authenticity. Nothing builds that authenticity more than people speaking in their own voices – accents, quirks and all.

About the author

Pamela Meyer is CEO of Calibrate, a deception detection and insider threat mitigation consulting firm. Her TED talk “How to Spot a Liar” is one of the 20 most popular of all time.
