Voice assistant technology is in danger of trying to be too human

Leigh Clark, Swansea University and Benjamin Cowan, University College Dublin

More than 200m homes now have a smart speaker providing voice-controlled access to the internet, according to one global estimate. Add this to the talking virtual assistants installed on many smartphones, not to mention kitchen appliances and cars, and that’s a lot of Alexas and Siris.

Because talking is a fundamental part of being human, it is tempting to think these assistants should be designed to talk and behave like us. While this would give us a relatable way to interact with our devices, replicating genuinely realistic human conversations is incredibly difficult. What’s more, research suggests making a machine sound human may be unnecessary and even dishonest. Instead, we might need to rethink how and why we interact with these assistants and learn to embrace the benefits of them being a machine.

Speech technology designers often talk about the concept of “humanness”. Recent advances in artificial voice synthesis have blurred the line between human and machine, with these systems sounding increasingly human-like. There have also been efforts to make the language of these interfaces appear more human.

Perhaps the most famous is Google Duplex, a service that can book appointments over the phone. To make the system seem more human, Google added utterances like “hmm” and “uh” to its assistant’s speech output – sounds we commonly use to signal that we are listening or that we intend to start speaking soon. In Duplex, these were used with the aim of sounding natural. But why is sounding natural or more human-like so important?
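
To make the idea concrete, here is a rough Python sketch of how a designer might sprinkle such fillers into a scripted reply before it is sent to a text-to-speech engine. It is purely illustrative – Duplex’s actual implementation is not public, and the function, filler list and probability below are invented for the example.

import random

# Illustrative only: randomly prepend conversational fillers ("hmm", "uh")
# to some sentences so the synthesised speech sounds less scripted.
FILLERS = ["hmm,", "uh,"]

def add_fillers(sentences, probability=0.3, seed=None):
    rng = random.Random(seed)
    spoken = []
    for sentence in sentences:
        if rng.random() < probability:
            spoken.append(f"{rng.choice(FILLERS)} {sentence}")
        else:
            spoken.append(sentence)
    return " ".join(spoken)

# The resulting string would then be handed to whatever speech synthesiser
# the assistant uses.
print(add_fillers(["I'd like to book a table for four.",
                   "Would seven o'clock work?"], seed=1))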

Chasing this goal of making systems sound and behave like us perhaps stems from the pop culture inspirations we use to fuel the design of these systems. The idea of talking to machines has fascinated us in literature, television and film for decades, through characters such as HAL 9000 in 2001: A Space Odyssey or Samantha in Her. These characters portray seamless conversations with machines. In the case of Her, there is even a love story between an operating system and its user. Critically, all these machines sound and respond the way we think humans would.


There are interesting technological challenges in trying to achieve something resembling conversation between us and machines. To this end, Amazon has recently launched the Alexa Prize, looking to “create socialbots that can converse coherently and engagingly with humans on a range of current events and popular topics such as entertainment, sports, politics, technology, and fashion”. The current round of the competition asks teams to produce a 20-minute conversation between one of these bots and a human interlocutor.

These grand challenges, like others across science, clearly advance the state of the art, bringing planned and unplanned benefits. Yet when striving to give machines the ability to truly converse with us like other human beings, we need to think about what our spoken interactions with people are actually for and whether this is the same as the type of conversation we want to have with machines.

We converse with other people to get stuff done and to build and maintain relationships with one another – and often these two purposes intertwine. Yet people tend to see machines as tools serving limited purposes, and have little appetite for building the kind of relationships with machines that we build every day with other people.

Pursuing natural conversations with machines that sound like us can become an unnecessary and burdensome objective. It creates unrealistic expectations that these systems can actually communicate and understand as we do. Anyone who has interacted with an Amazon Echo or Google Home knows this is not possible with existing systems.

This matters because people need some idea of how to get a system to do things, and since voice-only interfaces have few buttons and visuals, that idea is shaped largely by what the system says and how it says it. Given the importance of this kind of interface design, humanness may be not only questionable but deceptive, especially if it is used to fool people into thinking they are interacting with another person. Even if the intent is simply to create intelligible voices, tech companies need to consider the potential impact on users.

Looking beyond humanness

Rather than consistently embracing humanness, we can accept that there may be fundamental limits, both technological and philosophical, to the types of interactions we can and want to have with machines.

We should be inspired by human conversations rather than using them as a perceived gold standard for interaction. For instance, looking at these systems as performers rather than human-like conversationalists may be one way to help create more engaging and expressive interfaces. Incorporating specific elements of conversation may be needed in some contexts, but we should ask whether human-like conversational interaction is necessary, rather than using it as a default design goal.

It is hard to predict what technology will be like in the future and how social perceptions will change and develop around our devices. Maybe people will be ok with having conversations with machines, becoming friends with robots and seeking their advice.

But we are currently sceptical of this. In our view, it all comes down to context. Not all interactions and interfaces are the same. Some speech technology may need to establish and foster a social or emotional bond, as in specific healthcare applications. If that is the aim, then it makes sense to have machines converse in a way suited to that purpose – perhaps sounding human so the user forms the right expectations.

Yet this is not universally needed. Crucially, human-likeness should be tied to what a system can actually do with conversation. Making a system sound human when it cannot converse like one may do far more harm than good.

Leigh Clark, Lecturer in Computer Science, Swansea University and Benjamin Cowan, Assistant Professor, School of Information & Communication Studies, University College Dublin

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Don’t just blame YouTube’s algorithms for ‘radicalisation’. Humans also play a part


Ariadna Matamoros-Fernández, Queensland University of Technology and Joanne Gray, Queensland University of Technology

This is the second article in a series looking at the attention economy and how online content gets in front of your eyeballs. Read part 1 here.


People watch more than a billion hours of video on YouTube every day. Over the past few years, the video sharing platform has come under fire for its role in spreading and amplifying extreme views.

YouTube’s video recommendation system, in particular, has been criticised for radicalising young people and steering viewers down rabbit holes of disturbing content.

The company claims it is trying to avoid amplifying problematic content. But research from YouTube’s parent company, Google, indicates this is far from straightforward, given the commercial pressure to keep users engaged via ever more stimulating content.

But how do YouTube’s recommendation algorithms actually work? And how much are they really to blame for the problems of radicalisation?

The fetishisation of algorithms

Almost everything we see online is heavily curated. Algorithms decide what to show us in Google’s search results, Apple News, Twitter trends, Netflix recommendations, Facebook’s newsfeed, and even pre-sorted or spam-filtered emails. And that’s before you get to advertising.

More often than not, these systems decide what to show us based on their idea of what we are like. They also use information such as what our friends are doing and what content is newest, as well as built-in randomness. All this makes it hard to reverse-engineer algorithmic outcomes to see how they came about.
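
A toy example makes the point. The Python sketch below – invented for illustration, not any platform’s real formula – scores items from a blend of personal, social and freshness signals plus a dash of random “exploration”, which is one reason the same person can be shown different things on two visits, and why working backwards from a single recommendation is so hard.

import random

# Toy ranking sketch, not any real platform's formula. Each item's score mixes
# several signals plus random noise, so identical inputs need not produce
# identical orderings.
def rank(items, user_affinity, friend_signal, freshness):
    scored = []
    for item in items:
        score = (0.5 * user_affinity[item]    # the system's idea of "what we are like"
                 + 0.3 * friend_signal[item]  # what our friends are doing
                 + 0.2 * freshness[item]      # what content is newest
                 + random.uniform(0, 0.1))    # built-in randomness
        scored.append((score, item))
    return [item for _, item in sorted(scored, reverse=True)]

affinity = {"clip_a": 0.9, "clip_b": 0.8}
friends  = {"clip_a": 0.2, "clip_b": 0.4}
fresh    = {"clip_a": 0.5, "clip_b": 0.6}

print(rank(["clip_a", "clip_b"], affinity, friends, fresh))
print(rank(["clip_a", "clip_b"], affinity, friends, fresh))  # may differ from the first call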

Algorithms take all the relevant data they have and process it to achieve a goal – often one that involves influencing users’ behaviour, such as selling us products or keeping us engaged with an app or website.

At YouTube, the “up next” feature is the one that receives most attention, but other algorithms are just as important, including search result rankings, homepage video recommendations, and trending video lists.


How YouTube recommends content

The main goal of the YouTube recommendation system is to keep us watching. And the system works: it is responsible for more than 70% of the time users spend watching videos.

When a user watches a video on YouTube, the “up next” sidebar shows videos that are related but usually longer and more popular. These videos are ranked according to the user’s history and context, and newer videos are generally favoured.

This is where we run into trouble. If more watching time is the central objective, the recommendation algorithm will tend to favour videos that are new, engaging and provocative.
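
A minimal sketch shows why, assuming – purely for illustration, this is not YouTube’s actual code – that the ranker’s only signal is a predicted-engagement score multiplied by a recency boost:

# Toy "up next" ranker, illustrative only: when expected watch time is the sole
# objective, newer and more engaging candidates win almost by definition.
def up_next(candidates):
    def expected_watch_time(video):
        # hypothetical features standing in for "engaging" and "new"
        return video["predicted_engagement"] * video["recency_boost"]
    return sorted(candidates, key=expected_watch_time, reverse=True)

suggestions = up_next([
    {"title": "calm explainer",     "predicted_engagement": 0.4, "recency_boost": 1.0},
    {"title": "provocative upload", "predicted_engagement": 0.9, "recency_boost": 1.2},
])
print([v["title"] for v in suggestions])  # the newer, more provocative upload ranks first

Nothing in such an objective asks whether the content is accurate or healthy – only whether it keeps the viewer watching.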

Yet algorithms are just pieces of the vast and complex sociotechnical system that is YouTube, and there is so far little empirical evidence on their role in processes of radicalisation.

In fact, recent research suggests that instead of thinking about algorithms alone, we should look at how they interact with community behaviour to determine what users see.

The importance of communities on YouTube

YouTube is a quasi-public space containing all kinds of videos: from musical clips, TV shows and films, to vernacular genres such as “how to” tutorials, parodies, and compilations. User communities that create their own videos and use the site as a social network have played an important role on YouTube since its beginning.

Today, these communities exist alongside commercial creators who use the platform to build personal brands. Some of these are far-right figures who have found in YouTube a home to push their agendas.

It is unlikely that algorithms alone are to blame for the radicalisation of a previously “moderate audience” on YouTube. Instead, research suggests these radicalised audiences existed all along.


Content creators are not passive participants in the algorithmic systems. They understand how the algorithms work and are constantly improving their tactics to get their videos recommended.

Right-wing content creators also know YouTube’s policies well. Their videos are often “borderline” content: they can be interpreted in different ways by different viewers.

YouTube’s community guidelines restrict blatantly harmful content such as hate speech and violence. But it’s much harder to police content in the grey areas between jokes and bullying, religious doctrine and hate speech, or sarcasm and a call to arms.

Moving forward: a cultural shift

There is no magical technical solution to political radicalisation. YouTube is working to minimise the spread of borderline problematic content (for example, conspiracy theories) by reducing its recommendations of videos that could potentially misinform users.

However, YouTube is a company and it’s out to make a profit. It will always prioritise its commercial interests. We should be wary of relying on technological fixes from private companies to solve society’s problems. Moreover, quick responses to “fix” these issues might also harm politically edgy communities (such as activists) and minority communities (such as LGBTQ and other sexuality-related groups).

When we try to understand YouTube, we should take into account the different factors that feed into algorithmic outcomes. This means systematic, long-term analysis of what the algorithms do, but also of how they combine with YouTube’s prominent subcultures, those subcultures’ role in political polarisation, and their tactics for managing visibility on the platform.

Before YouTube can implement adequate measures to minimise the spread of harmful content, it must first understand which cultural norms are thriving on its site – and being amplified by its algorithms.


The authors would like to acknowledge that the ideas presented in this article are the result of ongoing collaborative research on YouTube with researchers Jean Burgess, Nicolas Suzor, Bernhard Rieder, and Oscar Coromina.

Ariadna Matamoros-Fernández, Lecturer in Digital Media at the School of Communication, Queensland University of Technology and Joanne Gray, Lecturer in Creative Industries, Queensland University of Technology

This article is republished from The Conversation under a Creative Commons license. Read the original article.