Listen to this post: [Podcast audio is at the bottom of the post.]
Following my 2019 episode about radio and disabled voices, today’s episode is about the way people describe synthesized voices. My guest, endever*, isn’t an audio describer, and we don’t get into Audio Description (AD) per se. But AD is what led me to talk to them.
People who use Audio Description have a lot of nuanced reasons they prefer some voices and don’t like other voices as much. Every person’s opinions are unique and valuable. And just to be clear, today’s episode is not advocating for specific voices to be part of AD or not part of AD. I really wanna talk about ableism. Because in people’s attempts to discuss why they don’t like artificial intelligence or synthesized voices for AD, I hear some ableist lines that might be painful when they trickle down to someone who uses a synthesized voice as their main form of communication in regular life with AAC (Augmentative and Alternative Communication). I wanna take a step back to explore if we can talk about inclusive practices like AD for specific groups without talking poorly about other groups. So, it’s less about Audio Description and more about scratching the surface of the complexities of synthesized voices and autistic voices and gendered voices and racialized voices and well, voices.
[Image Description: A screenshot of endever*’s AAC interface on iPad. There is a grid filling the screen. Most boxes in the grid contain one word plus a drawing or symbol that matches the word. Some boxes contain folders with topics. Toward the bottom is a QWERTY keyboard and an arrow pointing to two more options on the next page. At the top of the screen is a sentence shown both in words and their corresponding symbols or pictures: “This voice is part of me! It is part of who I am!”]
Download a transcript of Pigeonhole Podcast episode 34.
Transcript
Pigeonhole Episode 34
[bright ambient music]
CHORUS OF VOICES: Pigeonholed, pigeonhole, pigeonhole, pigeonhole, pigeonhole, pigeonhole, pigeonhole, pigeonhole.
CHERYL NARRATING: This episode isn’t about audio description. It’s about audio description and ableism. It’s about how we talk about voices in audio description.
[ambient music slowly fades out]
ENDEVER*: This voice is part of me! It’s part of who I am. It’s a chunk of technology (that I don’t even understand the workings of) that hangs at my side with stickers and pins and ribbons, and I can no longer imagine going without it. In my night dreams, sign language and this AAC are how I communicate, not usually mouthwords.
[bouncy electronica]
CHERYL NARRATING: That’s endever*. Today, I’m talking with endever* about their voice. We will get to that delicious term they used, “mouthwords,” but we’re starting with the synthesized voice that I hear coming from endever*’s iPad and the term, “AAC,” which stands for Augmentative and Alternative Communication.
[electronica music fades out]
ENDEVER*: Well, I think one of the coolest things someone said was someone who had heard my mouthwords as well as my AAC voice said, “how did you get that to sound so much like you?” My mouthwords aren’t me – but I know what they meant and it really felt like a compliment somehow that they thought they sounded similar.
Then, I’ve also heard people say things that very much didn’t feel like compliments. For example, someone laughing and saying it sounded like a robot, trying to imitate the tone and move their arms in a way they thought of as robotic. Again, I suspect I know what they meant, but what came out sounded something like, “that’s not a real person’s voice”. Like, they weren’t going for sentient cyborgness, but straight up “you are not a person”. And I think it was their amusement that really got to me. I should give them some slack because they are also disabled and I expect it was the first time they had interacted with an AAC user, but the fact that the only reference they had for my voice was along the lines of science fiction automata just reinforced my relative standing in society.
[softly pulsing electronica]
CHERYL NARRATING: Sentient cyborgness: It’s this amazing way endever* blasts apart a binary that I keep hearing in conversations about what makes a good audio description voice. It’s this idea: The best audio description is done by a human, and audio description that’s done by an artificial intelligence or computer-generated voice is inherently undesirable. Again, we’re not talking about what makes good audio description today. There are other podcasts out there doing that work. Here, it’s more about the consequences of how we talk about voices and audio description. [electronica fades out]
ENDEVER*: My name is endever*, my pronouns are they/them/theirs or xe/xem/xyrs, and I’m a mostly non-speaking autistic. I’m also trans and queer, which, those totally connect to my neurotype personally. I positively reclaim the word crazy to refer to my mental illnesses. Also, I am growing into a sense of cyborg identity recently. I’ve had my AAC system for a while now, so that’s clearly a blend of human-tech interaction acting as a core function of my life I’ve gotten used to. But I also got low gain hearing aids a year ago that are programmed to help with my auditory processing issues, so that feels like it’s just adding one more layer of human-tech interaction to my day to day life. My autistic brain-voice is supplemented by my iPad, and my autistic brain-ears are supplemented by my hearing aids. These put together are increasing my sense of cyborgness as a perhaps integral part of who I am. As for an audio description of me, I’m a white enby with blue hair, glasses, and light facial hair sitting on a floral upholstered couch.
CHERYL NARRATING: So, the AAC.
ENDEVER*: Augmentative and alternative communication – basically, anything that supplements or replaces mouthwords. You could think of so many things as AAC that almost everyone uses forms of it every day: facial expressions, handwriting, emails, gestures. But generally people use the term to refer to dedicated AAC systems that disabled people rely on to communicate. I am a multimodal communicator – I use a mixture of sign language, speech, letterboards and other low-tech AAC, my phone, and this communication device you’re hearing now. Right now I have sections of our conversation programmed in ahead of time to make communicating easier. In spontaneous conversation, it’s much slower! People have to be patient – which honestly, many aren’t.
CHERYL NARRATING: Talk about patience. I was sick with COVID-19 during this interview and really struggling, sometimes talking in long, winding sentences and even talking about swirls and circles. Thank goodness for editing. Going back to edit this, I often had no idea what I’d been intending to say to endever*. But as hard as talking was that day, paying attention was twice as hard. And like so much of life during pandemic times, solutions are easy to find in the disabled and chronically ill communities because we’ve been honing them for a long time, long before March 2020.
CHERYL: Typically, it’s hard for me to follow a long answer from someone who’s just speaking. But especially now, with COVID brain, I just like, it’s just a black void of nothingness in there sometimes. I don’t know that we could’ve done this interview today if you had not given me the gift of typing out your answers where I could see them in Google Docs. And so, reading and listening at the same time helps me as the interviewer. It’s just fantastic!
ENDEVER*: I am so glad it works well.
CHERYL: Mm. Yeah, this would be nonsense without it.
[softly pulsing electronica music break]
CHERYL: I am intrigued by how often people say we shouldn’t have AI voices do audio description because they mispronounce things. Listen! I have had directors email me and say, “You said almost all of the names wrong in the credits, and I need you to rerecord them.” Like, what is this thing that mouthvoices don’t do that?
ENDEVER*: Totally. My voice has its quirks, and it doesn’t always pronounce punctuation or inflections the way I intend it to or expect it to. If I don’t want to go into Settings and teach the device how to say a certain word over and over, I will just purposely misspell it. But that’s how my mouthwords voice works too! If my fingers don’t cooperate, my voice can create funny incoherence similar to speakies’ autocorrect experiences.
Oh, another thing about my voice – it’s trans! It’s not perfect, but I put a lot of finickiness into choosing the voice I wanted to use. It’s the only one my app offers that has a mid-range pitch, which is what was most important to me. It’s mid-range because it’s supposed to emulate a teenage boy, so sometimes it makes me sound young. But at least it doesn’t sound super, super deep or super, super high the way a lot of adult voices do. And I have recently messed with the pitch settings a little bit to see if I can make it even more “me”. This is a very common experience amongst trans AAC users, trying to find and personalize a voice that feels like it fits us. But even though it’s not perfect for me, the fact that it’s mine means it’s trans! And so it’s also cyborg, it’s also autpunk, it’s also crazy. It’s part of who I am.
CHERYL NARRATING: AAC opens up so many avenues for communication for so many people. But people unfamiliar with it sometimes have a lot of unsavory things to say about how it sounds to them, like….
ENDEVER*: Flat, boring, unemotional, unclear, monotonous, toneless, droning, stuff like that. Oh, and there’s the time a random stranger in a coffee shop actually told me my voice was “annoying”.
CHERYL: Like, they just came up to you to announce this to you? What about ways that people describe autistic people’s mouthword voices?
ENDEVER*: Haha, okay, so this is the interesting point. It’s of course not true of every autistic, because autistic traits vary so widely from person to person – but one autistic trait that some people in the community share is that they use mouthwords in a way that other people might describe in the same ways I just listed in the previous question! What’s interesting is that I think there is a little more recognition within the autistic community that those descriptions of autistic mouthwords might be harmful in terms of stigma and negative implications, whether or not one considers them also factual. (And of course, I don’t think the potential harm has occurred to most non-autistic people whatsoever.) Whereas I don’t see people besides AAC users who have thought through the idea that describing our AAC voices that way might also be harmful, whether or not it’s considered also factual.
CHERYL: You’ve used the word “mouthwords,” and I borrowed that right from you. And I enjoy it so much. And I would love to dig into that because it’s so specific, and to me, it challenges this able-normative way that many of us talk about “voice” or even the term “speech generating device.” I would love for you to tell me a little bit more about this term, “mouthwords.”
ENDEVER*: Oh totally! Yeah, I’m seeing it crop up more and more in discussions amongst AAC users, and I think you’re exactly right that it’s challenging abled-normativity. In research literature and parenting books and stuff you see things like “natural speech” and “real voice” and it’s just… please, could you not? So yeah, I don’t know who first came up with the term or anything, but it’s definitely a good alternative to those ways of discussing different forms of communication that end up stigmatizing AAC users. It’s a more neutral description of what most people think a “real voice” is. And honestly, it’s rather nice to have language we are the ones placing on to speakies rather than the other way around. They can go around calling us people with complex communication needs or whatever other euphemism they like that day, and meanwhile we’re like, “okay this is literally just how you talk. You make words. With your mouth. That is what it is called.”
CHERYL: [laughs] Love it!
[bouncy electronica music break]
CHERYL: I work in film audio description, where the job is really to describe the key visuals in a film. For me, it’s really about blind and low-vision and disability access. The reason I wanted to talk to you today is because I’ve been going to a lot of panels, thanks to everything being on Zoom now, where people talk about the importance of accessibility for inclusion. And I keep hearing people who create audio description and people who use it say that they’re against this trend of using AI voices for audio description. Sometimes they’ll describe it as “unpleasant” or “robotic.” And they say that audio description has to be done by real humans, so that it’s pleasant and has emotion and nuance. So, first off, can you address the fact that if you did audio description using AAC, it would still literally be done by a human, or at least a cyborg? And secondly, I wonder if you could talk to me about how it feels to know that people are talking about inclusion but still saying that someone with a voice like yours should never audio describe films.
ENDEVER*: Okay first of all can I just say thanks for your work as an audio describer, because I’ve realized recently audio descriptions are actually really helpful for me as an autistic person too! But I won’t get into that.
So yikes, yeah, I suppose I’m not surprised that you’re hearing that kind of thing – but what a mess. First of all, for what it’s worth, I think this does happen in some form or another quite frequently in disabled communities and people adjacent to them. People try to talk about accessibility for one group of disabled people while completely ignoring what that means for accessibility for another group. And of course some of these situations come down to conflicting access needs in which both are valid, and you have to really sit down in community with each other and work out creative, equitable solutions. But too often one of the sets of access needs is prioritized without looking to how that harms the other group who then loses access. Or one of the sets of demands is being framed as an access need but maybe actually just represents stigma against the other group of people? I don’t know.
But also, the idea that speaking people convey more emotion is such an allistic (that means non-autistic) thing for them to think! Like, in autistic conversations, we tend to convey emotion more directly. An example is, I don’t know, we might say the actual words “I’m pissed off by that” – rather than using obfuscating language paired with mysterious ways of trying to get that message across… body language, subtleties of word choice or tone or inflection, facial expressions, whatever. To me, those are the things that don’t clearly convey emotion or meaning! It often baffles me that allistic people don’t just say what they mean, but they think we’re the defective ones because we have a hard time understanding the ridiculously confusing cues that lay under their words. But of course they’re using an entirely different set of standards that prioritizes abled norms of communication, perpetuating stigma against the way people like me express ourselves.
Like you said, whether or not my voice sounds like AI, it’s coming from me! Even for people who use low-tech AAC methods that skeptics claim are being authored by support people – no, assume their communication is authentic; it’s coming from them. AAC users are all people – human, human-cyborg, whatever, we all have consciousness and intention and feeling and cognition. This is true whether or not you, speakies who may be listening, like our voices – whether they’re melodic or not, varied or not, inflective or not. (Actually, it’s true whether or not you can even hear our voices auditorily – our letterboard spelling is valid communication too.) Just because you associate the way we sound with robots doesn’t mean we’re the personless automata you might want to think. Which like… I get it, it’s just flat out easier for you to believe that we are not real people, that we are objects y’all can shove into isolated classrooms and then conservatorships and then group homes, with special field trips out where we get to be a token representative of our supposed challenges. But we do have voices, of whatever kind, and an AAC user should be able to choose to work as an audio describer if that’s the way they want to use their voice. Why not? It’s theirs.
CHERYL: Totally…. You know what’s funny is I don’t usually say, “totally,” but you’ve said it several times. And I think I just kind of subconsciously started mirroring you, and I said, “totally.” [chuckles]
ENDEVER*: Oops.
CHERYL: [laughing] No!
[softly pulsing electronica music break]
CHERYL NARRATING: High-tech AAC has made a lotta strides, and people have more choices in software and even the voices. There’s still always room to keep growing.
ENDEVER*: Well, we do definitely need more options for trans-coded voices of all ages! As it stands a lot of trans people, and I think maybe especially nonbinary folks, can’t find any voices that they’re comfortable with. So it’s really important for companies that do this work to get some focus on voices that sound less stereotypically binary.
But I’d say what needs to happen even more than that is, we need more diverse voices for people of color! Now, some news on that is – Acapela Group has finished developing the first African American English voice, partnering with Assistiveware, PRC Saltillo, and Tobii Dynavox. That will be available soon, and they plan to add a deeper voice as well as two younger voices to their African American English options. But I mean, yikes, here we are in 2021 and this is a first. It’s clear what that says about the state of the field. What similarly needs work, at least in English voices, is options that reflect immigrant backgrounds. Folks should have the opportunity to sound like their families do in terms of their accent. There’s already so much pressure on immigrants of color to assimilate into white culture here; we shouldn’t also be forcing them into using white American voices on top of that just because the status quo in English AAC technology has been to prioritize white experiences and perspectives.
CHERYL: I had not thought of that, and I love that you brought that up.
You have a podcast, and there’s so much more to learn about you and about AAC on that podcast. So, tell folks where they can find it.
ENDEVER*: I make a podcast with my friend Sam called AAC Town – we’re still kind of getting started, but we’ve had a couple great interviews lately of guests we know and love. Search for us on iTunes or Spreaker by name, or check out transcripts at AACTown.WordPress.com. You can also email us at AACTownPodcast@gmail.com if you have any questions or want to suggest topics for us to cover in the future! If you’re looking for my own blog, it’s AnotherQueerAutistic.WordPress.com.
CHERYL: We have started working on an audio description script for a short film Sam and I made years ago that we are going to make sure gets audio described by an AAC user. So, I’m looking forward to that project happening hopefully sometime in 2021. And it’ll be up on your podcast, AAC Town. [softly pulsing electronica plays ‘til theme music returns] This is really a cool conversation. I so appreciate it.
ENDEVER*: Thank you so much for having this discussion with me.
[upbeat theme music]
CHERYL: Every episode is transcribed. Links, guest info, and transcripts are all at WhoAmIToStopIt.com, my disability arts blog. I’m Cheryl, and…
TWO VOICES: this is Pigeonhole.
CHERYL: Pigeonhole: Don’t sit where society puts you.
Music in the episode: Blippy Trance by Kevin MacLeod. Link: https://incompetech.filmmusic.io/song/5759-blippy-trance. License: https://filmmusic.io/standard-license.
Odyssey by Kevin MacLeod. Link: https://incompetech.filmmusic.io/song/4995-odyssey. License: https://filmmusic.io/standard-license.