Listen to this post. (The podcast is at the bottom of the post.)
I found a cute meme that has screenshots from a movie in which these two young white people are having a conversation on a sunny day outside by some very tall grass. The guy looks either puzzled or flirtatious, then the girl laughs delightedly. Then he looks earnest, and then she beats him by looking more earnest. It went around a lot for a while on social media with different fake captions added to the four squares, but I don’t know what film the images are from. The version I like has the guy saying, “I’ve added subtitles to make my content accessible.” And she laughs and says, “But not automatically generated ones, right?” And he looks earnest, and then she beats him by looking more earnest while captioned as saying, “But not or tomato call he gem her eight it ones, write?” It breaks long words into short ones. It gets homophones confused.
And as hard as it is to parse, my experience of budget captions and auto-craptions isn’t that far off from this. If you read her nonsense line really fast, it almost sounds like what she really said while still being impossible to understand. “But not or tomato call he gem her eight it ones, write?”
In today’s episode: A deep dive into captions with bagels, with juice, with black bean burritos, and with utter WTF! Also, take a stroll down my favorite ASMR playlist while I read some of the auto-craptions YouTube generated for my own content before I uploaded the accurate transcripts!
This episode is dedicated to my friend and colleague, Matt Lauterbach, who’s consistently doing amazing work for improving accessibility with integrity and creativity through collaborations across communities and organizations. He unintentionally inspired this episode by emailing me nearly a year ago with the worst caption fail he’d seen up to that point. Matt, may that one from last year be the worst.
Download the Pigeonhole Podcast 43 Transcript.
Transcript
[bright ambient music]
Introduction
CHORUS OF VOICES: Pigeonholed, pigeonhole, pigeonhole, pigeonhole, pigeonhole, pigeonhole, pigeonhole, pigeonhole.
[bright ambient music slowly fades out]
CHERYL: Every once in a while, a friend who knows I’m a captioner will send me a screenshot of some absolute fuckup of streaming auto-captions or Rev dot com captions. They laugh and say they thought of me. But it’s not purely funny. When the craptions are giving completely wrong or confusing information, then people who rely on captions are getting completely wrong or confusing information, like The Lost City trailer on Rotten Tomatoes that said, “big old German Jew” in the captions when the actor said, “big old Jamba juice.” I’ll even see two people’s sentences combined into one! (Um, just watch basically every line in The Lost City trailer.) It should be enraging, although I understand the instinct to laugh, and I split in half feeling rage while holding back a chortle.
Bagels and other stuff in the captions
[slow Klezmer tune fades in and plays through this section, opening with droning accordion and a high, noodling clarinet solo] Now, I was inspired to feel really pissed off by this next prime example. I’m watching a nice documentary. The guy is talking about how the other guy always had great food around. I hear him list out the contents of a typical spread, which included… corned beef… bagels… and more. [clarinet solo continues, sounding like it’s laughing, as drums slowly roll in]
As a Jew, I instantly recognize some Jewiness in these first two foods listed, even though I accept that a lot of people don’t associate bagels with Jewery, and I understand that non-Jews corn their beef. But put those together in a list? [delighted gasp] I’m basically at the deli, which I know, because that’s my childhood. And if it’s not in your life experience, you might not really know what all this is. That’s totally OK. That’s just how it goes when you get exposed to a culture you’re not familiar with through a film. [dramatic pause] So, look it up before you caption it.
[Klezmer music picks up in tempo and energy, accordion taking the lead] And when you don’t “so, look it up before you caption it,” you get this disaster. You get a caption that says, “From Corn, beef bagels….” For no reason, “corn” is capitalized in this caption. “Corn” was not a proper noun when this film came out because it came out before the rise of the beloved Corn Kid, Tariq, who does elevate the word to a proper noun in addition to a full mood. But anyway, what in the fuck is a beef bagel? And I do honestly think some non-Jews might recognize that corn should be “corned” and that it goes next to beef, not bagels. [Klezmer fades]
Setting my Jewishness aside, I did what I always do when I’m a confused captioner. I did a search for “corn beef bagel” and was inundated with hit after hit after picture after picture of literal corned beef on bagels. I searched again with a comma after “corn,” which the original captioner so offensively used, and got corned beef bagel recipe, corned beef bagel sandwich, even the horror of corned beef bagel dip. This is but the smallest bit of doing my due diligence as a captioner.
So, no, I do not like “From Corn, beef bagels” as a caption, nor do I like that people who rely on captions got that caption.
Rhett & Link go off the chain
If you want more of this absolute nonsense, but done on purpose instead of just being cheap, lazy, and not caring whether the access is accessible, check out Rhett & Link’s videos from a decade ago. They record a short skit, then download the YouTube auto-captions from their skit and use those as a new version of the script to replay the scene. But then they download and use the auto-captions from that version and play the scene a third time.
This one skit starts out with a phone call. Link is eating something when he calls Rhett.
RHETT: What are you eating?
LINK: A 100% organic free-range black bean vegan burrito.
RHETT: How can a black bean be free range?
LINK: I don’t know. Google it. You’ll never guess what I’m holding in my hand right now.
RHETT: Uh…an iPhone.
CHERYL: And the second time through, using YouTube’s auto-captions as their new script?
RHETT: What are you eating?
LINK: Or 100% organic free-range with the Indian retail.
RHETT: Part of my baby free range.
LINK: Parallel to collect. Never guess what I’m holding in my hand right now.
RHETT: Marathon!
CHERYL: “Part of my baby free range. Parallel to collect.” How are you supposed to get anything done in this world with that as your source of information?!
If you have access to watch Rhett & Link’s caption fails with the current YouTube auto-captions on, it’s a surreal mess of chaos as you listen to them earnestly talk or sing Christmas carols using the gibberish that YouTube fed them and watch as the current AI further mangles the nonsense into oblivion.
ASR presented with ASMR
Auto-captions are made by ASR, or automatic speech recognition. Not to be confused with ASMR, or autonomous sensory meridian response, which is that delicious tingly sensation some people get when they encounter certain sounds. I thought I’d present the two in juxtaposition in case you are not already on board with how terrible captions prevent people from getting the information they need. Here are some nice sounds that make my skin tingle and my jaw spontaneously unclench presented with things that make my whole body tense up! Imagine trying to have a conversation with someone who heard or read the original, intended words about the content these lines come from. Like, what would you talk about? I’d just default to talking about food, one of my favorite conversation topics, everything from corn to bagels. This stuff below is straight offa modern-day YouTube from my own content, not the Beta version of the technology Rhett and Link were using.
[Trippy chill synth music plays throughout reading the ASMR lines. The music weaves in and out of short, interlocking, colorful phrases that skip and hop unpredictably but without rushing above a steady droning tone and repeating chords in fourths and fifths. The tick-tick of a constantly off-beat digital high hat and softly pulsing kick drum play with the click of a track skipping, creating a light percussion that you might tap your foot to if you could find the rhythm but that maybe more reminds you what firefly wings might sound like beating if you could hear them.]
“It is a community partnership with storage war.”
“You and I should not be five people that have this deeply disabled voices.”
“She used to get I think your place without a voice to because of a neurological tradition.”
“I’m moving forward [Music] [Music] [Music] [Music] is times, and that is why I formed the real world, and this is that.”
“It’s hard for a lot of people to find appropriations for those things the best that’s the majority of females are not built for us.”
“I thought I have a bunch of watch me race the track.”
“Bustle-ness it really is the for this.”
“It also means it’s like care and love and attention rare cake.”
Wrap-up
There you have it. “Attention rare cake. Why I formed the real world, and this is that.” It’s like a never-ending facepalm. Or at least, in my mind, it should be.
[trippy ASMR music fades into bright ambient theme music]
Every episode is transcribed. Links, guest info, and transcripts are all at WhoAmIToStopIt.com, my disability arts blog. I’m Cheryl, and…
TWO VOICES: this is Pigeonhole.
CHERYL: Pigeonhole: Don’t sit where society puts you.
Music in the episode: Dance Medley by Eastern Watershed Klezmer Quartet. (Source: FreeMusicArchive.org. Licensed under an Attribution-Noncommercial-No Derivative Works 3.0 United States License.)
Music for Shuffle, Sketch #1 by Matthew Irvine Brown. Used with permission of the composer.