Best Practices for Live Transcribe

A phone sitting on a table, showing Live Transcribe outputting captions as someone offscreen speaks
Live Transcribe is a free app that provides real-time captions in 80+ languages, made for the deaf and hard of hearing.

It has been nearly two years since we launched Live Transcribe. In that time, we've heard many great stories about the creative ways people use it to get free, real-time transcriptions in 80+ languages. We've also run many transcription experiments on both phones and tablets, and recently published a paper.

Even though the app looks simple, there are plenty of best practices for getting good results. In this guide, we share some of what we've learned on our journey to make the world's sounds and conversations more accessible, focusing mostly on acoustics and external microphones. Our guiding principle is to get the best audio signal possible, usually by moving the microphone closer to what you are trying to transcribe.

There will always be environments in which getting accurate transcription is challenging, but fortunately the job is a lot easier if you follow one rule: move the microphone closer to the speaker! (We will use “loudspeaker” when we mean the electronic device and “speaker” when we mean a human who is talking!)
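Here is a rough sense of why distance matters so much. The following is a textbook idealization, assuming a single point source in free space with no reflections, so treat it as a best-case estimate rather than a measurement of Live Transcribe. The level gained by moving a microphone from distance d_1 to a closer distance d_2 is:

$$
\Delta L = 20\log_{10}\!\left(\frac{d_1}{d_2}\right)\ \text{dB}
$$

For example, moving the microphone from 2 m to 0.5 m away gives 20 log10(4) ≈ 12 dB more of the speaker's voice. Each halving of the distance adds roughly 6 dB. Diffuse background noise and room echo arrive from all around, so in a real room they stay roughly the same, and the speaker's voice stands out more clearly against them.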

Good Acoustics

The two biggest challenges with audio transcription are noisy rooms and reverberant spaces. In a noisy room, it might be very challenging to hear and transcribe the speaker because of the loud sounds of people and objects in the background. In a reverberant space (the inside of a cathedral, for example), the app might be able to hear the speaker well enough, but there might be too much echo to understand them properly.

Moose and Raven are using Live Transcribe in a cave. Moose is yelling “Hello” but the phone doesn’t transcribe anything.
While exploring a cave, Moose and Raven discover that Live Transcribe just isn't working because of all the echoes. Moose can yell as loudly as she wants, but they might need to move closer together for Live Transcribe to understand.

Both of these challenges can be eased by getting the microphone closer to the speaker's mouth (see External microphones below). When possible, stepping outside the room can also be a good solution. In settings like church services, where the speaker talks into an amplification system, it may help to sit near the loudspeaker or to put a wireless microphone near the podium.

Phone Calls

As a rule of thumb, don't expect Live Transcribe to perform well on phone calls, and be aware that diagnosing quality issues can be difficult. We've seen it work, and you might too. However, the audio processing used in network transmission adds a lot of distortion to the signal and strips away much of its high- and low-frequency content.

If you want to attempt this on your own, there are a few experiments you can try.

One-device experiments:

Run Live Transcribe on the same phone you're using for the call. Set the volume to about half, start the call and switch it to loudspeaker mode, then open Live Transcribe and talk. Depending on your phone and Android version, Live Transcribe might not receive any audio (in this case, the volume meter in the corner will not move). This is because not all phones allow apps to share the microphone.

Two-device experiments:

Set up the Live Transcribe device so that its microphone is near the loudspeaker of the calling phone. The two devices shouldn't touch, because vibration from the audio could cause an unnatural buzzing. Put the calling phone in loudspeaker mode at about 60% volume.

If your phone supports Live Caption, you may get much better results. (Currently, Live Caption works in English on Pixel 2 and later, and on select other Android phones.)

Television

Some of the challenges with phone calls also come up when transcribing from a television. However, TV audio usually has a better-quality signal and produces better results. Put the transcribing device near the television's loudspeaker (or connect it directly using the Saramonic UTC-C35 adapter described below).

Put the tablet across the table

If you are seated at a table, moving the device closer to the speaker is the easiest way to get better transcription. This works especially well with a tablet on a stand due to the large screen size, but a phone with a PopSocket or adjustable stand will also work. Increasing the text size can make it easier to read the screen from across the table.

External microphones

Especially for one-on-one conversations, using a wired or wireless microphone can make a huge difference.

A comica usb-c mic
The Comica CVM-VS09 mic plugs directly into a USB-C device like a phone and gives better quality audio in the direction the mic is pointing.

If your phone has a USB-C port, you can mount the Comica CVM-VS09 directly on it. It is a "shotgun" style directional microphone, meaning it picks up sound mainly from the direction you point it and suppresses sound from other directions. This makes it helpful for aiming at a speaker, even one sitting across the table.

A sena bluetooth mic
The Sena (BT10–01) is a clip-on Bluetooth wireless microphone.

Some other external microphone solutions support more than one microphone, such as the Samson Go Mic. The Samson system supports two wireless mics, which give great quality for a three-person conversation. However, it uses handheld mics, it is very bulky, and it can't scale beyond two mics.

One wireless microphone we've had success with is the Sena (BT10–01). It takes a bit of work to set up and pair with your phone, and this step may require help from a hearing person. Once set up, it is very convenient for everyday use. The best thing about wireless mics is that your conversation partner can clip the mic to their clothing. [UPDATE: It seems the Sena BT10–01 is currently not on the market. We will update this post once we find a similar replacement. Sorry!]

In this video, three phones use different audio inputs for transcription. From left to right, they use a Bluetooth earpiece, a directional microphone, and the phone's built-in mic. As the speaker moves farther away, the built-in mic stops transcribing first, and the directional mic keeps working for perhaps twice that distance. The Bluetooth microphone continues to transcribe even after the speaker is in another room, because the Bluetooth radio signal passes through the door while the sound does not. The Bluetooth microphone also gives the highest-quality transcription because the mic is only a few inches from the speaker's mouth.

Using Live Transcribe directly with any external audio source

A saramonic utc-c35 adapter
The 3.5mm aux plug of the Saramonic UTC-C35 goes into your audio output device (the source), and the USB-C connector goes into the phone or tablet running Live Transcribe. Then, in the Live Transcribe app, select the external microphone.

If you want to send a phone call, a video call, or any other audio stream from another device to a device running Live Transcribe, we found the Saramonic UTC-C35 adapter to be a simple, reliable solution.

A great aspect of this solution is that the audio gets sent directly from the source, avoiding the reverb and noise that your room might have. This usually leads to much better transcription.

An example setup, with a phone and a laptop, would be: A video call playing on the laptop -> UTC-C35’s 3.5mm aux jack plugged into the laptop -> UTC-C35’s USB-C jack plugged into the phone -> Live Transcribe app on the phone.

A screenshot showing Live Transcribe’s settings on how an external microphone selector appears
Once you plug the Saramonic UTC-C35 into a phone running Live Transcribe, it shows up in Settings as an external microphone that you can select.

Mini phone plug connectors: TRS vs. TRRS

a diagram showing 3 variants of the aux 3.5mm jack. left is TS. middle is TRS. right is TRRS
Here are some photos of mini phone plugs, aka 3.5mm or 1/8" audio plugs. Notice the different numbers of “rings” between the “tip” and the “sleeve”, with TS (left), TRS (middle), and TRRS (right).

Some mobile phones still have headphone jacks that support stereo output and mono microphone input through a TRRS mini phone plug. TRRS stands for tip–ring–ring–sleeve. This differs from the older standard TRS (tip–ring–sleeve) mini phone plug, which carries only stereo audio.

A Y-split audio adapter (in white) that plugs into a phone or laptop. Using its separate mic socket, you can connect an external audio source to the phone running Live Transcribe.

To make matters more complicated, a microphone with a mini phone plug almost always expects a TRS jack, so if you plug it into your phone's TRRS jack, it just won't work. Unless the mic is built into a stereo headset, you may need a Y-split adapter to get its signal into the phone.

A deconstructed layout of a directional microphone, a 3.5mm aux TRRS splitter and a USB-C-to-aux adapter
With the adapter above, the large directional (sometimes called shotgun) mic, the Audio-Technica ATR550, which has a TRS plug, can be connected to a phone with a TRRS jack. Such an adapter typically splits out separate TRS jacks for the stereo headphones (green) and the mono microphone (red).
A usb-to-aux adapter showing the two 3.5mm receptacles to connect headphone and mic
This USB-to-aux adapter (Andrea USB-SA) connects directly to a USB port (or to USB-C through yet another adapter) instead of to a TRRS jack. Because it carries audio over USB, it can handle stereo microphone input.

We also added a white USB-C headset adapter. It has a TRRS jack, so the mic can be connected to newer phones that use USB-C, or to a Chromebook.

Switching between WiFi and mobile/cellular data (3G, 4G LTE, 5G, etc.)

Live Transcribe was built to work on the go, so if you leave your home WiFi and head out into the world, it will switch to the mobile network when possible. In practice, Android may take 1 to 15 seconds to recognize the change and re-establish a working connection when switching from WiFi to mobile/cellular data (or vice versa). Until then, you can stay connected to WiFi even though its signal is now very weak. This is a common frustration in elevators, basements, and parking garages, or when walking around a large building. To get the app transcribing again, you can temporarily turn off WiFi.

Hopefully with these tips you’ll be on your way to having better conversations with Live Transcribe. If you have any other tips, feel free to share them below.

Live Transcribe is a free realtime captioning Android app for the deaf and hard of hearing
