It has been nearly two years since we launched Live Transcribe. In that time, we’ve heard lots of great stories from people about the creative ways they use it to get free real-time transcription in 80+ languages. We’ve also done lots of experiments with transcription on both phones and tablets and recently published a paper.
Even though the app looks very simple, there are still plenty of best practices for getting good results. In this guide, we share some of what we’ve learned in our journey of trying to make the world’s sounds and conversations more accessible. We will mostly discuss acoustics and external microphones. Our guiding principle is to get the best audio signal possible, usually by moving the microphone closer to what you are trying to transcribe.
There will always be environments in which getting accurate transcription is challenging, but fortunately the job is a lot easier if you follow one rule: move the microphone closer to the speaker! (We will use “loudspeaker” when we mean the electronic device and “speaker” when we mean a human who is talking!)
The two biggest challenges with audio transcription are noisy rooms and reverberant spaces. In a noisy room, it might be very challenging to hear and transcribe the speaker because of the loud sounds of people and objects in the background. In a reverberant space (the inside of a cathedral, for example), the app might be able to hear the speaker well enough, but there might be too much echo to understand them properly.
Getting the microphone closer to the speaker’s mouth helps with both of these challenges (see external mics below). Stepping outside the room, when possible, can also be a good solution. In scenarios like church services, where the speaker is talking into an amplification system, it may help to sit near the loudspeaker or to put a wireless microphone near the podium.
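As a rough illustration of why distance matters so much, here is a minimal Python sketch. It assumes an idealized point source in open space (the inverse-square law), which real rooms only approximate. The function name and the example distances are ours, chosen purely for illustration.

```python
import math


def direct_level_change_db(old_distance_m: float, new_distance_m: float) -> float:
    """Change in direct-sound level, in dB, when the mic moves closer or farther.

    Assumption: an idealized point source in a free field, where sound
    pressure falls off in proportion to 1 / distance.
    """
    if old_distance_m <= 0 or new_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(old_distance_m / new_distance_m)


# Start with the mic 2 m from the speaker and keep halving the distance.
for new_distance in (1.0, 0.5, 0.25):
    change = direct_level_change_db(2.0, new_distance)
    print(f"2.0 m -> {new_distance} m: {change:+.1f} dB")
```

Under this assumption, each halving of the distance raises the speaker’s direct sound by about 6 dB. Background noise and reverberation come from all around the room and stay roughly the same, so the voice stands out more clearly against them.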
As a rule of thumb, Live Transcribe should not be expected to perform well on phone calls, and diagnosing quality issues can be difficult. We’ve seen it work, and you might too, but the audio processing used in network transmission adds a lot of distortion to the signal and removes much of its high- and low-frequency content.
If you want to attempt this on your own, there are a few experiments you can try.
Run Live Transcribe on the same phone as your phone call. Set the volume to about half of its full volume, start the phone call and switch it to loudspeaker mode, open Live Transcribe, and talk. Depending on your phone and Android version, you might find that Live Transcribe doesn’t receive audio (in this case, the volume meter in the corner will not move). This is because not all phones allow apps to share the microphone.
Set the Live Transcribe device up so that its microphone is near the loudspeaker of the calling phone. The two devices shouldn’t be touching (the vibration due to the audio could cause an unnatural buzzing). Set the calling phone to loudspeaker mode at about 60% volume.
If your phone supports Live Caption, you may be able to get much better results. (Currently, Live Caption works in English on Pixel 2 and later, and selected other Android phones.)
In general, some of the challenges with phone calls also come up when transcribing from a television. However, TV often has a better quality signal and produces better results. Put the transcribing device near the television’s loudspeaker, or connect it directly using a UTC-C35 adapter (mentioned below).
Put the tablet across the table
If you are seated at a table, moving the device closer to the speaker is the easiest way to get better transcription. This works especially well with a tablet on a stand due to the large screen size, but a phone with a PopSocket or adjustable stand will also work. Increasing the text size can make it easier to read the screen from across the table.
Especially for one-on-one conversations, using a wired or wireless microphone can make a huge difference.
If your phone has a USB-C port, you can mount the Comica CVM-VS09 directly on the phone. It is a “shotgun” style directional microphone, meaning that it picks up sounds from the direction you point it and suppresses sounds from other directions. This can be helpful for aiming the mic at a speaker, even if they are across the table.
There are other external microphone solutions that support more than one microphone, such as the Samson Go Mic. The Samson mic also supports two wireless mics that lead to great quality for a three-person conversation. However, it uses handheld mics and is a very bulky solution. It also isn’t scalable beyond two mics.
One of the wireless microphones that we have had success with was the Sena (BT10–01). The mic takes a bit of work to set up and to pair with your phone, potentially also requiring help from a hearing person. Once set up, it is very convenient for everyday use. The really great thing about wireless mics is that your conversation partner can clip the mic to their clothing. [UPDATE: It seems the Sena BT10–01 is currently not on the market. We will update this post once we find a similar replacement. Sorry!]
In this video, three phones with different audio inputs are used. From left to right, a Bluetooth earpiece, a directional microphone, and the phone’s built-in mic are used for transcription. As the speaker gets farther away, the built-in mic stops transcribing first, and the directional mic works for perhaps twice the distance. The Bluetooth microphone continues to transcribe even once the speaker is in another room (because the Bluetooth signal goes through the door and sound does not). The Bluetooth microphone has the highest quality transcription because the mic is only a few inches from the speaker’s mouth.
Using Live Transcribe directly with any external audio source
If you want to transcribe a phone call, a video call, or any stream of audio from another device on a device running Live Transcribe, we found the Saramonic UTC-C35 adapter to be a simple, reliable solution.
A great aspect of this solution is that the audio gets sent directly from the source, avoiding the reverb and noise that your room might have. This usually leads to much better transcription.
An example setup, with a phone and a laptop, would be: A video call playing on the laptop -> UTC-C35’s 3.5mm aux jack plugged into the laptop -> UTC-C35’s USB-C jack plugged into the phone -> Live Transcribe app on the phone.
Mini phone plug connectors: TRS vs. TRRS
Some mobile phones still have headphone jacks that support stereo output and mono microphone input via the TRRS mini phone plug. TRRS stands for tip–ring–ring–sleeve. It is not the same as the old standard mini phone plug that just carries stereo, the TRS (tip–ring–sleeve) configuration.
To make matters more complicated, a microphone with a mini phone plug almost always expects to plug into a TRS jack, so if you plug it into your phone’s TRRS jack, it just won’t work. You may need a Y split adapter to get the signal in from your external mic unless it’s a mic that’s integrated with a stereo headset.
We also use a white USB-C headset adapter. This adapter has a TRRS jack, so the mic can be connected to newer phones that use USB-C, or to a Chromebook.
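To see why a TRS microphone plug doesn’t work in a TRRS jack, it can help to line up the contacts. The Python sketch below does that under some assumptions: it uses the common CTIA headset wiring for the phone’s jack and a typical wiring for a TRS mic plug. Your own devices may be wired differently, and the names in the code are ours, for illustration only.

```python
# Contact roles listed from tip to sleeve. These wirings are assumptions
# based on common conventions; check the documentation for your devices.
TRRS_CTIA_JACK = ["left audio out", "right audio out", "ground", "mic in"]
TRS_MIC_PLUG = ["mic signal", "mic signal or bias", "ground"]


def mate(plug, jack):
    """Pair each jack contact with the plug segment that touches it.

    A plug with fewer segments has a longer sleeve, so its last segment
    touches every remaining jack contact.
    """
    last = len(plug) - 1
    return [(jack_role, plug[min(i, last)]) for i, jack_role in enumerate(jack)]


for jack_role, plug_role in mate(TRS_MIC_PLUG, TRRS_CTIA_JACK):
    print(f"jack: {jack_role:<16} touches plug: {plug_role}")
```

With these wirings, the mic’s signal lands on the phone’s left audio output, and the phone’s mic input is connected only to the plug’s ground, so the phone never hears the mic. An adapter fixes this by rerouting the mic signal to the contact where the phone expects it.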
Switching between WiFi and mobile/cellular data (3G, 4G LTE, 5G, etc)
Live Transcribe was built to work on the go, so if you leave your home WiFi and go out into the world, it will switch to the mobile network when possible. In practice, there can be a delay of 1 to 15 seconds before Android notices the change and re-establishes a working connection when moving from WiFi to mobile/cellular data (or vice versa). During that time you can be left connected to WiFi even though the signal is now very weak. This is a common point of frustration in places like elevators, basements, and parking garages, or when walking around a big building. To make the app transcribe again, you can temporarily turn off WiFi.
Hopefully with these tips you’ll be on your way to having better conversations with Live Transcribe. If you have any other tips, feel free to share them below.