Digital Healthcare, Augmented Reality, Machine Learning, Cloud Computing and more! Andreas Jakl is a professor @ St. Pölten University of Applied Sciences, ex-Microsoft MVP for Windows Development and Amazon AWS Educate Cloud Ambassador & Community Builder.

Tag: Spark AR

App Development AR / VR Cloud Speech Assistants

How-To: Convert Neural Voice Audio from Amazon Polly (mp3) to Spark AR (m4a)

Post author By andijakl
Post date November 3, 2021

Finished Spark AR project playing a sound file upon its start. The artificial speech was generated using the Amazon Polly text-to-speech neural voice engine.

Currently, Facebook’s Spark AR Studio is restrictive with supported audio formats. Unfortunately, only M4A with specific settings is allowed. This short tutorial is a guidance on how to convert artificially generated neural voices (in this case coming from an mp3 file as produced by Amazon Polly) to the m4a format accepted by Spark AR. I’m using the free Audiacity tool, which integrates the open-source FFmpeg plug-in.

Spark AR has the following requirements on audio files:

M4A format
Mono
44.1 kHz sample rate
16-bit depth

Generating Audio using Text-to-Speech (mp3 / PCM)

Neither Amazon Polly nor the Microsoft Azure Text-to-Speech cognitive service can directly produce an m4a audio file. In its additional settings, Polly offers MP3, OGG, PCM and Speech Marks. MP3 goes up to a sample rate of 24000 Hz, PCM is limited to 16000 Hz.

Tags ar, augmented reality, AWS, Spark AR, Speech, Speech Services