Today we went to Matthew’s work to start the process of voice banking. Voice banking will allow me to communicate in the future should my own voice suffer the effects of MND. ElevenLabs is an AI software company specialising in text-to-speech and voice cloning technology, and luckily Odyssey was already using the technology for their own business needs, so they were well positioned to help.

When signing up, we chose the monthly Pro service option and provided a credit card for the order process; however, the MND Association supplied a code that allowed us to subscribe at zero charge.

Odyssey had a variety of professional audio recording equipment (and a very lovely man who knew how to use it); however, it’s also possible to use just a mobile phone and record directly into the ElevenLabs iOS/Android app. The benefit of using professional equipment (apart from better quality sound) is that the audio can be saved to a local computer, allowing us to use it somewhere else in the future or in the event of an issue with the service.

We were also joined by a lovely NHS community Speech and Language Therapist who was interested in learning more about ElevenLabs.

I opted to delay my morning dose of riluzole because, in the liquid form currently prescribed, I find it can make my tongue numb, and my speech quality suffers.

We recorded around 35 minutes of audio, which is the minimum recommended. Supplying 4 hours is ideal and gives the service the best chance of creating a quality clone. I currently find talking for long periods quite tiring, and the quality of my speech starts to suffer after a while, so we intend to add more audio in the future; the AI can continue to learn from more recordings and keep getting better.

To start with, I read some children’s books such as “We’re Going on a Bear Hunt”, in part because we didn’t plan well and they were within reach on the way out the door, but also because children’s stories are quite uplifting and we didn’t want a miserable tone of voice! In practice, we found that the small quantity of text on each page meant lots of page turning, which interrupted the flow of my speech. I also read “Tees Business Magazine”, which was a little dry. In the end, we found that simply talking on a topic that I knew well gave the most natural and flowing content. I spoke about our planned holidays, and our girls.

After recording the audio, we uploaded it “un-tinkered” to see how well ElevenLabs could use it without any extra effort; however, we plan to edit out any coughs, stutters, and background noise and re-upload it to give the system the best possible chance.

After uploading, ElevenLabs required me to read out a specific sentence to verify that the voice was my own and that I was allowed to clone it. It takes 2-6 hours before it’s ready to test, so we’ll check back in later.
