Making books accessible to a greater audience

Books have always been one of the most powerful ways humans share knowledge, culture, and stories, but publishing a book is only the beginning of its journey. In an increasingly digital and connected world, readers expect books to exist in multiple formats and languages: print, audiobook, braille, and translation. The problem is that only a small percentage actually do.

Worldwide, an estimated 2.2 to 2.4 million new books are published every year, yet only a fraction are turned into audiobooks. Exact global figures are difficult to track, but available industry data suggests that audiobook releases represent at least 3 to 5 percent of annual book production, and likely more when unreported markets are included. That means the overwhelming majority of books published globally never become available in audio form.

In South Africa, the gap is even more visible. The country records roughly 8,500 to 10,000 books annually, depending on how titles are counted, but audiobook publishing remains a small and poorly measured part of the industry, with estimates suggesting it makes up less than 5% of total book output. While digital reading and listening continue to grow, traditional print still dominates.

Language accessibility tells another important story. South Africa has made progress in publishing in indigenous African languages, particularly in education. Recent figures show approximately 1,600 to 1,700 new educational titles per year being published in African languages such as isiXhosa, isiZulu, Sepedi, Setswana, and others. However, these numbers reflect books published in those languages, not necessarily books translated into them, so true multilingual access remains difficult to measure.

For readers with visual impairments, access is even more limited. Publicly available figures show only 6 documented braille book conversions within one major South African publishing accessibility programme to date, placing braille publishing well below 1% of total annual book production. (See references at the bottom of the page.)

This is one of the main reasons I wanted to see whether there was anything I could do about it, and learn something new in the process. I wanted to find out whether modern technology can make more books accessible to more people, first by converting a written book into audio, and then by translating the same book into other languages and converting that to speech as well. All without it sounding like a robot reading off a PowerPoint presentation it did not prepare for. I know this type of application has been done before; ElevenLabs, for example, also has an audiobook feature. However, it runs in the cloud and is rather expensive if you want to generate even one audiobook yourself.

And yes, I’m confident to say that, as of writing this blog, I’m rather well versed in the modern text-to-speech (TTS) AI landscape. This will obviously change within a month, as I learned the hard way: while researching and building my application, I had to change the base TTS model three times, and those are just the ones I planned to use. I tested about 15 different AI TTS models to get to where I am now. I even fine-tuned some of my own, but this was much harder than I thought. My aim is to automate the whole process end to end so that multiple books can be converted into audio without any human intervention. My first attempt ended up as a custom application that I built (well, me, Claude and GPT), sorting out bugs and adding functionality as the app progressed. I will not go into the technical details in this blog, but feel free to reach out if you would like to know more. However, here are some failures and interesting outcomes from my research.

Some failure cases:

This was a good voice, but for some reason I could not get it to speak at a normal speed:

This was a total failure from one of my custom fine-tuning attempts (it might sound a bit disturbing; you have been warned):

Just no:

 

The final application:

My aim was to run everything locally, and I managed this for the most part. I could have just used online TTS models; however, I’m pretty sure the cloud providers would train on the books, and this would not do if the authors do not wish it!

I split the app up into sections. This section takes a txt file containing the written book and splits it into chapters based on a keyword. Instead of only looking for “Chapter”, you can change the keyword on the fly, so books in other languages still keep their chapter structure; this matters because most of the processing is done on a per-chapter level. Here you can also select the language you want the original book to be translated into, or select a voice if you just want the book converted into speech. The voices come from the next tab, where you can generate them. Once the book is converted into audio, you can select a chapter or open the output directly to upload or export it.
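To make the keyword-based chapter split concrete, here is a minimal Python sketch. It is not the app’s actual code: the function name `split_into_chapters` is hypothetical, and it assumes a chapter heading is any line that starts with the keyword, matched case-insensitively. A body line that happens to begin with the keyword would also be treated as a heading, so a real book may need a stricter pattern (for example, the keyword followed by a number).

```python
import re


def split_into_chapters(text: str, keyword: str = "Chapter") -> list[tuple[str, str]]:
    """Split a plain-text book into (heading, body) pairs.

    Hypothetical helper: a chapter starts at any line beginning with `keyword`
    (case-insensitive). Text before the first heading (title page, preface)
    is kept as "Front matter".
    """
    # Match the whole heading line; MULTILINE makes ^ and $ work per line.
    pattern = re.compile(rf"^[ \t]*{re.escape(keyword)}\b.*$", re.IGNORECASE | re.MULTILINE)
    matches = list(pattern.finditer(text))
    chapters: list[tuple[str, str]] = []

    # Everything before the first heading, or the whole text if there is none.
    front = text[: matches[0].start()].strip() if matches else text.strip()
    if front:
        chapters.append(("Front matter", front))

    # Each chapter body runs from the end of its heading to the next heading.
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        chapters.append((match.group().strip(), body))
    return chapters
```

For a book in another language, you would pass that language’s word for “chapter” as `keyword`, and every later step can then loop over the returned chapters.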

The voice tab is there to design new voices. You can specify the gender, age and name, along with a detailed description of how the voice should sound. Once you are happy with the voice, you can save it for future use. You also have the option to select a voice and either listen to a pre-rendered audio sample or write something new to test out an existing voice. I’ve also added the option to favorite a voice, since the number of voices generated can get overwhelming, especially with my experimental audio voice feature.
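A voice library like the one described could be modelled roughly as follows. This is a hypothetical sketch: the `Voice` and `VoiceLibrary` classes, the JSON file, and the prompt format in `design_prompt` are all assumptions, since the exact prompt a voice-design TTS model expects differs from model to model. Favorites are simply sorted to the top of the list.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Voice:
    name: str
    gender: str
    age: int
    description: str  # free-text description of how the voice should sound
    favorite: bool = False

    def design_prompt(self) -> str:
        # Hypothetical prompt format: folds the structured fields into one
        # text prompt for a voice-design model.
        return f"A {self.age}-year-old {self.gender} voice. {self.description}"


class VoiceLibrary:
    """Keeps saved voices in a JSON file so they survive restarts."""

    def __init__(self, path: Path):
        self.path = path
        self.voices: dict[str, Voice] = {}
        if path.exists():
            for item in json.loads(path.read_text(encoding="utf-8")):
                voice = Voice(**item)
                self.voices[voice.name] = voice

    def save(self, voice: Voice) -> None:
        self.voices[voice.name] = voice
        self._flush()

    def toggle_favorite(self, name: str) -> None:
        self.voices[name].favorite = not self.voices[name].favorite
        self._flush()

    def listing(self) -> list[Voice]:
        # Favorites first, then alphabetical, to tame a long list of voices.
        return sorted(self.voices.values(), key=lambda v: (not v.favorite, v.name.lower()))

    def _flush(self) -> None:
        # Rewrite the whole file on every change; fine for a few hundred voices.
        data = [asdict(v) for v in self.voices.values()]
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")
```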

Below are some voices generated using this application. You can get a good range of voices and emotion depending on how you write your prompt.

Sample voices:

 

The next two tabs are still in an experimental phase, although I’ve had some decent output from them so far (see the example book below). The idea works in two steps. First, the characters are extracted from each chapter, and should a character appear in other chapters, its personality is updated accordingly. The goal is to extract each character’s personality and way of speaking and write this to an internal database linked to the current book being converted to audio. Next, the description of each character is used to produce the voice for that character. If it does not sound right, you can always change the prompt manually and try again until you are happy with the voices of all the characters in the book. You can also have multiple voices for one character and switch between them if you wish. In the end, you should have voice samples for each character in the book. As this is still an experimental feature, I’ve not adjusted it to work with languages other than English, but doing so would not be very difficult.
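The character database step could look something like the sketch below, using Python’s built-in `sqlite3` module. The table layout and the `open_character_db` and `upsert_character` helpers are hypothetical, and the merge is deliberately naive: notes from a later chapter are simply appended to the existing profile, whereas a real pipeline would more likely ask a language model to rewrite the combined description.

```python
import sqlite3


def open_character_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a per-book character table. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS characters (
               book_id     TEXT NOT NULL,
               name        TEXT NOT NULL,
               description TEXT NOT NULL,  -- personality and way of speaking
               chapters    TEXT NOT NULL,  -- comma-separated chapter numbers
               PRIMARY KEY (book_id, name)
           )"""
    )
    return conn


def upsert_character(conn: sqlite3.Connection, book_id: str, name: str,
                     chapter: int, notes: str) -> None:
    """Insert a new character, or extend its profile if seen in an earlier chapter."""
    row = conn.execute(
        "SELECT description, chapters FROM characters WHERE book_id = ? AND name = ?",
        (book_id, name),
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO characters VALUES (?, ?, ?, ?)",
            (book_id, name, notes, str(chapter)),
        )
    else:
        description, chapters = row
        # Naive merge: append the new observations and record the chapter.
        conn.execute(
            "UPDATE characters SET description = ?, chapters = ? "
            "WHERE book_id = ? AND name = ?",
            (f"{description} {notes}", f"{chapters},{chapter}", book_id, name),
        )
    conn.commit()
```

The stored description is then what a voice-design prompt for that character would be built from.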

Lastly, in the final tab, each chapter is split into speaking roles only, matched against the characters created in the Characters tab. This includes the narrator, so the narrator has its own voice while all other sections are assigned to a character’s speaking role. This process is not perfect: sometimes a character speaks the narrator’s part, or the narrator speaks both its own part and a character’s part. However, the application allows you to edit the sections manually if you wish, although this is very time-consuming for long audiobooks.
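As an illustration of why speaker attribution goes wrong, here is a deliberately simple, rule-based sketch. It is an assumption, not the app’s method: it treats anything inside double quotes as dialogue and everything else as narration, and it only recognises a speaker when a tag like “said the Caterpillar” directly follows the quote and names a known character. Dialogue attributed before the quote, or with an unlisted verb, falls back to `Unknown`, which is exactly the kind of case that needs manual editing.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # "Narrator", a character name, or "Unknown"
    text: str


# Dialogue between straight or curly double quotes.
QUOTE = re.compile(r'["“](.+?)["”]', re.DOTALL)
# A dialogue tag right after the quote, e.g. ' said Alice' or ', asked the Hatter'.
TAG = re.compile(r"^\W*(?:said|asked|cried|replied)\s+(?:the\s+)?([A-Z]\w*)")


def split_roles(paragraph: str, known: set[str]) -> list[Segment]:
    """Split one paragraph into narrator and dialogue segments (hypothetical helper)."""
    segments: list[Segment] = []
    pos = 0
    for m in QUOTE.finditer(paragraph):
        # Anything between the previous quote and this one is narration.
        narration = paragraph[pos:m.start()].strip()
        if narration:
            segments.append(Segment("Narrator", narration))
        # Look for a tag immediately after the closing quote.
        tag = TAG.match(paragraph[m.end():])
        speaker = tag.group(1) if tag and tag.group(1) in known else "Unknown"
        segments.append(Segment(speaker, m.group(1).strip()))
        pos = m.end()
    # Trailing narration (including the dialogue tag itself) goes to the narrator.
    rest = paragraph[pos:].strip()
    if rest:
        segments.append(Segment("Narrator", rest))
    return segments
```

For example, `'"Who are you?" said the Caterpillar.'` yields the question for the Caterpillar followed by “said the Caterpillar.” for the narrator, while `'Alice said, “Hello.”'` leaves “Hello.” as `Unknown`.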

And now for some examples. I used The Wizard of Oz and Alice in Wonderland because they are in the public domain. Below is the full Wizard of Oz read with a single voice; the entire video is about four hours long.

Below is the same book, but first translated into isiZulu and then read by a custom voice made specifically for isiZulu. I will admit I had to rely on Google’s translation feature to check parts of the audio, and I’m sure there are sections that are not 100% correct. Should anyone wish to comment or give advice on any of the translated audiobooks, I would be more than happy to have a conversation with you.


 

Here is Alice in Wonderland, again read by a single speaker. I only listened to sections of this one, as I must admit that if I ever hear this story again it will be too soon. I used this book for testing and must have listened to it more than 20 times in total while testing various TTS AI models.


 

Below is the isiXhosa version. Again, any feedback or positive input would be greatly appreciated.


 

The same goes for the isiZulu version. This time I opted for a male voice to test the full capability of the TTS AI.

 

And now for the final experimental feature mentioned above. Some of the timing is still off, and I’m sure with more tweaking I will get everything synced and aligned, with the right characters speaking the right sections of the book. I did some dogfooding and listened to not only this book but also some other books while traveling to and from work, and I did find myself actually getting into the story, even if it was more from a technical perspective.

Thank you!

 

References: