Why I Built SpeechMe to Work Differently

Building SpeechMe was more of an adventure, maybe even a nightmare, than I had originally imagined.

In principle, it looked simple: build a smart layer over AI and use it to generate speeches. In reality, that was nowhere near enough.

I have built multiple apps, and every one gives me new insights into development and process. This one was no different.

Every Speech Needed More Data Than Expected

The first hit was realising how much data needed collecting for each speech occasion. This wasn’t a trivial handful of questions. It turned out to be several weeks of in-depth research into what a good, no, correct that, a great speech for every occasion consisted of.

Every speech occasion has its own unique set of ideal qualities and data sets, from a coding point of view anyway. This is why people attempting to write speeches with a generic AI fail: the prompt structure and the content actually required are so far removed from what most people realise or can specify that the speeches rarely convey, or even relate to, what they really want.

Building out those workflows with the ideal questions to feed the prompts for the actual speech generation was a real nightmare.

The end result is well worth it though, because SpeechMe has now redefined the benchmark for AI speechwriters. The question flow that you encounter as a user is specifically designed to generate a professional, accurate and occasion-specific speech built on the details you actually provide.

The question flow consists of multiple stages of questions, each relating to a specific part of the occasion speech. At each stage you get an explanation of why that section is important and the best way to write your answers, so that the app has what it needs when it generates your speech.

AI Text Is Not the Same as Human Speech

Hit number 2 was something I already knew, and something most people who have used AI to generate content know too: AI does not create realistic, human-sounding content.

While what it writes may read OK at first glance, in truth there are multiple issues with AI generated text.

The biggest issue, especially with speeches, is that readable text is not the same as human speech. AI can produce a sentence that is technically correct, but technically correct is not enough. A speech has to be read out loud, in front of people, in a real situation. That is where a lot of AI generated writing starts to fall apart.

There are lines that look fine on screen but sound completely off when spoken. There are phrases that feel too polished, too generic, too neat, or just not like something a real person would say in that moment.

This is one of the key differences between AI sentence generation and real human speech creation. AI can interpret the sentence request, but it does not automatically understand how that sentence works in the context of a real speech, with real people listening, and a real occasion behind it.

One of the primary issues is that AI uses a lot of repeated and clichéd words, phrases and sentences. It also has its own little habits and quirks, from repeated sentence shapes to overusing dashes, producing too many bulleted lists or creating phrasing that sounds polished but strangely empty.

These are the kinds of patterns AI detectors are trained to pick up on, but more importantly, they are also the things real people notice even if they cannot always explain why the speech feels wrong.

Along the same lines, and particularly for speeches, when you read the speech out loud as if you were addressing your audience, you realise that a lot of it is just not what a human would say. It looks OK, well, sort of, but read aloud it just doesn't sound right.

Real Human Speech Had to Be Built Into the Original Generation Process

This was not something that could just be fixed at the end. If the original speech generation was poor, then a later clean-up pass would only be patching a weak draft. The speech needed to be generated in a better way from the start.

The first step in that process was some language learning for the AI: a long process of creating gold-standard examples for each part of each speech occasion. These examples helped teach the prompt mechanism what good human speech should actually look like in context.

That work now sits inside the original speech generation process. The point was to make the first proper draft much stronger before any later refinement happened.

As an example, initially I was getting responses such as this in relation to a Father of the Bride speech. AI would come back with ‘I would like to thank everyone for being here today to celebrate Sam and Becky’. For GPT, that was correct grammar and phrasing, but a real human knows that you are not celebrating those people, you are celebrating their wedding. What should have been generated was ‘I would like to thank everyone for being here today to celebrate Sam and Becky’s marriage’.

A small but very important tweak. If you create your speech from basic GPT or Claude output, these are the kinds of issues you will constantly face and have to edit out, or risk the speech sounding ‘off’ in some way.

To make things just a little more difficult, as if there weren't enough things to do, another thing I had to account for was language style. A speech written for a UK wedding, an American rehearsal dinner or an Australian footy club event should not sound identical.

Creating the flows and training the AI on locale language nuances was also a major challenge. With the app available in UK, US and Australian English versions, the workload increased significantly.

Once again, with that work now complete, the app boasts a significant feature set that caters to multiple countries and language variations, with more to be added as time allows.

Training the AI on what real human speech is turned out to be quite a task, and far more demanding than I expected, as we all assume that AI is trained in this at its base level. Until I came to actually build this out, I didn’t realise how bad it actually is.

Why the Humaniser Had to Exist

A couple of other small issues with this sort of content generation also became apparent, and fixing them was a task in itself.

Even after improving the original prompt mechanism with better question flows, better speech structure and gold standard examples, there were still AI artefacts that needed catching. That is where the Humaniser comes in.

I created a ‘Humaniser’, a process that runs the generated speech through a series of additional checks and removes the remaining AI quirks, repeated phrasing, clichéd terms and wording issues that can make a speech sound off when read aloud.
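To give a feel for what a post-generation pass like this can look like, here is a minimal Python sketch. It is not the Humaniser itself: the function name `humanise`, the tiny `CLICHES` table and the three checks are assumptions made purely for illustration, and a real pass would need far larger, occasion-specific lists.

```python
import re
from typing import List, Tuple

# Hypothetical cliché table for illustration only.
CLICHES = {
    "a testament to": "proof of",
    "embark on a journey": "start out",
}


def humanise(text: str) -> Tuple[str, List[str]]:
    """Return the cleaned text plus notes describing what was changed or flagged."""
    notes: List[str] = []

    # 1. Swap em-dashes and spaced hyphens for commas, which are easier to say aloud.
    cleaned, dashes = re.subn(r"\s*\u2014\s*|\s+-\s+", ", ", text)
    if dashes:
        notes.append(f"replaced {dashes} dash(es) with commas")

    # 2. Replace known clichés with plainer wording (case-insensitive).
    for phrase, plain in CLICHES.items():
        cleaned, hits = re.subn(re.escape(phrase), plain, cleaned, flags=re.IGNORECASE)
        if hits:
            notes.append(f"replaced '{phrase}' ({hits}x)")

    # 3. Flag repeated sentence shapes: neighbouring sentences with the same first word.
    sentences = re.split(r"(?<=[.!?])\s+", cleaned.strip())
    openers = [s.split()[0].lower() for s in sentences if s.split()]
    for previous, current in zip(openers, openers[1:]):
        if previous == current:
            notes.append(f"consecutive sentences both start with '{current}'")

    return cleaned, notes
```

Calling `humanise(draft)` returns the tidied text together with a list of notes, so anything the pass only flags (such as repeated sentence openers) can still be reviewed by a person rather than rewritten blindly.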

That combined process now refines speeches so that they actually sound like a human wrote them.

Why One Generation Step Was Not Enough

Another issue to overcome was taking the actual data from the user answers and generating a great speech.

This, I discovered, requires more than one step.

I implemented a multi-stage process where the user receives the speech in stages and can edit or regenerate all or part of it before moving on to the next stage. This has improved the actual speech generation immensely.

The stages are an initial outline, a draft, a punch-up (no, not a fight, but a selection of targeted improvements), the Humaniser pass and finally the delivery. This five-stage process gives the user full control of the generation through edits and regenerations, and provides a speech built to replicate the kind of structure, refinement and control you would expect from a professional speechwriting process.

It creates speeches that sound like a human actually wrote them, and not just any human, but the human who entered the data in the question flow.
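As a rough illustration of how a staged flow with edits and regenerations can be organised, here is a small Python sketch. The class `SpeechSession`, the `generate` callback and the rule that an edit discards later stages are all assumptions for the example, not a description of how SpeechMe is actually coded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Stage names follow the five stages described above.
STAGES = ["outline", "draft", "punch_up", "humaniser", "delivery"]

# Hypothetical generator signature: (stage, user answers, previous stage text) -> new text.
Generator = Callable[[str, Dict[str, str], str], str]


@dataclass
class SpeechSession:
    answers: Dict[str, str]
    outputs: Dict[str, str] = field(default_factory=dict)

    def next_stage(self) -> Optional[str]:
        """First stage that has not been produced yet, or None when finished."""
        return next((s for s in STAGES if s not in self.outputs), None)

    def run_next(self, generate: Generator) -> str:
        """Generate the next stage from the user's answers and the previous stage."""
        stage = self.next_stage()
        if stage is None:
            raise RuntimeError("all stages are complete")
        index = STAGES.index(stage)
        previous = self.outputs[STAGES[index - 1]] if index > 0 else ""
        self.outputs[stage] = generate(stage, self.answers, previous)
        return stage

    def edit(self, stage: str, text: str) -> None:
        """Store the user's edit and drop later stages so they are rebuilt from it."""
        if stage not in self.outputs:
            raise KeyError(f"stage '{stage}' has not been generated yet")
        index = STAGES.index(stage)
        self.outputs = {s: self.outputs[s] for s in STAGES[:index]}
        self.outputs[stage] = text

    def regenerate(self, stage: str, generate: Generator) -> None:
        """Throw away a stage (and anything after it) and generate it again."""
        if stage not in self.outputs:
            raise KeyError(f"stage '{stage}' has not been generated yet")
        index = STAGES.index(stage)
        self.outputs = {s: self.outputs[s] for s in STAGES[:index]}
        self.run_next(generate)
```

The design choice worth noting is that each stage only ever sees the user's answers and the stage immediately before it, so an edit made at the outline naturally flows through every later stage.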

The Remaining Risk Was the User

There was a risk remaining though and that risk was not the AI, it was the user. For a lot of people, writing a speech, especially something like a best man speech where it is expected to be humorous, can lead to disaster.

Not knowing what is appropriate for the occasion can lead to embarrassing or inflammatory content in a speech.

As a precaution, there are safeguards and settings specific to each speech occasion. Level sliders let you set an acceptable level of humour and the tone of the speech, with warnings and advice to help guide your decisions.

There is also a dedicated section for what must be avoided in the speech. Inappropriate references and the like can be set as no-go areas for the AI, just in case you happened to include some, shall we say, dodgy content in your answers.
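The two safeguards described here, an occasion-aware humour slider and a list of no-go topics, could be sketched along these lines in Python. Everything in it is hypothetical: the `HUMOUR_CAP` values, the occasion keys and the function names are illustrative assumptions, not SpeechMe's real settings.

```python
import re
from typing import Dict, List, Optional

# Hypothetical suggested maximum humour level (0-10 slider) per occasion.
HUMOUR_CAP = {"eulogy": 3, "father_of_the_bride": 6, "best_man": 8}


def humour_warning(occasion: str, humour_level: int) -> Optional[str]:
    """Return advice if the slider is set higher than the occasion suggests."""
    cap = HUMOUR_CAP.get(occasion)
    if cap is not None and humour_level > cap:
        name = occasion.replace("_", " ")
        return f"Humour level {humour_level} is high for a {name} speech (suggested maximum {cap})."
    return None


def flag_no_go(answers: Dict[str, str], no_go: List[str]) -> Dict[str, List[str]]:
    """Map each question to the no-go topics found in the user's answer to it."""
    flagged: Dict[str, List[str]] = {}
    for question, answer in answers.items():
        hits = [
            topic
            for topic in no_go
            # Whole-phrase, case-insensitive match so 'ex' does not match 'excellent'.
            if re.search(r"\b" + re.escape(topic) + r"\b", answer, re.IGNORECASE)
        ]
        if hits:
            flagged[question] = hits
    return flagged
```

For example, `flag_no_go({"funny_story": "The stag do in Prague got messy"}, ["stag do"])` points back at the `funny_story` answer, so the flagged material can be kept out of the prompt before any speech text is generated.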

Building a Complete Solution

OK, so you have your speech written and damn, it's good. The only issue now is actually standing up and delivering that speech.

I quickly realised that generating the speech was only half the job: a great speech is no good if you fluff your lines. Further solutions were required, and that's when the Rehearse and Teleprompter features fell into place.

Not only does SpeechMe provide a fantastic speech writing service, it also helps you deliver that speech with poise and charisma.

The Rehearse feature took a bit of messing to get right. Rather than just have an AI voice read the text as it is written, the app remodels the finished speech text into a code format that actually specifies pauses, durations, speed of delivery and more.

This ‘code’ is then shipped off to an AI voice generator that creates a timed and paced delivery, which can be played in the app for you to listen to and speak along with. Rehearsal mode lets you adjust the speed of delivery, so you can practise and perfect your speech for the actual occasion.

With that feature sorted, there was one remaining issue, and it was a big one: delivering the speech on the day. So, do you print the speech and carry a pile of paper with you?

You can, and that's an option in the app, but a much cleaner and more efficient way, with better control, is to use the teleprompter.
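The source does not say which format the app uses for this ‘code’. One widely supported option for telling a voice generator where to pause and how fast to speak is SSML, so the Python sketch below uses that as a stand-in. The pause lengths, the default rate and the function name `to_ssml` are assumptions for illustration only.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical pause lengths in milliseconds, keyed by the punctuation that ends a chunk.
PAUSE_MS = {",": 250, ".": 600, "!": 600, "?": 700}
PARAGRAPH_PAUSE_MS = 1200


def to_ssml(speech: str, rate_percent: int = 90) -> str:
    """Wrap plain speech text in SSML with explicit pauses and a speaking rate."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", speech) if p.strip()]
    rendered = []
    for paragraph in paragraphs:
        # Split after commas and sentence-ending punctuation, keeping the punctuation.
        chunks = re.split(r"(?<=[,.!?])\s+", paragraph)
        marked = []
        for i, chunk in enumerate(chunks):
            marked.append(escape(chunk))  # escape &, < and > so the markup stays valid
            pause = PAUSE_MS.get(chunk[-1])
            # The last chunk gets no pause here; the paragraph pause covers it.
            if pause and i < len(chunks) - 1:
                marked.append(f'<break time="{pause}ms"/>')
        rendered.append(" ".join(marked))
    body = f' <break time="{PARAGRAPH_PAUSE_MS}ms"/> '.join(rendered)
    return f'<speak><prosody rate="{rate_percent}%">{body}</prosody></speak>'
```

Changing `rate_percent` is one simple way to model the adjustable delivery speed in rehearsal mode: the same marked-up text can be re-rendered slower or faster without touching the pauses.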

You install the app onto your phone and at the big moment you open it and there you have your speech on an autoscrolling teleprompter. Pop your phone on the table in front of you and glance down to follow the words for a simple, efficient prompting experience. If you don't want the autoscroll, simply use your thumb and scroll in your own time.

I tried to take into account every angle and issue that people face when having to write and deliver a speech in order to make the app a quality experience and great value for its users.

The Gap I Saw in Other AI Speechwriters

I reviewed other speechwriters on the market and found that they were, in fact, what my original idea had been: ChatGPT wrapped in a user-friendly package. They all suffer from the issues I have documented here and overcome with SpeechMe. That's why I believe SpeechMe is the gold standard for AI speechwriters and far superior to anything else available for speech generation short of a professional speechwriter, and you won’t get one of those for the price of a speech made with SpeechMe.

So, after what should have taken 3 weeks but actually took 3 months of work, SpeechMe is now live and working. Hopefully it will become universally recognised as the standard for AI speech writing, giving its users speeches that raise the roof or convey the required message in a professional and dignified manner.

I put everything into creating this app with the hope that it provides you, the user, with exactly what you need, with as little fuss or stress as possible.

Now, Go SpeechMe.
