How Does Word Prediction Work? The AI Behind It

Word prediction uses language models trained on large amounts of text to calculate the most probable next word given the words that came before it. The model assigns a probability to every word in its vocabulary, ranks the candidates, and surfaces the top result as a suggestion. More sophisticated models use broader context - reading more of the preceding sentence - which produces predictions that are more accurate and more natural than simple frequency-based guesses.
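The core loop above can be sketched in a few lines. The probability table below is purely illustrative, not output from any real model:

```python
# Minimal sketch of the prediction loop: a model assigns a probability
# to every candidate word, the candidates are ranked, and the top one
# is surfaced as the suggestion. The numbers here are made up.

def predict_next(context: str, probs: dict[str, float]) -> str:
    """Return the highest-probability candidate for the given context."""
    return max(probs, key=probs.get)

# Hypothetical distribution for the context "the quick brown":
candidates = {"fox": 0.62, "dog": 0.21, "car": 0.03, "idea": 0.01}
best = predict_next("the quick brown", candidates)  # "fox"
```

In a real system the probability table is computed fresh for each keystroke by the language model; everything downstream of that is ranking and presentation.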

What AI model powers word prediction?

Word prediction has evolved through three distinct generations of AI models, each more capable than the last.

N-gram models were the first practical approach. An n-gram model counts how often each word follows a given sequence of preceding words in a large text corpus. A bigram model looks one word back; a trigram model looks two words back. Given the phrase "the quick brown", a trigram model predicts "fox" because that sequence appears frequently in training data. N-gram models are fast and require little memory, which is why they powered early mobile word prediction. Their limitation is the short context window: they cannot use information from earlier in the sentence, so predictions become generic or nonsensical when the relevant context is more than a few words back.
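A trigram model is simple enough to build in its entirety. This sketch counts followers of each two-word context in a toy corpus and predicts the most frequent one:

```python
from collections import Counter, defaultdict

# Toy trigram model: count how often each word follows a two-word
# context, then predict the most frequent follower. The corpus is
# deliberately tiny and repetitive.
corpus = ("the quick brown fox jumps over the lazy dog "
          "the quick brown fox sleeps the quick brown fox jumps").split()

follows = defaultdict(Counter)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    follows[(a, b)][c] += 1  # context (a, b) was followed by c

def trigram_predict(w1: str, w2: str) -> str:
    """Most frequent word seen after the context (w1, w2)."""
    return follows[(w1, w2)].most_common(1)[0][0]
```

Here `trigram_predict("quick", "brown")` returns "fox", exactly as described above, and the short-context limitation is visible in the code: the model literally cannot see anything before `w1`.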

Neural language models replaced n-gram approaches during the 2010s. These models use recurrent neural networks (RNNs) or, later, transformer architectures to process variable-length context. A transformer-based model reading "I need to finish the quarterly" can predict "report" with high confidence because it understands the relationship between "quarterly" and business documents, even though "I need" at the start of the sentence is many tokens away. Studies show that context-aware transformer models are approximately three times more accurate than n-gram models on professional writing.

Compressed on-device models power word prediction in consumer tools today. Full transformer models like GPT-4 are far too large to run in the background at keystroke speed on a laptop. Engineers use techniques like quantisation (reducing numerical precision), pruning (removing less important model weights), and knowledge distillation (training a small model to mimic a large one) to create compact models that run in milliseconds on standard hardware without a dedicated GPU.
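Quantisation is the easiest of these techniques to demonstrate. The sketch below maps float weights to 8-bit integers with a single scale factor, shrinking storage roughly fourfold at the cost of a small rounding error; the weight values are illustrative:

```python
# Sketch of post-training quantisation: store each weight as an int8
# plus one shared float scale, instead of a full 32-bit float.

def quantise(weights: list[float]) -> tuple[list[int], float]:
    """Map floats onto the symmetric int8 range -127..127."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantise(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]

w = [0.82, -0.41, 0.05, -1.27]
q, scale = quantise(w)
restored = dequantise(q, scale)
# restored is close to w, but each weight now fits in one byte
```

Real deployments quantise per-layer or per-channel and often combine this with pruning and distillation, but the principle is the same: trade a little precision for a model small enough to run at keystroke speed.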

Modern word prediction reduces keystrokes by 15-25% for average prose, with higher savings for formulaic or domain-specific writing where patterns repeat frequently.
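The savings figure is easy to sanity-check with back-of-envelope arithmetic. This sketch assumes an accepted prediction costs one keypress (Tab) instead of typing the word plus a space; the 25% hit rate is an assumption for illustration:

```python
# Back-of-envelope keystroke arithmetic: how much typing is saved if
# some fraction of words are accepted with a single Tab press?

def keystrokes_saved(words: list[str], hit_rate: float) -> float:
    """Fraction of keystrokes saved at a given prediction hit rate."""
    baseline = sum(len(w) + 1 for w in words)  # type each word + space
    with_pred = sum(hit_rate * 1 + (1 - hit_rate) * (len(w) + 1)
                    for w in words)            # hits cost one Tab press
    return 1 - with_pred / baseline

sentence = "please find the quarterly report attached".split()
savings = keystrokes_saved(sentence, hit_rate=0.25)  # about 21%
```

Even a modest one-in-four hit rate lands in the 15-25% range quoted above, and formulaic text pushes the hit rate, and therefore the savings, higher.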

How does context-aware prediction work?

Context-aware prediction works by encoding the preceding text into a numerical representation that captures semantic meaning, not just surface word patterns. A well-trained model understands that "the patient was prescribed" is likely to be followed by a medical term, and that "the developer pushed the" is likely to be followed by "code" or "commit" or "branch" - not "cat."

The context window is the span of preceding text the model reads when making each prediction. Larger context windows produce better predictions but require more computation. There is a practical trade-off: a model that reads 10,000 preceding tokens produces better results than one reading 100 tokens, but the former cannot run in real time on consumer hardware without noticeable latency.

The best on-device systems balance context width against inference speed. For inline word prediction - where suggestions must appear within milliseconds of each keystroke to feel natural - the latency budget is tight. Any suggestion that appears more than 100ms after the last keypress will feel slow and disruptive rather than helpful.

Prediction accuracy also depends on training data quality and domain coverage. A model trained primarily on news articles will make mediocre predictions for software engineers writing code comments or lawyers drafting contract language. Models fine-tuned on professional writing patterns perform significantly better in those contexts.

How does Charm's Oracle prediction work?

Oracle is Charm's word prediction feature. It uses a compact on-device language model that processes approximately the last 100 characters of context to generate each prediction. The model runs in under 50 milliseconds, keeping suggestions responsive enough to feel instantaneous.

When Oracle has a high-confidence prediction, it appears inline as ghost text immediately after your cursor, styled in a lighter colour. Pressing Tab accepts the prediction and advances to the next word boundary. If the prediction is wrong, you simply continue typing and it disappears - no dismissal gesture required.

Oracle's suggestions are marked with a purple glow highlight, which distinguishes them visually from Spells (spelling corrections, cyan) and Polish (grammar corrections, blue). Each feature has its own visual signature, so you always know which part of Charm acted on your text.

The model is trained on general English text and fine-tuned for professional writing patterns - emails, documents, and messages, rather than code or casual chat. Over time, Oracle improves its predictions based on your accepted suggestions. This adaptive learning happens locally: your writing patterns are stored on your Mac and never uploaded to a server.

Because Oracle uses the macOS Accessibility API rather than keyboard input hooks, it works in every text field on the system - including Slack, VS Code, Discord, and other Electron apps that implement their own input handling and cannot use Apple's native prediction frameworks.

Why does word prediction improve over time?

Word prediction improves over time through two mechanisms: explicit learning from accepted suggestions, and implicit adaptation to vocabulary patterns.

When you accept a prediction, the system records that word in context as a positive signal. Over hundreds of accepted predictions, the model builds a picture of your vocabulary preferences, frequently used phrases, and domain-specific terminology. This is why prediction tools feel generic in the first week and increasingly useful after a month of regular use.

Implicit adaptation happens when the system observes which words you type repeatedly. If you frequently type "refactor" or "onboarding" or "pursuant to," those words move up in the candidate ranking for relevant contexts, even if you never explicitly accepted a prediction for them. The model learns your register and your field.
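One common way to implement this kind of adaptation is to nudge the base model's candidate scores by a per-user acceptance count. This is a generic sketch of that technique, not Charm's actual implementation; the scores and boost weight are illustrative:

```python
from collections import Counter

# Sketch of implicit adaptation: candidates the user has accepted or
# typed often get a small bonus on top of the base model's score.
accepted = Counter({"refactor": 12, "onboarding": 7})  # hypothetical history

def rerank(base_scores: dict[str, float], boost: float = 0.02) -> list[str]:
    """Rank candidates by base score plus a per-acceptance bonus."""
    adjusted = {w: s + boost * accepted[w] for w, s in base_scores.items()}
    return sorted(adjusted, key=adjusted.get, reverse=True)

# A frequently accepted word overtakes a slightly stronger generic one:
ranking = rerank({"rewrite": 0.30, "refactor": 0.28, "review": 0.25})
```

Because the history is just a local counter, this style of personalisation needs no server: the counts live on the device and bias future rankings toward the user's own vocabulary.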

All of this learning happens on-device in Charm. There is no account, no cloud profile, no server that aggregates usage patterns across users. The adaptation is personal and private, stored in a local model file on your Mac.

Key point: Context-aware word prediction reduces keystrokes by 15-25% for average prose. Together with Charm's autocorrect and grammar correction, it reduces the effort required to produce polished written output by a measurable margin - without any cloud dependency.

Frequently asked questions

How does word prediction work?

Word prediction uses a language model to estimate the probability of each possible next word given preceding context. The model ranks candidates by probability and surfaces the highest-scoring one as a suggestion. Broader context windows produce more accurate and natural predictions.

Is word prediction the same as autocomplete?

They are related but distinct. Autocomplete finishes the word you are currently typing. Word prediction suggests the next complete word before you begin typing it. Many systems - including Oracle - combine both: completing the current word and predicting the next one simultaneously.

Does Oracle send my text to a server?

No. Oracle runs entirely on your Mac using an on-device language model. The last 100 characters of context are processed locally in under 50 milliseconds. No text is transmitted to Charm's servers or any third party.

What is Tab completion in word prediction?

Tab completion is the interaction model where a prediction appears inline as ghost text and pressing Tab accepts it. If the prediction is wrong, you keep typing and it disappears. Charm's Oracle uses this pattern - one keypress to accept, no action needed to dismiss.

Does word prediction work in all apps?

Charm's Oracle works in every text field on macOS, including Slack, VS Code, Discord, and other apps that block native macOS frameworks. It uses the Accessibility API to read context and insert predictions system-wide.

Word prediction that runs on your Mac, not a server.

Oracle predicts your next word across every Mac app. Private, fast, and $9.99 - yours forever.

Learn more about Charm
Get Charm for Mac - $9.99