How Does Word Prediction Work on Mac?

Word prediction on Mac works by reading the text you have just typed, running it through a language model that assigns probability scores to all possible next words, and displaying the highest-scoring candidate as an inline suggestion. The user presses Tab to accept or keeps typing to dismiss. Charm's Oracle does this entirely on-device using your Mac's local processor - no text is sent to any server, and the process completes in milliseconds.

What happens inside a word prediction system?

At its core, word prediction is a probability problem. Given a sequence of words already typed, what is the most likely next word? The system needs to answer this question accurately enough - and fast enough - that the result is useful rather than distracting.

The process has three stages:

1. Context capture. The system reads the preceding text up to a defined context window. Charm's Oracle reads up to 100 characters of context - roughly 15-20 words. This context is the input to the language model. Longer context windows produce more accurate predictions because the model can use meaning established earlier in a sentence, but they also require more compute to process.

2. Probability scoring. The language model scores every word in its vocabulary against the context. A word that fits naturally given the preceding text gets a high score. An unlikely word gets a low score. The model has learned these probability distributions from large amounts of training text, so it can assign scores quickly without accessing the internet or external data.

3. Threshold filtering. The top-scored word is checked against a confidence threshold before being displayed. If the score is above the threshold, the prediction appears as ghost text. If it falls below - because the context is too ambiguous, the sentence structure is unusual, or no word stands out clearly - no prediction is shown. This filtering is what prevents constant low-quality interruptions. Research on prediction systems shows that users accept suggestions roughly 40% of the time on systems that filter well, compared to much lower rates on systems that suggest anything regardless of confidence.

How have word prediction models evolved?

The history of word prediction models is a progression from simple statistical tables to sophisticated neural networks. Understanding this progression helps explain why modern systems feel qualitatively different from older ones.

N-gram models are the earliest and simplest approach. A bigram model looks at the single preceding word. A trigram model looks at the two preceding words. Given "thank you for", a trigram model consults a table of observed three-word sequences in its training corpus and surfaces whichever word most often followed "thank you for" - typically "your" or "the". N-gram models are fast and require minimal memory. Their limitation is the short context window: anything said more than two words back is invisible to the model, so predictions miss meaning established earlier in the sentence.

Recurrent neural networks (RNNs) represented the next step. Instead of a fixed window, RNNs maintain a hidden state that carries forward information across the full sequence of words typed. This allows predictions influenced by context several sentences back. RNNs significantly improved prediction quality for longer, more structured writing. They were the dominant approach in the early smartphone era.

Transformer models are the current state of the art and the architecture behind modern word prediction. Transformers process the full context window simultaneously using attention mechanisms that weight the relevance of every preceding word to the current prediction. A transformer reading "the quick brown fox jumped over the lazy" assigns high attention weight to "fox" when predicting the next word - it understands that the subject introduced eight words earlier constrains the completion, which is how it arrives at "dog". This produces predictions that feel contextually intelligent rather than statistically mechanical.
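The attention weighting at the heart of a transformer is a dot product followed by a softmax. The sketch below shows scaled dot-product attention for a single query position; the 2-dimensional "embeddings" are hand-picked toy values (real models use learned vectors with hundreds of dimensions and many attention heads).

```python
import math

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query: list[float], keys: list[list[float]]) -> list[float]:
    """Scaled dot-product attention: one query against all context keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)

# Toy setup: one key vector per context token; the query represents the
# position being predicted and points in roughly the same direction as "fox".
context_tokens = ["the", "fox", "jumped"]
keys  = [[0.1, 0.0], [0.9, 0.4], [0.2, 0.1]]
query = [1.0, 0.5]

weights = attention_weights(query, keys)
for token, w in zip(context_tokens, weights):
    print(f"{token}: {w:.2f}")
```

Because the weights are computed over every position at once, relevance from anywhere in the context window can flow into the prediction - the property n-gram models lack.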

Modern systems run compact versions of transformer models that are small enough to fit on a device's local memory and fast enough to produce predictions within milliseconds. Apple Silicon's Neural Engine accelerates this kind of inference, which is one reason on-device AI quality has improved significantly in recent years.

What is the difference between on-device and cloud word prediction?

This distinction matters for both performance and privacy, and the two approaches make different trade-offs.

Cloud-based prediction sends your typed text to a remote server, runs inference on a large model, and returns the prediction. The advantage is that cloud servers can run much larger models than fit on a local device, potentially producing better predictions. The disadvantages are latency (predictions may lag slightly, especially on slow connections), offline availability (it does not work without internet), and privacy (your text - including confidential documents and private messages - is transmitted to an external server).

On-device prediction runs the language model entirely on your Mac. No text is transmitted. The model is smaller than cloud equivalents, but modern on-device models are accurate enough for practical word prediction. Predictions work offline, have zero network latency, and never expose your writing to external parties. For professionals handling confidential text, this is significant. Charm's Oracle is fully on-device. The language model runs on your Mac's processor or Neural Engine, and the entire prediction cycle completes without any network request.

The practical accuracy difference between a well-designed on-device model and a cloud model for single-word prediction is smaller than it might seem. Cloud models have a larger edge on longer phrase completions and more complex contextual reasoning. For the specific task of predicting one next word, compact on-device transformers perform competitively. Studies on mobile keyboard prediction show that even n-gram models achieve acceptance rates above 30%, and modern on-device transformers push that closer to the 40% benchmark observed in production keyboard systems.

How does Charm's Oracle work specifically?

Oracle is Charm's implementation of on-device word prediction for Mac. Charm operates via the macOS Accessibility API, which allows it to observe text fields in any app on your Mac without requiring those apps to implement any integration.

As you type, Oracle reads the text in the active field, extracts the most recent context (up to 100 characters), passes it to the on-device language model, and - if the top prediction clears the confidence threshold - displays it inline as glowing purple ghost text directly after your cursor. The purple colour is Charm's visual signal for Oracle specifically. Charm's other features use different colours: cyan for Spells (spelling correction) and blue for Polish (grammar correction).

Pressing Tab inserts the predicted word and moves the cursor past it. The next prediction cycle begins immediately. Typing any other key dismisses the ghost text and the cycle restarts with the updated context. The Tab interaction requires virtually no deliberate attention once the habit is formed - experienced users report that it becomes reflexive within 20-30 minutes of active use.

Key fact: Smartphone users accept word prediction suggestions roughly 40% of the time - one of the highest engagement rates of any typing assistance feature. On Mac, macOS has no native word prediction. Charm's Oracle brings on-device, zero-latency prediction to every Mac app via the Accessibility API.

Frequently asked questions

How does word prediction work technically?

The system reads your preceding typed text and runs it through a language model that assigns probability scores to every possible next word. The top scorer above a confidence threshold is shown as the prediction. Modern systems use transformer models that consider the full preceding context, producing more accurate and natural suggestions than older statistical approaches.

What is the difference between on-device and cloud word prediction?

On-device runs the model locally on your computer - no text leaves your device, it works offline, and there is no network latency. Cloud sends your text to a remote server to generate predictions - potentially using a larger model, but transmitting your writing externally. Charm's Oracle is fully on-device.

How accurate is word prediction?

Research on smartphone keyboards shows users accept word prediction roughly 40% of the time on well-designed systems. Accuracy depends on context length, writing style consistency, and model quality. Systems that only show predictions above a confidence threshold achieve higher rates by filtering out weak guesses.

Does word prediction read everything I type?

To generate predictions, the system reads the preceding text in your current text field. In Charm, this is done entirely on-device. Text is read locally by the language model and no copy is sent anywhere. Charm does not log, store, or transmit your typing at any point.

Does Charm's word prediction learn from my writing?

Oracle's core language model does not retrain from your typing. The model is fixed and local. It can improve suggestion relevance based on predictions you accept over time, but this learning stays entirely on your device and is never transmitted anywhere.