AI UX Empty States in 2026: Loading, Waiting, and Thinking Patterns
LLM responses take seconds, not milliseconds. A practical guide to loading states, thinking indicators, and empty state design for AI products.
TL;DR
- AI responses take 2–30 seconds, not 200ms—traditional loading patterns don’t work.
- Two-stage model: Processing (model receives prompt) → Generation (tokens streaming).
- Show “thinking” states explicitly—users tolerate waiting when they understand why.
- Stream responses as they generate—the typing effect creates perception of speed.
- Use progressive fidelity: show low-quality results first, refine as generation completes.
- Empty states should suggest prompts, not force users to compose complex input.
- Avoid fake progress bars—they break trust when they don’t match reality.
Why AI UX Is Different
Traditional web applications respond in 100–500ms. AI applications take 2–30 seconds. This creates unique UX challenges:
| Aspect | Traditional App | AI Application |
|---|---|---|
| Response time | 100–500ms | 2–30s |
| Loading indicator | Spinner sufficient | Need engagement |
| Progress | Deterministic | Unpredictable |
| Output | Complete response | Streaming tokens |
| User expectation | Instant | Tolerant if explained |
Users will wait for AI—but they need to understand what’s happening.
The Two-Stage Loading Model
AI loading has distinct phases:
Stage 1: Processing
The model has received the prompt but hasn’t started generating:
```
User submits prompt
         │
         ▼
┌─────────────────┐
│   PROCESSING    │
│  "Thinking..."  │ ← Model is reasoning
│  (2-10 seconds) │
└────────┬────────┘
         │
         ▼
  Start streaming
```
What’s happening: Context processing, retrieval (for RAG), reasoning, planning.
What to show: Thinking indicator, active animation, status text.
Stage 2: Generation
The model is actively producing tokens:
```
Generation started
         │
         ▼
┌─────────────────┐
│   GENERATING    │
│  "Writing..."   │ ← Tokens streaming
│  (5-20 seconds) │
└────────┬────────┘
         │
         ▼
 Complete response
```
What’s happening: Token-by-token output.
What to show: Streaming text, typing animation, partial results.
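The two stages above map naturally onto a small state machine. A minimal sketch (the state and event names are illustrative, not from any particular library): the first token is what moves the UI from "thinking" to "writing".

```typescript
// Illustrative state machine for the two-stage loading model.
type AIState =
  | { stage: 'idle' }
  | { stage: 'processing'; startedAt: number }
  | { stage: 'generating'; text: string }
  | { stage: 'complete'; text: string };

type AIEvent =
  | { type: 'submit'; now: number }
  | { type: 'first_token'; token: string }
  | { type: 'token'; token: string }
  | { type: 'done' };

function transition(state: AIState, event: AIEvent): AIState {
  switch (event.type) {
    case 'submit':
      return { stage: 'processing', startedAt: event.now };
    case 'first_token':
      // The first token marks the processing → generating boundary
      return { stage: 'generating', text: event.token };
    case 'token':
      return state.stage === 'generating'
        ? { stage: 'generating', text: state.text + event.token }
        : state;
    case 'done':
      return state.stage === 'generating'
        ? { stage: 'complete', text: state.text }
        : state;
  }
}
```

Keeping the transition pure makes it easy to drive both the status text and the streaming view from one place.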
Thinking Indicators
Design Principles
| Principle | Implementation |
|---|---|
| Show activity | Continuous smooth animation |
| Explain state | “Thinking…”, “Analyzing…”, “Researching…” |
| Set expectations | “This usually takes 10-15 seconds” |
| Allow interruption | Cancel/stop button visible |
Animation Patterns
```css
/* Thinking indicator animation */
.thinking-indicator {
  display: flex;
  gap: 4px;
  align-items: center;
}

.thinking-dot {
  width: 8px;
  height: 8px;
  border-radius: 50%;
  background: var(--color-primary);
  animation: thinking-pulse 1.4s ease-in-out infinite;
}

.thinking-dot:nth-child(2) {
  animation-delay: 0.2s;
}

.thinking-dot:nth-child(3) {
  animation-delay: 0.4s;
}

@keyframes thinking-pulse {
  0%, 80%, 100% {
    transform: scale(0.6);
    opacity: 0.4;
  }
  40% {
    transform: scale(1);
    opacity: 1;
  }
}
```
Context-Aware Status Text
```typescript
function getStatusText(stage: AIStage, context: Context): string {
  if (stage === 'processing') {
    if (context.hasDocuments) {
      return "Searching your documents...";
    }
    if (context.isComplex) {
      return "Analyzing your request...";
    }
    return "Thinking...";
  }

  if (stage === 'generating') {
    if (context.outputType === 'code') {
      return "Writing code...";
    }
    if (context.outputType === 'long-form') {
      return "Composing response...";
    }
    return "Generating...";
  }

  return "Processing...";
}
```
Streaming Response Patterns
Why Streaming Matters
| Approach | Perceived Wait | User Experience |
|---|---|---|
| Wait for complete response | 15 seconds | Frustrating |
| Stream as generated | ~1-2 seconds to first token | Engaging |
Streaming creates the perception of instant response, even when total time is the same.
Implementation
```tsx
import { useState, useEffect } from 'react';

// React component that renders tokens as they arrive
function StreamingResponse({ stream }: { stream: AsyncIterable<string> }) {
  const [text, setText] = useState('');
  const [isComplete, setIsComplete] = useState(false);

  useEffect(() => {
    let mounted = true;
    // Reset state when a new stream is passed in
    setText('');
    setIsComplete(false);

    async function consume() {
      for await (const chunk of stream) {
        if (!mounted) break;
        setText(prev => prev + chunk);
      }
      if (mounted) setIsComplete(true);
    }
    consume();

    return () => { mounted = false; };
  }, [stream]);

  return (
    <div className="response">
      <div className="response-text">
        {text}
        {!isComplete && <span className="cursor">|</span>}
      </div>
      {isComplete && <ResponseActions />}
    </div>
  );
}
```
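The `stream` prop above can come from any source; a common one is a streaming HTTP response body. A minimal sketch of adapting a `ReadableStream` of bytes into an `AsyncIterable<string>` (the `/api/chat` endpoint in the usage comment is a placeholder):

```typescript
// Turn a ReadableStream of bytes into an AsyncIterable<string>.
async function* textChunks(body: ReadableStream<Uint8Array>): AsyncIterable<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters split across chunks intact
      yield decoder.decode(value, { stream: true });
    }
  } finally {
    reader.releaseLock();
  }
}

// Usage (illustrative endpoint):
// const res = await fetch('/api/chat', { method: 'POST', body: prompt });
// <StreamingResponse stream={textChunks(res.body!)} />
```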
Typing Effect CSS
```css
.cursor {
  display: inline-block;
  width: 2px;
  height: 1.2em;
  background: currentColor;
  animation: blink 1s step-end infinite;
  margin-left: 2px;
}

@keyframes blink {
  50% {
    opacity: 0;
  }
}
```
Empty State Design
The Problem with Blank Prompts
Empty text areas create two problems:
- Users don’t know what to ask
- Composing good prompts is hard
Solution: Suggested Prompts
```tsx
function EmptyState({ context }: { context: AppContext }) {
  const suggestions = getSuggestions(context);

  return (
    <div className="empty-state">
      <h3>What can I help you with?</h3>
      <div className="suggestions">
        {suggestions.map(suggestion => (
          <button
            key={suggestion.id}
            onClick={() => submitPrompt(suggestion.prompt)}
            className="suggestion-chip"
          >
            <span className="suggestion-icon">{suggestion.icon}</span>
            <span className="suggestion-text">{suggestion.label}</span>
          </button>
        ))}
      </div>
      <div className="input-hint">
        Or type your own question...
      </div>
    </div>
  );
}
```
Context-Aware Suggestions
| Context | Suggested Prompts |
|---|---|
| Code editor | “Explain this function”, “Find bugs”, “Add tests” |
| Document editor | “Summarize this”, “Make it shorter”, “Improve clarity” |
| Data analysis | “What are the trends?”, “Find anomalies”, “Create visualization” |
| Support chat | “How do I…”, “What’s the status of…”, “I need help with…” |
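The `getSuggestions` call used in the empty-state component can be as simple as a lookup keyed by the surface the user is in. A sketch mirroring the table above; the surface names, icons, and prompts are illustrative:

```typescript
interface Suggestion {
  id: string;
  icon: string;
  label: string;
  prompt: string;
}

type Surface = 'code-editor' | 'document-editor' | 'data-analysis' | 'support-chat';

// Illustrative lookup table; a real app's context would carry the surface.
const SUGGESTIONS: Record<Surface, Suggestion[]> = {
  'code-editor': [
    { id: 'explain', icon: '💡', label: 'Explain this function', prompt: 'Explain what this function does.' },
    { id: 'bugs', icon: '🐛', label: 'Find bugs', prompt: 'Review this code for bugs.' },
    { id: 'tests', icon: '✅', label: 'Add tests', prompt: 'Write unit tests for this code.' },
  ],
  'document-editor': [
    { id: 'summarize', icon: '📝', label: 'Summarize this', prompt: 'Summarize this document.' },
    { id: 'shorten', icon: '✂️', label: 'Make it shorter', prompt: 'Shorten this text, keeping the key points.' },
  ],
  'data-analysis': [
    { id: 'trends', icon: '📈', label: 'What are the trends?', prompt: 'Describe the main trends in this data.' },
  ],
  'support-chat': [
    { id: 'howto', icon: '❓', label: 'How do I…', prompt: 'How do I ' },
  ],
};

function getSuggestions(surface: Surface): Suggestion[] {
  return SUGGESTIONS[surface] ?? [];
}
```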
Progressive Fidelity
The Pattern
Show lower-quality results immediately, refine as generation completes:
```
Immediate:  Skeleton/placeholder
               ↓
Early:      Rough outline or structure
               ↓
Mid:        Partially complete content
               ↓
Final:      Complete, polished response
```
Implementation for Images
```tsx
function ProgressiveImage({ generation }: { generation: ImageGeneration }) {
  return (
    <div className="image-container">
      {/* Low-res preview during generation */}
      {generation.preview && (
        <img
          src={generation.preview}
          className="image-preview blur-sm"
          alt="Generating..."
        />
      )}

      {/* Progress indicator */}
      {!generation.complete && (
        <div className="progress-overlay">
          <div className="progress-bar" style={{ width: `${generation.progress}%` }} />
          <span className="progress-text">{generation.progress}% complete</span>
        </div>
      )}

      {/* Final image */}
      {generation.complete && (
        <img
          src={generation.final}
          className="image-final"
          alt={generation.alt}
        />
      )}
    </div>
  );
}
```
What to Avoid
Fake Progress Bars
```tsx
import { useState, useEffect } from 'react';

// ❌ BAD: Fake progress that doesn't match reality
function FakeProgress() {
  const [progress, setProgress] = useState(0);

  useEffect(() => {
    const interval = setInterval(() => {
      // This is lying to users
      setProgress(p => Math.min(p + 5, 95));
    }, 500);
    return () => clearInterval(interval);
  }, []);

  return <ProgressBar value={progress} />;
}

// ✅ GOOD: Honest state indicator
function HonestState({ stage }: { stage: string }) {
  return (
    <div className="state-indicator">
      <ThinkingAnimation />
      <span>{stage}</span>
    </div>
  );
}
```
Misleading Time Estimates
| ❌ Bad | ✅ Good |
|---|---|
| “Almost done!” (after 10 seconds) | “This usually takes 10-20 seconds” |
| “Just a moment” (takes 30 seconds) | “Analyzing your documents…” |
| Progress bar stuck at 99% | Thinking animation with context |
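Instead of a fake percentage, status text can honestly track elapsed time. A small sketch; the thresholds and wording are illustrative assumptions:

```typescript
// Honest, time-aware status text: never claims progress it cannot know.
function getWaitMessage(elapsedMs: number): string {
  if (elapsedMs < 10_000) {
    return 'Thinking... this usually takes 10-20 seconds.';
  }
  if (elapsedMs < 30_000) {
    return 'Still working on it...';
  }
  // Past the typical range: acknowledge it and keep the user in control.
  return 'This is taking longer than usual. You can keep waiting or cancel.';
}
```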
Silent Waiting
| ❌ Bad | ✅ Good |
|---|---|
| Blank screen | “Thinking about your question…” |
| Static spinner | Animated state with status text |
| No feedback | Cancel button visible |
Re-stating Pattern
Why It Matters
LLMs synthesize from context. Users benefit from seeing what the AI understood:
```tsx
function RestatementConfirmation({
  userInput,
  aiUnderstanding,
  onProceed,
  onClarify
}: {
  userInput: string;
  aiUnderstanding: string;
  onProceed: () => void;
  onClarify: () => void;
}) {
  return (
    <div className="restatement">
      <div className="user-input">
        <span className="label">You asked:</span>
        <span className="content">{userInput}</span>
      </div>
      <div className="ai-understanding">
        <span className="label">I understood this as:</span>
        <span className="content">{aiUnderstanding}</span>
      </div>
      <div className="actions">
        <button onClick={onProceed}>Yes, continue</button>
        <button onClick={onClarify}>No, let me clarify</button>
      </div>
    </div>
  );
}
```
When to Use
- Complex multi-part requests
- Ambiguous queries
- High-stakes actions (deletions, payments)
- When misunderstanding is costly
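A simple gate for when to show the confirmation can follow the criteria above. The thresholds and conjunction list here are illustrative assumptions, not a vetted heuristic:

```typescript
interface RequestInfo {
  prompt: string;
  isHighStakes: boolean; // deletions, payments, irreversible actions
}

function needsRestatement(req: RequestInfo): boolean {
  if (req.isHighStakes) return true;

  const prompt = req.prompt.trim();
  // Multi-part request: several sentences...
  const sentences = prompt.split(/[.?!]+/).filter(s => s.trim().length > 0);
  if (sentences.length >= 3) return true;
  // ...or steps chained with explicit conjunctions.
  return /\b(and then|after that|as well as)\b/i.test(prompt);
}
```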
Accessibility Considerations
Screen Reader Support
```tsx
<div
  role="status"
  aria-live="polite"
  aria-busy="true"
  aria-label="AI is thinking about your question"
>
  <span className="visually-hidden">
    Processing your request. This usually takes 10-20 seconds.
  </span>
  <ThinkingAnimation aria-hidden="true" />
</div>
```
Keyboard Navigation
```tsx
function AIPrompt() {
  return (
    <div>
      <textarea
        aria-label="Ask AI a question"
        placeholder="Ask me anything..."
        onKeyDown={(e) => {
          if (e.key === 'Enter' && !e.shiftKey) {
            e.preventDefault();
            submit();
          }
          if (e.key === 'Escape') {
            cancel();
          }
        }}
      />
      <button aria-label="Cancel AI request" onClick={cancel}>
        Cancel
      </button>
    </div>
  );
}
```
Implementation Checklist
Loading States
- Distinct processing vs. generating states
- Smooth, continuous animations
- Context-aware status text
- Time expectations when known
- Cancel/stop button visible
Streaming
- Stream responses as they generate
- Cursor/typing indicator during stream
- Smooth text appearance
- Actions appear on completion
Empty States
- Suggested prompts for common tasks
- Context-aware suggestions
- Easy prompt composition
- Clear call to action
Accessibility
- Screen reader announcements
- Keyboard navigation
- Focus management
- Motion preferences respected
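The last checklist item can be handled directly in CSS: honor `prefers-reduced-motion` by turning off the pulsing dots and blinking cursor while keeping a static, visible indicator.

```css
/* Disable decorative animations for users who opt out of motion */
@media (prefers-reduced-motion: reduce) {
  .thinking-dot,
  .cursor {
    animation: none;
  }
  .thinking-dot {
    opacity: 0.7; /* static but still visible */
  }
}
```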
FAQ
How long is too long for a loading state?
Users tolerate 30+ seconds if they understand why. The key is setting expectations and showing progress. After 60 seconds, offer a “still working” message or option to continue in background.
Should I always stream responses?
For text output, yes. For structured data (JSON, tables), consider showing a skeleton first, then revealing complete data. Streaming partial structured data can cause layout shifts.
How do I handle errors during generation?
Show inline error with retry option. Preserve any partial response if useful. Don’t make users re-type their prompt.
What about slow connections?
Streaming helps here too—users see something immediately. Consider lower-quality initial responses that upgrade when bandwidth allows.
Should I show exact token counts or timing?
Only for power users or developer tools. Regular users don’t care about tokens—they care about getting helpful responses.
Sources & Further Reading
- AI Loading States Patterns — Comprehensive pattern library
- LLM Design Patterns — Re-stating and other patterns
- In The Pocket AI Interactions — Interaction guidelines
- Cloudscape GenAI Loading — AWS design system patterns
- Loading States Guide — General loading best practices
- Microinteractions — Related: animation timing