
Decoder-Only LLM Architecture (GPT-Style)

Stacked decoder blocks with masked self-attention and a language modeling head.

When to use this prompt

For LLM / fine-tuning / instruction-tuning papers introducing or modifying a decoder-only model.

The prompt

A decoder-only transformer architecture in the style of GPT, drawn as a vertical stack with input at the bottom and output at the top.

Bottom: token embedding + learned positional embedding (GPT-style; swap in sinusoidal encoding if preferred).

Middle: a stack of N decoder layers (N=12 for the figure). Each layer contains:
- Masked multi-head self-attention (12 heads)
- Add & LayerNorm
- Feed-forward MLP (hidden dim 3072)
- Add & LayerNorm

Show residual (skip) connections as curved dashed arcs around each sub-layer.

Top:
- Final LayerNorm
- Linear projection to vocab size
- Softmax to next-token probability distribution

Right margin: tensor shape annotations beside each block (B = batch size, T = sequence length, D = hidden dim = 768).
Style: clean academic vector, navy / teal accent, white background, sans-serif labels. Suitable for ICLR or NeurIPS.
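
If you want to sanity-check the figure against code, here is a minimal PyTorch sketch of the same stack. The hyperparameters mirror the prompt (N=12 layers, D=768, 12 heads, FFN hidden dim 3072); the vocab size (50257) and context length (1024), the learned position embeddings, the post-norm ordering, and all class and variable names are illustrative assumptions, not a reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderLayer(nn.Module):
    """Masked multi-head self-attention + FFN, each wrapped in Add & LayerNorm (post-norm)."""
    def __init__(self, d_model=768, n_heads=12, d_ff=3072):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x, causal_mask):
        # Residual (skip) connection around the masked self-attention sub-layer.
        attn_out, _ = self.attn(x, x, x, attn_mask=causal_mask)
        x = self.ln1(x + attn_out)        # Add & LayerNorm
        x = self.ln2(x + self.ffn(x))     # Add & LayerNorm around the FFN
        return x                          # [B, T, D]

class DecoderOnlyLM(nn.Module):
    def __init__(self, vocab_size=50257, d_model=768, n_layers=12, max_len=1024):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embedding
        self.pos_emb = nn.Embedding(max_len, d_model)      # learned positions (GPT-style)
        self.layers = nn.ModuleList(DecoderLayer(d_model) for _ in range(n_layers))
        self.ln_f = nn.LayerNorm(d_model)                  # final LayerNorm
        self.head = nn.Linear(d_model, vocab_size)         # linear projection to vocab

    def forward(self, tokens):                             # tokens: [B, T]
        B, T = tokens.shape
        pos = torch.arange(T, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)       # [B, T, D]
        # Causal mask: True entries (above the diagonal) are not allowed to attend.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=tokens.device), diagonal=1)
        for layer in self.layers:
            x = layer(x, mask)
        logits = self.head(self.ln_f(x))                   # [B, T, vocab]
        return F.softmax(logits, dim=-1)                   # next-token distribution
```

The shape comments ([B, T, D] and [B, T, vocab]) correspond to the right-margin annotations requested in the prompt.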

Variations

With KV-cache annotation

Same architecture, but annotate the KV-cache flow: highlight where keys and values are cached at each layer during autoregressive decoding. Add a side note showing how reusing the cache avoids recomputing keys and values for earlier positions.
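
A hedged sketch of the flow the annotation should depict: at each decoding step, a layer computes Q/K/V only for the newest token, appends the new key/value to a per-layer cache, and attends over the cached past. The function and cache names, and the single-matrix projections, are illustrative assumptions.

```python
import torch

def attend_with_cache(x_new, w_q, w_k, w_v, cache):
    """x_new: [B, 1, D] embedding of the newest token; cache holds keys/values of past positions."""
    q = x_new @ w_q                                   # [B, 1, D]
    k = x_new @ w_k
    v = x_new @ w_v
    # Cache reuse: append only the new key/value; earlier positions are not recomputed.
    cache["k"] = torch.cat([cache["k"], k], dim=1)    # [B, t, D]
    cache["v"] = torch.cat([cache["v"], v], dim=1)
    scores = q @ cache["k"].transpose(1, 2) / (q.shape[-1] ** 0.5)   # [B, 1, t]
    # No explicit causal mask is needed here: the cache contains only past positions.
    out = torch.softmax(scores, dim=-1) @ cache["v"]                 # [B, 1, D]
    return out, cache

# Toy usage: five decoding steps sharing one growing cache.
B, D = 1, 768
w_q, w_k, w_v = (torch.randn(D, D) for _ in range(3))    # stand-in projection weights
cache = {"k": torch.empty(B, 0, D), "v": torch.empty(B, 0, D)}
for step_emb in torch.randn(5, B, 1, D):
    out, cache = attend_with_cache(step_emb, w_q, w_k, w_v, cache)
```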

Tips

  • Specify N (number of layers) and D (hidden dim) explicitly — generic prompts produce generic counts.
  • Mention "masked" self-attention. Without it the figure may not show the causal triangle.
  • For causal attention masks, ask for a small triangular mask icon next to the attention block (the short snippet after this list shows what that triangle looks like in code).
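
For reference, the "causal triangle" is just a lower-triangular matrix of allowed attention positions; a tiny sketch (the size T=6 is arbitrary):

```python
import torch

T = 6
allowed = torch.tril(torch.ones(T, T, dtype=torch.bool))  # True = may attend
print(allowed.int())
# Row t has ones at columns 0..t: each token attends only to itself and earlier positions.
```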
