You see "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA…" in HTML. You see Base64 in API responses, JWTs, and email attachments. What is it actually doing? Base64 encoding turns binary data into strings that are safe for text-only channels, at the cost of a 33% size increase.

The problem Base64 solves

Many text-based channels (email, JSON, URLs, HTML) can't safely transmit arbitrary binary bytes. Some bytes are "control characters" that break the format. Some are filtered or modified by intermediate systems.

If you want to send a binary file (image, audio, encrypted data) through a text-only channel, you need to convert it to a sequence of safe printable characters first. That's encoding. Base64 is one of many encoding schemes; it's the most common.

How Base64 works

Base64 uses 64 distinct ASCII characters: A–Z (26), a–z (26), 0–9 (10), and + and / (2), in that order. Because 64 = 2⁶, each Base64 character represents exactly 6 bits.

To encode 3 bytes (24 bits) of input:

  1. Split into 4 groups of 6 bits.
  2. Convert each 6-bit group to a Base64 character.
  3. Output 4 characters of Base64 from 3 bytes of input.

This is the source of Base64's 4/3 ratio — output is 33% larger than input.
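The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full encoder: the helper name encode_group is made up for this example, and it only handles complete 3-byte groups (no padding).

```python
# The standard Base64 alphabet: A-Z, a-z, 0-9, then + and /.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes into 4 Base64 characters (illustrative helper)."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    # Pack the 3 bytes into a single 24-bit integer.
    n = int.from_bytes(three_bytes, "big")
    # Peel off four 6-bit groups, most significant first.
    indices = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # Look each 6-bit value up in the alphabet.
    return "".join(ALPHABET[i] for i in indices)


print(encode_group(b"abc"))  # YWJj
```

In real code you would call the standard library's base64.b64encode instead; the point here is only to show where the 3-to-4 ratio comes from.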

Worked example

Encode the byte sequence 0x4D 0x61 0x6E (which spells "Man" in ASCII).

  1. 0x4D 0x61 0x6E in binary: 01001101 01100001 01101110
  2. Group into 6-bit chunks: 010011 010110 000101 101110
  3. Each 6-bit chunk in decimal: 19, 22, 5, 46
  4. Look up Base64 alphabet: T, W, F, u
  5. Result: TWFu

"Man" → "TWFu" in Base64. 3 bytes input → 4 chars output.
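To check the worked example yourself, the short script below reproduces each intermediate step and then compares against Python's standard base64 module. The variable names are just for illustration.

```python
import base64

data = bytes([0x4D, 0x61, 0x6E])  # "Man" in ASCII

# Step 1: write the 3 bytes as one 24-bit string.
bits = "".join(f"{byte:08b}" for byte in data)

# Step 2: split the bit string into 6-bit chunks.
chunks = [bits[i:i + 6] for i in range(0, len(bits), 6)]

# Step 3: convert each chunk to its decimal index.
indices = [int(chunk, 2) for chunk in chunks]

# Steps 4-5: let the standard library do the alphabet lookup.
encoded = base64.b64encode(data).decode("ascii")

print(chunks)   # ['010011', '010110', '000101', '101110']
print(indices)  # [19, 22, 5, 46]
print(encoded)  # TWFu
```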

Padding

If your input isn't a multiple of 3 bytes, Base64 pads with "=" characters:

  • 1 byte input: 4 chars output, of which 2 are "=" padding (e.g., "A" → "QQ==")
  • 2 bytes input: 4 chars output, of which 1 is "=" padding (e.g., "AB" → "QUI=")
  • 3 bytes input: 4 chars output, no padding

The = isn't one of the 64 alphabet characters and carries no data. It only marks that the final group held fewer than 3 bytes.
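A quick way to see the padding rules in action is to encode inputs of 1, 2, and 3 bytes with Python's standard library. This is just a sanity check of the cases listed above.

```python
import base64

results = {}
for raw in (b"A", b"AB", b"ABC"):
    encoded = base64.b64encode(raw).decode("ascii")
    results[raw] = encoded
    # Show the input length, the encoded text, and how many "=" it ends with.
    print(len(raw), encoded, encoded.count("="))

# Output:
# 1 QQ== 2
# 2 QUI= 1
# 3 QUJD 0
```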

The 33% size penalty

3 bytes of input become 4 characters of output. 4/3 ≈ 1.33, so the output is about 33.3% larger than the input.

Why this matters:

  • Email attachments: a 1 MB photo becomes about 1.33 MB of Base64 text in the message (slightly more once MIME line breaks are added).
  • HTML data URIs: embedding a 100 KB image adds about 133 KB to the HTML.
  • API JSON responses: a Base64-encoded binary blob takes 33% more bandwidth.

For small embedded items (icons, small images), this is fine. For large blobs, it's wasteful — a separate file URL is better.
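If you want to confirm the overhead empirically, a small measurement like the one below works. It is a rough sketch: measured_overhead is a made-up helper name, and the random input simply stands in for arbitrary binary data.

```python
import base64
import os


def measured_overhead(size_bytes: int) -> float:
    """Return the fractional size increase after Base64-encoding random bytes."""
    raw = os.urandom(size_bytes)
    encoded = base64.b64encode(raw)
    return (len(encoded) - len(raw)) / len(raw)


for size in (1_000, 100_000, 1_000_000):
    print(f"{size:>9} bytes: {measured_overhead(size):.1%} larger")
```

Small inputs can land slightly above 33.3% because padding rounds the output up to a multiple of 4 characters.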

Where you'll see Base64

Email attachments: the MIME standard uses Base64 to encode attached files. This is why a message with an attachment is larger than the attached file itself.

HTML data URIs: "data:image/png;base64,..." embeds an image directly in HTML. Saves a network request but inflates HTML size.
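As a sketch of how such a URI is assembled, the helper below (to_data_uri is an illustrative name, not a standard function) wraps Base64 output in the data: prefix. Feeding it the 8-byte PNG file signature shows why PNG data URIs start with "iVBORw0KGgo".

```python
import base64


def to_data_uri(raw: bytes, mime_type: str) -> str:
    """Build a Base64 data URI from raw bytes (illustrative helper)."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The 8-byte signature that begins every PNG file.
png_signature = b"\x89PNG\r\n\x1a\n"

print(to_data_uri(png_signature, "image/png"))
# data:image/png;base64,iVBORw0KGgo=
```

A real image would of course pass the whole file's bytes, for example the result of reading it in binary mode.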

JSON Web Tokens (JWTs): the three parts (header, payload, signature) are each Base64URL encoded and joined with dots. JWTs are often 200–500 chars total, though the length depends on the claims and the signing algorithm.
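The sketch below builds a toy three-part token and decodes its header and payload again, to show that the first two parts are plain JSON underneath. Everything here is an assumption for illustration: the helper names are made up, the claims are dummy values, and the third segment is a placeholder rather than a real signature, so this is not a valid JWT.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    """Base64URL-encode and strip the trailing '=' padding."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding, then Base64URL-decode."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "1234567890"}  # dummy claim for illustration

token = ".".join([
    b64url_encode(json.dumps(header, separators=(",", ":")).encode("utf-8")),
    b64url_encode(json.dumps(payload, separators=(",", ":")).encode("utf-8")),
    b64url_encode(b"placeholder-not-a-real-signature"),
])

header_seg, payload_seg, signature_seg = token.split(".")
decoded_header = json.loads(b64url_decode(header_seg))
decoded_payload = json.loads(b64url_decode(payload_seg))

print(decoded_header)
print(decoded_payload)
```

Decoding a token this way only reads it. Verifying the signature is a separate step that needs a proper JWT library and the signing key.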

API responses: when an API returns binary data inside JSON, it's usually Base64, because JSON has no native binary type. Public keys, certificates, document files, etc.

Authentication tokens: some auth systems Base64-encode the username:password pair (HTTP Basic Auth).
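For HTTP Basic Auth specifically, the header value is the word "Basic", a space, and the Base64 of "username:password". A minimal sketch follows, with basic_auth_header as an illustrative helper name and the well-known sample credentials from the HTTP Basic Auth specification as input.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic 'Authorization' header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```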

OAuth state parameters: often Base64-URL encoded for safe URL transit.

Cryptographic operations: public keys, signatures, and certificates are typically presented in Base64 (the "PEM" format wraps Base64 in BEGIN/END markers).

Base64 vs Base64URL

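Standard Base64 uses + and /. Both have special meanings in URLs (+ can be read as a space in query strings, / separates path segments), as does the = used for padding, so standard Base64 would have to be percent-encoded before it could go into a URL.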

Base64URL is a variant designed for URLs:

  • + → -
  • / → _
  • = padding often dropped (the decoder can infer it from the length, though some libraries need it restored first)

JWTs and OAuth use Base64URL. Standard Base64 is for non-URL contexts.
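The difference is easy to see with bytes chosen so that the output contains the affected characters. The sketch below uses Python's standard library. Note that Python's decoder is one of those that insists on padding, so unpadded input is rejected until the "=" characters are put back.

```python
import base64
import binascii

# Bytes picked so that the encoded form uses the last two alphabet entries.
raw = bytes([0xFB, 0xFF, 0xFE])
standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
print(standard)  # +//+
print(url_safe)  # -__-

# A 1-byte input needs two '=' of padding, which Base64URL users often strip.
padded = base64.urlsafe_b64encode(b"\xff").decode("ascii")
unpadded = padded.rstrip("=")
print(padded)    # _w==
print(unpadded)  # _w

# Python's decoder rejects the unpadded form...
try:
    base64.urlsafe_b64decode(unpadded)
    rejected = False
except binascii.Error:
    rejected = True

# ...but accepts it once the padding is restored from the length.
restored = unpadded + "=" * (-len(unpadded) % 4)
print(rejected, base64.urlsafe_b64decode(restored))
```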

When NOT to use Base64

  • Large files: 33% overhead is meaningful. Use proper file uploads with multipart/form-data.
  • Encryption: Base64 is encoding, not encryption. Anyone can decode it without a key, so it provides no confidentiality.
  • Compressed data: Base64-encoding compressed data gives back a third of the space the compression saved. Send binary directly when possible.
  • Frequent network transfers: the 33% overhead adds up.
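To make the "not encryption" point concrete: the snippet below takes a sample HTTP Basic Auth header value (the example credentials from the HTTP Basic Auth specification) and recovers the username and password with one standard-library call and no key.

```python
import base64

# A sample 'Authorization' header value as it might appear in a captured request.
header_value = "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="

# Drop the scheme name, then decode. No secret is needed.
encoded = header_value.split(" ", 1)[1]
recovered = base64.b64decode(encoded).decode("utf-8")

print(recovered)  # Aladdin:open sesame
```

This is why Base64-encoded credentials should only ever travel over an encrypted connection such as HTTPS.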

When Base64 makes sense

  • Small embedded items: 1 KB icons in HTML/CSS save a network request.
  • Plain-text channels: email, copy-paste, anything not built for binary.
  • JSON-friendly: when binary must live in JSON.
  • Cryptographic notation: public keys, certificates, signatures all use Base64 conventionally.

Performance considerations

Base64 encoding and decoding are fast: modern CPUs typically process hundreds of MB per second or more, so the CPU cost is usually negligible.

The main practical concern is the bandwidth overhead (33% larger payloads), not the CPU time.
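Throughput varies by machine and library, so it is worth measuring rather than trusting a general figure. Here is a rough sketch of such a measurement. The function name and default sizes are arbitrary choices for this example, and the result is only a ballpark for your own hardware.

```python
import base64
import os
import time


def encode_throughput_mb_per_s(size_bytes: int = 10_000_000, repeats: int = 5) -> float:
    """Return a rough Base64 encode throughput in MB/s (best of several runs)."""
    raw = os.urandom(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        base64.b64encode(raw)
        best = min(best, time.perf_counter() - start)
    return size_bytes / best / 1_000_000


print(f"{encode_throughput_mb_per_s():.0f} MB/s")
```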

Encoding alternatives

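  • Hex (Base16): 100% size penalty (each byte = 2 hex chars, so the output is twice the input). Used in older protocols and wherever readability matters more than size.
  • Base85: ~25% penalty (4 bytes become 5 chars). Used in PostScript and PDF as Ascii85.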
  • BinHex: Mac-only, historical.
  • UUencoding: historical Unix; replaced by Base64.

Base64 won because of its simplicity, wide library support, and tolerable size penalty.
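The size penalties of hex, Base64, and Base85 can be compared directly, since Python's base64 module implements all three. In this sketch the input is 1200 arbitrary non-zero bytes, chosen so that it divides evenly into both 3-byte and 4-byte groups and avoids Ascii85's shortcut for all-zero groups.

```python
import base64

# 1200 bytes with no zero bytes; divisible by both 3 and 4.
raw = bytes(range(1, 241)) * 5

sizes = {
    "hex (Base16)": len(base64.b16encode(raw)),
    "Base64": len(base64.b64encode(raw)),
    "Ascii85": len(base64.a85encode(raw)),
}

for name, size in sizes.items():
    penalty = (size - len(raw)) / len(raw)
    print(f"{name:<13} {size:>5} chars ({penalty:.0%} larger)")

# Output:
# hex (Base16)   2400 chars (100% larger)
# Base64         1600 chars (33% larger)
# Ascii85        1500 chars (25% larger)
```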

Estimate the size

Our Base64 length calculator takes a raw byte count and returns the Base64-encoded length. Useful for sizing API payloads, calculating data URI overhead, or sanity-checking storage estimates.
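The arithmetic behind such a calculator is short enough to sketch. The function below is an illustration of the formula, not the calculator's actual implementation: base64_length and its padded parameter are names invented for this example.

```python
def base64_length(n_bytes: int, padded: bool = True) -> int:
    """Return the number of Base64 characters needed for n_bytes of input."""
    if n_bytes < 0:
        raise ValueError("byte count must be non-negative")
    if padded:
        # Every started group of 3 bytes produces a full 4-character block.
        return 4 * ((n_bytes + 2) // 3)
    # Without '=' padding: ceil(4 * n / 3) characters.
    return (4 * n_bytes + 2) // 3


print(base64_length(1))                # 4
print(base64_length(1, padded=False))  # 2
print(base64_length(100_000))          # 133336
```

The padded form matches standard Base64 output. The unpadded form matches Base64URL with the "=" characters stripped, as used in JWTs.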