GSM-7 vs UCS-2 SMS Encoding — Reduce Costs

Tags: gsm-7 encoding, ucs-2 encoding, sms segments, sms deliverability, sms cost optimization, character encoding

Every SMS message is encoded into a specific character set before it travels across carrier networks. The two encoding standards that govern virtually all SMS traffic are GSM-7 and UCS-2, and the difference between them has a direct, measurable impact on messaging costs, deliverability, and campaign performance. Understanding GSM-7 vs UCS-2 SMS encoding is not a purely technical exercise — it is a practical skill that can reduce per-message spend and help prevent silent delivery failures.

This guide breaks down how each encoding works, why a single misplaced character can double your costs, and what steps keep messages lean and deliverable.

What Is SMS Character Encoding?

SMS was designed in the 1980s with strict constraints. Each message is transmitted in a protocol data unit (PDU) that allows a maximum payload of 140 bytes. Character encoding determines how written text gets translated into those bytes, and different encoding schemes use different amounts of space per character.

Two encoding schemes are relevant to modern SMS:

GSM-7 — A 7-bit encoding that supports 128 standard characters plus an extended set. Because each character uses only 7 bits, up to 160 characters fit in a single 140-byte SMS segment.
UCS-2 — A 16-bit encoding (a subset of Unicode) that supports tens of thousands of characters, including non-Latin scripts, emoji, and special symbols. Each character uses 16 bits (2 bytes), limiting a single segment to just 70 characters.

Encoding is not something senders typically choose manually. It is determined automatically based on the characters present in the message. If every character belongs to the GSM-7 character set, the message is encoded as GSM-7. If even one character falls outside that set, the entire message falls back to UCS-2.

The GSM-7 Character Set: What Is Included

The GSM 03.38 standard defines the default alphabet used in GSM-7 encoding. It includes:

All uppercase and lowercase Latin letters (A–Z, a–z)
Digits 0–9
Common punctuation: . , : ; ! ? ' " ( ) / - + = < > & @ # * %
Whitespace and newline characters
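Currency symbols: $ £ ¥ ¤ (the euro sign € is in the extended table, not the default alphabet)
A limited set of accented and special letters: à è é ì ò ù ä ö ü ñ Ä Ö Ü Ñ É Ç Å å Æ æ Ø ø ß
A small set of uppercase Greek letters used in technical contexts (Δ, Φ, Γ, Λ, Ω, Π, Ψ, Σ, Θ, Ξ)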

There is also a GSM-7 extended table that includes characters like { } [ ] ~ \ ^ | €. These extended characters are important to understand because each one consumes two character slots — they require an escape character prefix — effectively counting as two characters toward the 160-character limit.

Characters That Commonly Trigger UCS-2 Fallback

The most frequent culprits that force a message from GSM-7 to UCS-2 are characters that look innocuous but fall outside the GSM-7 set:

Character	Description	Common Source
“ ”	Smart (curly) quotes	Word processors, CMS editors, copy-paste from Google Docs
‘ ’	Smart apostrophes	Auto-correct in most text editors
—	Em dash	Word processors, some CMS platforms
–	En dash	Word processors
…	Ellipsis (single character)	Auto-correct replacing three periods
😀 🔥 🎉	Emoji	Intentional use or copy-paste
© ® ™	Legal symbols	Brand copy, legal disclaimers
á â ã ç ê í ó ú	Accented Latin characters outside the GSM-7 set (é, ñ, ü, and ö are included; most others are not)	Names, loanwords, international content

The smart quote problem is particularly insidious. A marketer drafts a message in Google Docs or Microsoft Word, copies it into an SMS platform, and unknowingly includes curly quotes that force the entire message into UCS-2 encoding. What was a single-segment message at 155 characters suddenly becomes a three-segment message.
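One way to catch these characters before a send is to scan the copy and report anything outside the GSM-7 tables. The following is a minimal Python sketch, not a definitive implementation: the names GSM7_BASIC, GSM7_EXTENDED, and find_non_gsm7 are hypothetical, and the tables are transcribed from the GSM 03.38 default alphabet and the printable part of its extension table.

```python
import unicodedata

# GSM 03.38 default alphabet (the escape code itself is omitted).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Printable characters from the extension table (each needs an escape prefix).
GSM7_EXTENDED = "^{}\\[~]|€"
GSM7_ALL = set(GSM7_BASIC) | set(GSM7_EXTENDED)


def find_non_gsm7(text):
    """Return (index, character, Unicode name) for each character outside GSM-7."""
    return [
        (i, ch, unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(text)
        if ch not in GSM7_ALL
    ]


# A curly apostrophe and an em dash, written as escapes so they stay visible.
for hit in find_non_gsm7("Don\u2019t miss out \u2014 sale ends today"):
    print(hit)
```

Reporting the Unicode name alongside the position makes it easier to tell a curly apostrophe from a straight one, which is hard to do by eye.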

How Encoding Affects Message Segments and Cost

SMS billing is based on segments, not messages. When a message exceeds the character limit for a single segment, it is split into multiple segments, and each one is billed separately. The segment limits differ significantly between encodings:

Encoding	Single Segment Limit	Multi-Segment Limit (per segment)	Reason for Reduced Multi-Segment Limit
GSM-7	160 characters	153 characters	7 characters reserved for User Data Header (UDH) for concatenation
UCS-2	70 characters	67 characters	3 characters (6 bytes) reserved for UDH

The User Data Header (UDH) is metadata included in each segment of a multi-part message that tells the receiving device how to reassemble the segments in the correct order. This overhead reduces the usable character count per segment.
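The four limits in the table follow directly from the 140-byte payload. As a short worked calculation, assuming the 6-byte concatenation header mentioned in the table (the constant names are illustrative):

```python
PAYLOAD_BYTES = 140  # maximum user data in one SMS
UDH_BYTES = 6        # concatenation header assumed from the table above

# GSM-7 packs 7-bit characters into the payload.
gsm7_single = PAYLOAD_BYTES * 8 // 7                # 1120 bits // 7
gsm7_concat = (PAYLOAD_BYTES - UDH_BYTES) * 8 // 7  # 1072 bits // 7

# UCS-2 uses 2 bytes per character.
ucs2_single = PAYLOAD_BYTES // 2
ucs2_concat = (PAYLOAD_BYTES - UDH_BYTES) // 2

print(gsm7_single, gsm7_concat, ucs2_single, ucs2_concat)  # 160 153 70 67
```

The 6 header bytes occupy 48 bits, which does not divide evenly into 7-bit characters, so GSM-7 loses 7 character slots rather than 6.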

Cost Comparison: A Practical Example

Consider a promotional message that reads:

Flash sale — 30% off all items today only. Use code SAVE30 at checkout. Shop now: https://example.com/sale. Reply STOP to opt out.

This message is 130 characters. With GSM-7 encoding, it fits in a single segment. But notice the em dash (—) after "sale." That single character forces the entire message into UCS-2 encoding. At 130 characters under UCS-2, the message requires two segments (67 + 63).

If this message is sent to 100,000 subscribers at a per-segment cost of $0.01:

Scenario	Segments per Message	Total Segments	Total Cost
GSM-7 (replace — with -)	1	100,000	$1,000
UCS-2 (with em dash)	2	200,000	$2,000

A single character doubled the campaign cost. Had the copy run just five characters longer, past the 134-character capacity of two UCS-2 segments, the same em dash would have tripled it. At scale, these encoding mistakes compound into significant budget waste. For a deeper look at measuring and protecting SMS investment, see the guide on how to calculate and maximize SMS marketing ROI.
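The arithmetic behind the table is recipients multiplied by segments per message multiplied by the per-segment price. A minimal sketch, assuming every recipient receives the same rendered message; the function name campaign_cost is hypothetical:

```python
from decimal import Decimal


def campaign_cost(recipients, segments_per_message, price_per_segment):
    """Total spend when every recipient's message uses the same number of segments."""
    return recipients * segments_per_message * Decimal(price_per_segment)


# 100,000 subscribers at $0.01 per segment, for one, two, and three segments.
for segments in (1, 2, 3):
    cost = campaign_cost(100_000, segments, "0.01")
    print(f"{segments} segment(s): ${cost:,.2f}")
```

Decimal is used instead of float so that per-segment prices such as 0.01 do not pick up binary rounding error when multiplied across large lists.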

How Encoding Affects Deliverability

The cost implications are straightforward, but encoding also affects whether messages arrive at all. Several mechanisms link encoding choices to deliverability outcomes.

Multi-Segment Reassembly Failures

When a message is split into multiple segments, each segment travels independently across the carrier network. The receiving device reassembles them using the UDH metadata. In most cases this works seamlessly, but there are failure modes:

One or more segments may be dropped by an intermediate carrier, resulting in an incomplete message displayed to the recipient.
Segments may arrive out of order on older devices or in regions with less reliable infrastructure.
Some carrier gateways impose limits on the number of concatenated segments they will process, particularly for international routes.

Every additional segment increases the probability of a partial delivery. A single-segment GSM-7 message has the highest likelihood of arriving intact.

Carrier Filtering and Throughput

Carrier spam filters evaluate messages at the segment level. A three-segment message generates three times the carrier traffic of a single-segment message, which can affect throughput rates and increase the chance of triggering rate-based filtering. High segment volumes from a single sender in a short window can resemble spam behavior to carrier systems.

For a comprehensive look at the factors that determine whether messages reach the inbox, read the guide to SMS deliverability.

Device and Network Compatibility

While UCS-2 is widely supported on modern smartphones, some IoT devices, older feature phones, and certain MVNO network configurations may not handle UCS-2 messages correctly. In markets where feature phones still represent a meaningful share of the subscriber base, GSM-7 is the safer choice for maximum reach.

Common Encoding Pitfalls and How to Avoid Them

Most encoding problems are preventable. Below are the most common mistakes and their solutions.

1. Copy-Pasting from Rich Text Editors

Word processors and design tools like Google Docs, Microsoft Word, Notion, and Figma automatically replace straight quotes with curly quotes, three periods with a single ellipsis character, and hyphens with em or en dashes. When text is copied from these tools into an SMS composer, these invisible substitutions come along.

Solution: Draft SMS copy in a plain text editor, or use a tool that strips non-GSM-7 characters. Some SMS platforms include built-in encoding validation that flags problematic characters before send.
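For teams that cannot control where copy is drafted, a small normalization step can undo the most common substitutions. The sketch below is one possible approach, not a complete sanitizer: the table SMART_PUNCTUATION and the function to_plain_punctuation are hypothetical names, and the mapping covers only the characters discussed in this guide.

```python
# Typographic characters mapped to plain GSM-7 equivalents.
SMART_PUNCTUATION = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # single-character ellipsis (expands to three characters)
    "\u00a0": " ",    # non-breaking space
})


def to_plain_punctuation(text):
    """Replace typographic punctuation with plain GSM-7 characters."""
    return text.translate(SMART_PUNCTUATION)


print(to_plain_punctuation("\u201cFlash sale\u201d \u2014 don\u2019t wait\u2026"))
# "Flash sale" - don't wait...
```

Because the ellipsis expands from one character to three, the message length should be rechecked after normalization.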

2. Dynamic Content Injection

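Personalization tokens like {first_name} or {city} pull data from a contact database. If a subscriber's name contains characters outside the GSM-7 set (François, Zoë, João), the dynamically inserted content will force the entire message into UCS-2. Some accented letters, including é, ñ, and ü, are part of GSM-7, so names like José or Müller do not trigger the fallback on their own.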

Solution: Audit contact data for non-GSM-7 characters in fields used for personalization. Consider maintaining a "display name" field that has been sanitized for SMS use, or implement fallback logic that uses a generic greeting when a name contains incompatible characters.
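The fallback approach can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a recommended implementation: the function greeting, its fallback text, and the set GSM7_CHARS are hypothetical, with the character set transcribed from the GSM 03.38 default alphabet plus the printable extension characters.

```python
# GSM-7 default alphabet plus printable extension characters.
GSM7_CHARS = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
    "^{}\\[~]|€"
)


def greeting(first_name, fallback="Hi there,"):
    """Personalize only when the name is non-empty and stays within GSM-7."""
    name = first_name.strip()
    if name and all(ch in GSM7_CHARS for ch in name):
        return f"Hi {name},"
    return fallback


for name in ("Maria", "François", ""):
    print(repr(name), "->", greeting(name))
```

Falling back to a generic greeting keeps the segment count predictable, at the cost of losing personalization for some subscribers; a sanitized display-name field avoids that trade-off but requires maintaining extra data.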

3. Emoji Use Without Segment Awareness

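Emoji are Unicode characters outside the GSM-7 set and always trigger UCS-2 encoding. A single emoji in an otherwise GSM-7 message drops segment capacity from 160 to 70 characters. Most emoji also consume more than one UCS-2 character slot: they lie outside Unicode's Basic Multilingual Plane, so they are sent as a surrogate pair of two 16-bit units (4 bytes) and count as 2 characters. Emoji built from several code points, such as skin-tone variants, count for more.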

Solution: If emoji are desired, design the message from the start with the 70-character UCS-2 limit in mind. Keep the message concise enough to fit in a single UCS-2 segment. Do not add emoji as an afterthought to a message written for GSM-7 limits.
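To budget for emoji accurately, count 16-bit units rather than visible characters. A minimal sketch, assuming the message is transmitted as UTF-16 (which is how surrogate pairs arise in practice); the function name ucs2_units is hypothetical:

```python
def ucs2_units(text):
    """Number of 16-bit code units the text occupies when encoded as UTF-16."""
    return len(text.encode("utf-16-be")) // 2


print(ucs2_units("Sale today"))                 # 10
print(ucs2_units("Sale today \U0001F525"))      # 13: the fire emoji takes 2 units
print(ucs2_units("\U0001F44D\U0001F3FD"))       # 4: thumbs up plus skin-tone modifier
```

The "utf-16-be" codec is used because it does not prepend a byte order mark, so the byte length divided by two is exactly the number of code units.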

4. URL Shortener Output

Most URL shorteners produce links using only GSM-7-compatible characters, but some may include special characters in the path. Always verify that shortened URLs contain only standard alphanumeric characters, hyphens, and forward slashes.

5. Invisible Unicode Characters

Zero-width spaces, non-breaking spaces, byte order marks, and other invisible Unicode characters can be introduced through copy-paste operations. These characters are not visible in the message composer but will trigger UCS-2 encoding. A hex editor or encoding validation tool is the only reliable way to detect them.
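A script can stand in for the hex editor. The sketch below flags Unicode format characters (category Cf, which covers zero-width spaces, joiners, and the byte order mark) and any space separator other than the ordinary space. The function name find_invisible is hypothetical, and the category filter is an assumption about what counts as "invisible" rather than an exhaustive rule.

```python
import unicodedata


def find_invisible(text):
    """Return (index, code point, name) for invisible or look-alike space characters."""
    hits = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        # Cf: format characters. Zs: space separators, of which only U+0020 is GSM-7.
        if category == "Cf" or (category == "Zs" and ch != " "):
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")))
    return hits


# Escapes are used so the hidden characters are visible in the source.
for hit in find_invisible("Shop\u200bnow\u00a0today"):
    print(hit)
```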

GSM-7 Extended Characters: The Hidden Cost

Even within GSM-7 encoding, certain characters consume more space than expected. The GSM-7 extended table includes these characters:

Character	GSM-7 Slots Used
{ }	2 each
[ ]	2 each
~	2
\	2
^	2
|	2
€	2

Each extended character requires an escape sequence, so it occupies two of the 160 available character slots. A message with several curly braces or brackets — common in messages that include JSON-like syntax or code snippets — can push into multi-segment territory even though every character is technically GSM-7 compatible.
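The effect on length is easy to compute once the extended characters are known. A minimal sketch, assuming the text has already been validated as GSM-7 compatible; the names GSM7_EXTENDED and gsm7_effective_length are hypothetical:

```python
# Printable characters from the GSM-7 extension table listed above.
GSM7_EXTENDED = set("^{}\\[~]|€")


def gsm7_effective_length(text):
    """Slots used by GSM-7 text: 2 per extended character, 1 for everything else."""
    return sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)


sample = "Use code {SAVE30} [today]"
print(len(sample), gsm7_effective_length(sample))  # 25 29
```

A message that looks like exactly 160 characters but contains one pair of braces therefore occupies 162 slots and spills into a second segment.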

Segment Counting: Getting It Right

Accurate segment counting requires knowing three things:

The encoding — Is the message GSM-7 or UCS-2? This is determined by scanning every character in the message.
The effective character count — For GSM-7, extended characters count as 2. For UCS-2, surrogate pairs (many emoji) count as 2.
The segment boundaries — Single messages use the full payload (160 or 70). Multi-part messages use the reduced payload (153 or 67) for every segment, including the first.

A common mistake is assuming that a 161-character GSM-7 message uses one full segment plus one character. In reality, once the message exceeds 160 characters, it becomes a concatenated message, and the per-segment limit drops to 153. A 161-character GSM-7 message requires two segments with a combined capacity of 306 characters — but both segments are billed.

A message at 161 GSM-7 characters costs twice as much as a message at 160 characters. That single extra character doubles the spend. Always verify segment boundaries before sending.

Trackly's deliverability tools include real-time GSM-7 encoding validation and segment counting directly in the message composer. As content is typed or pasted, the platform identifies any non-GSM-7 characters, highlights them, and shows the exact segment count — preventing encoding surprises from reaching subscribers and invoices.

When UCS-2 Is the Right Choice

Despite the cost and deliverability advantages of GSM-7, there are legitimate reasons to use UCS-2 encoding.

Non-Latin Scripts

If an audience reads Arabic, Chinese, Japanese, Korean, Hindi, Thai, or any other non-Latin script, UCS-2 is not optional — it is required. GSM-7 does not include these character sets. For international campaigns targeting these markets, segment costs should be budgeted accordingly, and messages should be written concisely to fit within the 70-character single-segment limit when possible.

Emoji as a Strategic Choice

Some brands find that emoji improve engagement rates enough to justify the additional segment cost. If testing shows that an emoji-containing message generates meaningfully higher click-through rates, the extra segment cost may be a worthwhile trade-off. The key is to make this a deliberate, data-informed decision rather than an accidental one.

A/B testing is the appropriate way to evaluate this trade-off. Test a GSM-7 version of a message against a UCS-2 version with emoji, and compare not just click-through rates but cost-per-conversion. For more on structuring effective SMS tests, see the SMS marketing best practices guide.

Building an Encoding-Aware Workflow

For teams sending SMS at scale, encoding awareness should be built into the content creation process rather than treated as an afterthought.

Pre-Send Checklist

Draft in plain text. Use a plain text editor or the SMS platform's built-in composer. Avoid drafting in rich text tools.
Validate encoding. Run the message through an encoding checker that identifies non-GSM-7 characters. Platforms like Trackly do this automatically.
Check segment count. Verify the segment count with all personalization tokens expanded to their maximum expected length. A {first_name} token that resolves to "Christopher" adds 11 characters.
Test dynamic content. Send test messages using contact records that contain accented characters, long names, and edge-case data to verify that the final rendered message stays within the target segment count.
Review shortened URLs. Confirm that tracking links and short URLs contain only GSM-7-compatible characters.
Decide on emoji intentionally. If emoji are included, rewrite the message to fit within UCS-2 segment limits. Do not add emoji to a message written for GSM-7 limits.

Template Standards

Establishing message templates with built-in character budgets prevents common overruns. For a single-segment GSM-7 message with a personalization token and a tracking link, a practical budget might look like:

Component	Character Budget
Greeting + personalization	20 characters
Message body	80 characters
Tracking URL	25 characters
Opt-out instruction	25 characters
Buffer for extended chars	10 characters
Total	160 characters

A standard template budget prevents the common pattern where a message is written, a link is added, an opt-out footer is appended, and the final result spills into a second segment.
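A budget like this can be enforced mechanically before a template is approved. The sketch below is illustrative only: the component keys and the function over_budget are hypothetical, and it uses plain character counts, relying on the buffer to absorb any two-slot extended characters.

```python
# Character budgets from the table above; the component names are illustrative.
BUDGET = {"greeting": 20, "body": 80, "tracking_url": 25, "opt_out": 25}
BUFFER = 10          # held back for extended characters
SEGMENT_LIMIT = 160  # single-segment GSM-7


def over_budget(parts):
    """Map each component that exceeds its budget to the number of characters over."""
    return {
        name: len(text) - BUDGET[name]
        for name, text in parts.items()
        if len(text) > BUDGET[name]
    }


# The budgets plus the buffer must add up to exactly one segment.
assert sum(BUDGET.values()) + BUFFER == SEGMENT_LIMIT

draft = {
    "greeting": "Hi Christopher,",
    "body": "Flash sale: 30% off all items today only.",
    "tracking_url": "https://example.com/sale",
    "opt_out": "Reply STOP to unsubscribe.",
}
print(over_budget(draft))  # {'opt_out': 1}
```

In this draft the trailing period pushes the opt-out line to 26 characters, one over its budget, which is the kind of small overrun a template check is meant to surface.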

Encoding and Regulatory Compliance

Regulatory requirements like TCPA opt-out language ("Reply STOP to unsubscribe") consume character space in every message. This compliance overhead is fixed — it cannot be shortened or removed. When planning a character budget, account for opt-out language first, then allocate the remaining space to actual content.

In UCS-2 messages, the opt-out instruction alone can consume 25 or more of the 70 available characters ("Reply STOP to unsubscribe" is 25 characters), leaving very little room for the message itself. This is another reason to prefer GSM-7 encoding when possible: it provides more usable space after compliance requirements are met.

Technical Reference: Encoding Detection Logic

For technical teams building or evaluating SMS systems, here is the logic that determines encoding:

Iterate through every character in the message.
Check each character against the GSM-7 default alphabet and extended table.
If all characters are in the GSM-7 set (default or extended), encode as GSM-7.
If any character is not in the GSM-7 set, encode the entire message as UCS-2.

There is no partial encoding. It is not possible to encode part of a message as GSM-7 and part as UCS-2. A single non-GSM-7 character forces the entire message — including all the characters that would have been fine in GSM-7 — into UCS-2. This all-or-nothing behavior is why a single smart quote can triple costs.
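The detection steps above translate almost line for line into code. This is a minimal Python sketch rather than a reference implementation: the names GSM7_BASIC, GSM7_EXTENDED, and detect_encoding are hypothetical, and the tables are transcribed from the GSM 03.38 default alphabet and the printable part of its extension table.

```python
# GSM 03.38 default alphabet (the escape code itself is omitted).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM7_EXTENDED = "^{}\\[~]|€"
GSM7_ALL = frozenset(GSM7_BASIC + GSM7_EXTENDED)


def detect_encoding(text):
    """Return "GSM-7" if every character is in the GSM-7 tables, otherwise "UCS-2"."""
    return "GSM-7" if all(ch in GSM7_ALL for ch in text) else "UCS-2"


print(detect_encoding("Flash sale - 30% off"))       # GSM-7
print(detect_encoding("Flash sale \u2014 30% off"))  # UCS-2 (em dash)
print(detect_encoding("Total: 20€ {today}"))         # GSM-7 (extended characters)
```

Because all() stops at the first character that fails the membership test, the scan ends as soon as the message is known to need UCS-2.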

Segment Calculation Formulas

For GSM-7:

If effective character count ≤ 160: segments = 1
If effective character count > 160: segments = ceiling(effective character count / 153)

For UCS-2:

If effective character count ≤ 70: segments = 1
If effective character count > 70: segments = ceiling(effective character count / 67)

Note that "effective character count" accounts for GSM-7 extended characters (2 slots each) and UCS-2 surrogate pairs (2 slots each).
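The formulas can be expressed as a single function. A minimal sketch, assuming the effective character count has already been computed as described in the note above; the names LIMITS and segment_count are hypothetical:

```python
import math

# (single-segment limit, per-segment limit once the message is concatenated)
LIMITS = {"GSM-7": (160, 153), "UCS-2": (70, 67)}


def segment_count(effective_chars, encoding):
    """Number of billable segments for a message of the given effective length."""
    single, concat = LIMITS[encoding]
    if effective_chars <= single:
        return 1
    return math.ceil(effective_chars / concat)


print(segment_count(160, "GSM-7"))  # 1
print(segment_count(161, "GSM-7"))  # 2
print(segment_count(70, "UCS-2"))   # 1
print(segment_count(155, "UCS-2"))  # 3
```

The two-step structure matters: dividing by the concatenated limit alone would report two segments for any GSM-7 message between 154 and 160 characters, even though such a message fits in one.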

Key Takeaways

GSM-7 encoding provides 160 characters per segment. UCS-2 provides 70. A single non-GSM-7 character forces the entire message into UCS-2, potentially doubling or tripling segment count and cost. The most common culprits are smart quotes, em dashes, and emoji introduced through copy-paste from rich text editors.

Understanding SMS character encoding is one of the highest-leverage optimizations available to SMS marketers. It requires no changes to messaging strategy, audience targeting, or offer structure — just awareness of how characters map to bytes and bytes map to billing.

For teams sending SMS at any meaningful volume, encoding validation and segment counting should be part of the standard workflow. Trackly's built-in deliverability tools handle this automatically, flagging encoding issues and showing real-time segment counts during message composition so that costly mistakes are caught before they reach an audience.
