# Encoding & segments

> How SMS encoding works — GSM-7 vs UCS-2, 160 vs 70 character limits, multi-part segments and how one emoji can double your SMS cost. Segment calculation rules and examples.
> Source: https://docs.23telecom.co.uk/sms/encoding/

Instructions for LLMs: This is one page of the 23 Telecom messaging API docs
(SMS today; more channels planned). Base URL: https://restlink23telecom.com/api/v1,
auth via the X-API-Key header. Match errors on the error_code field, never on
description text. Full docs: https://docs.23telecom.co.uk/llms-full.txt · Schemas: https://docs.23telecom.co.uk/openapi.yaml

SMS cost is per **segment**, not per message. Encoding determines how many
characters fit in a segment — and a single character can change the encoding
of the whole message.

## The two encodings

| Encoding | Single SMS | Multi-part (per segment) | Used when |
| --- | --- | --- | --- |
| **GSM-7** | 160 chars | 153 chars | All characters fit the GSM alphabet (Latin, digits, common symbols) |
| **UCS-2** | 70 chars | 67 chars | Any character outside GSM-7: emoji, Cyrillic, Chinese, Arabic, … |

Encoding applies to the **entire message**. A 150-character Latin text is 1
segment in GSM-7; add a single 😀 and the whole message becomes UCS-2. It no
longer fits in one 70-character segment or in two 67-character segments (134),
so it is sent as 3 segments: three times the cost for one emoji.
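As a rough illustration of the rule above, the check can be sketched in a few lines of Python. The alphabet below is the standard GSM 03.38 default table, which is an assumption here: the platform's own detection may differ in edge cases, and `detect_encoding` is a hypothetical helper, not part of the API.

```python
# Standard GSM 03.38 default alphabet (assumed; the provider's table may differ).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension-table characters: still GSM-7, but each one counts double.
GSM7_EXTENDED = "€^{}[]\\~|"
GSM7_ALL = frozenset(GSM7_BASIC + GSM7_EXTENDED)


def detect_encoding(text: str) -> str:
    """Return "GSM-7" if every character is in the GSM alphabet, else "UCS-2"."""
    return "GSM-7" if all(ch in GSM7_ALL for ch in text) else "UCS-2"


print(detect_encoding("Your code is 482913"))     # GSM-7
print(detect_encoding("Your code is 482913 😀"))  # UCS-2
```

One character outside the set is enough to flip the result, which is why the check runs over the whole text rather than stopping at the first GSM-7 character.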

## Extended GSM characters

These characters are valid GSM-7 but **count as 2 characters** each:

```
€ ^ { } [ ] \ ~ |
```

A 159-character message containing one `€` therefore counts as 160 GSM-7
characters and still fits in 1 segment. A 160-character message containing one
`€` counts as 161 and splits into 2.
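The double counting can be sketched as follows. `gsm7_length` is a hypothetical helper and assumes the text has already been confirmed to contain only GSM-7 characters.

```python
# Characters from the list above; each takes 2 GSM-7 units instead of 1.
GSM7_EXTENDED = frozenset("€^{}[]\\~|")


def gsm7_length(text: str) -> int:
    """Count GSM-7 units, assuming `text` contains only GSM-7 characters."""
    return sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)


print(gsm7_length("a" * 158 + "€"))  # 159 characters -> 160 units
print(gsm7_length("a" * 159 + "€"))  # 160 characters -> 161 units
```

Comparing this count, rather than `len(text)`, against the 160 and 153 limits is what decides the segment count for GSM-7 messages.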

## Examples

| Message | Encoding | Segments |
| --- | --- | --- |
| 160 Latin characters | GSM-7 | 1 |
| 161 Latin characters | GSM-7 | 2 |
| 306 Latin characters | GSM-7 | 2 (153 × 2) |
| 50 characters with one emoji | UCS-2 | 1 |
| 71 Cyrillic characters | UCS-2 | 2 |
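The rows above follow from the limits in the encodings table. A minimal sketch of the arithmetic, assuming `units` is the already-counted message length: GSM-7 units (extended characters counted twice) or, for UCS-2, UTF-16 code units, where an emoji such as 😀 is assumed to take 2. `count_segments` is a hypothetical helper, not an API call.

```python
import math

# (single-SMS limit, per-segment limit once the message is multi-part)
LIMITS = {"GSM-7": (160, 153), "UCS-2": (70, 67)}


def count_segments(units: int, encoding: str) -> int:
    """Return the number of segments needed for `units` in the given encoding."""
    single, multipart = LIMITS[encoding]
    if units <= single:
        return 1  # fits in one unsplit SMS
    return math.ceil(units / multipart)


print(count_segments(160, "GSM-7"))  # 1
print(count_segments(161, "GSM-7"))  # 2
print(count_segments(306, "GSM-7"))  # 2
print(count_segments(71, "UCS-2"))   # 2
```

The per-segment limit drops from 160 to 153 (or 70 to 67) because each part of a multi-part message carries a small header used to reassemble the parts on the handset.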

## Where to see it in the API

The [send response](/sms/send#response) reports what was detected and billed:

```json
"summary": {
  "encoding": "GSM-7",
  "total_segments": 2,
  "total_cost": 0.017
}
```

Per-recipient segment counts are in `results[].segments`.
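A minimal sketch of reading these fields from a response body. It assumes `summary` and `results` are top-level keys of the JSON response; the sample body contains only the fields named on this page, and `summarize_billing` is a hypothetical helper.

```python
import json


def summarize_billing(response_body: str) -> dict:
    """Extract the encoding, segment and cost fields from a send response."""
    data = json.loads(response_body)
    summary = data["summary"]
    return {
        "encoding": summary["encoding"],
        "total_segments": summary["total_segments"],
        "total_cost": summary["total_cost"],
        # One entry per recipient; empty if the response has no results list.
        "segments_per_recipient": [r["segments"] for r in data.get("results", [])],
    }


# Illustrative body only; a real response carries more fields.
sample = """
{
  "summary": {"encoding": "GSM-7", "total_segments": 2, "total_cost": 0.017},
  "results": [{"segments": 2}]
}
"""

info = summarize_billing(sample)
if info["encoding"] == "UCS-2":
    print("Message was sent as UCS-2; check for non-GSM characters")
print(info["total_segments"], info["segments_per_recipient"])
```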

## Practical tips

- **Verification codes & alerts:** stick to plain Latin text. It stays in
  GSM-7, so anything up to 160 characters is 1 cheap segment.
- **Marketing texts in Cyrillic or with emoji:** budget for UCS-2. Keep texts
  to 70 characters or fewer (or within 67 × N for N segments) to control cost.
- **Watch look-alike and invisible characters:** smart quotes (`“…”`), long
  dashes (`—`) and non-breaking spaces pasted from editors are not in the GSM
  alphabet and silently switch the message to UCS-2.
- **Test before a campaign:** send the exact text to your own number and check
  `summary.encoding` and `summary.total_segments` in the response.
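One way to guard against pasted typography is to normalise the text before sending. The sketch below is an assumption about what a reasonable substitution table looks like, not something the API does for you. It covers only the characters named above, and `to_gsm_friendly` is a hypothetical helper.

```python
# Map common non-GSM typography to plain GSM-7 equivalents.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",  # single smart quotes
    "\u201c": '"', "\u201d": '"',  # double smart quotes
    "\u2013": "-", "\u2014": "-",  # en dash and em dash
    "\u00a0": " ",                 # non-breaking space
}
_TABLE = str.maketrans(REPLACEMENTS)


def to_gsm_friendly(text: str) -> str:
    """Replace look-alike characters so the text can stay in GSM-7."""
    return text.translate(_TABLE)


print(to_gsm_friendly("\u201cSale\u201d ends today \u2013 don\u2019t miss it"))
# "Sale" ends today - don't miss it
```

Substitution changes the text the recipient sees, so it suits transactional messages better than copy where the typography is deliberate.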