Support Grapheme functions for UTF-8 strings

Description

Summary:
The intl extension supports the grapheme concept from Unicode,
while the mbstring extension handles UTF-8 code points.

An emoji like 🏴󠁧󠁢󠁥󠁮󠁧󠁿 takes 28 bytes and 7 code points (a black flag,
the tags G B E N G, and a cancel tag), but forms 1 grapheme. That can
affect methods like substr or strlen when we want to manipulate
graphemes and not code points.
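
The three counts quoted above can be checked with a short sketch. It assumes PHP 7 or later with the mbstring extension loaded. The grapheme count is only computed when the intl extension is available, and its result depends on the bundled ICU version, so no value is asserted for it here.

```php
<?php
// England flag: U+1F3F4 (black flag), five tag characters, U+E007F (cancel tag).
$flag = "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}";

// Bytes: strlen() counts raw bytes. Each code point here needs 4 bytes in UTF-8.
$bytes = strlen($flag);                    // 28

// Code points: mb_strlen() from the mbstring extension.
$codepoints = mb_strlen($flag, 'UTF-8');   // 7

// Graphemes: grapheme_strlen() from the intl extension, if it is loaded.
$graphemes = function_exists('grapheme_strlen') ? grapheme_strlen($flag) : null;

printf(
    "%d bytes, %d code points, %s graphemes\n",
    $bytes,
    $codepoints,
    $graphemes ?? 'n/a'
);
```

The same gap shows up when slicing: substr($flag, 0, 4) and mb_substr($flag, 0, 1) both return only the black flag, without the tags that make it the flag of England.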

The strategy is to offer byte, code point and grapheme capabilities,
to downgrade from graphemes to code points for non-UTF-8 encodings,
and to default to graphemes.
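
As an illustration only, that strategy could look like the sketch below. The function name text_length, its parameters and the unit names are hypothetical and are not the API added by this commit. It assumes the intl and mbstring extensions are both loaded.

```php
<?php
// Hypothetical helper, for illustration: not the method introduced by this commit.
function text_length(
    string $text,
    string $encoding = 'UTF-8',
    string $unit = 'grapheme'   // default unit is the grapheme
): int {
    // The grapheme functions of intl work on UTF-8 only,
    // so downgrade to code points for any other encoding.
    if ($unit === 'grapheme' && strcasecmp($encoding, 'UTF-8') !== 0) {
        $unit = 'codepoint';
    }

    switch ($unit) {
        case 'byte':
            return strlen($text);
        case 'codepoint':
            return mb_strlen($text, $encoding);
        case 'grapheme':
            $length = grapheme_strlen($text);
            if (!is_int($length)) {
                // grapheme_strlen() returns false or null on failure.
                throw new RuntimeException('Invalid UTF-8 input');
            }
            return $length;
        default:
            throw new InvalidArgumentException("Unknown unit: $unit");
    }
}

// "e" followed by U+0301 (combining acute accent).
$text = "e\u{301}";
echo text_length($text), "\n";                       // 1 grapheme
echo text_length($text, 'UTF-8', 'codepoint'), "\n"; // 2 code points
echo text_length($text, 'UTF-8', 'byte'), "\n";      // 3 bytes
```

Downgrading silently keeps callers working with legacy encodings such as ISO-8859-1, where each byte maps to a single code point and the difference rarely matters.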

Test Plan: Tests added for new methods.

Reviewers: dereckson

Reviewed By: dereckson

Differential Revision: https://devcentral.nasqueron.org/D2550

Details

Provenance
dereckson authored on Nov 11 2024, 22:32
dereckson pushed on Nov 17 2024, 00:43
Reviewer
dereckson
Differential Revision
D2550: Support Grapheme functions for UTF-8 strings
Parents
rKOTb418fdefd1b0: Introduce Result class with Ok and Err types for status handling
References
HEAD -> main