Support Grapheme functions for UTF-8 strings

Description

Summary:
The intl extension supports the grapheme concept from Unicode,
while the mbstring extension handles UTF-8 codepoints.

An emoji like 🏴󠁧󠁢󠁥󠁮󠁧󠁿 has 28 bytes, 7 codepoints (a black flag, the tag
characters g b e n g, and a cancel tag), 1 grapheme. That can affect
methods like substr or strlen when we want to manipulate graphemes
and not codepoints.
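
The counts quoted above can be checked with a short PHP sketch. It is an
illustration only, not code from this commit, and it assumes the mbstring
extension is loaded; the intl call is guarded in case that extension is
missing. The flag is written with escapes because tag characters are
invisible in most editors.

```php
<?php
// England flag: U+1F3F4, then the tag characters g b e n g, then U+E007F (cancel tag).
$flag = "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}";

// strlen() counts bytes.
$bytes = strlen($flag);

// mb_strlen() counts codepoints for the given encoding (mbstring extension).
$codepoints = mb_strlen($flag, "UTF-8");

// grapheme_strlen() counts grapheme clusters (intl extension).
// null here only means intl is not available on this installation.
$graphemes = function_exists("grapheme_strlen") ? grapheme_strlen($flag) : null;

printf(
    "%d bytes, %d codepoints, %s grapheme(s)\n",
    $bytes,
    $codepoints,
    $graphemes ?? "n/a"
);
```

The grapheme count depends on the Unicode data shipped with the ICU
library behind intl, so an old ICU build may not report a single cluster.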

The strategy is to offer bytes/codepoints/graphemes capabilities,
downgrade from graphemes to codepoints for non-UTF-8 encodings,
and default to graphemes.
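
A minimal sketch of that strategy, applied to substrings, could look like
the following. The function name `substringBy`, its `$unit` parameter and
the unit names are hypothetical and chosen for illustration; they are not
the methods added by this commit. The sketch also falls back to codepoints
when intl is not loaded, which is an extra assumption.

```php
<?php
// Hypothetical helper, for illustration only.
function substringBy(
    string $text,
    int $start,
    int $length,
    string $unit = "graphemes",   // default unit, as in the strategy above
    string $encoding = "UTF-8"
): string {
    if ($unit === "bytes") {
        return substr($text, $start, $length);
    }

    // Graphemes are only used for UTF-8 text with intl available;
    // anything else is downgraded to codepoints.
    $canUseGraphemes = strcasecmp($encoding, "UTF-8") === 0
        && function_exists("grapheme_substr");

    if ($unit === "graphemes" && $canUseGraphemes) {
        $result = grapheme_substr($text, $start, $length);
        return $result === false ? "" : $result;
    }

    return mb_substr($text, $start, $length, $encoding);
}

// "a" followed by a combining acute accent (U+0301), then "bc".
$text = "a\u{0301}bc";

// Cutting by codepoint separates the letter from its accent.
$byCodepoint = substringBy($text, 0, 1, "codepoints");

// Cutting by grapheme keeps the letter and accent together when intl is loaded.
$byGrapheme = substringBy($text, 0, 1);
```

Defaulting to graphemes matches what a reader sees on screen, while the
explicit unit keeps byte-level and codepoint-level access available when a
caller needs it.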

Test Plan: Tests added for new methods.

Reviewers: dereckson

Reviewed By: dereckson

Differential Revision: https://devcentral.nasqueron.org/D2550

Details

Provenance
dereckson authored on Nov 11 2024, 22:32
dereckson pushed on Nov 17 2024, 00:38
Reviewer
dereckson
Differential Revision
D2550: Support Grapheme functions for UTF-8 strings
Parents
rKERUALD82dd44cdfab7: Print triggered deprecations, errors, notices, warnings running tests
References
tag: omnitools/0.13.0