WIP: support Grapheme functions for UTF-8 strings
DraftPublic

Authored by dereckson on Feb 21 2022, 23:04.
Tags
None
Subscribers
None
This is a draft revision that has not yet been submitted for review.

Details

Reviewers
None
Summary

The intl extension supports the grapheme concept from Unicode,
while the mbstring extension handles UTF-8 codepoints.

An emoji like 🏴󠁧󠁢󠁥󠁮󠁧󠁿 has 28 bytes, 7 codepoints (a black flag, the tag
characters g b e n g, and a cancel tag), and 1 grapheme. That affects
methods like substr or strlen when we want to manipulate graphemes
and not codepoints.
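
To make the three counts concrete, here is a minimal sketch using only built-in PHP functions. It assumes the mbstring and intl extensions are loaded and that the bundled ICU is recent enough to treat emoji tag sequences as a single cluster. The flag is written with escapes so the codepoints stay visible.

```php
<?php
// Flag of England: U+1F3F4 (black flag), tag characters g b e n g,
// then U+E007F (cancel tag). Each codepoint takes 4 bytes in UTF-8.
$flag = "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}";

$bytes      = strlen($flag);              // byte count
$codepoints = mb_strlen($flag, "UTF-8");  // codepoint count (mbstring)
$graphemes  = grapheme_strlen($flag);     // grapheme cluster count (intl)

// Taking "the first character" gives different results per unit:
// mbstring keeps only the black flag, intl keeps the whole sequence
// (on an ICU version that clusters tag sequences).
$firstCodepoint = mb_substr($flag, 0, 1, "UTF-8");
$firstGrapheme  = grapheme_substr($flag, 0, 1);

printf("%d bytes, %d codepoints, %d grapheme(s)\n", $bytes, $codepoints, $graphemes);
```

Cutting the string with mb_substr here leaves a plain black flag, which is the kind of surprise the grapheme functions avoid.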

The strategy is to offer byte, codepoint and grapheme capabilities,
to downgrade from graphemes to codepoints for non UTF-8 encodings,
and to default to graphemes.
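
The strategy above could be sketched as follows. This is an illustration under stated assumptions, not the code of this revision: the function name text_length, its parameters, and the unit names are hypothetical. Only strlen, mb_strlen and grapheme_strlen are real PHP functions. It requires PHP 8.0 or later for the match expression.

```php
<?php
// Hypothetical helper: not part of the Keruald libraries.
function text_length(
    string $text,
    string $unit = "graphemes",   // default to graphemes
    string $encoding = "UTF-8"
): int {
    // The intl grapheme functions only work on UTF-8,
    // so downgrade to codepoints for any other encoding.
    if ($unit === "graphemes" && strcasecmp($encoding, "UTF-8") !== 0) {
        $unit = "codepoints";
    }

    if ($unit === "graphemes") {
        $count = grapheme_strlen($text);
        if (!is_int($count)) {
            // grapheme_strlen returns false or null on failure.
            throw new InvalidArgumentException("Text is not valid UTF-8.");
        }
        return $count;
    }

    return match ($unit) {
        "bytes"      => strlen($text),
        "codepoints" => mb_strlen($text, $encoding),
        default      => throw new InvalidArgumentException("Unknown unit: $unit"),
    };
}

// "e" followed by U+0301 (combining acute accent).
$accented = "e\u{0301}";
$latin1   = mb_convert_encoding("é", "ISO-8859-1", "UTF-8");

echo text_length($accented), "\n";                          // graphemes
echo text_length($accented, "codepoints"), "\n";
echo text_length($accented, "bytes"), "\n";
echo text_length($latin1, "graphemes", "ISO-8859-1"), "\n"; // downgraded to codepoints
```

Downgrading silently keeps one call site working for every encoding, at the cost of the caller not knowing which unit was actually used. Returning the effective unit alongside the count would be one alternative.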

Diff Detail

Repository
rKERUALD Keruald libraries development repository
Lint
No Lint Coverage
Unit
No Test Coverage
Branch
mbstring-to-grapheme
Build Status
Buildable 3987
Build 4239: arc lint + arc unit

Event Timeline

dereckson held this revision as a draft.