WIP: support Grapheme functions for UTF-8 strings
Draft, Public

Authored by dereckson on Feb 21 2022, 23:04.
Tags
None
Subscribers
None
This is a draft revision that has not yet been submitted for review.

Details

Reviewers
None
Summary

The intl extension supports the grapheme concept from Unicode,
while the mbstring extension handles UTF-8 codepoints.

An emoji like 🏴󠁧󠁢󠁥󠁮󠁧󠁿 takes 28 bytes in UTF-8 and 7 codepoints (a black flag,
the tag characters g b e n g, and a cancel tag), but forms 1 grapheme.
That affects methods like substr or strlen when we want to manipulate
graphemes and not codepoints.
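
The byte and codepoint counts can be checked with a short sketch. It is written in Python only for illustration (the revision itself targets PHP, where the corresponding functions are strlen, mb_strlen and grapheme_strlen); the names ENGLAND_FLAG, byte_count and codepoint_count are made up for this example.

```python
# The England flag, built explicitly from its codepoints so that no
# invisible character is lost when copying the example.
ENGLAND_FLAG = "".join(chr(cp) for cp in (
    0x1F3F4,  # black flag (base emoji)
    0xE0067,  # tag g
    0xE0062,  # tag b
    0xE0065,  # tag e
    0xE006E,  # tag n
    0xE0067,  # tag g
    0xE007F,  # cancel tag
))

# Every codepoint above lies outside the BMP, so each takes 4 bytes in UTF-8.
byte_count = len(ENGLAND_FLAG.encode("utf-8"))

# Python's len() on a str counts codepoints, not graphemes.
codepoint_count = len(ENGLAND_FLAG)

print(byte_count, codepoint_count)
```

The standard library of Python has no grapheme segmentation, so the third figure (1 grapheme) is not computed here.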

The strategy is to offer bytes/codepoints/graphemes capabilities,
to downgrade from graphemes to codepoints for non-UTF-8 encodings,
and to default to graphemes.
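
A minimal sketch of that strategy, in Python for illustration only and not taken from the diff: Unit, effective_unit, approx_grapheme_count and length are hypothetical names. The grapheme counter is a deliberately simplified stand-in (it only attaches combining marks, variation selectors and emoji tag characters to the previous codepoint) and is not a full Unicode segmentation; a real implementation would rely on intl's grapheme functions.

```python
from enum import Enum
import unicodedata


class Unit(Enum):
    BYTES = "bytes"
    CODEPOINTS = "codepoints"
    GRAPHEMES = "graphemes"


UTF8_NAMES = {"utf-8", "utf8"}


def effective_unit(unit, encoding):
    """Downgrade graphemes to codepoints when the encoding is not UTF-8."""
    if unit is Unit.GRAPHEMES and encoding.lower() not in UTF8_NAMES:
        return Unit.CODEPOINTS
    return unit


def _extends_previous(ch):
    """Simplified rule (assumption): does ch attach to the previous codepoint?"""
    cp = ord(ch)
    return (
        unicodedata.combining(ch) != 0   # combining marks
        or 0xE0020 <= cp <= 0xE007F      # emoji tag characters and cancel tag
        or 0xFE00 <= cp <= 0xFE0F        # variation selectors
    )


def approx_grapheme_count(text):
    """Approximate grapheme count; stand-in for a real segmenter."""
    return sum(
        1 for i, ch in enumerate(text) if i == 0 or not _extends_previous(ch)
    )


def length(text, unit=Unit.GRAPHEMES, encoding="UTF-8"):
    """Length of text in the requested unit; graphemes are the default."""
    unit = effective_unit(unit, encoding)
    if unit is Unit.BYTES:
        return len(text.encode(encoding))
    if unit is Unit.CODEPOINTS:
        return len(text)
    return approx_grapheme_count(text)
```

With this sketch, a caller that does not pass a unit gets grapheme semantics for UTF-8 text, and silently gets codepoint semantics when it declares another encoding such as ISO-8859-1.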

Diff Detail

Repository
rKERUALD Keruald libraries development repository
Lint
No Lint Coverage
Unit
No Test Coverage
Branch
mbstring-to-grapheme
Build Status
Buildable 3987
Build 4239: arc lint + arc unit

Event Timeline

dereckson held this revision as a draft.