WIP: support Grapheme functions for UTF-8 strings
DraftPublic

Authored by dereckson on Feb 21 2022, 23:04.
Tags
None
Subscribers
None
This is a draft revision that has not yet been submitted for review.

Details

Reviewers
None
Summary

The intl extension supports the Unicode concept of graphemes,
while the mbstring extension handles UTF-8 code points.

An emoji like 🏴󠁧󠁢󠁥󠁮󠁧󠁿 takes 28 bytes and 7 code points (a black flag,
the tag characters g b e n g, and a cancel tag), but forms
1 grapheme. That affects methods like substr or strlen when
we want to manipulate graphemes rather than code points.
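
The three counts can be checked with a few lines of PHP. This is only an illustrative sketch, not code from this revision: it assumes both the mbstring and intl extensions are loaded, and the grapheme count depends on the ICU version bundled with intl (a recent ICU should report a single grapheme for this tag sequence).

```php
<?php
// England flag, written with code point escapes so the invisible
// tag characters are explicit: U+1F3F4 BLACK FLAG, the tag
// characters g b e n g, then U+E007F CANCEL TAG.
$flag = "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}";

$bytes      = strlen($flag);             // counts UTF-8 bytes
$codepoints = mb_strlen($flag, 'UTF-8'); // counts Unicode code points
$graphemes  = grapheme_strlen($flag);    // counts grapheme clusters (intl)

printf(
    "%d bytes, %d code points, %d grapheme(s)\n",
    $bytes,
    $codepoints,
    $graphemes
);
```

Each of the seven code points lies outside the Basic Multilingual Plane and so takes four bytes in UTF-8, which is where the 28 comes from.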

The strategy is to offer byte, code point and grapheme capabilities,
to downgrade from graphemes to code points for non-UTF-8 encodings,
and to default to graphemes.
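
A minimal sketch of how such a strategy could look for a length function follows. The function name lengthIn, its parameters and the unit names are hypothetical and chosen for illustration only; they are not the API of this revision. It assumes the mbstring and intl extensions are available.

```php
<?php
/**
 * Hypothetical helper (not part of Keruald): returns the length of
 * $text in the requested unit, defaulting to graphemes.
 */
function lengthIn(
    string $text,
    string $unit = 'graphemes',
    string $encoding = 'UTF-8'
): int {
    // The grapheme_* functions only work on UTF-8, so for any other
    // encoding we downgrade from graphemes to code points.
    if ($unit === 'graphemes' && strcasecmp($encoding, 'UTF-8') !== 0) {
        $unit = 'codepoints';
    }

    switch ($unit) {
        case 'bytes':
            return strlen($text);
        case 'codepoints':
            return mb_strlen($text, $encoding);
        case 'graphemes':
            $count = grapheme_strlen($text);
            if (!is_int($count)) {
                // grapheme_strlen() does not return an int on invalid input.
                throw new InvalidArgumentException('Text is not valid UTF-8.');
            }
            return $count;
        default:
            throw new InvalidArgumentException("Unknown unit: $unit");
    }
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT.
$text = "e\u{0301}";

echo lengthIn($text, 'bytes'), "\n";
echo lengthIn($text, 'codepoints'), "\n";
echo lengthIn($text), "\n"; // default unit: graphemes

// A single ISO-8859-1 byte: graphemes are downgraded to code points.
echo lengthIn("\xE9", 'graphemes', 'ISO-8859-1'), "\n";
```

Keeping the downgrade in one place at the top of the function means callers never have to know whether grapheme support applies to their encoding.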

Diff Detail

Repository
rKERUALD Keruald libraries development repository
Lint
No Lint Coverage
Unit
No Test Coverage
Branch
mbstring-to-grapheme
Build Status
Buildable 3987
Build 4239: arc lint + arc unit

Event Timeline

dereckson held this revision as a draft.