
WIP: support Grapheme functions for UTF-8 strings
DraftPublic

Authored by dereckson on Feb 21 2022, 23:04.
Tags
None
Subscribers
None
This is a draft revision that has not yet been submitted for review.

Details

Reviewers
None
Summary

The intl extension supports the grapheme concept from Unicode,
while the mbstring extension handles UTF-8 codepoints.

An emoji like 🏴󠁧󠁢󠁥󠁮󠁧󠁿 takes 28 bytes and 7 codepoints (a black flag, the
tags g b e n g, and a cancel tag), but forms 1 grapheme. That affects
methods like substr or strlen when we want to manipulate graphemes
and not codepoints.
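
To make the three counts concrete, here is a minimal sketch comparing the byte, codepoint and grapheme lengths of that flag. It assumes PHP 7 or later with the mbstring and intl extensions loaded; the `$flag` and `$counts` variable names are only illustrative.

```php
<?php
// England flag: U+1F3F4 (black flag), then the tag characters
// g, b, e, n, g, closed by U+E007F (cancel tag).
$flag = "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}";

$counts = [
    'bytes'      => strlen($flag),              // raw bytes
    'codepoints' => mb_strlen($flag, 'UTF-8'),  // mbstring counts codepoints
    'graphemes'  => grapheme_strlen($flag),     // intl counts grapheme clusters
];

print_r($counts);
```

Each of the 7 codepoints lies outside the Basic Multilingual Plane, so each one takes 4 bytes in UTF-8, hence 28 bytes.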

The strategy is to offer bytes/codepoints/graphemes capabilities,
downgrade from graphemes to codepoints for non-UTF-8 encodings,
and default to graphemes.
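
The strategy above could be sketched as follows. This is not the code from this revision: `text_length`, its parameters and the unit names are hypothetical, chosen only to illustrate the default and the downgrade rule. It assumes the mbstring and intl extensions are loaded.

```php
<?php
// Hypothetical helper, not part of the Keruald API.
function text_length(
    string $text,
    string $unit = 'graphemes',   // default to graphemes
    string $encoding = 'UTF-8'
): int {
    // The grapheme_* functions only work on UTF-8 strings,
    // so fall back to codepoints for any other encoding.
    if ($unit === 'graphemes' && strcasecmp($encoding, 'UTF-8') !== 0) {
        $unit = 'codepoints';
    }

    switch ($unit) {
        case 'bytes':
            return strlen($text);
        case 'codepoints':
            return mb_strlen($text, $encoding);
        case 'graphemes':
            $length = grapheme_strlen($text);
            if (!is_int($length)) {
                // grapheme_strlen() returns false or null on invalid input.
                throw new InvalidArgumentException('Invalid UTF-8 string');
            }
            return $length;
        default:
            throw new InvalidArgumentException("Unknown unit: $unit");
    }
}
```

With this sketch, a caller that passes only the string gets grapheme semantics, while a Latin-1 string such as `"caf\xE9"` passed with `'ISO-8859-1'` is silently counted in codepoints.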

Diff Detail

Repository
rKERUALD Keruald libraries development repository
Lint
No Lint Coverage
Unit
No Test Coverage
Branch
mbstring-to-grapheme
Build Status
Buildable 3987
Build 4239: arc lint + arc unit

Event Timeline

dereckson held this revision as a draft.