WIP: support Grapheme functions for UTF-8 strings
DraftPublic

Authored by dereckson on Feb 21 2022, 23:04.
Tags
None
Referenced Files
F2937011: D2550.id6432.diff
Mon, May 13, 13:18
F2934327: D2550.diff
Mon, May 13, 08:07
Subscribers
None
This is a draft revision that has not yet been submitted for review.

Details

Reviewers
None
Summary

The intl extension supports the grapheme concept from Unicode,
while the mbstring extension handles UTF-8 codepoints.

An emoji like 🏴󠁧󠁢󠁥󠁮󠁧󠁿 is 28 bytes, 7 codepoints (the black flag, the tag
characters g b e n g, and a cancel tag), and 1 grapheme. That affects
methods like substr or strlen when we want to manipulate graphemes
rather than codepoints.
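
To make the three counts concrete, here is a minimal sketch in Python (not the PHP of this revision). The helper names `is_extending` and `count_graphemes` are hypothetical, and the clustering rule is a deliberate simplification of Unicode text segmentation: it only attaches combining marks, variation selectors and tag characters to the preceding character, which is enough for this flag but not for every emoji sequence.

```python
import unicodedata

# England flag: U+1F3F4 (black flag), tag letters g b e n g, then U+E007F (cancel tag).
ENGLAND_FLAG = (
    "\U0001F3F4\U000E0067\U000E0062\U000E0065\U000E006E\U000E0067\U000E007F"
)


def is_extending(char: str) -> bool:
    """Rough stand-in for 'this character extends the previous cluster'.

    Assumption: covers only combining marks, variation selectors and tag
    characters, not the full Unicode grapheme cluster rules.
    """
    code = ord(char)
    return (
        unicodedata.combining(char) != 0   # combining marks such as U+0301
        or 0xFE00 <= code <= 0xFE0F        # variation selectors
        or 0xE0020 <= code <= 0xE007F      # tag characters, including cancel tag
    )


def count_graphemes(text: str) -> int:
    """Count clusters: the first character and every non-extending one start a cluster."""
    return sum(1 for i, ch in enumerate(text) if i == 0 or not is_extending(ch))


byte_count = len(ENGLAND_FLAG.encode("utf-8"))   # UTF-8 bytes
codepoint_count = len(ENGLAND_FLAG)              # Python str length is in code points
grapheme_count = count_graphemes(ENGLAND_FLAG)   # simplified cluster count

print(byte_count, codepoint_count, grapheme_count)  # prints "28 7 1"
```

In PHP, the corresponding measurements would presumably come from strlen, mb_strlen and grapheme_strlen respectively.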

The strategy is to offer bytes/codepoints/graphemes capabilities,
to downgrade from graphemes to codepoints for non-UTF-8 encodings,
and to default to graphemes.
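
The strategy above could be modelled roughly as follows. This is a sketch in Python under stated assumptions, not the code of this revision: `Unit`, `effective_unit` and `length` are hypothetical names, and the grapheme count reuses a simplified rule (combining marks, variation selectors and tag characters attach to the previous character) instead of full Unicode segmentation.

```python
import unicodedata
from enum import Enum


class Unit(Enum):
    BYTES = "bytes"
    CODEPOINTS = "codepoints"
    GRAPHEMES = "graphemes"


def is_utf8(encoding: str) -> bool:
    """Accept common spellings such as 'UTF-8', 'utf8' and 'utf_8'."""
    return encoding.lower().replace("-", "").replace("_", "") == "utf8"


def effective_unit(unit: Unit, encoding: str) -> Unit:
    """Downgrade graphemes to codepoints when the encoding is not UTF-8."""
    if unit is Unit.GRAPHEMES and not is_utf8(encoding):
        return Unit.CODEPOINTS
    return unit


def _extends(char: str) -> bool:
    # Simplified assumption: only these characters join the previous cluster.
    code = ord(char)
    return (
        unicodedata.combining(char) != 0
        or 0xFE00 <= code <= 0xFE0F
        or 0xE0020 <= code <= 0xE007F
    )


def length(data: bytes, encoding: str = "UTF-8",
           unit: Unit = Unit.GRAPHEMES) -> int:
    """Length of an encoded string in the requested unit (graphemes by default)."""
    unit = effective_unit(unit, encoding)
    if unit is Unit.BYTES:
        return len(data)
    text = data.decode(encoding)
    if unit is Unit.CODEPOINTS:
        return len(text)
    return sum(1 for i, ch in enumerate(text) if i == 0 or not _extends(ch))


flag = "\U0001F3F4\U000E0067\U000E0062\U000E0065\U000E006E\U000E0067\U000E007F"
utf8_default = length(flag.encode("utf-8"))                 # graphemes
utf16_default = length(flag.encode("utf-16-le"), "UTF-16LE")  # downgraded to codepoints
```

Keeping the downgrade in one small function means callers never have to check the encoding themselves; they ask for graphemes and get the closest unit the encoding supports.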

Diff Detail

Repository
rKERUALD Keruald libraries development repository
Lint
No Lint Coverage
Unit
No Test Coverage
Branch
mbstring-to-grapheme
Build Status
Buildable 3987
Build 4239: arc lint + arc unit

Event Timeline

dereckson held this revision as a draft.