
Support Grapheme functions for UTF-8 strings
ClosedPublic

Authored by dereckson on Feb 21 2022, 23:04.
Tags
None
Referenced Files
F4037367: D2550.diff
Tue, Jan 21, 22:44
Subscribers
None

Details

Summary

The intl extension supports the grapheme concept from Unicode,
while the mbstring extension handles UTF-8 codepoints.

An emoji like 🏴󠁧󠁢󠁥󠁮󠁧󠁿 has 28 bytes, 7 codepoints (a black flag, the tag
characters g b e n g, and a cancel tag),
1 grapheme. That can affect methods like substr or strlen when
we want to manipulate graphemes and not codepoints.
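
The three counts above can be checked with a minimal sketch. It assumes PHP 7 or later with the mbstring and intl extensions loaded, and an ICU version that treats emoji tag characters as part of the preceding cluster. The variable names are illustrative only.

```php
<?php
// England flag, written with escapes so the invisible tag characters survive copy/paste:
// U+1F3F4 (black flag), tags g b e n g, then U+E007F (cancel tag).
$flag = "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}";

$bytes      = strlen($flag);              // counts bytes
$codepoints = mb_strlen($flag, 'UTF-8');  // counts Unicode codepoints
$graphemes  = grapheme_strlen($flag);     // counts grapheme clusters (intl)

printf("bytes=%d codepoints=%d graphemes=%d\n", $bytes, $codepoints, $graphemes);

// Cutting at the codepoint level splits the emoji: only the black flag remains.
$firstCodepoint = mb_substr($flag, 0, 1, 'UTF-8');
// Cutting at the grapheme level keeps the whole sequence together.
$firstGrapheme  = grapheme_substr($flag, 0, 1);
```

Here `$firstCodepoint` is 4 bytes long while `$firstGrapheme` is the full 28-byte string, which is the difference the summary refers to for substr.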

The strategy is to offer byte, codepoint, and grapheme capabilities,
to downgrade from graphemes to codepoints for non-UTF-8 encodings,
and to default to graphemes.
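
One way to picture this strategy is the sketch below. It is not the code from this revision: `LengthUnit` and `sketch_length` are hypothetical names made up for illustration, and the sketch assumes PHP 8.1 or later (for enums and `match`) with mbstring and intl available.

```php
<?php
declare(strict_types=1);

// Hypothetical names for illustration; not the Keruald API.
enum LengthUnit {
    case Bytes;
    case Codepoints;
    case Graphemes;
}

function sketch_length(
    string $text,
    string $encoding = 'UTF-8',
    LengthUnit $unit = LengthUnit::Graphemes  // graphemes are the default
): int {
    // The intl grapheme functions work on UTF-8 strings, so any other
    // encoding is downgraded from graphemes to codepoints.
    if ($unit === LengthUnit::Graphemes && strcasecmp($encoding, 'UTF-8') !== 0) {
        $unit = LengthUnit::Codepoints;
    }

    return match ($unit) {
        LengthUnit::Bytes      => strlen($text),
        LengthUnit::Codepoints => mb_strlen($text, $encoding),
        // grapheme_strlen() returns false or null on failure;
        // this sketch simply maps that to 0 through the cast.
        LengthUnit::Graphemes  => (int) grapheme_strlen($text),
    };
}

$flag   = "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}";
$latin1 = mb_convert_encoding('café', 'ISO-8859-1', 'UTF-8');

$byDefault     = sketch_length($flag);                                   // graphemes
$asCodepoints  = sketch_length($flag, 'UTF-8', LengthUnit::Codepoints);
$asBytes       = sketch_length($flag, 'UTF-8', LengthUnit::Bytes);
$latin1Default = sketch_length($latin1, 'ISO-8859-1');                   // downgraded to codepoints
```

Keeping the downgrade in a single place means callers can always ask for graphemes and still get a sensible codepoint count on legacy encodings.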

Test Plan

Tests added for new methods.

Diff Detail

Repository
rKERUALD Keruald libraries development repository
Lint
Lint Errors
Severity | Location | Code | Message
Error | omnitools/src/Strings/Multibyte/StringUtilities.php:14 | PHPCS.E.Generic.Files.LineLength.MaxExceeded | Generic.Files.LineLength.MaxExceeded
Warning | omnitools/tests/Strings/Multibyte/OmniStringTest.php:52 | PHPCS.W.Generic.CodeAnalysis.UnusedFunctionParameter.FoundInExtendedClassAfterLastUsed | Generic.CodeAnalysis.UnusedFunctionParameter.FoundInExtendedClassAfterLastUsed
Warning | omnitools/tests/Strings/Multibyte/OmniStringTest.php:52 | PHPCS.W.Generic.CodeAnalysis.UnusedFunctionParameter.FoundInExtendedClassAfterLastUsed | Generic.CodeAnalysis.UnusedFunctionParameter.FoundInExtendedClassAfterLastUsed
Warning | omnitools/tests/Strings/Multibyte/OmniStringTest.php:52 | PHPCS.W.Generic.Files.LineLength.TooLong | Generic.Files.LineLength.TooLong
Warning | omnitools/tests/Strings/Multibyte/OmniStringTest.php:58 | PHPCS.W.Generic.CodeAnalysis.UnusedFunctionParameter.FoundInExtendedClassAfterLastUsed | Generic.CodeAnalysis.UnusedFunctionParameter.FoundInExtendedClassAfterLastUsed
Warning | omnitools/tests/Strings/Multibyte/OmniStringTest.php:58 | PHPCS.W.Generic.CodeAnalysis.UnusedFunctionParameter.FoundInExtendedClassBeforeLastUsed | Generic.CodeAnalysis.UnusedFunctionParameter.FoundInExtendedClassBeforeLastUsed
Warning | omnitools/tests/Strings/Multibyte/OmniStringTest.php:58 | PHPCS.W.Generic.Files.LineLength.TooLong | Generic.Files.LineLength.TooLong
Warning | omnitools/tests/Strings/Multibyte/OmniStringTest.php:64 | PHPCS.W.Generic.CodeAnalysis.UnusedFunctionParameter.FoundInExtendedClassBeforeLastUsed | Generic.CodeAnalysis.UnusedFunctionParameter.FoundInExtendedClassBeforeLastUsed
Warning | omnitools/tests/Strings/Multibyte/OmniStringTest.php:64 | PHPCS.W.Generic.CodeAnalysis.UnusedFunctionParameter.FoundInExtendedClassBeforeLastUsed | Generic.CodeAnalysis.UnusedFunctionParameter.FoundInExtendedClassBeforeLastUsed
Warning | omnitools/tests/Strings/Multibyte/OmniStringTest.php:64 | PHPCS.W.Generic.Files.LineLength.TooLong | Generic.Files.LineLength.TooLong
Warning | omnitools/tests/Strings/Multibyte/OmniStringTest.php:84 | PHPCS.W.Generic.Files.LineLength.TooLong | Generic.Files.LineLength.TooLong
Warning | omnitools/tests/Strings/Multibyte/OmniStringTest.php:91 | PHPCS.W.Generic.Files.LineLength.TooLong | Generic.Files.LineLength.TooLong
Unit
Tests Passed
Branch
mbstring-to-grapheme
Build Status
Buildable 5700
Build 5982: arc lint + arc unit

Event Timeline

dereckson held this revision as a draft.

We need unit tests for this change.

Spacing issues. Adding tests.

dereckson retitled this revision from "WIP: support Grapheme functions for UTF-8 strings" to "Support Grapheme functions for UTF-8 strings". (Nov 11 2024, 23:50)
dereckson edited the test plan for this revision.
dereckson published this revision for review. (Nov 17 2024, 00:37)
dereckson accepted this revision.
This revision is now accepted and ready to land. (Nov 17 2024, 00:37)
This revision was landed with ongoing or failed builds. (Nov 17 2024, 00:38)
This revision was automatically updated to reflect the committed changes.