
Update word tokenizer
Closed, Public

Authored by dereckson on Feb 27 2018, 21:19.
Tags
None
Subscribers
None

Details

Summary

Switch to the Treebank tokenizer to split a sentence into words.

PunktWordTokenizer is now considered an internal part of punkt,
not a public API class.

See https://github.com/nltk/nltk/commit/0b91a7160717faa2fe93d42f7c6bba735f6dd48a
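For reference, a minimal sketch of what word splitting with the Treebank tokenizer can look like. This is not the code from this diff: the helper name `split_words` is hypothetical, and it assumes NLTK is installed. `TreebankWordTokenizer` is regex-based, so no extra NLTK data download should be needed.

```python
from nltk.tokenize import TreebankWordTokenizer

# One shared tokenizer instance; it holds no per-sentence state.
_tokenizer = TreebankWordTokenizer()


def split_words(sentence):
    """Split a single sentence into word tokens (hypothetical helper)."""
    return _tokenizer.tokenize(sentence)


if __name__ == "__main__":
    # Contractions are split off and the final period becomes its own token.
    print(split_words("They'll save and invest more."))
```

The Treebank tokenizer expects one sentence at a time, so text should be cut into sentences first (for example with punkt) before being passed to it.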

Test Plan

Tested with Ada Palmer's The Will to Battle.

Diff Detail

Lint
No Lint Coverage
Unit
No Test Coverage
Branch
update-tokenizer (branched from master)
Build Status
Buildable 2132
Build 2380: arc lint + arc unit

Event Timeline

dereckson created this revision.
This revision is now accepted and ready to land. Feb 27 2018, 21:20