
characterCategories

Unicode character categories

    Description


    ucats = characterCategories(str32) returns the major Unicode character categories for the characters in the UTF32 object str32.


    ucats = characterCategories(str32,'Granularity',granularity) also specifies the granularity of the returned categories. For example, characterCategories(str32,'Granularity','detailed') returns detailed Unicode character categories.

    Examples


    Convert the string "Hello! 😀" to its Unicode UTF-32 string representation using the textanalytics.unicode.UTF32 function.

    str = "Hello! 😀";
    str32 = textanalytics.unicode.UTF32(str)

    str32 =
      UTF32 with properties:
        Data: [72 101 108 108 111 33 32 128512]

    Get the Unicode character categories of str32 using the characterCategories function.

    ucats = characterCategories(str32)
    ucats = 1x1 cell array
        {[L L L L L P Z S]}

    The Unicode character categories "L", "P", "Z", and "S" correspond to "letter", "punctuation", "separator", and "symbol", respectively.
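    Because each cell of ucats holds a categorical vector, the categories can be compared against a category name to build a logical mask. The following is a minimal sketch rather than part of the original example; the variable names isLetter and numLetters are illustrative only.

```matlab
% Sketch: count the letters in a string by testing the major category.
str = "Hello! 😀";
str32 = textanalytics.unicode.UTF32(str);
ucats = characterCategories(str32);

% ucats{1} is a categorical vector with one element per code point.
isLetter = ucats{1} == 'L';     % true where the major category is "L"
numLetters = nnz(isLetter);     % number of code points classified as letters
```

    Given the categories shown above, the mask is true for the first five code points and false for the punctuation mark, the space, and the emoji.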

    Convert the string "Hello! 😀" to its Unicode UTF-32 string representation using the textanalytics.unicode.UTF32 function.

    str = "Hello! 😀";
    str32 = textanalytics.unicode.UTF32(str)

    str32 =
      UTF32 with properties:
        Data: [72 101 108 108 111 33 32 128512]

    Get the Unicode character categories of str32 using the characterCategories function. To return detailed Unicode character categories, set the 'Granularity' option to 'detailed'.

    ucats = characterCategories(str32,'Granularity','detailed')
    ucats = 1x1 cell array
        {[Lu Ll Ll Ll Ll Po Zs So]}

    The Unicode character categories "Lu", "Ll", "Po", "Zs", and "So" correspond to "uppercase letter", "lowercase letter", "other punctuation", "space separator", and "other symbol", respectively.

    Input Arguments


    str32 – UTF-32 string representation, specified as a UTF32 array.

    'Granularity' – Granularity of returned Unicode character categories, specified as one of the following:

    • 'major' – Return the major Unicode character category, which is the first character of the detailed category name only (for example, "L"). This is the behavior when the option is not specified.

    • 'detailed' – Return the detailed Unicode character category, which is the full two-character category name (for example, "Lu").
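    The relationship between the two granularities can be checked directly: keeping only the first character of each detailed category name should reproduce the major category. The following is a minimal sketch under that assumption; the variable names are illustrative and the check is not part of the original page.

```matlab
% Sketch: compare the 'major' and 'detailed' results for the same input.
str32 = textanalytics.unicode.UTF32("Hello! 😀");
major    = characterCategories(str32,'Granularity','major');
detailed = characterCategories(str32,'Granularity','detailed');

% Convert the categorical vectors to string arrays of category names.
majorNames    = string(major{1});
detailedNames = string(detailed{1});

% Keep only the first character of each detailed name, e.g. "Lu" -> "L".
firstChars = extractBefore(detailedNames,2);

% Expected to be true if each major name is the first character
% of the corresponding detailed name.
tf = isequal(firstChars,majorNames);
```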

    Output Arguments


    ucats – Unicode character categories, returned as a cell array of categorical vectors.

    This table shows the major and detailed Unicode character categories. To specify which granularity of Unicode character categories to return, use the 'Granularity' option.

    Major Category   Major Category Description   Detailed Category   Detailed Category Description
    L                Letter                       Lu                  Uppercase letter
                                                  Ll                  Lowercase letter
                                                  Lt                  Titlecase letter
                                                  Lm                  Modifier letter
                                                  Lo                  Other letter
    M                Mark                         Mn                  Nonspacing mark
                                                  Mc                  Spacing mark
                                                  Me                  Enclosing mark
    N                Number                       Nd                  Decimal number
                                                  Nl                  Letter number
                                                  No                  Other number
    P                Punctuation                  Pc                  Connector punctuation
                                                  Pd                  Dash punctuation
                                                  Ps                  Open punctuation
                                                  Pe                  Close punctuation
                                                  Pi                  Initial punctuation
                                                  Pf                  Final punctuation
                                                  Po                  Other punctuation
    S                Symbol                       Sm                  Math symbol
                                                  Sc                  Currency symbol
                                                  Sk                  Modifier symbol
                                                  So                  Other symbol
    Z                Separator                    Zs                  Space separator
                                                  Zl                  Line separator
                                                  Zp                  Paragraph separator
    C                Other                        Cc                  Control
                                                  Cf                  Format
                                                  Cs                  Surrogate
                                                  Co                  Private use
                                                  Cn                  Unassigned
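
    The table can also be used as a lookup from category codes to readable descriptions. The following is a minimal sketch, assuming the seven major categories listed above; majorCodes, majorDescriptions, and the other variable names are hypothetical helpers, not part of the toolbox.

```matlab
% Sketch: translate major category codes into their descriptions.
% The two arrays below are copied by hand from the table above.
majorCodes        = ["L" "M" "N" "P" "S" "Z" "C"];
majorDescriptions = ["Letter" "Mark" "Number" "Punctuation" ...
                     "Symbol" "Separator" "Other"];

str32 = textanalytics.unicode.UTF32("Hello! 😀");
ucats = characterCategories(str32);

% Convert the categorical vector to a string array of codes.
codes = string(ucats{1});

% Find the position of each code in majorCodes, then index the descriptions.
[~,idx] = ismember(codes,majorCodes);
descriptions = majorDescriptions(idx);
```

    The same pattern would extend to the detailed categories by listing the two-character codes and their descriptions instead.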

    References

    [1] Unicode® Standard Annex #44: Unicode Character Database. https://www.unicode.org/reports/tr44/

    Version History

    Introduced in R2021a