
nt2int

Convert nucleotide sequence from letter to integer representation

Syntax

SeqInt = nt2int(SeqChar)
SeqInt = nt2int(SeqChar, ..., 'Unknown', UnknownValue, ...)
SeqInt = nt2int(SeqChar, ..., 'ACGTOnly', ACGTOnlyValue, ...)

Input Arguments

SeqChar

One of the following:

UnknownValue Integer to represent unknown nucleotides. Choices are integers ≥ 0 and ≤ 255. Default is 0.
ACGTOnlyValue Controls the prohibition of ambiguous nucleotides. Choices are true or false (default). If ACGTOnlyValue is true, you can enter only the characters A, C, G, T, and U.

Output Arguments

SeqInt Nucleotide sequence specified by a row vector of integers.

Description

SeqInt = nt2int(SeqChar) converts SeqChar, a character vector or string specifying a nucleotide sequence, to SeqInt, a row vector of integers specifying the same nucleotide sequence. For valid codes, see the table Mapping Nucleotide Letter Codes to Integers. Unknown characters (characters not in the table) are mapped to 0. Gaps represented with hyphens are mapped to 16.
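To make the default conversion concrete, here is a minimal Python sketch of the letter-to-integer lookup described above. It is an illustration only, not the Bioinformatics Toolbox implementation: the names `nt2int_sketch` and `NT_TO_INT` are hypothetical, and the sketch assumes uppercase input (lowercase letters are not addressed here). It relies on the fact that, in the mapping table, the codes A through N take consecutive integers 1 to 15 in the order A, C, G, T, R, Y, K, M, S, W, B, D, H, V, N.

```python
from typing import List

# Hypothetical lookup table (illustration only, not the toolbox's code).
# Letters listed in table order, so each letter's integer is its index + 1:
# A=1, C=2, G=3, T=4, R=5, Y=6, K=7, M=8, S=9, W=10, B=11, D=12, H=13, V=14, N=15.
_ORDER = "ACGTRYKMSWBDHVN"
NT_TO_INT = {ch: i + 1 for i, ch in enumerate(_ORDER)}
NT_TO_INT["U"] = 4   # uridine shares the integer used for T
NT_TO_INT["-"] = 16  # gap of indeterminate length


def nt2int_sketch(seq: str) -> List[int]:
    """Map each character to its table integer; characters not in the table map to 0.

    Assumption: seq is uppercase; lowercase handling is not modeled here.
    """
    return [NT_TO_INT.get(ch, 0) for ch in seq]


if __name__ == "__main__":
    print(nt2int_sketch("ACTGCTAGC"))  # [1, 2, 4, 3, 2, 4, 1, 3, 2]
    print(nt2int_sketch("AC-X"))       # [1, 2, 16, 0]
```

A dictionary lookup with a default value keeps the "unknown maps to 0" rule in one place, which mirrors how the table's last row acts as a catch-all.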

SeqInt = nt2int(SeqChar, ..., 'PropertyName', PropertyValue, ...) calls nt2int with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

SeqInt = nt2int(SeqChar, ..., 'Unknown', UnknownValue, ...) specifies an integer to represent unknown nucleotides. UnknownValue can be an integer ≥ 0 and ≤ 255. Default is 0.

SeqInt = nt2int(SeqChar, ..., 'ACGTOnly', ACGTOnlyValue, ...) controls the prohibition of ambiguous nucleotides (N, R, Y, K, M, S, W, B, D, H, and V). Choices are true or false (default). If ACGTOnlyValue is true, you can enter only the characters A, C, G, T, and U.
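The two properties above can be sketched in Python as extra parameters on the lookup. This is a hedged illustration, not the toolbox's behavior verbatim: the function name `nt2int_options_sketch` and the parameters `unknown` and `acgt_only` are hypothetical stand-ins for the 'Unknown' and 'ACGTOnly' properties. Raising `ValueError` when a disallowed character appears under `acgt_only=True`, and rejecting every character other than A, C, G, T, and U in that mode, are assumptions about how "prohibition" is enforced.

```python
from typing import List

# Hypothetical table in the documented order: integer = index + 1 (A=1 ... N=15).
_ORDER = "ACGTRYKMSWBDHVN"
NT_TO_INT = {ch: i + 1 for i, ch in enumerate(_ORDER)}
NT_TO_INT.update({"U": 4, "-": 16})  # U shares T's integer; gap is 16

# Characters permitted when acgt_only is true, per the description above.
ACGT_ONLY_ALLOWED = set("ACGTU")


def nt2int_options_sketch(seq: str, unknown: int = 0, acgt_only: bool = False) -> List[int]:
    """Illustrative analogue of the 'Unknown' and 'ACGTOnly' properties."""
    # The documented range for the unknown value is an integer from 0 to 255.
    if not isinstance(unknown, int) or not 0 <= unknown <= 255:
        raise ValueError("unknown must be an integer between 0 and 255")
    if acgt_only:
        bad = sorted(set(seq) - ACGT_ONLY_ALLOWED)
        if bad:
            # Assumption: disallowed input is rejected with an error.
            raise ValueError(f"characters not allowed when acgt_only is true: {bad}")
    # Characters not in the table fall back to the caller-supplied unknown value.
    return [NT_TO_INT.get(ch, unknown) for ch in seq]


if __name__ == "__main__":
    print(nt2int_options_sketch("ACNX", unknown=99))    # [1, 2, 15, 99]
    print(nt2int_options_sketch("ACGU", acgt_only=True))  # [1, 2, 3, 4]
```

Validating the options before converting means a bad option value fails fast, even for an empty sequence.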

Mapping Nucleotide Letter Codes to Integers

Nucleotide                                Code   Integer
Adenosine                                 A      1
Cytidine                                  C      2
Guanine                                   G      3
Thymidine                                 T      4
Uridine (if 'Alphabet' set to 'RNA')      U      4
Purine (A or G)                           R      5
Pyrimidine (T or C)                       Y      6
Keto (G or T)                             K      7
Amino (A or C)                            M      8
Strong interaction (3 H bonds) (G or C)   S      9
Weak interaction (2 H bonds) (A or T)     W      10
Not A (C or G or T)                       B      11
Not C (A or G or T)                       D      12
Not G (A or C or T)                       H      13
Not T or U (A or C or G)                  V      14
Any nucleotide (A or C or G or T or U)    N      15
Gap of indeterminate length               -      16
Unknown (any character not in table)      *      0 (default)

Examples

Example 44. Converting a Simple Sequence

Convert a nucleotide sequence from letters to integers.

    s = nt2int('ACTGCTAGC')

    s =

         1     2     4     3     2     4     1     3     2

Example 45. Converting a Random Sequence
  1. Create a random character vector to represent a nucleotide sequence. Because randseq returns a random sequence, your output will differ from the one shown.

    SeqChar = randseq(20)

    SeqChar =

    TTATGACGTTATTCTACTTT
  2. Convert the nucleotide sequence from letter to integer representation.

    SeqInt = nt2int(SeqChar)

    SeqInt =

      Columns 1 through 13

         4     4     1     4     3     1     2     3     4     4     1     4     4

      Columns 14 through 20

         2     4     1     2     4     4     4

Version History

Introduced before R2006a