nt2int
Convert nucleotide sequence from letter to integer representation
Syntax
SeqInt = nt2int(SeqChar)
SeqInt = nt2int(SeqChar, ...'Unknown', UnknownValue, ...)
SeqInt = nt2int(SeqChar, ...'ACGTOnly', ACGTOnlyValue, ...)
Input Arguments
SeqChar |
Character vector or string specifying a nucleotide sequence. |
UnknownValue |
Integer to represent unknown nucleotides. Choices are integers ≥ 0 and ≤ 255. Default is 0. |
ACGTOnlyValue |
Controls the prohibition of ambiguous nucleotides. Choices are true or false (default). If ACGTOnlyValue is true, you can enter only the characters A, C, G, T, and U. |
Output Arguments
SeqInt |
Nucleotide sequence specified by a row vector of integers. |
Description
SeqInt = nt2int(SeqChar) converts SeqChar, a character vector or string specifying a nucleotide sequence, to SeqInt, a row vector of integers specifying the same nucleotide sequence. For valid codes, see the table Mapping Nucleotide Letter Codes to Integers. Unknown characters (characters not in the table) are mapped to 0. Gaps represented with hyphens are mapped to 16.
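As a minimal sketch of the default behavior described above, the following call mixes standard codes, a gap, an ambiguous code, and a character that is not a nucleotide code. The input string is made up for illustration, and the expected values in the comment are read off the mapping table rather than captured from a session. Running it requires the toolbox that provides `nt2int`.

```matlab
% 'Z' is not a nucleotide letter code, so it counts as unknown.
% '-' is a gap and 'N' is the "any nucleotide" code.
seq = 'ACGT-NZ';

codes = nt2int(seq);

% Per the mapping table: A=1, C=2, G=3, T=4, gap=16, N=15, unknown=0
disp(codes)
```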
SeqInt = nt2int(SeqChar, ...'PropertyName', PropertyValue, ...) calls nt2int with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
SeqInt = nt2int(SeqChar, ...'Unknown', UnknownValue, ...) specifies an integer to represent unknown nucleotides. UnknownValue can be an integer ≥ 0 and ≤ 255. Default is 0.
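One possible use of the 'Unknown' property is to pick a sentinel value that is easy to search for afterward. The sketch below assumes an input containing one character ('Z') that is not a nucleotide code; the variable names are illustrative only.

```matlab
% Map any unknown character to 255 instead of the default 0.
seq = 'ACZGT';
codes = nt2int(seq, 'Unknown', 255);

% Locate the positions that held unknown characters.
unknownPositions = find(codes == 255);
disp(unknownPositions)
```

Choosing a value outside the range 0 to 16 used by the mapping table keeps the sentinel from colliding with a valid nucleotide or gap code.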
SeqInt = nt2int(SeqChar, ...'ACGTOnly', ACGTOnlyValue, ...) controls the prohibition of ambiguous nucleotides (N, R, Y, K, M, S, W, B, D, H, and V). Choices are true or false (default). If ACGTOnlyValue is true, you can enter only the characters A, C, G, T, and U.
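The following sketch exercises the 'ACGTOnly' property. It assumes that a prohibited ambiguous code surfaces as a run-time error, which this page does not state explicitly, so the second call is wrapped in try/catch and no particular error message is assumed.

```matlab
% Unambiguous input is accepted. Per the mapping table, U shares the
% integer 4 with T.
codes = nt2int('ACGTU', 'ACGTOnly', true);
disp(codes)

% 'N' is an ambiguous code. Assumption: the prohibition is reported as
% an error, which is caught here instead of stopping the script.
try
    nt2int('ACGTN', 'ACGTOnly', true);
    disp('Ambiguous code was accepted.')
catch err
    disp(['Rejected: ' err.message])
end
```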
Mapping Nucleotide Letter Codes to Integers
Nucleotide | Code | Integer |
---|---|---|
Adenosine | A | 1 |
Cytidine | C | 2 |
Guanine | G | 3 |
Thymidine | T | 4 |
Uridine (if 'Alphabet' set to 'RNA') | U | 4 |
Purine (A or G) | R | 5 |
Pyrimidine (T or C) | Y | 6 |
Keto (G or T) | K | 7 |
Amino (A or C) | M | 8 |
Strong interaction (3 H bonds) (G or C) | S | 9 |
Weak interaction (2 H bonds) (A or T) | W | 10 |
Not A (C or G or T) | B | 11 |
Not C (A or G or T) | D | 12 |
Not G (A or C or T) | H | 13 |
Not T or U (A or C or G) | V | 14 |
Any nucleotide (A or C or G or T or U) | N | 15 |
Gap of indeterminate length | - | 16 |
Unknown (any character not in table) | * | 0 (default) |
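To make the table concrete, here is a rough base-MATLAB sketch of the same mapping built as a lookup array. It is not the implementation of nt2int; the names `letters`, `values`, `lut`, and `unknownValue` are hypothetical, the call to `upper` is this sketch's own choice for handling lowercase input, and only characters with codes 0 through 255 are supported.

```matlab
% Letter codes and their integers, in the order listed in the table.
letters = 'ACGTRYKMSWBDHVN-';
values  = uint8(1:16);

% Start with every character mapped to the unknown value (0 by default).
unknownValue = uint8(0);
lut = repmat(unknownValue, 1, 256);

% Index by character code + 1, because MATLAB indexing starts at 1.
lut(double(letters) + 1) = values;
lut(double('U') + 1) = 4;          % U shares the integer 4 with T

% Convert a sequence by indexing the lookup array.
seq = 'ACTGCTAGC';
codes = lut(double(upper(seq)) + 1);
disp(codes)                        % 1 2 4 3 2 4 1 3 2
```

A lookup array turns the conversion into a single indexing operation, and changing `unknownValue` before building `lut` mirrors the effect of the 'Unknown' property.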
Examples
Convert a nucleotide sequence from letters to integers.

    s = nt2int('ACTGCTAGC')

    s =
         1     2     4     3     2     4     1     3     2

Create a random character vector to represent a nucleotide sequence.

    SeqChar = randseq(20)

    SeqChar =
    TTATGACGTTATTCTACTTT

Convert the nucleotide sequence from letter to integer representation.

    SeqInt = nt2int(SeqChar)

    SeqInt =
      Columns 1 through 13
         4     4     1     4     3     1     2     3     4     4     1     4     4
      Columns 14 through 20
         2     4     1     2     4     4     4
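One reason to work with the integer form is that the codes can serve directly as array indices. As an illustrative sketch, the snippet below tallies base counts with `accumarray`. It assumes the sequence contains no unknown characters, since the default unknown code 0 is not a valid index; the variable names are hypothetical.

```matlab
% Convert the sequence, then count how often each integer code occurs.
s = nt2int('ACTGCTAGC');

% One bin per code 1..16 from the mapping table.
counts = accumarray(double(s(:)), 1, [16 1]);

% The first four bins correspond to A, C, G, and T.
fprintf('A=%d C=%d G=%d T=%d\n', counts(1), counts(2), counts(3), counts(4));
```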