在寻找字符串时强调？

Posted byLoren Shure.那2020年12月3日

23.views (last 30 days) |0.喜欢|0评论

当您必须搜索字符串模式时，您是否会得到Clammy手，而不仅仅是一个特定的字符串？挣扎的想法是regexp.make you sweat?

不再担心！您现在可以使用新的许多搜索patternfeature in MATLAB. And in some cases, you can get away with even less.

对于今天的帖子，我的合着者是Jason Breslau和Curtis Anderson，因为他们更多地了解更多regexp.比我，以及关于功能的更多细微差别。我们将通过展示一些例子来完成此操作。您也想查看Jiro最近的Pick of the Week。

内容

示例0：对于那些喜欢Regexp的人

This example is for you if you loveregexp.并且不要明白为什么你应该考虑其他任何东西。您可以使用regexppattern.to convert your favorite regular expression to a pattern so you can take advantages of code features. Compare these two ways to see if a string is contained in some text.

包含（str，regexppattern（expr））

〜Cellfun（'isempty'，Regexp（str，expr））

每次阅读它时，您都可以快速了解哪一个，而无需通过逻辑？

And now more for those who really would prefer to skipregexp.more often.

示例1：计数注释行

假设我想在Matlab文件中计算注释的行（不是阻止注释）。以下是怎么做的regexp.：

codeFile = fileread('num2str.m'）;comments = regexp(codeFile,'^ \ s *％'那'lineanchors'）;numel（评论）

ans = 37

That's annoying as you had to know about lineanchors, and that regular expression is a little ugly. Plus, it returned an array of indices, that we don't really care about. Instead, try this:

数数(codeFile, lineBoundary + whitespacePattern +“％”的）

ans = 37

我们仍然需要阅读该文件，我们寻找线条（忽略领先的空白），有效地以％开头。

示例2：如何在按元音开始的文件中查找单词

假设我们希望获得以元音开始的单词的一些统计数据。

vowelwords = regexpi（codefile，'\<[aeiou][a-z]*'那'match'）;howManyWords = length(vowelWords)

howManyWords = 176

使用pattern，我们首先搜索单词，这是字母字符。然后只看起来像元音开始的那些。

单词=提取物（Codefile，LettersPattern）;vowelwords1 =单词（startswith（lock，characthlistpattern（'aeiou'），'Ignorecase'， 真的））;HowManyWords =长度（vowelwords1）

howManyWords = 176

And here's perhaps an even better way to do this! Build a pattern from a list of the vowels. And then look for something that has a boundary before a letter - some whitespace, followed by a vowel and then for the possible rest of the word.

vowel = caseInsensitivePattern(characterListPattern(“aeiou”的））;vowelWords2Pat = letterBoundary + vowel + lettersPattern(0,inf); vowelWords2 = extract(codeFile, vowelWords2Pat); howManyWords = length(vowelWords2)

howManyWords = 176

Assuming we只要想要伯爵在这里，我们可以用呼叫替换前一行的代码数数更有效地达到目标。关于建立复杂模式的工作流程的好事是它为您提供的多功能性。

hmw = count（codefile，vowelwords2pat，"IgnoreCase"， 真的）

hmw = 176.

Or I could replaceletterspattern.(0,Inf)和optionalPattern(lettersPattern)。能够为模式提供模式数数那以。。开始and包含是最大的胜利。

最佳实践

我们发现最好通过加入较小的碎片来建立一种模式。它使得更容易理解您在做什么，您所在的方式或不适用案例敏感性等。

示例3：匡威 - 查找以辅音开头的单词

假设我们想要以辅音开始的单词。这是这一点regexp.way.

consregexp = regexpi（codefile，'\ <（？！[aeiou]）[a-z] +'那'match'）;

并使用模式

conspat = extract（codefile，。。。alphanumericBoundary +。。。〜Lookaheadboundary（CaseInsensityPattern（CharacthListPattern（'aeiou'的）的）的）。。。+ lettersPattern);

And finally using neitherregexp.nor a consonant pattern. Instead, use the negation of the starting with vowel words. This is the easiest to understand, perhaps.

gymings = plats（〜startswith（lock，caseInsensitypattern（characthlistpattern（'aeiou'）））））;

精明的读者会看到这里的答案不同意。有关详细信息，请参阅结尾附录，以及如何获得对齐的答案。

示例4：查找具有某些扩展的文件

我们审核了我们测试系统一个阶段中使用的所有正则表达式，发现其中大约50％的人可以被取代以。。结束根本没有模式。以前我们使用过regexp.但这是这份工作的巨大锤子。我认为寻找具有特定文件扩展名的文件可能是一个常见的用例。喜欢，

Regexp（filename，'.txt $'）

有两个虫子！你需要isempty那and'一次'：

〜isempty（regexp（filename，'.txt $'，'一次'））

And you also have to escape the dot, which everyone forgets to do.

〜isempty（regexp（filename，'\ .txt $'，'一次'））

相反，你只是做

endswith（filename，'.txt'）

这interesting things is that this uses no pattern at all but uses a function,以。。结束那that could take a pattern.

Suppose you now want to check for 2 different extensions. Easily done.以。。结束万博1manbetx支持多个搜索字符串，并将其视为or。This is faster but a bit more limited than doing a search with a proper pattern.

以。。结束(fileName, [".txt", ".somethingElse"])

与明确或

endswith（filename，“.txt”|“.somethingelse”）

没有扩展的文件呢？

What if a filename ends with either txt or no extension at all?

endswith（filename，'.txt'）||~contains(fileName, '.')

这是一个文件，没有完整的路径名。

你的搜索怎么样？

您是否能够在MATLAB中充分利用模式，并能够（或不）消除一些或所有使用regexp.。让我们知道这里。

附录

正如所承诺的那样，我会在这里描述为什么辅音答案不同意以及如何使它们相同。

这字变量具有连续字母组。我们有一些名字num2str使用数字的代码，例如，MAT2STR.。这转化为2个单词，垫andstr。We can fix this using

字= extract(codeFile, alphanumericBoundary ... + lettersPattern + alphanumericBoundary);

这意味着regexp.version is:

consregexp = regexpi（codefile，'\<(?![aeiou])[a-z]+\>', 'match');

and the corresponding pattern:

conspat = extract（codefile，... alphanumericboundary + ...〜pookaheadboundary（caltsissensitypattern（charactlinglistpattern（'aeiou'））... + letterspattern ... +/phanumericBoundary）;

Phew! That's a mouthful! But pretty readable too.

发布与MATLAB®R2020B