Search - regular expressions

The extended search and replace tool and the text search tool in Spider let you use regular expressions. A regular expression is a special string of characters that defines a rule, or pattern, against which the text being searched is checked. Special metacharacters let you specify, for example, that the searched string must occur at the beginning or end of a line, or that it must contain a certain number of repetitions of selected characters. Regular expressions look complicated to beginners, but they are usually quite simple, handy and very useful.

Simple match

Each individual character matches itself unless it is a metacharacter with a special meaning, described below. A sequence of characters matches the same sequence in the target string, so, for example, the pattern lubudu matches the string lubudu in the target string. To have characters that normally act as metacharacters or escape sequences interpreted as ordinary characters, prefix them with a backslash \. For example, the metacharacter ^ matches the beginning of a line, whereas \^ matches the ^ character itself, \\ matches the \ character, and so on. Examples:
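The escaping rule can be tried out in a short sketch. It uses Python's re module as a stand-in, which is an assumption: the engine built into the program may differ in details, and the sample text is made up for illustration.

```python
import re

# Sample text (hypothetical); it contains a caret and a single backslash.
text = "price: 5^2 costs C:\\temp lubudu"

# Ordinary characters match themselves.
plain = re.search(r"lubudu", text)

# A backslash turns the metacharacter ^ into a literal caret.
caret = re.search(r"5\^2", text)

# Without the backslash, ^ is an anchor and cannot match in the middle.
anchor = re.search(r"5^2", text)

# Two backslashes in the pattern match one literal backslash.
backslash = re.search(r"C:\\temp", text)

print(plain.group(), caret.group(), backslash.group(), anchor)
```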
Escape sequences

Individual characters can be written as escape sequences, using a syntax similar to that of C or Perl. For example, \n stands for a newline, \t for a TAB character, and so on. A more general construct is \xnn, where nn is a string of hexadecimal digits; it matches the character whose ASCII code is nn. For Unicode characters, use \x{nnnn}, where nnnn is one or more hexadecimal digits.
Examples:
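A minimal sketch of escape sequences, again assuming Python's re module rather than the program's own engine. Note one difference: Python writes the Unicode form as \uXXXX and does not accept \x{nnnn}, so the last pattern below is the Python equivalent, not the syntax described above.

```python
import re

# Sample text (hypothetical): a TAB, a newline, the letter A and U+0105.
text = "name\tvalue\nA\u0105"

tab = re.search(r"name\tvalue", text)     # \t matches a TAB character
newline = re.search(r"value\nA", text)    # \n matches a newline
hex_a = re.search(r"\x41", text)          # \x41 is code 0x41, the letter A

# Python spelling of a Unicode escape; the program described here uses \x{0105}.
unicode_char = re.search(r"\u0105", text)

print(tab is not None, newline is not None, hex_a.group(), unicode_char.start())
```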
Character classes

You can specify a character class by enclosing a list of characters in square brackets []; the class matches any single character from the list. If the first character after [ is ^, the class is negated: it matches any character that is not in the list. Examples:
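A class and its negation can be compared side by side in a small sketch (Python's re module is assumed here, and the word list is made up):

```python
import re

words = ["lubedu", "lubadu", "lubudu", "lubidu"]

# [aeu] accepts exactly one of the listed characters in that position.
in_class = [w for w in words if re.fullmatch(r"lub[aeu]du", w)]

# [^aeu] accepts any single character that is NOT in the list.
not_in_class = [w for w in words if re.fullmatch(r"lub[^aeu]du", w)]

print(in_class)
print(not_in_class)
```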
Inside the list, you can use the - character to specify a range of characters; for example, a-z stands for all characters from a to z inclusive. If you want the - character itself to be a member of the class, that is, a character to search for, place it at the beginning of the list or precede it with a backslash. The same applies to the ] character: place it at the beginning of the list or precede it with a backslash. Examples:
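The sketch below shows ranges and the two ways of including - and ] as literal members of a class. It assumes Python's re module, whose class syntax matches the rules above on these points; the sample strings are invented.

```python
import re

# A range: every run of lowercase letters.
lowercase = re.findall(r"[a-z]+", "abc-DEF]ghi")

# A hyphen placed first in the list is a literal member of the class.
hyphen_first = re.findall(r"[-a-c]+", "a-b-c d")

# An escaped hyphen is literal too; here the class is just a, - and c.
hyphen_escaped = re.findall(r"[a\-c]+", "a-b-c d")

# A closing bracket is literal when placed first or when escaped.
bracket_first = re.findall(r"[]a]+", "a]]b")
bracket_escaped = re.findall(r"[a\]]+", "a]]b")

print(lowercase, hyphen_first, hyphen_escaped, bracket_first, bracket_escaped)
```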
Metacharacters

Metacharacters are special characters that are the essence of regular expressions. There are several types of metacharacters, described below.

Metacharacters - line separators
Examples
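The line-anchor behavior covered in this section can be sketched with Python's re module, which is an assumption and not the program's engine. In Python the re.M flag plays the role of the /m switch and re.S the role of /s. One difference to keep in mind: in Python the dot does not match a newline unless re.S is given, so its default is the opposite of the one described for this program.

```python
import re

text = "first line\nsecond line"

# Without multi-line mode, ^ matches only at the start of the whole string.
single = re.findall(r"^\w+", text)

# With multi-line mode, ^ also matches after each line separator.
multi = re.findall(r"^\w+", text, re.M)

# \A and \Z ignore multi-line mode: only the very start and the very end.
start_only = re.findall(r"\A\w+", text, re.M)
ends_multi = re.findall(r"\w+$", text, re.M)
end_only = re.findall(r"\w+\Z", text, re.M)

# The dot stops at a line separator unless "dot matches all" is enabled.
dot_stops = re.search(r"first.*", text).group()
dot_all = re.search(r"first.*", text, re.S).group()

print(single, multi, start_only, ends_multi, end_only)
```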
By default, the ^ metacharacter matches only at the beginning of the input string, and $ only at the end. Internal line separators are therefore not matched by ^ or $. However, you may wish to treat the string as multi-line, so that ^ matches after any line separator and $ matches before any line separator. You can do this with the /m switch.

The metacharacters \A and \Z work like ^ and $, except that they do not match at internal line separators when the /m switch is used, whereas ^ and $ then match at every internal line separator.

The . metacharacter matches any character by default, but if you disable the /s switch, the . metacharacter will not match line separator characters.

Metacharacters - predefined classes
You can use \w, \d, and \s inside your own character classes. Examples:
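A short sketch of the predefined classes, on their own and inside custom classes. Python's re module is assumed, and the sample strings are invented.

```python
import re

text = "id_42: 3.14, x-7"

digits = re.findall(r"\d+", text)          # runs of digits
words = re.findall(r"\w+", text)           # runs of letters, digits and _

# Predefined classes can be members of a custom class.
numbers = re.findall(r"[\d.]+", text)      # digits or a literal dot
tokens = re.findall(r"[\w-]+", text)       # word characters or a hyphen

# \s inside a class: split on any mix of whitespace and commas.
parts = re.split(r"[\s,]+", "a, b  c")

print(digits, words, numbers, tokens, parts)
```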
Metacharacters - iterators

Each element of a regular expression can be followed by a type of metacharacter called an iterator. Iterators let you specify how many times the preceding character, metacharacter or subexpression repeats in the string.
The numbers in braces in the form {n,m} specify the minimum (n) and maximum (m) number of repetitions required for a match. The form {n} is equivalent to {n,n} and matches exactly n repetitions. The form {n,} matches n or more repetitions. There is no limit on the size of n or m, but large values can consume more memory and slow down matching. If a curly bracket occurs in any other context, it is treated as an ordinary character. Examples
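The counted iterators, together with their greedy and non-greedy forms, can be sketched as follows. Python's re module is assumed; it uses the same {n,m} and ? notation, but it is not the program's own engine.

```python
import re

text = "a ab abb abbb abbbb"

exactly_two = re.findall(r"\bab{2}\b", text)       # a followed by exactly 2 b
two_to_three = re.findall(r"\bab{2,3}\b", text)    # a followed by 2 or 3 b
at_least_three = re.findall(r"\bab{3,}\b", text)   # a followed by 3 or more b

# A brace that does not form an iterator is an ordinary character.
literal_brace = re.search(r"a{x", "a{x").group()

# Greedy iterators take as much as possible, non-greedy ones as little.
greedy_plus = re.search(r"b+", "abbbbc").group()
lazy_plus = re.search(r"b+?", "abbbbc").group()
greedy_range = re.search(r"b{2,3}", "abbbbc").group()
lazy_range = re.search(r"b{2,3}?", "abbbbc").group()

print(exactly_two, two_to_three, at_least_three)
print(greedy_plus, lazy_plus, greedy_range, lazy_range)
```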
The terms greedy and non-greedy used in the list of iterators need some explanation. In short, a greedy iterator takes as many characters as possible, and a non-greedy one as few as possible. For example, applied to abbbbc, b+ returns bbbb and b+? returns b; b{2,3} returns bbb and b{2,3}? returns bb. Likewise, b* takes as many b characters as it can, whereas b*? returns an empty string. All iterators can be put into non-greedy mode using the /g modifier.

Metacharacters - alternatives

You can specify a group of alternatives for a pattern by separating them with |, so that, for example, lut|lup|luk matches any lut, lup or luk in the target string (lu(t|p|k) does the same). The first alternative contains everything from the last pattern delimiter ((, [ or the beginning of the pattern) up to the first |, and the last alternative contains everything from the last | to the next pattern delimiter. For this reason it is common practice to put alternatives in parentheses, to minimize confusion about where they begin and end.

Alternatives are tried from left to right, so the first alternative for which the entire expression matches is the one selected. This means that alternatives are not necessarily greedy. For example, when foo|foot is matched against barefoot, only foo matches, because it is the first alternative that successfully matches the target string.

Also note that | inside square brackets is interpreted as an ordinary character, so [luk|lup|lut] actually means the same as the class [lukpt|].

Example: luk(asz|iem) matches two strings: lukasz or lukiem.

Metacharacters - subexpressions

The parenthesis construct (...) can also be used to define subexpressions. Subexpressions are numbered from left to right in the order of their opening parentheses. The first subexpression has number 1 (the entire match of the regular expression has number 0). Examples:
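The numbering rule, and the left-to-right choice among alternatives, can be checked in a minimal sketch. Python's re module is assumed, and the sample text is hypothetical.

```python
import re

# Groups are numbered by the position of their opening parenthesis.
m = re.search(r"(lu(t|p|k))-(\d+)", "code luk-42")
whole = m.group(0)     # the entire match
outer = m.group(1)     # (lu(t|p|k))
inner = m.group(2)     # (t|p|k)
number = m.group(3)    # (\d+)

# The first alternative that lets the whole expression match wins.
first_alt = re.search(r"foo|foot", "barefoot").group()

# Inside square brackets | is an ordinary character, not an alternative.
bar_in_class = re.fullmatch(r"[luk|lup|lut]", "|") is not None

print(whole, outer, inner, number, first_alt, bar_in_class)
```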
Metacharacters - backward references

Metacharacters \1 through \9 are interpreted as backward references: \n matches the same text that was previously matched by subexpression number n. Examples:
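Two typical uses of backward references, sketched with Python's re module (an assumption; the sample sentences are invented): finding a doubled word, and matching a closing quote of the same kind as the opening one.

```python
import re

# \1 must repeat exactly what group 1 matched: finds doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test case")

# The closing quote must be the same character as the opening quote.
quoted = re.search(r"(['\"])(.*?)\1", "say 'hi' or \"bye\"")
quote_char = quoted.group(1)
quoted_text = quoted.group(2)

print(doubled, quote_char, quoted_text)
```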
Modifiers

Modifiers let you change the behavior of the regular expression search. There are several ways to set modifiers. Each modifier can be included in a regular expression using the (?...) construct.
The /x modifier needs some explanation. It tells the program to ignore whitespace that is neither preceded by a backslash nor inside a character class. You can use this modifier to break regular expressions into more readable parts. The # character is then also treated as a metacharacter that starts a comment, for example:

    (
    (abc) # comment 1
      |   # You can use spaces to comment regexp
    (efg) # comment 2
    )

This means that if you want to include spaces or the # character in a pattern (outside a character class, where the /x modifier does not apply to them), you must either precede them with a backslash or encode them using hexadecimal or octal character codes. In summary, these properties let you make a regular expression more readable.

Extensions from Perl

(?imsxr-imsxr) Lets you change modifiers on the fly inside a regular expression. If such a construct is included in a subexpression, it affects only that subexpression. Examples:
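A sketch of extended mode and of a modifier limited to part of a pattern, assuming Python's re module. Two differences from the description above should be kept in mind: Python enables extended mode with the re.X flag, and it limits a modifier to part of a pattern with the (?i:...) form, while a bare (?i) is only accepted at the very start of the pattern.

```python
import re

# Extended mode: unescaped whitespace and # comments are ignored.
pattern = re.compile(r"""
    (
      (abc)   # comment 1
        |     # whitespace around the bar is ignored
      (efg)   # comment 2
    )
    \ \#      # an escaped space and an escaped hash are literal
""", re.X)

extended = pattern.search("xx efg # yy").group(0)

# Case-insensitivity applied to one part of the pattern only.
scoped_hit = re.search(r"(?i:gallery)_first", "GALLERY_first")
scoped_miss = re.search(r"(?i:gallery)_first", "GALLERY_FIRST")

print(repr(extended), scoped_hit is not None, scoped_miss is None)
```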
(?#text) A comment whose text is ignored. Note that the program closes the comment as soon as it encounters the metacharacter ), so there is no way to include the character ) in the comment.

Using "Replace with" in regular expression search results

You will often want to reuse the matched text in the Replace with field. To do so, place the appropriate symbol in the replacement text: $1, $2 and so on, or $0 (where $1 is the text matched by the first subexpression, $2 by the second, and so on, and $0 is the entire matched phrase).

Example: Let us assume that many documents contain, among other things, the following text:

<a href="gallery_first.html">First gallery</a>
<a href="gallery_second.html">Second gallery</a>
<a href="gallery_third.html">Third gallery</a>
<a href="https://guestbook.com/index.php">Guest book</a>

You have decided to rewrite the gallery pages in PHP, so all links to them must be fixed. At first glance, the easiest way would be to simply change the string .html to .php everywhere. However, this is not a good idea, because the extension would also be changed in every other link that contains .html. Instead, use the regular expression capabilities of the Extended Search and Replace tool.

In the 'Find' field, enter:

gallery_([a-z0-9]+){1}\.html

This finds every string consisting of 'gallery_', followed by a run of lowercase letters or digits (the entire run is treated as one occurrence because it is enclosed in parentheses followed by {1}), and finally the .html extension. The dot is preceded by a backslash so that it matches a literal period rather than any character.

In the 'Replace with' field, enter:

gallery_$1.php

This means that each phrase found is replaced with 'gallery_', followed by the run of lowercase letters or digits that was matched ('first', 'second' and 'third' respectively), followed by the extension '.php'. The guest book link will of course remain unchanged.
The result will be the following content:

<a href="gallery_first.php">First gallery</a>
<a href="gallery_second.php">Second gallery</a>
<a href="gallery_third.php">Third gallery</a>
<a href="https://guestbook.com/index.php">Guest book</a>
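The same replacement can be sketched outside the program with Python's re module, which is an assumption. In Python the replacement string uses \1 where the Replace with field uses $1. The sample markup, including the guest book URL, is hypothetical and only illustrates that links not matching the pattern are left alone.

```python
import re

# Sample markup assumed for illustration; the guest book URL is hypothetical.
page = (
    '<a href="gallery_first.html">First gallery</a>\n'
    '<a href="gallery_second.html">Second gallery</a>\n'
    '<a href="https://guestbook.example/index.html">Guest book</a>\n'
)

# Group 1 captures the part of the file name after "gallery_".
# \1 in the replacement inserts that captured text again.
converted = re.sub(r"gallery_([a-z0-9]+)\.html", r"gallery_\1.php", page)

print(converted)
```

Only links whose file name starts with gallery_ are rewritten, which is the reason for using a pattern instead of a plain text replacement.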