

JavaScript Ch8 - Regular Expressions


What's in this chapter:

8.1 Defining Regular Expression

In JavaScript, regular expressions are represented by RegExp objects. RegExp objects may be created with the RegExp() constructor, but they are more often created using a special literal syntax. Regular expression literals are specified as characters within a pair of slash (/) characters. Thus, your JavaScript code may contain something like this:
var pattern = /s$/;
This line creates a new RegExp object and assigns it to the variable pattern. This particular RegExp object matches any string that ends with the letter "s". This regular expression could have equivalently been defined with the RegExp() constructor like this:
var pattern = new RegExp("s$");
Regular expression pattern specifications consist of a series of characters. Most characters, including all alphanumeric characters, simply describe characters to be matched literally. Thus, the regular expression /java/ matches any string that contains the substring "java". Other characters in regular expressions are not matched literally, but have special significance. For example, the regular expression /s$/ contains two characters. The first, "s", matches itself literally. The second, "$", is a special metacharacter that matches the end of the string. Thus, this regular expression matches any string that has the letter s as its last character.
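
The two ways of defining a pattern described above can be compared with a minimal sketch. The sample strings and variable names here are illustrative assumptions, not part of the original text; test() is the standard RegExp method that returns true when the pattern matches somewhere in the string.

```javascript
// Two equivalent ways to build the same pattern.
var literalPattern = /s$/;
var constructedPattern = new RegExp("s$");

// Both patterns match a string whose last character is "s".
var endsWithS = literalPattern.test("regular expressions");
var sameResult = constructedPattern.test("regular expressions");

// A string ending in "p" does not match.
var noMatch = literalPattern.test("regexp");

// /java/ matches any string that contains the substring "java".
var containsJava = /java/.test("I like javascript");

console.log(endsWithS, sameResult, noMatch, containsJava);
// prints: true true false true
```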

8.1.1 Literal Characters

All alphabetic characters and digits match themselves literally in regular expressions. JavaScript regular expression syntax also supports certain non-alphabetic characters through escape sequences that begin with a backslash (\). For example, the sequence \n matches a literal newline character in a string. Table 8-1 lists these characters.

Table 8-1. Regular Expression literal characters

Character               Matches
Alphanumeric character  Itself
\0                      The NUL character (\u0000)
\t                      Tab (\u0009)
\n                      Newline (\u000A)
\v                      Vertical tab (\u000B)
\f                      Form feed (\u000C)
\r                      Carriage return (\u000D)
\xnn                    The Latin character specified by the hexadecimal number nn; for example, \x0A is the same as \n
\uxxxx                  The Unicode character specified by the hexadecimal number xxxx; for example, \u0009 is the same as \t
\cX                     The control character ^X; for example, \cJ is equivalent to the newline character \n

A number of punctuation characters have special meanings in regular expressions. They are:
^ $ . * + ? = ! : | \ / ( ) [ ] { }
If you want to include any of these characters literally in a regular expression, you must precede them with a backslash (\). Other punctuation characters, such as quotation marks and @, do not have special meaning and simply match themselves literally in a regular expression.
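
As a small sketch of the escape sequences and the backslash rule above, the following compares several spellings of the newline character and shows the effect of escaping "$". The sample strings and variable names are assumptions chosen for illustration.

```javascript
// \x0A, \u000A and \cJ are alternative spellings of the newline \n.
var newline = "\n";
var hexMatches = /\x0A/.test(newline);
var unicodeMatches = /\u000A/.test(newline);
var controlMatches = /\cJ/.test(newline);

// Escaped, "\$" matches a literal dollar sign.
var escapedDollar = /\$5/.test("costs $5");

// Unescaped, "$" means end of string, so nothing can follow it.
var unescapedDollar = /$5/.test("costs $5");

// Quotation marks and @ match themselves without escaping.
var plainPunctuation = /"a@b"/.test('say "a@b"');

console.log(hexMatches, unicodeMatches, controlMatches);  // prints: true true true
console.log(escapedDollar, unescapedDollar, plainPunctuation); // prints: true false true
```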

8.1.2 Character Classes: [ ]

Individual literal characters can be combined into character classes by placing them within square brackets. A character class matches any character that is contained within it. Thus, the regular expression /[abc]/ matches any one of the letters a, b or c.

Negated character classes can also be defined; these match any character except those contained within the brackets. A negated character class is specified by placing a caret (^) as the first character inside the left bracket. The regular expression /[^abc]/ matches any one character other than a, b or c.
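
A brief sketch of a character class and its negation follows. The test strings are assumptions picked to show one match and one non-match for each pattern.

```javascript
var anyAbc = /[abc]/;   // matches one of a, b or c
var notAbc = /[^abc]/;  // matches one character other than a, b or c

// "cat" contains "c" and "a"; "dog" contains none of a, b, c.
var catHasAbc = anyAbc.test("cat");
var dogHasAbc = anyAbc.test("dog");

// "abc" consists only of excluded characters; "abcd" contains "d".
var abcHasOther = notAbc.test("abc");
var abcdHasOther = notAbc.test("abcd");

console.log(catHasAbc, dogHasAbc, abcHasOther, abcdHasOther);
// prints: true false false true
```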

Character classes can use a hyphen to indicate a range of characters. To match any lowercase character from the Latin alphabet, use /[a-z]/, and to match any letter or digit from the Latin alphabet, use /[a-zA-Z0-9]/.
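
Ranges can be tried out with a short sketch like the one below. The inputs and variable names are illustrative assumptions. Note that the hyphen only denotes a range inside square brackets; outside them it is an ordinary literal character.

```javascript
var lowercase = /[a-z]/;
var alphanumeric = /[a-zA-Z0-9]/;

// Only the second string contains a lowercase letter.
var upperOnly = lowercase.test("ABC");
var mixedCase = lowercase.test("ABc");

// Only the second string contains a letter or digit.
var dashesOnly = alphanumeric.test("---");
var dashesAndDigit = alphanumeric.test("-7-");

// Without brackets, /a-z/ matches the literal three-character text "a-z".
var literalRangeText = /a-z/.test("a-z");
var literalRangeLetter = /a-z/.test("m");

console.log(upperOnly, mixedCase, dashesOnly, dashesAndDigit); // prints: false true false true
console.log(literalRangeText, literalRangeLetter);             // prints: true false
```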

Because certain character classes are commonly used, JavaScript regular expression syntax includes special characters and escape sequences to represent these common classes. For example, \s matches the space character, the tab character and any other Unicode whitespace character. \S matches any character that is not Unicode whitespace. Table 8-2 lists these characters and summarizes character class syntax.

Table 8-2. Regular expression character classes

Character  Matches
[...]      Any one character between the brackets.
[^...]     Any one character not between the brackets.
.          Any character except newline or another Unicode line terminator.
\w         Any ASCII word character. Equivalent to [a-zA-Z0-9_].
\W         Any character that is not an ASCII word character. Equivalent to [^a-zA-Z0-9_].
\s         Any Unicode whitespace character.
\S         Any character that is not Unicode whitespace. Note that \w and \S are not the same thing.
\d         Any ASCII digit. Equivalent to [0-9].
\D         Any character other than an ASCII digit. Equivalent to [^0-9].
[\b]       A literal backspace (special case).

Note that the special character class escapes can be used within square brackets. \s matches any whitespace character and \d matches any digit, so [\s\d] matches any one whitespace character or digit.
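
The class escapes can be exercised with a minimal sketch such as the following. The sample inputs and variable names are assumptions for illustration only.

```javascript
// [\s\d] matches any one whitespace character or digit.
var whitespaceOrDigit = /[\s\d]/;
var lettersOnly = whitespaceOrDigit.test("abc");
var withSpace = whitespaceOrDigit.test("a c");
var withDigit = whitespaceOrDigit.test("a1c");

// The underscore counts as a word character, so \w matches it and \W does not.
var underscoreIsWord = /\w/.test("_");
var underscoreIsNonWord = /\W/.test("_");

// \D finds nothing in a string made only of digits.
var nonDigitInDigits = /\D/.test("123");

// "." does not match a newline.
var dotMatchesNewline = /./.test("\n");

console.log(lettersOnly, withSpace, withDigit);        // prints: false true true
console.log(underscoreIsWord, underscoreIsNonWord);    // prints: true false
console.log(nonDigitInDigits, dotMatchesNewline);      // prints: false false
```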

Note: there is one special case. As we'll see later, the \b escape has a special meaning. When used within a character class, however, it represents the backspace character. Thus, to represent a backspace character literally in a regular expression, use the character class with one element: /[\b]/.
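
The backspace special case can be checked with a short sketch. The variable names are assumptions; the comparison relies on the fact that, outside a character class, \b in JavaScript is a word-boundary assertion rather than a character.

```javascript
// In a string literal, "\b" is the backspace character (\u0008).
var backspaceChar = "\b";

// Inside a character class, \b matches that backspace character.
var classMatchesBackspace = /[\b]/.test(backspaceChar);

// It does not match the ordinary letter "b".
var classMatchesLetterB = /[\b]/.test("b");

// Outside a class, \b asserts a word boundary; a lone backspace has none.
var boundaryMatchesBackspace = /\b/.test(backspaceChar);

console.log(classMatchesBackspace, classMatchesLetterB, boundaryMatchesBackspace);
// prints: true false false
```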



puthik.com ©2008