So far, we've seen how to create regular expressions, what flags are, and how regular expressions work under the hood. But all we've seen are simple string patterns such as /code/ and /name/. In this video, and the videos after this, we'll learn the more advanced patterns we can create in regex.
For this lesson, we'll look at character classes.
What is a Character Class?
A character class in regex, also called a character set, lets you specify a set of characters, any one of which can match at that position in a string.
You create a character class using square brackets [...]:
In those square brackets, you list the characters, and any one of them can match a single character in the string.
Let's look at a character class for letters.
For example, let's say you want to match an a or an e before “pple” in a string. You can do this:
In this pattern, you have the character class which contains a and e, then after the character class, you have pple. I applied the g flag so we can get all matches.
And let's say we have a string such as “Is it spelt epple or apple?”:
As you can see, "epple" and "apple" are matches for the pattern.
For this pattern, the regex engine looks for an "a" or an "e". Upon finding either of these characters, the regex engine checks if the character is followed by "pple", before it considers the substring a match.
This pattern will match "epple", which begins with "e", and "apple" which begins with "a". The character set here is made up of two characters: "a" and "e".
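Here's a minimal sketch of that example, assuming JavaScript (the regex-literal syntax and the g flag fit what we've used so far). The variable names are only for illustration:

```javascript
// Character class [ae] followed by the literal text "pple".
// The g flag makes match() return every match instead of just the first.
const fruitPattern = /[ae]pple/g;
const fruitText = "Is it spelt epple or apple?";

const fruitMatches = fruitText.match(fruitPattern);
console.log(fruitMatches); // [ 'epple', 'apple' ]
```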
Note that this regular expression will not match “aepple”.
Let's say I change apple to aepple:
You see, "aepple" is not a match as a whole; only the "epple" part is a match. This is because a character class matches exactly one character: this or that. It doesn't match “ae”, it only matches “a” or “e”.
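To see where the match starts, here is a small sketch in JavaScript (an assumption about the language; the variable names are hypothetical). It uses exec, which reports the position of the match:

```javascript
const misspelled = "aepple";

// At index 0 the class matches "a", but the next character is "e", not "p",
// so the engine moves on and tries again from index 1.
const firstMatch = /[ae]pple/.exec(misspelled);

console.log(firstMatch[0]);    // "epple"
console.log(firstMatch.index); // 1, so the leading "a" is left out
```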
You can create character classes for digits too. For example:
The character class in this pattern specifies 2 or 4 or 6. Remember, not 2 followed by 4 followed by 6. It is 2 or 4 or 6. In the pattern, this means 2 or 4 or 6, followed by "-", then "people".
Let's say we have a string like “He says there are 4-people, 6-people and 46-people”:
You see that 4-people and 6-people are matched.
Again, you see that the full string 46-people is not matched. Only its "6-people" part matches, because the character class consumes a single digit.
That's because character classes specify a this or that, not a this and that.
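A minimal sketch of the digit example, assuming JavaScript and a pattern written as [246]-people (reconstructed from the description; the variable names are illustrative):

```javascript
const peoplePattern = /[246]-people/g;
const peopleText = "He says there are 4-people, 6-people and 46-people";

const peopleMatches = peopleText.match(peoplePattern);
// The third match is the "6-people" inside "46-people"; the "4" is not included.
console.log(peopleMatches); // [ '4-people', '6-people', '6-people' ]

// On its own, "46-people" only yields the part starting at the 6.
const partial = "46-people".match(peoplePattern);
console.log(partial); // [ '6-people' ]
```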
You can create character classes for symbols too. For example:
The character class in this pattern specifies the ampersand or hyphen symbol. And the whole pattern means ampersand or hyphen followed by the letter s.
Let's say we have a string like “He used 5&s, 6%s and 7-s”. You see that &s and -s are matched.
%s is not matched because % is not part of the character class.
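Here's how that could look in JavaScript. The pattern [&-]s is my reconstruction of the one described; note that the hyphen is placed last in the class so it is read as a literal hyphen rather than as part of a range:

```javascript
// & or - followed by the letter s. The hyphen comes last, so it is literal.
const symbolPattern = /[&-]s/g;
const symbolText = "He used 5&s, 6%s and 7-s";

const symbolMatches = symbolText.match(symbolPattern);
console.log(symbolMatches); // [ '&s', '-s' ]
```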
Range in Character Classes
In character classes, you can also specify a range.
Let's say we want to match "bing", "sing", "ring", "king", and "ping". You can use a character set in a regex pattern like this:
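As a sketch, assuming JavaScript, the character set for those five words could be written as [bsrkp]ing. The test string and the extra word "wing" are my own additions to show a non-match:

```javascript
const rhymePattern = /[bsrkp]ing/g;
const rhymeText = "bing sing ring king ping wing";

const rhymeMatches = rhymeText.match(rhymePattern);
// "wing" is skipped because w is not in the character class.
console.log(rhymeMatches); // [ 'bing', 'sing', 'ring', 'king', 'ping' ]
```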
What if you want to match ALL four-letter words that end in "ing"? Then your character set would look like this: [abcdefghijklmnopqrstuvwxyz]ing
While this works, there's a way to make it shorter and more concise: ranges. With a range, instead of typing a, b, c, d, all the way to z, you can simply specify the range a to z like this:
This range in the character class will match any lowercase letter from a to z, followed by "ing".
Let's say we have a string like “He is a king, he likes to sing. He rings the bell. He pings the phone. He uses bing. Oh what a thing. He doesn't know the word ling”.
Feel my rhymes? (with your glasses)
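For the matches, you see that every letter from a to z that is followed by ing is matched together with it. In longer words, only that four-character part is matched: "ring" in "rings", "ping" in "pings", and "hing" in "thing". So using ranges makes your pattern shorter and more concise.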
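A minimal sketch of the range version, assuming JavaScript (variable names are illustrative). It also checks that the long spelled-out class gives the same result:

```javascript
const rangePattern = /[a-z]ing/g;
const longPattern = /[abcdefghijklmnopqrstuvwxyz]ing/g;
const poem =
  "He is a king, he likes to sing. He rings the bell. He pings the phone. " +
  "He uses bing. Oh what a thing. He doesn't know the word ling";

const rangeMatches = poem.match(rangePattern);
const longMatches = poem.match(longPattern);

console.log(rangeMatches);
// [ 'king', 'sing', 'ring', 'ping', 'bing', 'hing', 'ling' ]
console.log(rangeMatches.join() === longMatches.join()); // true
```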
Also, you don't always have to use full ranges like a to z. You can also have shorter ranges like:
This pattern will match a substring which starts with a letter between a and m followed by ing.
Here's another example:
This pattern will match a substring which starts with a letter between r and w followed by ing.
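Both shorter ranges can be tried side by side. This is a sketch in JavaScript, with [a-m]ing and [r-w]ing as my reading of the two patterns described and a made-up test string:

```javascript
const words = "king sing ring ping bing thing ling";

// a to m: matches k, b, h (inside "thing") and l before "ing".
const aToM = words.match(/[a-m]ing/g);
console.log(aToM); // [ 'king', 'bing', 'hing', 'ling' ]

// r to w: matches s and r before "ing".
const rToW = words.match(/[r-w]ing/g);
console.log(rToW); // [ 'sing', 'ring' ]
```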
You can also combine multiple ranges in one character class.
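For instance, the two ranges from the previous examples could be placed in one class. This is a hypothetical pattern of my own, sketched in JavaScript:

```javascript
// a to m OR r to w, followed by "ing". Letters n to q are excluded.
const combinedPattern = /[a-mr-w]ing/g;
const combinedText = "king sing ring ping bing thing ling";

const combinedMatches = combinedText.match(combinedPattern);
// "ping" is missing because p falls outside both ranges.
console.log(combinedMatches);
// [ 'king', 'sing', 'ring', 'bing', 'hing', 'ling' ]
```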
The ranges we have looked at so far are lowercase ranges. You can have uppercase ranges too:
And if you don't care about case, you can simply add the case-insensitive flag i as we saw in the previous videos:
The character class in this pattern will match any letter between a and z whether in lowercase or uppercase.
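Here's a small sketch of both ideas in JavaScript. The patterns [A-Z]ing and [a-z]ing with the i flag are my assumptions about what was shown, and the test string is made up:

```javascript
const mixedCase = "King Sing ring";

// Uppercase range only: the lowercase "ring" is skipped.
const upperOnly = mixedCase.match(/[A-Z]ing/g);
console.log(upperOnly); // [ 'King', 'Sing' ]

// Lowercase range plus the i flag: case no longer matters.
const anyCase = mixedCase.match(/[a-z]ing/gi);
console.log(anyCase); // [ 'King', 'Sing', 'ring' ]
```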
You can also apply ranges to digits. For example, 0-9 will match any digit from 0 to 9:
2-5 will match any digit from 2 to 5.
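A quick sketch of the two digit ranges, assuming JavaScript and a test string of my own:

```javascript
const seating = "Room 7, floor 3, seat 9";

// Any single digit from 0 to 9.
const allDigits = seating.match(/[0-9]/g);
console.log(allDigits); // [ '7', '3', '9' ]

// Only digits from 2 to 5, so 7 and 9 are skipped.
const someDigits = seating.match(/[2-5]/g);
console.log(someDigits); // [ '3' ]
```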
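Unlike letters and digits, symbols have no intuitive order. A range such as !-/ is technically valid because ranges follow character codes, but it is hard to read, so it's better to specify the symbols directly as we've seen earlier.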
You can mix letter and digit ranges in the character set:
You can also add symbols to the character set:
Don't forget, this character set will match any letter from a to z, OR (emphasis on OR) any digit from 0 to 9, OR the ampersand symbol, OR the hyphen symbol, followed by ing.
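Putting it together, here is a sketch in JavaScript. The pattern [a-z0-9&-]ing is my reconstruction of the set described, with the hyphen placed last so it counts as a literal symbol, and the test string is invented:

```javascript
// Letter a to z, OR digit 0 to 9, OR &, OR -, followed by "ing".
const mixedPattern = /[a-z0-9&-]ing/g;
const mixedText = "king 4ing &ing -ing %ing";

const mixedMatches = mixedText.match(mixedPattern);
// "%ing" is skipped because % is not in the character set.
console.log(mixedMatches); // [ 'king', '4ing', '&ing', '-ing' ]
```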