From the course: Natural Language Processing for Speech and Text: From Beginner to Advanced

Rule-based: Regular expressions

- [Instructor] Regular expressions, or RegEx, are sequences of characters that form search patterns. Many patterns are possible, both on their own and in combination. Here are some of the fundamental ones. A literal pattern specifies the exact character or string to find. For example, a capital A finds every capital A, and the string hello finds every lowercase hello. Metacharacters have special meanings. Square brackets define a set: for example, [aeiou] matches any single lowercase vowel. The wildcard, or dot, matches any single character except a newline: for example, s.t matches sit, sat, and set. A hyphen inside square brackets specifies a range: for example, [a-z] matches any lowercase letter. The backslash escapes a metacharacter so that it is matched literally: for example, \. matches an actual dot rather than any character. The backslash also introduces special sequences: \d matches any digit, and \w matches any word character, meaning a letter, digit, or underscore. In natural language processing, regular expressions are used for tokenization. If you remember, in the first chapter we used regular expressions in NLTK for this. Text cleaning, pattern…
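As a quick illustration of the patterns described above, here is a minimal sketch using Python's standard `re` module rather than NLTK. The sample strings, the tokenizer pattern, and the `tokenize` and `clean` helpers are hypothetical examples chosen for this sketch, not the course's own code; the cleaning rules in particular are just one plausible choice.

```python
import re

# Literal pattern: the exact string, matched case-sensitively.
literal = re.findall(r"hello", "hello Hello hello")          # ['hello', 'hello']

# Set: square brackets match any ONE character listed inside them.
vowels = re.findall(r"[aeiou]", "regex")                     # ['e', 'e']

# Wildcard: "." matches any single character except a newline.
wildcard = re.findall(r"s.t", "sit sat set s\nt")            # ['sit', 'sat', 'set']

# Range: a hyphen inside brackets spans a range; [a-z] is any lowercase letter.
lowercase_runs = re.findall(r"[a-z]+", "Hello World")        # ['ello', 'orld']

# Escaping: "\." matches a literal dot rather than "any character".
literal_dots = re.findall(r"\.", "v1.2.3")                   # ['.', '.']

# Special sequences: \d is a digit, \w is a word character (letter, digit, or underscore).
digits = re.findall(r"\d+", "Room 42, floor 3")              # ['42', '3']
word_chars = re.findall(r"\w+", "It's 2024!")                # ['It', 's', '2024']


def tokenize(text):
    """Hypothetical tokenizer: words (keeping contractions like "don't") or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def clean(text):
    """Hypothetical cleaning step: drop URLs, then collapse runs of whitespace."""
    text = re.sub(r"https?://\S+", "", text)
    return re.sub(r"\s+", " ", text).strip()


print(tokenize("Don't panic, it's only regex!"))
# ["Don't", 'panic', ',', "it's", 'only', 'regex', '!']
print(clean("  Read   more at https://example.com  today\n"))
# Read more at today
```

The tokenizer pattern tries the word alternative first, so a contraction stays in one piece, and anything that is neither a word character nor whitespace becomes its own token. NLTK's `RegexpTokenizer` in `nltk.tokenize` accepts a pattern of this kind, so the same idea carries over to the NLTK workflow from the first chapter.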