Regular Expressions in Python
Regular expressions (regex) are a powerful tool for matching patterns in text. Python provides the re module to work with regular expressions.
Basic Functions in the re Module
- re.search(pattern, string): Searches for the first occurrence of the pattern within the string. Returns a match object if found, else None.
- re.match(pattern, string): Checks for a match only at the beginning of the string. Returns a match object if found, else None.
- re.findall(pattern, string): Returns a list of all non-overlapping matches of the pattern in the string.
- re.finditer(pattern, string): Returns an iterator yielding match objects over all non-overlapping matches.
- re.sub(pattern, repl, string): Returns a new string in which every non-overlapping match of the pattern is replaced by repl. The original string is not modified.
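Of the functions listed above, re.finditer is the one most worth a quick illustration, since it yields match objects rather than plain strings. The following is a minimal sketch; the sample text and pattern are made up for illustration only.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = 'cat bat rat'

# Each match object exposes the matched text and its position in the string.
spans = [(m.group(), m.start(), m.end())
         for m in re.finditer(r'\b\w+at\b', text)]
print(spans)  # [('cat', 0, 3), ('bat', 4, 7), ('rat', 8, 11)]
```

Because re.finditer is lazy, it can be a reasonable choice over re.findall when the input is large or when match positions are needed.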
Using re.search
import re
# Search for a pattern within a string
pattern = r'\bhello\b'
text = 'hello world'
match = re.search(pattern, text)
if match:
    print('Match found:', match.group())
else:
    print('No match found')
Using re.match
# Check for a match at the beginning of the string
pattern = r'world'
text = 'hello world'
match = re.match(pattern, text)
# 'world' does not occur at the start of the text, so match is None
if match:
    print('Match found:', match.group())
else:
    print('No match found')  # This branch runs
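To make the difference between re.match and re.search concrete, here is a short side-by-side sketch that reuses the same pattern and text as the examples above. The variable names are illustrative only.

```python
import re

text = 'hello world'

# re.match only tries position 0, so it fails here.
at_start = re.match(r'world', text)
# re.search scans forward and finds 'world' starting at index 6.
anywhere = re.search(r'world', text)

print(at_start)                            # None
print(anywhere.group(), anywhere.start())  # world 6
```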
Using re.findall
# Find all non-overlapping matches in the string
pattern = r'\b\w+\b'
text = 'hello world'
matches = re.findall(pattern, text)
print(matches) # Output: ['hello', 'world']
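One detail worth knowing about re.findall is that its return value changes when the pattern contains capturing groups: with one group it returns the group's text, and with several it returns tuples. A small sketch, using a made-up input string:

```python
import re

# Hypothetical input, chosen only for illustration.
text = 'x=1, y=22'

whole = re.findall(r'\w=\d+', text)        # no groups: full matches
one_group = re.findall(r'\w=(\d+)', text)  # one group: that group's text
pairs = re.findall(r'(\w)=(\d+)', text)    # two groups: tuples of groups

print(whole)      # ['x=1', 'y=22']
print(one_group)  # ['1', '22']
print(pairs)      # [('x', '1'), ('y', '22')]
```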
Using re.sub
# Replace matches with a replacement string
pattern = r'\bhello\b'
replacement = 'hi'
text = 'hello world'
new_text = re.sub(pattern, replacement, text)
print(new_text) # Output: hi world
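re.sub also accepts an optional count argument and lets the replacement refer back to captured groups with \1, \2, and so on. The sketch below shows both; the input strings are illustrative assumptions, not part of the examples above.

```python
import re

# Limit the number of replacements with count.
first_only = re.sub(r'\bhello\b', 'hi', 'hello hello world', count=1)
print(first_only)  # hi hello world

# Refer to captured groups in the replacement with \1 and \2.
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', 'hello world')
print(swapped)     # world hello
```

Writing the replacement as a raw string (r'...') keeps the backslashes from being interpreted by Python before re sees them.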
Regular Expression Syntax
- .: Matches any character except a newline.
- \d: Matches any digit.
- \w: Matches any word character (alphanumeric + underscore).
- \s: Matches any whitespace character.
- \b: Matches a word boundary.
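- ^: Matches the start of the string (and, with the re.MULTILINE flag, the start of each line).
- $: Matches the end of the string or just before a trailing newline (and, with the re.MULTILINE flag, the end of each line).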
- +: Matches one or more repetitions of the preceding element (a character, character class, or group).
- *: Matches zero or more repetitions of the preceding element.
- ?: Matches zero or one repetition of the preceding element.
- {n}: Matches exactly n repetitions of the preceding element.
- {n,m}: Matches between n and m repetitions of the preceding element. Write it without a space after the comma; Python treats {n, m} as literal text rather than a quantifier.
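The syntax elements above can be tried out quickly with re.findall and re.search. The following sketch uses made-up sample strings to show a few of the quantifiers and anchors in action.

```python
import re

# Hypothetical sample string, chosen only for illustration.
sample = 'a1 b22 c333'

one_or_more = re.findall(r'\d+', sample)       # runs of digits
exactly_two = re.findall(r'\d{2}', sample)     # exactly two digits at a time
two_to_three = re.findall(r'\d{2,3}', sample)  # two or three digits
optional_u = re.findall(r'colou?r', 'color colour')

# ^ and $ together require the whole string to fit the pattern.
all_digits = bool(re.search(r'^\d+$', '2024'))
not_all_digits = bool(re.search(r'^\d+$', 'year 2024'))

print(one_or_more)    # ['1', '22', '333']
print(exactly_two)    # ['22', '33']
print(two_to_three)   # ['22', '333']
print(optional_u)     # ['color', 'colour']
print(all_digits, not_all_digits)  # True False
```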