Regular expressions

In [1]:
import re
In [2]:
re.findall('is','This is an arbitrary sentence.')
Out[2]:
['is', 'is']
Special characters: ^ . $ + - [ ] ( ) { } ?

. matches any single character except a newline.

In [4]:
re.findall('..is.','This is an arbitrary sentence.')
Out[4]:
['This ']
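To make the newline exception concrete, here is a minimal sketch. The string `sample` is a made-up example, and the `re.DOTALL` flag is brought in here only to show the contrast; it is not used elsewhere in these notes.

```python
import re

# Hypothetical sample: 'a' and 'b' separated once by a newline, once by 'x'.
sample = 'a\nb axb'

# By default '.' does not match the newline, so only 'axb' is found.
default_matches = re.findall('a.b', sample)

# With re.DOTALL, '.' matches the newline as well.
dotall_matches = re.findall('a.b', sample, flags=re.DOTALL)

print(default_matches)
print(dotall_matches)
```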
In [5]:
re.findall('w.{1,3}s','This is an arbitrary sentence which is full of a lot more words.')
Out[5]:
['words']
In [6]:
re.findall('w.{1,6}s','This is an arbitrary sentence which is full of a lot more words.')
Out[6]:
['which is', 'words']
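The braces in the two patterns above are a repetition count: `{m,n}` asks for between m and n copies of the preceding element, and `{m}` for exactly m. A small sketch of that reading, reusing the first sentence from this notebook (the variable names are illustrative only):

```python
import re

text = 'This is an arbitrary sentence.'

# '.{2}' means exactly two arbitrary characters, the same as '..'.
with_dots = re.findall('..is.', text)
with_braces = re.findall('.{2}is.', text)

# '.{1,3}' allows one, two, or three arbitrary characters before 'is'.
ranged = re.findall('.{1,3}is', text)

print(with_dots)
print(with_braces)
print(ranged)
```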
+ matches one or more repetitions of the preceding element.
In [9]:
re.findall('.el+','She sells sea shells by the sea shore. She likes to sell them to elfs.')
Out[9]:
['sell', 'hell', 'sell', ' el']
In [10]:
re.findall('.el+s','She sells sea shells by the sea shore. She likes to sell them to elfs.')
Out[10]:
['sells', 'hells']
In [11]:
re.findall('e.+s','She sells sea shells by the sea shore. She likes to sell them to elfs.')
Out[11]:
['e sells sea shells by the sea shore. She likes to sell them to elfs']

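Matching is "greedy" by default: a quantifier such as + takes the longest substring that still lets the whole pattern match.

A ? placed directly after a quantifier makes it "lazy": it takes the shortest substring that still lets the whole pattern match.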

In [12]:
re.findall('e.+?s','She sells sea shells by the sea shore. She likes to sell them to elfs.')
Out[12]:
['e s',
 'ells',
 'ea s',
 'ells',
 'e s',
 'ea s',
 'e. She likes',
 'ell them to elfs']
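A side-by-side sketch of the two modes on a shorter string may help. It uses `re.search`, which returns only the first match, so the difference shows up in a single result. The shortened string is an illustrative choice.

```python
import re

text = 'She sells sea shells'

# Greedy: '.+' runs as far as it can, so the match ends at the last 's'.
greedy = re.search('e.+s', text).group()

# Lazy: '.+?' stops at the first 's' that completes the pattern.
lazy = re.search('e.+?s', text).group()

print(repr(greedy))
print(repr(lazy))
```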
[] encloses a set of alternatives for a single character: either l or s in the example below. A range such as a-z stands for every character between its endpoints.
In [15]:
re.findall('.e[ls].','She sells sea shells by the sea shore. She likes to sell them to elfs.')
Out[15]:
['sell', 'hell', 'kes ', 'sell', ' elf']
In [16]:
re.findall('.e[a-z].','She sells sea shells by the sea shore. She likes to sell them to elfs.')
Out[16]:
['sell', 'sea ', 'hell', 'sea ', 'kes ', 'sell', 'hem ', ' elf']
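Two further uses of brackets on the same sentence, as a rough sketch: a two-character set that accepts either case of one letter, and an uppercase range followed by a lowercase range. The variable names are illustrative only.

```python
import re

sentence = 'She sells sea shells by the sea shore. She likes to sell them to elfs.'

# [Ss] accepts either an uppercase or a lowercase s in that position.
either_case = re.findall('[Ss]he', sentence)

# [A-Z] is one uppercase letter; [a-z]+ is one or more lowercase letters.
capitalized = re.findall('[A-Z][a-z]+', sentence)

print(either_case)
print(capitalized)
```

Note that 'she' is found inside 'shells': the pattern matches substrings, not whole words.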
In [19]:
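emailmatcher = r'[a-z]+@[a-z]+\.[a-z]+'  # raw string keeps \. intact
# No match: the address has a comma where the pattern requires a literal dot.
re.findall(emailmatcher,'My address is ringland@buffalo,edu')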
Out[19]:
[]
In [20]:
emailmatcher = r'[a-z]+@[a-z]+\.[a-z]+'  # raw string keeps \. intact
re.findall(emailmatcher,'My address is ringland@buffalo.edu')
Out[20]:
['ringland@buffalo.edu']
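The pattern above only accepts lowercase letters and a single dot after the @. The sketch below shows where that breaks and one possible widening. The address and the `widened` pattern are hypothetical illustrations, not a complete email validator.

```python
import re

# Hypothetical address with a dot and digits before the @, and a subdomain.
text = 'Write to j.doe42@math.example.org'

# The original pattern needs a lowercase letter directly before the @,
# so the digits in 'doe42' prevent any match.
simple = r'[a-z]+@[a-z]+\.[a-z]+'
simple_matches = re.findall(simple, text)

# Assumed widening: also allow digits and dots on both sides of the @.
widened = r'[a-z0-9.]+@[a-z0-9.]+\.[a-z]+'
widened_matches = re.findall(widened, text)

print(simple_matches)
print(widened_matches)
```

Real addresses allow more characters than this, so the widened pattern is still only a rough filter.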