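The re module provides support for regular expressions in Python. The main functions of this module are described below.
Searching for an occurrence of a pattern
re.search(): This function returns None if the pattern does not match, or a match object containing information about the matching part of the string. It stops after the first match, so it is best suited to testing whether a pattern occurs in a string rather than to extracting every occurrence.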
# A Python program to demonstrate the working of re.search().
import re

# Let's use a regular expression to match a date string
# in the form of a month name followed by a day number.
regex = r"([a-zA-Z]+) (\d+)"
match = re.search(regex, "I was born on June 24")

if match is not None:
    # We reach here when the expression "([a-zA-Z]+) (\d+)"
    # matches the date string.

    # This prints 14 and 21: the match covers the half-open
    # range [14, 21), starting at index 14 and ending before 21.
    print("Match at index %s, %s" % (match.start(), match.end()))

    # We use the group() method to get the full match and the
    # captured groups. In particular:
    # match.group(0) always returns the fully matched string
    # match.group(1), match.group(2), ... return the capture
    # groups in order from left to right in the pattern
    # match.group() is equivalent to match.group(0)

    # So this will print "June 24"
    print("Full match: %s" % (match.group(0)))

    # So this will print "June"
    print("Month: %s" % (match.group(1)))

    # So this will print "24"
    print("Day: %s" % (match.group(2)))
else:
    print("The regex pattern does not match.")
Output:
Match at index 14, 21
Full match: June 24
Month: June
Day: 24
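To illustrate that re.search() reports only the first occurrence, here is a minimal sketch. The sentence with two dates and the variable names are made up for illustration and are not part of the original example.

```python
import re

regex = r"([a-zA-Z]+) (\d+)"

# Hypothetical input containing two "Month day" pairs.
sentence = "I was born on June 24 and moved on July 4"

first = re.search(regex, sentence)

# search() returns a single match object for the first occurrence only;
# the later "July 4" is never reported.
first_text = first.group(0) if first else None
first_span = first.span() if first else None
print(first_text, first_span)

# When nothing matches, search() returns None rather than raising.
missing = re.search(regex, "no dates here")
print(missing)
```

When every occurrence is needed, re.findall(), described further down, is usually the better fit.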
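Matching a pattern against text
re.match(): This function tries to match the pattern at the beginning of the string only. It does not scan ahead for later occurrences, and it does not require the pattern to cover the whole string. re.match returns a match object on success and None on failure.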
re.match(pattern, string, flags=0)
pattern : Regular expression to be matched.
string  : String in which the pattern is searched.
flags   : Optional flags; several can be combined
          using bitwise OR (|).
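As a rough illustration of combining flags with bitwise OR, the sketch below passes re.IGNORECASE and re.VERBOSE together to re.match(). The pattern layout and the input string are assumptions chosen for the example.

```python
import re

# With re.VERBOSE, whitespace and "#" comments inside the pattern are ignored,
# so the pattern can be spread over several lines.
pattern = r"""
    ([a-z]+)   # month name, written in lowercase only
    \s         # a single whitespace character
    (\d+)      # day number
"""

# re.IGNORECASE lets [a-z] match uppercase letters as well.
with_flags = re.match(pattern, "JUNE 24", flags=re.IGNORECASE | re.VERBOSE)

# Without the flags, the newlines, spaces and comments are treated as
# literal pattern text, so the same call finds nothing.
without_flags = re.match(pattern, "JUNE 24")

print(with_flags.groups() if with_flags else None)
print(without_flags)
```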
# A Python program to demonstrate the working
# of re.match().
import re

# A sample function that uses regular expressions
# to find the month and day of a date.
def findMonthAndDate(string):
    regex = r"([a-zA-Z]+) (\d+)"
    match = re.match(regex, string)

    if match is None:
        print("Not a valid date")
        return

    print("Given Data: %s" % (match.group()))
    print("Month: %s" % (match.group(1)))
    print("Day: %s" % (match.group(2)))

# Driver Code
findMonthAndDate("Jun 24")
print("")
findMonthAndDate("I was born on June 24")
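The second call above fails because re.match() only looks at the start of the string. The sketch below contrasts re.match(), re.search() and re.fullmatch() on the same pattern. It assumes Python 3.4 or later, where re.fullmatch() is available, and the third sample string is made up for illustration.

```python
import re

regex = r"([a-zA-Z]+) (\d+)"

# Hypothetical inputs: a bare date, a date at the end, and a date at the start.
candidates = ["Jun 24", "I was born on June 24", "Jun 24 was sunny"]

results = {}
for candidate in candidates:
    results[candidate] = (
        re.match(regex, candidate) is not None,      # pattern at the start
        re.search(regex, candidate) is not None,     # pattern anywhere
        re.fullmatch(regex, candidate) is not None,  # pattern covers everything
    )

for candidate, (at_start, anywhere, whole) in results.items():
    print("%r: match=%s search=%s fullmatch=%s" % (candidate, at_start, anywhere, whole))
```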
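Finding all occurrences of a pattern
re.findall(): Returns all non-overlapping matches of the pattern in the string, as a list of strings. The string is scanned from left to right, and matches are returned in the order found (source: Python documentation: https://docs.python.org/2/library/re.html).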
# A Python program to demonstrate the working of
# findall()
import re

# A sample text string where the regular expression
# is searched.
string = """Hello my Number is 123456789 and
my friend's number is 987654321"""

# A sample regular expression to find digits.
regex = r'\d+'

match = re.findall(regex, string)
print(match)

# This example is contributed by Ayush Saluja.
Output:
['123456789', '987654321']
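The shape of the list returned by re.findall() depends on the capture groups in the pattern. The sketch below, using a made-up sentence, shows the difference between a pattern without groups and the two-group date pattern used earlier.

```python
import re

# Hypothetical input containing two "Month day" pairs.
text = "Meetings on June 24 and July 4"

# No capture groups: each item is the whole matched substring.
whole = re.findall(r"[a-zA-Z]+ \d+", text)

# Two capture groups: each item is a (month, day) tuple.
pairs = re.findall(r"([a-zA-Z]+) (\d+)", text)

print(whole)
print(pairs)
```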
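Regular expressions are a vast topic, and re is a complete library in its own right. With it you can match, search, replace, and extract large amounts of data. For example, the short snippet below extracts email addresses from a piece of text, which is the kind of building block used when writing web crawlers and scrapers in Python. Take a look at the regex: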
import re
text = "Write to alice@example.com or Bob@Example.org."  # hypothetical sample input
# extract all email addresses and add them into the resulting set
new_emails = set(re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", text, re.I))
We will discuss more regular expression methods soon.