Regular Expression Study Notes

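Using re.match()

re.match() tries to match a pattern starting at the beginning of the string. If the pattern does not match at the start, match() returns None, even when the pattern occurs later in the string.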
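A minimal sketch of that behaviour, assuming a made-up sample string (`text` is not from the notes): the same call succeeds when the pattern sits at position 0 and returns None otherwise.

```python
import re

text = 'Hello 123 World'  # hypothetical sample string

# 'Hello' sits at the very start, so match() returns a match object.
at_start = re.match(r'Hello', text)

# 'World' occurs later in the string, so match() returns None.
not_at_start = re.match(r'World', text)

print(at_start.group())  # Hello
print(not_at_start)      # None
```

Because a failed match is None, calling .group() on the result without checking it first raises AttributeError.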

1. The most basic match

    import re

    content = 'Hello 123 456789 World_this is a Regex Demo'
    res = re.match(r'^Hello\s\d\d\d\s\d{6}\s\w{10}.*Demo$', content)
    print(res)            # the match object
    print(res.group())    # the matched text
    print(res.span())     # (start, end) positions of the match
    print(len(content))   # number of characters in the string

Output of the last three print calls:

    Hello 123 456789 World_this is a Regex Demo
    (0, 43)
    43

A second example:

    a_str = 'qwe 123 ghj'
    res = re.match(r'^q\w{2}\s\d{3}.*j$', a_str)
    print(res.group())

Output:

    qwe 123 ghj

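2. Generic matching

Instead of spelling out every character class, .* (or .*?) can stand in for everything between the parts that matter:

    content = 'Hello 123 4567 World_This is a Regex'
    result = re.match(r'^H.*?Regex$', content)
    print(result.group())
    print(result.span())

Output:

    Hello 123 4567 World_This is a Regex
    (0, 36)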

3. Matching targets: capture groups (use () to group)

    content = 'qwe Hello 1234567 world_This is a Regex Demo'
    # result = re.match(r'^qwe\s(\w+)\s(\d{7}).*Demo$', content)  # groups in parentheses
    result = re.match(r'^qwe\s(\w+)\s(\d{3}).*Demo$', content)    # group 2 captures the first 3 digits
    print(result.group())
    print(result.group(1))
    print(result.group(2))

Output:

    qwe Hello 1234567 world_This is a Regex Demo
    Hello
    123

A second example:

    sssd = 'dasdjskL22222adjlsakjddd666666dasssssssa'
    result = re.match(r'^d.*L(\d+).*ddd(\d+)d.*a$', sssd)
    print(result)           # the match object
    print(result.group(1))
    print(result.group(2))

Output of the two group() calls:

    22222
    666666
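As a small companion sketch (not part of the original notes), the match object's groups() method returns all captured groups at once as a tuple, which can be handy when there are several groups. The variable names `word` and `digits` are illustrative.

```python
import re

content = 'qwe Hello 1234567 world_This is a Regex Demo'
result = re.match(r'^qwe\s(\w+)\s(\d{3}).*Demo$', content)

# Guard against None before touching the groups.
if result is not None:
    word, digits = result.groups()  # every captured group, in order
    print(word)    # Hello
    print(digits)  # 123
```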

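4. Greedy matching (match as much as possible)

Here .* is greedy: it consumes as many characters as it can, leaving only the single digit that (\d+) needs in order to match.

    content = 'Hello 1234567 world_This is a Regex Demo'
    result = re.match(r'^He.*(\d+)\s.*Demo$', content)
    print(result)           # the match object
    print(result.group(1))

Output of group(1):

    7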

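5. Non-greedy mode (match as little as possible)

With .*? the wildcard stops as soon as the rest of the pattern can match, so (\d+) captures the whole number.

    content = 'Hello 1234567 world_This is a Regex Demo'
    result = re.match(r'^He.*?(\d+).*Demo$', content)
    print(result)           # the match object
    print(result.group(1))

Output of group(1):

    1234567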
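To make the difference between the two modes visible, the following sketch runs both patterns on the same string and prints where group 1 starts and ends. The names `greedy` and `lazy` are illustrative only.

```python
import re

content = 'Hello 1234567 world_This is a Regex Demo'

# Greedy: .* grabs everything it can and gives back only one digit.
greedy = re.match(r'^He.*(\d+)\s.*Demo$', content)

# Non-greedy: .*? stops at the first point where \d+ can begin.
lazy = re.match(r'^He.*?(\d+).*Demo$', content)

print(greedy.group(1), greedy.span(1))  # 7 (12, 13)
print(lazy.group(1), lazy.span(1))      # 1234567 (6, 13)
```

The spans show that the greedy wildcard swallowed the digits at positions 6 to 11 before the group got a chance to capture them.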

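6. Match flags (for newlines): re.S

By default . matches any character except a newline. With re.S (also spelled re.DOTALL), . matches newlines as well.

    content = '''Hello 1234567 world_This
    is a Regex Demo'''
    result = re.match(r'^He.*?(\d+).*Demo$', content, re.S)
    print(result)           # the match object
    print(result.group(1))

Output of group(1):

    1234567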
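A short sketch of what the flag changes, assuming a sample string that contains exactly one newline: the same pattern fails without re.S and succeeds with it.

```python
import re

# Assumed sample: one newline between 'world_This' and 'is'.
content = 'Hello 1234567 world_This\nis a Regex Demo'
pattern = r'^He.*?(\d+).*Demo$'

# Without the flag, . cannot cross the newline, so 'Demo' is never reached.
without_flag = re.match(pattern, content)

# With re.S, . also matches '\n' and the pattern spans both lines.
with_flag = re.match(pattern, content, re.S)

print(without_flag)        # None
print(with_flag.group(1))  # 1234567
```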

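7. Escaping

Wrong: $ and . are special characters in a regex, so the unescaped pattern does not match the literal text.

    content = 'price is $5.00'
    result = re.match('price is $5.00', content)
    print(result)

Output:

    None

Correct: escape the special characters with a backslash.

    content = 'price is $5.00'
    result = re.match(r'price is \$5\.00', content)
    print(result)           # the match object
    print(result.group())

Output of group():

    price is $5.00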
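When the literal text is not known in advance, escaping by hand is error-prone. One option from the standard library is re.escape(), which escapes every special character in a string. The sketch below is illustrative and not part of the original notes.

```python
import re

content = 'price is $5.00'

# re.escape() backslash-escapes the characters that are special in a regex.
pattern = re.escape(content)
result = re.match(pattern, content)
print(result.group())  # price is $5.00

# The escaped dot is literal, so a different character in its place fails.
print(re.match(pattern, 'price is $5x00'))  # None
```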

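Tip: prefer non-greedy quantifiers (.*?) where possible, so that a wildcard does not swallow text that a later group is meant to capture.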

Using re.search()

re.search() scans the whole string and returns the first successful match, wherever it starts.

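With re.match() the pattern must begin at the start of the string:

    content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
    # result = re.match(r'He.*?(\d+).*?Wor', content)  # None: 'He' is not at the start
    result = re.match(r'Ex.*?(\d+).*?Wor', content)
    print(result)           # the match object

With re.search() the pattern can start anywhere:

    content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
    result = re.search(r'He.*?(\d+).*?Wor', content)
    print(result)           # the match object
    print(result.group(1))

Output of group(1):

    1234567

re.search() with re.S also works on multi-line HTML. Applied to a list item for the song 往事随风, a pattern with two non-greedy groups (.*?) captures the singer in group(1) and the title in group(2):

    print(result.group(1))
    print(result.group(2))

Output:

    老秦
    往事随风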
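As an illustration of re.search() on multi-line HTML, here is a minimal sketch. The markup is hypothetical: the `<ul>`/`<li>` structure and the `data-singer` attribute are assumptions chosen for the example, not the markup used in the notes.

```python
import re

# Hypothetical markup: tag and attribute names are assumptions for illustration.
html = '''
<ul>
  <li data-singer="老秦">往事随风</li>
</ul>
'''

# Group 1 captures the attribute value, group 2 the text inside the <li>.
result = re.search(r'<li data-singer="(.*?)">(.*?)</li>', html, re.S)
if result:
    print(result.group(1))  # 老秦
    print(result.group(2))  # 往事随风
```

For anything beyond small, regular snippets, an HTML parser is usually more robust than a regex.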

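re.findall()

re.findall() returns every match of the pattern in the string, not just the first one.

To write the pattern, find what the target items have in common and spell that out literally; use .*? for the parts that differ from item to item.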
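A minimal sketch of that recipe, using hypothetical markup (the `data-singer` attribute and the song names are made up): the shared tag structure is written literally, and the varying parts become non-greedy groups.

```python
import re

# Hypothetical markup for illustration only.
html = '''
<li data-singer="A">Song one</li>
<li data-singer="B">Song two</li>
'''

# With two groups in the pattern, findall() returns a list of 2-tuples.
pairs = re.findall(r'<li data-singer="(.*?)">(.*?)</li>', html, re.S)
print(pairs)  # [('A', 'Song one'), ('B', 'Song two')]

for singer, title in pairs:
    print(singer, title)
```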

re.sub()

re.sub() replaces every match in the string and returns the resulting string.

    content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
    # first argument:  the regular expression
    # second argument: the replacement string
    # third argument:  the original string
    content = re.sub('s', '7', content)
    print(content)

Output:

    Extra 7ting7 Hello 1234567 World_Thi7 i7 a Regex Demo Extra 7ting7

Replacing a run of digits:

    content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
    content = re.sub(r'\d+', '66666666', content)
    print(content)

Output:

    Extra stings Hello 66666666 World_This is a Regex Demo Extra stings

Keeping the matched text and appending to it:

    content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
    # \1 refers to group 1 (the matched digits); the r prefix makes this a raw
    # string so that \1 is not treated as an escape sequence. The text after
    # \1 is what gets appended.
    content = re.sub(r'(\d+)', r'\1 3333', content)
    print(content)

Output:

    Extra stings Hello 1234567 3333 World_This is a Regex Demo Extra stings

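In a pattern, | means "or". A useful trick is to replace data that gets in the way of matching with an empty string first, which makes the later matching simpler.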
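The clean-up trick can be sketched as follows. The markup is hypothetical: some titles are wrapped in `<b>` tags, which would otherwise force a more complicated extraction pattern.

```python
import re

# Hypothetical input: only the first title is wrapped in <b>...</b>.
html = '<li><b>Song one</b></li><li>Song two</li>'

# Step 1: use | to match either the opening or the closing tag and drop both.
cleaned = re.sub(r'<b>|</b>', '', html)

# Step 2: one simple pattern now fits every item.
titles = re.findall(r'<li>(.*?)</li>', cleaned)

print(cleaned)  # <li>Song one</li><li>Song two</li>
print(titles)   # ['Song one', 'Song two']
```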

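re.compile()

re.compile() compiles a regex string into a regular expression object, which can then be reused.

    content = '''Hello 1234567 world_This
    is a Regex Demo'''
    pattern = re.compile('Hello.*?Demo', re.S)  # regular expression object
    print(pattern)
    result = re.match(pattern, content)
    print(result)           # the match object

Output of print(pattern):

    re.compile('Hello.*?Demo', re.DOTALL)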
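A compiled pattern is mainly useful when the same regex is applied many times. The sketch below, with made-up sample strings, calls the pattern object's own match() method, which behaves like re.match(pattern, ...).

```python
import re

# Compile once, with the flag baked in.
pattern = re.compile(r'Hello.*?Demo', re.S)

# Hypothetical inputs: the second one does not start with 'Hello'.
samples = [
    'Hello 1234567 world_This\nis a Regex Demo',
    'Goodbye Demo',
]

results = []
for text in samples:
    m = pattern.match(text)  # same as re.match(pattern, text)
    results.append(m.group() if m else None)

print(results)
```

The flags are passed once at compile time, so they do not need to be repeated on every call.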