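# Why learn regular expressions

When processing information, you often need a program that can search text for a particular pattern.
For example, collecting the files from a certain period would otherwise mean reading every timestamp by eye. Likewise, if emails containing a certain keyword indicate a virus infection, you would have to check each message one by one.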

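# 1 Basic special symbols and characters

# Special symbols

literal matches the literal text of the string
| means "or" (alternation)
. matches any single character except a newline
^ matches the start of the string
$ matches the end of the string
* matches the preceding regular expression 0 or more times
+ matches the preceding regular expression 1 or more times
? matches the preceding regular expression 0 or 1 times
{n} matches the preceding regular expression exactly n times
[23] matches 2 or 3 (any one character from the set)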
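
A minimal sketch that tries each symbol above with Python's re module. The sample strings are made up for illustration, and the comments show what each call returns.

```python
import re

alternation = re.findall(r"cat|dog", "cat dog bird")   # ['cat', 'dog']
any_char = re.findall(r"b.t", "bat bit b\nt")          # ['bat', 'bit'], . skips the newline
at_start = bool(re.search(r"^foo", "food"))            # True
at_end = bool(re.search(r"ble$", "table"))             # True
star = re.findall(r"ab*", "a ab abb")                  # ['a', 'ab', 'abb']
plus = re.findall(r"ab+", "a ab abb")                  # ['ab', 'abb']
optional = re.findall(r"ab?", "a ab abb")              # ['a', 'ab', 'ab']
exactly_three = bool(re.fullmatch(r"\d{3}", "123"))    # True
char_set = re.findall(r"[23]", "1234")                 # ['2', '3']
```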

# Special characters

\d any digit
\w any alphanumeric character (including the underscore)
\s any whitespace character
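
As a quick check of the three character classes, the sketch below runs them against a made-up sample string.

```python
import re

text = "room 42, floor_3"

digits = re.findall(r"\d+", text)   # ['42', '3']
words = re.findall(r"\w+", text)    # ['room', '42', 'floor_3'], _ counts as a word character
spaces = re.findall(r"\s", text)    # [' ', ' ']
```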

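# A few examples

[2a][5f] matches any of 25, 2f, a5, af
z.[0-9] matches z, then any single character, then a digit, e.g. za1 or z-9
0?[0-9] matches a single digit with an optional leading 0, e.g. 7 or 07
</?[^>]+> matches any HTML tag, opening or closing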
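
The four patterns can be checked with re.fullmatch, which succeeds only when the whole string matches. This is a small sketch; the candidate strings and the HTML snippet are invented for the test.

```python
import re

# [2a][5f]: one character from each set
candidates = ["25", "2f", "a5", "af", "2a", "55"]
combos = [s for s in candidates if re.fullmatch(r"[2a][5f]", s)]
# combos == ['25', '2f', 'a5', 'af']

# z.[0-9]: the dot consumes one character, so the match is three characters long
three_chars = re.fullmatch(r"z.[0-9]", "zx7") is not None    # True
two_chars = re.fullmatch(r"z.[0-9]", "z7") is not None       # False

# 0?[0-9]: optional leading zero
with_zero = re.fullmatch(r"0?[0-9]", "07") is not None       # True
without_zero = re.fullmatch(r"0?[0-9]", "7") is not None     # True
two_digits = re.fullmatch(r"0?[0-9]", "17") is not None      # False

# </?[^>]+>: opening and closing tags
tags = re.findall(r"</?[^>]+>", "<p>Hello <b>world</b></p>")
# tags == ['<p>', '<b>', '</b>', '</p>']
```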

# 2 The important functions

# re module functions

compile(pattern, flags=0) compiles the pattern with any optional flags and returns a regular expression object
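
A minimal sketch of compile: the pattern is compiled once with a flag and then reused through the methods of the returned object. The pattern and sample text are only illustrative.

```python
import re

# Compile once; re.IGNORECASE is the long name of re.I
pattern = re.compile(r"th\w+", re.IGNORECASE)

first = pattern.search("This and that.")       # match object for the first hit
all_hits = pattern.findall("This and that.")   # ['This', 'that']
first_text = first.group()                     # 'This'
```

Compiling is mostly useful when the same pattern is applied many times, since the methods of the compiled object no longer take the pattern argument.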

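# Regular expression objects

The functions below are shown with their re module signatures. A compiled regular expression object offers methods of the same names, which omit the pattern argument.

match(pattern, string, flags=0) tries to match the pattern at the start of the string and returns a match object, or None on failure
search(pattern, string, flags=0) scans the string and returns a match object for the first match only, or None on failure
The difference between match and search: match succeeds only if the pattern matches at the very beginning of the string, while search looks for the pattern at any position in the string
findall(pattern, string, flags=0) returns a list of all non-overlapping matches
finditer(pattern, string, flags=0) like findall, but returns an iterator of match objects
sub(pattern, repl, string, count=0) replaces the matches with repl and returns the new string; count=0 means replace all of them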

# A few examples

	import re

	m = re.match('foo', 'food on the table')
	m.group()    # 'foo'
	# Note: if 'foo' were not at the very start of the string, match() would return None

	m = re.search('foo', 'seafood on the table')
	m.group()    # 'foo'
Note that group(n) with a number refers to a subgroup, that is, a parenthesized part of the pattern

	m = re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123')
	m.group()     # 'abc-123', the whole match
	m.group(1)    # 'abc'
	m.group(2)    # '123'
	m.groups()    # ('abc', '123')

findall() handles the case where there are several matches

	m = re.findall('foo', 'seafood on the foodtable')
	m    # ['foo', 'foo'] (a plain list, so there is no .group() to call)

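Ignoring case, and the difference between findall and finditer

	s = 'This and that. '
	m = re.findall(r'(th\w+)', s, re.I)
	m    # ['This', 'that']

	# finditer returns an iterator of match objects instead of a list of strings
	m1 = re.finditer(r'(th\w+)', s, re.I)
	g = next(m1)
	g.group()    # 'This'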
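
To make the difference concrete, here is a small sketch: findall gives back only the matched strings, while each item from finditer is a match object that also knows where the match sits. The sample text is the same one used above without the trailing space.

```python
import re

text = "This and that."

# findall: just the strings
found = re.findall(r"th\w+", text, re.I)

# finditer: match objects, so positions are available too
spans = [(m.group(), m.start(), m.end())
         for m in re.finditer(r"th\w+", text, re.I)]
# spans == [('This', 0, 4), ('that', 9, 13)]
```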

Replacing with sub and subn

	re.sub('food', 'X', 'seafood on the foodtable')
	# 'seaX on the Xtable'

	# subn also returns the number of replacements made
	re.subn('food', 'X', 'seafood on the foodtable')
	# ('seaX on the Xtable', 2)
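
The count parameter from the sub signature limits how many matches are replaced. A short sketch with the same sample string:

```python
import re

text = "seafood on the foodtable"

# count=1 replaces only the first match; the default count=0 replaces all
once = re.sub("food", "X", text, count=1)    # 'seaX on the foodtable'

# subn returns a (new_string, number_of_replacements) tuple
replaced, n = re.subn("food", "X", text)     # ('seaX on the Xtable', 2)
```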

Splitting with split

	re.split(' : ', 'seafood : on : the : foodtable')
	# ['seafood', 'on', 'the', 'foodtable']
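
Where re.split goes beyond str.split is when the separator itself varies. The sketch below assumes a made-up input in which the spacing around each colon is inconsistent.

```python
import re

line = "seafood:on :  the : foodtable"

# \s*:\s* treats a colon plus any surrounding whitespace as one separator
parts = re.split(r"\s*:\s*", line)
# parts == ['seafood', 'on', 'the', 'foodtable']

# maxsplit limits the number of splits
head_tail = re.split(r"\s*:\s*", line, maxsplit=1)
# head_tail == ['seafood', 'on :  the : foodtable']
```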
