轩痕的竹林

Happy coding

Processing URLs with Python

I have some data like the following, which I use to generate URLs for a crawler to fetch.

Venture Capital
Banking (excl. Investment Banking)
Washington, D.C.
Nightlife in Washington, DC
Amazon $175mm Investment in LivingSocial (December 2010)

Twitter's Future

Florian Leibert / Noirin Shirley Incident

In the end, the words in each entry have to be joined with - to form the address.

Here's the code:

# Run in IPython
import re
In [41]: path = re.sub("[?@ =#().&]", "-", "Banking (excl. Investment Banking)") 
In [42]: path
Out[42]: 'Banking--excl--Investment-Banking-'

In [43]: path1 = re.sub("-+","-",path)

In [44]: path1
Out[44]: 'Banking-excl-Investment-Banking-'

In [45]: path1.rstrip('-')
Out[45]: 'Banking-excl-Investment-Banking'
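
The three steps from the session above can be wrapped in one small function so the crawler can call it for every entry. This is only a minimal sketch: the name make_path is made up for illustration, and the character class is exactly the one used above, so characters outside it (a comma, for example) pass through unchanged.

```python
import re

# Same character class as in the IPython session above.
SEPARATOR_CHARS = re.compile(r"[?@ =#().&]")


def make_path(title):
    """Turn a title into a dash-joined path segment (hypothetical helper)."""
    # Step 1: replace every listed character with a dash.
    path = SEPARATOR_CHARS.sub("-", title)
    # Step 2: collapse runs of dashes into a single dash.
    path = re.sub(r"-+", "-", path)
    # Step 3: drop any trailing dash, as in the session above.
    return path.rstrip("-")


if __name__ == "__main__":
    print(make_path("Banking (excl. Investment Banking)"))
    print(make_path("Venture Capital"))
```

Compiling the pattern once is just a small convenience when the function runs over many titles; the behaviour is the same as calling re.sub directly.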


If new odd characters turn up later, just add them to the set of symbols inside the brackets in this call:

re.sub("[?@ =#().&]", "-", "Banking (excl. Investment Banking)")
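
Some of the sample titles above already contain characters that are not in that set: the comma, the apostrophe, $ and /. One possible way to add symbols without editing the pattern by hand is sketched below. The names BASE_CHARS and make_path_with are hypothetical, and whether the target site really turns these extra characters into dashes is an assumption you would need to check against real URLs.

```python
import re

# The characters from the original pattern, kept as a plain string.
BASE_CHARS = "?@ =#().&"


def make_path_with(title, extra=""):
    """Like the session above, but with extra characters to replace (hypothetical helper)."""
    # re.escape keeps characters such as ] or ^ or - from being read
    # as character-class syntax when they are added to the set.
    char_class = "[" + re.escape(BASE_CHARS + extra) + "]"
    path = re.sub(char_class, "-", title)
    # Collapse repeated dashes, then drop a trailing one.
    path = re.sub(r"-+", "-", path)
    return path.rstrip("-")


if __name__ == "__main__":
    print(make_path_with("Washington, D.C.", extra=","))
    print(make_path_with("Twitter's Future", extra="'"))
    print(make_path_with("Florian Leibert / Noirin Shirley Incident", extra="/"))
```

Passing the new characters through re.escape matters mostly for symbols like ] or -, which would otherwise change the meaning of the bracket expression.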