Processing URLs in Python
I have some data like the following, which I use to generate URLs for a crawler to fetch.
Venture Capital
Banking (excl. Investment Banking)
Washington, D.C.
Nightlife in Washington, DC
Amazon $175mm Investment in LivingSocial (December 2010)
Twitter's Future
Florian Leibert / Noirin Shirley Incident
The words in each item ultimately need to be joined with - to form the address.
Here is the code:
# Run in IPython
import re
In [41]: path = re.sub("[?@ =#().&]", "-", "Banking (excl. Investment Banking)")
In [42]: path
Out[42]: 'Banking--excl--Investment-Banking-'
In [43]: path1 = re.sub("-+", "-", path)
In [44]: path1
Out[44]: 'Banking-excl-Investment-Banking-'
In [45]: path1.rstrip('-')
Out[45]: 'Banking-excl-Investment-Banking'
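If you later run into other odd characters, just add them to the bracketed character class in
re.sub("[?@ =#().&]", "-", "Banking (excl. Investment Banking)")
For example, the sample data above also contains , ' / and $, which this pattern leaves untouched.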
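The three steps (replace, collapse repeated hyphens, trim) can be folded into one helper. The following is only a minimal sketch: the name to_path is hypothetical, and the extra characters , / $ in the character class are an assumed extension chosen to cover the sample data, not part of the original pattern. Putting - inside the class and adding + lets a single substitution both replace the characters and collapse any run of them into one hyphen.

```python
import re

# Character class from the post, plus ",/$" (an assumed extension for the
# sample data). The trailing "-" is a literal hyphen, so existing hyphens
# are merged into the same run; "+" collapses each run into one match.
UNSAFE = re.compile(r"[?@ =#().&,/$-]+")


def to_path(title: str) -> str:
    """Hypothetical helper: turn a title into a hyphen-joined path segment."""
    # Replace every run of unsafe characters with a single "-".
    path = UNSAFE.sub("-", title)
    # Drop hyphens left over at either end (e.g. from a closing parenthesis).
    return path.strip("-")


if __name__ == "__main__":
    samples = [
        "Banking (excl. Investment Banking)",
        "Washington, D.C.",
        "Florian Leibert / Noirin Shirley Incident",
    ]
    for title in samples:
        print(to_path(title))
```

With this class, "Banking (excl. Investment Banking)" still yields 'Banking-excl-Investment-Banking', the same result as the IPython session. The apostrophe in "Twitter's Future" is deliberately left alone here; add it to the class if the target site strips it as well.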