Friday, May 29, 2015

Can re.findall() return only the part of the regex in parens?

While looping through some data, I want to capture the strings of digits that appear as page IDs (there can be more than one per line). I only want to match digit strings that are part of a particular URL, but I DON'T want to record the URL itself, just the number.

I am currently using re.findall to identify the right URLs, and then re.sub to extract the number strings.

import re

# line and pagelist come from the surrounding loop
views = re.findall(r"/view/\d+\.htm", line)
for view in views:
    view = re.sub(r"/view/(\d+)\.htm", r"\1", view)
    pagelist.append(view)

Is there a way to do something like

views = re.findall(r"/view/(\d*?).htm", r"\1", line)   #I know this doesn't work

where the original findall() only returns the part of the match in parens?
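
For what it's worth, re.findall() appears to do this already: when the pattern contains exactly one capturing group, it returns the text of that group rather than the whole match. Here is a minimal sketch, assuming a made-up sample line (the real data isn't shown above) and a pattern with the dot escaped:

```python
import re

# Hypothetical sample input; the actual lines are not shown in the question.
line = '<a href="/view/123.htm">a</a> <a href="/view/4567.htm">b</a>'

pagelist = []

# With exactly one capturing group in the pattern, findall() returns
# a list of that group's text instead of the full matches.
pagelist.extend(re.findall(r"/view/(\d+)\.htm", line))

print(pagelist)  # ['123', '4567']
```

If the pattern had more than one group, findall() would return a list of tuples instead, so any extra grouping would need to be non-capturing, i.e. (?:...).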
