xml: Can re.findall() return only the part of the regex in parens?

Friday, May 29, 2015

While looping through some data, I want to capture the strings of digits that appear as page IDs (there can be more than one per line). However, I only want to match digit strings that are part of a particular URL, and I don't want to record the URL itself, just the number.

I am currently using re.findall to identify the right URLs, and then re.sub to extract the number strings.

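views = re.findall(r"/view/\d+\.htm", line)   # escape the dot so it only matches a literal "."
for view in views:
    # Strip the surrounding URL and keep only the captured digits.
    view = re.sub(r"/view/(\d+)\.htm", r"\1", view)
    pagelist.append(view)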

Is there a way to do something like

views = re.findall(r"/view/(\d*?).htm", r"\1", line)   #I know this doesn't work

where the original findall() only returns the part of the match in parens?
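
For what it's worth, re.findall() already behaves this way when the pattern contains exactly one capturing group: it returns a list of the group's text instead of the whole match, so the separate re.sub() step should not be needed. Below is a minimal sketch, assuming a sample line invented for illustration (the URLs and IDs are made up; `line` and `pagelist` are the names from the question).

```python
import re

# Hypothetical input line; the hrefs and page IDs are invented for this example.
line = '<a href="/view/123.htm">a</a> <a href="/view/4567.htm">b</a> <a href="/other/89.htm">c</a>'

pagelist = []

# The pattern has a single capturing group, so findall() returns only the
# text matched by (\d+), not the surrounding "/view/...htm" part.
pagelist.extend(re.findall(r"/view/(\d+)\.htm", line))

print(pagelist)  # ['123', '4567']
```

If the pattern had two or more capturing groups, findall() would return a list of tuples instead, so any parentheses needed only for grouping would have to be written as non-capturing, `(?:...)`.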
