While looping through some data, I want to capture the strings of digits that appear as page IDs (there can be more than one per line). However, I only want to match digit strings that are part of a particular URL, and I DON'T want to record the URL, just the number.
I am currently using re.findall to identify the right URLs, and then re.sub to extract the number strings.
views = re.findall(r"/view/\d+\.htm", line)  # dot escaped so it matches a literal "."
for view in views:
    view = re.sub(r"/view/(\d+)\.htm", r"\1", view)  # keep only the digits
    pagelist.append(view)
Is there a way to do something like
views = re.findall(r"/view/(\d*?).htm", r"\1", line) #I know this doesn't work
where the original findall() only returns the part of the match in parens?
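
For what it's worth, re.findall already behaves this way when the pattern contains exactly one capturing group: it returns the text of the group rather than the whole match, so no second argument or re.sub pass is needed. Here is a minimal sketch of that, assuming a made-up sample line (the URLs and IDs in it are invented for illustration; `pagelist` is the list from the question):

```python
import re

# Hypothetical sample line; the URLs and IDs are made up for illustration.
line = "GET /view/123.htm and /view/4567.htm but not /edit/89.htm"

pagelist = []

# With exactly one capturing group in the pattern, re.findall returns
# a list of the group's text for each match, not the full matched URL.
pagelist.extend(re.findall(r"/view/(\d+)\.htm", line))

print(pagelist)  # ['123', '4567']
```

Note that the IDs come back as strings. If the pattern had more than one group, findall would return a list of tuples instead, so any extra grouping would need to be non-capturing, i.e. (?:...).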