
Get Only The First Link Of A URLs List With BeautifulSoup

I parsed an entire HTML file, extracting some URLs with the BeautifulSoup module in Python, using this piece of code: for link in soup.find_all('a'): for line in link : if '

Solution 1:

You can do it with a one-liner:

import re

soup.find('a', href=re.compile('^http://get.cm/get'))['href']

To assign it to a variable:

variable=soup.find('a', href=re.compile('^http://get.cm/get'))['href']
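
If you are on Python 3 with bs4, the same lookup can be sketched against an inline HTML snippet. The sample markup and the name first_link are made up for illustration, and the dots in the pattern are escaped as a small tightening; the find call itself is the one from the answer.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page.
html = """
<a href="http://example.com/other">other</a>
<a href="http://get.cm/get/4jj">first build</a>
<a href="http://get.cm/get/abc">second build</a>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first <a> whose href matches the pattern,
# so the second get.cm link is never returned.
first_link = soup.find("a", href=re.compile(r"^http://get\.cm/get"))["href"]
print(first_link)
```

Only the first matching link comes back, even though two links in the sample satisfy the pattern.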

I'm not sure exactly what you are doing, so I will post the full code from scratch. NB: if you use bs4, change the import to from bs4 import BeautifulSoup.

# Python 2 with BeautifulSoup 3
import re
import urllib2

from BeautifulSoup import BeautifulSoup

# Fetch and parse the download page
request = urllib2.Request("http://download.cyanogenmod.com/?device=p970")
response = urllib2.urlopen(request)
soup = BeautifulSoup(response)

# find() returns only the first <a> whose href matches the pattern
variable = soup.find('a', href=re.compile('^http://get.cm/get'))['href']
print variable

>>> 
http://get.cm/get/4jj
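
One caveat with the one-liner: find returns None when nothing matches, so indexing the result with ['href'] raises a TypeError. Here is a minimal sketch of a guarded version, assuming Python 3 and bs4. The helper name first_matching_href and the sample HTML are hypothetical, not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

def first_matching_href(html, pattern):
    """Return the href of the first <a> whose href matches pattern, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("a", href=re.compile(pattern))
    # find() gives None when no tag matches, so check before indexing.
    return tag["href"] if tag is not None else None

# Hypothetical sample input.
sample = '<a href="http://get.cm/get/4jj">build</a>'

print(first_matching_href(sample, r"^http://get\.cm/get"))    # a link matches
print(first_matching_href(sample, r"^http://nowhere\.test"))  # nothing matches
```

Returning None leaves it to the caller to decide what a missing link means, instead of failing inside the lookup.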

Solution 2:

You can do this more easily and clearly in BeautifulSoup without loops.

Assuming your parsed BeautifulSoup object is named soup:

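# "condition" is a placeholder; "in tag" alone only compares against the tag's
# direct children, so the link text is tested here instead
output = soup.find(lambda tag: tag.name == 'a' and "condition" in tag.get_text())['href']
print(output)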

Note that the find method returns only the first result, while find_all returns all of them.
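
To make the difference concrete, here is a small sketch assuming Python 3 and bs4. The HTML, the is_match helper, and the choice to test the placeholder "condition" against the link text are all assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two links match the placeholder condition, one does not.
html = (
    '<a href="/one">condition A</a>'
    '<a href="/two">condition B</a>'
    '<a href="/three">other</a>'
)
soup = BeautifulSoup(html, "html.parser")

def is_match(tag):
    # Keep only <a> tags whose text contains the placeholder string.
    return tag.name == "a" and "condition" in tag.get_text()

first = soup.find(is_match)                 # a single Tag (or None)
all_matches = soup.find_all(is_match)       # a list of every matching Tag
limited = soup.find_all(is_match, limit=1)  # a list holding at most one Tag

print(first["href"])
print([tag["href"] for tag in all_matches])
print(len(limited))
```

find_all with limit=1 stops after the first hit like find does, but it still returns a list, which can be handy when you want to avoid the None case.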

