Get Only The First Link Of A URLs List With BeautifulSoup
I parsed an entire HTML file and extracted some URLs with the BeautifulSoup module in Python, using this piece of code: for link in soup.find_all('a'): for line in link : if '
Solution 1:
You can do it with a one-liner:
import re
soup.find('a', href=re.compile('^http://get.cm/get'))['href']
To assign it to a variable:
variable=soup.find('a', href=re.compile('^http://get.cm/get'))['href']
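One caveat with the one-liner: find returns None when nothing matches, so indexing ['href'] directly raises a TypeError on a page without such a link. The sketch below is one way to guard against that. It assumes bs4 and Python 3; the sample markup and the helper name first_matching_href are made up for illustration and are not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<a href="http://example.com/other">other</a>
<a href="http://get.cm/get/4jj">first build</a>
<a href="http://get.cm/get/9zz">second build</a>
"""

def first_matching_href(soup, pattern):
    """Return the href of the first <a> whose href matches pattern, or None."""
    tag = soup.find('a', href=re.compile(pattern))
    # find() returns None when no tag matches, so check before indexing.
    return tag['href'] if tag is not None else None

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
found = first_matching_href(soup, r'^http://get\.cm/get')
missing = first_matching_href(soup, r'^ftp://')
print(found)
print(missing)
```

Here found holds the first matching link and missing is None, because no link on the sample page starts with ftp://.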
I don't know exactly what you are doing, so I will post the full code from scratch. NB: if you use bs4, change the import to from bs4 import BeautifulSoup.
import urllib2
from BeautifulSoup import BeautifulSoup
import re
request = urllib2.Request("http://download.cyanogenmod.com/?device=p970")
response = urllib2.urlopen(request)
soup = BeautifulSoup(response)
variable=soup.find('a', href=re.compile('^http://get.cm/get'))['href']
print variable
>>>
http://get.cm/get/4jj
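The code above targets Python 2 (urllib2, the print statement) and BeautifulSoup 3. A rough Python 3 equivalent with bs4 might look like the sketch below. The function names are hypothetical, the parser choice ('html.parser') is an assumption, and the download URL from the answer may no longer be live, so the network call is left as a commented usage line.

```python
import re
import urllib.request

from bs4 import BeautifulSoup

def first_link_from_html(html, pattern):
    """Parse html and return the first <a href> matching pattern, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.find('a', href=re.compile(pattern))
    return tag['href'] if tag is not None else None

def fetch_first_link(url, pattern):
    """Download url and return the first matching link (needs network access)."""
    with urllib.request.urlopen(url) as response:
        html = response.read()  # bytes; BeautifulSoup detects the encoding
    return first_link_from_html(html, pattern)

# Usage, assuming the page is still reachable:
# print(fetch_first_link("http://download.cyanogenmod.com/?device=p970",
#                        r'^http://get\.cm/get'))
```

Splitting parsing from downloading keeps the lookup testable against a saved or inline HTML string without touching the network.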
Solution 2:
You can do this more easily and clearly in BeautifulSoup without loops.
Assuming your parsed BeautifulSoup object is named soup:
# match the first <a> tag whose text contains "condition"
output = soup.find(lambda tag: tag.name == 'a' and "condition" in tag.get_text()).attrs['href']
print output
Note that the find method returns only the first result, while find_all returns all of them.
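To make the difference concrete, here is a small sketch that runs both methods with the same filter function. It assumes bs4 and Python 3; the sample markup and the is_cm_link helper are invented for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with two matching links and one other link.
SAMPLE_HTML = """
<a href="http://example.com/other">other</a>
<a href="http://get.cm/get/4jj">first build</a>
<a href="http://get.cm/get/9zz">second build</a>
"""

def is_cm_link(tag):
    """True for <a> tags whose href starts with the get.cm download prefix."""
    return tag.name == 'a' and tag.get('href', '').startswith('http://get.cm/get')

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

first_href = soup.find(is_cm_link)['href']                   # first match only
all_hrefs = [t['href'] for t in soup.find_all(is_cm_link)]   # every match
limited = soup.find_all(is_cm_link, limit=1)                 # list holding just the first match

print(first_href)
print(all_hrefs)
```

find_all with limit=1 still returns a list, so find is the more direct choice when only the first link is needed.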