How To Get The Text Under The Tag

November 30, 2022

I'm trying to get the text under the tag. I tried several different options: dneyot=driver.find_elements_by_xpath('//*[starts-with(@id, 'popover-')]/text()') dneyot=driver.find_ele

Solution 1:

If you want to get that text excluding the <b> node text, start by locating the div with the XPath below:

//div[starts-with(@id, 'popover-')]

This identifies the div node. With find_elements_by_xpath() you get the matching elements, and element.text returns all of the visible text inside each div (the text of nested <b> nodes included). Try the code below:

elements = driver.find_elements_by_xpath("//div[starts-with(@id, 'popover-')]") 
for element in elements:
    print(element.text)

Update:

I suspect the above method may not work, and the data may not be retrievable with the normal methods. In that case you can use execute_script() (Selenium's JavaScript executor) to read the div's child nodes directly:

from selenium import webdriver

driver = webdriver.Chrome('chromedriver.exe')
driver.get("file:///C:/NotBackedUp/SomeHTML.html")

xPath = "//div[starts-with(@id, 'popover-')]"
elements = driver.find_elements_by_xpath(xPath)
for element in elements:
    # Number of child nodes (text nodes and elements) directly under the div
    length = int(driver.execute_script("return arguments[0].childNodes.length;", element))
    # childNodes is zero-indexed, so the valid indices are 0 .. length - 1
    for i in range(length):
        try:
            data = driver.execute_script("return arguments[0].childNodes[arguments[1]].textContent;", element, i)
            if data and data.strip():
                print(data.strip())
        except Exception:
            print("=> Can't print some data...")

Because your site is written in a language other than English, some of the data may fail to print.
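To see which child nodes carry the text you are after, here is a browser-free sketch using Python's standard xml.etree.ElementTree. The markup, the id and the dates are made up for illustration, since the question's HTML is not shown. In ElementTree the text sitting directly inside the div is exposed as the div's own text plus the tail of each child, which roughly corresponds to the text-type entries of childNodes in the DOM.

```python
import xml.etree.ElementTree as ET

# Hypothetical popover markup; the real page's HTML is not shown in the question.
SAMPLE = (
    '<div id="popover-1">'
    '<b>Period 1:</b> 2019-01-01 - 2019-01-05<br/>'
    '<b>Period 2:</b> 2019-02-01 - 2019-02-07'
    '</div>'
)


def direct_text(element):
    """Return the text nodes directly under element, skipping the text inside child tags."""
    parts = [element.text or ""]
    # A child's tail is the text that follows its closing tag, still inside the parent.
    parts.extend(child.tail or "" for child in element)
    return [part.strip() for part in parts if part.strip()]


div = ET.fromstring(SAMPLE)
print(direct_text(div))
```

The <b> labels are skipped because their text belongs to the child elements, not to the div itself.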

To get the data of specific child nodes, index into childNodes directly:

from selenium import webdriver

driver = webdriver.Chrome('chromedriver.exe')
driver.get("file:///C:/NotBackedUp/SomeHTML.html")

xPath = "//div[starts-with(@id, 'popover-')]"
elements = driver.find_elements_by_xpath(xPath)
for element in elements:
    # Print the b1 text (child node at index 2)
    b1Text = driver.execute_script("return arguments[0].childNodes[2].textContent", element)
    print(b1Text)

    # Print the b2 text (child node at index 6)
    b2Text = driver.execute_script("return arguments[0].childNodes[6].textContent", element)
    print(b2Text)

print("=> Done...")

I hope it helps...

Solution 2:

Using BeautifulSoup:

Find the div with the id = popover-34252127 inside the parent div.

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.your_url_here.com/")

soup = BeautifulSoup(page.content, 'html.parser')
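data = soup.find("div", {"id": "popover-34252127"})
# find() returns None when no such div exists
if data is not None:
    # get_text() returns the text inside the div rather than the whole tag
    print(data.get_text())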

Solution 3:

find_elements_by_xpath() returns a list of WebElement objects, the basic object Selenium actually works with.
Your XPath ends with /text(), which selects the text nodes of an element in an XML document, not an object Selenium expects. So drop that suffix, which makes the XPath return the element itself, and get the element's text through its .text attribute in Python:

dneyot=driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]")
for element in dneyot:
    print("Display period >3 days", element.text)
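
In recent Selenium 4 releases the find_elements_by_* helpers are no longer available, and the same lookup goes through find_elements() with a locator strategy. Below is a minimal sketch of that form. The function name popover_texts is hypothetical, and the plain string "xpath" is used because it is the value of By.XPATH, so the snippet needs no import of its own.

```python
# XPath taken from the answer above.
POPOVER_XPATH = "//*[starts-with(@id, 'popover-')]"


def popover_texts(driver):
    """Return the visible text of every element whose id starts with 'popover-'."""
    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH
    elements = driver.find_elements("xpath", POPOVER_XPATH)
    return [element.text for element in elements]
```

With a real driver you would call popover_texts(driver) after driver.get(...) and print each returned string.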

Solution 4:

text() returns a text node, which Selenium doesn't know how to handle; it can only handle WebElements. You need to get the text of the elements whose id starts with "popover-" and work with the returned text:

elements = driver.find_elements_by_xpath("//*[starts-with(@id, 'popover-')]")
for element in elements:
    lines = element.text.split('\n')
    for line in lines:
        print("Display period >3 days", line)

Solution 5:

You can use a regular expression to get the dates:

import re

#...

# Raw string, so the backslashes need no doubling
rePeriod = r'(.*)(\d{4}-\d{2}-\d{2} - \d{4}-\d{2}-\d{2})(.*)'

dneyot = driver.find_elements_by_css_selector('div[id^="popover-"]')
for spisok in dneyot:
    m = re.search(rePeriod, spisok.text)
    # re.search() returns None when the text contains no date range
    if m:
        print("Display period >3 days", m.group(2))
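
The date matching can be tried without a browser. Here is a small sketch that keeps only the date-range part of the pattern (the wrapping (.*) groups are not needed to locate the range) and returns None instead of failing when nothing matches. The function name extract_period and the sample strings are made up for illustration.

```python
import re

# Same date-range shape as the pattern above: YYYY-MM-DD - YYYY-MM-DD.
PERIOD_RE = re.compile(r"\d{4}-\d{2}-\d{2} - \d{4}-\d{2}-\d{2}")


def extract_period(text):
    """Return the first date range found in text, or None if there is none."""
    match = PERIOD_RE.search(text)
    return match.group(0) if match else None


print(extract_period("Display period: 2019-01-01 - 2019-01-05"))
print(extract_period("no dates here"))
```

Note that this returns the first range in the text; if a popover can hold several ranges, PERIOD_RE.findall(text) would return all of them.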
