How Do I Ignore Tags While Getting The .string Of A Beautiful Soup Element?

July 30, 2022

I'm working with HTML elements that have child tags, which I want to 'ignore' or remove while keeping the text. Right now, if I call .string on any element that contains tags, all I

Solution 1:

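import bs4  # soup is assumed to be an already-parsed BeautifulSoup object

for child in soup.find(id='main'):
    if isinstance(child, bs4.Tag):  # skip the bare strings between tags
        print(child.text)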

This prints:

This is a paragraph.
This is a paragraph with a tag.
This is another paragraph.
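
For a version that runs on its own, here is a minimal sketch. The markup in html is a hypothetical stand-in (the question's actual HTML is not shown here), and the name paragraph_texts is introduced only for illustration:

```python
import bs4
from bs4 import BeautifulSoup

# Hypothetical sample markup; the original question's HTML is not shown.
html = """
<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph <span class="test">with a tag</span>.</p>
<p>This is another paragraph.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Iterating over a Tag yields its direct children: Tag objects and
# NavigableString objects (here, the newlines between the <p> tags).
paragraph_texts = [
    child.text
    for child in soup.find(id="main")
    if isinstance(child, bs4.Tag)
]

for text in paragraph_texts:
    print(text)
```

The isinstance check matters because the whitespace between the paragraphs shows up as string children, which would otherwise be printed as blank lines.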

Solution 2:

Use the .strings iterable instead, and combine the pieces with ''.join():

main = soup.find(id='main')  # assumed: the same element as in Solution 1
print(''.join(main.strings))

Iterating over .strings yields each and every contained string, directly or in child tags.

Demo:

>>> print(''.join(main.strings))

This is a paragraph. 
This is a paragraph with a tag. 
This is another paragraph.
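
The difference between .string and .strings can be seen in a small self-contained sketch. The one-line markup is a hypothetical example rather than the question's HTML, and the comparison with get_text() is added for illustration since it returns the same concatenated text when called without arguments:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, chosen so the <p> has several child nodes.
html = '<div id="main"><p>This is a paragraph <b>with a tag</b>.</p></div>'

soup = BeautifulSoup(html, "html.parser")
main = soup.find(id="main")
paragraph = main.p

# .string is None here because the <p> has more than one child node.
print(paragraph.string)

# .strings walks every descendant string, including those inside <b>.
joined = "".join(main.strings)
print(joined)

# get_text() with no arguments concatenates the same strings.
print(main.get_text() == joined)
```

If only the combined text is needed, get_text() is the shorter spelling; .strings is useful when the individual pieces need to be filtered or cleaned before joining.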
