How Do I Ignore Tags While Getting The .string Of A Beautiful Soup Element?
I'm working with HTML elements that have child tags, which I want to 'ignore' or remove while keeping their text. Right now, if I call .string on any element that contains tags, all I get is None.
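The behaviour comes from how .string is defined: it only returns a value when the element has exactly one child. Below is a minimal sketch that reproduces it. The one-paragraph HTML snippet and the choice of the built-in "html.parser" are assumptions for illustration, since the question does not show its markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption; the question does not show its HTML).
html = '<p>This is a paragraph with a <b>tag</b>.</p>'
soup = BeautifulSoup(html, 'html.parser')
p = soup.p

# <p> has three children: a string, a <b> tag, and another string,
# so .string has no single child to return and gives None.
print(p.string)        # None

# <b> contains exactly one string, so .string returns it.
print(soup.b.string)   # tag
```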
Solution 1:
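import bs4
# Iterate over the direct children of the element with id="main",
# skipping the bare strings (e.g. newlines) that sit between the tags.
for child in soup.find(id='main'):
    if isinstance(child, bs4.Tag):
        print(child.text)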
This prints:
This is a paragraph.
This is a paragraph with a tag.
This is another paragraph.
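Here is a self-contained version of Solution 1 that can be run as-is. It is a sketch only: the HTML is a hypothetical reconstruction based on the output above, and the alternative using find_all(True, recursive=False) is an addition, not part of the original answer. It returns only the direct child tags, so it needs no isinstance check.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup reconstructed from the printed output (an assumption).
HTML = """<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph with a <b>tag</b>.</p>
<p>This is another paragraph.</p>
</div>"""

soup = BeautifulSoup(HTML, "html.parser")
main = soup.find(id="main")

# Iterating a Tag yields its direct children, including the "\n" strings
# between the <p> elements; keep only the Tag children.
texts = [child.text for child in main if isinstance(child, Tag)]

# Alternative: find_all(True, recursive=False) matches only direct child tags.
texts_alt = [child.get_text() for child in main.find_all(True, recursive=False)]

for line in texts:
    print(line)
```

Both lists hold one string per paragraph. .text (the same as get_text()) flattens any nested tags such as <b>.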
Solution 2:
Use the .strings iterable instead. Use ''.join() to pull in all strings and join them together:
main = soup.find(id='main')
print(''.join(main.strings))
Iterating over .strings yields every string the element contains, whether it is a direct child or nested inside a child tag.
Demo:
>>> print(''.join(main.strings))
This is a paragraph.
This is a paragraph with a tag.
This is another paragraph.
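The sketch below makes Solution 2 runnable and shows what .strings actually yields. It assumes the same hypothetical HTML as above. It also compares the result with get_text(), which, with its default empty separator, should concatenate the same strings for simple markup like this.

```python
from bs4 import BeautifulSoup

# Hypothetical markup reconstructed from the demo output (an assumption).
HTML = """<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph with a <b>tag</b>.</p>
<p>This is another paragraph.</p>
</div>"""

main = BeautifulSoup(HTML, "html.parser").find(id="main")

# .strings walks the whole subtree, so the "tag" string inside <b> is included,
# along with the "\n" strings between the <p> elements.
pieces = list(main.strings)
joined = "".join(pieces)

# get_text() with its default separator ("") joins the same strings.
via_get_text = main.get_text()

print(joined)
```

The newline strings in the markup are kept, which is why each paragraph lands on its own line in the output.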