Parsing Of Table From .docx File
I want to parse a table from a .docx file using Python and python-docx into some useful data structure. The .docx file contains only a single table in my case. I've uploaded it so
Solution 1:
You can use the snippet below to parse your document into a list where each row is a dictionary mapping the table header value to the column value.
from docx.api import Document
# Load the first table from your document. In your example file,# there is only one table, so I just grab the first one.
document = Document('Books.docx')
table = document.tables[0]
# Data will be a list of rows represented as dictionaries# containing each row's data.
data = []
keys = Nonefor i, row inenumerate(table.rows):
text = (cell.text for cell in row.cells)
# Establish the mapping based on the first row# headers; these will become the keys of our dictionaryif i == 0:
keys = tuple(text)
continue# Construct a dictionary for this row, mapping# keys to values for this row
row_data = dict(zip(keys, text))
data.append(row_data)
This will give you:
data = [
{u'Pub.': u'Penguin Books',
u'Auther': u'Edward de BONO',
u'Sr. No.': u'1',
u'Name of Book': u'Six Thinking Hats'
},
...
]
If you'd just want a tuple for each row, you should instead of creating a dictionary just set row_data
to the tuple value of text
, so in the loop instead of constructing the dict
, do:
# Construct a tuple forthis row
row_data = tuple(text)
data.append(row_data)
Now, data
would hold something like this instead:
data = [
(u'1',
u'Six Thinking Hats',
u'Edward de BONO',
u'Penguin Books'
),
...
]
Then you can skip constructing keys
, obviously (but still skip the first row!).
Post a Comment for "Parsing Of Table From .docx File"