Word tables to Excel: Python

Question: I’ve got a folder of 100+ Word documents, each containing a table or two. Can you format that into an Excel spreadsheet for me?

Answer: Yes – bask in the power and glory of Python.

Using the Python for Windows Extension (specifically the win32com.client library), you can drive both Word and Excel with Python. Knowing that this functionality exists is half the battle.

# import libraries
import win32com.client as win32
import os

myDir = r'C:\Projects\ProjectX'

# open invisible Excel app
XL = win32.Dispatch('Excel.Application')
XL.Visible = 0
# load pre-made workbook
XLbook = XL.Workbooks.Open(os.path.join(myDir,'ProjectX_Spreadsheet.xlsx'))
# navigate to first worksheet
XLsheet = XLbook.Worksheets(1)
# counter to keep track of Excel row
XLrow = 2

# loop through files in directory
for myFile in os.listdir(myDir):
 filepath = os.path.join(myDir,myFile)
 filename = os.path.splitext(myFile)[0]
 ext = os.path.splitext(myFile)[1]

 # check if *.docx {optional}
 if ext == '.docx':
 # open invisible Word app
 word = win32.Dispatch('Word.Application')
 word.Visible = 0
 # open Word doc to read
 doc = word.ActiveDocument

 # access first table in Word doc. For subsequent tables, increase index.
 table = doc.Tables(1)

 # get (Word) and set (Excel) some data
 XLsheet.Cells(XLrow,1).Value = table.Cell(Row=1, Column=1).Range.Text

 # get (Word) and set (Excel) some more data
 XLsheet.Cells(XLrow,2).Value = table.Cell(Row=2, Column=3).Range.Text

 # move to next row
 XLrow = XLrow + 1
 # close the current Word doc

# exit the Word app
del word
# save and close Excel app
del XL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s