How to extract text from a DOCX file using VB.NET

Written by Jaime Avelar

Knowing how to read the Microsoft Word DOCX file format can save you time when you need the contents of a DOCX file in your own program. DOCX is the default Word format starting with Microsoft Office 2007. It is based on the Office Open XML standard: a DOCX file is a ZIP package of XML files that hold the document's text and formatting. XML is a set of rules for encoding documents as structured text. You can use Microsoft Visual Basic .NET (VB.NET) with the Open XML SDK to read and display the contents of a DOCX file.

Things you need

  • Microsoft Visual Basic Express
  • Open XML Format SDK

  1. Open Microsoft Visual Basic Express and select "New Project..." from the left pane of your screen. Click "Visual Basic" under "Installed Templates" and double-click "Windows Forms Application" so that the project includes a form.

  2. Click the "Toolbox" pane and double-click "Button" to add a new button to your form. Double-click "TextBox" to add a new text box control to your form.

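  3. Click the "Project" menu and select "<projectname> Properties." Click "References," then select "Add." Select ".NET" and click "DocumentFormat.OpenXml." Click "OK." If the build later reports a missing "System.IO.Packaging" type, also add "WindowsBase" from the same list. Return to the form designer and double-click "Button1" to open the code window at the "Button1_Click" procedure.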

  4. Copy and paste the following code at the very top of your VB.NET code file, above the "Public Class" line, to import the namespaces:

    Imports System.IO
    Imports DocumentFormat.OpenXml.Packaging
    Imports DocumentFormat.OpenXml.Wordprocessing

  5. Copy and paste the following code under "Button1_Click" to define the path and open the document:

        Dim strDoc As String = "C:\docxFile.docx"
        Dim txt As String = String.Empty
        Dim stream As Stream = File.Open(strDoc, FileMode.Open)

    Edit the following line of code, replacing the path with the full path and file name of your own document:

        Dim strDoc As String = "C:\docxFile.docx"
  6. Copy and paste the following to call the procedure that actually reads the document, then close the file:

        OpenAndAddToWordprocessingStream(stream, txt)
        stream.Close()
  7. Copy and paste the following to display the extracted text in the text box control:

        Me.TextBox1.Text = txt
  8. Copy and paste the following procedure below the "Button1_Click" procedure, inside the form class, to open the DOCX file and return the contents read:

    Public Sub OpenAndAddToWordprocessingStream(ByVal stream As Stream, ByRef txt As String)
        ' Open the package read-only (False), since the text is only read.
        Using wordDoc As WordprocessingDocument = WordprocessingDocument.Open(stream, False)
            Dim body As Body = wordDoc.MainDocumentPart.Document.Body
            ' InnerText concatenates the text of every element in the body.
            txt = body.InnerText
        End Using
    End Sub
  9. Press "F5" to run the program, then click "Button1" to execute the code.
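
One limitation of the procedure in step 8 is that "InnerText" joins the text of every element in the body with no separator, so paragraph breaks are lost in the text box. A minimal sketch of one way to keep them, assuming the same Open XML SDK reference as above, is to read each paragraph separately and join the results with line breaks. The module name "DocxParagraphReader" and the function name "ReadParagraphs" are hypothetical, chosen only for illustration; they are not part of the SDK or of the original steps.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Wordprocessing

Module DocxParagraphReader

    ' Hypothetical helper (not part of the SDK): returns the document
    ' text with one line per paragraph.
    Public Function ReadParagraphs(ByVal path As String) As String
        Dim lines As New List(Of String)

        ' Open the file read-only; Using closes it even if an error occurs.
        Using wordDoc As WordprocessingDocument = WordprocessingDocument.Open(path, False)
            Dim docBody As Body = wordDoc.MainDocumentPart.Document.Body

            ' Descendants also finds paragraphs nested in tables.
            ' Each paragraph's InnerText is the joined text of its runs.
            For Each para As Paragraph In docBody.Descendants(Of Paragraph)()
                lines.Add(para.InnerText)
            Next
        End Using

        Return String.Join(Environment.NewLine, lines.ToArray())
    End Function

End Module
```

To try it, the body of "Button1_Click" could be reduced to a single call such as Me.TextBox1.Text = DocxParagraphReader.ReadParagraphs("C:\docxFile.docx"), with the text box's "Multiline" property set to "True" so the line breaks are visible.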
