Using Microsoft Word COM Object
To convert the given Microsoft Word document to PDF:
- Create a COM object using the `New-Object` cmdlet.
- Use the `Open()` method to open the MS Word file.
- Use the `SaveAs()` method to save the PDF file.
- Use the `Close()` method to close the Word document.
- Use the `Quit()` method to quit the Word application.
```powershell
$wordApplication = New-Object -ComObject Word.Application
$document = $wordApplication.Documents.Open("E:\Test\file.docx")

$pdfFilePath = "E:\Test\PDFFile.pdf"
$document.SaveAs([ref] $pdfFilePath, [ref] 17)

$document.Close()
$wordApplication.Quit()
```
Microsoft Word must be installed on the local machine for the above solution to work. First, we used the `New-Object` cmdlet with the `-ComObject` parameter to create a new Word COM object and assigned it to the `$wordApplication` variable. But what is a COM object, and why did we need one instead of an ordinary object?
The Component Object Model (COM) is a binary interface standard that allows software components to interact with each other regardless of the programming language they were written in.
The COM objects provide many properties, methods, and events that we can use in PowerShell scripts to perform automation and communicate with other applications. As we were supposed to interact with the Microsoft Word application, we had to use the COM object.
To use a COM object in PowerShell, we used the `New-Object` cmdlet to create an instance that refers to the COM component and lets us invoke its methods, access its properties and handle its events.
So, the first line in the above script created a COM object representing the Microsoft Word application and assigned it to the `$wordApplication` variable; the rest of the script uses this variable to open documents and perform other Word-related tasks.
The COM objects are specific to Windows OS and need the component or application they represent to be installed on our local machine. Remember, being familiar with PowerShell documentation and best practices is essential because COM objects sometimes involve complex data type transformations (conversions) and memory management.
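Because a failed `Open()` or `SaveAs()` call can leave a hidden Word process running, the cleanup steps can be made unconditional. The following is a minimal sketch, not the script used in this article; the function name `Convert-WordToPdf` and its parameters are hypothetical, and it assumes Word is installed.

```powershell
# Hypothetical helper: Convert-WordToPdf is an illustrative name, not part of the original script.
function Convert-WordToPdf {
    param(
        [Parameter(Mandatory)] [string] $SourcePath,
        [Parameter(Mandatory)] [string] $PdfPath
    )

    $wordApplication = New-Object -ComObject Word.Application
    $document = $null
    try {
        $document = $wordApplication.Documents.Open($SourcePath)
        # 17 is the file format constant for PDF.
        $document.SaveAs([ref] $PdfPath, [ref] 17)
    }
    finally {
        # Runs even if Open() or SaveAs() throws, so Word is always closed.
        if ($null -ne $document) { $document.Close() }
        $wordApplication.Quit()
        # Release the COM reference that PowerShell still holds.
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($wordApplication)
    }
}
```

It could be called as `Convert-WordToPdf -SourcePath "E:\Test\file.docx" -PdfPath "E:\Test\PDFFile.pdf"`.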
Next, we used the `Open()` method of the `$wordApplication` object’s `Documents` property, which took the source file’s path as an argument and opened it; the returned document object was assigned to the `$document` variable. After that, we defined a variable named `$pdfFilePath` and initialized it with the path and name of the converted PDF file (our output file).
Then, we used the `$document` object’s `SaveAs()` method to save the Word document as a PDF file. We passed two arguments: the path and name of the output file, and the file format constant for PDF (`17`, the value of `wdFormatPDF` in Word’s `WdSaveFormat` enumeration). Both arguments were passed by reference using `[ref]`. Finally, we used the `Close()` method to close the Word document and the `Quit()` method to quit the Word application.
In the above example, we converted a single `.docx` file, but what if we have multiple files, some with a `.doc` extension and others with `.docx`? In that case, we can use the following solution.
```powershell
$sourceFilesPath = "E:\Test"
$wordApplication = New-Object -ComObject Word.Application

Get-ChildItem -Path $sourceFilesPath -Filter *.doc? | ForEach-Object {
    $document = $wordApplication.Documents.Open($_.FullName)
    $pdfFilePath = "$($_.DirectoryName)\$($_.BaseName).pdf"
    $document.SaveAs([ref] $pdfFilePath, [ref] 17)
    $document.Close()
}

$wordApplication.Quit()
```
This code is similar to the previous example. Here, we used `Get-ChildItem` to retrieve the files in `$sourceFilesPath` and filtered them with the `-Filter` parameter to grab files with `.doc` and `.docx` extensions; because `?` stands for any single character, the pattern can also match other extensions such as `.docm`. Next, we used the `ForEach-Object` cmdlet to loop over the files one at a time: open each file, build the output file’s path and name, save the file as PDF and close the Word document. Finally, after going through all the Word files, we used the `Quit()` method to quit the Word application.
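If only `.doc` and `.docx` files should be picked up, the file selection can be sketched with an explicit extension check instead of a wildcard. The helper name `Get-WordDocument` and the `Source`/`Pdf` property names are hypothetical; the sketch only lists the files and the PDF paths they would map to, and does not need Word.

```powershell
# Hypothetical helper: Get-WordDocument is an illustrative name, not part of the original script.
function Get-WordDocument {
    param([Parameter(Mandatory)] [string] $Path)

    Get-ChildItem -Path $Path -File |
        # -in compares case-insensitively, so .DOCX is accepted too.
        Where-Object { $_.Extension -in '.doc', '.docx' } |
        ForEach-Object {
            [pscustomobject]@{
                Source = $_.FullName
                # Same folder and base name as the source, with a .pdf extension.
                Pdf    = [System.IO.Path]::ChangeExtension($_.FullName, '.pdf')
            }
        }
}
```

Each returned object could then feed the `Open()` and `SaveAs()` calls, for example by piping `Get-WordDocument -Path "E:\Test"` into `ForEach-Object`.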
We can also convert the MS Word file to a PDF file using the Microsoft Office Interop API. See the following example for a demonstration.
```powershell
Add-Type -AssemblyName Microsoft.Office.Interop.Word

$sourceFilesPath = "E:\Test"
$wordApplication = New-Object -ComObject Word.Application

Get-ChildItem -Path $sourceFilesPath -Filter *.doc? | ForEach-Object {
    $document = $wordApplication.Documents.Open($_.FullName)
    $pdfFilePath = "$($_.DirectoryName)\$($_.BaseName).pdf"
    # SaveAs() expects a WdSaveFormat value; wdFormatPDF equals 17.
    $document.SaveAs([ref] $pdfFilePath, [ref] [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatPDF)
    $document.Close()
}

$wordApplication.Quit()
```
First, we loaded the Microsoft Office Interop assembly for Word with `Add-Type -AssemblyName Microsoft.Office.Interop.Word`, which makes the enumerations of the Word object model available in PowerShell. We then passed the `wdFormatPDF` member of the `WdSaveFormat` enumeration to the `SaveAs()` method instead of the literal `17`; both denote the PDF file format. The `wdExportFormatPDF` member of the `WdExportFormat` enumeration also has the value 17, but that enumeration belongs to the `ExportAsFixedFormat()` method rather than `SaveAs()`.
Using Microsoft Print to PDF Printer
To convert the MS Word file to PDF:
- Use the `New-Object` cmdlet to create a Word COM object.
- Use the `Open()` method to open the provided Word document.
- Use the `PrintOut()` method to print the Word document as a PDF file.
- Use the `Close()` method to close the Word document.
- Use the `Quit()` method to quit the Word application.
```powershell
$wordApplication = New-Object -ComObject Word.Application
$document = $wordApplication.Documents.Open("E:\Test\file.docx")

$pdfFilePath = "E:\Test\PDFFile.pdf"
$document.PrintOut([ref] $false, [ref] $false, [ref] 0, [ref] $pdfFilePath)

$document.Close()
$wordApplication.Quit()
```
This code snippet is the same as the first example in the previous section, except that it calls the `PrintOut()` method of the `$document` object instead of `SaveAs()`. `PrintOut()` sends the document to Word’s active printer, so the result is a PDF file at `$pdfFilePath` only when Microsoft Print to PDF is the active printer. The four arguments are, in order:
- `$false` (Background): do not print in the background, so the script waits until printing has finished.
- `$false` (Append): do not append to an existing output file; overwrite it instead.
- `0` (Range): print all the pages of the given Word document.
- `$pdfFilePath` (OutputFileName): the destination where the PDF file should be stored.
All four arguments were passed by reference using `[ref]`. To convert every `.doc` and `.docx` file in a folder, we can reuse the loop from the previous section and replace `SaveAs()` with `PrintOut()`.
```powershell
$sourceFilesPath = "E:\Test"
$wordApplication = New-Object -ComObject Word.Application

Get-ChildItem -Path $sourceFilesPath -Filter *.doc? | ForEach-Object {
    $document = $wordApplication.Documents.Open($_.FullName)
    $pdfFilePath = "$($_.DirectoryName)\$($_.BaseName).pdf"
    $document.PrintOut([ref] $false, [ref] $false, [ref] 0, [ref] $pdfFilePath)
    $document.Close()
}

$wordApplication.Quit()
```
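The scripts in this section never choose a printer, so they rely on Microsoft Print to PDF already being Word’s active printer. One way to make that explicit is sketched below. It assumes the printer is installed under its default name, `Microsoft Print to PDF`, and uses the `ActivePrinter` property of the Word application object; the function name `Convert-WordToPdfByPrinting` and its parameters are hypothetical.

```powershell
# Hypothetical helper: Convert-WordToPdfByPrinting is an illustrative name, not part of the original script.
function Convert-WordToPdfByPrinting {
    param(
        [Parameter(Mandatory)] [string] $SourcePath,
        [Parameter(Mandatory)] [string] $PdfPath,
        # Assumption: the printer is installed under its default name.
        [string] $PrinterName = 'Microsoft Print to PDF'
    )

    $wordApplication = New-Object -ComObject Word.Application
    # Remember the current printer so it can be restored afterwards.
    $previousPrinter = $wordApplication.ActivePrinter
    $document = $null
    try {
        $wordApplication.ActivePrinter = $PrinterName
        $document = $wordApplication.Documents.Open($SourcePath)
        # Background = $false, Append = $false, Range = 0 (whole document), OutputFileName.
        $document.PrintOut([ref] $false, [ref] $false, [ref] 0, [ref] $PdfPath)
    }
    finally {
        if ($null -ne $document) { $document.Close() }
        $wordApplication.ActivePrinter = $previousPrinter
        $wordApplication.Quit()
    }
}
```

Restoring the previous printer in the `finally` block keeps the change from outliving the script, which matters because switching the active printer in Word can also affect the printer other applications use.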