Question

I'm using the following PowerShell script to open a few thousand HTML files and "Save As..." Word documents.

param([string]$htmpath,[string]$docpath = $docpath)   

$srcfiles = Get-ChildItem $htmPath -filter "*.htm*"
$saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatDocument"); 
$word = new-object -comobject word.application 
$word.Visible = $False          

function saveas-document
{         
    $opendoc = $word.documents.open($doc.FullName);         
    $opendoc.saveas([ref]"$docpath\$doc.FullName.doc", [ref]$saveFormat);         
    $opendoc.close();
}       

ForEach ($doc in $srcfiles)     
{
    Write-Host "Processing :" $doc.FullName         
    saveas-document        
    $doc = $null   
}   

$word.quit(); 

The content converts splendidly, but my filename is not as expected.

$opendoc.saveas([ref]"$docpath\$doc.FullName.doc", [ref]$saveFormat); results in foo.htm saving as foo.htm.FullName.doc instead of foo.doc.

$opendoc.saveas([ref]"$docpath\$doc.BaseName.doc", [ref]$saveFormat); yields foo.htm.BaseName.doc

How do I set up a Save As... filename variable equal to a concatenation of BaseName and .doc?


Solution

Based on our comments above, it seems that moving the files is all you want to accomplish. The following works for me. In the current directory, it replaces .txt extensions with .py extensions. I found the command here.

PS C:\testing> dir *.txt | Move-Item -Destination {[IO.Path]::ChangeExtension( $_.Name, "py")}

You can also change *.txt to C:\path\to\file\*.txt so you don't need to execute this line from the location of the files. You should be able to define a destination in a similar manner, so I'll report back if I find a simple way to do that.
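One way the destination folder might be specified is to build the full target path inside the same script block. This is only a sketch: the `$source` and `$dest` folders and the `foo.txt` sample file are hypothetical and are created under the temp directory so the snippet can be run safely as-is.

```powershell
# Hypothetical source and destination folders, created under the temp directory.
$root   = Join-Path ([IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$source = Join-Path $root 'src'
$dest   = Join-Path $root 'dest'
New-Item -ItemType Directory -Path $source, $dest -Force | Out-Null
Set-Content -Path (Join-Path $source 'foo.txt') -Value 'sample'

# The script block runs once per piped file; $_ is the current file.
# Join-Path puts the renamed file into $dest instead of the current directory.
Get-ChildItem -Path $source -Filter '*.txt' |
    Move-Item -Destination { Join-Path $dest ([IO.Path]::ChangeExtension($_.Name, 'py')) }
```

Swapping `Move-Item` for `Copy-Item` should leave the originals in place, which may be safer for a first run.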

Also, I found Microsoft's TechNet Library while I was searching. It has many tutorials on scripting using PowerShell. Files and Folders, Part 3: Windows PowerShell should help you to find additional info on copying and moving files.

OTHER TIPS

I was having problems just converting the filename from .html to .docx. I took your code above and changed it to this:

function Convert-HTMLtoDocx {
    param([string]$htmpath)
    $srcfiles = Get-ChildItem $htmPath -filter "*.htm*"
    $saveFormat = [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatXMLDocument
    $word = new-object -comobject word.application
    $word.Visible = $False

    ForEach ($doc in $srcfiles) {
        Write-Host "Processing :" $doc.fullname
        $name = Join-Path -Path $doc.DirectoryName -ChildPath $($doc.BaseName + ".docx")
        $opendoc = $word.documents.open($doc.FullName)
        $opendoc.saveas([ref]$name,[ref]$saveFormat)
        $opendoc.close()
        $doc = $null
    }  #End ForEach

    $word.quit()
} #End Function

The problem was the save format. To save a document as a .docx you need to specify the format as wdFormatXMLDocument, not wdFormatDocument.
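The two pieces that matter here, the output filename and the save format, can be tried out without starting Word. The sketch below is illustrative only: `Get-TargetName` and `$formatByExtension` are hypothetical names, the sample file does not need to exist, and the numeric values are the ones Microsoft documents for the WdSaveFormat enumeration. It also shows why the original `"$docpath\$doc.BaseName.doc"` misbehaves: inside a double-quoted string only `$doc` is expanded, and `.BaseName` is appended as literal text unless it is wrapped in `$(...)`.

```powershell
# Numeric values as documented for Word's WdSaveFormat enumeration.
$formatByExtension = @{
    '.doc'  = 0    # wdFormatDocument
    '.txt'  = 4    # wdFormatDOSText
    '.htm'  = 10   # wdFormatFilteredHTML
    '.docx' = 12   # wdFormatXMLDocument
}

# Hypothetical helper: builds the output path next to the source file.
function Get-TargetName {
    param([IO.FileInfo]$File, [string]$Extension)
    # $(...) evaluates the property access inside the string; a bare
    # "$File.BaseName" would expand $File and append the literal ".BaseName".
    Join-Path -Path $File.DirectoryName -ChildPath "$($File.BaseName)$Extension"
}

# Hypothetical sample file; it is never opened, so it does not have to exist.
$sample = [IO.FileInfo](Join-Path ([IO.Path]::GetTempPath()) 'foo.htm')
$target = Get-TargetName -File $sample -Extension '.docx'
$format = $formatByExtension[[IO.Path]::GetExtension($target)]
"$target -> $format"
```

Keeping the extension and the format in one table makes it harder to pair a .docx name with the legacy .doc format by accident.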

This does a recursive walk of a root folder and saves each .doc as filtered HTML (.htm):

$docpath = "\\sf-xyz-serverabc01\ChangeTheseDocuments"
$WdTypes = Add-Type -AssemblyName 'Microsoft.Office.Interop.Word, Version=14.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c' -Passthru
$srcfiles =  get-childitem  $docpath  -filter "*.doc" -rec | where {!$_.PSIsContainer}  | select-object  FullName
$saveFormat = $WdTypes | Where {$_.Name -eq 'WdSaveFormat'}
$word = new-object -comobject word.application
$word.Visible = $False

function saveas-filteredhtml
    {
    $opendoc = $word.documents.open($doc.FullName);
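    # ChangeExtension only touches the extension; replace("doc","htm") would also alter "doc" elsewhere in the path
    $Name = [IO.Path]::ChangeExtension($doc.FullName, "htm")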
    $opendoc.saveas([ref]$Name, [ref]$saveFormat::wdFormatFilteredHTML);
    $opendoc.close();
    }

ForEach ($doc in $srcfiles)
    {
    Write-Host "Processing :" $doc.FullName
    saveas-filteredhtml
    $doc = $null
    }

$word.quit();

I know this is an older post, but I am posting this code here so that I can find it in the future.

This does a recursive walk of a root folder and converts .doc and .docx files to .txt:

Here is a LINK to the different formats you can save to.

$docpath = "C:\Temp"
$WdTypes = Add-Type -AssemblyName 'Microsoft.Office.Interop.Word, Version=14.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c' -Passthru
$srcfiles =  get-childitem  $docpath  -filter "*.doc" -rec | where {!$_.PSIsContainer}  | select-object  FullName
$saveFormat = $WdTypes | Where {$_.Name -eq 'WdSaveFormat'}
$word = new-object -comobject word.application
$word.Visible = $False

function saveas-filteredhtml
    {
        $opendoc = $word.documents.open($doc.FullName);
        # ChangeExtension handles both .doc and .docx without touching the rest of the path
        $Name = [IO.Path]::ChangeExtension($doc.FullName, "txt")
        $opendoc.saveas([ref]$Name, [ref]$saveFormat::wdFormatDOSText); ##wdFormatDocument
        $opendoc.close();
    }

ForEach ($doc in $srcfiles)
    {
        Write-Host "Processing :" $doc.FullName
        saveas-filteredhtml
        $doc = $null
    }

$word.quit();
Licensed under: CC-BY-SA with attribution