Question

I'm looking to create an automated PowerShell script, run from Task Scheduler, that mass-renames auto-generated PDFs and saves them to a second folder. The original name is irrelevant but is generally in the form 0013238974.pdf. Each file needs to be renamed based on text contained within it. Example:

TEXT TEXT TEXT 

$ACCT_ID

TEXT TEXT TEXT

Thus the new name of the file would need to be $ACCT_ID.pdf, and the file would then be saved in the new destination. I've got no problem with the move; that's just a simple

Get-ChildItem -Path C:\Original\PDF\Generation\Folder -Include *.pdf -Recurse |
    Copy-Item -Destination C:\The\Folder\I\Need\Them\In

But I'm stumped after that when it comes to extracting the information from the already generated PDF and saving the renamed version as $ACCT_ID.pdf.

I considered running it through a separate PDF print command instead of open/resave, but that doesn't solve my $ACCT_ID extraction problem.

Thanks for any insight on this.


Solution

There isn't any built-in functionality for reading PDF files in PowerShell, so your best bet is to use a third-party .NET component. There are several commercial options and at least a few free, open-source alternatives.

Here are a few lines of example code using iTextSharp to read the text of a PDF's first page:

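# Load the iTextSharp assembly (path is relative to the current directory)
Add-Type -Path .\itextsharp.dll
$pdfReader = New-Object iTextSharp.text.pdf.PdfReader("C:\file.pdf")
# Page numbers are 1-based
$textFromFirstPage = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($pdfReader, 1)
# Release the file handle once the text has been read
$pdfReader.Dispose()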

How you find the account ID after that depends, of course, on the text of your files.
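As a rough sketch of how the pieces might fit together, the following combines the iTextSharp call above with a regular expression and a copy step. It rests on several assumptions that are not in the question: the account ID is assumed to sit alone on a line and consist of 6 to 12 digits, it is assumed to appear on the first page, and the function names Get-AccountId and Copy-PdfByAccountId are hypothetical helpers introduced here for illustration. Adjust the pattern to match your real files.

```powershell
# Hypothetical helper: returns the first capture of $Pattern in $Text, or $null.
# ASSUMPTION: the account ID is a run of 6 to 12 digits alone on its own line.
function Get-AccountId {
    param(
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $Text,
        [string] $Pattern = '(?m)^\s*(\d{6,12})\s*$'
    )
    $match = [regex]::Match($Text, $Pattern)
    if ($match.Success) { return $match.Groups[1].Value }
    return $null
}

# Hypothetical helper: copies every PDF under $Source to $Destination,
# naming each copy after the account ID found on its first page.
function Copy-PdfByAccountId {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination,
        [string] $ITextSharpPath = '.\itextsharp.dll'   # ASSUMPTION: DLL location
    )
    Add-Type -Path $ITextSharpPath

    Get-ChildItem -Path $Source -Filter *.pdf -Recurse -File | ForEach-Object {
        $reader = New-Object iTextSharp.text.pdf.PdfReader($_.FullName)
        try {
            # Only page 1 is searched; loop over more pages if the ID can appear later
            $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, 1)
        }
        finally {
            $reader.Dispose()
        }

        $accountId = Get-AccountId -Text $text
        if ($accountId) {
            $target = Join-Path $Destination "$accountId.pdf"
            Copy-Item -LiteralPath $_.FullName -Destination $target
        }
        else {
            Write-Warning "No account ID found in $($_.FullName); file skipped."
        }
    }
}
```

It could then be called from the scheduled script as Copy-PdfByAccountId -Source C:\Original\PDF\Generation\Folder -Destination C:\The\Folder\I\Need\Them\In. Note that two PDFs with the same account ID would overwrite each other in the destination, and that the -File switch needs PowerShell 3.0 or later.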

Licensed under: CC-BY-SA with attribution