OK, this is none of my business, but I have to ask:
Do you really need to convert all 54,000 files now? Are you, or someone, going to read them in the near future? Why not put them in a folder that Spotlight can index, and then convert when needed? Please feel free to ignore these questions.
If that pops up, then Word has a problem which you can't just ignore. Even if you found a GUI-less, silent way to convert without being shown that message, you would end up with converted but faulty documents the next morning.
And there's little chance that any script or other automated process could select the correct encoding for you.
If you still don't have a solution, and haven't already done so, then you might do an Internet search of "batch convert doc to pdf".
I got many hits. Many of them are for Windows, but if you're using SharePoint, that suggests someone on your team is using Windows. It might be easier to do this conversion in Windows than on a Mac. I read somewhere that Adobe Acrobat Pro for Windows can do this type of batch conversion, from .doc to .pdf.
Yep; I'll now explain. I have over 54,000 .doc files in which I need to batch search-and-replace certain sections/paragraphs of text before I can legally publish the content of each of those 54,000 files on my Web site. I have two options:
I already have a server-level program that can batch search-and-replace in .docx (XML) files. However, before I can make use of that program, I first need to convert my 54,000 .doc files to .docx.
[PREFERRED] I would love to be able to batch search-and-replace directly in the 54,000 old, original .doc files so that I don't even have to bother with first converting them to .docx. I do, in fact, already have an Automator workflow that individually opens each specified .doc file, searches-and-replaces (using wildcards) as required, saves the files, and then closes the files. The deal-breaker, however, is that it results in MS Word taking the following steps:
A. opens all specified .doc files at once
B. then it begins to edit the files one-by-one
C. then it begins to save the files one-by-one
D. then it begins to close the files one-by-one
It will not begin Step B, Step C, or Step D until the previous step has been completely finished for all specified files. So, if I attempt to batch search-and-replace in -- for example -- 500 files, Step A results in MS Word systematically opening 500 file windows before it will even begin to do the actual work of searching-and-replacing in any one of the 500 files. As you might imagine, attempting to open 500 windows soon causes MS Word to hang and become completely unresponsive, necessitating a force-quit. This workflow would be absolutely perfect if I could alter it in such a way that MS Word would open, search-and-replace, save, and close 1 file at a time.
Have you considered/tried writing a simple Word VBA macro, that cycles through the .doc files in a given folder one at a time, doing the search and replace, and then save as PDF?
Via VBA, you can tell Word to auto-confirm, auto-accept changes. If you don't want to keep Word running for too long, then put a timer (or number of files) to limit each run.
No need to split -- the main issue is dealing with large numbers of .doc files. Whether the output is PDF or .docx is easily handled, at least by Word.
I've been researching and testing the last couple of days, trying my best to create what you envisioned. Unfortunately, I just can't get anything to work as required. Maybe it's because I'm using a Mac and most instructions are for Windows. Or, maybe it's because I'm using a different version (2011) of Word for Mac that conflicts with the Mac instructions that I have found. What's become quite clear to me is that I'm in way over my head.
It was written for Word Windows, so you will need to make some adjustments, particularly the folder paths and file open/save. Test it using a Test Folder with 2 or 3 .doc files (copied from your source).
If you get stuck, try a Google search for "Word 2011 Mac VBA file open", for example.
Replace "file open" with whatever task you need help with.
If you still can't make it work, post your Word VBA macro here, and I'll try to help.
I still can't get it. I've been at my desk literally all day working on this yet again. VBA is completely alien to me, and I can't get any of the instructions that I have found to work. I don't even know for sure when/where/how to activate this particular VBA, so "testing" it has proven very difficult. Anyway, following is the code that I have thus far. I changed only the initial "PathToUse" declaration. After that, I could not identify anything else to change.
Option Explicit
Public Sub BatchReplaceAll()
    Dim FirstLoop As Boolean
    Dim myFile As String
    Dim PathToUse As String
    Dim myDoc As Document
    Dim Response As Long
    'The path must end with a separator because it is joined
    'directly to the file name below
    PathToUse = "/Users/Jason/Desktop/macro-testing/"
    'Error handler to handle error generated whenever
    'the FindReplace dialog is closed
    On Error Resume Next
    'Close all open documents before beginning
    Documents.Close SaveChanges:=wdPromptToSaveChanges
    'Boolean expression to test whether first loop
    'This is used so that the FindReplace dialog will
    'only be displayed for the first document
    FirstLoop = True
    'Set the directory and type of file to batch process
    myFile = Dir$(PathToUse & "*.doc")
    While myFile <> ""
        'Open document
        Set myDoc = Documents.Open(PathToUse & myFile)
        If FirstLoop Then
            'Display dialog on first loop only
            Dialogs(wdDialogEditReplace).Show
            FirstLoop = False
            Response = MsgBox("Do you want to process " & _
                "the rest of the files in this folder", vbYesNo)
            If Response = vbNo Then Exit Sub
        Else
            'On subsequent loops (files), a ReplaceAll is
            'executed with the original settings and without
            'displaying the dialog box again
            With Dialogs(wdDialogEditReplace)
                .ReplaceAll = 1
                .Execute
            End With
        End If
        'Close the modified document after saving changes
        myDoc.Close SaveChanges:=wdSaveChanges
        'Next file in folder
        myFile = Dir$()
    Wend
End Sub
@calbear, sorry you're having so much trouble. If you can't make any sense of VBA, then you'll need to get help from someone who does.
With just a little knowledge of VBA, it should be possible to modify the VBA macros available on the internet.
Sorry, but I don't have time right now to take on a project like this.
There are a number of tech support, tech help, programming services web sites available. You might try:
I haven't used it recently, but I did years ago, and it was pretty good.
I believe their basic service is free, at least for a short period (30 days).
Most likely you will be able to find a VBA expert there to help you.
I think I've almost got a solution for batch-converting .doc to .docx, 1-by-1. I now have a script (which I saved as a "service" for the Finder) that works like a charm for converting batches of .doc to .pdf, 1-by-1, without losing any formatting. Here's that script:
property theList : {"doc", "docx"}
on run {input, parameters}
    set output to {}
    tell application "Microsoft Word" to set theOldDefaultPath to get default file path file path type documents path
    repeat with x in input
        try
            set theDoc to contents of x
            tell application "Finder"
                set theFilePath to container of theDoc as text
                set ext to name extension of theDoc
                if ext is in theList then
                    set theName to name of theDoc
                    copy length of theName to l
                    copy length of ext to exl
                    set n to l - exl - 1
                    copy characters 1 through n of theName as string to theFilename
                    set theFilename to theFilename & ".pdf"
                    tell application "Microsoft Word"
                        set default file path file path type documents path path theFilePath
                        open theDoc
                        set theActiveDoc to the active document
                        save as theActiveDoc file format format PDF file name theFilename
                        copy (POSIX path of (theFilePath & theFilename as string)) to end of output
                        close theActiveDoc
                    end tell
                end if
            end tell
        end try
    end repeat
    tell application "Microsoft Word" to set default file path file path type documents path path theOldDefaultPath
    return output
end run
However, when I try to edit it for .docx conversion, I get this error:
The action "Run AppleScript" encountered an error.
Here's my edited code for conversion to .docx:
property theList : {"doc"}
on run {input, parameters}
    set output to {}
    tell application "Microsoft Word" to set theOldDefaultPath to get default file path file path type documents path
    repeat with x in input
        try
            set theDoc to contents of x
            tell application "Finder"
                set theFilePath to container of theDoc as text
                set ext to name extension of theDoc
                if ext is in theList then
                    set theName to name of theDoc
                    copy length of theName to l
                    copy length of ext to exl
                    set n to l - exl - 1
                    copy characters 1 through n of theName as string to theFilename
                    set theFilename to theFilename & ".docx"
                    tell application "Microsoft Word"
                        set default file path file path type documents path path theFilePath
                        open theDoc
                        set theActiveDoc to the active document
                        save as theActiveDoc file format format DOCX file name theFilename
                        copy (POSIX path of (theFilePath & theFilename as string)) to end of output
                        close theActiveDoc
                    end tell
                end if
            end tell
        end try
    end repeat
    tell application "Microsoft Word" to set default file path file path type documents path path theOldDefaultPath
    return output
end run
I see no reason why it wouldn't work if I can just figure out what, exactly, the "error" is and fix it. I don't know how to go about researching the error because there is no error #/name/ID.
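One thing that may help with tracking down run-time errors: a bare try ... end try block silently discards whatever goes wrong inside it. The sketch below is only an illustration of the "on error" clause, which exposes the error message and number. The file names and the simulated error are placeholders, and it will not catch problems that stop the script from compiling in the first place.

```applescript
-- Sketch: collect error details instead of letting "try" swallow them.
-- "fileNames" and the simulated error are placeholders for illustration.
set fileNames to {"example-1.doc", "example-2.doc"}
set failures to {}
repeat with x in fileNames
    set theName to contents of x
    try
        -- The real open / save as / close steps would go here.
        error "Simulated failure" number 1000
    on error errMsg number errNum
        -- Record which file failed, with the message and number AppleScript reported
        set end of failures to theName & ": " & errMsg & " (error " & errNum & ")"
    end try
end repeat
return failures
```

Run in Script Editor, the result pane would list one entry per failed file, which gives you an error message and number to search for.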
I think you are going in circles now. As far as I can tell, your script does the same as Chris' script from above, which I packed as a doc>docx version in the applet.
I showed you the correct output format in the earlier post:
save as file format format document file name theFilename without maintain compatibility
(You have written "file format format DOCX", which doesn't exist.)
If you want to save it with compatibility, remove the without maintain compatibility part.
To find the available formats and other options, open Word's AppleScript dictionary with Script Editor.
The critical difference is that the new version converts files 1-by-1, rather than first opening all of the files at once before even beginning to convert the first file (which -- in the case of my particularly-sized files -- would crash MS Word after about 50 windows).
So, just to clarify, the following script works perfectly to convert batches of hundreds/thousands of .doc files to .docx, without losing any formatting, graphics, tables, etc. I saved it as a "service" (doc2docx1by1) whereby I highlight all files in the Finder that I wish to convert and then select the "doc2docx1by1" service from the Finder's contextual menu.
property theList : {"doc"}
on run {input, parameters}
    set output to {}
    tell application "Microsoft Word" to set theOldDefaultPath to get default file path file path type documents path
    repeat with x in input
        try
            set theDoc to contents of x
            tell application "Finder"
                set theFilePath to container of theDoc as text
                set ext to name extension of theDoc
                if ext is in theList then
                    set theName to name of theDoc
                    copy length of theName to l
                    copy length of ext to exl
                    set n to l - exl - 1
                    copy characters 1 through n of theName as string to theFilename
                    set theFilename to theFilename & ".docx"
                    tell application "Microsoft Word"
                        set default file path file path type documents path path theFilePath
                        open theDoc
                        set theActiveDoc to the active document
                        save as theActiveDoc file format format document file name theFilename without maintain compatibility
                        copy (POSIX path of (theFilePath & theFilename as string)) to end of output
                        close theActiveDoc
                    end tell
                end if
            end tell
        end try
    end repeat
    tell application "Microsoft Word" to set default file path file path type documents path path theOldDefaultPath
    return output
end run
Does anyone here know if it is even possible to create an "always running until manually stopped" AppleScript that will perform a COMMAND-PERIOD action on any MS Word file that remains open for more than 15 seconds? I am asking this because I am running batch operations on thousands of .doc/.docx files and MS Word "stalls out" fairly often on certain files. Pressing COMMAND-PERIOD immediately fixes the problem and enables the batch operation to proceed again.
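One possible approach, offered as an untested sketch rather than a known fix: a stay-open script application whose idle handler remembers which document is frontmost and sends Command-Period through System Events if the same document is still frontmost at the next check. The 15-second interval comes from the question, and the property and variable names are hypothetical. The script would need to be saved as an application with the stay-open option checked, and System Events keystrokes require accessibility access to be enabled for it. It also assumes that a document with the same name at two consecutive checks means a stall, which may not hold for every batch.

```applescript
-- Hypothetical watchdog; save as an application with "Stay Open" enabled.
property lastDocName : ""
property checkInterval : 15 -- seconds between checks (from the question)

on idle
    set currentName to ""
    set wordIsStuck to false
    if application "Microsoft Word" is running then
        try
            -- Do not wait long if Word is too busy to answer
            with timeout of 5 seconds
                tell application "Microsoft Word"
                    if (count of documents) > 0 then
                        set currentName to name of active document
                    end if
                end tell
            end timeout
        on error
            -- Word did not answer in time; treat that as a stall
            set wordIsStuck to true
        end try
    end if

    -- Same document as at the previous check: assume it has stalled
    if currentName is not "" and currentName is equal to lastDocName then
        set wordIsStuck to true
    end if
    set lastDocName to currentName

    if wordIsStuck then
        tell application "System Events"
            -- Keystrokes go to the frontmost application, so bring Word forward first
            set frontmost of process "Microsoft Word" to true
            keystroke "." using command down
        end tell
    end if
    return checkInterval -- ask to be called again after this many seconds
end idle
```

Quitting the applet stops the watchdog. Because a file that legitimately takes longer than one interval would also receive the keystroke, it may be worth testing on a small batch first.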