Translating PDF Files with Free Tools

By Eric Le Carre (France; English to French translator) | Published 03/4/2009 | Translation Techniques
Quicklink: http://www.proz.com/doc/2264

If, like me, you receive many requests for quotes on PDF documents, especially PDF marketing materials, but don't know where to start because you don't own a PDF editing application such as ABBYY PDF Transformer, Nuance PDF Converter or Solid Converter PDF, this article shows you how to count the words in your PDF files, extract the text and keep its formatting.

This is more a workaround than a complete solution for translating PDF files, as it may require some manual editing. However, it is well suited to short and mid-sized PDF documents, especially marketing materials.

Please note that this solution doesn't work if your PDF file is password-protected or has PDF security options turned on.
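Before quoting, it can save time to check whether a file is likely to be protected. The sketch below is a rough heuristic, not a definitive test: it assumes that a protected PDF carries an /Encrypt entry in its trailer, as the PDF specification describes, and simply searches the raw bytes for that name. The function name looks_encrypted is made up for this illustration, and a false positive is possible if the same byte sequence happens to appear elsewhere in the file.

```python
import sys
from pathlib import Path


def looks_encrypted(pdf_path: str) -> bool:
    """Heuristic: True if the raw file contains an /Encrypt entry.

    Assumption: protected PDFs reference an encryption dictionary
    through the /Encrypt key. This is a byte search, not a real parser.
    """
    data = Path(pdf_path).read_bytes()
    return b"/Encrypt" in data


# Usage: python check_pdf.py myfile.pdf   (script name is hypothetical)
if __name__ == "__main__" and len(sys.argv) > 1:
    if looks_encrypted(sys.argv[1]):
        print("This PDF appears to be protected; copying text may fail.")
    else:
        print("No /Encrypt entry found; the PDF is probably not protected.")
```

If the check reports a protected file, the copy-and-paste approach described in this article is unlikely to work on it.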

The tools you need are the following:

  • Adobe Reader
  • AbracadabraCompteur 2
  • Translator's Abacus
  • AutoUnbreak

For information on how and where to install these programs, especially AbracadabraCompteur 2, read the accompanying documentation.

Counting Words

Counting words is the first step: you need to know how many words your PDF file contains before you can give your customer a quote.

To count words with AbracadabraCompteur 2:

  1. In Adobe Reader, select Tools > Word Counter > Current Page to count the words on the currently displayed page, or Tools > Word Counter > Document to count the words in a PDF file with more than one page.

To count words with Translator's Abacus:

  1. Double-click the WordCount.exe file, the executable file for Translator's Abacus. I put it under C:\Program Files\Translator'sAbacus3.1 and created a shortcut on the Windows Desktop.
  2. In the Translator's Abacus window, click Add files.
  3. In the Open File window, select the PDF file or files whose words you want to count.
  4. Click Report Word Count. The word count is displayed in your web browser.
  5. In the Translator's Abacus window, click Exit to quit the application.
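To sanity-check a tool's figure, or to turn a word count into a price, a few lines of Python are enough. This is only a sketch under stated assumptions: the counting rule (runs of letters or digits, with inner apostrophes and hyphens kept as one word) is my own and will not necessarily match what AbracadabraCompteur 2 or Translator's Abacus report, and the per-word rate is a made-up figure.

```python
import re

# Assumed counting rule: a word is a run of letters/digits, optionally
# joined by apostrophes or hyphens ("doesn't", "password-protected").
WORD_RE = re.compile(r"[^\W_]+(?:['’\-][^\W_]+)*")


def count_words(text: str) -> int:
    """Count words in plain text using the assumed rule above."""
    return len(WORD_RE.findall(text))


def estimate_quote(word_count: int, rate_per_word: float) -> float:
    """Multiply the word count by a per-word rate, rounded to cents."""
    return round(word_count * rate_per_word, 2)


sample = "This solution doesn't work for password-protected PDF files."
words = count_words(sample)
print(words)  # 8

# 0.10 per word is a hypothetical rate, used only for illustration.
print(estimate_quote(words, 0.10))
```

Different tools count numbers, hyphenated words and punctuation differently, so small discrepancies between counts are normal.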

Extracting the Text...

Extract the text from the PDF file by way of the Windows Clipboard.

In Adobe Reader, select Edit > Select All, then select Edit > Copy. You can also use the key combinations Ctrl+A (Select All) and Ctrl+C (Copy). All the selected text is then copied to the Windows Clipboard.

...and Keeping Its Formatting

Using AutoUnbreak, you can keep the basic formatting attributes of the original PDF file (font names, sizes, colors, etc.) and remove most of the carriage returns and line breaks that you get when you simply copy and paste the contents of a PDF file into an empty RTF or MS Word document.

To keep the formatting of your original PDF file:

  1. Double-click AutoUnbreak.exe, the executable file for AutoUnbreak, to start the application.
  2. In the AutoUnbreak main window, click 1. Paste to paste the contents of the Windows Clipboard into AutoUnbreak.
  3. When the contents of the Windows Clipboard are in the AutoUnbreak main window, click 2. Unbreak! to remove the carriage returns and line breaks.
  4. In the Processing done! message window that appears, click OK.
  5. Back in the AutoUnbreak main window, click 3. Copy results.
  6. In the Text copied to clipboard message window that appears, click OK.
  7. Back in the AutoUnbreak main window, click Quit to close AutoUnbreak.
  8. Start MS Word.
  9. In an empty MS Word document, press Ctrl+V to paste the resulting text from your AutoUnbreak session.
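For readers curious about what "unbreaking" involves, here is a minimal plain-text sketch of the idea. It is not AutoUnbreak's actual algorithm, which I have not seen, and unlike AutoUnbreak it handles plain text only, so fonts, sizes and colors are not preserved. It assumes that a blank line marks a real paragraph break and that any other line break was introduced by the PDF layout.

```python
import re


def unbreak(text: str) -> str:
    """Join lines broken inside a paragraph; keep blank-line paragraph breaks."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    cleaned = []
    # Assumption: one or more blank lines separate real paragraphs.
    for para in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in para.split("\n") if ln.strip()]
        if lines:
            cleaned.append(" ".join(lines))
    return "\n\n".join(cleaned)


pasted = (
    "This solution is more a\r\n"
    "workaround than a fully\r\n"
    "functional solution.\r\n"
    "\r\n"
    "Happy\r\n"
    "translating!"
)
print(unbreak(pasted))
```

A heuristic like this cannot tell headings, bullet items or table cells from ordinary wrapped lines, which is one reason some manual checking is still needed afterwards.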

You can now compare your MS Word text with the original PDF file to check for any remaining carriage returns, line breaks or other formatting issues. These have to be fixed manually.
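If you save the Word text as plain text, a short script can point you to lines that probably still contain a stray break. The rule used here is an assumption of mine, not a feature of any of the tools above: a line is flagged when it does not end with closing punctuation and the next line starts with a lowercase letter.

```python
from typing import List

# Characters that plausibly end a sentence or a heading-like line.
CLOSING = ".!?:;\"')”"


def suspicious_breaks(text: str) -> List[int]:
    """Return 1-based numbers of lines that look broken mid-sentence."""
    lines = text.splitlines()
    flagged = []
    for number, (current, following) in enumerate(zip(lines, lines[1:]), start=1):
        current = current.rstrip()
        following = following.lstrip()
        if (
            current
            and following
            and current[-1] not in CLOSING
            and following[0].islower()
        ):
            flagged.append(number)
    return flagged


doc = (
    "You can now compare your MS Word text\n"
    "and the original text.\n"
    "\n"
    "Happy translating!"
)
print(suspicious_breaks(doc))  # [1]
```

The flagged line numbers are only hints; each one still has to be checked against the original PDF.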

When you are happy with your new MS Word document, you can start translating it with the translation memory system of your choice.

Happy translating!

This article was written with KompoZer, an open source WYSIWYG (What You See Is What You Get) HTML editor.


Copyright © ProZ.com, 1999-2024. All rights reserved.
Comments on this article

Knowledgebase Contributions Related to this Article
  • Link to Autounbreak (Posted by Virginie Mombey Indaki Caura on 05/21/2009)
    The link provided in this article for Autounbreak seems to be no longer available. You can still download Autounbreak from the following link: http://www.tucows.com/preview/500305

     