From Shoebox to SQL


By Danilo Nogueira (X) | Published 06/7/2005 | CAT Tools
Quicklink: http://fin.proz.com/doc/225
Author:
Danilo Nogueira (X)
Brazil
English to Portuguese translator
 
Are you old enough to have had a shoebox glossary? In the old times, many of us did. Most of the cards were blank, many were incomplete, and all were usually out of order. We always hoped that someday we would have the time and courage to complete, correct and alphabetize the stuff, which most of us never did. Shoebox glossaries were cumbersome, but they were considered very practical, because the index cards could be arranged in alphabetical order to facilitate searches.

We now have computers, of course, and the few of us who still have index-card glossaries only keep them as a memento of things past. Some of us are computer wizards and can scare up extraordinary glossary templates in less time than you or I can spell "lexicography." Most of us, however, have very pedestrian computer skills, a fact that led me to collect a few good hints on the art and craft of keeping glossaries, for the use of non-geeks.
  1. Do not use MSWord tables

    First of all, please, do not format your glossaries as MS Word tables. They are very neat and all, but as the glossary grows, the file will become too big for even a fast machine. In addition, long MSWord files are accident-prone, and more than one colleague has learned, to her dismay, that her treasury of words was lost because the file went bad without warning or notice.

  2. Eliminate paragraph marks within cells

    Before converting the table into tab-separated text, however, edit out all paragraph marks within cells. They won't do any harm in a tab-separated file until you sort it in alphabetical order, something that you're bound to do sooner or later. You may replace the paragraph marks with plain spaces, or perhaps with a bullet. To replace them with a space, open the search-and-replace panel, enter ^p in the search box, hit the tab key once to jump to the "replace with" box and hit the spacebar once. To replace them with a bullet instead, in the "replace with" box hit the spacebar once, hold down the Alt key and enter 0149 on the numeric keypad, then hit the spacebar again. Then start replacing. Every paragraph mark will be replaced with a bullet like this: •, with a space on each side. The Alt code only works if the Num Lock light is on. If it is off, hit the Num Lock key.

  3. Convert MSWord tables into tab-separated text

    Tables should be converted into text. Under the "Table" menu there is a neat option to convert tables into text, and it even lets you choose the separator. By all means select a tab. There are very many advantages in using tabs, and we will deal with each one in its turn. For now, suffice it to say that if you choose a paragraph mark, you will have something like

    current assets

    ativo circulante

    ... which may look very well onscreen, but will wreak havoc with your glossary when you alphabetize the list.

  4. Keep your glossaries as plain text files

    Glossaries should be stored as plain text files. Plain text files may be ugly, but they are very useful for several purposes, especially for keeping glossaries. Text files are smaller, and thus faster to load, than Word files and, in addition, less accident-prone. Even a glossary with many thousands of items will not overburden a RAM-starved system. In addition, they can be opened with Word, Excel, WordPad and NotePad, among other programs, a very important characteristic because, as you will see, each of those programs offers its own advantages.

    One disadvantage is that the file will look disorganized, because some terms will be longer than others and consequently the translations will not be nicely aligned as one would wish.

    assets ativo

    in accordance with generally accepted accounting principles de acordo com os princípios contábeis geralmente aceitos

    There is nothing you should do about this. You can add tabs until the target terms are well aligned, but this temptation should be resisted at all costs, because such files make further manipulation more complicated.

  5. Leave one tab only between source and target

    If you have a glossary where source and target have been aligned with the use of multiple tabs, ask Word to replace

    ^t^t

    with

    ^t

    Do it several times, until Word duly informs you that it could not find ^t^t in the document.

    The glossary will look uglier, but, on the other hand, will open correctly in MSExcel, which is a great advantage.
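For readers comfortable with a little scripting, the same cleanup can be done in a single pass outside Word. The following is a minimal sketch in Python, not part of the original procedure; the function name `collapse_tabs` is made up for illustration.

```python
import re


def collapse_tabs(line: str) -> str:
    """Replace every run of two or more tabs with a single tab."""
    return re.sub(r"\t{2,}", "\t", line)


# A line that was padded with extra tabs to look aligned on screen.
padded = "assets\t\t\t\tativo"
print(collapse_tabs(padded).split("\t"))
```

Unlike the repeated search-and-replace in Word, the regular expression handles runs of any length in one go, so there is no need to repeat it until nothing is found.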

  6. Make the gap between source and target term visible

    Another disadvantage is that sometimes the gap between source and target will be almost invisible:

    current assetsativo circulante

    This is not a problem in itself, but, if it disturbs you, you may replace each tab with, say, five spaces and a tab. But, please, keep in mind that the order is spaces+tab. Reversing that order, that is, using tab+spaces will create problems later on. For now, please, just believe me.

  7. Create a "comments" column

    If you like to add comments to your entries, you may prefer to add them to a separate column, like this:

    reserve reserva insurers often use "reserve" where Brazilian practice requires "provisão," not "reserva."

    This practice reduces clutter. Of course, you may even have separate columns for comments on the source language and on the target language, resulting in a four-column format.

  8. Spell-check your glossaries

    How can you check the spelling of a bilingual glossary? Of course, if it is still in table form, you can use the Tools menu to set each column to its own language and check it using Word's spell checker, provided you have checkers for the two languages involved, that is.

    If the files are tab-separated, you will have to use a trick: open the file using MS Excel. Excel will open it easily, though not before asking a few easy questions. Within Excel, select a column, click Tools > Check Spelling, select the appropriate language and go ahead. Then select the next column and repeat the process.

  9. Let all of your glossaries have the same direction

    There is a very good chance you have accumulated glossaries in both directions, say English > Portuguese and Portuguese > English. Even if you live in one of those jurisdictions where you must work in both directions, keeping glossaries in both directions has several disadvantages. Open the glossaries "in the wrong direction" using Excel and simply drag the first column to the right side of the second, and you will have it reversed. Just save again as plain text and it will be over.
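The column swap can also be scripted. Below is a minimal sketch in Python, assuming each line holds tab-separated fields with source first and target second; `reverse_direction` is a hypothetical name, not something the article or Excel provides.

```python
def reverse_direction(lines):
    """Swap the first two tab-separated columns; keep any further columns."""
    reversed_lines = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2:
            fields[0], fields[1] = fields[1], fields[0]
        reversed_lines.append("\t".join(fields))
    return reversed_lines


# A Portuguese > English entry turned into English > Portuguese.
print(reverse_direction(["ativo circulante\tcurrent assets\n"]))
```

Lines with fewer than two fields are passed through unchanged, so a stray heading or blank line does not break the run.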

  10. Give your glossaries consistent names

    Save the file as "text only." Word will alert you that you will be losing formatting, but that should not worry you. Go and save it with some name you can remember. And use a standard system to name all of your glossaries. For instance: glo_medical_jones.txt, meaning a glossary of medical expressions that you received from your friend Jones. If all of your glossaries have names beginning with "glo" or something of that sort, it will be easier to find them by merely asking Windows to search for glo*.txt, for instance.
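The payoff of a consistent prefix is that a one-line pattern finds every glossary. As a rough illustration in Python, assuming all glossaries sit in a single folder and follow the glo*.txt pattern (the function name `find_glossaries` is invented for this sketch):

```python
from pathlib import Path


def find_glossaries(folder):
    """Return the sorted names of all files in folder matching glo*.txt."""
    return sorted(path.name for path in Path(folder).glob("glo*.txt"))
```

Calling `find_glossaries` on the folder where you keep your files returns names such as glo_medical_jones.txt and skips everything that does not follow the naming scheme.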

  11. Sort and filter your glossaries using MSExcel

    Open the glossary using MSExcel. Select the whole enchilada by clicking on the little empty square to the left of the "A" column and above row 1. Then choose Data > Sort and sort your glossary based on any column you want. This is not only good when you want to alphabetize the glossary based on the target column, but also when you want to retrieve all items that originally came from medical_jones and have spread all over the place when you alphabetized the thing for the first time.

    You can also use the Data > Filter > Autofilter tool, which is very intuitive, to retrieve all medical_jones stuff.
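What Sort and Autofilter do can be sketched in a few lines of Python. This is only an illustration, assuming each entry has already been split into a list of fields (source, target, origin); the function names and sample rows are hypothetical.

```python
def sort_by_column(rows, column):
    """Return rows sorted alphabetically (ignoring case) on the given column."""
    return sorted(rows, key=lambda row: row[column].casefold())


def filter_by_value(rows, column, value):
    """Return only the rows whose given column equals value."""
    return [row for row in rows if len(row) > column and row[column] == value]


rows = [
    ["reserve", "reserva", "medical_jones"],
    ["assets", "ativo", "smith_tractors"],
    ["current assets", "ativo circulante", "medical_jones"],
]
by_target = sort_by_column(rows, 1)               # alphabetize on the target column
from_jones = filter_by_value(rows, 2, "medical_jones")  # retrieve one origin only
```

Both functions return new lists and leave the original order untouched, so you can sort on the target column without losing the source-ordered version.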

  12. Have your glossaries opened in WordPad

    Plain text files open in NotePad. NotePad is very fast and light and has a basic search function that is often good enough to deal with most needs. However, it is exceedingly ugly. One thing you can do is right-click on any text file, then choose "open with" (not plain "open"!) and "choose program," and Windows will open a list of programs. Select WordPad, which is much prettier and has a slightly better search function. There is a box asking if you want to use WordPad to open all plain text files. Select it.

    From now on, whenever you double-click on a txt file, it will be opened using WordPad. This does not prevent you from opening the file using another program, of course. Open MSWord and you will see that it can open the file nicely, provided you select "open all files." Or right-click on the name of the file in Windows Explorer and see that you can open it with other programs, too.

  13. Give your glossaries a specific extension

    If you are a bit more adventurous, you can give your glossaries a specific extension. Using Windows Explorer, select the file, press F2 and overwrite its name, keeping the main portion and replacing the extension with, say, gloss (if you are under Win 98 or 95, use a three-letter extension). If you cannot see the extension (what comes after the dot), open Windows Explorer, go to Tools > Folder Options > View and uncheck "hide file extensions for known file types."

    This has the advantage that you can assign your glossaries to WordPad, using the process indicated above, and still have other plain text files opened with NotePad or whatever program you like. MSWord and MSExcel will still open it, if you choose "File > Open > All files."

  14. Separate glossaries or big momma?

    Some people like to keep different glossaries for different subjects, on the grounds that different terms have differing meanings according to the subject. This is very true, of course. But it does not mean that a term that you put in your medical glossary will never crop up with the same meaning in a corporate annual report.

    To merge all files, create a Word file, and then use Insert > File to merge the separate files one after the other, saving the doc after each insert, for safety's sake. After all files are inserted, save the file again, as plain text. If you have decided to give your glossary a special extension, remember to use it.
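The merge can also be scripted rather than done by hand in Word. Here is a minimal sketch in Python, assuming the glossaries are plain text files saved as UTF-8 (older files saved by Windows programs may use a different encoding, in which case the `encoding` argument would need to change); `merge_glossaries` is a hypothetical name.

```python
from pathlib import Path


def merge_glossaries(paths, output_path):
    """Append the non-empty lines of each glossary file to one combined file.

    Returns the number of entries written.
    """
    merged = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        merged.extend(line for line in text.splitlines() if line.strip())
    Path(output_path).write_text("\n".join(merged) + "\n", encoding="utf-8")
    return len(merged)
```

The sketch simply concatenates the files in the order given and drops blank lines; it does not sort or remove duplicate entries, and the original files are left untouched.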

  15. Identify big-momma entries

    Do you want entries in the combined glossary to reflect the individual glossary they came from? There are at least two easy ways to do it.

    Imagine you have a glossary called "medical_jones," meaning a glossary of medical expressions that you received from your friend Jones. You can open it using MSWord and ask Word to replace

    ^p

    with

    medical_jones^p

    ... and, lo and behold, all entries will end with "medical_jones."

    Alternatively, you can open the glossary using MSExcel. In that case, write medical_jones in the top cell of the first empty column. The cell will be framed by a thick black line. On the bottom right corner of the cell, you will see a small square. Using the mouse pointer, drag the square a few cells down and the worksheet will start rolling. Let it roll until you reach the end of the entries and then release the mouse button. Again, medical_jones will appear at the end of every row. Save the file as txt again.

    When you merge this glossary into the big momma, the entries will be clearly distinguished from "smith_tractors," if that is what you wanted.
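The tagging step lends itself to a short script as well. The sketch below, in Python, mirrors the Excel route by putting the glossary name in a column of its own, separated from the entry by a tab; the function name `tag_entries` is an assumption made for illustration.

```python
def tag_entries(lines, source_name):
    """Append source_name as a new last column on every non-empty line."""
    tagged = []
    for line in lines:
        entry = line.rstrip("\r\n")
        if entry.strip():
            tagged.append(entry + "\t" + source_name)
    return tagged


print(tag_entries(["reserve\treserva\n"], "medical_jones"))
```

Keeping the tag in a separate tab-delimited column means the combined glossary can later be sorted or filtered on it.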

  16. Give your glossary a shortcut and its own icon

    Instead of fishing around your HD for your glossary every time you need it, give the glossary a shortcut. Using Windows Explorer, right-click on the file and drag it to the desktop area. When you release the mouse button, Windows will ask you if you want to create a shortcut. Click on that option and a shortcut will be created.

    Right-click on the shortcut and a long menu will be shown. One of the items, probably the last, will be "properties." Click on properties > general > shortcut > change icon and select one of the many offered by Windows.

    If you prefer to keep several glossaries this will not work. Instead, you should place all glossaries in the same folder and create a shortcut and/or icon to that folder.

    While you are at it, drag the shortcut to the MSOffice toolbar. If you don't use a toolbar, try it, by all means: Start > programs > MSOffice > MSOffice tools. This will create a shortcut bar that you can adapt to your own ways. Drag into it the icons of the programs or files you use more frequently.

  17. Use your glossaries in connection with CAT tools

    I am a strong supporter of CAT tools. Some translators roundly refuse to use the stuff or use it only when the client demands. This, in my opinion, is an error. CAT tools are of great help to the translator once you learn to work with them, and you should start delving into the matter NOW.

    Any self-respecting CAT tool will be able to import your glossaries and make good use of them. A good glossary for a CAT tool will have comments segregated from the entries, as I explained above. When the source word appears in the source text, the glossary function will automatically search the glossary and permit you to insert the translation in the right place, either automatically or semi-automatically, depending on the software itself and on how you configure it. The point is that if you do not segregate the comments in a separate column, the program will insert the comment together with the translation and you will have something like this:

    The Company is building a new facility ["installation" para "instalação" é falso cognato] for the processing of watered-down milk.

    Because CAT software inserts glossary items automatically or semi-automatically, it is a good thing to include frequently used terms. My CAT glossary, for instance, includes the months of the year and the names of certain cities, such as London, called Londres in Portuguese, which crop up very often in the things I do.
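To make the point about the comments column concrete, here is a rough sketch in Python of what a glossary lookup might do. It does not reproduce any particular CAT tool; the function names and the crude substring matching are assumptions made for illustration only.

```python
def load_glossary(lines):
    """Map each source term (lower-cased) to its target, ignoring later columns."""
    glossary = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2 and fields[0]:
            glossary[fields[0].casefold()] = fields[1]
    return glossary


def suggest_terms(segment, glossary):
    """Return the glossary entries whose source term occurs in the segment."""
    lowered = segment.casefold()
    return {src: tgt for src, tgt in glossary.items() if src in lowered}


entries = [
    'facility\tinstalação\t"installation" is a false cognate here',
    "London\tLondres",
]
glossary = load_glossary(entries)
hits = suggest_terms("The Company is building a new facility in London.", glossary)
```

Because the comment lives in a third column, `load_glossary` can take the target from the second column alone, and nothing but the translation is offered for insertion.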

  18. Use specialized software

    If you have a large number of glossaries, you may need more than the above. The purpose of this article was just to help the small guys who have no need, no desire and no money for specialized software. I may return to the matter in the future to deal with a few pieces of souped-up software that can do a thing or two with a large number of glossaries. Not today, please. I already have a headache.

    A final word about SQL: this means Structured Query Language, a kind of secret language used by computer wizards to charm databases (including glossary databases) into doing what they want. But I must stop. The headache is really killing me.
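For the curious, a small taste of what that secret language looks like. The sketch below uses Python's built-in sqlite3 module with a throwaway in-memory database; the table layout (source, target, origin) and the sample entries are assumptions made for illustration, not a recommended glossary design.

```python
import sqlite3


def build_glossary_db(rows):
    """Create an in-memory database holding (source, target, origin) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE glossary (source TEXT, target TEXT, origin TEXT)")
    conn.executemany("INSERT INTO glossary VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, term):
    """Return every (target, origin) pair recorded for a source term."""
    cursor = conn.execute(
        "SELECT target, origin FROM glossary WHERE source = ? ORDER BY origin",
        (term,),
    )
    return cursor.fetchall()


conn = build_glossary_db([
    ("reserve", "reserva", "glo_accounting_smith"),
    ("reserve", "provisão", "glo_insurance_jones"),
])
results = lookup(conn, "reserve")
```

The SELECT line is the SQL proper: it asks for all translations of one term and shows which glossary each came from, which is the database version of sorting and filtering a big-momma file.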






Articles are copyright © ProZ.com, 1999-2024, except where otherwise indicated. All rights reserved.
Content may not be republished without the consent of ProZ.com.