How can I correctly recover/extract Arabic, Chinese and Hebrew text?

dimitri_c's picture

Hello -

Context:
As a packaging designer, I am often asked by clients to add Arabic, Chinese and Hebrew text to packaging.
More often than not, I receive one of 4 types of text source:

— 1.
A .PDF containing a scan of the text needed (so nothing more than a picture).

— 2.
A .PDF in which I can select the text and copy it into MS Word®, but the pasted sentence no longer matches what I copied from the .PDF file (e.g. some of the characters are replaced by strange ones).

— 3.
A .PDF file in which the client has inserted an annotation that displays correctly in Acrobat®, but the moment I copy/paste it into MS Word®, the whole sentence is reversed (for Arabic, the text runs left to right instead of right to left).

— 4.
Sometimes I also receive Excel® files.

After copying the text into MS Word®, I save the file in .PDF format and I then re-open it in Adobe Illustrator®.

Software:
• Adobe® CS5 (Illustrator,...)
• MS Office® 2011
• Mac OS 10.6.8

Question:
Is there an easier, faster and more reliable method to extract such "exotic" text?
e.g. a plug-in, other software, or another method (instead of having to go through MS Word®, ...)

Thank you for your precious help.

- Dimitri

Joshua Langman's picture

Well, there are Middle Eastern and Asian versions of InDesign. I don't see InDesign mentioned in your post anywhere, but I would guess that if you're designing packaging, your art eventually ends up in InDesign, right? If you used one of these versions, ideally — I can't promise this — you should be able to copy the text from a PDF and paste it correctly without needing to go through Word, Illustrator, or anything else.

I haven't done this myself, though, so someone correct me if I'm wrong.

Tom Gewecke's picture

One of your problems is probably that MS Word for Mac has never supported Arabic/Hebrew, so you should definitely avoid it. OpenOffice will normally work better. But I agree with Joshua that you should try the ME (Middle Eastern) version of Adobe CS.

Unfortunately, PDF is also not a reliable format for copy/paste of non-Latin scripts, because the way a file gets created sometimes leaves it with garbage internal character coding.
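
If it helps, a small script can at least tell you whether pasted text is damaged before it goes into your artwork. Below is a minimal Python sketch that uses only the standard unicodedata module. The function names diagnose_pasted_text and repair_arabic are made up for illustration, and the reversal step assumes a single line of pure right-to-left text with no digits, Latin words or vowel marks; anything more mixed needs a real bidirectional-text tool.

```python
import unicodedata

# Decomposition tags that Unicode assigns to Arabic presentation forms,
# i.e. pre-shaped glyph variants that some PDFs store instead of letters.
SHAPED_TAGS = ("<isolated>", "<final>", "<initial>", "<medial>")


def diagnose_pasted_text(text):
    """Count characters that suggest the PDF's internal coding is unreliable."""
    report = {"private_use": 0, "replacement": 0, "presentation_forms": 0}
    for ch in text:
        if unicodedata.category(ch) == "Co":
            # Private-use code points: the font mapped glyphs to arbitrary slots.
            report["private_use"] += 1
        elif ch == "\ufffd":
            # U+FFFD marks a character that was lost during conversion.
            report["replacement"] += 1
        elif unicodedata.decomposition(ch).startswith(SHAPED_TAGS):
            # A shaped glyph was stored rather than the underlying letter.
            report["presentation_forms"] += 1
    return report


def repair_arabic(text, visual_order=False):
    """Fold shaped glyphs back to plain letters; optionally undo a reversal."""
    # NFKC normalization replaces presentation forms with their base letters.
    fixed = unicodedata.normalize("NFKC", text)
    if visual_order:
        # Naive assumption: the whole string is one right-to-left run that was
        # stored in display order, so reversing restores reading order.
        fixed = fixed[::-1]
    return fixed
```

If diagnose_pasted_text reports private-use or replacement characters, the text cannot be recovered from the PDF and it is safer to ask the client for the original text file. Presentation forms and a reversed order can often be repaired, but the result should still be checked by someone who reads the language.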

dimitri_c's picture

Hello -

Thanks for your replies...
In fact, we only use Adobe Illustrator and Photoshop to make packaging. Our agency only uses InDesign for "editorial" jobs (books, ads, annual reports, ...).

I'll try LibreOffice or something similar.

Have a nice day.

- Dimitri www.thebend.be
