Greek encoding trouble

gabrielhl's picture

Hello dear typophiles,

I'm typesetting a document with mixed Latin and Greek text. My problem is that I can't identify the font that was used for the Greek, or even find a suitable replacement.

The "manuscript" that I received is an XML file, so it carries no information about the font used to typeset the previous edition.

The main problem is not the exact typeface (it could be changed for the new edition) but the encoding, which is very strange: all regular Greek letters are stored as Latin letters, following the encoding of the old "WinGreek" font (a for alpha, b for beta, etc.). However, accented characters are indicated in the XML by an entity, e.g. &iacugr; for small iota with acute; these can easily be converted to their actual Unicode code points, or even to Private Use Area ones.

I've already searched through a lot of sites for a font that matches this encoding (let's call it "WinGreek Extended"), but so far without success. Does anyone know of such a font?

Contacting the previous typesetter is very hard (earlier questions literally took a year to be answered), so any help is appreciated.

Thanks!

John Hudson's picture

This is probably a custom encoding, and most likely the old GreekKeys encoding, as this was the most popular for polytonic Greek among scholars. Ideally, you would want to identify the custom encoding and then convert the Greek text to Unicode.

You may find this paper useful:
http://ucbclassics.dreamhosters.com/djm/unicodeTalk/unicodeMontrealAPA.html

gabrielhl's picture

Thank you for the link John, I will read it.

DTY's picture

This sounds like a mixture of TEI XML and one of the old 8-bit encodings (GreekKeys, WinGreek, or one of the others). If they're intermingled within the same passages, this may be some sort of strange hack by the previous typesetter. It would be more work to convert it automatically into Unicode if it mixes two systems; if you can't get the information from the previous typesetter, you may need to put together a conversion table yourself.
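
For anyone who ends up building such a conversion table by hand, here is a minimal sketch in Python of how the two systems could be handled in one pass. It is only an illustration under assumptions: the thread confirms just "a for alpha", "b for beta" and the entity &iacugr;, so the other letter mappings, the choice of U+03AF for iota with acute, and the function names are all hypothetical and would need to be checked against the actual file. It also assumes you have already isolated the Greek passages, since running it over Latin text would corrupt it.

```python
import re

# Letter table. Only a -> alpha and b -> beta are confirmed in the thread;
# the remaining entries are ASSUMED and must be verified against the file.
LETTERS = {
    "a": "\u03b1",  # alpha (confirmed)
    "b": "\u03b2",  # beta (confirmed)
    "g": "\u03b3",  # gamma (assumed)
    "d": "\u03b4",  # delta (assumed)
    "e": "\u03b5",  # epsilon (assumed)
}

# Entity table. Only &iacugr; is named in the thread; mapping it to
# U+03AF (iota with tonos) rather than U+1F77 (iota with oxia) is a choice.
ENTITIES = {
    "iacugr": "\u03af",
}

ENTITY_RE = re.compile(r"&([A-Za-z0-9]+);")


def convert_letters(chunk):
    """Map each Latin letter through LETTERS; leave anything else alone."""
    return "".join(LETTERS.get(ch, ch) for ch in chunk)


def convert_greek(text):
    """Convert one Greek passage that mixes Latin-coded letters and entities.

    Entities are resolved first and kept apart from the letter mapping, so
    the letters inside an entity name are never converted by mistake.
    Unknown entities are left untouched so they can be spotted and added.
    """
    out = []
    pos = 0
    for match in ENTITY_RE.finditer(text):
        out.append(convert_letters(text[pos:match.start()]))
        out.append(ENTITIES.get(match.group(1), match.group(0)))
        pos = match.end()
    out.append(convert_letters(text[pos:]))
    return "".join(out)
```

Leaving unknown entities and unmapped letters unchanged is deliberate: a search for leftover "&...;" sequences or stray Latin letters in the output then shows which rows of the table are still missing.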
