Hi, I have a very difficult case I am trying to solve. I have a text with lots of RTL issues. For example the following sentence:
nochmals schriftlich auftreten. Geht ja das Volk שעטנו und ציצת sowie auch ,מצרים
This is how it should look:
The text surrounded by the green line shows the correct order. I could apply HTML tags like <span dir="rtl">, which brings the sentence into the right order, but when I connect it with the text coming after it, the order changes again. The only solution I have found so far is to apply the HTML tag, convert the HTML to PDF, apply OCR, and copy and paste into Word with Acrobat Reader (nothing else works). The problem is that OCR introduces more mistakes, and my text is over 3000 pages long.
I have invested many days, maybe weeks, trying to solve it, but without success. Is there any other solution you can suggest?
Greetings
To get any help you would need, I think, to zoom back a bit, and share some of your assumptions with us.
I have 'a text'
- Where is it coming from ?
- Where do you need it to go ? What final output format do you need ?
Have you tried working with Mellel ?
https://www.mellel.com/
Where do you see Keyboard Maestro fitting into this process ?
PS if the source is plain text, you may need to check that its encoding is UTF-8, which should encode LTR RTL bidirectional interlacing correctly.
The segment reversal which you display there might, I think, result from a coding mishap at some point in the pipeline upstream, if for example there has been any inadvertent slippage between Unicode and non-Unicode encoding and decoding.
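A quick way to test the encoding point is a few lines of Python. This is only a minimal sketch, not a diagnosis of your file: the helper names `decode_utf8_or_none` and `bidi_summary` are made up for illustration. The first checks whether the raw bytes decode as UTF-8 at all, and the second counts characters by their Unicode bidirectional class, so you can see whether the Hebrew letters are really stored as RTL characters ('R').

```python
import unicodedata
from collections import Counter
from typing import Optional


def decode_utf8_or_none(raw: bytes) -> Optional[str]:
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def bidi_summary(text: str) -> Counter:
    """Count characters by Unicode bidirectional class (e.g. 'L', 'R', 'WS')."""
    return Counter(unicodedata.bidirectional(ch) for ch in text)


if __name__ == "__main__":
    # Hypothetical sample; in practice read the bytes with open(path, "rb").
    sample = "Geht ja das Volk \u05e9\u05e2\u05d8\u05e0\u05d5 und".encode("utf-8")
    text = decode_utf8_or_none(sample)
    if text is None:
        print("Not valid UTF-8; the file was probably saved in another encoding.")
    else:
        print(bidi_summary(text))
```

If the decode fails, or if the summary shows no 'R' characters where you expect Hebrew, the problem is likely in how the file was saved or converted rather than in the program displaying it.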
UAX #9: Unicode Bidirectional Algorithm
One of the commonly used pieces of software which (at least last time I checked) fails to implement standard Unicode bidirectional handling is Sublime Text, and there may be others.
So the question arises – when you load the file and see that reversal of segments, what software are you using ?
Hi, today, after many, many hours, I solved the issue. The solution is to wrap the text in the HTML tag <span dir="rtl">, with any LTR character at the beginning and at the end of the line, then convert it to PDF and copy and paste from the PDF.
I know this issue is complicated, but people who use both LTR and RTL languages will appreciate the solution.
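For a text of 3000 pages the wrapping step could be scripted. Below is a rough Python sketch of one reading of the description above, with several assumptions: the function name `wrap_line_rtl` is hypothetical, the "LTR character" is taken to be U+200E LEFT-TO-RIGHT MARK (the post says any LTR character works), it is placed just outside the span, and the whole line is wrapped rather than only the Hebrew part.

```python
import html

# Assumption: U+200E LEFT-TO-RIGHT MARK stands in for "any LTR character".
LRM = "\u200e"


def wrap_line_rtl(line: str, ltr_char: str = LRM) -> str:
    """Wrap one line in <span dir="rtl">, with an LTR character on each side."""
    # Escape &, <, > and quotes so the text cannot break the markup.
    escaped = html.escape(line)
    return f'{ltr_char}<span dir="rtl">{escaped}</span>{ltr_char}'


def wrap_text_rtl(text: str) -> str:
    """Apply wrap_line_rtl to every line and join the results with <br>."""
    return "<br>\n".join(wrap_line_rtl(line) for line in text.splitlines())
```

The resulting HTML would then go through the same convert-to-PDF and copy-and-paste steps as described. Whether the LTR character belongs inside or outside the span may need some experimenting with a page or two first.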