I actually have a number of interesting problems that are mind-bending to me, but that you guys can probably solve. I will think about it, frame the issue carefully (so as not to mess it up like in this post), and post it here.
I am back, guys. I am having a slight issue with the lookaround regex. Apparently, it doesn't support complex (multi-character) consonants.
In the search field, I listed my consonants, which are normally simple characters such as b and d. But X-SAMPA also contains complex consonants such as tS and t_>.
In the regular regex, putting these complex consonants alongside the simple ones works fine. But the lookaround (lookahead) is not working.
But, the syllable can be CVC as well as CVCC (at the end of words).
List of vowels: (a|e|i|o|u|@|1)
List of Consonants: tS_>|J|p_>|t_>|Z|?|tS|ts_>|dZ|[b-df-hj-np-tv-z]
t_> @ r r a => t_> @ r . r a
l @ b b @ s @ => l @ b . b @ . s @
g @ r r @ f @ => g @ r . r @ . f @
f @ t @ n @ => f @ . t @ . n @
? a b a r @ r @ => ? a . b a . r @ . r @
? a z @ n @ => ? a . z @ . n @
? a l @ q q @ s @ => ? a . l @ q . q @ . s @
g e b s => g e b s
If there is a doubled consonant (xx, the identical consonant repeated), insert the [dot] between the two.
There should be no CCV sequence.
I was almost there with this macro, if not for the failure of the lookahead regex for complex consonants. X-sampa.kmmacros (12.0 KB)
In the macro, the first step inserts [.] between two consonants (CC) that precede a vowel (V). That avoids the prohibited CCV sequence.
In the second step, I insert [.] after every vowel. This is the one discussed in this forum. It generates the right syllable structure for the rest of the word.
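For anyone who wants to try the two steps outside Keyboard Maestro, here is a minimal Python sketch. It is not the macro itself. It assumes the input is a space-separated string of X-SAMPA tokens, that step two only fires where a vowel is followed by a consonant and another vowel, and that the literal `?` consonant is written as `\?` inside the pattern. The names `V`, `C`, `CC_BEFORE_V`, `V_BEFORE_CV` and `syllabify` are made up for illustration.

```python
import re

# Vowels and consonants as listed in the thread.
V = r"[aeiou@1]"
# Longer complex consonants come first; "?" is escaped because it is a regex metacharacter.
C = r"(?:tS_>|ts_>|p_>|t_>|tS|dZ|J|Z|\?|[b-df-hj-np-tv-z])"

# Step 1: a consonant followed by "consonant vowel" (the lookahead consumes nothing).
CC_BEFORE_V = re.compile(rf"({C}) (?={C} {V})")
# Step 2: a vowel followed by "consonant vowel".
V_BEFORE_CV = re.compile(rf"({V}) (?={C} {V})")


def syllabify(word: str) -> str:
    """Insert ' . ' syllable breaks into a space-separated X-SAMPA string."""
    # Step 1 splits CC before V, so that no CCV sequence survives.
    step1 = CC_BEFORE_V.sub(r"\1 . ", word)
    # Step 2 closes each open syllable that is followed by another CV.
    return V_BEFORE_CV.sub(r"\1 . ", step1)


if __name__ == "__main__":
    for w in ["t_> @ r r a", "? a l @ q q @ s @", "g e b s"]:
        print(w, "=>", syllabify(w))
```

Because the dot inserted in step one sits between the two consonants, step two no longer sees "vowel consonant vowel" at that spot, which is why the order of the two steps matters in this sketch.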
That's very helpful – do you think that those rules form a more or less determinate syllabic grammar of the material, or do they sometimes allow for ambiguities which are, for example, lexically or positionally resolved?
There could be factors such as word category (noun, verb, name, etc.). Those will be resolved manually when necessary, because the rules would become very complicated if we included other factors. My aim is to capture the basic verb paradigms.
Mmm ... not easy. It would need some back-tracking ...
If we sequence the patterns to try as:
[cvccEnd, cvc, cv]
that yields a mixed bag of success and failure:
t_> @ r r a => t_> @ r . r a
l @ b b @ s @ => l @ b . b @ s
g @ r r @ f @ => g @ r . r @ f
f @ t @ n @ => f @ t
? a b a r @ r @ => ? a b
? a z @ n @ => ? a z
? a l @ q q @ s @ => ? a l
g e b s => g e b s
and if we simply reorder the sequence of testing to:
[cvccEnd, cv, cvc]
then the result is different, but still patchy:
t_> @ r r a => t_> @
l @ b b @ s @ => l @
g @ r r @ f @ => g @
f @ t @ n @ => f @ . t @ . n @
? a b a r @ r @ => ? a . b a . r @ . r @
? a z @ n @ => ? a . z @ . n @
? a l @ q q @ s @ => ? a . l @
g e b s => g e b s
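The patchy results above look like what a first-match, no-going-back strategy would give: once a CVC (or CV) is committed, a later dead end cannot be repaired. As a rough illustration of the back-tracking idea, here is a small Python sketch. It is an assumption about the approach, not the code that produced the listings above. It treats every non-vowel token as a consonant, and the names `PATTERNS`, `matches`, `parse` and `syllabify` are hypothetical.

```python
from typing import List, Optional

VOWELS = set("aeiou@1")

# (shape, must_reach_end_of_word), tried in this order: cvccEnd, cvc, cv.
PATTERNS = [("CVCC", True), ("CVC", False), ("CV", False)]


def matches(tokens: List[str], i: int, shape: str, must_end: bool) -> bool:
    """True if tokens[i:] starts with the given C/V shape."""
    j = i + len(shape)
    if j > len(tokens) or (must_end and j != len(tokens)):
        return False
    # Assumption: any token that is not a vowel counts as a consonant.
    return all((t in VOWELS) == (s == "V") for s, t in zip(shape, tokens[i:j]))


def parse(tokens: List[str], i: int = 0) -> Optional[List[List[str]]]:
    """Return a list of syllables covering tokens[i:], or None if none exists."""
    if i == len(tokens):
        return []
    for shape, must_end in PATTERNS:
        if matches(tokens, i, shape, must_end):
            rest = parse(tokens, i + len(shape))
            if rest is not None:  # otherwise back-track and try the next shape
                return [tokens[i:i + len(shape)]] + rest
    return None


def syllabify(word: str) -> Optional[str]:
    sylls = parse(word.split())
    if sylls is None:
        return None
    return " . ".join(" ".join(s) for s in sylls)


if __name__ == "__main__":
    for w in ["l @ b b @ s @", "? a l @ q q @ s @", "g e b s"]:
        print(w, "=>", syllabify(w))
```

With back-tracking, the order in which the shapes are tried matters less: a CVC that leaves an unparseable remainder is abandoned and CV is tried in its place.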
Thank you. But if you don't have time, I have found a way to improve on @stevelw's solution.
This one ([(a|e|i|o|u|@|1)]) (?=([b-df-hj-np-tv-z]|(tS_>|J|p_>|t_>|Z|?|tS|ts_>|dZ)) [(a|e|i|o|u|@|1)]) seems to capture the complex consonants as well.
t_> @ r . r a
l @ b . b @ . s @
g @ r . r @ . f @
f @ . t @ . n @
? a . b a . r @ . r @
? a . z @ . n @
? a . l @ q . q @ . s @
g e b s
One thing that jumps out there is that you may be getting a glitch with your ? consonant.
As that has a meaning in regular expressions (it is a quantifier that makes the preceding token optional), you need to escape it with a preceding backslash \? to treat it as a literal character.
(and if you are entering the backslash itself in a string context where that also has a special use, then you may need to double it: \\?)
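To make the escaping point concrete, here is a short Python sketch. Python's `re` module is only a stand-in here, and the Keyboard Maestro regex engine may treat a bare `?` differently. It builds the consonant alternation with `re.escape`, so that `?` ends up as `\?` without hand-editing. The helper names `build_consonant_pattern`, `is_consonant` and `compiles` are invented for this example.

```python
import re

# Complex consonants from the thread, longest first so "tS_>" is tried before "tS".
COMPLEX = ["tS_>", "ts_>", "p_>", "t_>", "tS", "dZ", "J", "Z", "?"]


def build_consonant_pattern(complex_consonants):
    """Join the consonants into one alternation, escaping metacharacters such as '?'."""
    alternatives = "|".join(re.escape(c) for c in complex_consonants)
    return rf"(?:{alternatives}|[b-df-hj-np-tv-z])"


CONSONANT = re.compile(build_consonant_pattern(COMPLEX))


def is_consonant(token: str) -> bool:
    """True if the whole token is one of the listed consonants."""
    return CONSONANT.fullmatch(token) is not None


def compiles(pattern: str) -> bool:
    """Report whether Python's re module accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(is_consonant("?"), is_consonant("tS_>"), is_consonant("a"))
    print(compiles("(tS_>|J|Z|?|tS)"), compiles(r"(tS_>|J|Z|\?|tS)"))
```

In Python the unescaped alternation is rejected outright, because the `?` has nothing before it to quantify. Other engines may accept it and quietly give it a different meaning, so escaping is the safer habit either way.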