Assign all Chinese Character Snippets into Variables

Alice_Shi · July 7, 2019, 2:00pm

Is there a way to assign each snippet of Chinese text to its own variable?
For example:

argue (with sb); argue (about/over sth) ★ v.争吵,争论 
I don't want to argue with anyone.
我不想和谁吵架。

variable_01 = 争吵,争论
variable_02 = 我不想和谁吵架。

Thanks in advance

PS: In a regular expression, \p{Han} matches any Chinese character.

ccstone · July 7, 2019, 3:58pm

Hey Alx,

Can it be done? Sure – but it's not exactly straightforward.

Run the macro. See the “foundText_” items in the variable inspector panel of the Keyboard Maestro Editor preferences.

-Chris

Find Contiguous Chinese Glyphs and Punctuation and Save to Dynamic Variables v1.00.kmmacros (8.5 KB)
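
For readers without Keyboard Maestro, the same idea (find contiguous runs of Chinese glyphs and punctuation, then save each run under a numbered name) can be sketched in Python. This is not the macro's actual regex. It assumes a snippet starts with a Han character, approximates \p{Han} with the CJK Unified Ideographs block (the standard re module has no \p{...} support), and uses a short hand-picked punctuation list. The name find_chinese_snippets is made up for illustration.

```python
import re

# Assumption: approximate \p{Han} with the CJK Unified Ideographs block only.
HAN = "\u4e00-\u9fff"
# Assumption: a small, non-exhaustive set of ASCII and full-width punctuation.
PUNCT = ",.;:!?,。、;:?!"

# A snippet starts with a Han character and continues through Han or punctuation.
SNIPPET = re.compile(f"[{HAN}][{HAN}{PUNCT}]*")


def find_chinese_snippets(text: str) -> dict:
    """Map 'foundText_1', 'foundText_2', ... to each run found in text."""
    return {
        f"foundText_{i}": match
        for i, match in enumerate(SNIPPET.findall(text), start=1)
    }


sample = """argue (with sb); argue (about/over sth) ★ v.争吵,争论
I don't want to argue with anyone.
我不想和谁吵架。"""

found_text = find_chinese_snippets(sample)
for name, value in found_text.items():
    print(name, "=", value)
```

Requiring a Han character at the start of each run keeps the English punctuation in the first line, such as "(" and ";", from being picked up as snippets of their own.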

Alice_Shi · July 8, 2019, 1:02am

@ccstone Works great! I guess the For loop is the best solution here.
But there seems to be a small problem with nested variables.

When I display the variable foundText_%Variable%varIndex%, what I get after running the macro is
'foundText_1', 'foundText_2' instead of
'争吵,争论', '我不想和谁吵架。'

I've added a prompt to your macro to make it easier to examine:
Find Contiguous Chinese Glyphs and Punctuation and Save to Dynamic Variables v1.00_with prompt to examine.kmmacros (9.5 KB)

gglick · July 8, 2019, 1:25am

Did you check the foundText variables in the KM preferences' Variables pane after running the macro? You should be able to find the variables with the Chinese text there:

If you'd like to include a prompt that displays the found text in every loop, this modification to the version you uploaded should do the trick (new actions marked with yellow):

Find Contiguous Chinese Glyphs and Punctuation and Save to Dynamic Variables v1.01_with prompt to examine.kmmacros (10.0 KB)

ccstone · July 8, 2019, 1:26am

Hey Alx,

When I run the macro I get two dynamically created variables.

foundText_1

With content:

争吵,争论

foundText_2

With content:

我不想和谁吵架。

You're saying that you do not get this result after running my macro?

-Chris

Alice_Shi · July 8, 2019, 2:22am

@gglick This works! It seems the “Filter Variable ‘Variable’ with Value of Named Variable” action is a must.

I searched the forum and found this: What does "Filter Variable ‘Variable’ with Value of Named Variable" do? (Questions & Suggestions, Keyboard Maestro Discourse)

I'm still a little confused, but I'll try to digest it...

Anyway, problem solved! Hooray~
@ccstone Thanks for the solution!
@gglick Thank you too for the explanation.

JMichaelTX · July 8, 2019, 6:03am

Looks like Chris @ccstone has provided a solution to your request.
I have a couple of thoughts that you might find helpful.

Storing Results of Matches

May I ask why you want to store the results into separate KM Variables? How do you plan to use these variables?

IME, it is usually better to store the results of multiple matches into a single variable, usually with each match being on a separate line. This makes it much easier to process the results further and/or to save the results to a file.

The exception to this is when you want to store each match into a known entity, like "Name", "Address", "Phone", etc.
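
As a rough illustration of the single-variable approach, here is a small Python sketch (Python rather than Keyboard Maestro actions; the names matches and all_matches are made up). It stores the two matches from the example one per line and then reads them back.

```python
# Example matches taken from the text in the original question.
matches = ["争吵,争论", "我不想和谁吵架。"]

# Store every match in one string, one match per line.
all_matches = "\n".join(matches)

# Later processing can split the lines back out, or write the string to a file.
for line_number, line in enumerate(all_matches.splitlines(), start=1):
    print(line_number, line)
```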

Chinese Punctuation

I know next to nothing about Chinese, but from a bit of research it appears that Chinese punctuation uses different characters from the punctuation of Indo-European languages. So, if you expect Chinese punctuation in your source text, the RegEx character class [:punct:] that Chris uses may need to be adjusted or expanded.

After a bit more study, it is not clear to me whether the RegEx character class \p{han} includes Chinese punctuation. Perhaps Chris will know. Here is the reference that I found most helpful, though not conclusive: Use regular expression to match ANY Chinese character in utf-8 encoding.

OK, I did one experiment using this for a Chinese comma:
逗号

If that is a comma, then the RegEx \p{han} DID match it. So maybe the mystery is solved?

Well, I hope this will add more information than confusion. Any experts on Chinese Regex please jump in.

ComplexPoint · July 8, 2019, 1:11pm

\p{Han} (not always recognized without the leading upper case character) matches 汉字 (Chinese characters) but not the double-width punctuation characters used in Chinese text.

逗号 is not itself a comma. (It's the Chinese name for a comma, and consists of two 汉字, hanzi.)
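
One way to check this independently of any regex engine is to look at each character's Unicode general category. The sketch below uses only Python's standard unicodedata module; the helper name is_punctuation is made up for illustration. The two characters of 逗号 come back as letters (category Lo), while the ASCII comma, the full-width comma and the ideographic full stop all fall in a punctuation category.

```python
import unicodedata


def is_punctuation(ch: str) -> bool:
    """True if the character's Unicode general category starts with 'P'."""
    return unicodedata.category(ch).startswith("P")


# Two hanzi, then an ASCII comma, a full-width comma, an ideographic full stop.
for ch in "逗号,,。":
    print(repr(ch), unicodedata.category(ch), is_punctuation(ch))
```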

JMichaelTX · July 9, 2019, 12:30am

Thanks for the correction and clarification.

How would you do a RegEx search (match) for the double-width punctuation characters used in Chinese text?

Alice_Shi · July 16, 2019, 10:41am

@JMichaelTX

About Storing Results of Matches

What I really want to do is find each line of the text that contains Chinese, then set its style (for example, make it bold). You can test it in the Pages app.

Search each line include Chinese Character then set its style.kmmacros (8.2 KB)

About Chinese Punctuation

After some research, I've found that [\p{Han}] matches any Chinese character but no punctuation (neither Chinese nor European). To match Chinese characters together with both Chinese and European punctuation, you can use [\p{Han}[:punct:]].

@ComplexPoint
\p{Han} doesn't match any punctuation (Chinese or European, double-width or not).

JMichaelTX · July 16, 2019, 8:00pm

This is good to know. Could you please provide a test case for this, that includes Chinese characters and Chinese punctuation?

Thanks.

ComplexPoint · July 16, 2019, 10:15pm

That hasn't been my experience – [:punct:] doesn't match CJK (double-width) punctuation in the Regex engines that I'm using. See here, for example: the two single-width commas are matched, but not the Chinese punctuation.

Possibly some other engines differ in their definition of [:punct:]?

(My experience is that you simply have to build a Regex which lists them – there are examples in various Stack Overflow discussions)

ccstone · July 18, 2019, 6:33pm

Hey Rob,

Is that the Atom editor?

-Chris

ComplexPoint · July 18, 2019, 6:58pm

That's right.

Alice_Shi · July 23, 2019, 12:16pm

Oh, it really depends on the Regex engine.

In the Expressions app, [\p{Han}[:punct:]] matches both.

But here in regex101, it doesn't match CJK (double-width) punctuation.
