How to search and replace for characters in variables

I have a text file taken from a database. Each line in effect represents a separate record, and each line contains contains data separated by “,” .

I have worked out in Keyboard Maestro how to:

  1. read the text file into a variable, and
  2. carry out a first search e.g.search for “,” and replace with ;

I now wish to do some further search and replaces, but there are a number of things which I cannot work out: whether I can just search using String Match or Regular Expression for the following:

  1. searching for a character right at the beginning of the text read into a variable (a "), and
  2. searching for a specific character at the end of a line (a ") and replace with a ;, and
  3. searching for a specific character at the beginning of a line (a "), and
  4. searching for a character right at the end of the text (not at the end of the line) and replacing with another character, a ;.

Any help would be very gratefully received.

Looks like you are wanting to do a lot of data cleanup in preparation for getting the actual data you want. RegEx can likely use the source file, and pull the desired data as Capture Groups.

So, if you can describe in detail your ultimate objective, it will be easier for us to help. Chris (@ccstone) is the RegEx guru, but I’ll try to help if he doesn’t jump in soon.

What we need is:

  • Example of source data

  • Example of data you want in the end

Thank you for the response.

Essentially the structure of the data is as follows:

“Data data”,“Data data”,“data data”,“Data data”,“Data data”
“Data data”,“Data data”,“data data”,“Data data”,“Data data”
“Data data”,“Data data”,“data data”,“Data data”,“Data data”
“Data data”,“Data data”,“data data”,“Data data”,“Data data”

And needs to end up as follows (ignore the difference in capitalisation):

Data data;data data;data data;data data;data data;
Data data;data data;data data;data data;data data;
Data data;data data;data data;data data;data data;
Data data;data data;data data;data data;data data;

As I mentioned in my first post replacing “,” with ; is something I can do. It is the other substitutions/removals I cannot figure out.

Doing that is trivial. May I ask what you plan to do with the data next?

BTW, real data is always preferable to made up data, because often the made up data overlooks something important in the real data.

I am exporting the data from an old MSDOS database (DataPerfect) which will then be subsequently imported into another more modern database or used to merge with pro-forma data in a word-processor.

For example:

“Title”,“First Name”,“Middle Name”,“Last Name”,“first address”,“second address”,“Town”,“County”,“Postcode”,“Landline”,“Mobile”,“email”,“date of birth”

of the type of data present in one of the databases.

I can already perform all the search and replaces in Text Wrangler (or any other text editor), it is just looking for the few technical steps to accomplish it with Keyboard Maestro (which will make the whole process much quicker).

Any pointers to deal with this would be very much appreciated.

Hey Victor,

To be blunt – verbal descriptions of text parsing without accompanying valid data samples suck.

Regular Expressions have to be constructed and tested. (Unless they conform to already established recipes.)

So JM is exactly right about what we need.

  • 1 or more data samples.
  • 1 or more examples of the expected outcome.

Think of it this way – a written description of a complex regular expression without good sample data is like a written description of how to assemble a jigsaw puzzle without an accompanying picture of the completed puzzle.

Note that made up dummy data is very prone to causing errors. It is also uninteresting, and you want the interest of people who are trying to help you.

Sorry to be pedantic, but I deal with this kind of communication problem over, and over, and over…

-Chris

Doing that conversion is trivial:

  • Search and Replace " with nothing
  • Search and Replace , with ;

If your data does not contain any other uses of quotes, and any other uses of commas (eg within the quoted text) then that is all you need.

If it does, then that highlights the issue of not including the actual real data you want and the actual real output you want.

If the data is confidential/private, then you obviously can’t post it here, but what you can do is run the search and replace on it to convert every letter or digit to “x”. For this regular expression parsing purposes, it makes no difference what letter or digit is used, so you remove all identifying information and leave the actual problem data and results.

Note that properly parsing Comma Separated Values (CSV) is actually a very difficult task unless your data is restricted.

Thank you for the further replies.

Here is an entirely representative sample of data:

“01046”,“NN9JLJ999”,“Dutch”,“24-07-2015”,“24-07-2025”,“Vurg. van Amsterdam”,“Smith, Frank James”,“Kingdom of the Netherlands”,"759"
“01047”,“999999999”,“Italian”,“10-06-2014”,“09-06-2024”,“Roma”,“Jones, Gail”,“Italy”,"761"
“01048”,“999999999”,“British”,“11-12-2009”,"11-12-2019,“IPS”,“Lewis, Prabhleen”,“United Kingdom of Great Britain and Northern Ireland”,"762"
“01049”,“999999999”,“British”,“28-10-2009”,“28-10-2019”,“IPS”,“James, Justine Rosemary”,“United Kingdom of Great Britain and Northern Ireland”,"764"
“01050”,“C9YM99G99”,“German”,"10-05-2011,“09-05-2021”,“Botschaft, London”,“Lloyds, Gordon James”,“German”,"765"
“01051”,“999999999”,“British”,“16-11-2008”,“16-11-2018”,“FCO”,“Richards, Lark Louise”,“United Kingdom of Great Britain and Northern Ireland”,"766"
“01052”,“AU 999999”,“Polish”,18-11-2008,“18-12-2018”,“konsul w londynie”,“Pilsudski, Malgorzata Anna”,“Republic of Poland”,"767"
“01053”,“999999999”,“British”,“19-10-2007”,“19-10-2017”,“IPS”,“Angelworth, Hamish Gordon”,“United Kingdom of Great Britain and Northern Ireland”,"773"
“01054”,“999999999”,“British”,“24-03-2011”,“24-12-2021”,“IPS”,“Hitchcock, John”,“United Kingdom of Great Britain and Northern Ireland”,"774"
“01055”,“999999999”,“British”,“09-05-2014”,09-02-2025,“IPS”,“Truaffant, William Ian”,“United Kingdom of Great Britain and Northern Ireland”,“775”

And here is what I would like to end up with:

01046;NN9JLJ999;Dutch;24-07-2015;24-07-2025;Vurg. van Amsterdam;Smith, Frank James,Kingdom of the Netherlands;759;
01047;999999999;Italian;10-06-2014;09-06-2024;Roma;Jones, Gail;Italy;761;
01048;999999999;British;11-12-2009;11-12-2019,;IPS;Lewis, Prabhleen;United Kingdom of Great Britain and Northern Ireland;762;
01049;999999999;British;28-10-2009;28-10-2019;IPS;James, Justine Rosemary;United Kingdom of Great Britain and Northern Ireland;764;
01050;C9YM99G99;German;10-05-2011,;09-05-2021;Botschaft, London;Lloyds, Gordon James;German;765;
01051;999999999;British;16-11-2008;16-11-2018;FCO;Richards, Lark Louise;United Kingdom of Great Britain and Northern Ireland;766;
01052;AU 999999;Polish;,18-11-2008,;18-12-2018;konsul w londynie;Pilsudski, Malgorzata Anna;Republic of Poland;767;
01053;999999999;British;19-10-2007;19-10-2017;IPS;Angelworth, Hamish Gordon;United Kingdom of Great Britain and Northern Ireland;773;
01054;999999999;British;24-03-2011;24-12-2021;IPS;Hitchcock, John;United Kingdom of Great Britain and Northern Ireland;774;
01055;999999999;British;09-05-2014;,09-02-2025,;IPS;Truaffant, William Ian;United Kingdom of Great Britain and Northern Ireland;775;

The other databases are entirely consistent with this. As I originally wrote the only difficulty I have is removing the " at the beginning of each line, and replacing the " at the end of each line and replacing it with a ;.

After a little examining of regular expressions (as a complete beginner), in Text Wrangler I can

  1. Use a ‘normal’ search, search for “,” and replace with ;, then
  2. With a Grep search, search ^"(.+)"$ (to remove the "s from the beginning and end of the line and replace with \1;, to get just the end of the line with a ;

which produces exactly what I need.

However applying the same principle to Keyboard Maestro, the regular expression did not work, but only works if I add (?m) to the beginning (after trying to understand (at least for me) cryptic comments on this page: https://wiki.keyboardmaestro.com/Regular_Expressions):

Search: (?m)^"(.+)"$
Replace: $1; (TextWrangler appears to use the \ while Keyboard Maestro appears to use the $ instead).

If I missing anything or doing anything wrong, I would be grateful for comments.

Nope, that is exactly correct. I would use simply three search and replaces:

“,”
(?m)^"
(?m)"$

replace each with nothing.

But your solution is perfectly good.

(?m) is a flag that means "the source text is multi-line. The result of that is that ^ matches at both the start of the text and the start of any line within the text, and that the $ matches at both the end of the text and the end of any line within the text.

And yes, for capture replaces Keyboard Maestro (and others) use $1, TextWrangler (and others) use \1.

Hey Victor,

Okay! Now we have something real to work with.

The quoting in your input data is NOT consistent in places. (e.g Some double-quotes are missing.)

Is this an error YOU introduced, or is this a problem with the original data?

I have changed the input data to be properly quoted in the CSV_TEXT for this example.

BUT it is vitally important to know if the original data is likely to contain quoting errors.

As I suspected there are special needs in the data, and you cannot simply do a rote replace of commas – since there are non-separator commas in the data.

Provided the original data is correctly consistent this job is simple.

If the original data has errors this job is still not too difficult, because understanding the exact structure and composition of the data file makes it relatively easy to repair.

-Chris


Massage CSV Data.kmmacros (5.5 KB)