How to clean this tab-delimited glossary?

Try this macro. All the work is done by a single action: a Perl script that is a straightforward translation of your instructions.

Clean ALYB Data.kmmacros (2.7 KB)

Image of macro

The script at the heart of the macro is here:

Perl script
#!/usr/bin/perl

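# Move a trailing "(prefix-)" to the front of the entry,
# e.g. "word (pre-)" becomes "pre-word", then trim trailing whitespace.
sub addprefix {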
	my $string = shift;
	$string =~ s/^(.+?)\((.+?-)\)$/$2$1/;
	$string =~ s/\s+$//;
	return $string;
}

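# If the entry starts with a single hyphenated word ("pre-word"),
# put the joined form in front of it: "preword;pre-word".
# The part after the hyphen is lowercased in the joined form only.
sub leftclean {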
	my $string = shift;
	$string =~ s/^([^ -]+)-([^ -]+)( .+)?$/$1 . lc($2) . ";$1-$2$3"/e;
	return $string;
}

# Remove every hyphen from the entry.
sub rightclean {
	my $string = shift;
	$string =~ s/-//g;
	return $string;
}

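while (<>) {
	# Strip the newline first so a line without a tab is not left with one
	chomp;
	my ($left, $right) = split("\t");
	$left  = '' unless defined $left;
	$right = '' unless defined $right;
	
	# Process the left side
	$left = addprefix($left);
	$left = leftclean($left);
	
	# Process the right side
	$right = addprefix($right);
	$right = rightclean($right);
	
	print "$left\t$right\n";
}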
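
To see what the three subroutines do to a line, here is a small sketch that runs them on a few rows. The sample rows and the `clean_line` wrapper are made up for illustration and are not taken from your glossary; the subroutines themselves are the same as in the script above.

```perl
#!/usr/bin/perl
use strict;

# Same three subroutines as in the macro's script
sub addprefix {
	my $string = shift;
	$string =~ s/^(.+?)\((.+?-)\)$/$2$1/;
	$string =~ s/\s+$//;
	return $string;
}

sub leftclean {
	my $string = shift;
	$string =~ s/^([^ -]+)-([^ -]+)( .+)?$/$1 . lc($2) . ";$1-$2$3"/e;
	return $string;
}

sub rightclean {
	my $string = shift;
	$string =~ s/-//g;
	return $string;
}

# Hypothetical helper: applies the loop body to one line and returns the result
sub clean_line {
	my $line = shift;
	my ($left, $right) = split("\t", $line);
	chomp $right;
	$left  = leftclean(addprefix($left));
	$right = rightclean(addprefix($right));
	return "$left\t$right\n";
}

# Invented sample rows, one tab between the two columns
my @rows = (
	"Schreiben (ab-)\tcopy-out\n",
	"plain entry\tno change\n",
	"e-mail address\tpost (e-)\n",
);

print clean_line($_) for @rows;
# Output (<TAB> stands for a tab character):
# abschreiben;ab-Schreiben<TAB>copyout
# plain entry<TAB>no change
# email;e-mail address<TAB>epost
```

Because the script reads from `<>`, you can also run it outside the macro on a file or on piped input, for example `perl clean.pl glossary.txt > cleaned.txt` (both file names are placeholders).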