A common task in scientific writing is adding scientific names after the English name. This is a highly laborious task with a lot of room for error. It seems remarkable that nobody has come up with software to do the task, unless I am missing something?
I am wondering if a macro could achieve it. I have a Word document that is peppered with English bird names and I need to add scientific names that are contained in a separate document. Can anyone provide code for a macro that will read the file containing the bird names/scientific names and insert the scientific name after the first instance of each bird name?
Here are some rules the macro needs to follow:
add the scientific name after the first mention of each English name in the main text
scientific names need to be in italics
once a genus (the first of the two scientific names) has been mentioned in the main text, later mentions of that genus need to use only its first initial
any mention of the English name in the bodies of tables always needs the full scientific name adding
in figure captions, each English name needs the scientific name adding, but as with the main text, once a scientific name has been used it does not need repeating, and once a genus has been mentioned only the first initial should be used
May I give a second opinion? CP's advice is fine, but I sometimes see other ways to solve problems.
It is possible to do this in KM, fairly easily, since KM can manipulate Word documents by using Word's menus. The problem as I see it is you need to have a valid set of data which KM can use indicating every possible common/scientific name of everything that could appear in the text. Are you willing to show a piece of that data file? Is it a text file? You probably need to convert it to a text file for KM to be able to do this. For example, this text would probably be manageable:
Coyote,Hungrius Animus
Roadrunner,Fastus Birdus
What I would envision is a loop over each line of the text variable containing these relationships and then using a KM action to replace the first item in the line (ie, the item before the comma) and insert the second item in the line (the item after the comma), in italics, after it. Inserting a character string in italics is the only tricky part I can see, but it's possible.
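The loop described above can be sketched outside KM as well. The following is a minimal Python sketch, not a KM or Word macro: it assumes the name list is plain "English,Scientific" text as in the example, and it marks italics with asterisks because plain text cannot carry formatting. The function names load_pairs and add_after_first are made up for illustration.

```python
import csv
import io


def load_pairs(text):
    """Parse 'English,Scientific' lines into a list of (english, scientific) tuples."""
    rows = csv.reader(io.StringIO(text))
    return [(r[0].strip(), r[1].strip()) for r in rows if len(r) >= 2]


def add_after_first(text, pairs):
    """Insert each scientific name, wrapped in *...*, after the first mention only."""
    for english, scientific in pairs:
        pos = text.find(english)
        if pos == -1:
            continue  # this name never appears in the text
        end = pos + len(english)
        text = text[:end] + " *" + scientific + "*" + text[end:]
    return text


data = "Coyote,Hungrius Animus\nRoadrunner,Fastus Birdus\n"
sample = "The Coyote chased the Roadrunner. The Coyote lost."
result = add_after_first(sample, load_pairs(data))
print(result)
```

Note that a plain substring search like this would also match a short name inside a longer one, so a real list of bird names would need whole-name matching.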
I don't have Word so I can't write the code for you, but I could write similar code that could work with Pages, and you could adapt it to work with Word.
Here's a hastily thrown together demonstration of how this task might be tackled using KM: a possible beginning and proof of concept.
It could probably be simplified a lot, but here's a take based on an 80% finalised conditional-formatting macro I had lying around (that I really should finalise soon). This task can probably be tackled in many different ways. In particular the large group towards the end, the "Initialise repeated occurrences of genus" group (my attempt to comply with your third rule), seems overly complicated to me; there must be ways to simplify this stage. (Possibly by doing it when the scientific names were added in the first place? But then the macro would have to be built the other way around, so that the scientific names are added in the order they appear in the text instead of, as now, in the order of the list of English and scientific name pairs.) But this is anyway all I had time to throw together today before leaving on a small vacation.
Further, I have found out that Pages documents (and more than probably .docx files too, but I do not have Word installed) can contain elements (images, for one) that seem to create a mismatch between the matching range given by the For Each and the substring returned by the Get Substring action. I have not yet got fully on top of which elements and/or characters are handled differently between these two actions, so it is as of now difficult to know how this macro performs with your Word documents. I guess the mismatch has something to do with some level of formatting getting lost somewhere between these two actions. It should probably be possible to find out what these special characters and elements are, search for them (within the addScientificName_beforeInsertion clipboard), count them, and compensate for them in the local__positionOfInsertion variable calculation.
Lastly, this macro is not stress tested and not perfected in any way. It is really messily built; it probably has loads of misspellings (English is not my first language); it does not comply with your last two rules; and there are probably loads of cases where the logic breaks and it inserts scientific names wrongly or not at all. This macro is merely a possible proof of concept, a starting point. But additional logic can always be added until it meets all of your rules and tackles most edge cases, if this seems like a route you'd like to go further with.
EDIT: Added more species of the same genus to the sample text.
EDIT2: Noticed now that your first rule asked for the scientific name only to be added after the first mention in the text. Makes sense. Added a break from the loop to comply with this. (Probably then not necessary to loop at all, but the method I use in this macro relies on the match range given by the For Each.)
EDIT3: Rephrased something to make it slightly clearer how I've thought.
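The "other way around" idea, adding names in the order they appear in the text so that genus abbreviation falls out of the same pass, might look roughly like this. It is a Python sketch rather than the KM macro itself, and it only covers the first three rules for main text; the asterisks standing in for italics and the function name annotate are assumptions for illustration.

```python
import re


def annotate(text, pairs):
    """Add scientific names in document order, abbreviating a genus already seen."""
    names = dict(pairs)
    if not names:
        return text
    # Longest names first so a longer name wins over a shorter overlapping one.
    alternation = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    pattern = re.compile(r"\b(?:" + alternation + r")\b")
    done = set()    # English names that already have their scientific name
    genera = set()  # genera already written out in full

    def repl(match):
        english = match.group(0)
        if english in done:
            return english
        done.add(english)
        genus, species = names[english].split(" ", 1)
        if genus in genera:
            label = genus[0] + ". " + species
        else:
            genera.add(genus)
            label = genus + " " + species
        return english + " *" + label + "*"

    return pattern.sub(repl, text)


pairs = [
    ("Hooded Warbler", "Setophaga citrina"),
    ("Blackpoll Warbler", "Setophaga striata"),
    ("Grey Catbird", "Dumetella carolinensis"),
]
sample = "Blackpoll Warbler, then Grey Catbird and Hooded Warbler. Blackpoll Warbler again."
annotated = annotate(sample, pairs)
print(annotated)
```

Because the substitution walks the text from left to right, the order of the name list no longer matters: whichever species of a genus comes first in the text gets the full genus.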
That's a great effort. Below is a paragraph of my actual text, and below that, the output from the macro. It has worked well on this short paragraph. I will continue to test and check for any problems.
Actual text
I have categorised the distances that North American landbirds normally migrate in the Americas into three broad categories. The first are long-distance migrants which migrate between North and Central or South America such as Blackpoll Warbler, Swainson's Thrush and Yellow-billed Cuckoo, which migrate several thousands of kilometres each season. Species in the second category are medium-distance migrants that migrate between North and Central America, or occasionally between northern and southern North America, such as Grey Catbird, Hooded Warbler and Tree Swallow. The third category consist of short-distance migrants that often migrate within North America, if at all, such as Brown Thrasher, Eastern Towhee and Brown-headed Cowbird. Unsurprisingly, most North American landbirds that have reached the Western Palearctic are long-distance migrants, and this group accounts for 51% of species to have reached the region. Of the remaining species, around 40% are medium distance migrants and 10% are short-distance migrants. However, divisions between the three categories are fuzzy, and it is not always straightforward pigeon-holing a species into a single category.
Output from macro
I have categorised the distances that North American landbirds normally migrate in the Americas into three broad categories. The first are long-distance migrants which migrate between North and Central or South America such as Blackpoll Warbler Setophaga striata, Swainson's Thrush Catharus ustulatus and Yellow-billed Cuckoo Coccyzus americanus, which migrate several thousands of kilometres each season. Species in the second category are medium-distance migrants that migrate between North and Central America, or occasionally between northern and southern North America, such as Grey Catbird Dumetella carolinensis, Hooded Warbler S. citrina and Tree Swallow Tachycineta bicolor. The third category consist of short-distance migrants that often migrate within North America, if at all, such as Brown Thrasher Toxostoma rufum, Eastern Towhee Pipilo erythrophthalmus and Brown-headed Cowbird Molothrus ater. Unsurprisingly, most North American landbirds that have reached the Western Palearctic are long-distance migrants, and this group accounts for 51% of species to have reached the region. Of the remaining species, around 40% are medium distance migrants and 10% are short-distance migrants. However, divisions between the three categories are fuzzy, and it is not always straightforward pigeon-holing a species into a single category.
I have to say, though this is not addressing your question in any way, that personally I wouldn't trust a macro to do this properly. My own practice is to do it from the get-go so you can be on top of it. I also quite often avoid common names anyway, because they are often inaccurate and defeat the whole point of scientific naming. It is also true that huge numbers of organisms now don't have any common names anyway. Birds are totally the outlier here, mammals too I suppose.
This would take us away from discussing the software here of course. I do have some scientific names as snippets in Keyboard Maestro.
But doing it as I go rapidly gets very messy, especially when dealing with dozens and possibly hundreds of scientific names.
Say I have given the English name and scientific name, and then I edit the text and give the English name higher up the document. I then need to delete the scientific name lower down and move it to the first instance. On top of that, I then need to figure out where the genus name is first given and then start initialising genus names after it. If this sounds confusing, that's because it is!
That's a new requirement, which you didn't mention in your original post. Even so, that's not too hard.
I don't know what that means.
My solution would be two macros. One that deletes all the genus names, and the other that inserts all the genus names after the first occurrence of the English name.
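As a rough illustration of the first of those two macros, here is a Python sketch of the "delete" step. It reads "genus names" above as the inserted scientific names, which is an assumption, and it also assumes the names were inserted as asterisk-marked text directly after the English name, in full or abbreviated form. The function name strip_scientific is made up. Running an insert step afterwards would complete the round trip.

```python
import re


def strip_scientific(text, pairs):
    """Remove a full or abbreviated scientific name that directly follows its English name."""
    for english, scientific in pairs:
        genus, species = scientific.split(" ", 1)
        # Matches ' *Genus species*' or ' *G. species*' right after the English name.
        tail = (
            r" \*(?:" + re.escape(genus) + "|" + re.escape(genus[0]) + r"\.) "
            + re.escape(species) + r"\*"
        )
        text = re.sub("(" + re.escape(english) + ")" + tail, r"\1", text)
    return text


pairs = [
    ("Blackpoll Warbler", "Setophaga striata"),
    ("Hooded Warbler", "Setophaga citrina"),
]
marked = "Hooded Warbler *S. citrina* and Blackpoll Warbler *Setophaga striata* migrate."
cleaned = strip_scientific(marked, pairs)
print(cleaned)
```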
Fact is I don't know what your needs fully are. Often there are low-tech solutions, such as changing the structure of what you are writing. It seems to me you could mention individual species less? Especially given that you are saying the categories are 'fuzzy'. Papers with endless lists of obscurely named bacteria, for example, which I see quite often, are not that informative, and I have known people to just give a straight list somewhere, even as an appendix, as if anybody will read it.
The other problem, which macros will not help with either, is that some people are now getting very fussy about formats and styles that you might have to conform to. However, these aren't Keyboard Maestro solutions, so I will stop there.
I was considering saying that he could switch to MS Word, which has better features, like Indexes, but I can't see anything in that product that would do what he wants either.
Try doing it the other way round. Put all your scientific terms in with the abbreviated format. Continue working on your document. When you've just about finished replace only the first abbreviated term with the full version.
If you need to "reset" you can abbreviate all your terms again, move things around, then re-do your first-occurrence expansion.
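This abbreviate-first workflow could be sketched as follows in Python. It assumes every scientific name in the draft is written in the abbreviated form and marked with asterisks for italics, and that no two names in the list share the same abbreviation; the function names are hypothetical.

```python
import re


def abbreviate(scientific):
    """'Setophaga citrina' -> 'S. citrina'."""
    genus, species = scientific.split(" ", 1)
    return genus[0] + ". " + species


def abbreviate_all(text, scientific_names):
    """The 'reset' step: put every full name back into abbreviated form."""
    for full in scientific_names:
        text = text.replace("*" + full + "*", "*" + abbreviate(full) + "*")
    return text


def expand_first_per_genus(text, scientific_names):
    """Expand the first abbreviated mention of each genus to the full name."""
    full_by_abbrev = {abbreviate(s): s for s in scientific_names}
    expanded = set()

    def repl(match):
        full = full_by_abbrev.get(match.group(1))
        if full is None:
            return match.group(0)  # not a name from the list, leave untouched
        genus = full.split(" ", 1)[0]
        if genus in expanded:
            return match.group(0)
        expanded.add(genus)
        return "*" + full + "*"

    return re.sub(r"\*([A-Z]\. [^*]+)\*", repl, text)


names = ["Setophaga striata", "Setophaga citrina", "Dumetella carolinensis"]
draft = "Hooded Warbler *S. citrina*, Grey Catbird *D. carolinensis*, Blackpoll Warbler *S. striata*."
final = expand_first_per_genus(draft, names)
reset = abbreviate_all(final, names)
print(final)
```

The reset step brings the text back to the all-abbreviated draft, so paragraphs can be moved around and the expansion re-run as often as needed.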
Honestly -- this is a minefield and you are very unlikely to get it completely right. That's where editors earn their pay!