How Do I Identify Evernote Duplicate Notes & Assign Tags?

I have a great number of duplicates in Evernote, and I would like to find and delete the extra notes. Evernote's advice is to use View – All Notes – sort by title and then review the list of notes. I have 30,824 notes, so scanning the list will take a lot of time. I have KM activating Evernote, showing all notes, and sorting by title. But I do not see how to have KM match the titles and then tag the duplicates as duplicates so that I can see a list of duplicates only.

Any thoughts?

It will require AppleScript to accomplish your task.

Even so, it will take a while (10-60+ minutes) to go through every note and compare titles.
Are the dup titles exactly the same? If there is even a one-character difference, then an exact comparison will fail.

If we can do an exact compare, then the process is simple.

The titles would be exactly the same. My thinking is that if I can get the duplicates tagged, then I would have the duplicates that I could review without reviewing all notes. I don’t know enough AppleScript to accomplish this.

OK, I’ve been working on some generic EN Mac AppleScript code that will help us here.

Some info I need for the script:

  1. For the initial sort of Notes, we will sort on Note Title, ascending. What 2nd level sort do you want?
  • Date Created, Date Updated?
  • This will determine which Note we assign as “original”, and which as “dup”

  2. What tags do you want assigned to the dup Notes?
  • You could use something like:
    • “dup.1” – for the original
    • “dup.2” – for the actual dup
  • This would let you search/filter to get either, or both (using tag:dup.*)
  • But it’s up to you and how you want your workflow to go

  3. Once the dups have been identified by Title, you could also compare checksums on the body/contents. If they are the same, then the Notes are truly identical. If so, then the script could just delete the dup, if you want.
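
To make the checksum idea in the last item concrete, here is a minimal sketch. It is written in Python only so that it can be run outside Evernote; the `notes` sample data and the `content_checksum` and `split_by_checksum` helpers are hypothetical and are not part of the script or of Evernote's scripting interface.

```python
import hashlib
from collections import defaultdict


def content_checksum(body: bytes) -> str:
    """Return a SHA-256 hex digest of a note body."""
    return hashlib.sha256(body).hexdigest()


def split_by_checksum(notes):
    """Group same-title notes by the checksum of their body.

    `notes` is a list of (title, body_bytes) pairs (hypothetical sample data).
    Returns {title: {checksum: [indexes into notes]}} for titles that occur
    more than once.
    """
    by_title = defaultdict(lambda: defaultdict(list))
    for index, (title, body) in enumerate(notes):
        by_title[title][content_checksum(body)].append(index)
    return {
        title: dict(groups)
        for title, groups in by_title.items()
        if sum(len(indexes) for indexes in groups.values()) > 1
    }


notes = [
    ("Invoice.pdf", b"same bytes"),
    ("Invoice.pdf", b"same bytes"),    # same title and same body: a true duplicate
    ("Invoice.pdf", b"edited later"),  # same title, different body
    ("Receipt.pdf", b"only one copy"),
]

result = split_by_checksum(notes)
for title, groups in result.items():
    for checksum, indexes in groups.items():
        print(title, checksum[:8], indexes)
```

Notes that land in the same checksum group are candidates for deletion; notes that share only a title still need a manual look.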

I have found a very fast sort engine, which took only ~3 sec to sort 18,000 Notes, so that’s one possible bottleneck avoided. But I’m still not sure how long it will take to do the actual title comparison. I’m working on it.

Date Updated would probably be best. I agree with item 2. #3 would be very helpful. I don’t worry too much about the time it may take. I can let it run overnight if you think it’s OK to do so.

So you would want to identify the dup with the latest (most recent) Date Updated as the "original" Note?

Since they would be exact duplicates, would it really matter? I can go with either. I thought that if I had updated the note, then I would want that one. But now I realize that if I have updated the note it would not be a true duplicate. So the original should be the original, if I understand correctly.

How are the dup notes created? Manually, or from some import like from email or web clipper?
Obviously, if you start out with two exact dups, and then you update one or both, then they are no longer true dups, even if their Titles are the same.

So, I'm now thinking that the "original" should be the note with the oldest Date Created. But if you prefer something different, that is obviously your choice. I'll set up the script so that the date to use is set at the top as a property, making it easy to change.
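
As a rough illustration of that rule (sort by title, then by date, and treat the first note in each same-title group as the "original"), here is a small Python sketch. It is not the AppleScript itself; the sample `notes` and the `newest_first` switch are hypothetical stand-ins.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical sample data: (title, creation date) pairs.
notes = [
    ("Report.pdf", datetime(2017, 3, 5)),
    ("Agenda", datetime(2016, 1, 1)),
    ("Report.pdf", datetime(2015, 7, 1)),
    ("Report.pdf", datetime(2016, 2, 9)),
]


def label_duplicates(notes, newest_first=False):
    """Return (title, date, label) for every note whose title occurs more than once."""
    # Sort by date first, then stable-sort by title, so title is the primary key
    # and date is the secondary key.
    ordered = sorted(notes, key=lambda n: n[1], reverse=newest_first)
    ordered.sort(key=lambda n: n[0])

    labelled = []
    for title, group in groupby(ordered, key=lambda n: n[0]):
        group = list(group)
        if len(group) < 2:
            continue  # a title that occurs once is not a duplicate
        for position, (_, created) in enumerate(group):
            label = "original" if position == 0 else "dup"
            labelled.append((title, created, label))
    return labelled


labels = label_duplicates(notes)
for row in labels:
    print(row)
```

Passing `newest_first=True` would instead mark the most recent note in each group as the original.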

Good news: An AppleScript colleague has come up with a method for very, very quickly determining the dup items in a list. If this works out as indicated by early testing, it should reduce the time to just a few minutes. I should have something within a day or two.

The originals would be imported. They would likely be PDFs. Also there may be more than one copy of the same PDF. My workflow takes a downloaded PDF or a scanned PDF and runs it through PDFPen for OCR, and then it’s added to Evernote. I have found that the workflow doesn’t always work correctly and multiple copies are imported.

I really appreciate your help.

OK, thanks for the info. That helps.

I have a proof-of-concept script working now. It took only ~4 sec to get 18,000+ Notes, and identify ~300 dup Note sets. Looks very promising.

@1_Hominid, I should have the script ready for you tomorrow. I’m in final testing now.

I hope you don’t mind if I rename your topic title to make the subject a bit clearer:

FROM:
Help with Evernote plase

TO:
How Do I Identify Evernote Duplicate Notes & Assign Tags?

@1_Hominid, the above is still true, but I found a bug in Evernote when creating tags, and I'm working on making sure it does not affect this script. Barring more bugs, it should be ready tomorrow . . . (developer's famous last words LOL)

OK, I think the script is finally ready for you to use.
Please let us know if this script/macro works for you.

It actually ran very fast on my iMac-27, with 18K+ Evernote Notes, as you can see from the AppleScript Log, taking only ~39 sec:

I'm sure with 30,000 notes it will be slower for you, but if you have a reasonably recent/fast Mac, it should take no more than ~80 sec. But it is best to be prepared for several minutes. I'd make sure that Evernote was the only app running, and of course the KM Engine.

Do make sure Evernote Mac is running, and do a sync, and let it complete, before you trigger this macro.

##Example Output
You will get two script prompts to confirm continuing with the script:

####Script Dialog Showing Results
(it will automatically close in 5 sec, but results have been placed on the clipboard)

####Open New Evernote Query Window with Results
This shows a Note list filtered by the dup tags:
any: tag:Dup.orig tag:Dup.dup

Your window may appear different. I have manually changed my window to show the Note List on Top, and sorted by Title. Of course you can change the filter (Search actually) anytime, now or later.

###Script Requirements

  • BridgePlus (BPLib) Script Library
    (download at bottom of page)
  • Required to:
    • Flatten Lists
    • Sort Lists
  • Install the file BridgePlus.scptd (from the zip file) into your ~/Library/Script Libraries folder (create the folder if need be)
    • This is a very safe and reliable script library written by the well-known AppleScript guru Shane Stanley

###Script Properties You Can Change
You can find these near the top of the script.

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--- PROPERTIES CHANGEABLE BY USER ---
property dupSortBy : "creation date" -- "creation date" OR "modification date"
property dupSortDir : "ASC" -- "ASC" OR "DESC"

--- These Tags will be DELETED At the Start of the Script ---
--    (thus any Notes with these tags will no longer have these tags)
property ptyTagOrig : "Dup.orig"
property ptyTagDup : "Dup.dup"

property ptyMaxDupSets : -1 -- limit the tagging of Dup Sets for testing.  Set to -1 for ALL
property ptyLogDupSets : false
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The KM Macro is very simple -- it has one Action: Execute AppleScript
You can also run this script from the Script Editor app.

##Macro Library   Get Dup Evernote Note List & Assign Tags


####DOWNLOAD:
<a class="attachment" href="/uploads/default/original/2X/6/6ae690296aee6af9024c73aeb93f1cf22acb27cd.kmmacros">Get Dup Evernote Note List & Assign Tags.kmmacros</a> (16 KB)
**Note: This Macro was uploaded in a DISABLED state. You must enable before it can be triggered.**

---

###Use Case

* Identify and tag Duplicate Evernote Notes

---

###ReleaseNotes

* See above
* **Make sure Evernote is running and fully sync'd BEFORE triggering this macro.**

REQUIRES:

* [BridgePlus (BPLib) Script Library](https://www.macosxautomation.com/applescript/apps/BridgePlus.html)
* KM 7.3.1+
* macOS 10.11.6
* Evernote 6.11.1+ (do NOT run using any Evernote BETA).

---


<img src="/uploads/default/original/2X/4/4fab5d97d677ce71e61d9c69c3ce1c31cc4f293f.png" width="619" height="708">

---

###AppleScript 

```applescript
property ptyScriptName : "EN Get Dup Note List & Tag"
property ptyScriptVer : "1.2"
property ptyScriptDate : "2017-08-21"
property ptyScriptAuthor : "JMichaelTX"
(*

PURPOSE: Search All EN Notes to Identify Duplicate Notes by Title,
and assign dup tags to those Notes.

        Dup Note Sets are logged as tags are assigned.
        Upon completion, a new Evernote window is shown,
        filtered to "any:" of the dup tags.

REQUIRED:

  1. macOS El Capitan 10.11.6+
    (may work on Yosemite 10.10.5, but no guarantees)

  2. Mac Applications
    • Evernote Mac 6.9.2+

  3. EXTERNAL OSAX Additions/LIBRARIES/FUNCTIONS
    • BridgePlus (BPLib) Script Library
    • Required to:
    • Flatten Lists
    • Sort Lists

INSTALLATION: See AS:  How to Install AppleScripts or JXA Scripts

*)

use AppleScript version "2.5" -- El Capitan 10.11.6+
use scripting additions
use framework "Foundation"
use BPLib : script "BridgePlus"

property LF : linefeed

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--- PROPERTIES CHANGEABLE BY USER ---
property dupSortBy : "creation date" -- "creation date" OR "modification date"
property dupSortDir : "ASC" -- "ASC" OR "DESC"

--- These Tags will be DELETED At the Start of the Script ---
--    (thus any Notes with these tags will no longer have these tags)
property ptyTagOrig : "Dup.orig"
property ptyTagDup : "Dup.dup"

property ptyMaxDupSets : -1 -- limit the tagging of Dup Sets for testing.  Set to -1 for ALL
property ptyLogDupSets : false
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

set scriptResults to "TBD"

try
  --~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  set frontApp to path to frontmost application as text -- use for dialogs
  
  --- GET NOTE COUNT and CONFIRM PROCESSING ---
  
  
  tell application "Evernote"
    set nbList to every notebook
    set numNotes to 0
    repeat with oNB in nbList
      set numNotes to numNotes + (count of notes in oNB)
    end repeat
  end tell
  
  log "Num of Notes: " & numNotes
  set msgStr to "Number of Notes to Process: " & numNotes & LF & LF & ¬
    "Script will scan ALL Notes to determine duplicate Notes based on Note Title," & LF & ¬
    "Then you will be asked to confirm assigning tags to these Notes as follows:" & LF & ¬
    tab & "• Original Note:     " & tab & tab & ptyTagOrig & LF & ¬
    tab & "• Duplicate Notes: " & tab & ptyTagDup & LF & LF & ¬
    "This could take between 5 sec and 10 minutes, depending on the number of Notes and the speed of your Mac" & LF & LF & ¬
    "Click \"Continue\" to Process ALL " & numNotes & " Notes."
  
  if not my continueScript(msgStr) then error "User Cancelled"
  
  set startTime to current application's NSDate's |date|()
  
  --- GET PROPERTIES OF ALL NOTES ---
  
  (*
    • this is much faster than getting a Note Object list, and
        using a repeat loop to get properties.
    • The noteLinkList will be used to get the actual Note object
        when we need to process the dup Note list.
  *)
  
  tell application "Evernote"
    set {noteLinkList, creDateList, modDateList, titleList} to {note link, creation date, modification date, title} of every note of every notebook
    
  end tell
  
  --- CONVERT LIST of LISTS to SINGLE, FLAT LIST ---
  --    (one item per Note) (Requires BridgePlus)
  
  set noteLinkList to my flattenList(noteLinkList)
  set creDateList to my flattenList(creDateList)
  set modDateList to my flattenList(modDateList)
  set titleList to my flattenList(titleList)
  
  
  --- SORT BY Title, Date, Note Link ---
  
  if (dupSortBy = "creation date") then
    set dateList to creDateList
  else
    set dateList to modDateList
  end if
  
  set {titleList, dateList, noteLinkList} to my sortMultiLists({titleList, dateList, noteLinkList}, {"ASC", dupSortDir, "ASC"})
  
  
  ------------------------------------
  --  GET DUP NOTE LIST --
  ------------------------------------
  
  set {dupNoteList, dupNoteCount} to my getDupItemList(titleList)
  
  set elapTime to (-(round ((startTime's timeIntervalSinceNow()) * 100)) / 100.0)
  log ("Time to Get Dup Note List: " & elapTime)
  
  ------------------------------------
  --  ASSIGN TAGS TO DUP NOTES --
  ------------------------------------
  
  log "Num of Dup Sets: " & dupNoteCount
  set msgStr to "Number of Duplicate Note Sets to Assign Tags to: " & dupNoteCount & LF & ¬
    "This may take between 10 sec and 10 minutes to complete."
  
  if not my continueScript(msgStr) then error "User Cancelled"
  
  set startTime to current application's NSDate's |date|()
  
  tell application "Evernote"
    
    --- CREATE TAGS IF NEED BE, OR DELETE if They EXIST ---
    -- MUST sync before/After due to EN Mac BUG
    
    my sync()
    
    --- DELETE TAGS IF THEY EXIST ---
    set syncNeeded to false
    
    if ((tag named ptyTagOrig exists)) then
      delete tag ptyTagOrig
      log "Tag Deleted: " & ptyTagOrig
      set syncNeeded to true
    end if
    if ((tag named ptyTagDup exists)) then
      delete tag ptyTagDup
      log "Tag Deleted: " & ptyTagDup
      
      set syncNeeded to true
    end if
    
    if (syncNeeded) then my sync()
    
    --- CREATE TAGS ---
    
    if (not (tag named ptyTagOrig exists)) then
      make tag with properties {name:ptyTagOrig}
      log "Tag CREATED: " & ptyTagOrig
    end if
    if (not (tag named ptyTagDup exists)) then
      make tag with properties {name:ptyTagDup}
      log "Tag CREATED: " & ptyTagDup
    end if
    
    my sync()
    
    set iDupSet to 0
    
    ----------------------------------
    repeat with oDup in dupNoteList
      --------------------------------
      
      --- EXIT Repeat IF NOT All DupSets AND Max DupSets Have Been Processed ---
      if ((ptyMaxDupSets ≠ -1) and (iDupSet ≥ ptyMaxDupSets)) then exit repeat
      
      set iDupSet to iDupSet + 1
      if (iDupSet mod 10 = 0) then -- display notify every 10 dup sets
        set msgStr to "Processing Dup Set #" & iDupSet
        set msgTitleStr to ptyScriptName
        display notification msgStr with title msgTitleStr sound name "Tink.aiff"
      end if
      
      set noteTitle to item 1 of oDup
      if (ptyLogDupSets) then log "DupSet: " & iDupSet & tab & noteTitle
      
      ------------------------------------------
      repeat with iNL from 2 to (count of oDup)
        ----------------------------------------
        
        set noteLink to item (item iNL in oDup) in noteLinkList
        if (iNL = 2) then
          set tagStr to ptyTagOrig
        else
          set tagStr to ptyTagDup
        end if
        
        set oNote to find note noteLink
        assign tag tagStr to oNote
        
      end repeat
      
    end repeat
  end tell
  
  set elapTime to (-(round ((startTime's timeIntervalSinceNow()) * 100)) / 100.0)
  log ("Time to Assign Tags to Notes: " & elapTime)
  
  set scriptResults to "OK" & LF & "SUCCESS!" & LF & numNotes & " Notes were Processed and Checked for Dups" & LF & ¬
    dupNoteCount & " Dup Note Sets were found." & LF & ¬
    "Tags were assigned as follows:" & LF & tab & "• Original Note: " & ptyTagOrig & LF & ¬
    tab & "• Dup Notes: " & ptyTagDup
  
  set the clipboard to scriptResults
  
  
  display dialog scriptResults & LF & ¬
    "(copied to clipboard)" with title ptyScriptName ¬
    buttons {"OK"} ¬
    default button ¬
    "OK" with icon note ¬
    giving up after 5
  
  tell application "Evernote"
    activate
    
    set enQuery to "any: tag:" & ptyTagOrig & " tag:" & ptyTagDup
    set oWin to open collection window
    set query string of oWin to enQuery
    
  end tell
  
  my sync()
  --~~~~~~~~~~~~~ END TRY ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
on error errMsg number errNum
  
  if errNum = -128 then ## User Canceled
    set errMsg to "[USER_CANCELED]"
  end if
  
  set scriptResults to "[ERROR]" & return & errMsg & return & return ¬
    & "SCRIPT: " & ptyScriptName & "   Ver: " & ptyScriptVer & return ¬
    & "Error Number: " & errNum
end try
--~~~~~~~~~~~~~~~~END ON ERROR ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--- RETURN THE RESULTS TO THE KM EXECUTE SCRIPT ACTION ---
return scriptResults

--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--    HANDLERS (functions)
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

on sync()
  local msgStr, msgTitleStr, isSync
  
  tell application "Evernote"
    set msgStr to "Waiting on EN Mac SYNC to Complete"
    set msgTitleStr to "Synchronize EN Mac"
    display notification msgStr with title msgTitleStr sound name "Tink.aiff"
    synchronize
    
    set isSync to isSynchronizing
    repeat while isSync
      delay 0.1
      set isSync to isSynchronizing
    end repeat
  end tell
  
  set msgStr to "Sync COMPLETE!"
  display notification msgStr with title msgTitleStr sound name "Tink.aiff"
  
end sync


--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
on getDupItemList(pSourceList)
  (*  VER: 1.2    2017-08-14
---------------------------------------------------------------------------------
  PURPOSE:  Get a List of Dup Items, with indexes, found in Source List
  PARAMETERS:
    • pSourceList   ┃ text  ┃ Source List to search for duplicate items (exact match)
    
  RETURNS:  List of Lists  ┃ Each Item in main list is list with these items:
                • text       ┃ Source items which had dups
                • integer    ┃ Index of first item in Source List
                • integer    ┃ Index of second item in Source List
                • [integer ┃ additional items for each dup found, one item per dup]
                EXAMPLE:
                  { Item in Source List,
                    Index in Source List to first dup
                    Index in Source List to 2nd dup
                    . . .
                    Index in Source List to nth dup }
                    
                  { "11.16.2011 [WED] Daily Notes", 
                    239, 
                    240, 
                    241 }, 
                  { "15-Minute Retirement Plan | Fisher Investments | Jul 12, 2012.pdf", 
                    279, 
                    280 }, 
                    . . .
                    nth Dup Set
                  
  AUTHOR:  JMichaelTX refactored script by Shane Stanley
  
  REQUIRES:
    • macOS 10.11.6+
    • use framework "Foundation"
  REF:
    1.  Shane Stanley, 2017-08-13
           http://lists.apple.com/archives/applescript-users/2017/Aug/msg00053.html

---------------------------------------------------------------------------------
*)
  local time1, theCount, countedDupes, duplicatedValues, dupItemList, thisValue, thisIndex, thisInfo, startTime, elapTime, dupItemCount, msgStr, msgTitleStr
  
  set startTime to current application's NSDate's |date|()
  
  set pSourceList to current application's NSArray's arrayWithArray:pSourceList
  set theCount to pSourceList's |count|()
  -- get a counted set of the duplicate instances of any duplicated values
  set countedDupes to current application's NSCountedSet's setWithArray:pSourceList
  countedDupes's minusSet:(current application's NSSet's setWithSet:countedDupes)
  
  
  -- get the indices of the duplicated values' first and dupe instances
  
  -- USE THIS for NO SORT --
  set duplicatedValues to countedDupes's allObjects()
  
  ---  USE THIS to SORT on Source Item ---
  ### NOW REPLACED by BPLib Sort at Bottom
  
  ###  set duplicatedValues to countedDupes's allObjects()'s sortedArrayUsingSelector:"compare:"
  
  set dupItemList to {}
  repeat with thisValue in duplicatedValues
    -- Value and first index.
    set thisIndex to (pSourceList's indexOfObject:(thisValue)) + 1
    set thisInfo to {thisValue as text, thisIndex}
    -- Indices of dupes.
    repeat (countedDupes's countForObject:(thisValue)) times
      set thisIndex to (pSourceList's indexOfObject:(thisValue) inRange:({thisIndex, theCount - thisIndex})) + 1
      set end of thisInfo to thisIndex
    end repeat
    set end of dupItemList to thisInfo
  end repeat
  
  ### ADD BPLib SORT of RESULTS on First Index (Item 2) ###
  --   This sorts the results in the same order as the Source List
  
  set dupItemList to BPLib's sublistsIn:dupItemList sortedByIndexes:{2} ascending:{true} sortTypes:{}
  
  set elapTime to (-(round ((startTime's timeIntervalSinceNow()) * 100)) / 100.0)
  set dupItemCount to count of dupItemList
  
  set msgStr to ((dupItemCount as text) & " Dup Items found in " & elapTime as text) & " sec"
  set msgTitleStr to "getDupItems() Handler"
  display notification msgStr with title msgTitleStr sound name "Tink.aiff"
  
  return {dupItemList, dupItemCount}
  
end getDupItemList
--~~~~~~~~~~~~~~~ END OF handler getDupItemList ~~~~~~~~~~~~~~~~~~~~~~~~~


on continueScript(pMsgStr)
  
  beep
  
  display dialog pMsgStr ¬
    with title ptyScriptName ¬
    buttons {"Stop", "Continue"} ¬
    default button ¬
    "Continue" with icon caution
  
  set buttonStr to button returned of result
  
  if (buttonStr = "Continue") then
    set continueBol to true
  else
    set continueBol to false
  end if
  
  return continueBol
  
end continueScript



on flattenList(pList)
  set flatList to BPLib's listByFullyFlattening:pList
  return flatList
end flattenList


on sortMultiLists(pListOfLists, pSortDirList)
  (*
    REQUIRES:
      use framework "Foundation"
      use BPLib : script "BridgePlus"
  *)
  local rowsList, listCount, sortByList, iL, oSort
  
  --- Setup the Sort ---
  set rowsList to BPLib's colsToRowsIn:pListOfLists
  
  set listCount to count of pListOfLists
  set sortByList to {}
  
  --- Sort Order by List as Passed in pListOfLists ---
  
  repeat with iL from 1 to listCount
    set end of sortByList to iL
  end repeat
  
  --- Convert Text Sort Direction to Boolean ---
  --   (true means ascending)
  
  repeat with oSort in pSortDirList
    set contents of oSort to ((oSort as text) starts with "ASC")
  end repeat
  
  --- Do the Sort
  set rowsList to BPLib's sublistsIn:rowsList sortedByIndexes:sortByList ascending:pSortDirList sortTypes:{}
  
  --- Get Sort Results ---
  set pListOfLists to BPLib's colsToRowsIn:rowsList
  
  return pListOfLists
end sortMultiLists

```
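
For anyone who wants to follow the idea behind `getDupItemList` without the Cocoa calls, here is a rough Python sketch. It is an illustration only and not a translation of the handler: it counts how often each title occurs, keeps the titles that occur more than once, and records their 1-based positions in a single pass. The `get_dup_item_list` function and the sample `titles` are hypothetical.

```python
from collections import Counter


def get_dup_item_list(source):
    """Return [[item, first_index, dup_index, ...], ...] using 1-based indexes.

    Only items that occur more than once in `source` are reported,
    in the order of their first occurrence.
    """
    counts = Counter(source)
    duplicated = {item for item, n in counts.items() if n > 1}

    positions = {}
    for index, item in enumerate(source, start=1):
        if item in duplicated:
            # First hit creates [item]; every hit appends its index.
            positions.setdefault(item, [item]).append(index)

    # Dicts keep insertion order, so results are already ordered by first index.
    return list(positions.values())


titles = ["A", "B", "B", "C", "A", "B"]  # hypothetical sample titles
dup_sets = get_dup_item_list(titles)
print(dup_sets)
```

Each inner list has the same shape the handler's comments describe: the duplicated title followed by the index of every note that carries it.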

It ran great!!!

OK
SUCCESS!
29285 Notes were Processed and Checked for Dups
1233 Dup Note Sets were found.
Tags were assigned as follows:
• Original Note: Dup.orig
• Dup Notes: Dup.dup

Thanks a lot.


That's good to hear. Do you remember how long it took?

I didn’t time it, but probably about 10 minutes.