Bash Script Results Are Different When Run From Keyboard Maestro

broskees · November 23, 2021, 7:08pm

So I have the following bash script I wrote:

#!/bin/bash

# check for arguments
if [[ $# -eq 0 ]]; then
	echo 'No URL supplied'
	exit 1
fi

## scan URL for links, 
## follow to see if they exist
#spider_output=$(wget --spider -e robots=off -w 1 -r -p -nd $1 2>&1)
spider_output=$(wget --spider -e robots=off -r -p -nd "$1" 2>&1)

## if wget exited with an error
## (i.e. if any broken links were found)
if [[ $? -ne 0 ]]; then
    # remove unnecessary text
    broken_links=$(echo "$spider_output" | grep --no-group-separator --before-context 3 'broken link!' | grep --invert-match 'broken link!' | grep --invert-match ':443' 2>&1) # pull only necessary data
    broken_links=${broken_links//HTTP request sent, awaiting response... /} # remove useless text
    broken_links=$(echo "$broken_links" | sed -e 's/--[0-9]*-[0-9]*-[0-9]* [0-9]*:[0-9]*:[0-9]*--  //g' 2>&1) # remove timestamp

    #echo "$spider_output" | grep 'unable to resolve' # print failures to resolve

    # Format results as JSON
    broken_links=($broken_links) #turn into array
    
    declare -i count=0
    declare -a unproc_errors=()
    for word in ${broken_links[@]}
    do
        if [[ "$word" == *"https"* ]]; then
            (( count++ ))
            unproc_errors[$count]="$word";
        else
            unproc_errors[$count]+=" $word";
        fi
    done
    unset unproc_errors[0] # removes the first element (which is empty)

    count=0
    declare -a errors=()
    for item in "${unproc_errors[@]}"
    do
        url=$(cut -d ' ' -f 1 <<< "$item")
        error=$(cut -d ' ' -f 2- <<< "$item")
        errors[$count]=$(echo "{\"error\":\"$error\",\"url\":\"$url\"}")
        (( count++ ))
    done
    errors_json=$(IFS=,; printf %s "${errors[*]}")

    json="{\"site\":\"$1\",\"errors\":[$errors_json]}"

    echo $json

    # exit with error
    exit 1
fi

## otherwise, exit silently with success
exit 0

When I invoke it from the command line via bash link_checker.sh SITEDOMAIN | jq, I get the following results:

{
  "site": "SITEDOMAIN",
  "errors": [
    {
      "error": "403 Forbidden",
      "url": "SITEDOMAIN/wp/xmlrpc.php?rsd"
    },
    {
      "error": "404 Not Found",
      "url": "SITEDOMAIN/app/plugins/wp-map-block/assets/assets/images/layers.png"
    },
    {
      "error": "404 Not Found",
      "url": "SITEDOMAIN/app/plugins/wp-map-block/assets/assets/images/layers-2x.png"
    }
  ]
}

However, when run with this macro:

I get these results for that same site:

{
  "site": "SITEDOMAIN",
  "errors": []
}

What's going on?

(Also, I saved the output of echo $PATH in Terminal to my ENV_PATH variable in Keyboard Maestro to try to remove any discrepancies between the two environments.)

ccstone · November 23, 2021, 10:33pm

Hey Joe,

Double-quote your $KMVAR_URL variable:

"$KMVAR_URL"
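
To illustrate why the quotes matter, here is a minimal sketch. The count_args helper and the sample URL are made up for the demonstration; in a real macro Keyboard Maestro supplies KMVAR_URL itself.

```shell
#!/bin/bash
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# Stand-in value for the demo (assumption); note the space in it.
KMVAR_URL='https://example.com/my page'

unquoted=$(count_args $KMVAR_URL)     # expansion is split on whitespace
quoted=$(count_args "$KMVAR_URL")     # value is passed as one argument

echo "unquoted: $unquoted, quoted: $quoted"
```

With the default IFS, the unquoted form reaches the script as two arguments and the quoted form as one, so a script that reads only $1 would see a truncated URL in the unquoted case.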

If that doesn't help then please post an actual testable macro.

-Chris

broskees · November 24, 2021, 10:13pm

The quotes didn't change anything, unfortunately. However, I tried removing jq from the command and echoing the wget output into a window, and the results were interesting:

Spider mode enabled. Check if remote file exists. --2021-11-24 17:08:28-- https://SITEDOMAIN/ Resolving SITEDOMAIN (SITEDOMAIN)... 162.159.134.42 Connecting to SITEDOMAIN (SITEDOMAIN)|162.159.134.42|:443... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] Remote file exists and could contain links to other resources -- retrieving. --2021-11-24 17:08:29-- https://SITEDOMAIN/ Reusing existing connection to SITEDOMAIN:443. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] index.html.tmp.tmp: Bad file descriptor Cannot write to ‘index.html.tmp.tmp’ (Success). Found no broken links.

Specifically this portion:

Bad file descriptor Cannot write to ‘index.html.tmp.tmp’ (Success).

I believe this is most likely a permissions issue on KM's side, but I'm not sure where to start in solving it.
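
One possible reading of the "Cannot write" line, which nobody in this thread confirms, is that wget could not create its temporary index.html.tmp file in the directory the script was started from. A minimal sketch for testing that guess is to run the command from a fresh, writable scratch directory. The run_in_scratch_dir helper is hypothetical; the wget flags in the usage comment are copied from the script above.

```shell
#!/bin/bash
# Hypothetical helper: run a command from a fresh, writable scratch directory,
# then clean the directory up and pass the command's exit status through.
run_in_scratch_dir() {
    local scratch status
    scratch=$(mktemp -d) || return 1
    # Subshell, so the caller's own working directory is left untouched.
    ( cd "$scratch" && "$@" )
    status=$?
    rm -rf "$scratch"
    return "$status"
}

# Usage in the link checker (not executed here):
# spider_output=$(run_in_scratch_dir wget --spider -e robots=off -r -p -nd "$1" 2>&1)
```

If the "Cannot write" message disappears when wget runs this way, the working directory was the problem; if it stays, the cause lies elsewhere.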

ccstone · November 24, 2021, 11:58pm

Hey Joe,

If you want help then please post a “working” minimal test-case macro for people to work with.

If you place the burden of reconstructing your work to test it on others you reduce the likelihood that anyone will bother.

Debugging by eye is frequently a lost cause.

A proper test-case macro often conveys subtle details left out of other forms of explanation.

-Chris

broskees · November 26, 2021, 3:15pm

Hey there, sorry about that. I didn't know you could export macros as a file, so I didn't really know what you meant at first. I've attached the macro to this comment and included the shell script below. Both contain the changes I made while debugging for my last comment. I'm also using ggrep and gsed to make sure the GNU versions of those commands are used on macOS. Unfortunately I can't attach .sh files to this forum, so I'm just pasting the script here.

Find broken links in website.kmmacros (2.6 KB)

#!/bin/bash

# check for arguments
if [[ $# -eq 0 ]]; then
	echo 'No URL supplied'
	exit 1
fi

## scan URL for links, 
## follow to see if they exist
#spider_output=$(wget --spider -e robots=off -w 1 -r -p -nd $1 2>&1)
spider_output=$(wget --spider -e robots=off -r -p -nd "$1" 2>&1)

echo $spider_output

## if wget exited with an error
## (i.e. if any broken links were found)
if [[ $? -ne 0 ]]; then
    # remove unnecessary text
    broken_links=$(echo "$spider_output" | ggrep --no-group-separator --before-context 3 'broken link!' | ggrep --invert-match 'broken link!' | ggrep --invert-match ':443' 2>&1) # pull only necessary data
    broken_links=${broken_links//HTTP request sent, awaiting response... /} # remove useless text
    broken_links=$(echo "$broken_links" | gsed -e 's/--[0-9]*-[0-9]*-[0-9]* [0-9]*:[0-9]*:[0-9]*--  //g' 2>&1) # remove timestamp

    #echo "$spider_output" | grep 'unable to resolve' # print failures to resolve

    # Format results as JSON
    broken_links=($broken_links) #turn into array
    
    declare -i count=0
    declare -a unproc_errors=()
    for word in ${broken_links[@]}
    do
        if [[ "$word" == *"https"* ]]; then
            (( count++ ))
            unproc_errors[$count]="$word";
        else
            unproc_errors[$count]+=" $word";
        fi
    done
    unset unproc_errors[0] # removes the first element (which is empty)

    count=0
    declare -a errors=()
    for item in "${unproc_errors[@]}"
    do
        url=$(cut -d ' ' -f 1 <<< "$item")
        error=$(cut -d ' ' -f 2- <<< "$item")
        errors[$count]=$(echo "{\"error\":\"$error\",\"url\":\"$url\"}")
        (( count++ ))
    done
    errors_json=$(IFS=,; printf %s "${errors[*]}")

    json="{\"site\":\"$1\",\"errors\":[$errors_json]}"

    echo $json
fi
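
A side note on the debug version above: the echo placed between the wget call and the $? test replaces wget's exit status with echo's own, which is 0, so the if block is skipped no matter what wget returned. A minimal sketch of the usual fix is to save the status immediately. The fake_spider function is a hypothetical stand-in for wget so the example runs anywhere.

```shell
#!/bin/bash
# Hypothetical stand-in for the wget call: prints a line and exits non-zero.
fake_spider() { echo "Found 2 broken links."; return 8; }

spider_output=$(fake_spider 2>&1)
spider_status=$?          # save right away, before any other command runs

echo "$spider_output"     # safe now: this echo resets $? to 0, not spider_status

if [[ $spider_status -ne 0 ]]; then
    echo "spider exited with status $spider_status"
fi
```

In the real script, replacing fake_spider with the wget command and testing spider_status instead of $? keeps the debug echo without changing the control flow.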

ComplexPoint · November 17, 2022, 12:16pm

The vanilla shell instance used by Keyboard Maestro inherits no paths or other variables from the completely separate shell instance used by Terminal.app.

So, for example, any specially installed commands (wget, for example?) may run without a full path in Terminal.app, but will need either a full path or an explicitly set PATH variable in the KM instance of the shell.
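
A minimal sketch of the explicit PATH approach, placed at the top of the script. The two directories are the usual Homebrew prefixes and are an assumption; the require_cmd helper is hypothetical. Adjust both to wherever wget, ggrep and gsed actually live on your machine.

```shell
#!/bin/bash
# Assumption: tools were installed under one of Homebrew's usual prefixes.
PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"
export PATH

# Hypothetical helper: show where a command resolves, or fail loudly.
require_cmd() {
    local resolved
    if resolved=$(command -v "$1"); then
        echo "$1 -> $resolved"
    else
        echo "$1 not found on PATH ($PATH)" >&2
        return 1
    fi
}

# Usage in the link checker (not executed here):
# require_cmd wget && require_cmd ggrep && require_cmd gsed || exit 1
```

Printing the resolved paths from inside the macro also makes it easy to compare what Keyboard Maestro's shell sees with what Terminal.app sees.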

Cold thread - just saw the date
