Mouse/Screen Coordinates – Simulating Click on ChatGPT Send Button

Roy_McCoy · July 6, 2024, 2:44am

I'm lumping a couple of questions together here because they all relate to the same desired macro – hope that's okay.

The general problem is that when a ChatGPT window is narrow beyond a certain point (somewhere between 50 and 60% on my 13" MacBook Air, I think), for some reason hitting return doesn't send the query as it should, but gives me an undesired return in the text field. I've collaborated for hours with ChatGPT on this, but don't as yet have something that works.

One clunky thing we tried today, triggered by the return key, was getting the size and position of the front browser window, expanding it to full screen to have it always the same size and position when clicking on the Send button, and then restoring the original window size and position. This was impractical (aside from not working), as there would be a delay in any event and particularly when delays were inserted as needed in the AppleScript.

So I'm ready to dump this approach and try something else, but I'm curious about the click and would like to get it to work even though I don't plan on using this version of the macro in any event. Here's the AppleScript with my best stab at the needed click percentages:

-- Use shell script to get screen dimensions
tell application "System Events"
	set screenResolution to do shell script "system_profiler SPDisplaysDataType | awk '/Resolution:/{print $2, $4}'"
	set spacePos to offset of " " in screenResolution
	set screenWidth to text 1 through (spacePos - 1) of screenResolution
	set screenHeight to text (spacePos + 1) through end of screenResolution
end tell

-- Define the coordinates for the "Send" button, ensure these are correct
set clickX to screenWidth * 0.82 -- calculated using cmd-shift-4/KM coordinates
set clickY to screenHeight * 0.93 -- calculated using cmd-shift-4/KM coordinates

-- Manipulate Brave Browser window
tell application "System Events"
	tell application process "Brave Browser"
		set frontWindow to the first window
		-- Get the current position and size of the window
		set {windowX, windowY} to position of frontWindow
		set {windowWidth, windowHeight} to size of frontWindow
		
		-- Maximize the window for consistent button location
		set position of frontWindow to {0, 0}
		set size of frontWindow to {screenWidth, screenHeight}
		
		-- Wait for the UI to update
		delay 1
	end tell
	
	-- Activate Brave Browser
	tell application "Brave Browser" to activate
	
	-- Simulate the click on the Send button
	tell application process "Brave Browser"
		delay 2 -- Additional delay to ensure Brave is active and ready
		click at {clickX, clickY}
		
		-- Restore the original window size and position
		delay 1
		set position of frontWindow to {windowX, windowY}
		set size of frontWindow to {windowWidth, windowHeight}
	end tell
end tell

Both cmd-shift-4 and KM gave me about the same coordinates for the Send button {895, 956} and the lower right-hand corner of the screen {1470, 956}. I'm ready to give up on getting the click since I think my calculation was correct or should have been, but I'm still curious as to why the cmd-shift-4/KM coordinates were so incongruous with the results of the shell script (2560 x 1664). Maybe someone can explain this to me before I go on to the other approach.

DanThomas · July 6, 2024, 10:45am

I admit I didn't read through your question thoroughly, but here's one thought about the coordinates:

You probably have a Retina display, which probably has a DPI of 144 instead of the usual 72 (or something like that). In this case, KM uses "nominal" coordinates, which are half of the actual coordinates.
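
For what it's worth, the figures quoted in this thread suggest the ratio on this particular machine is closer to 1.74 than exactly 2. Below is a minimal Python sketch of that arithmetic. It assumes the click is interpreted in the same nominal (point) coordinates that cmd-shift-4 and KM report; the function names are made up for illustration and are not part of KM or AppleScript.

```python
# Figures quoted in the thread for a 13" MacBook Air.
PIXEL_SIZE = (2560, 1664)  # reported by system_profiler (physical pixels)
POINT_SIZE = (1470, 956)   # lower-right corner per cmd-shift-4 / KM (nominal points)


def scale_factors(pixel_size, point_size):
    """Return the (horizontal, vertical) pixels-per-point ratios."""
    return (pixel_size[0] / point_size[0], pixel_size[1] / point_size[1])


def fraction_to_target(fraction, size):
    """Turn a (0..1, 0..1) screen fraction into a coordinate pair for the given size."""
    return (size[0] * fraction[0], size[1] * fraction[1])


def is_on_screen(target, point_size):
    """True if the target lies inside the screen measured in nominal points."""
    return 0 <= target[0] <= point_size[0] and 0 <= target[1] <= point_size[1]


CLICK_FRACTION = (0.82, 0.93)  # the multipliers used in the AppleScript above

sx, sy = scale_factors(PIXEL_SIZE, POINT_SIZE)
from_pixels = fraction_to_target(CLICK_FRACTION, PIXEL_SIZE)
from_points = fraction_to_target(CLICK_FRACTION, POINT_SIZE)

print(f"pixels per point: {sx:.2f} x {sy:.2f}")
print("target from pixel size:", from_pixels, "on screen:", is_on_screen(from_pixels, POINT_SIZE))
print("target from point size:", from_points, "on screen:", is_on_screen(from_points, POINT_SIZE))
```

On these figures, multiplying the pixel resolution by the fractions gives a target outside the 1470 x 956 point area, which may be part of why the click never landed.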

As for clicking the button, I think you might be able to do it through its XPath, but I'm not an expert on how to do this.

noisneil · July 6, 2024, 3:05pm

Are you using Chat GPT in a browser? If so, my suggestion would be to use MacGPT instead.

Nige_S · July 6, 2024, 3:17pm

I assume that's ChatGPT the bot, and not an actual support person, because...

It's a bit route one, but try using ⌘-Return instead. In Safari, at least, that works whatever the window size.

(And to think that I created a ChatGPT account just to find that out...)

Roy_McCoy · July 6, 2024, 7:57pm

DanThomas:

You probably have a Retina display, which probably has a DPI of 144 instead of the usual 72 (or something like that). In this case, KM uses "nominal" coordinates, which are half of the actual coordinates.

Yes, I have a Retina display and your explanation suffices, thanks.

As for clicking the button, I think you might be able to do it through its XPath, but I'm not an expert on how to do this.

I'll look into this, though I don't presently know anything about XPath. Thanks again.

Roy_McCoy · July 6, 2024, 8:15pm

noisneil:

Are you using Chat GPT in a browser? If so, my suggestion would be to use MacGPT instead.

I looked at it, thanks, but will be happy with doing ChatGPT in a browser if I can get the queries to send without having to manually click on the Send button, and aside from that the last thing I need is yet another thing trying to get a place on my menu bar. (Boy, do I ever despise the notch.) I was nonetheless ready to give MacGPT a spin when I saw "Try out MacGPT Today" on the page you linked to, but it turned out a demo version is not being offered as suggested. I downloaded his GeePeeTee for my phone along with two other GPT apps including OpenAI's, but use my phone very little and may not get much if any use out of these.

Roy_McCoy · July 6, 2024, 8:42pm

Nige_S:

I assume that's ChatGPT the bot, and not an actual support person, because...

Yes, it was the bot. I'd probably say I chatted with OpenAI if I got to a person.

It's a bit route one, but try using ⌘-Return instead. In Safari, at least, that works whatever the window size.

I'll try that right now. [...] It works, thanks! I'll happily go with this, since the reply to my first ChatGPT query about using XPath to click on a button recommended the use of Selenium and a Python shell script to obtain the desired result, and I'm not ready for that.

(And to think that I created a ChatGPT account just to find that out...)

Despite all the bad press ChatGPT has received (along with AI in general), I've wound up very pleased and impressed by it. I've generally asked myself, why bother with a Google search or a Wikipedia article (especially given the reasons to hate both) if I just want an answer to a question? I even subscribed at $20/month, though I may not continue that if I try going back to the free tier and it turns out to be adequate for my needs. Anyway, I suggest you too may wind up liking it more than you expect.

noisneil · July 6, 2024, 8:54pm

FWIW, I don't use it in the menu bar; I use it via the windowed app. It's free as far as I know, unless he's started charging for it, which he explicitly said in his mailouts that he wasn't planning to do.

griffman · July 6, 2024, 10:21pm

Sure looks like he's charging now:

-rob.

AaronLA · July 6, 2024, 10:35pm

Did you try clicking the button using XPath? I just tried inside my ChatGPT account and it worked no problem.

Add a new action "Click Google Chrome Link" and enter this path

/html/body/div[1]/div[1]/div/main/div[1]/div[2]/div[1]/div/form/div/div[2]/div/div/button

See screenshot (screenshot-2024-07-06-at-3-33-46-pm).
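
To illustrate how a positional path like the one above picks out a button, here is a small Python sketch. It runs against a made-up, much simplified page (not ChatGPT's real markup) and uses the limited XPath subset in Python's standard `xml.etree.ElementTree`, so the path is written relative to the `html` root. KM itself evaluates the XPath inside the browser; this only mimics the lookup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page used only for illustration.
PAGE = """
<html><body>
  <div>
    <div>
      <main>
        <form>
          <textarea/>
          <button id="send">Send</button>
        </form>
      </main>
    </div>
  </div>
</body></html>
"""

# Same page with one extra element inserted at the top of <body>.
SHIFTED_PAGE = PAGE.replace("<body>", "<body><div class='banner'/>", 1)

# Positional path, relative to the <html> root element.
SEND_BUTTON_PATH = "body/div[1]/div[1]/main/form/button"


def find_send_button(page_xml):
    """Return the element the positional path resolves to, or None."""
    root = ET.fromstring(page_xml.strip())
    return root.find(SEND_BUTTON_PATH)


button = find_send_button(PAGE)
print("original page:", None if button is None else button.get("id"))

shifted = find_send_button(SHIFTED_PAGE)
print("shifted page:", None if shifted is None else shifted.get("id"))
```

In the shifted page, `div[1]` now refers to the inserted banner, so the same path no longer reaches the button. A path copied from one page layout may therefore need updating if the site's markup changes.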

Roy_McCoy · July 7, 2024, 5:25am

It works in Chrome but not in Brave, and even if it worked in Brave I'd run into other problems involving the use of the return key. So I'll drop this venture and just hit cmd-return when I want to send a ChatGPT query, which was proposed to me and which works. But thanks – I may come back to this XPath thing later.

noisneil · July 7, 2024, 5:59am

Ah.

I have no problem clicking via XPath in Brave with the Click Link in Front Browser action.

Roy_McCoy · July 7, 2024, 6:15am

That works, thanks. But now we're back to the same problem I had when I was trying to click on the button using coordinates. The basic idea was that I wanted to get the return key to send the query when the browser window wasn't sufficiently wide. But then I'm back at wanting the button click only when I'm in a ChatGPT tab. This is what I was going around and around on with the ChatGPT bot. I want to put the click in an If Then Else action so it occurs only under that given circumstance. This seems like something I should be able to do, but I haven't been able to do it.

noisneil · July 7, 2024, 6:24am

Try this at the start of your macro:

DanThomas · July 7, 2024, 12:15pm

The problem with that is it will eat the hotkey.

You should put the macro in its own group, then set the group like this:

noisneil · July 7, 2024, 12:59pm

I use a Stream Deck so much that I sometimes forget about hotkeys getting eaten. It's easily accounted for by simulating the hotkey in the else section, but I agree that it's less elegant than your suggestion.

DanThomas · July 7, 2024, 1:13pm

Unless something's changed, and I don't believe it has, this is not a good idea. You can't guarantee it will work.

Paraphrasing the wiki:

If you try to pass on the hotkey by "typing" it in the macro, what happens "will vary depending on many unpredictable factors though Keyboard Maestro will try to ensure no macro is triggered in response to its own typing."

Many of us old-timers on the forum have considered this one of those "don't try this at home" kinds of things, but YMMV, IANAL, YOLO...

Roy_McCoy · July 7, 2024, 1:45pm

noisneil:

Try this at the start of your macro:

Thanks, but then I don't get the return if it's not a GPT window or tab – as I now see DanThomas has noted.

DanThomas:

You should put the macro in its own group, then set the group like this:

I never noticed these Available in all windows / Enabled when a focused window [...] options, and I wonder how far back in the KM history they go. If all the way, I'm the more embarrassed.

That said, my KM Groups panel is as crowded as my (undesiredly notched, grr-r) menu bar, so I'll put this condition in the macro itself if I can, which I assume is possible even though I think GPT and I were trying this and it wasn't working somehow. (This has been, for a while now, one of those cases where much more time and trouble is put into elaborating a working macro than is saved by using it, but I'm sure everyone here knows the pleasure of finally getting it going whatever it takes. I'm hoping to visit Peter Lewis in heaven, by the way, as I'm sure he's going to have an absolutely fabulous pad there.) [...] But no, the AI chat doesn't include anything about the KM FrontBrowserURL token, so I think that's my salvation.

And it is. Here's my macro. Thanks, everybody!

Brave - Click on ChatGPT Arrow.kmmacros (3.4 KB)

I now see DanThomas has apparently disapproved of this, saying you can't guarantee the simulated key will work, but it's working for me here so I'm not going to worry about it unless at some point it doesn't.

Just one question: GPT affably asked me to send it the macro that worked, and I again suffered having forgotten how to export the macro in text-only format. So I manually typed it thus:

If all of the following are true:
The text:
%FrontBrowserURL%
contains chatgpt.com
execute the following actions
Click Link: /html/body/div[1]/div[1]/div/main/div[1]/div[2]/div[1]/div/form/div/div[2]/div/div/button
otherwise execute the following actions
Simulate keystroke: Return

I know I used to be able to do this, but I can't remember how and it doesn't appear to be provided for in the KM File > Export options. What it may be is that in those cases I just copied the actions in the macro and this worked because they were all simple one-step actions, and it didn't work now because the If Then Else action is more complex. But if there's some way to get the entire text of such an action/macro I'd like to know what it is. Thanks again.
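
For readers who want the branching above spelled out, here is a rough Python sketch of the same decision. It is not KM's own export format: the function name and the action labels are hypothetical, and only the `chatgpt.com` test and the XPath are taken from the macro text.

```python
# XPath copied from the macro text above.
SEND_BUTTON_XPATH = (
    "/html/body/div[1]/div[1]/div/main/div[1]/div[2]/div[1]"
    "/div/form/div/div[2]/div/div/button"
)


def choose_action(front_browser_url: str) -> tuple[str, str]:
    """Mirror the If Then Else: click the Send button on ChatGPT, else pass Return on."""
    # Plain substring test, like the "contains" condition in the macro.
    if "chatgpt.com" in front_browser_url:
        return ("click_link", SEND_BUTTON_XPATH)
    return ("simulate_keystroke", "Return")


for url in ("https://chatgpt.com/c/example", "https://example.com/"):
    print(url, "->", choose_action(url)[0])
```

Because the test is a plain substring match, any URL that merely contains `chatgpt.com` (for example in a query string) would also take the click branch.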

DanThomas · July 7, 2024, 1:57pm

Then feel embarrassed - they've been there as long as I can remember.

I have 184 groups. Get used to it. For macros that are available in only specific apps, I name the group the name of the app, to make it easier to find.

Hence, my recommendation to not do this.

Anytime you try to force KM to do something it doesn't want to do, you're asking for trouble. Seriously, there's no downside to putting it in its own group. What's one more group? (Which is why I have 184 groups.)

@peternlewis, feel free to jump in here, if you've got the time (I know how busy you are.)

Copy as Text.

noisneil · July 7, 2024, 2:09pm

I'm sure you're right, and I can imagine there are circumstances where it might pose problems, but in my personal experience, it's never not worked, or triggered any errant macros. I see it a bit like using found images; it's never my first choice, but other considerations can sometimes lead you to it. I think, all things considered, I would feel the same as @Roy_McCoy and prefer not to create yet another macro group to house a single macro, if the simulated hotkey is working as intended.
