Operator from OpenAI

OpenAI just introduced Operator:

Using its own browser, it can look at a webpage and interact with it by typing, clicking, and scrolling... Operator is one of our first agents, which are AIs capable of doing work for you independently—you give it a task and it will execute it... Operator can “see” (through screenshots) and “interact” (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API integrations.

Up to now, I have been delighted if I can record my actions and play them back (e.g. QuicKeys, CoScripter), or create scripts by hand that automate tasks (e.g. QuicKeys and Keyboard Maestro).
Operator promises to leap ahead and simply perform a task when you give it a description, without even having to demonstrate the task once!
This is a new and amazing world that is coming into existence.

Sounds dangerous – the functional correctness of LLM-generated code is tailing off at barely better than a coin toss. They model syntax, but their suppliers essentially cross their fingers and hope to get lucky, some of the time, on meaning.

"Leap ahead and simply perform" would quickly end in tears.

See the Wolfram LLM Benchmarking Project


PS we've been through this cycle before:

but not before damage had been done, to Apple and to others.


Look before you leap, and always go easy on the Kool-Aid.


Experience has shown that most users can barely articulate what they want to do, never mind describe it well enough that an Operator can take over. So expect a lot of false starts and clarifications -- they'll have to explain a dozen times to avoid "having to demonstrate the task once".

And note that Operator is (currently) limited to web sites/web apps -- you won't be using it to drive your laptop anytime soon.

Not knocking it, nor their ambition -- if it can do half the things claimed it will be pretty amazing. And they've at least considered "Safety and privacy" this time round!

Understandably – questions are built from concepts, and until there's been enough experimentation to acquire the relevant concepts, a clear and relevant question is very hard to frame.

( once the concepts are in place, the questions are often no longer needed ... )


but LLMs form no concepts at all – just the statistics of word distributions.


For a mere $200/month you too can be part of the beta experience.

No thanks. I got pulled into funding the Tesla Autopilot beta years ago, and I'm not falling for this again.


I'm curious: does Operator limit itself to public macOS APIs, or does it also use undocumented APIs?

It uses its own browser and does not operate outside of it, so it will make use of whatever functionality they've built into that browser, and so isn't necessarily dependent on any public APIs.


Unleashing blind parroting on either of those would be far too risky.

( A recipe, in the aftermath, for huge class actions )

A browser is, by design, a sandbox without any access to the host system.

I wanted him to think about it. I'm not worried about you.


🙂 Either route leads to the same place.

A much bigger story for OpenAI's prospects today (2025-01-27) is a big setback to the pitch which they have been making to investors – that more spending leads to better outcomes for AI models.

Nvidia's (related) share-price – which had depended on an assumption of huge future demand for chips, driven by deep learning – went over a slightly alarming cliff.

See: Chinese AI startup DeepSeek is threatening Nvidia's AI dominance | Fortune

( DeepSeek seems to be getting competing LLM results for what looks like 3%-5% of the cost – not good news for OpenAI investors )


But stepping back, I'm not sure how this thread got into Questions and Suggestions in the first place ...

( seems more like a puff for someone else's product )

(Now moved to Outback Lounge. Update: left here, on the grounds that the OP can't see the Outback Lounge section.)


And in the context of a $1 trillion market panic, all bets are off for product rollouts and roadmaps:

DeepSeek buzz puts tech stocks on track for $1.2 trillion drop - The Economic Times

a significant drop in global tech stocks,
raising doubts about the high valuations of AI-driven companies


Well, the Outback Lounge isn't accessible to all so you might want to move it back to where the OP can see it perhaps...


With no disrespect to the OP, this thread is undeniably off-topic in "Questions & Suggestions", and if an alternative topic area is not provided, there are reasons for that. I hope the community leaders will treat this as a "one-off" exception.


If it sounds too good to be true, it probably is.

What seems like magic comes with limitations that are not revealed until one uses it.
You can take a look at this article:

I'm looking forward to the day when Operator can work alongside third-party tools like Keyboard Maestro. That really makes sense, because there is no foolproof way for such automation to cover all cases (e.g. edge cases).


Who needs OpenAI anymore, now "we" have DeepSeek?


Perhaps people who want a less... shall we say "regulated" -- response? See this (free) Guardian article. Such people might also want to review the privacy policy.

The same applies to other services, of course. But, for some reason, certain governments that are quite happy letting (western) corporations (ab)use our data get rather twitchy when China is involved. So I suspect that the answer to "who needs OpenAI anymore" will be "those who aren't allowed to use DeepSeek" -- at least until more "acceptable" concerns start deploying the R1 model themselves.


I wouldn't want to use DeepSeek. All of its political views are approved by the Chinese Communist Party. Congress will probably debate banning it from America for being a "Chinese weapon."

The statement in that last sentence was probably last true 15 years ago.

It’s just a model you can run locally (with good enough hardware), even without access to the internet.