Using its own browser, it can look at a webpage and interact with it by typing, clicking, and scrolling... Operator is one of our first agents, which are AIs capable of doing work for you independently—you give it a task and it will execute it... Operator can “see” (through screenshots) and “interact” (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API integrations.
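The loop described above -- screenshot in, mouse/keyboard action out -- can be sketched roughly like this. This is a toy illustration only, not OpenAI's actual implementation: the vision model is replaced by a hard-coded stub, and the "browser" is a list of fake page descriptions.

```python
# Toy sketch of a screenshot-driven agent loop, in the spirit of the
# Operator description above. NOT OpenAI's implementation: the "model"
# is a hard-coded stub and the "browser" is faked as strings.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", "scroll", or "done"
    target: str = ""   # e.g. a UI element name (hypothetical)
    text: str = ""     # text to type, if kind == "type"

def stub_model(screenshot: str, task: str) -> Action:
    """Stand-in for the vision model: maps what it 'sees' to an action."""
    if "search box" in screenshot:
        return Action("type", target="search box", text=task)
    if "results" in screenshot:
        return Action("done")
    return Action("scroll")

def run_agent(task: str, screens: list[str]) -> list[Action]:
    """Observe (screenshot) -> decide (model) -> act, until done."""
    actions = []
    for screenshot in screens:          # each iteration = one "frame"
        action = stub_model(screenshot, task)
        actions.append(action)
        if action.kind == "done":
            break
    return actions

# Example: the fake page shows a search box, then a results page.
history = run_agent("weather in Paris",
                    ["page with search box", "page with results"])
print([a.kind for a in history])  # -> ['type', 'done']
```

The point of the sketch is the shape of the loop: no custom API integration, just repeated observe-decide-act, which is why it generalizes to arbitrary sites but also why it can fail in ways a scripted macro would not.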
Up to now, I have been delighted if I can record my actions and play them back (e.g. QuicKeys, CoScripter), or create scripts by hand that automate tasks (e.g. QuicKeys and Keyboard Maestro).
Operator promises to leap ahead and simply perform a task when you give it a description, without even having to demonstrate the task once!
This is a new and amazing world that is coming into existence.
Sounds dangerous – the functional correctness of LLM-generated code hovers at barely better than a coin toss. They model syntax, but their suppliers essentially cross their fingers and hope to get lucky, some of the time, on meaning.
"Leap ahead and simply perform" would quickly end in tears.
Experience has shown that most users can barely articulate what they want to do, never mind describe it well enough that an Operator can take over. So expect a lot of false starts and clarifications -- they'll have to explain a dozen times to avoid "having to demonstrate the task once".
And note that Operator is (currently) limited to web sites/web apps -- you won't be using it to drive your laptop anytime soon.
Not knocking it, nor their ambition -- if it can do half the things claimed it will be pretty amazing. And they've at least considered "Safety and privacy" this time round!
Understandably – questions are built from concepts, and until there's been enough experimentation to acquire the relevant concepts, a clear and relevant question is very hard to frame.
(Once the concepts are in place, the questions are often no longer needed...)
But LLMs form no concepts at all – just the statistics of word distributions.
It uses its own browser and does not operate outside of it, so it can make use of whatever functionality OpenAI has built into that browser and isn't necessarily dependent on any public APIs.
A much bigger story for OpenAI's prospects today (2025-01-27) is a big setback to the pitch which they have been making to investors – that more spending leads to better outcomes for AI models.
Nvidia's (related) share price – which had depended on an assumption of huge future demand for chips, driven by deep learning – went over a slightly alarming cliff.
With no disrespect to the OP, this thread is undeniably off-topic in "Questions & Suggestions", and if an alternative topic area is not provided, there are reasons for that. I hope the community leaders will treat this as a "one-off" exception.
What seems like magic comes with limitations that are not revealed until one uses it.
You can take a look at this article:
I'm looking forward to the day when Operator can work alongside third-party tools like Keyboard Maestro. That really makes sense, because there is no foolproof way for such automation to cover all cases (e.g. edge cases).
Perhaps people who want a less... shall we say "regulated"... response? See this (free) Guardian article. Such people might also want to review the privacy policy.
Similar applies to other services, of course. But, for some reason, certain governments that are quite happy letting (western) corporations (ab)use our data get rather twitchy when China is involved. So I suspect that the answer to "who needs OpenAI anymore" will be "those who aren't allowed to use DeepSeek" -- at least until more "acceptable" concerns start deploying the R1 model themselves.
I wouldn't want to use DeepSeek. All of its political views are approved by the Chinese Communist Party. Congress will probably debate banning it from America for being a "Chinese weapon."