AI Breakthrough: Self-Operating Computer Framework Created by OthersideAI

Late Nights Inspire Groundbreaking AI Innovation

Late nights with a newborn can lead to unexpected breakthroughs. Such was the case for OthersideAI developer Josh Bickett, who hit upon the idea for a new “self-operating computer framework” while feeding his daughter in the middle of the night.

“I’ve been really enjoying time with my daughter, who’s four weeks old, and I had a lot of new lessons in fatherhood and all that stuff. But I also had a little bit of time, and this idea kind of came to me because I saw different demos of GPT-4 vision. The thing we’re working on now can actually happen with GPT-4 vision,”

– Josh Bickett

With his daughter cradled in one arm, Bickett sketched out the basic framework on his computer.

“I just found an initial implementation… it’s not super good at clicking the mouse in the right way. But what we’re doing is defining the problem: we need to figure out how to operate a computer.”

– Josh Bickett

When OthersideAI co-founder and CEO Matt Shumer saw the new framework, he recognized its tremendous potential.

“This is a milestone in the road to getting to the equivalent of a self-driving car but for a computer. We have the sensors now. We have the LIDAR systems. Next, we build the intelligence,”

– Matt Shumer

The framework developed by Bickett allows the AI to control both mouse clicks and keyboard input, much as a person would.

“It’s like an agent like autoGPT except it’s not text based. It’s vision based so it takes a screenshot of the computer and then it decides mouse clicks and keyboards, exactly like a person would,” explained Bickett.

Shumer elaborated on how this framework represents a major advance over previous approaches that relied solely on APIs.

“A lot of things that people do on computers, right, you can’t really do with APIs, which is how a lot of other people are approaching this problem, [when] they want to build an agent. They built it on top of the publicly available APIs for this service, but that doesn’t extend to everything,”

– Matt Shumer

The framework takes screenshots as input and outputs mouse clicks and keyboard commands, just as a human would. But the real potential lies in the advanced computer vision and reasoning models that can be plugged into it.
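
In concrete terms, the loop Bickett describes could be sketched roughly as follows: capture the screen, ask a vision model what to do next, then act with the mouse or keyboard. This is a minimal illustration, not OthersideAI's code; `ask_vision_model` is a hypothetical stand-in for whichever multimodal model (such as GPT-4 vision) is plugged in, and `pyautogui` is one common Python library for screenshots and input control.

```python
# Minimal sketch of a vision-driven operating loop (illustrative only).
import pyautogui  # screen capture plus mouse/keyboard control


def ask_vision_model(screenshot, objective):
    """Hypothetical stand-in for a multimodal model call.

    Given the current screen and an objective, it would return an action such as
    {"type": "click", "x": 640, "y": 360}, {"type": "type", "text": "hello"},
    or {"type": "done"}.
    """
    raise NotImplementedError("plug in GPT-4 vision or another vision model here")


def run(objective, max_steps=20):
    for _ in range(max_steps):
        screenshot = pyautogui.screenshot()        # see the screen, like a person
        action = ask_vision_model(screenshot, objective)
        if action["type"] == "click":              # move the mouse and click
            pyautogui.click(action["x"], action["y"])
        elif action["type"] == "type":             # press keys
            pyautogui.write(action["text"], interval=0.05)
        elif action["type"] == "done":             # model reports the task is finished
            break


if __name__ == "__main__":
    run("Open a text editor and write a short note")
```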

“The framework will just be like plug and play, you just plug in a better model and it gets better,”

– Josh Bickett
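
One way to read that “plug and play” idea, purely as an illustration: the rest of the framework only needs a narrow interface (given a screenshot and an objective, return the next action), so a stronger vision model can be swapped in without changing the surrounding loop. The `VisionModel` protocol below is a hypothetical shape, not OthersideAI's actual API.

```python
# Hypothetical "plug and play" seam: the loop only depends on this narrow
# interface, so a better vision model can be dropped in without other changes.
from typing import Protocol

import pyautogui


class VisionModel(Protocol):
    def decide(self, screenshot, objective: str) -> dict:
        """Return the next action, e.g. {"type": "click", "x": 640, "y": 360}."""
        ...


def run(model: VisionModel, objective: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        action = model.decide(pyautogui.screenshot(), objective)
        if action["type"] == "done":       # a stronger model slots in here unchanged
            break
        if action["type"] == "click":
            pyautogui.click(action["x"], action["y"])
        elif action["type"] == "type":
            pyautogui.write(action["text"])
```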