Shell and emacs scripts that facilitate a voice-controlled GPT4 co-worker

pv-pterab-s/emacs-pinky-saver

Emacs Pinky Saver

This repository is a very simple implementation of a voice-driven GPT4 co-worker for emacs. It is not an emacs package, but rather a set of strung-together elisp functions and shell scripts that demonstrate how disturbingly trivial it is to build an emacs AI co-worker. See the blog post for a quick writeup. Notably, recording audio with very low latency was surprisingly simple, as were transcribing speech to text and generating speech with OpenAI.

Most surprising, however, was how easy it was to define an iterative algorithm with a simple initial prompt:

In this conversation, we seek to satisfy the english instruction
`<INSTRUCTION>` by executing bash shell script code. In this conversation, you
will reply with bash code that I will then execute. I will record the STDOUT
and STDERR streams that the code produces and send it back to you. You will
consider the outputs and reply with more bash code (if needed) to continue
satisfying the instruction. We will iterate together in this fashion -
essentially giving you shell access to my computer.

Only reply with bash code. Do not reply with any formatting like backticks. I
need to be able to execute what you send me without reformatting or filtering.

Assume that any ambiguous references in the english instruction always resolve
to one of: a filename, a directory name, a variable name in a file, or a
function name in a file. The context of the instruction is the directory named
`<DIRECTORY>`. Thus, before writing bash code to fulfill the instruction, you
must write bash code to collect enough information to define any ambiguous
references in the english instruction. Always assume ambiguous references
resolve to _something_ - you've just got to collect enough information to
figure it out.

After you have fulfilled the instruction, reply in english by writing bash
code that echoes the word `REPLY` followed by the reply. Avoid multi-line
replies or replies that are very long. I will play back the reply using text
to speech software.
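
The loop this prompt sets up can be sketched in a few lines of bash. This is a minimal illustration, not the code in ui.el: `ask_model`, `run_instruction`, `WORKDIR`, and the canned replies are all hypothetical names invented here, and `ask_model` is a stub standing in for the real GPT4 request so the sketch runs offline.

```shell
#!/usr/bin/env bash
# Sketch of the execute-and-report loop described by the prompt.
# ask_model is a HYPOTHETICAL stand-in for the real model request; it replays
# canned replies so the loop can be exercised without network access.

turn=0
ask_model() {
  # $1 is the text sent to the model (ignored by this stub).
  # Stores the model's next reply in the global variable `code`.
  turn=$((turn + 1))
  case "$turn" in
    1) code='ls "$WORKDIR"' ;;
    *) code='echo REPLY The directory contains notes.txt' ;;
  esac
}

run_instruction() {
  local message="$1" output i max_turns=10
  for ((i = 0; i < max_turns; i++)); do
    ask_model "$message"
    # Run the model's code, capturing STDOUT and STDERR together.
    # A failing command is not fatal: its output is still reported back.
    output="$(bash -c "$code" 2>&1)" || true
    if [[ "$output" == REPLY* ]]; then
      # Final answer: strip the marker; the rest would go to text to speech.
      printf '%s\n' "${output#REPLY }"
      return 0
    fi
    # Otherwise the captured output becomes the next message to the model.
    message="$output"
  done
  echo "gave up after $max_turns turns" >&2
  return 1
}

# Demo with a throwaway directory standing in for <DIRECTORY>.
export WORKDIR="$(mktemp -d)"
touch "$WORKDIR/notes.txt"
final="$(run_instruction "initial prompt with the instruction filled in")"
echo "$final"   # prints: The directory contains notes.txt
rm -rf "$WORKDIR"
```

The turn limit is an addition of this sketch; the prompt itself relies on the model eventually emitting `REPLY`.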

Setup Sketch

The emacs code is a simple elisp script, ui.el, that depends on an installed and configured gptel package. Critically, gptel must be configured to use the gpt-4-1106-preview model, a restricted model that you can access only if you have previously paid at least $1 in OpenAI API charges. This requires forcibly setting the gptel-model variable. We assume that you have a valid OpenAI API key, which you must define in text-to-speech.sh and transcribe_audio.sh as well as in the gptel configuration. Finally (and horribly), the script is hard-coded to run at ~/voice-interface.
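
For orientation, here is a rough sketch of what transcription and speech requests against the OpenAI API can look like from bash. It is not the content of transcribe_audio.sh or text-to-speech.sh: the function names, the `whisper-1` and `tts-1` models, the `alloy` voice, and reading the key from an `OPENAI_API_KEY` environment variable are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: function names, model choices, and the OPENAI_API_KEY
# environment variable are assumptions, not taken from the repository.

json_escape() {
  # Escape backslashes, double quotes, and newlines for use in a JSON string.
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  printf '%s' "$s"
}

speech_payload() {
  # Build the JSON request body for the speech endpoint from the text in $1.
  printf '{"model":"tts-1","voice":"alloy","input":"%s"}' "$(json_escape "$1")"
}

transcribe() {
  # $1: path to a recorded audio file; prints the transcript as plain text.
  curl -sS https://api.openai.com/v1/audio/transcriptions \
    -H "Authorization: Bearer ${OPENAI_API_KEY:?set OPENAI_API_KEY first}" \
    -F model=whisper-1 \
    -F response_format=text \
    -F file=@"$1"
}

speak() {
  # $1: text to speak; $2: path where the returned audio is written.
  curl -sS https://api.openai.com/v1/audio/speech \
    -H "Authorization: Bearer ${OPENAI_API_KEY:?set OPENAI_API_KEY first}" \
    -H "Content-Type: application/json" \
    -d "$(speech_payload "$1")" \
    -o "$2"
}
```

With a key exported, `transcribe request.wav` would print the spoken instruction and `speak "Done." reply.mp3` would write the audio reply. Keeping the key in the environment rather than in the script body is a choice of this sketch.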

Email me or file a bug if there are problems. I will formalize this work if there is interest.

Usage

ui.el defines C-c = as a toggle for voice recording. Load ui.el with eval-buffer (after completing the setup described above). From dired-mode, start recording with the toggle, speak your request, and press the toggle again to stop. Once recording stops, emacs processes your voice recording, queries the AI with the initial prompt above, and starts executing the AI's instructions.

WARNING: The AI can make mistakes, and you might say something horrible by accident like "Man, I hate my hard drive!" I hope you understand the ramifications :D
