API endpoint #14

nathan-skynet · 2024-04-26T11:14:25Z

I'm looking for documentation on using Wyoming Whisper/Piper with a curl request, but I can't find which endpoint to use or what data to send to it.

I even looked in the code, but either I don't understand it or I'm really stupid.

Tomywang999 · 2024-06-28T05:35:50Z

Same here, hope some kind of reference document can be published

GRbit · 2024-07-16T15:43:17Z

Same here.

What can be understood from the README.md is that the body of the request should be encoded as JSONL and the payload should be in PCM audio format. We can also see descriptions of some requests, but no real examples.

I assumed that https://github.com/rhasspy/wyoming-faster-whisper runs over the tcp:// protocol and tried to send a describe request to it. As I understood it, the request should look like this: {"type":"describe"}\n\n. Unfortunately, the server never replied.

I would really appreciate any help with sending an example request to any Wyoming server. It's a bit hard for me to understand how it works reading the code.

GRbit · 2024-07-16T16:50:59Z

I ended up installing Wireshark and capturing packets from HA to piper/whisper.

It is indeed TCP (my code had some errors, I didn't send FIN after write), but the packets look a bit different from what I assumed from the README.md.

For example, the synthesize request is described as

synthesize - request to generate audio from text
- text - text to speak (string, required)
- voice - use a specific voice (optional)
  - name - name of voice (string, optional)
  - language - language of voice (string, optional)
  - speaker - speaker of voice (string, optional)

And from the Format description I would assume that I should set type to "synthesize" and put the text and voice in "data".
But it turned out that they should be on the next line as "Additional data". That's a surprise.

So, I guess for any request you can try putting the fields under the data JSON key or sending them as "Additional data", and one of the two will work. Then try to handle what you get from the server.
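
To make the finding above concrete, here is a minimal Python sketch of what a synthesize request could look like on the wire. It assumes the text and voice fields travel in the "Additional data" section and that the header announces that section's size in a `data_length` key (the key name visible in the captures further down this thread). `build_synthesize` is a hypothetical helper written for illustration, not part of any Wyoming package.

```python
import json
from typing import Optional


def build_synthesize(text: str, voice_name: Optional[str] = None) -> bytes:
    """Return the bytes of a synthesize request: header line + additional data."""
    data = {"text": text}
    if voice_name is not None:
        data["voice"] = {"name": voice_name}
    data_bytes = json.dumps(data).encode("utf-8")
    # Assumption: the header carries only the type and the size of what follows.
    header = {"type": "synthesize", "data_length": len(data_bytes)}
    # Only the header line ends with "\n"; the additional data follows directly
    # and is delimited by its byte count, not by a newline.
    return json.dumps(header).encode("utf-8") + b"\n" + data_bytes
```

Calling `build_synthesize("Hello")` yields a header line with `"data_length": 17` followed by the 17 bytes of `{"text": "Hello"}`. Whether a given server also accepts the fields inline under a `data` key is not something this sketch settles.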

synesthesiam · 2024-07-18T04:11:13Z

Wyoming author here. The first line of the TCP message is JSON with the event type and the size of the "additional data" (JSON) plus the size of the binary payload (PCM audio).

The "additional data" was needed because Python has a limit to how many characters can be on a single line, so large JSON messages would be cut off.
My solution was to add another section for additional data and merge it with the (small) JSON data from the header.
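
The merge step described above can be sketched roughly as follows. This is an illustration only: `merge_event_data` is a hypothetical name, the inline `data` key is taken from the discussion in this thread, and the choice that the additional data wins on a key clash is an assumption.

```python
import json


def merge_event_data(header_line: bytes, additional: bytes = b"") -> dict:
    """Combine the header's inline "data" with the additional-data section."""
    header = json.loads(header_line)
    # Start from the (small) inline data, if the header has any.
    data = dict(header.get("data") or {})
    if additional:
        # Assumption: on a key clash the additional data takes precedence.
        data.update(json.loads(additional))
    return {"type": header["type"], "data": data}
```

A consumer that works with the merged result does not need to care which of the two places a field arrived in.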

GRbit · 2024-07-18T08:12:43Z

Greetings @synesthesiam! Thank you for your work on the protocol and for taking the time to respond to our request.

If you have some more time, I would like to ask one more question. Is there somewhere I can read about the API? I get the idea of a line of JSON, a line of additional JSON, and PCM audio; the real question for me is the spec of the messages. Is there a place where I can see examples of each request? Otherwise I cannot predict whether the data should come in the first line under the data key or in the second line as "additional data". Do messages always come as "additional data"? If so, in which cases is the data key from the first line used?
I have experience with some HTTP APIs where there is a specification like OpenAPI or JSON:API. It would be great to have something similar for Wyoming, even if it's not HTTP but TCP.

Also, I'm very curious about the limitation you mentioned "Python has a limit on how many characters can be on a single line". Can you tell me more about this? I've never heard of this before.

sdetweil · 2024-11-23T21:46:11Z

there is no doc, but the events are documented in the source, here in the wyoming sub folder.

i just added some private events for my app.

i think the limit must be in the server, because everything else in the world can handle big json content

sdetweil · 2024-11-23T21:54:11Z

but the 'api' is read til you get to a lf
examine the content and decide if you need to read more, if so read length binary bytes, no delimiter
and then repeat
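
A minimal Python sketch of that read loop, working on any binary file-like object. `read_events` is a hypothetical helper; `data_length` is the key seen in captures in this thread, while `payload_length` is an assumed name for the size of the binary section.

```python
import io
import json
from typing import BinaryIO, Iterator, Tuple


def read_events(stream: BinaryIO) -> Iterator[Tuple[dict, dict, bytes]]:
    """Yield (header, data, payload) for each event until the stream ends."""
    while True:
        line = stream.readline()  # read up to and including the LF
        if not line:
            return  # end of stream: the peer closed the connection
        header = json.loads(line)
        data = dict(header.get("data") or {})
        data_length = header.get("data_length") or 0
        if data_length:
            # Additional JSON: exactly data_length bytes, no delimiter.
            data.update(json.loads(stream.read(data_length)))
        payload_length = header.get("payload_length") or 0  # assumed key name
        payload = stream.read(payload_length) if payload_length else b""
        yield header, data, payload


# Demo on an in-memory buffer holding two events copied from this thread.
demo = io.BytesIO(
    b'{"type": "hotword", "version": "1.6.0"}\n'
    b'{"type": "command", "version": "1.6.0", "data_length": 41}\n'
    b'{"text": "commercial", "is_final": false}'
)
for header, data, payload in read_events(demo):
    print(header["type"], data, len(payload))
```

For a live server the same function should work on a socket wrapped with `sock.makefile("rb")`, after sending a request such as `b'{"type": "describe"}\n'` with `sock.sendall`; host and port depend on your setup.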

GRbit · 2024-11-25T19:07:18Z

> but the events are documented in the source

@sdetweil could you be so kind as to show me a specific place where this is? I think it would be immensely helpful if such a pointer were somewhere in the README. Looking at the code I can see the event classes, and they have class fields that are described in the README, yet it's not clear how to come up with working JSON from them.

> but the 'api' is read til you get to a lf
> examine the content and decide if you need to read more, if so read length binary bytes, no delimiter
> and then repeat

Thank you, that is basically how it is and that is what I found with Wireshark. At the same time it doesn't help to understand the structure of the messages. I think what I'm asking for could be called an "API specification". Currently, the only way to understand it is to read the code or reverse engineer the server-client connection, and both options are quite time consuming. Having an example for each event type would make the project and protocol much more accessible and easier to work with for new developers.

sdetweil · 2024-11-25T19:33:15Z

ok, you found the wyoming folder here which has the types
it's pretty simple: json (all keys and strings are in double quotes, i didn't use them here)

{ type: .... , event-type-property:value...,length:number}

and if it is an event with big data, like AudioChunk,
the binary data comes immediately AFTER the lf, with
the length specified IN the event json

event-type-property:
text and language for Transcribe, text for Transcript (although i think it needs another, for whether it is a partial or final result)
data for AudioChunk
AudioStart has parms that tell the format
etc

the readme here in the wyoming project lists the event specific properties. those are the event object keys (in js)
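
As a rough illustration of an event that carries binary data, here is a Python sketch that frames one chunk of PCM audio. Everything specific in it is an assumption to check against the README and source: the type string `audio-chunk`, the `rate`/`width`/`channels` fields, the `payload_length` key, and the choice to send the fields as additional data rather than inline. `build_audio_chunk` is a hypothetical helper.

```python
import json


def build_audio_chunk(pcm: bytes, rate: int = 16000, width: int = 2, channels: int = 1) -> bytes:
    """Frame raw PCM bytes as: header line, additional JSON, binary payload."""
    # Assumed field names describing the audio format of this chunk.
    data_bytes = json.dumps({"rate": rate, "width": width, "channels": channels}).encode("utf-8")
    header = {
        "type": "audio-chunk",            # assumed event type string
        "data_length": len(data_bytes),   # size of the additional JSON
        "payload_length": len(pcm),       # assumed key: size of the binary part
    }
    # The binary data starts right after the additional JSON, with no delimiter.
    return json.dumps(header).encode("utf-8") + b"\n" + data_bytes + pcm
```

The receiver never scans the binary section for a newline; it relies only on the two lengths in the header.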

sdetweil · 2024-11-25T19:42:51Z

my app needs notification of hot word detection (it changes the ui to indicate listening) and then the command string (transcript). i can use the not detected event too.

i connect a js app to the open tcp socket and read the events my manager service sends .
the messages are in json, so i can
parse them and process by the type

sdetweil · 2024-11-26T14:38:51Z

here is the output of response from describe and from my custom, Hotword and Command events
at my js app reading from the tcp uri of my manager service (the describe response is just placeholder, as this is not one of the service types supported by Wyoming at the moment.)

response from Describe() request

{"type": "info", "version": "1.6.0", "data_length": 514}

{"asr": [{"name": "google-streaming", "attribution": {"name": "Sam Detweiler", "url": "https://github.com/sdetweil/wyominggoogle"}, "installed": true, "description": "google cloud streaming asr", "version": "1.0.0", "models": [{"name": "google-streaming", "attribution": {"name": "rhasspy", "url": "https://github.com/rhasspy/models/"}, "installed": true, "description": "google cloud streaming asr", "version": "1.0.0", "languages": ""}]}], "tts": [], "handle": [], "intent": [], "wake": [], "mic": [], "snd": []}

my manager sends unsolicited events to inform the reco process, and the is_final tells me if reco is completed or still in progress
see #33
I have a terrible cold and have lost my voice.. so I get unuseful reco a lot

{"type": "hotword", "version": "1.6.0"}

{"type": "command", "version": "1.6.0", "data_length": 41}
{"text": "commercial", "is_final": false}
{"type": "command", "version": "1.6.0", "data_length": 41}
{"text": "commercial", "is_final": false}
{"type": "command", "version": "1.6.0", "data_length": 41}
{"text": "commercial", "is_final": false}
{"type": "command", "version": "1.6.0", "data_length": 45}
{"text": "commercial man", "is_final": false}
{"type": "command", "version": "1.6.0", "data_length": 45}
{"text": "commercial map", "is_final": false}
{"type": "command", "version": "1.6.0", "data_length": 45}
{"text": "commercial map", "is_final": false}
{"type": "command", "version": "1.6.0", "data_length": 45}
{"text": "commercial map", "is_final": false}
{"type": "command", "version": "1.6.0", "data_length": 45}
{"text": "commercial map", "is_final": false}
{"type": "command", "version": "1.6.0", "data_length": 45}
{"text": "commercial map", "is_final": false}
{"type": "command", "version": "1.6.0", "data_length": 45}
{"text": "commercial map", "is_final": false}
{"type": "command", "version": "1.6.0", "data_length": 44}
{"text": "commercial map", "is_final": true}
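
A small Python sketch of how a client might consume the stream above: collapse repeated partial results and pick out the final text. Note that `command` and `is_final` are custom events and fields from this particular manager service, not standard Wyoming events, and `split_partials` is a hypothetical helper that takes the already-parsed additional-data dicts.

```python
from typing import Iterable, List, Optional, Tuple


def split_partials(events: Iterable[dict]) -> Tuple[List[str], Optional[str]]:
    """Return (distinct partial texts in order, final text or None)."""
    partials: List[str] = []
    final: Optional[str] = None
    for data in events:
        text = data.get("text", "")
        if data.get("is_final"):
            final = text
            break  # treat the first final event as the end of the utterance
        if not partials or partials[-1] != text:
            partials.append(text)  # collapse consecutive repeats
    return partials, final
```

Fed the eleven command events shown above, this gives the partials "commercial", "commercial man", "commercial map" and the final text "commercial map".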

skewballfox · 2024-12-25T17:46:18Z

Something worth tracking upstream is that OpenAPI is planning to add support for streaming JSON formats in 3.2.

Mentioning this since you're using OpenAPI as a spec in wyoming/http/conf; I think once that's implemented, users should be able to use tools for generation/validation.

skewballfox mentioned this issue Dec 25, 2024

Consider defining protocol in an IDL #19

Closed

Fizzixnerd mentioned this issue Jan 5, 2025

Pitch for a Successor Protocol #37

Open
