Question 1

What does a full voice interaction stack include?

Accepted Answer

A complete voice interaction stack includes ASR (speech recognition), NLU and dialogue management, and TTS (speech synthesis), plus an acoustic front-end (echo cancellation, noise suppression, beamforming). VoxEdge AI delivers the full pipeline on-device, with end-to-end latency under 300 ms.

Question 2

Which edge hardware do you support?

Accepted Answer

We support NVIDIA Jetson (Orin/Nano), Qualcomm Robotics (RB5), Rockchip (RK3588/RV1126), Google Edge TPU and custom NPUs, with deployment via TensorRT, RKNN, SNPE, TFLite and ONNX Runtime.
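
As a rough illustration of how the platforms above line up with the listed runtimes, here is a minimal Python sketch. The platform keys and the `pick_runtime` helper are hypothetical names made up for this example. The answer does not say which runtime goes with custom NPUs, so that pairing is an assumption.

```python
# Hypothetical lookup table: which inference runtime is typically used
# on which edge platform. Keys are illustrative labels, not product names.
RUNTIME_BY_PLATFORM = {
    "jetson": "TensorRT",          # NVIDIA Jetson (Orin/Nano)
    "qualcomm-rb5": "SNPE",        # Qualcomm Robotics RB5
    "rockchip": "RKNN",            # Rockchip RK3588/RV1126
    "edge-tpu": "TFLite",          # Google Edge TPU
    "custom-npu": "ONNX Runtime",  # assumption: not stated in the answer
}


def pick_runtime(platform: str) -> str:
    """Return the runtime name for a platform label, ignoring case and spaces."""
    key = platform.strip().lower()
    if key not in RUNTIME_BY_PLATFORM:
        known = ", ".join(sorted(RUNTIME_BY_PLATFORM))
        raise ValueError(f"unknown platform {platform!r}; expected one of: {known}")
    return RUNTIME_BY_PLATFORM[key]
```

For example, `pick_runtime("Jetson")` looks up the `jetson` entry, and an unrecognised label raises a `ValueError` that lists the known keys.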

Question 3

How do you pick a microphone array?

Accepted Answer

A 2-mic linear array suits near-field use (within 1 m), a 4-mic circular array suits mid-range (3-5 m), and 6 or more mics suit far-field (beyond 5 m). The choice also depends on SNR, cost and mechanical space. We provide 2- to 8-mic array solutions.
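
The distance guideline above can be sketched as a small Python helper. The function name `suggest_mic_array` is hypothetical. The answer does not cover the 1-3 m range or say which side exactly 5 m falls on, so treating both as mid-range is an assumption of this sketch. It also ignores SNR, cost and mechanical space, which the answer says matter too.

```python
def suggest_mic_array(distance_m: float) -> str:
    """Suggest a microphone array from the expected talker distance in metres."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    if distance_m <= 1.0:
        return "2-mic linear"      # near-field, within 1 m
    if distance_m <= 5.0:
        # Assumption: the 1-3 m gap and exactly 5 m are treated as mid-range.
        return "4-mic circular"    # mid-range, 3-5 m in the answer
    return "6+ mics"               # far-field, beyond 5 m
```

For example, `suggest_mic_array(0.5)` falls in the near-field branch and `suggest_mic_array(8)` in the far-field branch. Treat the result as a starting point for the assessment, not a final choice.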

Question 4

How do we start?

Accepted Answer

Send us details of your robot, target silicon and requirements via the contact form. We'll reply with a tailored acoustic and voice assessment, usually within two business days.

VoxEdge AI — Edge Voice AI for Robots

Voice AI development & deployment for robots
