Voice

The Voice service is a REST API that accepts audio files and returns the result of the voice recognition process. It exposes two operations: one to enroll a new voice and one to authenticate a voice.

The product provides speaker verification and voice liveness detection. It relies on a voice template, a string that contains the voice biometric information and that can later be used to authenticate a voice.

Supported operations are:

Enrollment

This endpoint enrolls a new voice. It receives one or more audio files and returns a voice template that can later be used for authentication. The audios must be encoded in base64 and may optionally be encrypted. The returned template is always encrypted and encoded in base64. The number of audios determines the enrollment type:

  • 1 audio for text-independent enrollment.
  • 3 to 5 audios for text-dependent enrollment.
Field     Description
audios    Array of strings. Each element is a raw audio buffer encoded in base64 (RFC 4648). It accepts 1 audio, or 3 to 5 audios.
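
The audio-count rule and the base64 encoding can be sketched in Python as follows. This is a minimal illustration, not part of the service: the helper name `build_enrollment_payload` is hypothetical, and it assumes each audio is already available as raw bytes and that only the "1 audio, or 3 to 5 audios" counts are valid.

```python
import base64
import json


def build_enrollment_payload(audio_buffers):
    """Return the JSON body for the enrollment endpoint from raw audio bytes."""
    count = len(audio_buffers)
    # Assumption taken from the description: 1 audio (text-independent)
    # or 3 to 5 audios (text-dependent). Other counts are rejected here.
    if count != 1 and not 3 <= count <= 5:
        raise ValueError(f"expected 1 or 3 to 5 audios, got {count}")
    # Standard base64 (RFC 4648) of each raw buffer, as ASCII text.
    encoded = [base64.b64encode(buf).decode("ascii") for buf in audio_buffers]
    return json.dumps({"audios": encoded})
```

Validating the count on the client side avoids a round trip for a request that the description suggests would not be accepted.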

Supported audio formats

  • WAV
  • MP3
  • Opus/OGG
  • AAC
  • WMA
  • PCM ulaw and mulaw
  • FLAC
  • ALAC (mov)
  • MP4
  • AIFF

Example request:

curl --location '{IDENTITY_API_BASE_URL}/voice/enrollment' \
--header 'x-api-key: {IDENTITY_API_APIKEY}' \
--header 'Content-Type: application/json' \
--data '{
  "audios": ["JVBERi0xLjQKJeLjz9MKNSAwIG9iago8P..."]
}'
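
The same request can be prepared from Python using only the standard library. This is a sketch under assumptions: the function name `make_enrollment_request` is hypothetical, and `base_url` and `api_key` stand in for the `{IDENTITY_API_BASE_URL}` and `{IDENTITY_API_APIKEY}` placeholders of the curl example. The function only builds the request; sending it is left to the caller.

```python
import json
import urllib.request


def make_enrollment_request(base_url, api_key, audios_b64):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"audios": audios_b64}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/voice/enrollment",
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```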

Example response:

200 OK

{
  "serviceResultCode": 200,
  "serviceResultLog": "Service executed ok",
  "timestamp": "2024-07-12T09:43:36Z",
  "serviceTransactionId": "99999999-9999-9999-9999-999999999999",
  "serviceResult": {
    "operation_result": 3,
    "template": "BgEBAQIvimhg/Th98mTNID4BPHKsJsf...",
    "template_type": "text-dependent",
    "validate_audios_result": [
      {
        "audio_position": 0,
        "matching_score": 0.9999997019767761,
        "multiple_speakers_score_detected": -3.4028234663852886e+38,
        "result_code": 3,
        "snr_db_detected": 18.781143188476562,
        "speech_length_ms_detected": 4200,
        "speech_relative_length_detected": 0.65625
      },
      {
        "audio_position": 1,
        "matching_score": 1,
        "multiple_speakers_score_detected": -3.4028234663852886e+38,
        "result_code": 3,
        "snr_db_detected": 17.34685707092285,
        "speech_length_ms_detected": 5000,
        "speech_relative_length_detected": 0.6868131756782532
      },
      {
        "audio_position": 2,
        "matching_score": 1,
        "multiple_speakers_score_detected": -3.4028234663852886e+38,
        "result_code": 3,
        "snr_db_detected": 17.34685707092285,
        "speech_length_ms_detected": 5000,
        "speech_relative_length_detected": 0.6868131756782532
      }
    ]
  },
  "serviceTime": "638"
}
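
A response like the one above could be reduced to the fields a client usually needs with a small helper. This is a hedged sketch: the name `summarize_enrollment` is hypothetical, treating any `serviceResultCode` other than 200 as a failure is an assumption, and the numeric values of `operation_result` and `result_code` are not interpreted because their meaning is not described here.

```python
import json


def summarize_enrollment(response_text):
    """Extract the template and per-audio quality fields from a response."""
    response = json.loads(response_text)
    # Assumption: anything other than 200 is treated as a failed call.
    if response.get("serviceResultCode") != 200:
        raise RuntimeError(response.get("serviceResultLog", "unknown error"))
    result = response["serviceResult"]
    audios = [
        {
            "position": item["audio_position"],
            "snr_db": item["snr_db_detected"],
            "speech_ms": item["speech_length_ms_detected"],
        }
        for item in result.get("validate_audios_result", [])
    ]
    return {
        "template": result["template"],
        "template_type": result["template_type"],
        "audios": audios,
    }
```

The returned `template` string is the value to store and send later to the authentication endpoint.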

Authentication

This endpoint authenticates a voice. It receives an audio file and a voice template, and returns a boolean value (match) that indicates whether the voice belongs to the same person as the one in the voice template, together with a probability (matching_score) that indicates the similarity between the two voices. The audio must be encoded in base64 and may optionally be encrypted. The voice template must be encrypted and encoded in base64.

Field       Description
audio       Raw audio buffer encoded in base64 (RFC 4648).
template    Biometric template buffer, obtained from Enrollment, encrypted and encoded in base64 (RFC 4648).
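
A minimal Python sketch of the request body for this endpoint follows. The helper name `build_authentication_payload` is hypothetical; it assumes the audio is available as raw bytes and that the template is the string previously returned by enrollment.

```python
import base64
import json


def build_authentication_payload(audio_bytes, template_b64):
    """Return the JSON body for the authentication endpoint."""
    # Only the audio is encoded here. The template comes back from
    # enrollment already encrypted and base64-encoded, so it is passed as-is.
    audio_b64 = base64.b64encode(audio_bytes).decode("ascii")
    return json.dumps({"audio": audio_b64, "template": template_b64})
```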

Example request:

curl --location '{IDENTITY_API_BASE_URL}/voice/authentication' \
--header 'x-api-key: {IDENTITY_API_APIKEY}' \
--header 'Content-Type: application/json' \
--data '{
  "audio": "JVBERi0xLjQKJeLjz9MKNSAwIG9iago8P...",
  "template": "BgEBAQI+d368i49ITeoPlmCi5zbYp3kdvTsk6otTOl...."
}'

Example response:

200 OK

{
  "serviceResultCode": 200,
  "serviceResultLog": "Service executed ok",
  "timestamp": "2024-07-13T19:43:36Z",
  "serviceTransactionId": "99999999-9999-9999-9999-999999999999",
  "serviceResult": {
    "liveness_score": 0,
    "match": true,
    "matching_score": 1,
    "operation_result": 3,
    "tracking_message": "",
    "tracking_status": -1
  },
  "serviceTime": "1708"
}
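
A client could turn this response into an accept or reject decision along these lines. The function name `is_same_speaker` and the optional `min_score` threshold are hypothetical client-side additions, not service parameters; the sketch relies only on the `match` and `matching_score` fields shown in the example.

```python
import json


def is_same_speaker(response_text, min_score=None):
    """Return True when the service reports a match for the given audio."""
    response = json.loads(response_text)
    result = response["serviceResult"]
    matched = bool(result["match"])
    # min_score is a hypothetical extra check applied by the client on top
    # of the service's own decision; leave it as None to rely on "match".
    if min_score is not None:
        matched = matched and result["matching_score"] >= min_score
    return matched
```

For example, `is_same_speaker(body, min_score=0.9)` would additionally require a similarity of at least 0.9, a value chosen here purely for illustration.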