Name: New Generation of Speaker ID and Language ID

Text:

NEW GENERATION OF
SPEAKER ID & LANGUAGE ID

AGENDA
1. About
2. SID & LID Technology
3. Usage
4. Integration

SPEECH TECHNOLOGIES
Input: audio (speech)

• Speaker/Voice Recognition: Who speaks? (John Doe)
• Gender Recognition: What gender?
• Language Recognition: What language? (English/Arabic?)
• Speech Recognition: What was said? ("I will come at 5!")
• Keyword hunting: keyword spotted

www.phonexia.com

COOPERATION

We offer:
• Consultations
• Speech technologies:
system design / technology / adaptation & optimisation
/ delivery / technical support / maintenance
• Custom research and development

PHONEXIA
Goal:
Help clients automatically extract the maximum of valuable
information from spoken speech.
• 2006: founded as a spin-off of
Brno University of Technology (BUT)
• 6+ years of work for the security/defense sector
• 15+ years of experience in speech processing
• Headquarters in the Czech Republic,
active worldwide
• Tight collaboration with BUT
- 20 researchers move the technology forward
- 15 people for development, sales, marketing
and technical support


CLIENTS AND PARTNERS
Customers and partners:
• Security and defense agencies
• Call centers
• IT integrators
• HW suppliers
• TV and radio stations (audio archive indexing)
• Universities
• Customers on several continents:
USA, Russia, UK, Germany, France, Czech Republic, Poland,
Slovakia, India, Spain, Israel, Mexico, ...

[Partner logos, including the Ministry of Defence (Ministerstvo obrany) and OptimSys]

TECHNOLOGY VIEW

LANGUAGE RECOGNITION
• Crime is caused by some minorities very often
• Call record forwarding based on language/dialect
(to operator / other technologies / archive ...)
• Language validation / analysis of the audio archive

Technology:
• 50+ languages; the customer can add new ones (20h)
• User-based calibration
• iVector-based technology, discriminative training,
acoustic channel independent

SPEAKER RECOGNITION
• Speaker verification / search / spotting,
clustering, link/pattern analysis
• Acoustic channel and language independent
technology, robust to noise and channel distortion
• Based on the iVectors algorithm
• Voiceprint extraction and comparison
• Compact voiceprint representation (<1 kB)
• Millions of comparisons in a fraction of a second
• Diarization (speaker segmentation)
• User-based system training,
user-based calibration

HOW DOES IT WORK?

1. Audio
2. Pre-processing, segmentation
3. Extraction: voiceprint VP 0
4. Comparison: VP 0 against the stored voiceprints VP1 ... VP6 (files or SQL DB)
5. Scoring, post-processing: %VP1 ... %VP6
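
The comparison step can be illustrated with a minimal sketch. The deck does not say how two voiceprints are scored against each other, so cosine similarity is used here purely as a stand-in; the function names, the toy three-dimensional vectors and the database contents are all hypothetical.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_voiceprints(query, database):
    """Score the query against every stored voiceprint, best match first."""
    scores = [(name, cosine_score(query, vp)) for name, vp in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy "voiceprints" (assumption: real ones have many more dimensions).
database = {
    "VP1": [0.9, 0.1, 0.0],
    "VP2": [0.0, 1.0, 0.0],
    "VP3": [0.1, 0.2, 0.9],
}
vp0 = [1.0, 0.0, 0.0]  # voiceprint extracted from the incoming audio

ranking = rank_voiceprints(vp0, database)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because each comparison is only a handful of multiplications on a compact vector, a scan over a large database stays cheap, which is consistent with the "millions of comparisons" claim on the previous slide.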

SEVERAL SPEAKERS

1. Audio
2. Pre-processing + diarization (Wav1, Wav2, Wav3)
3. Extraction: one voiceprint per speaker (VP 0.1, VP 0.2, VP 0.3)
4. Comparison against the stored voiceprints VP1 ... VP6 (files or SQL DB)
5. Scoring, post-processing: %VP1 ... %VP6

EXAMPLE

Audio record: 15 sec
DB of 50,000 voiceprints of the suspects
- time to extract one voiceprint: 0.25 sec
- time to compare the voiceprints: 0.0625 sec

TOTAL time:
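
The total did not survive in the slide text. As a rough worked example, the arithmetic below assumes that the 0.0625 sec comparison figure covers the whole 50,000-voiceprint database and that extraction and comparison run one after the other; both are assumptions, not statements from the deck.

```python
# Figures taken from the slide.
audio_seconds = 15
database_size = 50_000
extract_seconds = 0.25
compare_seconds = 0.0625  # assumed to cover the full database scan

# Assumption: the two steps run sequentially, so their times add up.
total_seconds = extract_seconds + compare_seconds
comparisons_per_second = database_size / compare_seconds
times_faster_than_real_time = audio_seconds / total_seconds

print(f"total: {total_seconds} sec")
print(f"comparisons per second: {comparisons_per_second:.0f}")
print(f"faster than real time: {times_faster_than_real_time:.0f}x")
```

Under those assumptions the search takes 0.3125 sec in total, which corresponds to 800,000 comparisons per second.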

WHAT FOR?

INTEGRATION

INTEGRATION - CMDLINE
• The command-line version (Windows or Linux, 32/64-bit, ...).
• Good for automatic processing.
• Using scripting languages
(bat/shell script, Perl, Python ...)
• Possible inputs: one audio file / list of files / folder

vpextract -c settings\extract.bs -v -d example -e wav -h example\vprints

vpcompare -c settings\compare.bs -1 -o score.txt list1.txt list2.txt
INTEGRATION - PSIP
Phonexia Speech Intelligence Platform (PSIP)
Configuration example (values recoverable from the slide):
sid, list, list1.txt, list2.txt, score.sco

Output example (fragments recoverable from the slide, inside a <sid> element):
<type>speaker id</type>
<speechlength>3.840</speechlength>
John
<best_score>94.021</best_score>
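
Because PSIP reports results as XML, a consumer can read them with any XML parser. The sketch below uses a hypothetical document: the element names `sid`, `type`, `speechlength` and `best_score` and the values come from the slide fragments, while the nesting and the `speaker` element are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical output; the real PSIP layout may differ.
SAMPLE_OUTPUT = """
<sid>
  <type>speaker id</type>
  <speechlength>3.840</speechlength>
  <speaker>John</speaker>
  <best_score>94.021</best_score>
</sid>
"""


def parse_sid_result(xml_text):
    """Pull the fields of interest out of one speaker-ID result."""
    root = ET.fromstring(xml_text.strip())
    return {
        "type": root.findtext("type"),
        "speech_length": float(root.findtext("speechlength")),
        "speaker": root.findtext("speaker"),
        "best_score": float(root.findtext("best_score")),
    }


result = parse_sid_result(SAMPLE_OUTPUT)
print(result)
```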

VALUE OF COMBINATION
Call processing chain:
1. Call
2. Quality control and segmentation
3. Language identification
• English: speech transcription, giving a speech transcript
• Russian: keyword spotting, giving detected keywords
• Arabic: keyword spotting, giving detected keywords
• Unknown language
• Garbage
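
The routing idea on this slide can be sketched as a small dispatch table. This is an illustration only: the function name, the `is_garbage` flag and the returned labels are hypothetical, and in a real deployment each label would stand for a call into the corresponding technology.

```python
# Which follow-up technology handles which detected language (from the slide).
ROUTES = {
    "English": "speech transcription",
    "Russian": "keyword spotting",
    "Arabic": "keyword spotting",
}


def route_call(language, is_garbage=False):
    """Pick the next processing step for a call.

    is_garbage stands for a failed quality check (assumption: such calls
    are dropped before language-specific processing).
    """
    if is_garbage:
        return "garbage"
    # Languages without a dedicated route fall into the unknown bucket.
    return ROUTES.get(language, "unknown language")


for lang in ["English", "Russian", "Arabic", "Czech"]:
    print(lang, "->", route_call(lang))
```

Keeping the mapping in data rather than in branching code means a new language only needs a new table entry.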

SPEECH INTELLIGENCE
PLATFORM
[Diagram: each technology in the platform delivers its results as XML output]

INTEGRATION - API

SDK - API (C++, Java, C#), documentation, examples,
instructions for other platforms (.NET, Delphi, ...)

#include "bsapi.h"

// Create a language identification (LID) instance.
SLIDI *plid = static_cast<SLIDI *>(BSAPICreateInstance(SIID_LID));

// Load the configuration and activate every available language model.
plid->Init("default.cfg");
plid->SetModelDirectory("languages");
plid->ActivateAllModels();

// Score one recording against the active models.
plid->TestFile("file.wav");
int num;
char **pplanguages = plid->GetModelNames(&num);
float *pscores = plid->GetModelScores();

// Free the instance when done.
plid->Release();

Documentation:
www.phonexia.com/download
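
The API example ends with two parallel arrays, model names and model scores. A minimal sketch of turning them into a decision follows, assuming the arrays are index-aligned and that a higher score means a better match; the deck states neither, and the sample names and values are made up.

```python
def best_language(names, scores):
    """Return the (name, score) pair with the highest score."""
    if not names or len(names) != len(scores):
        raise ValueError("names and scores must be non-empty and equally long")
    # Index of the largest score; ties resolve to the first occurrence.
    index = max(range(len(scores)), key=scores.__getitem__)
    return names[index], scores[index]


# Hypothetical values standing in for GetModelNames / GetModelScores results.
names = ["English", "Russian", "Arabic"]
scores = [0.2, 1.7, -0.4]

print(best_language(names, scores))
```

In practice a threshold on the winning score would usually be added so that recordings in none of the modelled languages can be reported as unknown.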

FOR EVALUATION
www.phonexia.com/download
• SW for Windows/Linux in GUI and command-line versions.
• The free license file is sent to the specified email.

[Screenshot: Phonexia Speaker Identification GUI listing test recordings (.wav files) ranked by score, with a gender column]

HOW CAN WE HELP?
Phonexia is a technology company.

We can offer:
• Consultations
• Speech technologies and solutions
• Custom development
• Research

Get the most from speech records

Radim Kudla
BDM
+420 732 100 775
kudla@phonexia.com

Phonexia s.r.o.
+420 511 205 265
info@phonexia.com

Try it!
www.phonexia.com/download
www.superlectures.com (BUT)

Document Path: ["1303-phonexia-presentation-new-generation-of.pdf"]
