1 CLEF-2007 Cross-Language Speech Retrieval Track Overview
CU: Pavel Pecina, Jan Hajic, Petra Hoffmannova
DCU: Gareth Jones, Ying Zhang
UMD: Doug Oard, Dagobert Soergel, Scott Olsson
IBM: Bhuvana Ramabhadran
JHU: Bill Byrne (Cambridge), Zak Shafran (OHSU)
USC: Sam Gustman
UWB: Pavel Ircing

2 Speech “Retrieval” Evaluations
- 1996-1998: TREC SDR
  - English broadcast news; English queries
- 1997-2004: TDT
  - multilingual news; query by example
- 2003-2004: CLEF CL-SDR
  - English broadcast news; multilingual queries
- 2005-2007: CLEF CL-SR
  - English/Czech interviews; multilingual queries
- 2007: CLEF QAST
  - English lectures/meetings; English questions

3 What’s New in 2007?
- Czech
  - Fixed “quickstart” time alignment problem!
  - 29 training topics, 42 new evaluation topics
  - 3 new teams (Brown, Chicago, Charles U)
- English
  - 17% relative improvement over 2006 (TD, ASR)
  - 4 new teams (Brown, Chicago, Jaen, Amsterdam)
  - Same topics and ASR as 2006
  - 63 training topics, 33 evaluation topics

4 CLEF‒2007 Cross-Language Speech Retrieval Track Overview

5 English ASR
- Training: 200 hours from 800 speakers
[Chart: ASR2003A, ASR2004A, ASR2006A]

6 VHF00009-056154.003
Sidonia L... | 1927 | Shaindl | L... | Sydzia Issac L..., Cyla L...
Shabbat | Jewish identity | customs and observances, Jewish | Przemysl (Poland) | food | Poland 1918 (November 11) - 1939 (August 31)
SL recounts her daily activities. She notes her family's Jewish identity and she talks about a typical Shabbat. SL describes cholent.
december when you talk about what your daily life was like well after realized i was a little girl going to public school during the day after works there were all the lessons and activities in the winter either an awful lot of ice skating and i was a very good ice skater figure i scared that then i would play with friends on the weekends there are things some of =em now are were my parents were religious that meant that on saturday i couldn't do very much i they went to the services i was hanging on outside was uh the kids making all the noise but afterwards we had to go home any …
[The same ASR excerpt appears four times on the slide.]
cultural and social activities | customs and observances, Jewish | family life | food | Shabbat | sports and games | education | family homes | grandmothers | education, Jewish | Jewish-gentile relations | schools | synagogues | Polish (language) | working life | photographs (stills) 1930s | Poland 1918 (November 11) - 1939 (August 31) | Poland 1935 (May 13) - 1939 (August 31) | Cracow (Poland) | Germany 1918 (November 11) - 1939 (August 31)
Poland 1918 (November 11) - 1939 (August 31) | customs and observances, Jewish | education | cultural and social activities | extended family members | education, Jewish | family life | Jewish-gentile relations | Jewish identity | Hungary 1918 (November 11) - 1939 (August 31) | Shabbat | sports and games | Budapest (Hungary) | Poland 1941 (June 21) - 1944 (July 21) | synagogue attendance | Hungary 1939 (September 1) - 1944 (March 18) | food in the ghettos | forced labor in the ghettos | fate of loved ones | food

7 An English Topic
Number: 1148
Title: Jewish resistance in Europe
Description: Provide testimonies or describe actions of Jewish resistance in Europe before and during the war.
Narrative: The relevant material should describe actions of only- or mostly Jewish resistance in Europe. Both individual and group-based actions are relevant. Type of actions may include survival (fleeing, hiding, saving children), testifying (alerting the outside world, writing, hiding testimonies), fighting (partisans, uprising, political security). Information about undifferentiated resistance groups is not relevant.

8 Automatic English TD Runs
[Chart: results by site: Ottawa, DCU, Brown, Chicago, Amsterdam]

9 Automatic English TD Runs
Run ID          MAP     Lang  Query  Document Fields   Site
uoEnTDtQExF1    0.0855  EN    TD     AK1,AK2,ASR04     UO
uoEnTDtQExF2    0.0841  EN    TD     AK1,AK2,ASR04     UO
dcuEnTDauto     0.0787  EN    TD     AK1,AK2,ASR06B    DCU
brown.TD.auto   0.0785  EN    TD     AK1,AK2,ASR06B    BLLIP
UCkwENTD        0.0571  EN    TD     AK1,AK2,ASR06B    UC
UCbaseENTD1     0.0512  EN    TD     ASR06B            UC
UvA_2_en4g      0.0444  EN    TD     AK2,ASR06B        UVA
UvA_1_base      0.0430  EN    TD     ASR06B            UVA
AK1 = AUTOKEYWORD2004A1, AK2 = AUTOKEYWORD2004A2, ASR03 = ASRTEXT2003A, ASR04 = ASRTEXT2004A, ASR06A = ASRTEXT2006A, and ASR06B = ASRTEXT2006B.

10 Wilcoxon Signed-Rank Test

UO ‒ DCU
The number of nonzero tests is --> 33
The sum of the signed rank is --> 86.000000
The 95.0% level of confidence is --> 184.129807
Methods cannot be separated.

UO ‒ BLLIP
The number of nonzero tests is --> 33
The sum of the signed rank is --> 147.000000
The 95.0% level of confidence is --> 184.129807
Methods cannot be separated.

DCU ‒ BLLIP
The number of nonzero tests is --> 32
The sum of the signed rank is --> 12.000000
The 95.0% level of confidence is --> 175.945801
Methods cannot be separated.

BLLIP ‒ UC
The number of nonzero tests is --> 33
The sum of the signed rank is --> 198.000000
The 95.0% level of confidence is --> 184.129807
Method 1 is better than method 2.

UC ‒ UVA
The number of nonzero tests is --> 33
The sum of the signed rank is --> 138.000000
The 95.0% level of confidence is --> 184.129807
Methods cannot be separated.

Per-topic average precision (final row: MAP; one cell is missing in the sixth row)
UO      DCU     BLLIP   UC      UVA
0.0698  0.2018  0.0003  0.0433  0.2049
0.0015  0.0019  0.0002  0.0022  0.0014
0.0246  0.0316  0.0187  0.0211  0.0294
0.1437  0.1212  0.1191  0.1722  0.0219
0.3930  0.3219  0.3863  0.2484  0.0313
0.0045  0.0001          0.0004  0.0002
0.0473  0.0797  0.0001  0.0529  0.0493
0.0099  0.0088  0.0207  0.0050  0.0043
0.1601  0.1571  0.1706  0.1148  0.1255
0.0644  0.0244  0.0392  0.0186  0.0205
0.4063  0.2077  0.3546  0.1189  0.0780
0.0701  0.0709  0.0790  0.0375  0.0187
0.1268  0.1211  0.2185  0.1021  0.1146
0.2947  0.1225  0.2529  0.0714  0.1771
0.0053  0.0080  0.0064  0.0038  0.0046
0.0155  0.0052  0.0023  0.0152  0.0114
0.0025  0.0008  0.0006  0.0007  0.0053
0.1049  0.0813  0.0675  0.0664  0.0838
0.0830  0.1076  0.0941  0.0587  0.0976
0.0858  0.0795  0.0658  0.0574  0.0386
0.1500  0.1614  0.1351  0.1487  0.0566
0.0144  0.0048  0.0177  0.0052  0.0043
0.0003  0.0004  0.0014  0.0002  0.0006
0.0280  0.0110  0.0204  0.0113  0.0078
0.2196  0.1799  0.0782  0.1195  0.0545
0.0116  0.0444  0.0504  0.0489  0.0102
0.0080  0.0087  0.0120  0.0059  0.0252
0.0013  0.0012  0.0007  0.0008  0.0003
0.1339  0.1990  0.2290  0.1773  0.1069
0.0043  0.0022  0.0025  0.0018  0.0045
0.0134  0.0884  0.0324  0.0091  0.0033
0.0382  0.0657  0.0611  0.0591  0.0234
0.0843  0.0766  0.0539  0.0854  0.0508
0.0855  0.0787  0.0785  0.0571  0.0444
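
The thresholds printed on this slide (184.129807 for 33 nonzero pairs, 175.945801 for 32) are consistent with a normal approximation of the form z * sqrt(n(n+1)(2n+1)/6) with z = 1.645. That reading is inferred from the numbers and is not stated on the slide. A minimal Python sketch under that assumption follows; the function names are hypothetical and the third verdict string is added only for symmetry.

```python
import math

# Assumption: z = 1.645 reproduces the thresholds printed on the slide.
Z_95 = 1.645


def signed_rank_sum(scores_a, scores_b):
    """Return (number of nonzero differences, sum of signed ranks)."""
    # Zero differences are dropped, then the rest are ordered by magnitude.
    diffs = sorted((a - b for a, b in zip(scores_a, scores_b) if a != b), key=abs)
    total = 0.0
    i = 0
    while i < len(diffs):
        # Differences of equal magnitude share their average rank.
        j = i
        while j + 1 < len(diffs) and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for d in diffs[i:j + 1]:
            total += average_rank if d > 0 else -average_rank
        i = j + 1
    return len(diffs), total


def critical_value(n, z=Z_95):
    """Threshold on the signed-rank sum for n nonzero differences."""
    return z * math.sqrt(n * (n + 1) * (2 * n + 1) / 6)


def compare(scores_a, scores_b):
    """Mimic the verdict lines shown on the slide."""
    n, w = signed_rank_sum(scores_a, scores_b)
    limit = critical_value(n)
    if w > limit:
        return "Method 1 is better than method 2."
    if w < -limit:
        return "Method 2 is better than method 1."  # not shown on the slide
    return "Methods cannot be separated."
```

With this reading, the BLLIP ‒ UC sum of 198 exceeds the threshold of about 184.13, which matches the one pair the slide reports as separable.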

11 Monolingual vs. Cross-lingual (Automatic TD Runs)
Site  Document Fields   English  French  Spanish  Dutch
UO    AK1,AK2,ASR04     0.0855   71%     72%      ‒
DCU   AK1,AK2,ASR06B    0.0787   81%     ‒        ‒
UC    AK1,AK2,ASR06B    0.0571   71%     ‒        ‒
UVA   AK2,ASR06B        0.0444   ‒       ‒        90%

12 CLEF 2007: The CL-SR Czech Track Pavel Pecina pecina@ufal.mff.cuni.cz

13 What’s Different in Czech?
- Lack of manual interview annotation
  - no topic boundaries (start and stop times)
  - no description (summary, assessors' scratchpad)
  - English labels (assigned thesaurus terms) not used
- Task: to identify appropriate replay points
  - focus on start time, stop times ignored
- Unknown-boundary relevance assessment
  - manual labeling of start and stop times
  - modified mGAP used as the evaluation measure
  - penalty for inexact matches

14 Interviews
- Czech Holocaust survivor testimonies
  - 357 mostly seen speakers, ~565 hours
  - 35% mean ASR Word Error Rate
- 2007 Quickstart collection
  - 11377 automatically generated overlapping passages
  - average passage duration 3.75 min, 33% overlap
  - fields: DOCNO, INTERVIEWDATA, ASRSYSTEM, CHANNEL, ASRTEXT
  - no thesaurus terms used in 2007
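
The passage figures above fit together arithmetically: 3.75 min is 225 sec, and a 33% overlap leaves a step of 150 sec between passage starts, which agrees with the "quickstart documents begin every 150 sec" note on the evaluation measure slide. The Python sketch below illustrates that windowing. It assumes fixed-length windows, which is a simplification (the slide gives only an average duration), and its function names are hypothetical.

```python
def passage_windows(interview_sec, length_sec=225.0, step_sec=150.0):
    """Return (start, end) pairs covering an interview of the given duration.

    Assumption: fixed-length windows; the real passages presumably varied.
    """
    windows = []
    start = 0.0
    while start < interview_sec:
        # The last window is clipped at the end of the interview.
        windows.append((start, min(start + length_sec, interview_sec)))
        start += step_sec
    return windows


def overlap_fraction(length_sec=225.0, step_sec=150.0):
    """Share of each window that is also covered by the next one."""
    return (length_sec - step_sec) / length_sec
```

For a 10-minute interview, `passage_windows(600)` yields four windows starting at 0, 150, 300 and 450 sec.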

15 Interview Usage
[Figure: interview usage for Czech and English along an axis of minutes from interview start (0 to 180). Labels: 800 ASR Train; 297 IR Eval; 335 Seen; 22 Unseen; Seen; ASR Train]

16 VHF04106-7401.30 Tommy K...-K... 2006 right PŘIVEZLI VĚZNĚ NA NOSÍTKÁCH A PROSTĚ MUSEL TADY BYL SKUTEČNĚ KAŽDÝ KDO TO V TEREZÍNĚ V TÉ DOBĚ BYL ALE STEJNĚ VÝSLEDEK BYL MYSLÍM ŽE MLUVÍ STOJÍM PROSTORU KDE AŽ DO KONCE KVĚTNA ČTYŘICET TŘI PŘICHÁZELY TRANSPORTY Z BOHUŠOVIC TEĎ JEŠTĚ CHODILI PĚŠKY ALE POTOM UŽ JE TO DEVĚT ČASTO MOJE TO ZACHOVALA ČÁST KOLEJÍ KTERÉ VEDLY AŽ TAK ZA HAMBURSKÝ KASÁRNA TAM POTOM UŽ OD KVĚTNA ČTYŘICET TŘI DOCHÁZELY VŠECHNY TRANSPORTY JAK Z VENKU TAKÉ OPAČNĚ KDYŽ TEREZÍNA ŠLY TRANSPORTY DO TAKOVÉHO ZA NÁMI TO BYLA JSOU HAMBURSKÝ KASÁRNA TAM JSOU HANNOVERSKÝCH KASÁRNÁCH TADY NA NA TOM O TOM PROSTORU PŘED TOU HROMADOU VĚCÍ MIMOŘÁDNĚ ŠPATNÉ I TADY TŘEBA MÍSTO KDE PŘICHÁZELY TRANSPORTY KDYŽ ČILI DIALOSTECKÝ DĚTI DIALOSTECKÝ DĚTI DVANÁCTSET NEBO KDYŽ POTOM CHODILY TRANSPORTY Z NĚMECKA TAK PROSTĚ NA TEN NA TĚCH KOLEJÍCH SE DĚLALO JAK TRANSPORTOVÁNI LIDI CO PŘICHÁZELI SEM TAK TRANSPORTOVÁNI TY VĚCI DÁL DO OSVĚTIMI NEJVĚTŠÍ TRANSPORTY SE TO ZÁŘÍ ČTYŘICET ČTYŘI KDY TAKÉ ODEŠEL MŮJ OTEC A BRATR TO UŽ BYLO ASI DESET TRANSPORTU PO TISÍCI MUŽÍCH TI ŽIDOVŠTÍ ČINNOST KTERÝ SE VELKÁ VĚTŠINA NEVRÁTIL TAK SEM PODÍVALA DĚLÁME NA BAŠTU TO JE ZAJÍMALI PROSTOR TÍM NĚJAKÝM OSOBNÍ VZPOMÍNKU NA TOM NAHOŘE TO BYLO ZAHRADNICTVÍ VLÁDNOUT ALE NĚJAKÝ DO DOBY NEŽ U TEN TAM BYL POSTAVENÝ DOMEČEK ALE BYLO TAM TAKÉ MALÝ V ODBOJI HŘIŠTĚ TAM JSME NIKDY NE- NĚKDY PŘESNĚ TO VÍM SEDMADVACÁTÉHO KVĚTNA ČTYŘICET ČTYŘI JSME TAM HRÁLI FOTBAL UTKÁNÍ SPARTA NEŽ JÁ VÍM SPARTA TO BYL KLUK Z KLUKŮ KTERÝ JSME BYDLELI TADY V HAMBURSKÝCH KASÁRNÁCH TAKŽE TEN PODVOZEK ALE NAPŘED BYLI MY JSME MĚLI SVÍČKU NEBO SE SPARTA A SESTRY STA SEDMNÁCT TY TO BYL DĚTSKÝ DOMOV KLUKŮ TAM MĚLI SE STALO PŘED VNUČKU NEŽ ALE MY JSME TEHDY DO UTKÁNÍ PROHRÁLI TŘI JEDNA ALE TO NENÍ PODSTATNÝ PODSTATNĚ TO ŽE TEN ZÁPAS A JÁ TADY DVACÁTÉHO SEDMÉHO KVĚTNA ČTYŘICET ČTYŘI A ŽE SE NA NĚ BYL JEŠTĚ PŮJDE OTEC PAK UŽ ODJEL DÁL A UŽ SEM U NÍ NEVĚDĚL KOUKÁME SE NÁM ÚZKÁ KASÁRNA TO JSOU KASÁRNA KTERÁ BYLA DŘEVĚNÁ PRO ŽENY TAK SEM 
TADY S MAMINKOU HNED V TOM ČTYŘICÁTÉM DRUHÉM JSME SEM PŘIŠLI TAK JSME TADY BYLI UBYTOVANÝ NA POKOJ MĚSTĚ ŠEST ŽE TO TYPICKY KASÁRENSKÝCH DOBU KASÁRENSKÝCH BUDOVAL S DVĚMA DVORY VELIKÝM A VPRAVO BYLO JSOU HANNOVERSKÝCH KASÁRNÁCH TA MOJE ČERNÁ PEKÁRNA TAM SE TAK CHLEBA I PRO TEREZÍN TO PRO NÁS A BYL TAM TAKÉ TAKOVÝ DVŮR ODJÍŽDĚLI ŽENY CO VOZILI NA TU DOBRANSKÝCH VOZECH VŠECHNO CO BYLO POTŘEBA TEREZÍNĚ DĚLAL VŠE SE VOZILO S- VOZY KTERÝ BYLI V TOM DVOŘE TAMHLE VZADU MAMINKA TAM BYL ZAMĚSTNÁN OSUDU HUNDERTSCHAFT JSME UMĚLY HAMBURSKÝCH KASÁREN PŘED TÍM JSME SE DÍVALI OTEC PO TOM PRVNÍM DVOŘE HAMBURSKÝ KASÁRNA KDE BYLI ÚPLNĚ TEDA VĚTŠINOU ŽENY JÁ SEM ZABIL S MATKOU PŮL ROKU DVAAČTYŘICET BUDE TAKÉ VIDĚT MÍSTO KDE JSME BYLI BYLO TO MÍSTO ČÍSLO DVĚ STĚ ŠEST S MAMINKOU SPALY SAMOŘEJMĚ TÍM ŽE MATKA CHTĚLA

17 Topics selection
- 38 2005 English Training
- 25 2005 English Evaluation
- 33 2006 English Evaluation
- 9 Unused
- 10 2006 Czech Safety
- 28 2006 Czech Evaluation
- 1 2006 Czech Evaluation
[Diagram labels: 115, 29]

18 Topics selection
- 38 2005 English Training
- 25 2005 English Evaluation
- 33 2006 English Evaluation
- 9 Unused
- 10 2006 Czech Safety
- 28 2006 Czech Evaluation
- 1 2006 Czech Evaluation
- 29 2007 Czech Training
[Diagram labels: 115, 29]

19 Topics selection
- 38 2005 English Training
- 25 2005 English Evaluation
- 33 2006 English Evaluation
- 9 Unused
- 10 2006 Czech Safety
- 28 2006 Czech Evaluation
- 1 2006 Czech Evaluation
- 3 2007 Czech Safety
- 29 2007 Czech Training
[Diagram labels: 118, 29]

20 Topics selection
- 38 2005 English Training
- 25 2005 English Evaluation
- 33 2006 English Evaluation
- 9 Unused
- 10 2006 Czech Safety
- 28 2006 Czech Evaluation
- 1 2006 Czech Evaluation
- 3 2007 Czech Safety
- 29 2007 Czech Training
- 40 Possible 2007 Czech Evaluation
- 10 Possible 2007 Czech Evaluation
- (6 or more relevant passages identified during search-guided assessment)
[Diagram labels: 118, 29, 50, 29]

21 Topics selection
- 38 2005 English Training
- 25 2005 English Evaluation
- 33 2006 English Evaluation
- 9 Unused
- 10 2006 Czech Safety
- 28 2006 Czech Evaluation
- 1 2006 Czech Evaluation
- 3 2007 Czech Safety
- 29 2007 Czech Training
- 40 Possible 2007 Czech Evaluation
- 10 Possible 2007 Czech Evaluation
- 34 2007 Czech Evaluation
- 8 2007 Czech Evaluation
- (highly-ranked assessment completed)
- (6 or more relevant passages identified during search-guided assessment)
[Diagram labels: 118, 29, 50, 42, 29]

22 Evaluation Measure
- based on the mean Generalized Average Precision
  - human assessments are binary
  - degree of match to the assessments can be partial
  - penalty for a less than 100% match, up to 150 sec
- quantization noise (scores lower than for English)
  - 15 sec assessment granularity
  - quickstart documents begin every 150 sec
[Figure: penalty function; vertical axis 1.0, 0.5, 0.0; horizontal axis ticks at 0 sec, -75 sec, +150 sec]
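
The measure described above can be sketched in Python as a generic Generalized Average Precision with a partial-match weight. The slide does not give the exact form of the "modified" mGAP, so several things here are assumptions: a symmetric, linear falloff that reaches zero at 150 sec (suggested by the figure ticks), each assessed start time being credited at most once, and all function names.

```python
def match_weight(offset_sec, width_sec=150.0):
    """Assumed linear penalty: 1.0 at an exact match, 0.0 at width_sec or more."""
    return max(0.0, 1.0 - abs(offset_sec) / width_sec)


def generalized_average_precision(ranked_starts, relevant_starts, width_sec=150.0):
    """GAP for one topic, given system start times in rank order (seconds)."""
    if not relevant_starts:
        return 0.0
    unmatched = set(relevant_starts)
    cumulative = 0.0  # running sum of partial-match weights
    total = 0.0       # sum of generalized precision at each scoring rank
    for rank, start in enumerate(ranked_starts, start=1):
        # Assumption: credit the closest assessed start time not yet matched.
        nearest = min(unmatched, key=lambda r: abs(start - r), default=None)
        if nearest is None:
            break
        weight = match_weight(start - nearest, width_sec)
        if weight > 0.0:
            unmatched.discard(nearest)
            cumulative += weight
            total += cumulative / rank
    return total / len(relevant_starts)
```

Averaging this value over all topics would give the mean (mGAP). Under these assumptions, a single returned start time 75 sec away from the only assessed start time scores 0.5.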

23 Relevance Judgements
- performed by 6 relevance assessors in Prague
- Search-guided assessment
  - completed for 87 topics
  - 2156 relevant passages identified in the evaluation topics
- Highly-ranked assessment
  - completed for 42 topics
  - pool depth set to 50 start times
  - 11896 highly-ranked start times checked (284/topic)
  - 233 relevant passages identified
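
The pooling step behind the highly-ranked assessment can be sketched as the union of the top-ranked start times from every submitted run. This is a generic pooling sketch in Python, not the track's actual tooling: the function name and data layout are hypothetical, and whether nearby start times were merged before assessment is not stated on the slide.

```python
def build_pool(runs, depth=50):
    """Union of the top `depth` items from each run for one topic.

    `runs` maps a run ID to its ranked list of (interview_id, start_sec) items.
    """
    pool = set()
    for ranked in runs.values():
        pool.update(ranked[:depth])
    return pool
```

Because runs often return the same start times, the pool for a topic is usually much smaller than depth times the number of runs.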

24 Relevance Assessment Interface

25 Relevance Judgement Results

26 Relevance Judgement Statistics

27 Participation
- Brown University (BLLIP)
  - Matthew Lease, Eugene Charniak
- Charles University (CUNI)
  - Pavel Češka, Pavel Pecina
- University of Chicago (UC)
  - Gina-Anne Levow
- University of West Bohemia (UWB)
  - Pavel Ircing, Luděk Müller
- total of 15 runs submitted
- required condition: automatic queries from Title and Description

28 Results

30 Results: Term normalization
- the effect of term normalization for handling Czech morphology is quite significant:
  - 60-120% relative improvement

31 Alignment Issues in the Quickstart Collection
- 2006 data release (affected 2006 working notes)
  - Time mismatch made mGAP uninformative (pauses ignored)
- Post-CLEF 2006 evaluation (“corrected” in 2006 proceedings)
  - Post-hoc start time correction (but missing tapes counted as 30 min)
  - AUTO and MANUAL KEYWORDS still misaligned
- 2007 data release
  - Some additional corrections for ASR timing
  - AUTO and MANUAL KEYWORDS removed (too hard to fix)
- 2007 evaluation (reported in 2007 working notes)
  - Missing-tape timing corrected post-hoc

33 Test Collection Release
- CLEF CL-SR track test collections:
  - Package for release
  - Independent (cross-site) validation
  - Deposit at ELDA
- MALACH ASR training data:
  - Package English and Czech for release
  - With Polish, Russian, Slovak (+ maybe Hungarian)
  - Deposit at LDC

34 What Did We Learn?
- Searching conversational speech works
  - Real user needs, two languages
- Improving ASR helps less than expected
  - Error rates vary by speaker
  - Ranked retrieval prefers lower error rates
- Automatic classification can help ASR
  - At least if error rates are high
- Unsegmented sources bring new challenges
  - Cross-source alignment
  - Evaluation measure design

35 Critiquing the Collection
- Large for ASR is small for IR
  - ~1,000 hours of speech = ~20,000 “documents”
- No manual reference transcription
  - Would cost ~$100,000
- Interviews are just one type of conversation

