
Date: 2024-05-09



COM4511/COM6511 Speech Technology - Practical Exercise -
Keyword Search
Anton Ragni
Note that for any module assignment, full marks will only be awarded for outstanding performance that
goes well beyond the questions asked. The marks allocated for each assignment are 20%. The marks will be
assigned according to the following general criteria. For every assignment handed in:
1. Fulfilling the basic requirements (5%)
Full marks will be given for completing the work as described, in the source code and results given.
2. Submitting high quality documentation (5%)
Full marks will be given to a write-up that is at the highest standard of technical writing and illustration.
3. Showing good reasoning (5%)
Full marks will be given if the experiments and the outcomes are explained to the best standard.
4. Going beyond what was asked (5%)
Full marks will be given for interesting ideas on how to extend the work that are well motivated and
described.
1 Background
The aim of this task is to build and investigate the simplest form of a keyword search (KWS) system, which allows information
to be found in large volumes of spoken data. The figure below shows an example of a typical KWS system, which consists of an index and
a search module. The index provides a compact representation of spoken data. Given a set of keywords, the search module
queries the index to retrieve all possible occurrences ranked according to likelihood. The quality of a KWS system is assessed based
on how accurately it can retrieve all true occurrences of keywords.
[Figure: a typical KWS system; block labels: Keywords, Search, Index, Results]
A number of index representations have been proposed and examined for KWS. The most popular representations are derived
from the output of an automatic speech recognition (ASR) system. Various forms of output have been examined. These differ
in the amount of information retained about the content of the spoken data. The simplest form is the most likely word
sequence, or 1-best. Additional information such as start and end times and recognition confidence may also be provided for
each word. Given a collection of 1-best sequences, the following index can be constructed
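$$
\begin{array}{ll}
w_1 & (f_{1,1}, s_{1,1}, e_{1,1}) \;\ldots\; (f_{1,n_1}, s_{1,n_1}, e_{1,n_1}) \\
w_2 & (f_{2,1}, s_{2,1}, e_{2,1}) \;\ldots\; (f_{2,n_2}, s_{2,n_2}, e_{2,n_2}) \\
\vdots & \\
w_N & (f_{N,1}, s_{N,1}, e_{N,1}) \;\ldots\; (f_{N,n_N}, s_{N,n_N}, e_{N,n_N})
\end{array}
\tag{1}
$$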
where $w_i$ is a word, $n_i$ is the number of times word $w_i$ occurs, $f_{i,j}$ is the file in which word $w_i$ occurs for the $j$-th time, and $s_{i,j}$ and $e_{i,j}$
are the corresponding start and end times. Searching such an index for single-word keywords can be as simple as finding the correct row (e.g. $k$)
and returning all tuples $(f_{k,1}, s_{k,1}, e_{k,1}), \ldots, (f_{k,n_k}, s_{k,n_k}, e_{k,n_k})$.
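As an illustration of the index in (1), the following minimal sketch builds a word-to-occurrences mapping from CTM records of the form given in the Resources section and looks up a single-word keyword. The function name `build_index`, the lower-casing of words and the choice to keep the confidence score alongside each tuple are assumptions made for this example, not part of the task specification.

```python
from collections import defaultdict

def build_index(ctm_lines):
    """Map each word to a list of (file, start, end, confidence) tuples."""
    index = defaultdict(list)
    for line in ctm_lines:
        line = line.strip()
        if not line or line.startswith(";;"):
            continue  # blank lines and ;; comment lines are ignored
        file_id, _channel, start, duration, word, confidence = line.split()[:6]
        start, duration = float(start), float(duration)
        # Assumption: words are lower-cased so that CTM and keyword casing agree.
        index[word.lower()].append(
            (file_id, start, start + duration, float(confidence))
        )
    return index

ctm = [
    "7654 A 11.34 0.2 YES 0.5",
    "7654 A 12.00 0.34 YOU 0.7",
    "7654 A 13.30 0.5 CAN 0.1",
]
index = build_index(ctm)
hits = index.get("you", [])  # all occurrences of the single-word keyword "you"
```

Looking up a row is a dictionary access, so single-word search cost does not grow with the size of the collection.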
The search module is expected to retrieve all possible keyword occurrences. If the ASR system made no mistakes, such a module
could be created rather trivially. To account for possible retrieval errors, the search module assigns each potential occurrence
a relevance score. Relevance scores reflect confidence in a given occurrence being relevant. Occurrences with extremely
low relevance scores may be eliminated. If these scores are accurate, each eliminated occurrence will decrease the number of
false alarms. If not, the number of misses will increase. What exactly counts as an extremely low score may not be easy
to determine. Multiple factors may affect a relevance score: confidence score, duration, word confusability, word context and
keyword length. Therefore, simple relevance scores, such as those based on confidence scores, may have a wide dynamic range
and may be incomparable across different keywords. To ensure that relevance scores are comparable across different
keywords, they need to be calibrated. A simple calibration scheme is called sum-to-one (STO) normalisation
$$
\hat{r}_{i,j} = \frac{r_{i,j}^{\gamma}}{\sum_{k=1}^{n_i} r_{i,k}^{\gamma}}
\tag{2}
$$
where $r_{i,j}$ is the original relevance score for the $j$-th occurrence of the $i$-th keyword and $\gamma$ is a scale factor that either sharpens or
flattens the distribution of relevance scores. More complex schemes have also been examined. Given a set of occurrences with
associated relevance scores, there are several options available for eliminating spurious occurrences. One popular approach
is thresholding: given a global or keyword-specific threshold, any occurrence whose score falls below it is eliminated. Simple calibration
schemes such as STO require thresholds to be estimated on a development set and adjusted to different collection sizes. More
complex approaches such as Keyword Specific Thresholding (KST) yield a fixed threshold across different keywords and
collection sizes.
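The STO normalisation in (2) followed by simple thresholding can be sketched as below. The function names, the example scores and the threshold of 0.2 are illustrative assumptions; in practice the threshold would be estimated on a development set as described above. Returning zeros when all scores are zero is also an assumption of this sketch.

```python
def sto_normalise(scores, gamma=1.0):
    """Sum-to-one normalisation of the relevance scores of one keyword."""
    powered = [r ** gamma for r in scores]
    total = sum(powered)
    if total == 0:
        # Assumption: with no score mass, every occurrence gets zero.
        return [0.0 for _ in scores]
    return [p / total for p in powered]

def apply_threshold(occurrences, scores, threshold):
    """Keep only occurrences whose score does not fall below the threshold."""
    return [occ for occ, s in zip(occurrences, scores) if s >= threshold]

raw = [0.5, 0.3, 0.2]                    # hypothetical confidence scores
flat = sto_normalise(raw, gamma=0.5)     # gamma < 1 flattens the distribution
sharp = sto_normalise(raw, gamma=2.0)    # gamma > 1 sharpens it
kept = apply_threshold(["a", "b", "c"], sharp, threshold=0.2)
```

With these example scores the top-ranked occurrence receives a larger share of the mass under `gamma=2.0` than under `gamma=0.5`, which is the sharpening effect referred to in the text.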
The accuracy of KWS systems can be assessed in multiple ways. Standard approaches include precision (the proportion of relevant retrieved occurrences among all retrieved occurrences), recall (the proportion of relevant retrieved occurrences among all
relevant occurrences), mean average precision and term weighted value. A collection of precision and recall values computed
for different thresholds yields a precision-recall (PR) curve. The area under the PR curve (AUC) provides a threshold-independent summary statistic for comparing different retrieval approaches. The mean average precision (mAP) is another popular,
threshold-independent, precision-based metric. Consider a KWS system returning 3 correct and 4 incorrect occurrences arranged according to relevance score as follows: ✓ , ✗ , ✗ , ✓ , ✓ , ✗ , ✗ , where ✓ stands for a correct occurrence and ✗ stands
for an incorrect occurrence. The average precision at each rank (from 1 to 7) is 1/1, 0/2, 0/3, 2/4, 3/5, 0/6, 0/7. If the number of true correct
occurrences is 3, the mean average precision for this keyword is (1/1 + 2/4 + 3/5) / 3 = 0.7. A collection-level mAP can be computed by averaging
keyword-specific mAPs. Once a KWS system operates at a reasonable AUC or mAP level, it is possible to use term weighted
value (TWV) to assess the accuracy of thresholding. The TWV is defined by
$$
\mathrm{TWV}(K, \theta) = 1 - \frac{1}{|K|} \sum_{k \in K} \left( P_{\mathrm{miss}}(k, \theta) + \beta P_{\mathrm{fa}}(k, \theta) \right)
\tag{3}
$$
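where $k \in K$ is a keyword, $\theta$ is the threshold, $P_{\mathrm{miss}}$ and $P_{\mathrm{fa}}$ are the probabilities of a miss and a false alarm, and $\beta$ is a penalty assigned to false alarms.
These probabilities can be computed by
$$
P_{\mathrm{miss}}(k, \theta) = \frac{N_{\mathrm{miss}}(k, \theta)}{N_{\mathrm{correct}}(k)}
\tag{4}
$$
$$
P_{\mathrm{fa}}(k, \theta) = \frac{N_{\mathrm{fa}}(k, \theta)}{N_{\mathrm{trial}}(k)}
\tag{5}
$$
where $N_{\langle\mathrm{event}\rangle}$ is the number of events of the given type. The number of trials is given by
$$
N_{\mathrm{trial}}(k) = T - N_{\mathrm{correct}}(k)
\tag{6}
$$
where $T$ is the duration of speech in seconds.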
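Equations (3) to (6) translate into code fairly directly. The sketch below assumes that the per-keyword counts at a fixed threshold have already been collected into a dictionary; the function name `twv`, the layout of `stats` and the example counts are hypothetical. Skipping keywords with no reference occurrences (for which the miss probability is undefined) is an assumption of this sketch rather than something the text above prescribes.

```python
def twv(stats, duration, beta=20.0):
    """Term weighted value at one threshold.

    stats maps keyword -> (n_correct, n_miss, n_fa), where n_correct is the
    number of reference occurrences and n_miss, n_fa are counted at the
    chosen threshold. duration is the speech duration T in seconds.
    """
    total = 0.0
    n_keywords = 0
    for n_correct, n_miss, n_fa in stats.values():
        if n_correct == 0:
            continue  # assumption: P_miss is undefined, so skip the keyword
        p_miss = n_miss / n_correct                # equation (4)
        p_fa = n_fa / (duration - n_correct)       # equations (5) and (6)
        total += p_miss + beta * p_fa
        n_keywords += 1
    if n_keywords == 0:
        raise ValueError("no keyword has reference occurrences")
    return 1.0 - total / n_keywords                # equation (3)

# Hypothetical counts for two keywords in 1000 seconds of speech.
example_stats = {"yes": (10, 2, 5), "dog walking": (4, 1, 0)}
value = twv(example_stats, duration=1000.0, beta=20.0)
```

Because the number of trials is close to the duration in seconds, a single false alarm contributes little on its own; the penalty `beta` controls how quickly false alarms outweigh misses.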
2 Objective
Given a collection of 1-bests, write code that retrieves all possible occurrences of each keyword in the provided keyword list. Describe the search
process, including the index format, handling of multi-word keywords, criterion for matching, relevance score calibration and
threshold setting methodology. Write code to assess retrieval performance against the reference transcriptions according to the AUC,
mAP and TWV criteria using β = 20. Comment on the differences between these criteria, including the impact of the parameter β.
Start and end times of hypothesised occurrences must be within 0.5 seconds of true occurrences to be considered for matching.
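One possible reading of the matching criterion is that both the start and the end time of a hypothesis must lie within 0.5 seconds of the corresponding reference times, in the same file. The sketch below follows that reading and pairs hypotheses with references greedily, one to one, so that a second hypothesis for the same reference occurrence counts as incorrect. The function names and the greedy pairing are assumptions of this example.

```python
TOLERANCE = 0.5  # seconds, from the task description

def is_match(hyp, ref, tolerance=TOLERANCE):
    """hyp and ref are (file, start, end) tuples."""
    return (
        hyp[0] == ref[0]
        and abs(hyp[1] - ref[1]) <= tolerance
        and abs(hyp[2] - ref[2]) <= tolerance
    )

def label_hypotheses(hyps, refs, tolerance=TOLERANCE):
    """Return a correctness flag per hypothesis and the number of misses.

    Hypotheses are expected in decreasing order of relevance score so that
    the best-scoring hypothesis claims a reference occurrence first.
    """
    unused = list(refs)
    labels = []
    for hyp in hyps:
        found = next((r for r in unused if is_match(hyp, r, tolerance)), None)
        if found is not None:
            unused.remove(found)  # each reference can be matched only once
        labels.append(found is not None)
    return labels, len(unused)

refs = [("7654", 12.0, 12.4)]
hyps = [("7654", 12.1, 12.3), ("7654", 13.3, 13.8), ("7654", 12.2, 12.5)]
labels, n_miss = label_hypotheses(hyps, refs)
```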
3 Marking scheme
Two critical elements are assessed: retrieval (65%) and assessment (35%). Note: Even if you cannot complete this task as a
whole you can certainly provide a description of what you were planning to accomplish.
1. Retrieval
1.1 Index Write code that can take the provided CTM files (and any other file you deem relevant) and create indices in
your own format. For example, if the Python language is used then the execution of your code may look like
python index.py dev.ctm dev.index
where dev.ctm is a CTM file and dev.index is an index.
Marks are distributed based on handling of multi-word keywords
• Efficient handling of single-word keywords
• No ability to handle multi-word keywords
• Inefficient ability to handle multi-word keywords
• Or efficient ability to handle multi-word keywords
1.2 Search Write code that can take the provided keyword file and index file (and any other file you deem relevant)
and produce a list of occurrences for each provided keyword. For example, if the Python language is used then the
execution of your code may look like
python search.py dev.index keywords dev.occ
where dev.index is an index, keywords is a list of keywords, dev.occ is a list of occurrences for each
keyword.
Marks are distributed based on handling of multi-word keywords
• Efficient handling of single-word keywords
• No ability to handle multi-word keywords
• Inefficient ability to handle multi-word keywords
• Or efficient ability to handle multi-word keywords
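One way to handle multi-word keywords without rescanning the transcripts is a positional index, in which every word occurrence is keyed by its file and its position in that file's word sequence. The sketch below is only one possible design: the function names, the adjacency criterion (consecutive positions with no limit on the pause between words) and the use of the minimum word confidence as the phrase score are all assumptions made for illustration.

```python
from collections import defaultdict

def build_positional_index(records):
    """records: (file, start, end, word, confidence) in time order per file."""
    index = defaultdict(dict)
    position = defaultdict(int)
    for file_id, start, end, word, conf in records:
        index[word][(file_id, position[file_id])] = (start, end, conf)
        position[file_id] += 1
    return index

def search(index, keyword):
    """Return (file, start, end, score) for each occurrence of the keyword."""
    words = keyword.split()
    if not words:
        return []
    hits = []
    for (file_id, pos), (start, first_end, conf) in index.get(words[0], {}).items():
        end, score = first_end, conf
        for offset, word in enumerate(words[1:], start=1):
            entry = index.get(word, {}).get((file_id, pos + offset))
            if entry is None:
                break  # the next word does not follow at this position
            end = entry[1]
            score = min(score, entry[2])  # assumption: weakest word decides
        else:
            hits.append((file_id, start, end, score))
    return hits

records = [
    ("7654", 11.34, 11.54, "yes", 0.5),
    ("7654", 12.00, 12.34, "you", 0.7),
    ("7654", 13.30, 13.80, "can", 0.1),
]
pos_index = build_positional_index(records)
phrase_hits = search(pos_index, "you can")
```

Each candidate is checked with one dictionary lookup per keyword word, so the cost depends on the number of occurrences of the first word rather than on the size of the collection. A limit on the gap between consecutive words could be added to the inner loop if pauses should break a match.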
1.3 Description Provide a technical description of the following elements
• Index file format
• Handling multi-word keywords
• Criterion for matching keywords to possible occurrences
• Search process
• Score calibration
• Threshold setting
2. Assessment Write code that can take the provided keyword file, the list of found keyword occurrences and the corresponding reference transcript file in STM format and compute the metrics described in the Background section. For
instance, if the Python language is used then the execution of your code may look like
python <metric>.py keywords dev.occ dev.stm
where <metric> is one of precision-recall, mAP and TWV, keywords is the provided keyword file, dev.occ is the
list of found keyword occurrences and dev.stm is the reference transcript file.
Hint: In order to simplify assessment consider converting reference transcript from STM file format to CTM file format.
Using indexing and search code above obtain a list of true occurrences. The list of found keyword occurrences then can
be assessed more easily by comparing it with the list of true occurrences rather than the reference transcript file in STM
file format.
2.1 Implementation
• AUC Integrate an existing implementation of AUC computation into your code. For example, for the Python
language such an implementation is available in the sklearn package.
• mAP Write your own implementation or integrate any freely available one.
• TWV Write your own implementation or integrate any freely available one.
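For the mAP item, a minimal sketch that follows the worked example in the Background section is given below. It assumes the occurrences of each keyword are already sorted by decreasing relevance score and labelled as correct or incorrect; the function names and the decision to score a keyword with no true occurrences as zero are assumptions of this example.

```python
def average_precision(labels, n_true):
    """labels: correctness flags ordered by decreasing relevance score.

    n_true is the number of true occurrences of the keyword, which may be
    larger than the number of correct occurrences actually retrieved.
    """
    if n_true == 0:
        return 0.0  # assumption: such keywords could also be skipped
    hits = 0
    total = 0.0
    for rank, correct in enumerate(labels, start=1):
        if correct:
            hits += 1
            total += hits / rank  # precision at the rank of a correct hit
    return total / n_true

def mean_average_precision(per_keyword):
    """per_keyword: list of (labels, n_true) pairs, one per keyword."""
    values = [average_precision(labels, n_true) for labels, n_true in per_keyword]
    return sum(values) / len(values)

# Ranking from the Background section: correct, 2 incorrect, 2 correct, 2 incorrect.
example = [True, False, False, True, True, False, False]
ap = average_precision(example, n_true=3)  # (1/1 + 2/4 + 3/5) / 3
```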
2.2 Description
• AUC Plot the precision-recall curve. Report the AUC value. Discuss performance in the high precision and low
recall area. Discuss performance in the high recall and low precision area. Suggest which keyword search
applications might be interested in a good performance specifically in those two areas (either high precision
and low recall, or high recall and low precision).
• mAP Report mAP value. Report mAP value for each keyword length (1-word, 2-word, etc.). Compare and
discuss differences in mAP values.
• TWV Report TWV value. Report TWV value for each keyword length (1-word, 2-word, etc.). Compare and
discuss differences in TWV values. Plot TWV values for a range of threshold values. Report maximum TWV
value or MTWV. Report actual TWV value or ATWV obtained with a method used for threshold selection.
• Comparison Describe the use of AUC, mAP and TWV in the development of your KWS approach. Compare
these metrics and discuss their advantages and disadvantages.
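To make the precision-recall computation concrete, the following dependency-free sketch derives PR points from scored occurrences and integrates them with the trapezoidal rule. It is an illustration only: for the assignment itself, existing routines such as `sklearn.metrics.precision_recall_curve` and `sklearn.metrics.auc` can be used in place of these hypothetical helpers. Starting the curve at recall 0 and precision 1 is a convention assumed here.

```python
def precision_recall_points(scored, n_true):
    """scored: list of (relevance score, is_correct) pairs for all keywords.

    Returns recall and precision values, one pair per distinct threshold,
    from the highest threshold to the lowest.
    """
    if n_true <= 0:
        raise ValueError("n_true must be positive")
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    recalls, precisions = [], []
    hits = 0
    for rank, (score, correct) in enumerate(ranked, start=1):
        hits += int(correct)
        # Emit a point only after the last occurrence sharing this score.
        if rank == len(ranked) or ranked[rank][0] != score:
            recalls.append(hits / n_true)
            precisions.append(hits / rank)
    return recalls, precisions

def trapezoid_auc(recalls, precisions):
    """Area under the PR curve, starting from (recall 0, precision 1)."""
    area = 0.0
    prev_r, prev_p = 0.0, 1.0
    for r, p in zip(recalls, precisions):
        area += (r - prev_r) * (p + prev_p) / 2.0
        prev_r, prev_p = r, p
    return area

# Hypothetical scores attached to the ranking used in the Background section.
scored = [(0.9, True), (0.8, False), (0.7, False), (0.6, True),
          (0.5, True), (0.4, False), (0.3, False)]
recalls, precisions = precision_recall_points(scored, n_true=3)
area = trapezoid_auc(recalls, precisions)
```

Plotting `precisions` against `recalls` gives the PR curve; the first points correspond to the high precision, low recall region and the last points to the high recall, low precision region.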
4 Hand-in procedure
All outcomes, however complete, are to be submitted jointly in the form of a package file (zip/tar/gzip) that includes
directories for each task containing the associated required files. Submission will be performed via MOLE.
5 Resources
Three resources are provided for this task:
• 1-best transcripts in NIST CTM file format (dev.ctm, eval.ctm). The CTM file format consists of multiple records
of the following form
<F> <H> <T> <D> <W> <C>
where <F> is an audio file name, <H> is a channel, <T> is a start time in seconds, <D> is a duration in seconds, <W> is a
word, <C> is a confidence score. Each record corresponds to one recognised word. Any blank lines or lines starting with
;; are ignored. An excerpt from a CTM file is shown below
7654 A 11.34 0.2 YES 0.5
7654 A 12.00 0.34 YOU 0.7
7654 A 13.30 0.5 CAN 0.1
• Reference transcripts in NIST STM file format (dev.stm, eval.stm). The STM file format consists of multiple records
of the following form
<F> <H> <S> <T> <E> <L> <W>...<W>
where <S> is a speaker, <E> is an end time, <L> is a topic and <W>...<W> is a word sequence. Each record corresponds to
one manually transcribed segment of an audio file. An excerpt from an STM file is shown below
2345 A 2345-a 0.10 2.03 <soap> uh huh yes i thought
2345 A 2345-b 2.10 3.04 <soap> dog walking is a very
2345 A 2345-a 3.50 4.59 <soap> yes but it’s worth it
Note that exact start and end times for each word are not available. Use uniform segmentation as an approximation. The
duration of speech in dev.stm and eval.stm is estimated to be 57474.2 and 25694.3 seconds.
• Keyword list keywords. Each keyword contains one or more words as shown below
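Returning to the STM note above, the uniform segmentation it suggests can be sketched as follows: the duration of each STM segment is divided equally among its words, which yields approximate word-level start and end times comparable to CTM records. The function name `stm_to_words` is hypothetical, and the sketch assumes that the `<L>` field is always present as a single token without spaces, as in the excerpt.

```python
def stm_to_words(stm_line):
    """Split one STM record into (file, start, end, word) tuples.

    Every word in the segment is given the same duration.
    """
    fields = stm_line.split()
    file_id = fields[0]
    start, end = float(fields[3]), float(fields[4])
    words = fields[6:]  # fields[5] is the <L> label, e.g. <soap>
    if not words:
        return []
    step = (end - start) / len(words)
    return [
        (file_id, start + i * step, start + (i + 1) * step, word)
        for i, word in enumerate(words)
    ]

segment = "2345 A 2345-a 0.10 2.03 <soap> uh huh yes i thought"
word_times = stm_to_words(segment)
```

The resulting tuples can be fed to the same indexing and search code as the 1-best transcripts in order to obtain the list of true occurrences.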