IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the submitted homework.
I declare that the assignment submitted on Elearning system is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________ Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must be created COMPLETELY by oneself ALONE. A student may not share ANY written work or pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has discussed or worked with. If the answer includes content from any other source, the student MUST STATE THE SOURCE. Failure to do so is cheating and will result in sanctions. Copying answers from someone else is cheating even if one lists their name(s) on the homework.
If there is information you need to solve a problem, but the information is not stated in the problem, try to find the data somewhere. If you cannot find it, state what data you need, make a reasonable estimate of its value, and justify any assumptions you make. You will be graded not only on whether your answer is correct, but also on whether you have done an intelligent analysis.
Submit your output, explanation, and your commands/ scripts in one SINGLE pdf file.

 Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in books from books.google.com along with some statistics.
In this question, you only use the Google Books 1-gram (unigram) datasets. Please go to References [1] and [2] to download the two datasets. Each line in these two files has the following format (TAB separated):
word  year  match_count  volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall, in 91 (95) distinct books.
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop cluster in IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 over the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC or a Hadoop cluster provided by Google Cloud/AWS [5, 6, 7], to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique 1-gram, compute its average number of occurrences per year. In the above example, the result is:
circumvallate: (335 + 261) / 2 = 298
Note: the denominator is the number of years in which that word has appeared. Suppose the data set covers all the 1-grams in the last 100 years and the two records above are the only records for the word 'circumvallate'. The average value is then (335 + 261) / 2 = 298, not (335 + 261) / 100 = 5.96.
(d) [10 marks] Output the 20 1-grams with the highest average number of occurrences per year, along with their corresponding average values, sorted in descending order. If multiple 1-grams have the same average value, you may output any of them (that is, break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word-counting example shown in the Pig lecture notes. You can reuse the code there with some minor changes to perform this task; a sketch along these lines is given below.
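For illustration only, here is a minimal Pig Latin sketch covering parts (b)-(d), assuming the two downloaded files have already been copied into HDFS. All HDFS paths and relation names below are placeholders, not prescribed by the assignment:

-- Load both 1-gram files (adjust the paths to your own HDFS layout)
grams_a = LOAD '/user/hadoop/ngrams/part1' USING PigStorage('\t')
          AS (word:chararray, year:int, match_count:long, volume_count:long);
grams_b = LOAD '/user/hadoop/ngrams/part2' USING PigStorage('\t')
          AS (word:chararray, year:int, match_count:long, volume_count:long);

-- (b) Combine the two files into one relation
grams = UNION grams_a, grams_b;

-- (c) Average occurrences per year: total match_count divided by the number of
--     records for that word (assuming at most one record per word per year)
grouped = GROUP grams BY word;
avgs    = FOREACH grouped GENERATE group AS word,
              (double)SUM(grams.match_count) / COUNT(grams) AS avg_count;

-- (d) The 20 words with the highest averages, in descending order
ordered = ORDER avgs BY avg_count DESC;
top20   = LIMIT ordered 20;
STORE top20 INTO '/user/hadoop/output/q1' USING PigStorage('\t');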
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your Hadoop cluster in IEMS 5730 HW#0 and refer to the following link to install Hive 2.3.8 over the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC or a Hadoop cluster provided by Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as in Q1, using the same datasets stored in HDFS. Rerun the Pig script on this cluster, compare the performance of Pig and Hive in terms of overall run-time, and explain your observations.
Hints:
● Hive will store its tables on HDFS, and those locations need to be bootstrapped:
$ hdfs dfs -mkdir /tmp
$ hdfs dfs -mkdir /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small subset of the data instead of the whole data set. Once your Hive commands/scripts work as desired, you can then run them on the complete data set.
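For illustration only, the Hive counterpart of the Q1 task could look roughly like the following HiveQL sketch. The HDFS location and table names are placeholders, not prescribed by the assignment:

-- Point an external table at the uploaded 1-gram files
CREATE EXTERNAL TABLE IF NOT EXISTS ngrams (
    word STRING,
    year INT,
    match_count BIGINT,
    volume_count BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hadoop/ngrams';

-- Average occurrences per year and the 20 highest averages, in descending order
CREATE TABLE q2_top20 AS
SELECT word,
       SUM(match_count) / COUNT(*) AS avg_count
FROM ngrams
GROUP BY word
ORDER BY avg_count DESC
LIMIT 20;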
 
 Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection has drawn a lot of attention in the machine learning field; it aims at grouping users with similar interests, behaviors, actions, or general patterns. In this homework, you will implement a similar-user-detection algorithm for an online movie rating system. Basically, users who give similar ratings to the same movies may have common tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this homework, the similarity between a given pair of users (e.g. A and B) is measured as the total number of movies both A and B have watched divided by the total number of movies watched by either A or B. The following is the formal definition of similarity: Let M(A) be the set of all the movies user A has watched. Then the similarity between user A and user B is defined as:
Similarity(A, B) = |M(A) ∩ M(B)| / |M(A) ∪ M(B)|          (**)
where |S| means the cardinality of set S.
(Note: if |𝑀(𝐴)∪𝑀(𝐵)| = 0, we set the similarity to be 0.)
[Figure omitted: an illustration of computing the similarity between two users' sets of watched movies.]
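As a quick numerical illustration (made-up movie sets, not taken from the datasets): if M(A) = {m1, m2, m3, m4} and M(B) = {m3, m4, m5}, then |M(A) ∩ M(B)| = 2 and |M(A) ∪ M(B)| = 5, so Similarity(A, B) = 2/5 = 0.4.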
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented by its unique userID and each movie is represented by its unique movieID. The format of the data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the cluster you built for Q1 and Q2, the IE DIC, or a cluster provided by Google Cloud/AWS [5, 6, 7].
(a) [10 marks] For each pair of users in datasets [3] and [4], output the number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the list of the 10 pairs of users having the largest number of movies watched by both users in the pair within the corresponding dataset. The format of your answer should be as follows:
<userID A>, <userID B>, <the number of movies both A and B have watched> //top 1
...
<userID X>, <userID Y>, <the number of movies both X and Y have watched> //top 10
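For illustration only, one possible Pig Latin approach to part (a) is a self-join on movieID. The input path and relation names below are placeholders, not prescribed by the assignment:

-- Load the ratings file twice so it can be self-joined
r1 = LOAD '/user/hadoop/movielens/ratings.csv' USING PigStorage(',')
     AS (user:long, movie:long);
r2 = LOAD '/user/hadoop/movielens/ratings.csv' USING PigStorage(',')
     AS (user:long, movie:long);

-- Pair up users who watched the same movie; keep user < user so each
-- unordered pair is counted only once
joined = JOIN r1 BY movie, r2 BY movie;
pairs  = FILTER joined BY r1::user < r2::user;

-- Count the movies each pair has in common, then keep the 10 largest counts
grouped = GROUP pairs BY (r1::user, r2::user);
counts  = FOREACH grouped GENERATE FLATTEN(group) AS (userA, userB),
              COUNT(pairs) AS common_movies;
ordered = ORDER counts BY common_movies DESC;
top10   = LIMIT ordered 10;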
(b) [20 marks] By modifying/extending part of your code from part (a), find the Top-K (K = 3) most similar users (as defined by Equation (**)) for every user in datasets [3] and [4]. If multiple users have the same similarity, you can just pick any three of them.
(c)
Hint:
1. In part (b), to facilitate the computation of the similarity measure defined in (**), you can use the inclusion-exclusion principle, i.e. |M(A) ∪ M(B)| = |M(A)| + |M(B)| − |M(A) ∩ M(B)|.
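Building on the hypothetical relations from the part (a) sketch above, part (b) could then be approached roughly as follows, applying the inclusion-exclusion formula from the hint. Again, every relation name is a placeholder and this is only one of several possible designs:

-- Per-user movie counts, i.e. |M(A)| for every user A
by_user  = GROUP r1 BY user;
user_cnt = FOREACH by_user GENERATE group AS user, COUNT(r1) AS n_movies;

-- Attach |M(A)| and |M(B)| to each pair, then apply inclusion-exclusion
j1   = JOIN counts BY userA, user_cnt BY user;
j1f  = FOREACH j1 GENERATE counts::userA AS userA, counts::userB AS userB,
           counts::common_movies AS common_movies, user_cnt::n_movies AS n_a;
j2   = JOIN j1f BY userB, user_cnt BY user;
sims = FOREACH j2 GENERATE j1f::userA AS userA, j1f::userB AS userB,
           (double)j1f::common_movies /
           (j1f::n_a + user_cnt::n_movies - j1f::common_movies) AS similarity;

-- Each pair must appear in both directions before ranking per user
sims_rev = FOREACH sims GENERATE userB AS userA, userA AS userB, similarity;
sims_all = UNION sims, sims_rev;

-- Top-3 most similar users for every user
by_a = GROUP sims_all BY userA;
top3 = FOREACH by_a {
           sorted = ORDER sims_all BY similarity DESC;
           lim    = LIMIT sorted 3;
           GENERATE FLATTEN(lim);
       };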