Posted: 2024-03-12



IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the
submitted homework.
I declare that the assignment submitted on Elearning system is original
except for source material explicitly acknowledged, and that the same or
related material has not been previously submitted for another course. I
also acknowledge that I am aware of University policy and regulations on
honesty in academic work, and of the disciplinary guidelines and
procedures applicable to breaches of such policy and regulations, as
contained in the website
http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________
Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must
be created COMPLETELY by oneself ALONE. A student may not share ANY written work or
pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has
discussed or worked with. If the answer includes content from any other source, the
student MUST STATE THE SOURCE. Failure to do so is cheating and will result in
sanctions. Copying answers from someone else is cheating even if one lists their name(s) on
the homework.
If there is information you need to solve a problem, but the information is not stated in the
problem, try to find the data somewhere. If you cannot find it, state what data you need,
make a reasonable estimate of its value, and justify any assumptions you make. You will be
graded not only on whether your answer is correct, but also on whether you have done an
intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE PDF file.
Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of
Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in
books from books.google.com along with some statistics.
In this question, you will use only the Google Books 1-grams (single words; the first
field is nonetheless labelled "bigram" below). Please go to References [1] and [2] to
download the two datasets. Each line in these two files has the following format
(TAB separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall,
in 91 (95) distinct books.
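The record format above can be illustrated with a minimal Python sketch. This is only an illustration of the field layout, not the Pig script the question asks for; the names `NgramRecord` and `parse_record` are hypothetical.

```python
from typing import NamedTuple


class NgramRecord(NamedTuple):
    """One line of the n-gram file (hypothetical helper type)."""
    ngram: str
    year: int
    match_count: int
    volume_count: int


def parse_record(line: str) -> NgramRecord:
    """Split one TAB-separated line into typed fields."""
    ngram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    return NgramRecord(ngram, int(year), int(match_count), int(volume_count))


record = parse_record("circumvallate\t1978\t335\t91\n")
print(record.ngram, record.year, record.match_count, record.volume_count)
```

In Pig, the same layout would typically be declared as a schema when loading the file; the sketch only makes the four fields and their types explicit.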
(a) [Bonus 5 marks] Install Pig on your Hadoop cluster. You can reuse your Hadoop
cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on
the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7]
to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per
year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared.
Assume the data set contains all the 1-grams in the last 100 years, and the above
records are the only records for the word ‘circumvallate’. Then the average value is:
(335 + 261) / 2 = 298,
instead of
(335 + 261) / 100 = 5.96
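The averaging rule in the note can be sketched in Python as follows. This is a hedged illustration of the arithmetic only, assuming records are already parsed into `(ngram, year, match_count)` tuples; the function name `average_per_year` is hypothetical.

```python
from collections import defaultdict


def average_per_year(records):
    """Average match_count per year, counting only years in which the n-gram appears.

    records: iterable of (ngram, year, match_count) tuples (assumed input shape).
    """
    totals = defaultdict(int)   # sum of match_count per n-gram
    years = defaultdict(set)    # distinct years seen per n-gram
    for ngram, year, match_count in records:
        totals[ngram] += match_count
        years[ngram].add(year)
    return {ngram: totals[ngram] / len(years[ngram]) for ngram in totals}


sample = [("circumvallate", 1978, 335), ("circumvallate", 1979, 261)]
averages = average_per_year(sample)
print(averages)  # {'circumvallate': 298.0}
```

The denominator is the number of distinct years observed for the word (2 here), not the length of the period covered by the data set.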
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences
per year, along with their corresponding average values, sorted in descending order. If
multiple bigrams have the same average value, you may output any one of them (that is,
break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
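As a rough illustration of the selection step in (d), the following Python sketch picks the entries with the largest averages and returns them in descending order. It assumes the averages are already available as a dictionary; `top_n` is a hypothetical helper, and the required deliverable remains a Pig script.

```python
import heapq


def top_n(averages, n=20):
    """Return the n (ngram, average) pairs with the largest averages, descending.

    Ties are broken arbitrarily, which the question explicitly allows.
    """
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])


averages = {"a": 3.0, "b": 10.5, "c": 7.0}
print(top_n(averages, n=2))  # [('b', 10.5), ('c', 7.0)]
```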
Hints:
● This problem is very similar to the word counting example shown in the lecture notes
of Pig. You can use the code there and just make some minor changes to perform
this task.
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance
between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your
Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Hive
2.3.8 on the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with
the same datasets stored in the HDFS. Rerun the Pig script in this cluster and
compare the performance between Pig and Hive in terms of overall run-time and
explain your observation.
Hints:
● Hive will store its tables on HDFS, and those locations need to be bootstrapped:
$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small
subset of the data instead of the whole data set. Once your Hive commands/scripts
work as desired, you can then run them on the complete data set.
Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection, which aims to group users with similar interests, behaviors,
actions, or general patterns, has drawn a lot of attention in the machine learning field. In
this homework, you will implement a similar-users-detection algorithm for an online movie
rating system. Basically, users who give similar scores to the same movies may have common
tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this
homework, the similarity between a given pair of users (e.g. A and B) is measured as the
total number of movies both A and B have watched divided by the total number of
movies watched by either A or B. The following is the formal definition of similarity: Let
M(A) be the set of all the movies user A has watched. Then the similarity between user A
and user B is defined as:
\[ \mathrm{Similarity}(A, B) = \frac{|M(A) \cap M(B)|}{|M(A) \cup M(B)|} \qquad (**) \]
where |S| means the cardinality of set S.
(Note: if |M(A) ∪ M(B)| = 0, we set the similarity to 0.)
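The definition (**), including the zero-union special case, can be written as a short Python sketch. It assumes each user's watched movies are already collected into a set; the function name `similarity` is hypothetical.

```python
def similarity(movies_a: set, movies_b: set) -> float:
    """Similarity (**): |M(A) ∩ M(B)| / |M(A) ∪ M(B)|, or 0 if the union is empty."""
    union = movies_a | movies_b
    if not union:
        return 0.0
    return len(movies_a & movies_b) / len(union)


# Users A and B share movies 2 and 3 out of four distinct movies overall.
print(similarity({1, 2, 3}, {2, 3, 4}))  # 0.5
print(similarity(set(), set()))          # 0.0
```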
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented
by its unique userID and each movie is represented by its unique movieID. The format of the
data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the
cluster you built for Q1 and Q2 or you can use the IE DIC or one provided by the Google
Cloud/AWS [5, 6, 7].
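One common way to obtain the number of movies watched by both users of every pair is to group the data by movie and emit each pair of viewers of that movie once. The Python sketch below illustrates that idea under the assumption that the input is a list of `(userID, movieID)` tuples; `co_watched_counts` is a hypothetical name, and this is not the required Pig implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_watched_counts(ratings):
    """Count, for each user pair, how many movies both users have watched.

    ratings: iterable of (user_id, movie_id) tuples (assumed input shape).
    Returns a Counter keyed by (smaller_user_id, larger_user_id).
    """
    viewers = defaultdict(set)  # movie_id -> set of users who watched it
    for user_id, movie_id in ratings:
        viewers[movie_id].add(user_id)

    counts = Counter()
    for users in viewers.values():
        # Every pair of viewers of the same movie shares that movie.
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1
    return counts


ratings = [(1, 10), (2, 10), (1, 20), (2, 20), (3, 20)]
counts = co_watched_counts(ratings)
print(counts.most_common(1))  # [((1, 2), 2)]
```

Given these intersection sizes, the union size for a pair follows from |M(A) ∪ M(B)| = |M(A)| + |M(B)| − |M(A) ∩ M(B)|, so the per-user movie counts are the only extra quantity needed to compute the similarity. Note that very popular movies generate many pairs, which is worth keeping in mind on the larger dataset.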
(a) [10 marks] For each pair of users in each of the datasets [3] and [4], output the
number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the
list of the 10 pairs of users having the largest number of movies watched by
both users in the pair within the corresponding dataset. The format of your
answer should be as follows: