
Thursday, December 3, 2015

CDH 5.4.7 Installation (Part 2)


Continuing from the previous post, this article walks through the installation of CDH 5.4.7.

A few issues remained to be resolved at the end, but the installation process itself completed successfully.

The article is fairly long, but the section headings give a good overview of the flow.

Note 1: This write-up is based on articles found online plus my own hands-on work, so please forgive any omissions.

Note 2: Everything was installed on an i7 laptop (16GB RAM), with all 4 VMs on a single disk. The resources were probably not quite sufficient and caused some of the problems below; if possible, add more RAM and spread the VMs across different disks.








Introduction

The previous post finally finished setting up the virtual machines; let's first review the VM configuration plan.


Hostname     IP                Memory   Disk
nn.cdh       192.168.190.140   6GB      80GB
dn01.cdh     192.168.190.141   2GB      80GB
dn02.cdh     192.168.190.142   2GB      80GB
dn03.cdh     192.168.190.143   2GB      80GB


Pre-installation study

Cloudera documents three installation paths, but Path A is the most convenient because Cloudera Manager gets installed in the same pass. The catch is that Cloudera's web server is very busy and connections drop easily, so even with Path A it pays to download all the required packages to the NameNode (nn.cdh) first and publish them on a local Apache web server for offline installation.


http://www.cloudera.com/content/cloudera/en/documentation/core/latest/images/cm_install_phases.jpg


Download the Cloudera Manager installer binary

Download links for all historical versions

Latest version download

5.4.7 download link



Download the other RPMS files required by Cloudera Manager

Note: because the connection drops so often, downloading with Firefox is more reliable than wget.
Command: wget -r -nd -l1 --no-parent -A ".rpm" http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.4.7/RPMS/x86_64/
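
If you still prefer the command line, a resumable retry loop can work around the dropped connections (a minimal sketch; the retry interval is arbitrary):

# keep retrying until the mirror completes; -c resumes partially downloaded files
until wget -c -r -nd -l1 --no-parent -A ".rpm" \
    http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.4.7/RPMS/x86_64/; do
  echo "connection dropped, retrying in 10s..."
  sleep 10
done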


Download the CDH 5.4.7 parcel files

Just download the files marked in the red box (screenshot omitted)




Offline installation

Fix some settings (/etc/hosts) (as root)

// Purpose: work around the following error during installation
// waiting for newly installed agent to heartbeat
// Installation failed. Failed to receive heartbeat from agent.


// NameNode: keep the 127.0.0.1 entries
$ sudo vi /etc/hosts
# <ip>  <FQDN>  <shortname>
192.168.190.140         nn.cdh       nn.cdh
192.168.190.141         dn01.cdh     dn01.cdh
192.168.190.142         dn02.cdh     dn02.cdh
192.168.190.143         dn03.cdh     dn03.cdh
#
127.0.0.1   localhost localhost.localdomain
::1         localhost localhost.localdomain


// DataNode: do not keep the 127.0.0.1 entries
$ sudo vi /etc/hosts
# 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
# ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# <ip> <FQDN> <shortname>
192.168.190.140         nn.cdh       nn.cdh
192.168.190.141         dn01.cdh     dn01.cdh
192.168.190.142         dn02.cdh     dn02.cdh
192.168.190.143         dn03.cdh     dn03.cdh


// Tip: how to sync /etc/hosts to the other nodes (assuming we are currently on dn01.cdh)
$ sudo scp /etc/hosts dn02.cdh:/etc
$ sudo scp /etc/hosts dn03.cdh:/etc
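
To confirm every node resolves every other node correctly after syncing, a quick loop helps (a sketch, assuming the passwordless SSH set up in Part 1):

# from any node: print each host's FQDN and how it resolves itself
for h in nn.cdh dn01.cdh dn02.cdh dn03.cdh; do
  ssh "$h" 'hostname -f; getent hosts "$(hostname -f)"'
done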


Upload the Cloudera Manager installer

[hadoop@nn ~]$ mkdir -p ~/01-install_folder/00-cm
[hadoop@nn ~]$ cd ~/01-install_folder/00-cm
  • Upload the previously downloaded installer with WinSCP (screenshot omitted)
// make the installer executable
[hadoop@nn 00-cm]$ chmod u+x cloudera-manager-installer.bin


Localizing the RPMS (as the hadoop user)

Upload the RPMS

[hadoop@nn ~]$ mkdir -p ~/01-install_folder/01-yum/5.4.7/RPMS
[hadoop@nn ~]$ cd ~/01-install_folder/01-yum/5.4.7/RPMS
  • Upload the previously downloaded RPMS files with WinSCP (screenshot omitted)


yum repository paths

// change to the yum repository folder
[hadoop@nn ~]$ cd /etc/yum.repos.d


// see what files are there
[hadoop@nn yum.repos.d]$ ls -l
total 16
-rw-r--r--. 1 root root 1926 Nov 27  2013 CentOS-Base.repo
-rw-r--r--. 1 root root  638 Nov 27  2013 CentOS-Debuginfo.repo
-rw-r--r--. 1 root root  630 Nov 27  2013 CentOS-Media.repo
-rw-r--r--. 1 root root 3664 Nov 27  2013 CentOS-Vault.repo


// then take a look at CentOS-Base.repo (first 30 lines)
[hadoop@nn yum.repos.d]$ head -n 30  CentOS-Base.repo
# CentOS-Base.repo
...


[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6


#released updates
[updates]
name=CentOS-$releasever - Updates
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
#baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6


// list the currently configured yum repositories
[hadoop@nn yum.repos.d]$ yum repolist
Loaded plugins: fastestmirror, refresh-packagekit, security
Loading mirror speeds from cached hostfile
* base: ftp.ksu.edu.tw
* extras: ftp.ksu.edu.tw
* updates: mirror.neu.edu.cn
repo id            repo name                                status
base              CentOS-6 - Base                            6,575
extras             CentOS-6 - Extras                             43
updates           CentOS-6 - Updates                           427
repolist: 7,045


create repo

// install the createrepo package
[hadoop@nn ~]$ sudo yum install createrepo


// build the repodata
[hadoop@nn ~]$ mkdir -p 01-install_folder/01-yum/5.4.7/RPMS
[hadoop@nn ~]$ cd 01-install_folder/01-yum/5.4.7/RPMS
[hadoop@nn RPMS]$ createrepo .
Spawning worker 0 with 7 pkgs
Workers Finished
Gathering worker results


Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete


// check the repodata folder
[hadoop@nn RPMS]$ ls -l repodata
total 228
-rw-rw-r-- 1 hadoop hadoop   1059 Sep 27 15:58 01bacfa3719004ab32a80ed98da86f2929e4a0dc27eaf5795a0ff0c65afbf9a2-other.sqlite.bz2
-rw-rw-r-- 1 hadoop hadoop  10685 Sep 27 15:58 29c8360a7333b2ea99548825c816e8b9d9356ac108898726ff8acfe313556642-primary.sqlite.bz2
-rw-rw-r-- 1 hadoop hadoop    600 Sep 27 15:58 67147f98e6f617b6aefa727c64b8d11fd3e575c7d35d0b1d6d1844c039b0853f-other.xml.gz
-rw-rw-r-- 1 hadoop hadoop   3993 Sep 27 15:58 9e8564278bb0c20bb47eed128a66c86cb062fb2d41f90e7870ec98c0e03c0f7a-primary.xml.gz
-rw-rw-r-- 1 hadoop hadoop  96002 Sep 27 15:58 ab5d3dca63f87a3efeaf7d3a6464f20277169e57fae92c0a97f093f750843cc0-filelists.xml.gz
-rw-rw-r-- 1 hadoop hadoop 103192 Sep 27 15:58 af69947ed9180d37ff170b4f09ed0ff62dc1e0425b609e50b793b8deedec8beb-filelists.sqlite.bz2
-rw-rw-r-- 1 hadoop hadoop   2988 Sep 27 15:58 repomd.xml
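
If you later add or replace RPMs in this folder, remember to refresh the metadata; createrepo has an --update flag for exactly that (sketch):

[hadoop@nn RPMS]$ createrepo --update .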


Publish the RPMs to the Apache web server

[hadoop@nn ~]$ cd /var/www/html
[hadoop@nn html]$ sudo mkdir -p RPMS
// [Move] OR [Copy]
// [Move]
[hadoop@nn ~]$ sudo mv ~/01-install_folder/01-yum/5.4.7/RPMS/* /var/www/html/RPMS
// [Copy]
[hadoop@nn ~]$ sudo cp -r ~/01-install_folder/01-yum/5.4.7/RPMS/* /var/www/html/RPMS


// fix folder permissions: read + execute for user / group / other
[hadoop@nn ~]$ sudo chmod -R ugo+rx /var/www/html/RPMS


// test this URL with Firefox
http://nn.cdh/RPMS/
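
To verify from a shell instead of Firefox (a sketch, assuming curl is available):

[hadoop@nn ~]$ curl -I http://nn.cdh/RPMS/
# HTTP/1.1 200 OK means Apache is serving the folder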


Create a cdh5.repo file pointing at the web server path above



[hadoop@nn ~]$ cd /etc/yum.repos.d
[hadoop@nn yum.repos.d]$ sudo vi cdh5.repo
[cdh5]
name=cdh5
baseurl=http://nn.cdh/RPMS
enabled=1
gpgcheck=0


[hadoop@nn yum.repos.d]$ ls -l
total 20
-rw-r--r--  1 root root   78 Sep 27 16:26 cdh5.repo
-rw-r--r--. 1 root root 1926 Nov 27  2013 CentOS-Base.repo
-rw-r--r--. 1 root root  638 Nov 27  2013 CentOS-Debuginfo.repo
-rw-r--r--. 1 root root  630 Nov 27  2013 CentOS-Media.repo
-rw-r--r--. 1 root root 3664 Nov 27  2013 CentOS-Vault.repo


Copy the cdh5.repo file to every node!

[hadoop@nn ~]$ sudo scp /etc/yum.repos.d/cdh5.repo dn01.cdh:/etc/yum.repos.d
[hadoop@nn ~]$ sudo scp /etc/yum.repos.d/cdh5.repo dn02.cdh:/etc/yum.repos.d
[hadoop@nn ~]$ sudo scp /etc/yum.repos.d/cdh5.repo dn03.cdh:/etc/yum.repos.d
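
To confirm each node actually picks up the new repo, a quick check to run on every host (sketch):

$ sudo yum clean all
$ yum repolist | grep cdh5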


Localizing the parcels (as the hadoop user)

Upload the parcels

[hadoop@nn ~]$ mkdir -p ~/01-install_folder/02-parcels/5.4.7.3/
[hadoop@nn ~]$ cd ~/01-install_folder/02-parcels/5.4.7.3/
  • Upload the previously downloaded parcel files with WinSCP (screenshot omitted)


Publish the parcels to the Apache web server

[hadoop@nn ~]$ cd /var/www/html
[hadoop@nn html]$ sudo mkdir -p parcels/5.4.7.3
// [Move] or [Copy]
// [Move]
[hadoop@nn ~]$ sudo mv ~/01-install_folder/02-parcels/5.4.7.3/* /var/www/html/parcels/5.4.7.3
// [Copy]
[hadoop@nn ~]$ sudo cp -r ~/01-install_folder/02-parcels/5.4.7.3/* /var/www/html/parcels/5.4.7.3


// fix folder permissions: read + execute for user / group / other
[hadoop@nn ~]$ sudo chmod -R ugo+rx /var/www/html/parcels
// test this URL with Firefox
http://nn.cdh/parcels/
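
Cloudera Manager discovers parcels through the manifest.json in the served folder, so it is worth confirming that file is reachable as well (a sketch, assuming manifest.json was among the files downloaded earlier):

[hadoop@nn ~]$ curl -s http://nn.cdh/parcels/5.4.7.3/manifest.json | head -n 5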

Time to start the installation

Install the Oracle JDK (on every node) (this was not done on the template VM)

[hadoop@nn ~]$ cd ~/01-install_folder/04-java
[hadoop@nn 04-java]$ sudo yum install ./jdk-7u75-linux-x64.rpm


[root@nn ~]# vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.7.0_75
export PATH=$JAVA_HOME/bin:$PATH


[root@nn ~]# scp /etc/profile dn01.cdh:/etc
[root@nn ~]# scp /etc/profile dn02.cdh:/etc
[root@nn ~]# scp /etc/profile dn03.cdh:/etc
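
After copying the profile, a quick sanity check on each node (sketch):

$ source /etc/profile
$ echo $JAVA_HOME
/usr/java/jdk1.7.0_75
$ java -version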


Install Cloudera Manager (cloudera-manager-installer.bin)



Cloudera Manager (CM) itself can only be installed from packages, while CDH and Impala can be installed from either packages or parcels.


By default the installer downloads its packages from the Internet. Since the RPMS and parcels have already been localized as described above, just add the --skip_repo_package=1 flag when running the installer and it will install from the local packages. The shell commands are:


$ cd ~/01-install_folder/00-cm
// make the installer executable
$ chmod u+x cloudera-manager-installer.bin
// run it
$ sudo ./cloudera-manager-installer.bin  --skip_repo_package=1
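
Once the installer finishes, the Cloudera Manager server should be running, with its web UI on the default port 7180 (sketch; admin/admin is the default login):

$ sudo service cloudera-scm-server status
# then browse to http://nn.cdh:7180 and log in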




Install the CDH cluster (part 1) (log in and select the nodes)



Install the CDH cluster (part 2) (parcel settings)









Install the CDH cluster (part 3) (install the RPMS)





**** Errors you may run into here ****


// If the error above appears, fix /etc/hosts as described earlier
// waiting for newly installed agent to heartbeat
// Installation failed. Failed to receive heartbeat from agent.
//
// Finally got past this step
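
For the record, when this heartbeat error appears, the agent log on the failing node usually shows the reason (the standard log location for the cloudera-scm-agent package):

$ sudo tail -f /var/log/cloudera-scm-agent/cloudera-scm-agent.log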


Install the CDH cluster (part 4) (install the parcels)







Install the CDH cluster (part 5) (environment checks)





Cluster Installation
Inspect hosts for correctness
Validations
Inspector ran on all 4 hosts.
The following failures were observed in checking hostnames...
No errors were found while looking for conflicting init scripts.
No errors were found while checking /etc/hosts.
All hosts resolved localhost to 127.0.0.1.
All hosts checked resolved each other's hostnames correctly and in a timely manner.
Host clocks are approximately in sync (within ten minutes).
Host time zones are consistent across the cluster.
No users or groups are missing.
No conflicts detected between packages and parcels.
No kernel versions that are known to be bad are running.
Cloudera recommends setting /proc/sys/vm/swappiness to at most 10. Current setting is 60. Use the sysctl command to change this setting at runtime and edit /etc/sysctl.conf for this setting to be saved after a reboot. You may continue with installation, but you may run into issues with Cloudera Manager reporting that your hosts are unhealthy because they are swapping. The following hosts are affected:
dn[01-03].cdh; nn.cdh
Transparent Huge Pages is enabled and can cause significant performance problems. Kernel with release 'CentOS release 6.5 (Final)' and version '2.6.32-431.el6.x86_64' has enabled set to '[always] madvise never' and defrag set to '[always] madvise never'. Run "echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag" to disable this, then add the same command to an init script such as /etc/rc.local so it will be set upon system reboot. Alternatively, upgrade to RHEL 6.5 or later, which does not have this bug. The following hosts are affected:
dn[01-03].cdh; nn.cdh
CDH 5 Hue Python version dependency is satisfied.
0 hosts are running CDH 4 and 3 hosts are running CDH5.
There are mismatched versions across the system, which will cause failures. See below for details on which hosts are running what versions of components.
All managed hosts have consistent versions of Java.
All checked Cloudera Management Daemons versions are consistent with the server.
All checked Cloudera Management Agents versions are consistent with the server.


The swappiness issue
[hadoop@nn ~]$ su root
Password:
[root@nn hadoop]# cd /proc/sys/vm
[root@nn vm]# echo 10 > swappiness
[root@nn vm]# cat swappiness
10
[root@nn vm]# sysctl -w vm.swappiness=10   // runtime only, same effect as the echo above
// to survive a reboot, also add "vm.swappiness = 10" to /etc/sysctl.conf
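
The inspector flagged all four hosts, so the same change is needed on every node (a sketch, assuming passwordless ssh and sudo rights on each host):

for h in dn01.cdh dn02.cdh dn03.cdh; do
  ssh -t "$h" "sudo sysctl -w vm.swappiness=10 && \
               echo 'vm.swappiness = 10' | sudo tee -a /etc/sysctl.conf"
done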


The Transparent Huge Pages issue
[root@nn ~]# echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
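
As the inspector message itself suggests, persist this across reboots by appending the same command to /etc/rc.local, and repeat on every node (sketch):

[root@nn ~]# echo 'echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag' >> /etc/rc.local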


Install the CDH cluster (part 6) (install the services)









Hive
-------------------------
Database Host Name: nn.cdh:7432
Database Name: hive
Username: hive
Password: hive

Reports Manager
-------------------------
Database Host Name: nn.cdh:7432
Database Name: rman
Username: rman
Password: GPnbLSarc4 / sAltYqeopz

Navigator Audit Server
-------------------------
Database Host Name: nn.cdh:7432
Database Name: nav
Username: nav
Password: GzjVvEKsJI

Navigator Metadata Server
-------------------------
Database Host Name: nn.cdh:7432
Database Name: navms
Username: navms
Password: 7HGg7EbQ2e

Oozie Server
-------------------------
Database Host Name: nn.cdh:7432
Database Name: oozie_oozie_server
Username: oozie_oozie_server
Password: oozie
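
All of these point at the embedded PostgreSQL that Cloudera Manager runs on port 7432; connectivity can be sanity-checked from the shell (a hypothetical check, assuming the postgresql client is installed):

$ psql -h nn.cdh -p 7432 -U hive hive
# enter the password shown above (hive) when prompted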


// review some settings




// the configuration run begins


// finally done; what a relief ...


// a few issues still remain, but at least the installation process is complete


















Comments:

  1. You're welcome; I hope this helps you.

  2. As we know, Big data platform managed service is the future of the industries these days, this article helps me to figure out which language I need to learn to pursue the future in this field.

     Reply:
     1. Thanks a lot for your information.
        Sorry, I'm working with ASP.NET MVC, Web API, and SQL Server now, and haven't focused on CDH (Hadoop) for a long time...
        If I have time, I will study the reference URL you provided.