Integrating Flink into CM6
Since CDH6 does not ship Flink officially, we compile Flink against CDH ourselves and package it as a parcel plus a CSD file for Cloudera Manager.
1. Upload the Flink packages
Upload the Flink 1.14.6 binary package, the Flink source package, and the Maven package to /opt/software. The Scala version must be 2.11 (the parcel tooling used below does not support 2.12).
[root@hadoop105 ~]# mkdir /opt/software /opt/module && cd /opt/software
## after the upload finishes
[root@hadoop105 software]# ll
total 2602952
-rw-r--r-- 1 root root 9506321 Feb 15 2023 apache-maven-3.6.3-bin.tar.gz
-rw-r--r-- 1 root root 314015504 Feb 15 2023 flink-1.14.6-bin-scala_2.11.tgz
-rw-r--r-- 1 root root 30661804 Feb 15 2023 flink-1.14.6-src.tgz
-rw-r--r-- 1 root root 2311231436 Feb 15 2023 repository.zip
2. Install Maven
2.1 Extract and install
[root@hadoop105 software]# tar -zxvf apache-maven-3.6.3-bin.tar.gz -C /opt/module
Add the environment variables:
[root@hadoop105 software]# vim /etc/profile.d/my_env.sh
## add the following variables
#MAVEN_HOME
export MAVEN_HOME=/opt/module/apache-maven-3.6.3
export PATH=$PATH:$MAVEN_HOME/bin
## save and exit, then load the configuration
[root@hadoop105 software]# source /etc/profile.d/my_env.sh
## verify the installation
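The standard check is mvn -version (output abbreviated):
[root@hadoop105 software]# mvn -version
## should report Apache Maven 3.6.3 and the JDK in use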
2.2 Configure the Maven mirror
[root@hadoop105 software]# cd /opt/module/apache-maven-3.6.3/conf/
[root@hadoop105 conf]# rm -rf settings.xml
[root@hadoop105 conf]# wget https://bigdata-1308108754.cos.ap-guangzhou.myqcloud.com/conf/maven-setting/settings.xml
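The downloaded settings.xml pre-configures a mirror for faster artifact downloads. If that URL is unreachable, an equivalent block can be added by hand inside <mirrors> (illustrative sketch assuming the Aliyun public mirror; the exact contents of the hosted file may differ):
<mirror>
    <id>aliyunmaven</id>
    <mirrorOf>central</mirrorOf>
    <name>Aliyun central mirror</name>
    <url>https://maven.aliyun.com/repository/public</url>
</mirror>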
3. Build configuration
3.1 Extract the Flink source package
[root@hadoop105 software]# tar -zxvf flink-1.14.6-src.tgz -C /opt/module
[root@hadoop105 software]# mv /opt/module/flink-1.14.6/ /opt/module/flink-1.14.6-src
[root@hadoop105 software]# tar -zxvf flink-1.14.6-bin-scala_2.11.tgz -C /opt/module
3.2 Edit the pom.xml files
Edit the top-level pom.xml in flink-1.14.6-src:
[root@hadoop105 flink-1.14.6-src]# vim /opt/module/flink-1.14.6-src/pom.xml
## change the hadoop version (line 96)
<hadoop.version>3.0.0-cdh6.3.2</hadoop.version>
## change the hive version (line 149)
<hive.version>2.1.1-cdh6.3.2</hive.version>
## line 156
<hivemetastore.hadoop.version>3.0.0-cdh6.3.2</hivemetastore.hadoop.version>
## after the </build> tag and before </project>, add:
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
<repository>
<id>confluent-repo</id>
<url>https://packages.confluent.io/maven/</url>
</repository>
</repositories>
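Before compiling, a quick sanity check that the version overrides are in place (line numbers may differ slightly between 1.14.x source drops):
[root@hadoop105 flink-1.14.6-src]# grep -n 'cdh6.3.2' pom.xml
## expect hadoop.version and hivemetastore.hadoop.version to show 3.0.0-cdh6.3.2, and hive.version to show 2.1.1-cdh6.3.2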
Edit the pom.xml of flink-sql-connector-hive-2.2.0:
[root@hadoop105 flink-1.14.6-src]# vim flink-connectors/flink-sql-connector-hive-2.2.0/pom.xml
## change the hive-exec version (line 48)
<version>2.1.1-cdh6.3.2</version>
3.3 Extract the Maven repository archive
[root@hadoop105 software]# unzip repository.zip
[root@hadoop105 module]# vim apache-maven-3.6.3/conf/settings.xml
## set the local repository path
<localRepository>/opt/software/repository</localRepository>
4. Compile Flink
[root@hadoop105 flink-1.14.6-src]# mvn clean install -DskipTests -Dfast -Drat.skip=true -Dhadoop.version=3.0.0-cdh6.3.2 -Dinclude-hadoop -Dscala-2.11 -T10C
After the build, the flink-dist/target/flink-1.14.6-bin/flink-1.14.6 directory is the equivalent of the official binary distribution.
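A quick check that the distribution was assembled (the layout follows the build output described above):
[root@hadoop105 flink-1.14.6-src]# ls flink-dist/target/flink-1.14.6-bin/flink-1.14.6
## expect bin, conf, examples, lib, opt, plugins among the entries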
Copy the freshly built connector jar and the required CDH jars into the lib directory of the Flink runtime package:
cp /opt/module/flink-1.14.6-src/flink-connectors/flink-sql-connector-hive-2.2.0/target/flink-sql-connector-hive-2.2.0_2.11-1.14.6.jar /opt/module/flink-1.14.6/lib/
cp /opt/cloudera/parcels/CDH/jars/hive-exec-2.1.1-cdh6.3.2.jar /opt/module/flink-1.14.6/lib/
cp /opt/cloudera/parcels/CDH/jars/libfb303-0.9.3.jar /opt/module/flink-1.14.6/lib/
cp /opt/cloudera/parcels/CDH/jars/hadoop-common-3.0.0-cdh6.3.2.jar /opt/module/flink-1.14.6/lib/
cp /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-common-3.0.0-cdh6.3.2.jar /opt/module/flink-1.14.6/lib/
cp /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-core-3.0.0-cdh6.3.2.jar /opt/module/flink-1.14.6/lib/
cp /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-hs-3.0.0-cdh6.3.2.jar /opt/module/flink-1.14.6/lib/
cp /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.3.2.jar /opt/module/flink-1.14.6/lib/
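Equivalently, the five Hadoop client jars can be copied in one loop (same sources and destination as above):
for j in common mapreduce-client-common mapreduce-client-core mapreduce-client-hs mapreduce-client-jobclient; do
    cp /opt/cloudera/parcels/CDH/jars/hadoop-${j}-3.0.0-cdh6.3.2.jar /opt/module/flink-1.14.6/lib/
done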
After copying, /opt/module/flink-1.14.6/lib should contain the following jars:
[root@hadoop105 ~]# ll /opt/module/flink-1.14.6/lib
total 276272
-rw-r--r-- 1 502 games 85586 Sep 10 2022 flink-csv-1.14.6.jar
-rw-r--r-- 1 502 games 143701139 Sep 10 2022 flink-dist_2.11-1.14.6.jar
-rw-r--r-- 1 502 games 153148 Sep 10 2022 flink-json-1.14.6.jar
-rw-r--r-- 1 502 games 7709731 Jun 9 2022 flink-shaded-zookeeper-3.4.14.jar
-rw-r--r-- 1 root root 43784906 Mar 7 08:41 flink-sql-connector-hive-2.2.0_2.11-1.14.6.jar
-rw-r--r-- 1 502 games 42348280 Sep 10 2022 flink-table_2.11-1.14.6.jar
-rw-r--r-- 1 root root 3993717 Mar 7 08:36 hadoop-common-3.0.0-cdh6.3.2.jar
-rw-r--r-- 1 root root 770504 Mar 7 08:36 hadoop-mapreduce-client-common-3.0.0-cdh6.3.2.jar
-rw-r--r-- 1 root root 1644597 Mar 7 08:36 hadoop-mapreduce-client-core-3.0.0-cdh6.3.2.jar
-rw-r--r-- 1 root root 182762 Mar 7 08:36 hadoop-mapreduce-client-hs-3.0.0-cdh6.3.2.jar
-rw-r--r-- 1 root root 51302 Mar 7 08:36 hadoop-mapreduce-client-jobclient-3.0.0-cdh6.3.2.jar
-rw-r--r-- 1 root root 35803898 Mar 7 08:40 hive-exec-2.1.1-cdh6.3.2.jar
-rw-r--r-- 1 root root 313702 Mar 7 08:36 libfb303-0.9.3.jar
-rw-r--r-- 1 502 games 208006 Jun 9 2022 log4j-1.2-api-2.17.1.jar
-rw-r--r-- 1 502 games 301872 Jun 9 2022 log4j-api-2.17.1.jar
-rw-r--r-- 1 502 games 1790452 Jun 9 2022 log4j-core-2.17.1.jar
-rw-r--r-- 1 502 games 24279 Jun 9 2022 log4j-slf4j-impl-2.17.1.jar
5. Build the parcel and CSD file
Compress the complete Flink package:
[root@hadoop105 ~]# cd /opt/module
[root@hadoop105 module]# tar -zcvf flink-1.14.6-cdh6.3.2.tgz flink-1.14.6
Install git (see the git installation steps in the CI/CD chapter), then download the packaging tool source:
[root@hadoop105 module]# git clone https://github.com/YUjichang/flink-parcel.git
(a mirror such as https://gitclone.com/github.com/YUjichang/flink-parcel.git can be substituted for faster access)
Edit the script configuration (keep values free of spaces around the = signs):
[root@hadoop105 module]# cd flink-parcel/
[root@hadoop105 flink-parcel]# vim flink-parcel.properties
# Flink package location
FLINK_URL=/opt/module/flink-1.14.6-cdh6.3.2.tgz
# Flink version
FLINK_VERSION=1.14.6
# extension version (appended to the parcel name)
EXTENS_VERSION=CDH6.3.2
# OS version (CentOS 7 here)
OS_VERSION=7
# supported CDH full version range
CDH_MIN_FULL=6.0
CDH_MAX_FULL=6.4
# supported CDH major version range
CDH_MIN=5
CDH_MAX=6
Run the build.sh script to build the parcel and the CSD:
[root@hadoop105 flink-parcel]# ./build.sh parcel
[root@hadoop105 flink-parcel]# ./build.sh csd
When the build finishes, the Flink parcel and CSD files have been generated:
[root@hadoop105 flink-parcel]# ll
total 434184
drwxr-xr-x 5 root root 4096 Mar 7 08:43 aux
drwxr-xr-x 2 root root 4096 Mar 7 08:43 bin
-rwxr-xr-x 1 root root 4028 Mar 7 08:43 build.sh
drwxr-xr-x 5 root root 4096 Mar 7 08:43 cm_ext
drwxr-xr-x 2 root root 4096 Mar 7 08:43 descriptor
drwxr-xr-x 3 root root 4096 Mar 7 08:43 etc
drwxr-xr-x 6 root root 4096 Mar 7 08:45 FLINK-1.14.6-CDH6.3.2
drwxr-xr-x 2 root root 4096 Mar 7 08:46 FLINK-1.14.6-CDH6.3.2_build
-rw-r--r-- 1 root root 444518251 Mar 7 08:45 flink-1.14.6-cdh6.3.2.tgz
drwxr-xr-x 6 root root 4096 Mar 7 08:46 flink_csd_build
-rw-r--r-- 1 root root 21123 Mar 7 08:46 FLINK_ON_YARN-1.14.6.jar
-rwxr-xr-x 1 root root 286 Mar 7 08:45 flink-parcel.properties
drwxr-xr-x 2 root root 4096 Mar 7 08:43 images
drwxr-xr-x 2 root root 4096 Mar 7 08:43 meta
-rw-r--r-- 1 root root 371 Mar 7 08:43 README.md
drwxr-xr-x 2 root root 4096 Mar 7 08:43 scripts
Note the new FLINK-1.14.6-CDH6.3.2_build directory and the FLINK_ON_YARN-1.14.6.jar file; the parcel itself has been generated inside the FLINK-1.14.6-CDH6.3.2_build directory.
[root@hadoop105 flink-parcel]# cd FLINK-1.14.6-CDH6.3.2_build
[root@hadoop105 FLINK-1.14.6-CDH6.3.2_build]# ll
total 438596
-rw-r--r-- 1 root root 449108689 Mar 7 08:46 FLINK-1.14.6-CDH6.3.2-el7.parcel
-rw-r--r-- 1 root root 40 Mar 7 08:46 FLINK-1.14.6-CDH6.3.2-el7.parcel.sha
-rw-r--r-- 1 root root 840 Mar 7 08:46 manifest.json
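CM validates the parcel against the .sha file when distributing it; you can verify it by hand first (assuming the .sha holds a bare SHA-1 hash, which is what the packaging tool generates):
[root@hadoop105 FLINK-1.14.6-CDH6.3.2_build]# sha1sum FLINK-1.14.6-CDH6.3.2-el7.parcel
[root@hadoop105 FLINK-1.14.6-CDH6.3.2_build]# cat FLINK-1.14.6-CDH6.3.2-el7.parcel.sha
## the two hash values should match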
6. Add the Flink packages to CM
- Copy the parcel files into Cloudera's parcel-repo:
[root@hadoop105 FLINK-1.14.6-CDH6.3.2_build]# scp FLINK-1.14.6-CDH6.3.2-el7.parcel* root@hadoop101:/opt/cloudera/parcel-repo/
[root@hadoop105 FLINK-1.14.6-CDH6.3.2_build]# cd ../
- Add the Flink parcel's entry to the manifest.json in Cloudera's parcel-repo:
## back on hadoop101
[root@hadoop101 parcel-repo]# vim manifest.json
## merge the contents of manifest.json from hadoop105's FLINK-1.14.6-CDH6.3.2_build directory into this file on hadoop101
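Only the entry inside the "parcels" array needs to be appended; its shape looks like the sketch below (field values are illustrative, copy the real ones from the generated manifest.json):
{
    "parcelName": "FLINK-1.14.6-CDH6.3.2-el7.parcel",
    "components": [
        { "name": "flink", "version": "1.14.6-CDH6.3.2", "pkg_version": "1.14.6" }
    ],
    "hash": "<value from FLINK-1.14.6-CDH6.3.2-el7.parcel.sha>"
}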
- Copy FLINK_ON_YARN-1.14.6.jar into Cloudera's csd directory on hadoop101:
[root@hadoop105 flink-parcel]# scp FLINK_ON_YARN-1.14.6.jar root@hadoop101:/opt/cloudera/csd/
- Restart CM on hadoop101:
[root@hadoop101 parcel-repo]# systemctl restart cloudera-scm-server
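If the CM web UI does not come back promptly, check the service and its log (standard CM server paths):
[root@hadoop101 parcel-repo]# systemctl status cloudera-scm-server
[root@hadoop101 parcel-repo]# tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log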
7. Add the Flink service in CM
After the restart, log in to CM and go to Clusters -> Parcels: Flink now shows up in the Parcels list:
Click Distribute on the right, then Activate:
Once activated, the Flink parcel is ready to use:
Back on the home page, click Add Service:
The Flink service is now listed:
Click Continue and select the dependency set without Hive:
We install Flink on hadoop105; click Continue:
Keep the default configuration and click Continue:
The Flink installation and startup page appears:
When startup finishes, click Finish:
Back on the home page, notice that Flink is not yet managed by CDH:
Click to restart CM and enter the configuration confirmation page:
Click the Restart CM Server button:
After the restart completes, click Finish: the Flink service is now being monitored:
8. Verify Flink functionality
8.1 Run a WordCount test in yarn-per-job mode
[root@hadoop101 flink-1.14.6-cdh6.3.2]# chmod 777 /opt/cloudera/parcels/FLINK/bin/flink
[root@hadoop101 flink-1.14.6-cdh6.3.2]# sudo -u hdfs /opt/cloudera/parcels/FLINK/bin/flink run -t yarn-per-job /opt/cloudera/parcels/FLINK/lib/flink/examples/batch/WordCount.jar
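If the default 2048 MB containers (visible in the submission logs later) do not fit your YARN queue, the memory sizes can be set explicitly; these are standard Flink 1.14 configuration options passed as dynamic properties:
sudo -u hdfs /opt/cloudera/parcels/FLINK/bin/flink run -t yarn-per-job \
    -Djobmanager.memory.process.size=1024m \
    -Dtaskmanager.memory.process.size=1024m \
    /opt/cloudera/parcels/FLINK/lib/flink/examples/batch/WordCount.jar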
In CM, open the YARN service page, choose the Web UI menu, and click through to the ResourceManager page:
The Flink job shows as having run successfully.
8.2 Verify Hive through Flink SQL
Upload the hive-site.xml file to HDFS.
## this must be done with hdfs superuser privileges
[root@hadoop101 ~]# vim /etc/passwd
## allow the hdfs user to log in: change /sbin/nologin to /bin/bash
hdfs:x:992:989:Hadoop HDFS:/var/lib/hadoop-hdfs:/bin/bash
[root@hadoop101 ~]# su - hdfs
[hdfs@hadoop101 ~]$ hadoop fs -mkdir /hiveconf/
[hdfs@hadoop101 ~]$ hadoop fs -put /etc/hive/conf.cloudera.hive/hive-site.xml /hiveconf/
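Verify that the config landed in HDFS, and note the schema assumed for the test table (the DDL is inferred from the insert and query results below; the original creates the table through the Hue UI):
[hdfs@hadoop101 ~]$ hadoop fs -ls /hiveconf/
## In Hue's Hive editor:
## CREATE TABLE t_user (id INT, name STRING);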
Create the table t_user in Hue, then, still as the hdfs user, start the Flink SQL client:
[hdfs@hadoop101 ~]$ /opt/cloudera/parcels/FLINK/bin/flink-sql-client -s yarn-session
Then work in the interactive SQL console:
Flink SQL> CREATE CATALOG myhive WITH (
> 'type' = 'hive',
> 'hive-version'='2.1.1',
> 'default-database' = 'default',
> 'hive-conf-dir' = 'hdfs://hadoop101:8020/hiveconf/'
> );
[INFO] Execute statement succeed.
Flink SQL> show catalogs;
+-----------------+
| catalog name |
+-----------------+
| default_catalog |
| myhive |
+-----------------+
2 rows in set
Flink SQL> USE CATALOG myhive;
[INFO] Execute statement succeed.
Flink SQL> show tables;
+------------+
| table name |
+------------+
| t_user |
+------------+
1 row in set
Flink SQL> insert into t_user values(1001, 'jack');
[INFO] Submitting SQL update statement to the cluster...
2025-03-07 11:47:43,694 WARN org.apache.flink.yarn.configuration.YarnLogConfigUtil [] - The configuration directory ('/etc/flink/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2025-03-07 11:47:43,835 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2025-03-07 11:47:43,892 INFO org.apache.hadoop.conf.Configuration [] - resource-types.xml not found
2025-03-07 11:47:43,892 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils [] - Unable to find 'resource-types.xml'.
2025-03-07 11:47:43,921 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Cluster specification: ClusterSpecification{masterMemoryMB=2048, taskManagerMemoryMB=2048, slotsPerTaskManager=1}
2025-03-07 11:47:45,879 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Submitting application master application_1741262007380_0010
2025-03-07 11:47:46,109 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl [] - Submitted application application_1741262007380_0010
2025-03-07 11:47:46,109 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Waiting for the cluster to be allocated
2025-03-07 11:47:46,110 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deploying cluster, current state ACCEPTED
2025-03-07 11:47:51,641 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - YARN application has been deployed successfully.
2025-03-07 11:47:51,642 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Found Web Interface hadoop103:8080 of application 'application_1741262007380_0010'.
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: 77f07cc61645af7f38ffddc8419021f1
Flink SQL> SET sql-client.execution.result-mode=tableau;
[INFO] Session property has been set.
Flink SQL> select * from t_user;
2025-03-07 11:49:18,997 INFO org.apache.hadoop.mapred.FileInputFormat [] - Total input files to process : 1
2025-03-07 11:49:19,146 WARN org.apache.flink.yarn.configuration.YarnLogConfigUtil [] - The configuration directory ('/etc/flink/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2025-03-07 11:49:19,162 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2025-03-07 11:49:19,167 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Cluster specification: ClusterSpecification{masterMemoryMB=2048, taskManagerMemoryMB=2048, slotsPerTaskManager=1}
2025-03-07 11:49:20,848 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Submitting application master application_1741262007380_0011
2025-03-07 11:49:21,054 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl [] - Submitted application application_1741262007380_0011
2025-03-07 11:49:21,054 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Waiting for the cluster to be allocated
2025-03-07 11:49:21,055 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deploying cluster, current state ACCEPTED
2025-03-07 11:49:26,079 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - YARN application has been deployed successfully.
2025-03-07 11:49:26,079 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Found Web Interface hadoop103:8081 of application 'application_1741262007380_0011'.
+----+----------------------+--------------------------------+
| op | id | name |
+----+----------------------+--------------------------------+
| +I | 1001 | jack |
+----+----------------------+--------------------------------+
Received a total of 1 row
Open http://hadoop101:8088 and you can see the Flink job that was just executed: