This article walks through using HData, an ETL tool for importing and exporting data. The general command-line form is:
./bin/hdata --reader READER_NAME -Rk1=v1 -Rk2=v2 --writer WRITER_NAME -Wk1=v1 -Wk2=v2
./bin/hdata --reader jdbc -Rurl="jdbc:mysql://127.0.0.1:3306/testdb" -Rdriver="com.mysql.jdbc.Driver" -Rtable="testtable" -Rusername="username" -Rpassword="password" -Rparallelism=3 --writer hive -Wmetastore.uris="thrift://127.0.0.1:9083" -Whdfs.conf.path="/path/to/hdfs-site.xml" -Wdatabase="default" -Wtable="testtable" -Whadoop.user="hadoop" -Wparallelism=2
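The commands above follow one convention: reader options carry an `-R` prefix and writer options a `-W` prefix. As a rough sketch of that convention only (the helper name `build_hdata_args` is hypothetical and not part of HData), the argument list can be assembled from two dictionaries:

```python
def build_hdata_args(reader, reader_opts, writer, writer_opts):
    """Build an hdata argument list: -Rkey=value for the reader, -Wkey=value for the writer."""
    args = ["--reader", reader]
    args += [f"-R{key}={value}" for key, value in reader_opts.items()]
    args += ["--writer", writer]
    args += [f"-W{key}={value}" for key, value in writer_opts.items()]
    return args


# Option names are taken from the example command above.
example_args = build_hdata_args(
    "jdbc",
    {"url": "jdbc:mysql://127.0.0.1:3306/testdb", "table": "testtable", "parallelism": 3},
    "hive",
    {"metastore.uris": "thrift://127.0.0.1:9083", "table": "testtable", "parallelism": 2},
)
```

Passing a list like this to `subprocess.run(["./bin/hdata", *example_args])` avoids shell quoting problems with characters such as `&` in JDBC URLs.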
job.xml
<?xml version="1.0" encoding="UTF-8"?>
<job id="job_example">
<reader name="jdbc">
<url>jdbc:mysql://127.0.0.1:3306/testdb</url>
<driver>com.mysql.jdbc.Driver</driver>
<table>testtable</table>
<username>username</username>
<password>password</password>
<parallelism>3</parallelism>
</reader>
<writer name="hive">
<metastore.uris>thrift://127.0.0.1:9083</metastore.uris>
<hdfs.conf.path>/path/to/hdfs-site.xml</hdfs.conf.path>
<database>default</database>
<table>testtable</table>
<hadoop.user>hadoop</hadoop.user>
<parallelism>2</parallelism>
</writer>
</job>
./bin/hdata -f /path/to/job.xml
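The job file and the command-line form carry the same information: each child element of `<reader>` corresponds to an `-R` option, and each child of `<writer>` to a `-W` option. The sketch below illustrates that correspondence by turning a trimmed copy of the job file into the equivalent flags. It is only an illustration (the function name `job_to_args` is made up here) and does not claim to be how HData itself reads the file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the job.xml shown above (XML declaration omitted for brevity).
JOB_XML = """
<job id="job_example">
  <reader name="jdbc">
    <url>jdbc:mysql://127.0.0.1:3306/testdb</url>
    <table>testtable</table>
    <parallelism>3</parallelism>
  </reader>
  <writer name="hive">
    <metastore.uris>thrift://127.0.0.1:9083</metastore.uris>
    <table>testtable</table>
    <parallelism>2</parallelism>
  </writer>
</job>
""".strip()


def job_to_args(xml_text):
    """Translate a job file into the equivalent --reader/-R and --writer/-W flags."""
    root = ET.fromstring(xml_text)
    args = []
    for tag, flag, prefix in (("reader", "--reader", "-R"), ("writer", "--writer", "-W")):
        node = root.find(tag)
        args += [flag, node.get("name")]
        # Each child element becomes one prefixed key=value option.
        args += [f"{prefix}{child.tag}={child.text}" for child in node]
    return args


job_args = job_to_args(JOB_XML)
```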
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
<localRepository>D:/apache-maven-3.5.4/repository</localRepository>
<pluginGroups></pluginGroups>
<proxies></proxies>
<servers></servers>
<mirrors>
<!--
<mirror>
<id>alimaven</id>
<mirrorOf>*</mirrorOf>
<url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
<mirror>
<id>aliyunmaven</id>
<mirrorOf>*</mirrorOf>
<name>Aliyun Spring plugin repository</name>
<url>http://maven.aliyun.com/repository/spring-plugin</url>
</mirror>
-->
<mirror>
<mirrorOf>*</mirrorOf>
<name>mirror-all</name>
<url>http://mirrors.cloud.tencent.com/nexus/repository/maven-public/</url>
<id>custom</id>
</mirror>
<!--
<mirror>
<id>repo2</id>
<name>Mirror from Maven Repo2</name>
<url>http://repo.spring.io/plugins-release/</url>
<mirrorOf>central</mirrorOf>
</mirror>
-->
</mirrors>
<profiles>
<profile>
<id>jdk-1.7</id>
<activation>
<activeByDefault>true</activeByDefault>
<jdk>1.7</jdk>
</activation>
<properties>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
<maven.compiler.compilerVersion>1.7</maven.compiler.compilerVersion>
</properties>
</profile>
<profile>
<id>maven-home</id>
<repositories>
<repository>
<id>central</id>
<url>https://repo1.maven.org/maven2</url>
<releases>
<enabled>true</enabled>
<checksumPolicy>warn</checksumPolicy>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>central</id>
<url>https://repo1.maven.org/maven2</url>
</pluginRepository>
</pluginRepositories>
</profile>
</profiles>
</settings>
mvn clean package -Pmake-package --settings D:\apache-maven-3.5.4\conf\hdata-settings.xml -Dmaven.test.skip=true
....
....
.....
[INFO] Reading assembly descriptor: src/build/package.xml
[INFO] Building tar: D:\Workspaces\idea_2\HData-master\assembly\..\build\hdata-0.2.8.tar.gz
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] HData 0.2.8 ........................................ SUCCESS [ 18.604 s]
[INFO] hdata-api .......................................... SUCCESS [ 14.069 s]
[INFO] hdata-core ......................................... SUCCESS [ 4.442 s]
[INFO] hdata-console ...................................... SUCCESS [ 0.234 s]
[INFO] hdata-csv .......................................... SUCCESS [ 0.632 s]
[INFO] hdata-jdbc ......................................... SUCCESS [ 1.343 s]
[INFO] hdata-ftp .......................................... SUCCESS [ 0.753 s]
[INFO] hdata-http ......................................... SUCCESS [ 0.234 s]
[INFO] hdata-kafka ........................................ SUCCESS [ 6.452 s]
[INFO] hdata-hdfs ......................................... SUCCESS [ 20.008 s]
[INFO] hdata-hive ......................................... SUCCESS [ 30.733 s]
[INFO] hdata-hbase ........................................ SUCCESS [ 35.710 s]
[INFO] hdata-mongodb ...................................... SUCCESS [ 3.634 s]
[INFO] hdata-excel ........................................ SUCCESS [ 14.700 s]
[INFO] hdata-wit .......................................... SUCCESS [ 4.037 s]
[INFO] hdata-assembly 0.2.8 ............................... SUCCESS [ 17.239 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:53 min
[INFO] Finished at: 2020-06-16T12:12:33+08:00
[INFO] ------------------------------------------------------------------------
D:\Workspaces\idea_2\HData-master>dir
...
2020/06/16 12:12 <DIR> .
2020/06/16 12:12 <DIR> ..
2018/01/11 19:22 14 .gitignore
2020/06/16 15:45 <DIR> .idea
2020/06/16 12:12 <DIR> assembly
2020/06/15 13:49 <DIR> bin
2020/06/16 14:06 <DIR> build
2020/06/15 13:49 <DIR> conf
2020/06/15 13:49 <DIR> doc
2020/06/16 12:10 <DIR> hdata-api
2020/06/16 12:10 <DIR> hdata-console
2020/06/16 12:10 <DIR> hdata-core
2020/06/16 12:10 <DIR> hdata-csv
2020/06/16 12:12 <DIR> hdata-excel
2020/06/16 12:10 <DIR> hdata-ftp
2020/06/16 12:11 <DIR> hdata-hbase
2020/06/16 12:10 <DIR> hdata-hdfs
2020/06/16 12:11 <DIR> hdata-hive
2020/06/16 12:10 <DIR> hdata-http
2020/06/16 12:10 <DIR> hdata-jdbc
2020/06/16 12:10 <DIR> hdata-kafka
2020/06/16 12:11 <DIR> hdata-mongodb
2020/06/16 12:12 <DIR> hdata-wit
2020/06/15 13:52 574 hdata.iml
2020/06/16 11:44 5,337 pom.xml
2018/01/11 19:22 13,774 README.md
2020/06/16 12:10 <DIR> target
...
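The hdata-0.2.8.tar.gz archive is the packaged, runnable distribution: unpack it, change into its root directory, and commands can be run from there.
Unpack hdata-0.2.8.tar.gz and copy the result to D:/test/.
D:\test\hdata-0.2.8>dir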
....
2020/06/16 16:06 <DIR> .
2020/06/16 16:06 <DIR> ..
2020/06/16 14:06 <DIR> bin
2020/06/16 16:17 <DIR> conf
2020/06/16 14:06 <DIR> lib
2020/06/16 14:06 <DIR> plugins
...
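On a Windows machine without a tar tool at hand, the unpack step can be scripted with the Python standard library. This is a minimal sketch; the function name `unpack_distribution` is hypothetical, and the paths in the usage comment are the ones from this walkthrough.

```python
import tarfile
from pathlib import Path


def unpack_distribution(archive, dest):
    """Extract a .tar.gz archive into dest and return the top-level entry names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Only extract archives you built yourself or otherwise trust.
        tar.extractall(dest)
    return sorted(entry.name for entry in dest.iterdir())


# Usage with the paths from this article (not executed here):
# unpack_distribution(r"D:\Workspaces\idea_2\HData-master\build\hdata-0.2.8.tar.gz", r"D:\test")
```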
- [Sync data from MySQL to MySQL]
java -Xss256k -Xms1G -Xmx1G -Xmn512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:SoftRefLRUPolicyMSPerMB=0 -Dhdata.conf.dir="D:\test\hdata-0.2.8\conf" -Dlog4j.configurationFile=file:///D:\test\hdata-0.2.8\conf\log4j2.xml -classpath ".;D:\test\hdata-0.2.8\lib\*" "com.github.stuxuhai.hdata.CliDriver" -Dhttps.protocols=TLSv1.2 -Dfile.encoding="UTF-8" --reader jdbc -Rurl="jdbc:mysql://127.0.0.1:3306/gateway?characterEncoding=utf8&useSSL=false" -Rdriver="com.mysql.jdbc.Driver" -Rtable="client" -Rusername="root" -Rpassword="123456" --writer jdbc -Wurl="jdbc:mysql://192.168.1.35:3306/gateway?characterEncoding=utf8&useSSL=false" -Wdriver="com.mysql.jdbc.Driver" -Wtable="client" -Wusername="root" -Wpassword="root_it_123465"
hdata.bat -f D:/test/hdata-0.2.8/conf/mysqlToMysql.xml
<?xml version="1.0" encoding="UTF-8"?>
<job id="job_example">
<reader name="jdbc">
<url><![CDATA[jdbc:mysql://127.0.0.1:3306/gateway?characterEncoding=utf8&useSSL=false]]></url>
<driver>com.mysql.jdbc.Driver</driver>
<table>client</table>
<username>root</username>
<password>123456</password>
<parallelism>3</parallelism>
</reader>
<writer name="jdbc">
<url><![CDATA[jdbc:mysql://192.168.1.35:3306/gateway?characterEncoding=utf8&useSSL=false]]></url>
<driver>com.mysql.jdbc.Driver</driver>
<table>client</table>
<username>root</username>
<password>123456</password>
<parallelism>3</parallelism>
</writer>
</job>
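Job files like the one above can also be generated rather than written by hand. The sketch below builds one with Python's `xml.etree.ElementTree`; the helper name `build_job_xml` is hypothetical, and the element names are simply the ones used in this article's job files. ElementTree does not emit CDATA sections. It escapes `&` as `&amp;` instead, and any conforming XML parser reads back the same string either way.

```python
import xml.etree.ElementTree as ET


def build_job_xml(job_id, reader, reader_opts, writer, writer_opts):
    """Serialize reader/writer options into a <job> document (special characters are escaped)."""
    job = ET.Element("job", {"id": job_id})
    for tag, name, opts in (("reader", reader, reader_opts), ("writer", writer, writer_opts)):
        node = ET.SubElement(job, tag, {"name": name})
        for key, value in opts.items():
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(job, encoding="unicode")


URL = "jdbc:mysql://127.0.0.1:3306/gateway?characterEncoding=utf8&useSSL=false"
job_xml = build_job_xml(
    "job_example",
    "jdbc",
    {"url": URL, "driver": "com.mysql.jdbc.Driver", "table": "client"},
    "excel",
    {"path": "D://test//client2.xlsx", "include.column.names": "true"},
)
```

The returned string has no XML declaration; prepend one when writing the file if the consumer expects it.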
- [Sync data from MySQL to Excel]
java -Xss256k -Xms1G -Xmx1G -Xmn512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:SoftRefLRUPolicyMSPerMB=0 -Dhdata.conf.dir="D:\test\hdata-0.2.8\conf" -Dlog4j.configurationFile=file:///D:\test\hdata-0.2.8\conf\log4j2.xml -classpath ".;D:\test\hdata-0.2.8\lib\*" "com.github.stuxuhai.hdata.CliDriver" -Dhttps.protocols=TLSv1.2 -Dfile.encoding="UTF-8" --reader jdbc -Rurl="jdbc:mysql://127.0.0.1:3306/gateway?characterEncoding=utf8&useSSL=false" -Rdriver="com.mysql.jdbc.Driver" -Rtable="client" -Rusername="root" -Rpassword="123456" --writer excel -Wpath="D://test//client.xlsx" -Winclude.column.names="true"
hdata.bat -f D:/test/hdata-0.2.8/conf/mysqlToExcel.xml
<?xml version="1.0" encoding="UTF-8"?>
<job id="job_example">
<reader name="jdbc">
<url><![CDATA[jdbc:mysql://127.0.0.1:3306/gateway?characterEncoding=utf8&useSSL=false]]></url>
<driver>com.mysql.jdbc.Driver</driver>
<table>client</table>
<username>root</username>
<password>123456</password>
<parallelism>3</parallelism>
</reader>
<writer name="excel">
<path><![CDATA[D://test//client2.xlsx]]></path>
<include.column.names>true</include.column.names>
</writer>
</job>
- [Sync data from HTTP to Excel]
java -Xss256k -Xms1G -Xmx1G -Xmn512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:SoftRefLRUPolicyMSPerMB=0 -Dhdata.conf.dir="D:\test\hdata-0.2.8\conf" -Dlog4j.configurationFile=file:///D:\test\hdata-0.2.8\conf\log4j2.xml -classpath ".;D:\test\hdata-0.2.8\lib\*" "com.github.stuxuhai.hdata.CliDriver" -Dhttps.protocols=TLSv1.2 -Dfile.encoding="UTF-8" --reader http -Rurl="https://www.baidu.com/" --writer excel -Wpath="D://test//html2.xlsx" -Winclude.column.names="true"
hdata.bat -f D:/test/hdata-0.2.8/conf/httpToExcel.xml
<?xml version="1.0" encoding="UTF-8"?>
<job id="job_example">
<reader name="http">
<url><![CDATA[https://www.baidu.com/]]></url>
</reader>
<writer name="excel">
<path><![CDATA[D://test//html2.xlsx]]></path>
<include.column.names>true</include.column.names>
</writer>
</job>
3. Viewing the code in IDEA
1. Import the project
VM Options: -Dhttps.protocols=TLSv1.2 -Dhdata.conf.dir="D:\Workspaces\idea_2\HData-master\conf"
Program arguments: --reader jdbc -Rurl="jdbc:mysql://127.0.0.1:3306/gateway?characterEncoding=utf8&useSSL=false" -Rdriver="com.mysql.jdbc.Driver" -Rtable="client" -Rusername="root" -Rpassword="123456" --writer jdbc -Wurl="jdbc:mysql://192.168.1.35:3306/gateway?characterEncoding=utf8&useSSL=false" -Wdriver="com.mysql.jdbc.Driver" -Wtable="client" -Wusername="root" -Wpassword="123456"
[Fatal Error] mysqlToMysql.xml:5:73: The reference to entity "useSSL" must end with the ';' delimiter.
<url><![CDATA[jdbc:mysql://127.0.0.1:3306/gateway?characterEncoding=utf8&useSSL=false]]></url>
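The XML parse fails because the URL contains the special character &: the parser reads &useSSL as the start of an entity reference and expects a closing ";". Wrapping the value in <![CDATA[...]]>, as in the line above, fixes it. Do not put quotes inside the CDATA section, since they would become part of the value.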
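The failure and the fix are easy to reproduce outside HData. The sketch below uses Python's standard `xml.etree.ElementTree` parser (an assumption for illustration; HData runs on a Java XML parser, but the well-formedness rule is the same) to parse the same URL three ways: raw, wrapped in CDATA, and with `&` escaped as `&amp;`, which is an equivalent alternative to CDATA.

```python
import xml.etree.ElementTree as ET

URL = "jdbc:mysql://127.0.0.1:3306/gateway?characterEncoding=utf8&useSSL=false"


def parse_url(xml_text):
    """Parse a single <url> element and return its text content."""
    return ET.fromstring(xml_text).text


raw = f"<url>{URL}</url>"                                  # bare '&': not well-formed
cdata = f"<url><![CDATA[{URL}]]></url>"                    # CDATA: '&' taken literally
escaped = "<url>" + URL.replace("&", "&amp;") + "</url>"   # entity-escaped '&'

try:
    parse_url(raw)
    raw_parses = True
except ET.ParseError:
    raw_parses = False

print("raw parses:", raw_parses)
print("CDATA value matches:", parse_url(cdata) == URL)
print("escaped value matches:", parse_url(escaped) == URL)
```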