
A bug triggered by ORC storage when using hive-0.12.0 with hadoop-2.2.0

Categories: hadoop, hive | 2014-03-13 18:54

Environment:
Hadoop version: hadoop-2.2.0 (downloaded from the official site and compiled as a 64-bit build)
Hive version: hive-0.12.0 (downloaded from the official site and unpacked)
The cluster was healthy; both ordinary Hive queries and plain MapReduce jobs ran successfully.

To test the new Hive ORC storage format, I went through the following steps:

create external table text_test (id string,text string)  row format delimited fields terminated by '\t' STORED AS textfile LOCATION '/user/hive/warehouse/text_test';

create external table orc_test (id string,text string) row format delimited fields terminated by '\t' STORED AS orc LOCATION '/user/hive/warehouse/orc_test';
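
The insert below copies rows from the text table into the ORC table, so the text table needs data first. A minimal way to seed it; the local file path and its contents here are hypothetical, chosen to match the rows shown below:

printf '1\tmyew\n2\tccsd\n3\t33\n' > /tmp/text_test.tsv
hive -e "load data local inpath '/tmp/text_test.tsv' into table text_test;"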

hive> desc text_test;
OK
id                      string                  None
text                    string                  None

hive> desc orc_test;
OK
id                      string                  from deserializer
text                    string                  from deserializer

hive> select * from text_test;
OK
1       myew
2       ccsd
3       33

hive> insert overwrite table orc_test select * from text_test;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1394433490694_0016, Tracking URL = http://zw-34-69:8088/proxy/application_1394433490694_0016/
Kill Command = /opt/hadoop/hadoop/bin/hadoop job  -kill job_1394433490694_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-03-13 17:00:49,899 Stage-1 map = 0%,  reduce = 0%
2014-03-13 17:01:10,097 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1394433490694_0016 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1394433490694_0016_m_000000 (and more) from job job_1394433490694_0016

Task with the most failures(4):
-----
Task ID:
task_1394433490694_0016_m_000000

URL:
http://zw-34-69:8088/taskdetails.jsp?jobid=job_1394433490694_0016&tipid=task_1394433490694_0016_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:240)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.getSerializedSize(OrcProto.java:3046)
at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndexEntry.getSerializedSize(OrcProto.java:4129)
at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndex.getSerializedSize(OrcProto.java:4641)
at com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:75)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:548)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1328)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1699)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:1868)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:95)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:181)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:866)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:596)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
... 8 more

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

A long journey through Google, Baidu, and Bing followed, which finally turned up a solution: http://web.archiveorange.com/archive/v/S2z2uV6yqpmtC3rgpsrs
Many thanks to the two earlier investigators for their diligent research.

To summarize the cause of the problem:
hadoop-2.2.0 was compiled against protobuf-2.5.0, while hive-0.12.0 was compiled against protobuf-2.4.1, and the two versions conflict at runtime.
The fix:
Recompile hive-0.12.0 against protobuf-2.5.0.
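
Before rebuilding, the mismatch is easy to confirm by listing the protobuf jars each distribution bundles; a quick check, assuming the default directory layout of both releases:

ls $HADOOP_HOME/share/hadoop/common/lib | grep protobuf    # hadoop-2.2.0 bundles protobuf-java-2.5.0.jar
ls $HIVE_HOME/lib | grep protobuf                          # hive-0.12.0 ships a 2.4.1 protobuf jar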

1. Install protobuf. Download: https://code.google.com/p/protobuf/downloads/detail?name=protobuf-2.5.0.tar.gz
Unpack: tar -xzvf protobuf-2.5.0.tar.gz
Enter the directory: cd protobuf-2.5.0
Build and install:

1. ./configure
2. make
3. make check
4. make install  (requires root privileges)
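
After make install, it's worth verifying that the freshly built protoc is the one on the PATH before building Hive:

protoc --version    # should print: libprotoc 2.5.0

If protoc complains about a missing libprotoc shared library, running ldconfig as root usually fixes it.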

2. Download the Hive source: svn checkout http://svn.apache.org/repos/asf/hive/tags/release-0.12.0/

3. Install ant. Download from http://ant.apache.org/bindownload.cgi; I used version 1.9.2, apache-ant-1.9.2-bin.tar.gz.
(Version 1.9.3 fails to build Hive: http://www.mailinglistarchive.com/html/dev@ant.apache.org/2014-01/msg00009.html)
Unpack: tar -xzvf apache-ant-1.9.2-bin.tar.gz
Configure the PATH: vi ~/.bash_profile
export ANT_HOME=/opt/hadoop/apache-ant-1.9.2
PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ANT_HOME/bin:$PATH
export PATH
Save and exit, then run . ~/.bash_profile to apply the changes.
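
A quick check that the new ant is the one being picked up:

ant -version    # should report Apache Ant(TM) version 1.9.2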
4. Change the protobuf version ant uses for the build:

Edit release-0.12.0/ivy/libraries.properties, changing protobuf.version=2.4.1 to protobuf.version=2.5.0.
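
The same edit as a one-liner, if you prefer, run from the directory containing the checkout:

sed -i 's/protobuf.version=2.4.1/protobuf.version=2.5.0/' release-0.12.0/ivy/libraries.properties
grep protobuf.version release-0.12.0/ivy/libraries.properties    # verify the change took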
5. Generate the protobuf bindings inside the hive tree: cd release-0.12.0
ant protobuf

6. Build hive:
Run ant clean package in the release-0.12.0 directory.
Then comes a long wait... (network access is required so ivy can fetch dependencies)
7. The build output lands in release-0.12.0/build/dist/.
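
To put the rebuilt Hive into service, swap it in for the old installation. A sketch, assuming Hive is installed at $HIVE_HOME and that you keep a local hive-site.xml (both assumptions, adjust to your setup):

mv $HIVE_HOME $HIVE_HOME.bak                              # keep the old install as a fallback
cp -r release-0.12.0/build/dist $HIVE_HOME                # the freshly built hive-0.12.0
cp $HIVE_HOME.bak/conf/hive-site.xml $HIVE_HOME/conf/     # restore local configuration, if any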

Going back to the original statement, insert overwrite table orc_test select * from text_test; now succeeds, and both a query and hive --orcfiledump confirm the data landed correctly:

hive> select * from orc_test;
OK
1       myew
2       ccsd
3       33

hive --orcfiledump /user/hive/warehouse/orc_test/000000_0

Rows: 3
Compression: ZLIB
Compression size: 262144
Type: struct<_col0:string,_col1:string>

Statistics:
Column 0: count: 3
Column 1: count: 3 min: 1 max: 3
Column 2: count: 3 min: 33 max: myew

Stripes:
Stripe: offset: 3 data: 31 rows: 3 tail: 50 index: 59
Stream: column 0 section ROW_INDEX start: 3 length 9
Stream: column 1 section ROW_INDEX start: 12 length 23
Stream: column 2 section ROW_INDEX start: 35 length 27
Stream: column 1 section DATA start: 62 length 6
Stream: column 1 section LENGTH start: 68 length 5
Stream: column 2 section DATA start: 73 length 13
Stream: column 2 section LENGTH start: 86 length 7
Encoding column 0: DIRECT
Encoding column 1: DIRECT_V2
Encoding column 2: DIRECT_V2
