Running a job with hadoop-streaming-2.8.4.jar, I used a command like:
./share/hadoop/tools/lib/hadoop-streaming-2.8.4.jar -input /mr-input/* -output /mr-output -file /home/lzh/external/Mapper.py -mapper 'Mapper.py' -file /home/lzh/external/Reducer.py -reducer 'Reducer.py'
Problem 1: bash: ./share/hadoop/tools/lib/hadoop-streaming-2.8.4.jar: Permission denied
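Attempted fix: widen the file's permissions with chmod -R 777 ./share/hadoop/tools/lib/hadoop-streaming-2.8.4.jar (in hindsight this was the wrong fix: it only lets bash try to execute the jar itself, which leads straight to the next error).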
That got past the permission error, but brought problem 2: invalid file (bad magic number): Exec format error
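A .jar file is a ZIP archive (its first bytes are PK\x03\x04), not a native ELF executable and not a #! script, so it is meant to be handed to a launcher such as hadoop jar rather than run directly. The minimal sketch below inspects those leading "magic number" bytes; the helper name classify is hypothetical, and it is only a rough illustration, not a complete file-type detector.

```python
# Leading bytes ("magic numbers") of a few common file types.
ELF_MAGIC = b"\x7fELF"     # native Linux executables
ZIP_MAGIC = b"PK\x03\x04"  # ZIP archives, which includes .jar files
SHEBANG = b"#!"            # scripts that name their own interpreter


def classify(path):
    """Return a rough label for the file at `path` based on its first bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(ELF_MAGIC):
        return "elf"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(SHEBANG):
        return "script"
    return "unknown"
```

Called on hadoop-streaming-2.8.4.jar, this should report "zip": a normal jar starts with a ZIP header rather than anything the shell can execute directly.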
Fix: pure carelessness on my part. The command was missing the leading hadoop jar; adding it solves the problem:
hadoop jar ./share/hadoop/tools/lib/hadoop-streaming-2.8.4.jar -input /mr-input/* -output /mr-output -file /home/lzh/external/Mapper.py -mapper 'Mapper.py' -file /home/lzh/external/Reducer.py -reducer 'Reducer.py'
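The structure of the corrected command can be sketched as an argument list: the jar is an argument to hadoop jar, never a program in its own right. The helper streaming_command below is hypothetical and only mirrors the options used in the command above; it assumes the hadoop launcher is on PATH if you choose to run the result.

```python
import os
import shlex


def streaming_command(jar, input_path, output_path, mapper, reducer):
    """Build argv for a Hadoop Streaming job (hypothetical helper).

    The jar is passed to `hadoop jar`; the shell never executes it directly.
    """
    return [
        "hadoop", "jar", jar,
        "-input", input_path,
        "-output", output_path,
        # -file ships the local script with the job; -mapper/-reducer name it.
        "-file", mapper, "-mapper", os.path.basename(mapper),
        "-file", reducer, "-reducer", os.path.basename(reducer),
    ]


if __name__ == "__main__":
    cmd = streaming_command(
        "./share/hadoop/tools/lib/hadoop-streaming-2.8.4.jar",
        "/mr-input/*",
        "/mr-output",
        "/home/lzh/external/Mapper.py",
        "/home/lzh/external/Reducer.py",
    )
    # Print a shell-quoted version; subprocess.run(cmd, check=True) would run it.
    print(shlex.join(cmd))
```

Because no shell is involved when the list is passed to subprocess, the /mr-input/* glob reaches Hadoop unexpanded and is resolved against HDFS rather than the local filesystem.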
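You may also hit problem 3: Mapper.py and Reducer.py must be executable, because -mapper 'Mapper.py' and -reducer 'Reducer.py' run them as programs. Make them executable with chmod +x <filename>.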
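For reference, this is what adding the execute bits amounts to, sketched with Python's os and stat modules. The helper name make_executable is hypothetical. One difference from the shell: plain chmod +x leaves alone any bits masked by the current umask, whereas this sketch always sets all three, like chmod a+x.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group and others (like `chmod a+x`)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

A file that starts as 0644 (rw-r--r--) ends up as 0755 (rwxr-xr-x).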
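When writing MapReduce programs in Python, it is best to start each script with the line #!/usr/bin/env python, so the system knows which interpreter to use when Hadoop runs the script directly.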
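The post does not show Mapper.py and Reducer.py, so here is a minimal sketch of what such a pair might look like for the task in the referenced article (mean and variance of a set of numbers). It is not that article's code. Assumptions: each input line holds one number, and the constant key "stats" is hypothetical, chosen so every value reaches the same reducer. For compactness both roles live in one file; in a real job mapper() would go in Mapper.py and reducer() in Reducer.py, each reading sys.stdin. run_local() imitates the map, sort, reduce sequence Hadoop performs.

```python
#!/usr/bin/env python
"""Sketch of a Hadoop Streaming mean/variance job.

Assumes one number per input line. In a real job, mapper() would live in
Mapper.py and reducer() in Reducer.py; they share one file here only for
illustration.
"""
import sys

KEY = "stats"  # hypothetical constant key: every value goes to one reducer


def mapper(lines):
    """Emit "KEY<TAB>value" for every non-blank input line."""
    for line in lines:
        value = line.strip()
        if value:
            yield "%s\t%s" % (KEY, value)


def _summary(key, n, total, total_sq):
    """Format key, mean and population variance E[x^2] - E[x]^2."""
    mean = total / n
    return "%s\t%s\t%s" % (key, mean, total_sq / n - mean * mean)


def reducer(lines):
    """Read key-sorted "key<TAB>value" lines; emit one summary line per key."""
    current, n, total, total_sq = None, 0, 0.0, 0.0
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current:
            if current is not None:
                yield _summary(current, n, total, total_sq)
            current, n, total, total_sq = key, 0, 0.0, 0.0
        x = float(value)
        n += 1
        total += x
        total_sq += x * x
    if current is not None:
        yield _summary(current, n, total, total_sq)


def run_local(lines):
    """Simulate `cat data | Mapper.py | sort | Reducer.py` in memory."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # e.g. `python sketch.py map < data` or `python sketch.py reduce < sorted`
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    step = mapper if role == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

For the input lines 1, 2, 3, 4, run_local returns ["stats\t2.5\t1.25"]. The reducer relies on Hadoop having sorted its input by key, which is why it only compares each key with the previous one. The E[x^2] - E[x]^2 formula is compact but can lose precision for large values with a small spread.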
After that, the job finally ran successfully.
Reference:
[python] Implementing a Hadoop MapReduce program in Python: computing the mean and variance of a set of data