频道栏目
首页 > 网络 > 云计算 > 正文
spark【例子】同类合并、计算2
2017-04-11 13:50:01           
收藏   我要投稿

例子描述:

大概意思为,统计用户使用app的次数排名

原始数据:

000041b232,张三,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15097003,,2016/6/8 17:10,2016/6/8 17:10,690,6218,11=0|12=200,2016/7/5 11:11

000041b232,张三,FC:1A:11:5C:58:34,F8:E7:1E:1E:69:C0,15026002,,2016/6/8 17:10,2016/6/8 17:10,690,6218,11=0|12=200,2016/7/5 11:11

000041b232,张三,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15026002,,2016/6/8 17:10,2016/6/8 17:10,690,6218,11=0|12=200,2016/7/5 11:11

000041b744,张三,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15026002,,2016/6/8 17:10,2016/6/8 17:10,719,4174,6=2016-06-23 08:50:00|7=,2016/7/5 11:11

000041b22f,李四,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15097002,,2016/6/8 17:10,2016/6/8 17:10,856,367,7=,2016/7/5 11:11

000041b1bc,李四,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15026002,,2016/6/8 17:10,2016/6/8 17:10,937,2964,3=北京|4=上海,2016/7/5 11:11

000041cf18,赵六,7C:1D:D9:F4:BE:E0,F8:E7:1E:1E:62:20,15097002,,2016/6/8 17:10,2016/6/8 17:10,665,2669,5=2016-06-22 00:00:00,2016/7/5 11:11

000041b1bc,孙七,7C:1D:D9:F4:BE:E0,38:FF:36:2E:5B:A0,9003000,,2016/6/8 17:10,2016/6/8 17:10,530,245,,2016/7/5 11:11

000041b8f1,王五,FC:1A:11:5C:58:34,38:FF:36:2E:5B:A0,9007000,,2016/6/8 17:11,2016/6/8 17:11,626,6886,,2016/7/5 11:11

000041b8f1,周八,FC:1A:11:5C:58:34,38:FF:36:2E:5B:A0,16500000,,2016/6/8 17:11,2016/6/8 17:11,2532,646,,2016/7/5 11:11

000041966a,李四,FC:1A:11:5C:58:34,38:FF:36:2E:5B:A0,16501000,,2016/6/8 17:11,2016/6/8 17:11,690,454,,2016/7/5 11:11

000041966a,李四,FC:1A:11:5C:58:34,38:FF:36:2E:5B:A0,16501000,,2016/6/8 17:11,2016/6/8 17:11,690,454,,2016/7/5 11:11

结果数据:

周八,人人贷:1

孙七,支付宝:1

赵六,途牛机票:1

王五,快钱:1|天弘基金:1

李四,红岭创投:2|携程机票:1|携程酒店:1|途牛机票:1

张三,途牛酒店:5|携程机票:3

代码片段:

 

cxRDD0.map {

lines =>

val line = lines.split(",")//逗号分隔数据

//想办法将数据拼成(数据,1)的映射,并且这个地方的数据要相同,可以理解取为用户,APPID,然后当成K,写个数字1当成V,这里使用的字典关联去取的数据

(s"""${line((data_location.getOrElse("USR_NBR", "").toInt))},${buss_location.getOrElse(line((data_location.getOrElse("BUS_ID", "").toInt)), "").split(",", -1)(0)}""", 1)

}.reduceByKey(_ + _).map {//分组

lines =>

//将分组后的数据,以用户为K,其他为V拼成映射,便于后续分组

(s"${lines._1.split(",")(0)}", s"${lines._1.split(",")(1)},${lines._2}")

}.groupByKey().map {//分组

case (k, v) =>

//对APPID数量 V 进行排序

val app = v.map {

x =>

val a = x.split(",")

//拆分APPID 与 数量,这里传递给下面的类型为映射

(a(0), a(1))

//使用sortWith对映射的第二位数字进行排序,需要转换成INT,因为传递过来都是字符

}.toSeq.sortWith(_._2.toInt > _._2.toInt).map {

app =>

//格式化输出

//V:V

s"${app._1}:${app._2}"

}

//格式化输出

//K,V

//K,V1|V2......

s"$k,${app.mkString("|")}"

}.foreach(println)

 

点击复制链接 与好友分享!回本站首页
上一篇:spark【例子】同类合并、计算(主要使用groupByKey)
下一篇:spark【例子】count(distinct字段)简易版使用groupByKey和zip
相关文章
图文推荐
点击排行

关于我们 | 联系我们 | 广告服务 | 投资合作 | 版权申明 | 在线帮助 | 网站地图 | 作品发布 | Vip技术培训 | 举报中心

版权所有: 红黑联盟--致力于做实用的IT技术学习网站