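Apache Pig BinStorage() Function

The BinStorage() function loads and stores data in Pig using a machine-readable (binary) format. Pig generally uses BinStorage() to store the temporary data generated between MapReduce jobs. It accepts multiple locations as input.

Syntax

The syntax of the BinStorage() function is given below.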

grunt> BinStorage();
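
In practice BinStorage() appears inside the USING clause of a LOAD or STORE statement rather than as a statement of its own. The following is a minimal sketch of both uses; the paths under /tmp and the relation names are hypothetical and chosen only for illustration. The glob in the last statement is one way to pass several input locations in a single LOAD.

```
-- Hypothetical input file and relation names, for illustration only.
some_relation = LOAD '/tmp/input.txt' USING PigStorage(',');

-- Write the relation in BinStorage's binary format.
STORE some_relation INTO '/tmp/intermediate' USING BinStorage();

-- Read it back; the glob matches every part file in the output directory.
reloaded = LOAD '/tmp/intermediate/part-*' USING BinStorage();
```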

Example

Assume there is a file named stu_data.txt in the HDFS directory /pig_data/, with the following contents.

stu_data.txt

001,Rajiv_Reddy,21,Hyderabad 
002,siddarth_Battacharya,22,Kolkata 
003,Rajesh_Khanna,22,Delhi 
004,Preethi_Agarwal,21,Pune 
005,Trupthi_Mohanthy,23,Bhuwaneshwar 
006,Archana_Mishra,23,Chennai 
007,Komal_Nayak,24,trivendram 
008,Bharathi_Nambiayar,24,Chennai

Let us load this data into a relation, as shown below.

grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/stu_data.txt' USING PigStorage(',')
   as (id:int, firstname:chararray, age:int, city:chararray);

Now we can use the BinStorage() function to store this relation in the HDFS directory /pig_Output/mydata.

grunt> STORE student_details INTO 'hdfs://localhost:9000/pig_Output/mydata' USING BinStorage();

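After the statement above is executed, the relation is stored in the given HDFS directory. You can view it with the HDFS ls command, as shown below.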

$ hdfs dfs -ls hdfs://localhost:9000/pig_Output/mydata/
  
Found 2 items
-rw-r--r--   1 Hadoop supergroup          0 2015-10-26 16:58 hdfs://localhost:9000/pig_Output/mydata/_SUCCESS
-rw-r--r--   1 Hadoop supergroup        372 2015-10-26 16:58 hdfs://localhost:9000/pig_Output/mydata/part-m-00000

Now load the data back from the file part-m-00000.

grunt> result = LOAD 'hdfs://localhost:9000/pig_Output/mydata/part-m-00000' USING BinStorage();

Verify the contents of the relation, as shown below.

grunt> Dump result; 

(1,Rajiv_Reddy,21,Hyderabad) 
(2,siddarth_Battacharya,22,Kolkata) 
(3,Rajesh_Khanna,22,Delhi) 
(4,Preethi_Agarwal,21,Pune) 
(5,Trupthi_Mohanthy,23,Bhuwaneshwar) 
(6,Archana_Mishra,23,Chennai) 
(7,Komal_Nayak,24,trivendram) 
(8,Bharathi_Nambiayar,24,Chennai)
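
Because result was loaded without an AS clause, the relation has no declared schema, so its fields have no names. One way to keep working with it is to address fields by position. The sketch below assumes the relation result from the example above; the name name_and_city is hypothetical.

```
-- result has no declared schema, so fields are referenced by position:
-- $1 is the second field (the name), $3 is the fourth field (the city).
name_and_city = FOREACH result GENERATE $1, $3;
DUMP name_and_city;
```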
