An Alternative API for MapReduce: Cascading

Cascading

Cascading is an alternative API for MapReduce. It still runs on MapReduce under the hood, but it lets you write MapReduce code in a simplified way.
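In Cascading, a job is described as a flow that connects a source tap (where data comes from), a pipe assembly (the processing steps) and a sink tap (where results go). The toy, in-memory Java model below only illustrates that shape: a source supplies lines, a per-record function splits them, and a sink collects the output. SourceTap, SinkTap, MiniFlow and run are hypothetical names, not Cascading classes; real Cascading also plans a flow into one or more MapReduce jobs and runs them on the cluster, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model of the source -> pipe -> sink idea.
// All names here are hypothetical; they are NOT Cascading classes.
class MiniFlowSketch {

    // Supplies raw records (plays the role of a source tap).
    interface SourceTap { List<String> read(); }

    // Receives processed records (plays the role of a sink tap).
    interface SinkTap { void write(List<String> fields); }

    // Wires source, sink and a per-record function, in the same
    // argument order as Cascading's connect(source, sink, pipe).
    record MiniFlow(SourceTap source, SinkTap sink, Function<String, List<String>> each) {
        // Runs every source record through the function into the sink.
        void complete() {
            for (String line : source.read()) {
                sink.write(each.apply(line));
            }
        }
    }

    // Builds and runs a flow that splits each line on a single space.
    static List<List<String>> run(List<String> input) {
        List<List<String>> collected = new ArrayList<>();
        MiniFlow flow = new MiniFlow(
                () -> input,                        // source: the in-memory lines
                collected::add,                     // sink: collect each output tuple
                line -> List.of(line.split(" "))); // pipe: split into fields
        flow.complete();
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("1 a A", "2 b B")));
        // prints [[1, a, A], [2, b, B]]
    }
}
```

The point of the design is that the caller only declares what flows where; how the work is executed is left to the flow, which in real Cascading means the planner decides how many MapReduce jobs to generate.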

The following example shows a Cascading Flow that "sinks" data into an HBase cluster. The same HBaseTap class can also be used to "source" (read) data:

// read data from the default filesystem
// emits two fields: "offset" and "line"
// inputFileLhs: path of the space-delimited input text file (defined elsewhere)
Tap source = new Hfs( new TextLine(), inputFileLhs );

// store data in an HBase cluster
// accepts fields "num", "lower", and "upper"
// "num" becomes the row key; the scheme automatically scopes the other incoming
// fields to their proper family name: "lower" goes to "left", "upper" goes to "right"
// (HBaseTap and HBaseScheme come from the Cascading HBase extension, not from core Cascading)
Fields keyFields = new Fields( "num" );
String[] familyNames = {"left", "right"};
Fields[] valueFields = new Fields[] {new Fields( "lower" ), new Fields( "upper" ) };
Tap hBaseTap = new HBaseTap( "multitable", new HBaseScheme( keyFields, familyNames, valueFields ), SinkMode.REPLACE );

// a simple pipe assembly to parse the input into fields
// a real app would likely chain multiple Pipes together for more complex processing
Pipe parsePipe = new Each( "insert", new Fields( "line" ), new RegexSplitter( new Fields( "num", "lower", "upper" ), " " ) );

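// "plan" a cluster-executable Flow
// this connects the source Tap and hBaseTap (the sink Tap) to the parsePipe
// properties: the job configuration properties (defined elsewhere)
// (Cascading 1.x API; in Cascading 2.x FlowConnector is abstract, so use HadoopFlowConnector)
Flow parseFlow = new FlowConnector( properties ).connect( source, hBaseTap, parsePipe );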

// start the flow, and block until complete
parseFlow.complete();

// open an iterator on the HBase table we wrote data into
TupleEntryIterator iterator = parseFlow.openSink();

try {
    while ( iterator.hasNext() ) {
        // print out each tuple from HBase
        System.out.println( "iterator.next() = " + iterator.next() );
    }
} finally {
    // always release the iterator, even if reading or printing fails
    iterator.close();
}
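As a rough picture of what this flow leaves in HBase, the plain-Java sketch below models its two steps in memory: each input line is split on a single space into num, lower and upper (as the RegexSplitter does above), and each resulting tuple becomes one row keyed by num, with lower stored in family left and upper in family right. It assumes the column qualifier is the field name, and it uses TreeMap to mimic HBase's sorted row keys. HBaseLayoutSketch and load are hypothetical names; this is not the cascading-hbase implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the rows the example flow writes to "multitable".
// Assumption: the row key is the "num" value and each column qualifier is the field name.
// This is NOT the cascading-hbase implementation.
class HBaseLayoutSketch {

    // Same field/family layout as the HBaseScheme configured above.
    static final String[] DECLARED_FIELDS = {"num", "lower", "upper"};
    static final String KEY_FIELD = "num";
    static final String[] FAMILY_NAMES = {"left", "right"};
    static final String[][] VALUE_FIELDS = {{"lower"}, {"upper"}};

    // Returns row key -> ("family:qualifier" -> value); TreeMaps keep keys sorted, as HBase does.
    static Map<String, Map<String, String>> load(List<String> lines) {
        Map<String, Map<String, String>> table = new TreeMap<>();
        for (String line : lines) {
            // Step 1: split on a single space, like RegexSplitter with pattern " ".
            String[] parts = line.split(" ");
            if (parts.length != DECLARED_FIELDS.length) {
                throw new IllegalArgumentException(
                        "expected " + DECLARED_FIELDS.length + " fields: " + line);
            }
            Map<String, String> tuple = new LinkedHashMap<>();
            for (int i = 0; i < DECLARED_FIELDS.length; i++) {
                tuple.put(DECLARED_FIELDS[i], parts[i]);
            }
            // Step 2: scope each value field into its column family under the row key.
            Map<String, String> row =
                    table.computeIfAbsent(tuple.get(KEY_FIELD), k -> new TreeMap<>());
            for (int f = 0; f < FAMILY_NAMES.length; f++) {
                for (String field : VALUE_FIELDS[f]) {
                    row.put(FAMILY_NAMES[f] + ":" + field, tuple.get(field));
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(load(List.of("1 a A", "2 b B")));
        // prints {1={left:lower=a, right:upper=A}, 2={left:lower=b, right:upper=B}}
    }
}
```

Keeping the family names and value fields as parallel arrays mirrors how familyNames and valueFields are paired positionally in the HBaseScheme constructor call above.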

