Apache Flume
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple, extensible data model that allows for online analytic applications.
Duration: 30 hrs
Course Content:
Introduction
Overview, Architecture, Data flow model, Reliability and Recoverability
Setting up an agent
Configuring individual components, Wiring the pieces together, Data ingestion
Executing commands
Network streams
Setting multi-agent flow
Consolidation, Multiplexing the flow, Configuration, Defining the flow, Configuring individual components, Adding multiple flows in an agent
Configuring a multi-agent flow
Fan out flow
Flume Sources
Avro Source, Exec Source, NetCat Source, Sequence Generator Source, Syslog Sources, Syslog TCP Source, Syslog UDP Source, Legacy Sources, Avro Legacy Source, Thrift Legacy Source, Custom Source
Flume Sinks
HDFS Sink, Logger Sink, Avro Sink, IRC Sink, File Roll Sink, Null Sink, HBase Sinks, HBaseSink, AsyncHBaseSink, Custom Sink
Flume Channels
Memory Channel, JDBC Channel, Recoverable Memory Channel, File Channel, Pseudo Transaction Channel, Custom Channel
Flume Channel Selectors
Replicating Channel Selector, Multiplexing Channel Selector, Custom Channel Selector
Flume Sink Processors
Default Sink Processor, Failover Sink Processor, Load Balancing Sink Processor, Custom Sink Processor
Flume Interceptors
Timestamp Interceptor, Host Interceptor, Flume Properties, Property
Security
Monitoring, Troubleshooting, Handling agent failures, Compatibility, HDFS, Avro
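
Illustration:
To give a feel for the channel and reliability topics listed above, here is a small toy model in Python. It is not Flume code and uses none of Flume's APIs; the names MemoryChannel, drain, and flaky_deliver are hypothetical and chosen only for this sketch. It assumes a simplified picture in which a sink takes a batch of events from a channel and the batch is put back if delivery to the destination fails, so that no event is lost.

```python
from collections import deque


class MemoryChannel:
    """Toy stand-in for a channel: a bounded in-memory queue of events."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.queue = deque()

    def put(self, event):
        # A source calls this to hand an event to the channel.
        if len(self.queue) >= self.capacity:
            raise RuntimeError("channel full")
        self.queue.append(event)

    def take_batch(self, size):
        # Remove up to `size` events and hand them to the caller.
        batch = []
        while self.queue and len(batch) < size:
            batch.append(self.queue.popleft())
        return batch

    def rollback(self, batch):
        # Put an undelivered batch back at the head, preserving order.
        self.queue.extendleft(reversed(batch))


def drain(channel, deliver, batch_size=2):
    """Toy sink loop: a batch leaves the channel for good only if deliver() succeeds."""
    delivered = []
    while True:
        batch = channel.take_batch(batch_size)
        if not batch:
            return delivered
        try:
            deliver(batch)
        except IOError:
            channel.rollback(batch)  # keep the events for a later retry
            return delivered
        delivered.extend(batch)


# A "source" writes five events; each event has headers and a body.
channel = MemoryChannel()
for line in ["a", "b", "c", "d", "e"]:
    channel.put({"headers": {}, "body": line})

calls = {"n": 0}


def flaky_deliver(batch):
    # Hypothetical destination that fails on the second call only.
    calls["n"] += 1
    if calls["n"] == 2:
        raise IOError("destination unavailable")


first = drain(channel, flaky_deliver)   # stops when delivery fails
second = drain(channel, flaky_deliver)  # retry picks up the rolled-back events
```

In this sketch the first pass delivers only the first batch, the failed batch goes back into the channel, and the second pass delivers the remaining events in their original order. The real channel types in the outline (Memory, File, JDBC) differ mainly in where the queued events are stored and what survives an agent restart.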