Components of Apache Flume – Channel selectors
As the name suggests, the channel selector is the Flume component that determines which channel a particular Flume event should go into. A source can send events into multiple channels, so where should a given Flume event go? Should it go through one particular channel, through all of the channels, or through only some of them? The channel selector answers all of these questions: it decides which channel or channels each Flume event is written to.
In the above example, the channel selector decides whether the event should go into channel 1, 2 or 3, into all of them, or into only some of them. That is the job of the channel selector.
There are different types of channel selectors:
- Replicating channel selector: This is the default channel selector. If we don’t configure anything with respect to the channel selector, the replicating channel selector decides which channel a particular Flume event should go into. It works by simply putting a copy of each event into every channel configured for the source. So, in the above example with 3 channels, every event is replicated to all three channels.
- Multiplexing channel selector: This channel selector writes each Flume event to different channels depending on the event's header information. The header can carry anything, such as where the data originated (for example, which data centre or which application server). Based on the value of the configured header, the multiplexing selector decides through which channel the event should be transferred; events whose header value matches no configured mapping can be sent to a default channel.
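To make the difference between the two selectors concrete, here is a minimal Python model of both. It is a conceptual sketch, not Flume's Java API: `Event`, `ReplicatingSelector`, `MultiplexingSelector`, `dispatch`, the channel names and the `datacenter` header are hypothetical names chosen for illustration. The multiplexing selector's `header`, `mapping` and `default` arguments loosely mirror the `selector.header`, `selector.mapping.<value>` and `selector.default` properties of a Flume agent configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """Hypothetical stand-in for a Flume event: a body plus a header map."""
    body: str
    headers: Dict[str, str] = field(default_factory=dict)


class ReplicatingSelector:
    """Every event goes to every configured channel (the default behaviour)."""

    def __init__(self, channels: List[str]) -> None:
        self.channels = list(channels)

    def select(self, event: Event) -> List[str]:
        return list(self.channels)


class MultiplexingSelector:
    """Routes on one header's value; unmapped values go to the default channels."""

    def __init__(self, header: str, mapping: Dict[str, List[str]],
                 default: List[str]) -> None:
        self.header = header
        self.mapping = mapping
        self.default = default

    def select(self, event: Event) -> List[str]:
        value = event.headers.get(self.header)
        return list(self.mapping.get(value, self.default))


def dispatch(event: Event, selector, channels: Dict[str, List[Event]]) -> None:
    """Append the event to each channel chosen by the selector."""
    for name in selector.select(event):
        channels[name].append(event)


# Usage: three channels, as in the example above (names are hypothetical).
channels: Dict[str, List[Event]] = {"ch1": [], "ch2": [], "ch3": []}

# Replicating: one event ends up in all three channels.
dispatch(Event("app started"), ReplicatingSelector(["ch1", "ch2", "ch3"]), channels)

# Multiplexing: route on a hypothetical "datacenter" header.
mux = MultiplexingSelector(
    header="datacenter",
    mapping={"dc-east": ["ch1"], "dc-west": ["ch2"]},
    default=["ch3"],
)
dispatch(Event("order placed", {"datacenter": "dc-east"}), mux, channels)
dispatch(Event("user login", {"datacenter": "dc-south"}), mux, channels)  # unmapped

for name, events in channels.items():
    print(name, [e.body for e in events])
```

Here "app started" lands in all three channels, "order placed" only in `ch1`, and "user login" falls through to the default `ch3`. For simplicity the model appends the same object to each channel, whereas a real Flume channel stores its own copy of the event.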
Both interceptors and channel selectors operate between the source and the channel, and together they define the routing of a Flume event. The interceptor decides what sort of data should pass through to the channel, and the channel selector then decides which channel the event goes to, and thereby which sink and ultimately which HDFS cluster or HBase system the data ends up in.
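The division of labour between interceptors and the channel selector can be sketched in the same spirit. The Python model below is hypothetical throughout (`Event`, `static_header`, `drop_matching`, `route`, the `datacenter` header and the channel names are illustrative only): the interceptors run first and may modify an event or drop it, and only the surviving events reach the selector, which here multiplexes on a header. The two interceptors are loosely modeled on Flume's static and regex filtering interceptors, but they are not their implementations.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    """Hypothetical stand-in for a Flume event: a body plus a header map."""
    body: str
    headers: Dict[str, str] = field(default_factory=dict)


# An interceptor returns the (possibly modified) event, or None to drop it.
Interceptor = Callable[[Event], Optional[Event]]


def static_header(key: str, value: str) -> Interceptor:
    """Stamp a fixed header on each event, keeping any value already present."""
    def intercept(event: Event) -> Optional[Event]:
        event.headers.setdefault(key, value)
        return event
    return intercept


def drop_matching(pattern: str) -> Interceptor:
    """Drop events whose body matches the regular expression."""
    regex = re.compile(pattern)

    def intercept(event: Event) -> Optional[Event]:
        return None if regex.search(event.body) else event
    return intercept


def route(events: List[Event], interceptors: List[Interceptor], header: str,
          mapping: Dict[str, List[str]], default: List[str]) -> Dict[str, List[Event]]:
    """Run the interceptor chain, then pick channels by the routing header."""
    channels: Dict[str, List[Event]] = {}
    for original in events:
        event: Optional[Event] = original
        for interceptor in interceptors:
            event = interceptor(event)
            if event is None:
                break  # filtered out: never reaches the selector
        if event is None:
            continue
        for name in mapping.get(event.headers.get(header), default):
            channels.setdefault(name, []).append(event)
    return channels


# Hypothetical channels, each imagined as drained by an HDFS or HBase sink.
incoming = [
    Event("GET /index", {"datacenter": "dc-east"}),
    Event("DEBUG heartbeat"),
    Event("GET /cart"),
]
result = route(
    incoming,
    interceptors=[drop_matching(r"^DEBUG"), static_header("datacenter", "dc-west")],
    header="datacenter",
    mapping={"dc-east": ["hdfs-channel"], "dc-west": ["hbase-channel"]},
    default=["fallback-channel"],
)
for name, events in result.items():
    print(name, [e.body for e in events])
```

The debug event is filtered out before routing, the event that already carried `dc-east` goes to `hdfs-channel`, and the untagged event receives `dc-west` from the interceptor and is routed to `hbase-channel`. Placing the filter first in the chain means dropped events are never stamped at all.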