Apache Flume: Advantages over Ad-hoc Solutions
Apache Flume is basically an event processing framework which, with the help of a little code or configuration, allows you to take streaming data and perform some live transformations on it. So Flume is about the collection and aggregation of streaming event data. And what is an event? An event can be anything you want it to be: a file, a single attribute, an image, a single record from a database, an XML document, or anything else. What Flume actually does is take those events and land them in a target store, which can be HDFS, HBase, etc. That is basically what Apache Flume is.
Where is Flume generally used? Most commonly for log data: almost every website generates logs continuously, and Flume is the most common tool for transporting those logs to HDFS. What advantages does this solution have over other, more ad-hoc solutions?
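To make this concrete, here is a minimal sketch of a Flume agent configuration for exactly this log-to-HDFS use case. The agent name `a1`, the log path, and the HDFS URL are all illustrative, not prescribed by Flume itself:

```properties
# Name the components of agent a1 (agent name and paths are examples)
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: tail the web server's access log via the exec source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: land the events in HDFS, partitioned by date
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/weblogs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
# Use the agent machine's clock to resolve the %Y-%m-%d escapes,
# since the exec source does not add a timestamp header itself
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

Such an agent would be started with `flume-ng agent --conf-file example.conf --name a1`.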
- It is reliable: once an event is introduced into the Flume event processing framework, it is guaranteed not to be lost.
- It is scalable: by adding more processing agents, known as Flume agents, we can scale out processing. The more agents we add, the more we can process.
- It is manageable, customizable, and high-performance.
- Once events enter the system, they have to be routed, processed, transformed, and so on. This is where Flume's declarative configuration is an advantage: through configuration files, I can dynamically define how an event flows, depending on conditions on that event.
- There is also contextual routing through this dynamic configuration, which is another advantageous feature: events can be sent down different paths based on their attributes.
- Finally, it is feature-rich and fully extensible, so we can extend the Flume framework to our needs.
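Contextual routing, for instance, is expressed purely in configuration through a multiplexing channel selector. In this sketch, events are routed to one of two channels depending on the value of a `state` header (the agent name, channel names, and header name are all illustrative):

```properties
a1.sources  = r1
a1.channels = c1 c2
a1.sinks    = k1 k2

# Route on the value of the "state" event header (header name is an example)
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.ERROR = c1
a1.sources.r1.selector.mapping.INFO  = c2
# Events with any other value of "state" fall through to c2
a1.sources.r1.selector.default = c2
a1.sources.r1.channels = c1 c2
```

Changing where a class of events ends up is then just a matter of editing this mapping; no code has to be recompiled.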
Thus, Flume is a framework for moving data: it lets us move data from point A to point B and transform it along the way. In that sense it is more like an ETL tool.
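One common way that in-flight transformation is done in Flume is through interceptors attached to a source. As a small sketch (component names are again illustrative), the built-in timestamp and host interceptors stamp each event with metadata before it reaches the channel and sink:

```properties
# Attach interceptors i1 and i2 to source r1 of agent a1
a1.sources.r1.interceptors = i1 i2

# i1: add a "timestamp" header to every event
a1.sources.r1.interceptors.i1.type = timestamp

# i2: add a "host" header with the agent machine's hostname
a1.sources.r1.interceptors.i2.type = host
```

Custom interceptors can also be written in Java, which is one concrete form the extensibility mentioned above takes.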