OVERVIEW

A service for streaming logs into Hadoop

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS). It has a simple and flexible architecture based on streaming data flows, and is robust and fault tolerant with tunable reliability mechanisms for failover and recovery. YARN coordinates data ingest from Apache Flume and other services that deliver raw data into an Enterprise Hadoop cluster.

WHAT FLUME DOES

Flume lets Hadoop users ingest high-volume streaming data into HDFS for storage. Specifically, Flume allows users to:

Feature | Description
Stream data | Ingest streaming data from multiple sources into Hadoop for storage and analysis
Insulate systems | Buffer storage platform from transient spikes, when the rate of incoming data exceeds the rate at which data can be written to the destination
Guarantee data delivery | Flume NG uses channel-based transactions to guarantee reliable message delivery. When a message moves from one agent to another, two transactions are started, one on the agent that delivers the event and the other on the agent that receives the event. This ensures guaranteed delivery semantics
Scale horizontally | To ingest new data streams and additional volume as needed

Enterprises use Flume's powerful streaming capabilities to land data from high-throughput streams in the Hadoop Distributed File System (HDFS). Typical sources of these streams are application logs, sensor and machine data, geo-location data and social media. These different types of data can be landed in Hadoop for future analysis using interactive queries in Apache Hive. Or they can feed business dashboards served ongoing data by Apache HBase.

In one specific example, Flume is used to log manufacturing operations. When one run of product comes off the line, it generates a log file about that run.
Even if this occurs hundreds or thousands of times per day, the large volume of log file data can stream through Flume into a tool such as Apache Storm for same-day analysis. Alternatively, months or years of production runs can be stored in HDFS and analyzed by a quality assurance engineer using Apache Hive.
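
The two-transaction handoff described under "Guarantee data delivery" can be sketched in a few lines of Python. This is a simplified illustration only, not Flume's actual API: the names Channel, Transaction, ChannelFullError and forward are hypothetical, and the commit order shown (receiver first, then sender) is an assumption chosen so that a failed delivery never loses events. It also shows the buffering idea from "Insulate systems": when the downstream channel is full, the events simply stay in the upstream channel.

```python
from collections import deque


class ChannelFullError(Exception):
    """Raised when a put would exceed the channel's capacity."""


class Channel:
    """In-memory stand-in for a Flume channel (hypothetical, not Flume's API)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def transaction(self):
        return Transaction(self)


class Transaction:
    """Stages puts and takes; puts become visible only on commit."""

    def __init__(self, channel):
        self.channel = channel
        self.staged_puts = []
        self.staged_takes = []

    def put(self, event):
        used = len(self.channel.events) + len(self.staged_puts)
        if used >= self.channel.capacity:
            raise ChannelFullError("channel is full")
        self.staged_puts.append(event)

    def take(self):
        if not self.channel.events:
            return None
        event = self.channel.events.popleft()
        self.staged_takes.append(event)
        return event

    def commit(self):
        self.channel.events.extend(self.staged_puts)
        self.staged_puts, self.staged_takes = [], []

    def rollback(self):
        # Return taken events to the head of the channel in original order.
        self.channel.events.extendleft(reversed(self.staged_takes))
        self.staged_puts, self.staged_takes = [], []


def forward(upstream, downstream, batch_size):
    """Move up to batch_size events between agents using two transactions."""
    take_tx = upstream.transaction()    # transaction on the delivering agent
    put_tx = downstream.transaction()   # transaction on the receiving agent
    try:
        moved = 0
        for _ in range(batch_size):
            event = take_tx.take()
            if event is None:
                break
            put_tx.put(event)
            moved += 1
        put_tx.commit()   # assumption: the receiver commits first
        take_tx.commit()  # only then does the sender drop its copy
        return moved
    except ChannelFullError:
        # Delivery failed: undo both sides so the events stay upstream.
        put_tx.rollback()
        take_tx.rollback()
        return 0
```

In this sketch, calling forward with a batch larger than the downstream capacity returns 0 and leaves every event in the upstream channel, so a later call with a smaller batch can retry without loss.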