An Introduction to Kafka
The amount of data in the world is growing exponentially
and, according to the World Economic Forum, the number of bytes being stored in
the world already far exceeds the number of stars in the observable universe.
When you think of this data, you might think of piles of bytes sitting in data
warehouses, in relational databases, or on distributed file systems. Systems
like these have trained us to think of data in its resting state. In other
words, data is sitting somewhere, resting, and when you need to process it, you
run some query or job against the pile of bytes. This view of the world is the
more traditional way of thinking about data. However, while data can certainly
pile up in places, more often than not, it’s moving. You see, many systems
generate continuous streams of data, including IoT sensors, medical sensors,
financial systems, user and customer analytics software, application and server
logs, and more. Even data that eventually finds a nice place to rest likely
travels across the network at some point before it finds its forever home.
If we want to process data in real time, while it moves, we
can’t simply wait for it to pile up somewhere and then run a query or job at
some interval of our choosing. That approach can handle some business use cases,
but many important use cases require us to process, enrich, transform, and
respond to data incrementally as it becomes available. Therefore, we need
something that has a very different worldview of data: a technology that gives
us access to data in its flowing state, and which allows us to work with these
continuous and unbounded data streams quickly and efficiently. This is where
Apache Kafka comes in. Apache Kafka (or simply, Kafka) is a streaming
platform for ingesting, storing,
accessing, and processing streams of data. While the entire platform is very
interesting, this book focuses on what I find to be the most compelling part of
Kafka: the stream processing layer. However, to understand Kafka Streams and
ksqlDB (both of which operate at this layer, and the latter of which also
operates at the stream ingestion layer), it is necessary to have a working
knowledge of how Kafka, as a platform, works.
The story of ksqlDB is one of simplification and evolution.
It was built with the same goal as Kafka Streams: simplify the process of
building stream processing applications. However, as ksqlDB evolved, it became
clear that its goals were even more ambitious than those of Kafka Streams. That’s
because it not only simplifies how we build stream processing applications, but
also how we integrate these applications with other systems (including those
external to Kafka). It does all of this with a SQL interface, making it easy
for beginners and experts alike to leverage the power of Kafka.
Both Kafka Streams and ksqlDB are excellent tools to have in
your stream processing toolbelt, and complement each other quite well. You can
use ksqlDB for stream processing applications that can be expressed in SQL, and
for easily setting up data sources and sinks to create end-to-end data
processing pipelines using a single tool. On the other hand, you can use Kafka
Streams for more complex applications, and your knowledge of that library will
only deepen your understanding of ksqlDB since it’s actually built on top of
Kafka Streams.
What Is ksqlDB?
ksqlDB is an open source event streaming database that was
released by Confluent in 2017 (a little more than a year after Kafka Streams
was introduced into the Kafka ecosystem). It simplifies the way stream
processing applications are built, deployed, and maintained, by integrating two
specialized components in the Kafka ecosystem (Kafka Connect and Kafka Streams)
into a single system, and by giving us a high-level SQL interface for
interacting with these components. Some of the things we can do with ksqlDB
include:
Model data as either streams or tables (each of which is
considered a collection in ksqlDB) using SQL.
Apply a wide range of SQL constructs (e.g., for joining,
aggregating, transforming, filtering, and windowing data) to create new derived
representations of data without touching a line of Java code.
Query streams and tables using push queries, which run
continuously and emit/push results to clients whenever new data is available.
Under the hood, push queries are compiled into Kafka Streams applications and
are ideal for event-driven microservices that need to observe and react to
events quickly.
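To make the capabilities above concrete, here is a minimal sketch in ksqlDB's SQL dialect. The topic name, column names, and filter condition are all hypothetical, chosen only for illustration:

```sql
-- Model a Kafka topic as a stream (a "collection" in ksqlDB).
-- The topic 'orders' and its columns are hypothetical.
CREATE STREAM orders (
  order_id VARCHAR KEY,
  customer VARCHAR,
  amount   DOUBLE
) WITH (
  KAFKA_TOPIC  = 'orders',
  VALUE_FORMAT = 'JSON'
);

-- Derive a new, filtered representation of the data
-- without touching a line of Java code.
CREATE STREAM large_orders AS
  SELECT order_id, customer, amount
  FROM orders
  WHERE amount > 100
  EMIT CHANGES;

-- A push query: runs continuously and emits results to the
-- client whenever new data becomes available.
SELECT customer, amount
FROM large_orders
EMIT CHANGES;
```

The `CREATE STREAM ... AS SELECT` statement is what ksqlDB compiles into a Kafka Streams application under the hood, while the final `SELECT ... EMIT CHANGES` is a push query that a client would leave open to observe results as they arrive.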
When to Use ksqlDB
It’s no surprise that
higher-level abstractions are often easier to work with than their lower-level
counterparts. However, if we were to just say, “SQL is easier to write than
Java,” we’d be glossing over the many benefits of using ksqlDB that stem from
its simpler interface and architecture. These benefits include:
More interactive workflows, thanks to a managed runtime that
can compose and deconstruct stream processing applications on demand using an
included CLI and REST service for submitting queries.
Less code to maintain since stream processing topologies are
expressed using SQL instead of a JVM language.
Simplified architecture, since the interfaces for managing
connectors (which integrate external data sources into Kafka) and for
transforming data are combined into a single system. There’s also an option for
running Kafka Connect from the same JVM as ksqlDB.
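As a sketch of that combined interface, a connector can be created from within ksqlDB itself, assuming the Kafka Connect integration is available (for example, Connect running embedded in the ksqlDB JVM). The connector class and connection settings below are illustrative placeholders, not a working configuration:

```sql
-- Create a source connector from inside ksqlDB.
-- The connector class, database URL, and column name are hypothetical.
CREATE SOURCE CONNECTOR jdbc_source WITH (
  'connector.class'          = 'io.confluent.connect.jdbc.JdbcSourceConnector',
  'connection.url'           = 'jdbc:postgresql://localhost:5432/mydb',
  'mode'                     = 'incrementing',
  'incrementing.column.name' = 'id',
  'topic.prefix'             = 'db-'
);
```

With the connector in place, the ingested topics can be modeled as streams or tables and transformed with further SQL, keeping the entire source-to-sink pipeline in one tool.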
Kafka Streams Integration
For the first two years of its life, ksqlDB was known as
KSQL. Early development focused on its core feature: a streaming SQL engine
that could parse and compile SQL statements into full-blown stream processing
applications. In this early evolutionary form, KSQL was conceptually a mix
between a traditional SQL database and Kafka Streams, borrowing features from
relational databases (RDBMS) while using Kafka Streams to do the heavy lifting
in the stream processing layer.
The most notable feature KSQL borrows from the RDBMS branch
of the evolutionary tree is the SQL interface. This removed a language barrier
for building stream processing applications in the Kafka ecosystem, since users
were no longer required to use a JVM language like Java or Scala in order to
use Kafka Streams.