
Kafka as a component: products you wouldn’t guess are using it


We all know that Kafka is very useful for your streaming and real-time projects thanks to its reliability, low latency, scalability and so on. If not, read this post: Do you know why Kafka is being a success?

But have you ever thought that Kafka could be a commodity component for your products, not only for your projects?

Kafka as a component

It is well known that most successful companies are built on top of products, not projects. Apple started with a computer and now is triumphant with a phone: products. Microsoft with MS-DOS, Windows and Azure Cloud. Coca-Cola with a drink.

Who can claim a similar level of success just selling projects?

The Fortune 500 list starts with Walmart, Exxon Mobil, Berkshire Hathaway, Apple and UnitedHealth Group. Only the third and the fifth are more service-oriented. None of the top 10 is a software development or consultancy company.

The same could happen to Kafka: instead of being used only inside projects, in recent years we have seen a gradual move toward using Kafka as just another product component, like the database or a security service.

Let’s see three examples from different worlds.

Companies using Kafka as a component

Appian BPM: The process management example

Appian BPM is a Business Process Management (BPM) solution. It can be used on premises or in the cloud (Appian’s cloud, in this case). To simplify, we will describe a BPM as a tool where you model your business process (e.g. selling a loan to a customer) with all the steps involving the customers, employees and systems required to fully integrate and complete the selling process end-to-end. The design is graphical, multilevel (processes and subprocesses), parallel and branched, and some designs are quite complex. Processes evolve over time, and some parts can even be ad hoc.

What does this have to do with Kafka?

Appian uses Kafka for the internal messaging among its components. This means that all actions in the system are sent and distributed using Kafka. The obvious advantages are those of Kafka itself: reliability, speed, fault tolerance, capacity. But this also eases the problem of communication between components: fewer ports, fewer protocols, K.I.S.S. again 😉

In addition, although Appian has not opened its internal message format to developers, the flow of messages is clearly an ideal source of information for KPIs.
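To make the KPI idea a bit more tangible, here is a minimal sketch of deriving indicators from a stream of process events. The event tuples, their field order and the KPI names are all hypothetical: this is not Appian's message format, which is not public, and a real setup would read the events from a Kafka topic instead of a list.

```python
from statistics import mean

# Hypothetical process events: (process instance, event type, time in seconds).
# This is NOT Appian's internal format; it only illustrates the idea.
events = [
    ("loan-1", "started", 0),
    ("loan-2", "started", 10),
    ("loan-1", "completed", 120),
    ("loan-3", "started", 130),
    ("loan-2", "completed", 250),
]

started = {}    # instances still running -> start time
durations = {}  # finished instances -> elapsed seconds

for instance, event_type, ts in events:
    if event_type == "started":
        started[instance] = ts
    elif event_type == "completed" and instance in started:
        durations[instance] = ts - started.pop(instance)

kpis = {
    "completed": len(durations),
    "in_progress": len(started),
    "avg_duration_s": mean(durations.values()),
}
print(kpis)
```

Because the events arrive as an ordered flow, the indicators can be updated as each message is read, without querying the process engine itself.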

Blockchain rescued!

Hyperledger Fabric (HF) is one of the flavors of the famous Blockchain. Blockchain became famous in recent years for Bitcoin’s high prices and speculation, and for news stories about stolen electricity being used to mine cryptocurrencies. Aha! This is getting interesting: do you know the origin of Bitcoin and Blockchain?

But Blockchain is not about cryptocurrencies; those are the rewards for work done for Blockchain networks. The real underlying benefit of Blockchain is the ability to exchange money, goods and information, to execute contracts and so on without having to trust a central authority. Instead, the Blockchain network is made up of a large number of entities, and the truth is whatever the majority says it is, so the larger the network, the more difficult it is to fool. Note: this is true for a certain type of Blockchain network; other consensus algorithms work differently.

So, what is Kafka doing in this system?

HF has one component, the Orderer service, that uses Kafka to ensure that all transactions are processed in order, because any change has to be taken into account by future transactions.

This is critical because the “chain” part of Blockchain means that each transaction is related to the previous one. HF uses a single Kafka topic per channel (a channel is a Hyperledger concept that groups all the transactions among a set of actors, which therefore have to be kept in order), with a single partition and at least four replicas. The replicas allow the system to keep working in sync with one broker (Kafka or Orderer) down. The single partition guarantees the order, regardless of the number of brokers or orderers in the network.
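The ordering guarantee can be illustrated without a real broker. The sketch below is a toy in-memory model, not Kafka or Hyperledger Fabric code: Kafka guarantees order only within a partition, so a topic with one partition yields a single total order, while spreading the same transactions over several partitions yields only per-partition order. The names `Topic`, `append` and `tx-0` are invented for the example, and the round-robin placement is an assumption standing in for a real partitioner.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy model of a topic: a list of append-only partition logs."""
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, seq, record):
        # Round-robin placement stands in for a partitioner (assumption).
        partition = seq % self.num_partitions
        self.partitions[partition].append(record)
        # The offset is the record's position inside its own partition.
        return partition, len(self.partitions[partition]) - 1


transactions = [f"tx-{i}" for i in range(6)]

# One partition, as described for a channel: one total order.
channel = Topic(num_partitions=1)
for seq, tx in enumerate(transactions):
    channel.append(seq, tx)
single_log = channel.partitions[0]

# Three partitions: each log is ordered, but no global order is recorded.
spread = Topic(num_partitions=3)
for seq, tx in enumerate(transactions):
    spread.append(seq, tx)

print(single_log)
print(spread.partitions)
```

With one partition, every reader sees `tx-0` to `tx-5` in exactly the order they were written. With three, a reader can no longer tell from the logs alone whether `tx-1` came before `tx-3`.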

Obvious? Not everything is what it seems

IBM CDC/Data Replication is a classic case of data streaming, what is known as a Change Data Capture (CDC) system, and there are many similar products on the market (Attunity, Ab Initio, etc.).

CDC products can detect changes in a system (typically a database) and transport those changes to other systems (other databases, HDFS, filesystems, etc.). The underlying technologies vary, but in recent years it has become common practice to store or buffer those data changes in Kafka.

This allows these systems to make the most of Kafka’s capabilities:

  • Data/message ordering, if you use a single partition as Hyperledger Fabric does. This is not mandatory: some advanced CDC products use other techniques to maintain order even in multi-partition topics
  • High capacity
  • Multiple consumers, which allow easy and cheap distribution of information to multiple targets
  • Cheap storage and reprocessing, which can be used for event sourcing
  • High performance
  • Multiple brokers and multiple topics, which can help with multiple sources and targets
  • Not to mention ACLs, which keep control of who sees what in multi-tenant environments
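Several of these points can be sketched with a toy in-memory model (plain Python, no Kafka client). It shows one common way to keep order in a multi-partition topic, routing every change to the same row to the same partition by key, plus two independent consumers and a replay from the beginning. This is an illustration under assumptions, not how any particular CDC product works: the change events, the `Consumer` class and the partition count are hypothetical, and `crc32` is only a stand-in for the hash used by Kafka's default partitioner.

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for the example


def partition_for(key: str) -> int:
    # Stable hash of the key; crc32 is a stand-in, Kafka's default
    # partitioner uses a different hash function (murmur2).
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# Hypothetical change events: (row primary key, operation).
changes = [
    ("customer-1", "INSERT"),
    ("customer-2", "INSERT"),
    ("customer-1", "UPDATE"),
    ("customer-2", "UPDATE"),
    ("customer-1", "DELETE"),
]

# Producer side: every change to the same row lands in the same partition.
log = [[] for _ in range(NUM_PARTITIONS)]
for key, op in changes:
    log[partition_for(key)].append((key, op))


class Consumer:
    """Each target keeps its own offsets, so reading never removes data."""

    def __init__(self):
        self.offsets = [0] * NUM_PARTITIONS

    def poll(self):
        records = []
        for p, partition in enumerate(log):
            records.extend(partition[self.offsets[p]:])
            self.offsets[p] = len(partition)
        return records

    def rewind(self):
        # Reprocessing: start again from the beginning of every partition.
        self.offsets = [0] * NUM_PARTITIONS


def ops_for(records, key):
    return [op for k, op in records if k == key]


warehouse, audit = Consumer(), Consumer()
first = warehouse.poll()
second = audit.poll()   # an independent target sees the same changes
warehouse.rewind()
replayed = warehouse.poll()

print(ops_for(first, "customer-1"))   # ['INSERT', 'UPDATE', 'DELETE']
print(second == first, replayed == first)
```

The order across different customers is not guaranteed here, only the order of changes to each individual row, which is often what a target database needs.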

Solutions using Kafka vs NATS

There is a vast number of products and solutions using Kafka, so why not yours too?

But why Kafka? Wouldn’t it be reasonable to use a Kafka alternative like NATS? What about NATS vs Kafka?

Very good questions. The answer is rather obvious: Kafka is older, wiser and better known. When starting a development project there are usually two ways of doing things: “let’s find the best, newest, fanciest product that solves exactly our needs” or “let’s use the old, known and certified product”. When you build a product on some third-party technology, it is very typical to stick to the second option. This option is usually selected because, at the end of the day, your product needs support, something you sell along with the product licenses. And the better the product and the components it uses, the less work support will have.


Talking about support: Kafka beats NATS

In this case it is Kafka that beats NATS. At version 2.2, Kafka is 8 years old. It is mature software with commercial support from several companies. NATS may be lighter or even faster, but it lacks the market share and the good image Kafka has, and that counts, for the sake of your reputation and the good sleep of your support teams 😉

Author: Juan Tavira

Santander Global Tech

