This tutorial covers the installation and configuration of Kafka on an Ubuntu 16.04 virtual machine. It includes setting up the dependent components (ZooKeeper, etc.) as well as testing and interacting with the Kafka instance.
Warning
Note that the Kafka setup resulting from these instructions is NOT intended for production use. These instructions create an initial test/dev setup for understanding how the application functions.
Installation/Setup Process
The following steps are performed on a VirtualBox-hosted Ubuntu virtual machine with 1 CPU and 6 GB of RAM.
Prerequisites
Install the required prerequisite (Java).
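A minimal sketch of that step, assuming the Ubuntu 16.04 default-jre package (OpenJDK 8 on this release) is acceptable for a test/dev setup; install_java is a hypothetical helper name:

```shell
# Minimal sketch for Ubuntu 16.04. Assumption: the default-jre package
# (OpenJDK 8 on this release) is enough to run Kafka for test/dev use.
# install_java is a hypothetical helper name.
install_java() {
  sudo apt-get update &&
    sudo apt-get install -y default-jre &&
    java -version   # prints the installed version as a final check
}
```

Running install_java once should finish by printing the installed Java version.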
Setup
Retrieve the Kafka package from the Downloads page on the Apache Kafka website. For the purposes of these instructions, use the Binary package download.
At the time of this post, the version used is 0.10.0.1 for Scala version 2.11.
Copy the downloaded package to your home directory and unpack it.
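A sketch using standard shell tools, assuming the Scala 2.11 build of 0.10.0.1 was saved as kafka_2.11-0.10.0.1.tgz in ~/Downloads (adjust the path if your browser saved it elsewhere); unpack_kafka is a hypothetical helper:

```shell
# Hypothetical helper. Assumption: the browser saved the Scala 2.11 build
# of Kafka 0.10.0.1 to ~/Downloads; pass a different path as $1 if not.
unpack_kafka() {
  archive="${1:-$HOME/Downloads/kafka_2.11-0.10.0.1.tgz}"
  # Copy the package into the home directory, then unpack it there.
  cp "$archive" "$HOME/" &&
    tar -xzf "$HOME/$(basename "$archive")" -C "$HOME"
}
```

The archive unpacks into ~/kafka_2.11-0.10.0.1.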
ZooKeeper
Kafka uses ZooKeeper to store internal tracking and status information, such as broker and topic metadata.
Configuration/Setup
The configuration file for ZooKeeper is config/zookeeper.properties. One item of note in this file is the clientPort parameter, the port on which ZooKeeper accepts client connections (including Kafka's), which defaults to 2181.
To start the ZooKeeper instance, run the ZooKeeper start script included in the Kafka package, pointing it at config/zookeeper.properties.
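A sketch of the start command, assuming Kafka was unpacked to ~/kafka_2.11-0.10.0.1 (the hypothetical KAFKA_HOME variable can point elsewhere). The zookeeper-server-start.sh script ships in the Kafka package's bin directory; the start_zookeeper wrapper is only for illustration:

```shell
# Assumption: Kafka lives in ~/kafka_2.11-0.10.0.1 (override KAFKA_HOME).
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka_2.11-0.10.0.1}"

# Hypothetical wrapper: runs ZooKeeper in the foreground with the bundled
# configuration (clientPort 2181); stop it with CTRL+C.
start_zookeeper() {
  "$KAFKA_HOME/bin/zookeeper-server-start.sh" \
    "$KAFKA_HOME/config/zookeeper.properties"
}
```

ZooKeeper runs in the foreground, so call start_zookeeper in its own terminal and leave it running.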
Kafka
Kafka is the main event processing application.
Configuration/Setup
The configuration file for Kafka is config/server.properties. There are quite a few tunable parameters in this file, but one of note is zookeeper.connect, which tells Kafka where its ZooKeeper instance is located (it defaults to ‘localhost:2181’, which is fine for this exercise). By default, the single Kafka broker listens on port 9092. If you wish to access Kafka from outside the virtual machine on which it is installed, it is best to also set the listeners parameter to explicitly bind the service to the network interface whose IP address is reachable by outside clients. In summary, config/server.properties should contain zookeeper.connect=localhost:2181 and listeners=PLAINTEXT://<IP_ADDRESS>:9092, where <IP_ADDRESS> is an IP address of the virtual machine that is reachable from outside the machine itself.
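The zookeeper.connect and listeners settings described here can be applied with a small script. This is a sketch assuming GNU sed (as on Ubuntu) and the ~/kafka_2.11-0.10.0.1 install location; set_property, configure_broker and KAFKA_HOME are hypothetical names. It replaces each key's line, uncommenting it if needed, or appends the key if it is missing:

```shell
# Assumption: Kafka lives in ~/kafka_2.11-0.10.0.1 (override KAFKA_HOME).
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka_2.11-0.10.0.1}"

# Hypothetical helper: rewrite "key=..." (or a commented "#key=...") line,
# or append the key if the file does not mention it. Uses GNU sed -i.
set_property() {
  file="$1" key="$2" value="$3"
  if grep -q "^#\{0,1\}$key=" "$file"; then
    sed -i "s|^#\{0,1\}$key=.*|$key=$value|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# $1 is the VM's externally reachable IP address (<IP_ADDRESS> in the text).
configure_broker() {
  ip="$1"
  set_property "$KAFKA_HOME/config/server.properties" zookeeper.connect localhost:2181
  set_property "$KAFKA_HOME/config/server.properties" listeners "PLAINTEXT://$ip:9092"
}
```

For example, configure_broker 192.168.56.101 (a sample VirtualBox host-only address) leaves the lines listeners=PLAINTEXT://192.168.56.101:9092 and zookeeper.connect=localhost:2181 in the file.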
Kafka can be memory-hungry (partly a consequence of how it achieves its speed). When developing on a local VM, it is a good idea to constrain the memory available to the Kafka process by setting the KAFKA_HEAP_OPTS environment variable, which the Kafka start script uses to size the JVM heap.
To start the Kafka instance, run the broker start script included in the package, pointing it at config/server.properties.
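A sketch of the start command with a capped heap, assuming the same install location. kafka-server-start.sh reads KAFKA_HEAP_OPTS from the environment and falls back to a 1 GB heap when it is unset; the 512 MB value and the start_kafka wrapper are assumptions for a small dev VM:

```shell
# Assumption: Kafka lives in ~/kafka_2.11-0.10.0.1 (override KAFKA_HOME).
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka_2.11-0.10.0.1}"

# Hypothetical wrapper. The 512 MB heap is an assumed value for a small VM;
# an existing KAFKA_HEAP_OPTS setting takes precedence.
start_kafka() {
  KAFKA_HEAP_OPTS="${KAFKA_HEAP_OPTS:--Xmx512M -Xms512M}" \
    "$KAFKA_HOME/bin/kafka-server-start.sh" "$KAFKA_HOME/config/server.properties"
}
```

Start ZooKeeper first; the broker also runs in the foreground, so give it a second terminal.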
Interaction and Testing
Now that you have ZooKeeper and Kafka instances running, let’s test them. Kafka comes packaged with scripts for creating, modifying, and deleting topics, and for producing and consuming the messages within them.
Create a Topic
The first step in interacting with Kafka is to create a topic.
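A sketch for this single-broker setup, assuming the same install location and the example topic name test. In Kafka 0.10.0, kafka-topics.sh connects to ZooKeeper through --zookeeper, and with only one broker the replication factor cannot exceed 1; create_topic is a hypothetical wrapper:

```shell
# Assumption: Kafka lives in ~/kafka_2.11-0.10.0.1 (override KAFKA_HOME).
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka_2.11-0.10.0.1}"

# Hypothetical wrapper: one partition, one replica, topic name from $1
# (defaults to the example name "test").
create_topic() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic "${1:-test}"
}
```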
Once the topic is created, verify that it exists using the list operation.
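The list operation under the same assumptions (list_topics is a hypothetical wrapper); its output should include the newly created topic:

```shell
# Assumption: Kafka lives in ~/kafka_2.11-0.10.0.1 (override KAFKA_HOME).
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka_2.11-0.10.0.1}"

# Hypothetical wrapper: prints one topic name per line.
list_topics() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --list --zookeeper localhost:2181
}
```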
Send Test Messages
Now that the topic exists, let’s send some messages to it. Running the console producer places your shell in input mode: each line you enter (ended with the Enter key) is sent as a message to the specified topic within Kafka. When you are done, terminate the script with CTRL+C.
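A sketch of the console producer for the example topic test. It targets localhost:9092 by default, but if listeners is bound to <IP_ADDRESS> the broker no longer listens on localhost, so set the hypothetical KAFKA_BROKER variable to <IP_ADDRESS>:9092 first; produce_messages is likewise a hypothetical wrapper:

```shell
# Assumption: Kafka lives in ~/kafka_2.11-0.10.0.1 (override KAFKA_HOME).
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka_2.11-0.10.0.1}"

# Hypothetical wrapper. KAFKA_BROKER is an assumed variable: leave it unset
# for localhost:9092, or set it to <IP_ADDRESS>:9092 when listeners is bound
# to an external interface. Each line typed on stdin becomes one message.
produce_messages() {
  "$KAFKA_HOME/bin/kafka-console-producer.sh" \
    --broker-list "${KAFKA_BROKER:-localhost:9092}" --topic "${1:-test}"
}
```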
Retrieve Topic Messages
The topic now holds the messages from the previous step. Let’s verify that a consumer can see them. The console consumer runs until you terminate it with CTRL+C.
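A sketch of the console consumer under the same assumptions (consume_messages is a hypothetical wrapper). The 0.10.0 console consumer can locate brokers through ZooKeeper, and --from-beginning makes it read the messages produced before it started:

```shell
# Assumption: Kafka lives in ~/kafka_2.11-0.10.0.1 (override KAFKA_HOME).
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka_2.11-0.10.0.1}"

# Hypothetical wrapper: prints every message on the topic from offset 0
# until stopped with CTRL+C.
consume_messages() {
  "$KAFKA_HOME/bin/kafka-console-consumer.sh" --zookeeper localhost:2181 \
    --topic "${1:-test}" --from-beginning
}
```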
Credit
Contributions to some of the above were gleaned from: