>>Hey friends. Did you know that Azure Event Hubs ingests more than two trillion messages per day, and that as a distributed streaming platform, Event Hubs enables you to stream your data from any source, storing and processing millions of events per second? Shubha is here to show us how Event Hubs now works with Apache Kafka, today on Azure Friday.>>Hey friends, I’m Scott Hanselman and it’s another episode of Azure Friday. I’m here with Shubha Vijayasarathy and we’re talking about Event Hubs and Apache Kafka, two of my favorite things in big data, and you’ve brought me some cool demos as well.>>Yes. I’ll try.>>Yeah, what’s going on with Event Hubs as it relates
to Apache Kafka?>>Event Hubs, as we know, is a distributed streaming platform, and Apache Kafka is also a distributed streaming platform. As the Event Hubs team, we tried our best to bring these two big data streaming platforms together, and that’s what you’ll see.>>Cool. So you’re going to make it so they
can work together?>>Yes. If we start from the top, what we see with big organizations is that they have a lot of data points they want to analyze so that they can improve and have a better return on investment. In any enterprise’s lifetime, you will be building a data pipeline, and a typical data pipeline looks like this: you have producers, the data points that you’re interested in, and you start collecting, staging, and then ingesting, processing, and modeling. That’s the typical data pipeline, and Event Hubs or Kafka comes in as the front door for any streaming platform, where you’re ingesting before you store, process, and model. It’s a temporal buffer for your producers and consumers: you can ingest events, then process and model them.>>I see. So, my producers
can be anything. It could be IoT devices, it could be video
or click streams.>>Applications, click streams, web apps, sensor data.>>You said it’s like a buffer because some days you might have millions of requests a second and some days you might not. So it’s coming and going.>>I call it a buffer because your producers might produce at one velocity and volume while your consumers consume at a different velocity and volume. So, you want something smart in between: the pipeline just stays there, and your producers need not be bothered about when your consumers are ready.>>There’s a whole ecosystem of ingestion systems here and
you’re unifying those.>>Yes. Exactly.
We’re just bringing in that one big fat pipe so that your producers and consumers can have a streamlined pipeline.>>Do you find
a lot of people out there that have streaming data, maybe already on Kafka, and they don’t know if they can bring it to Azure?>>Yes, for sure. If you look at the conceptual architecture of Event Hubs, an event hub and a Kafka topic are very similar, and the way Event Hubs and Kafka handle data is that they have partitions in their architecture where data gets distributed. So, you have a bunch of event producers and a bunch of event receivers who are interested in this data, and the data gets distributed within Event Hubs or Kafka topics in a uniform pattern. What this gives you is huge downstream parallelism, where you’re letting your receivers concurrently read from all these partitions. It’s called a partitioned consumer model, not like the typical competing consumer model that you would see in an enterprise.
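(Not from the episode itself, just a minimal sketch of what the partitioned consumer model looks like from the Kafka client side; the topic and group names here are illustrative. Each consumer in a group is assigned its own subset of partitions and pulls from them at its own pace.)

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class PartitionedConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // or an Event Hubs namespace, as shown later
            props.put("group.id", "demo-group"); // consumers in one group split the partitions among themselves
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            // Run several copies of this program: the broker assigns each instance
            // a disjoint subset of the topic's partitions, so they read in parallel.
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("test"));
                while (true) {
                    // Pull model: the receiver asks for records at its own pace.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition %d, offset %d: %s%n",
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }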
>>It’s not necessarily pub/sub either?>>You can make it pub/sub, because you have events or messages coming in and getting distributed, and a couple of event receivers may be interested in only a couple of things. The thing is, it’s a buffer, so it’s not a push model; it’s a pull model where you can do a lot on your receiving end. What this gives you is scale. If you have four receivers today and I want to add
six receivers, it just scales.>>Awesome. That’s
the great thing about Azure in general. It just scales.>>If you see, this is how Kafka works, and that’s where many users come and say, oh no, I have Kafka and I have Event Hubs, both streaming platforms, so which one should I choose? Is it Event Hubs or Kafka? Because there are a lot of advantages with Kafka. Apache Kafka is open source; it’s software you download and run. Event Hubs is on the cloud; it’s a cloud offering, a PaaS offering. So the pain point when we talk about Event Hubs or Apache Kafka is: which one do I choose? Many customers are already on Apache Kafka. It has an on-premises story, and Event Hubs is only for the cloud. That’s where we thought we’d bring this in and say it’s not ‘or’, it’s ‘and’. So, that’s what our integration story is: we natively built with HTTPS and AMQP, but now we understand the Kafka protocol as well.>>Interesting. So this
reminds me of the time that I had the folks
from the Cosmos DB team, and they said Cosmos is amazing, it has all these features that you don’t see anywhere else. But it’s not something
that’s open source and it doesn’t have
a protocol people understand. So they put Mongo in front
of it and now everyone wins.>>That’s right.>>So the same thing
is happening here. You’re bringing Kafka to Event Hubs so that they work together, and people who have existing producers can just keep going.>>Yes. The biggest
advantage here is you’re not changing
any of your clients, you’re not changing your code, your producers,
your applications, your tools, your frameworks. Kafka offers framework
like connect which is trying to stream using Kafka with your source
and destinations, your data sources and
data destinations, which could be like
your file streams, your databases, your JDBC SAP. All these will just
work with Event Hubs.>>So, are you saying
that I could take an existing app that I had
written using Kafka libraries. Does it even know
Azure is a thing?>>Yes. It need not know
anything about Event Hubs.>>Okay. Do you have a demo?>>I do have. So, what I have here is
a sample Kafka producer. As you can see, there is
no hint of Event Hubs anywhere. It’s all using the Kafka API, and the sample consumer as well; it’s a simple Java program. All I need to do on my producer and consumer is add the bootstrap server entry, which is the Event Hubs namespace. Now, how do you create an Event Hubs namespace? You can go to the portal. I have one already created for you. This testingfordemo is my namespace; it’s an Event Hubs namespace. You can just do create and click on “Event Hubs”.>>Okay.>>The Event Hubs blade comes in and then you can start creating your namespace with it. It’s just click, click, click, and [inaudible].
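(If you’d rather script those clicks, the namespace and the event hub can also be created from the Azure CLI; a rough sketch, where the resource group, location, and names are placeholders:)

    az group create --name demo-rg --location westus2
    az eventhubs namespace create --resource-group demo-rg --name testingfordemo --location westus2
    az eventhubs eventhub create --resource-group demo-rg --namespace-name testingfordemo --name test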
>>Do I have to tell the namespace I want to use Kafka? Or do
I get that automatically?>>Every namespace is
Kafka enabled so you don’t have to explicitly
say that it’s Kafka.>>That’s nice.>>So, I already have
a namespace created.>>Okay.>>I’ll reuse that.>>I noticed in your connection string you’re using 9093; 9092 and 9093, those are standard ports.>>Yes.>>So, really, your connection string looks like Kafka because it is Kafka.>>Yeah. Most Kafka applications use 9092, which is an unsecured port; 9093 is a secured port. Event Hubs requires a TLS handshake, and 9093 makes a lot of sense for that.>>Nice.>>So, yeah.>>Okay.>>I already have the testingfordemo namespace, a sample namespace.>>Okay.>>What I need from it is
these connection strings.>>Okay.>>This gives me the endpoint that is needed to
talk to Event Hubs.>>Right.>>So, this is where I can
get my connection string. I already have a
connection string pasted. This is where you need to
put your connection string, just copy and paste and
this is the namespace. What the Event Hubs namespace gives you is a unique FQDN, a fully-qualified domain name, and that’s what you append along with the namespace name, and that’s what your bootstrap server or your host server name would be.>>Okay.>>So, that’s done.
The same thing I would do for my consumer config as well: I’m just adding my bootstrap server name and my connection string as my endpoint, and you’re all ready to go. That’s it.
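(For reference, here is roughly what that amounts to in code: a plain Kafka producer whose bootstrap server is the Event Hubs FQDN on port 9093, with the namespace connection string passed through SASL PLAIN. This follows the documented Event Hubs pattern, where the username is the literal string $ConnectionString; the connection string value is a placeholder.)

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TestProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // The Event Hubs namespace FQDN is the bootstrap server; port 9093 because TLS is required.
            props.put("bootstrap.servers", "testingfordemo.servicebus.windows.net:9093");
            props.put("security.protocol", "SASL_SSL");
            props.put("sasl.mechanism", "PLAIN");
            // Username is the literal "$ConnectionString"; the password is the namespace connection string.
            props.put("sasl.jaas.config",
                    "org.apache.kafka.common.security.plain.PlainLoginModule required "
                    + "username=\"$ConnectionString\" "
                    + "password=\"<your-event-hubs-connection-string>\";");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 1000; i++) {
                    // "test" is the Kafka topic, which maps to an event hub of the same name.
                    producer.send(new ProducerRecord<>("test", "message " + i));
                }
            }
        }
    }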
>>Yeah, let’s do it.>>So, I have the Kafka producer.>>Okay.>>Okay. Let’s start. This is just sending some, say, 1,000 messages probably->>Okay.>>-to my namespace.>>So, you are doing a
[inaudible] and you’re running that over in
the producer side.>>Yes, it’s just a simple Java app that I’m running.>>Okay.>>So, it started.>>Here comes some data.>>Now, as I’m running that.>>Neither the producer nor the consumer knows it’s Event Hubs, because they’re talking the Kafka protocol on that port.>>Yes, absolutely.>>Okay.>>So, I started
my consumer app as well which will start receiving the events that
the producer sent.>>Okay. Look at that.>>Yeah, and if you see you can start
seeing the data here.>>That’s so nice, and you
see the data in the portal.>>Yes.>>In the comfortable
places you already know and you already
understand and I can see that you’ve got filtering and metrics you can
go and dig into.>>Yes. Azure Event Hubs has metrics through Azure Monitor, with which you can build beautiful dashboards. You can use that. The Kafka API also gives you its own metrics, and those can be used in tandem with Event Hubs, so mix and match works.>>Really. So,
the administrative tools are consumers as well and they
can go and view that. So, if you’ve already
built a Kafka dashboard, you can point that at Event Hubs as well?>>Yes, it should work, because if it’s talking to the Kafka API, it can talk to this as well.>>That’s so cool.>>It’s as simple as that.
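(That mix and match is visible from the client side too: the Kafka clients compute their metrics locally, so whatever scrapes them keeps working regardless of the broker behind them. A small illustrative sketch; MetricsPeek is a made-up helper name.)

    import java.util.Map;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;

    public class MetricsPeek {
        // Dump the client-side metrics of any Kafka producer. These come from
        // the Kafka client itself, so they look the same whether the broker
        // behind it is an Apache Kafka cluster or an Event Hubs namespace.
        public static void dump(KafkaProducer<?, ?> producer) {
            Map<MetricName, ? extends Metric> metrics = producer.metrics();
            metrics.forEach((name, metric) ->
                    System.out.printf("%s.%s = %s%n", name.group(), name.name(), metric.metricValue()));
        }
    }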
So, we see that happening. While this is happening, I also want to show another
interesting thing. We said Event Hubs natively supports AMQP, and now we’ve opened it up for Kafka, and many customers are wondering, “Okay, does that mean I can send with AMQP and receive with Kafka, or send with Kafka and receive with AMQP?”>>That’s clever.>>I say yes. So, I have
a sample sender here.>>C#. Okay.>>C#, I know.>>That’s great. That’s my language.>>I know. This is all our language.>>I know, that’s cool though.>>Yeah. So here, I don’t have any Kafka.>>Okay.>>I only have Event Hubs. This is the native Event Hubs API, as you can see.
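(The sender in the demo is C#, but the same Kafka-free path exists in Java; a minimal sketch assuming the azure-messaging-eventhubs SDK, with placeholder values:)

    import java.util.Collections;
    import com.azure.messaging.eventhubs.EventData;
    import com.azure.messaging.eventhubs.EventHubClientBuilder;
    import com.azure.messaging.eventhubs.EventHubProducerClient;

    public class AmqpSender {
        public static void main(String[] args) {
            // The native Event Hubs client: it speaks AMQP and knows nothing about Kafka.
            EventHubProducerClient producer = new EventHubClientBuilder()
                    .connectionString("<your-event-hubs-connection-string>", "<event-hub-name>")
                    .buildProducerClient();

            // An event sent here over AMQP can be read back by a Kafka consumer
            // subscribed to the topic of the same name.
            producer.send(Collections.singletonList(new EventData("AMQP in, Kafka out")));
            producer.close();
        }
    }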
>>So, it says testingfordemo. Is that the same namespace?>>Same namespace I’m using for the [inaudible].>>Okay, but there’s nothing that says Kafka here. So, C# is thinking Event Hubs.>>Correct.>>While Java thinks it’s Kafka.>>Correct. Although,
I have created another topic so that I can
show the demo distinctly. The previous one used a topic called test.>>Okay.>>This one is using
a topic called testeh.>>Okay.>>If you see our consumer app.>>Okay.>>Now, so it’s
consuming from testeh. The previous consumer was
consuming from the test topic. This will be using the testeh.>>Okay.>>So, let’s just run. The same thing goes.
Our Event Hubs client also requires an endpoint.>>Right.>>Its connection string, it’s the same thing. So, let me run this, and it will run in debug mode.>>Okay.>>It should start sending the messages. [inaudible] Yes. This is the AMQP send.>>Okay.>>Let the messages start, and then- Okay, it’s sending. Now, I will start the-.>>So, .NET back here and Java in front.>>Yes, Java is.>>They’re from two different universes, two different
libraries, all brought together by Event
Hubs using Kafka.>>Correct.>>That’s so cool. So,
there are so many people out there who are probably watching this and thinking, “Oh, I already have something that I can use with this right away,” and no code changes, just-.>>No code changes, just a config change. In your bootstrap servers setting, you’ll be using the namespace, or the [inaudible] for Event Hubs, and the connection string, which is the endpoint that you need to talk to Kafka. Furthermore, there are tools like MirrorMaker which mirror your Kafka clusters, be it on-prem or in the cloud, to other Kafka clusters. So with this protocol, you can use tools like MirrorMaker to mirror your on-prem Kafka clusters onto Event Hubs in the cloud. That’s like a two-line configuration change, and you’re migrating data from on-prem to the cloud.
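(As a sketch of what that two-line change looks like: classic MirrorMaker takes a consumer config for the source cluster and a producer config for the target, so pointing the target at Event Hubs is the same SASL settings again. File names and values here are placeholders.)

    # mirror-eventhub.config: the producer (target) side of MirrorMaker
    bootstrap.servers=testingfordemo.servicebus.windows.net:9093
    security.protocol=SASL_SSL
    sasl.mechanism=PLAIN
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="<your-event-hubs-connection-string>";

    # Mirror every topic from the on-prem cluster into Event Hubs:
    bin/kafka-mirror-maker.sh --consumer.config source-kafka.config \
        --producer.config mirror-eventhub.config --whitelist=".*"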
>>Wow. So, is this available now? Can people start using it?>>Yes, it’s generally available now. It’s available in
all Azure regions.>>Okay.>>Yeah, you can start
using it. Oh, it stopped.>>All right.>>So, this is the last slide that I have. So, when people say, “Okay, we integrated
Event Hubs with Kafka,” it’s now like, why would I need that? This slide here tries to explain it: you get a PaaS experience. You’re not running ZooKeeper, which is the orchestrator that runs and manages your Kafka clusters; there is no flavor of ZooKeeper. You are not running any clusters; you’re not running anything. You’re just streaming the data, not thinking about scaling, and using your current applications.>>Fantastic. All of these things you don’t have to think about. You get to go and
have that great Kafka ecosystem as a service.>>Exactly.>>Using Event Hubs.>>Yes.>>That’s fantastic.>>The other thing
is, you get not only the wider Kafka ecosystem with Event Hubs, you get the wide Azure Event Hubs ecosystem with Kafka. So, the mix and match works.>>Interesting.>>So, it can just work with Stream Analytics, with Azure Functions, and you can use Storage, Blobs, Cosmos DB, and everything that Azure offers as well.>>Well, thank you so much
for sharing that with us.>>Yeah, thank you Scott.>>All right. I’m learning
all about Azure Event Hubs and Apache Kafka today
on Azure Friday.