Today, we are experiencing a profound shift in how we understand distributed systems. Organizations want interconnected, reactive systems that drive microservices, machine learning, system accounting, and observability. Also, throw in some IoT to keep things interesting. Today’s world operates under a new set of constraints and tries to solve problems that were never thought possible. Everything has changed. Underpinning this change is the need for event-driven applications: not only to drive reactions and break down silos, but to more fundamentally change how we design, build, and architect systems. The common element is the event. More than that, we need to think about our systems in terms of events. Hence, event-first thinking changes everything. The big question is: how do we make this a reality? How do we support existing DevOps practices and continuous delivery commitments? In this talk, Victor discusses the merits of event-first system design and how systems architecture is evolving. The journey to event-driven architecture is not a free lunch; we need to not only commit to operating these systems at scale but also support the full software development lifecycle. He will cover testing practices, from unit tests to integration pipelines, and also touch on data quality and patterns of adoption. It is in the patterns that we begin to understand how continuous delivery is applied, how to make it fit synergistically with existing processes, and how to run systems 24×7 while supporting evolution.
Hello and welcome, SwampUp. Welcome to my office. I’m really excited to be here today, talking about things that I’m excited about and that I hope you will be excited about as well. Today I’m going to be talking about events, streams, all things DevOps, and developer and organizational velocity. My name is Victor Gamov and I work as a developer advocate at Confluent, a company that develops an event streaming platform based on Apache Kafka. Apache Kafka is an open source project, and we actively contribute to it. We also work with customers, helping them adopt event streaming platforms. And today, I’m going to be talking about this.
The pre-existing condition. This is the organizational chart of how we did things in the past. We might have multiple lines of business and multiple small departments, each working on different software to automate their processes. In order to communicate, they might establish some APIs, or they might provide access to their data by, I don’t know, maybe even letting others write directly to their databases, and things like that. Those choices create direct and indirect communication links. Before event streaming platforms, before there was a backbone that allowed systems to communicate, the organizational structure might look like this: many links, many point-to-point connections developed inside the organization just to get data from multiple places and communicate effectively and efficiently.
This is where I think many people today realize they need to break this down and untangle it. One of the paradigms that is suitable, if not to fix this then at least to help make some sense out of the noise, is the event streaming paradigm. Essentially, it changes the way we think about the events that happen in a system: instead of treating them as stored records, we treat them as a continuously updating stream of events. This continuously updated stream contains the history, the progression of the process, and so forth. So, let me give you an example. An event, essentially, is a shared narrative about some business or some domain, and an event can be anything. Events are something humans find easy to reason about, much easier than database records, because an event is something that happened; it is an existing fact, and this fact is immutable, because you can’t change the past even if you’d like to. You know, remember that conversation with your significant other where you said something you were not supposed to say? You cannot do anything to change it; you can only send a new event. And this is the thing with the events around us as well. A sale happened. An invoice was issued. A payment was made by a customer.
And some customers register or refer other customers. Those things happen every day in a business, and we need to capture those events in order to perform business operations. Events are sequenced by time. Just as in real life you can always restore a sequence of events from your memories, the order of these events can be restored from the streaming platform. Ordering is important to people and to business processes. For example, if we talk about something like credit card transactions, the order of the transactions matters. This is something we want to preserve, and the streaming platform allows us to do so.
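A minimal sketch of this idea in Java (the `PaymentEvent` record and its fields are illustrative, not part of any Kafka API): an event is an immutable fact with a timestamp, and replaying events in timestamp order reconstructs the history of the process.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// An event is an immutable fact: once created, its fields never change.
// The record name and fields are illustrative.
record PaymentEvent(String invoiceId, long amountCents, Instant occurredAt) {

    // Replaying events sorted by time reconstructs the history of the process.
    static List<PaymentEvent> replayInOrder(List<PaymentEvent> events) {
        return events.stream()
                .sorted(Comparator.comparing(PaymentEvent::occurredAt))
                .collect(Collectors.toList());
    }
}
```

The point of the sketch is the shape, not the mechanics: you never mutate a `PaymentEvent`; to correct a mistake, you append a new event.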
Now, thinking in the event paradigm also allows us to simplify architectures. Or maybe not simplify exactly; it will not look simple from the very beginning. In a monolith, it’s always easy to have multiple things in one place while they’re small. But when this little monolithic system starts growing, it becomes hard to fit the whole architecture of the system in the head of one single person; you need multiple people who know different details of the system. With a streaming platform, we can establish communication between microservices, so we can break this monolithic application down into a set of services that can evolve and be developed separately. The event streaming platform provides the backbone for communication between these services, and they can all operate on the same view of the world. In this case it’s not only a backbone but also a single source of truth for the services, and they can reuse and apply this data for different purposes. Say we have an event that a user registered; this can trigger multiple services to react. There might be an email service that sends a greeting email, and an analytics service that starts accumulating loyalty points for the new user: the more the user buys, the more points accumulate in the loyalty program. So things are getting slightly more complex, and when we start looking at this kind of system, we need to develop practices for how to build it and move it forward.
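The fan-out just described can be sketched with a tiny in-memory stand-in for a topic. With Kafka, each consumer group would read the same event independently; here a plain listener list plays that role, and all names (`UserEvents`, the services) are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// In-memory stand-in for a topic with multiple independent consumers.
// Publishing one event fans out to every subscribed service.
class UserEvents {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    void subscribe(Consumer<String> service) {
        subscribers.add(service);
    }

    // One "user registered" event; every service reacts on its own.
    void publishUserRegistered(String userId) {
        subscribers.forEach(s -> s.accept(userId));
    }
}
```

In the real system, the email service and the loyalty service would each be a separate consumer of the same topic; neither knows about the other, which is exactly what lets them evolve separately.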
So, let’s break down some of the things I like to start with when I talk about event streaming platforms and how developers can embark on this journey. A couple of things here. There are two sets of goals; they don’t necessarily contradict each other, and I’m not saying they are polar opposites. Obviously there are business goals for operating the business successfully, and there are goals we want to achieve from a DevOps perspective. The business wants this system up and running 24×7, and it needs to be cost-effective. In running this business we obviously want to make a profit; we don’t want to just burn money in real time, and we need some determinism about the state of the world, about what is going on. Those are the business goals for the system. DevOps, on the other hand, operates more on the technical aspect, the architectural aspect, and some of the cultural aspect: how to implement processes efficiently, and how to apply tools to make those processes efficient. That means integrating with multiple tools and with frameworks that allow you to scale applications more efficiently.
Now, when we get into tools and strategy, the answer is always “it depends”; we need to take many, many things into consideration. What kind of platform are we running on? Is it cloud or on-prem? Do we depend on a vendor, or do we follow a standard? Is there even a standard for a certain technology? Should we embrace serverless? And since we’re talking about events, there’s the option of standardizing those events on something like CloudEvents, which is a CNCF initiative to standardize a common envelope for transferring this data. So this is how things work in terms of bringing in tools and making things happen, on the DevOps-as-technology side. But there are also some things that come from the world of Kafka. Kafka is an event streaming platform that provides the backbone for storing these events, and there are certain patterns and anti-patterns that people need to take into account while implementing it. Essentially, Kafka is agnostic about the data you put in it. However, the business applications, these microservices, may depend very much on the nature of this data. This is why I brought up formats like CloudEvents: CloudEvents defines a schema for the event envelope.
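To make the “standard envelope” idea concrete, here is a sketch that builds a CloudEvents-style envelope as a plain map rather than using the CloudEvents SDK. The attribute names `specversion`, `id`, `source`, and `type` are the required context attributes from the CloudEvents v1.0 spec; the values and the payload shape are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// CloudEvents-style envelope as a plain map (SDK not used here).
// Required v1.0 attributes: specversion, id, source, type.
class EnvelopeSketch {
    static Map<String, Object> envelope(String id, String type, Object data) {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("specversion", "1.0");
        event.put("id", id);                         // unique per event
        event.put("source", "/payments/service-a");  // illustrative source URI
        event.put("type", type);                     // e.g. "com.example.payment.processed"
        event.put("data", data);                     // the domain payload
        return event;
    }
}
```

The value of the standard envelope is that every consumer, in any language, can find the event’s identity, origin, and type in the same place before it ever looks at the payload.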
So, in this case, the schema needs to be exposed to multiple tools and multiple components. Multiple services that might be written in different languages all need access to the same schema. Then there’s the question of how services know they need to perform certain actions: choreography over orchestration. With traditional workflow-like tools there’s always orchestration, someone controlling things. These services need to be autonomous, so in this case it’s more like a choreography: they work together based on the requirements each service has. Error handling and dealing with bad data is also a pattern people need to think about how to implement, because Kafka is a streaming platform; it doesn’t provide certain things out of the box. We are running this in a distributed, asynchronous world, and this is something that needs to be taken into account.
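One common way to handle bad data in a streaming pipeline is the dead letter queue pattern: instead of crashing the consumer on a malformed record, route it aside for later inspection. An in-memory sketch, where the “deserialization” is just integer parsing and the names are illustrative; in Kafka the dead letter queue would be a separate topic:

```java
import java.util.ArrayList;
import java.util.List;

// Dead-letter-queue sketch: good records are processed, bad ones are
// captured instead of crashing the pipeline.
class DlqSketch {
    final List<Integer> processed = new ArrayList<>();
    final List<String> deadLetters = new ArrayList<>();

    void handle(String record) {
        try {
            processed.add(Integer.parseInt(record)); // illustrative "deserialization"
        } catch (NumberFormatException e) {
            deadLetters.add(record); // keep the bad record for later inspection
        }
    }
}
```

The key property is that one poison record does not stop the stream: processing continues, and the bad data is preserved with enough context to debug it later.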
Implementing a synchronous communication interface on top of an asynchronous bus might be considered an anti-pattern. For example, when we try to do request-response and implement it on top of a queue-like structure such as Kafka, maybe that is not the right thing to do. And the ability to provision the infrastructure for Kafka brokers also requires some automation, because you want a finely tuned machine and you don’t want to spend time doing manual work. At the end of the day, Kafka becomes a central nervous system, and without proper automation and proper tuning it would be very difficult to use and operate this tool in the real world. So, how do these things fit into CI/CD, and how does an application get delivered from development to production? When we talk about the traditional test lifecycle, the functional and unit tests are usually the bits that developers produce during development. This is the first phase.
And this is what we call developer-centric testing: developers write these tests while developing certain features. When we’re running a complex multi-microservice system, the aspects that were tested in isolation as unit tests need to be tested in integration. Once this first phase of integration testing is done, we usually move to a phase where we can test the system with data that is very close to production. So we go through the different environments, and we get data flows close to the ones we have in production. And last but not least, performance testing, to make sure that new features will not introduce regressions and we will not lose any substantial performance. So this is your typical testing lifecycle. Now, there are things that the business actually demands, the things we mentioned on the previous slides. They are maybe not 100% technical requirements; however, they are requirements of modern business. For example: how will the system behave under unprecedented load?
Say we’re rolling out a new payment system for Black Friday, and we want to make sure that spikes in user activity in our system will not interfere with the business process. How do we achieve 24×7 business continuity without a downtime window? That means rolling upgrades, blue-green deployments, these types of scenarios. And providing elasticity: if the load goes up, can we increase the number of instances of a service in order to serve that load, and so forth? Those are goals dictated by the business that need to be implemented from the technology side. Now, there are a couple of things I would recommend you look at in terms of technology, in terms of particular frameworks that allow us to implement those tests at different stages of development.
Most of the time, the business logic of these microservices, in particular in Java when we’re talking about Kafka, is implemented with a framework called Kafka Streams. Kafka Streams is an embedded framework that is part of Apache Kafka, and it comes with some batteries included. For example, for unit tests it includes the TopologyTestDriver, which doesn’t require any environment to be available to run the tests. Going forward, in order to perform live testing, a couple of things can be done. There’s a framework, embedded Kafka, that speaks the live protocol so applications can communicate with Kafka. Or containers can be used to perform this integration testing. Specifically, when we get to environment-level integration tests, Testcontainers really shines. It’s a Java framework that lets you write integration tests with JUnit-style syntax, and it uses actual containers with actual software.
It’s not a fake or some embedded version; it’s actual containers that you can use to test your application against the real software. Kubernetes can also be used to spin up a whole new environment where all your bits are deployed, and tools like Helm allow you to define your deployment strategy and application structure so you can roll this out quickly. You can introduce fault injection frameworks that allow you to test the cases the business is interested in, like business continuity. And some of the regression and performance concerns can be covered by things like Kubernetes operators that enable self-balancing and self-healing of components like Kafka. Now, ideally, in an ideal world, we want automatic delivery of these artifacts once all the steps in the pipeline have executed successfully. But how do we scale this? How can we implement this in production? Here I want to talk quickly about four architectural principles for building a production-ready event streaming platform. The first one is the core business function. In this case, these are essentially the microservices that implement the business function. This is your stream processor: data comes through the streaming system, and there’s a core processor that does some processing.
For example, implementing a payment system: we get an event about the payment, we do some processing, and after that we send another event saying the payment was processed. The next thing is that we need to implement a so-called trust plane. The trust plane allows us to collect the metrics the business is interested in, for example how many payments were processed in a given time. Those metrics can have a direct impact on the business, and we can gather them by feeding the output of the core business function as an input into this trust-plane processor. Data quality belongs here too: if some of the payments were not processed in time, what is the impact on the business? We need to collect these kinds of metrics and present them somehow. The control plane is probably one of the hardest things to implement, because it allows us not only to deliver but also to perform the day-to-day operational aspects of the system. How will these microservices coordinate? How will service discovery be implemented? How will the bits arrive from your CI/CD pipeline, and so forth? Plus some orchestration of the underlying components: provisioning, operating systems, maybe containers or pods if we’re talking specifics. These days Kubernetes is becoming the platform for building this control plane for your business processes. And the ops plane, the operational plane, creates the observability of the system: health checks, error logs, audits for different, I don’t know, regulatory reasons and so forth, data lineage, so we know where the data came from and how it evolved, and also dealing with bad data by sending it to a dead letter queue.
So, here’s an example of how this can be implemented. The core business function in this particular case is payment processing.
We have a stream of events that comes into a payments topic, we do some computation, and the result is made available in another topic. If we have another system that depends on this data, it will consume from that topic. Next is the trust plane. As an example of the trust plane, we need some business metrics about the number of confirmed events, and in this particular case the output of the core business function is consumed by the trust plane. Next, for the control plane, in order to perform deployments and update the rules for how this payment processing happens, we’ll have another processor, another streaming application, that gets information through a status topic, aggregates that status, and joins it with data that comes from the payment system. For example, we see an increased load of incoming messages for new payments, so we need to spin up another instance of the payment processor; in this case the control plane will react. And the operations plane in this particular case will provide observability of the overall system and report and alert if something goes wrong.
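The control-plane reaction described above boils down to a simple rule: watch a load signal (say, incoming payment messages per second) and derive how many processor instances should be running. A sketch of that rule; the per-instance capacity of 1000 msg/s is an assumption made up for illustration, not a Kafka figure.

```java
// Control-plane sketch: derive a desired instance count from observed load.
// CAPACITY_PER_INSTANCE is an illustrative assumption.
class ScalerSketch {
    static final int CAPACITY_PER_INSTANCE = 1000;

    static int desiredInstances(int observedMessagesPerSecond) {
        // At least one instance; otherwise enough instances to absorb the load.
        return Math.max(1,
                (int) Math.ceil(observedMessagesPerSecond / (double) CAPACITY_PER_INSTANCE));
    }
}
```

In a real deployment, the “observed load” would come from the status topic described above, and the “spin up another instance” step would go through whatever the control plane drives, for example a Kubernetes deployment.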
Getting a little closer to the operations plane: this is where we go into the world of dealing with bad data, how we capture application logging and provide dashboards where everything is visible. Error queues: if some bad data or some bad processing happened, we need to log this too. We cannot just print it to the console and move on, and afterwards have to go and grep the log files to find where it happened. Audit logs provide retention information and things like that. All of these things can be done through the operations plane. And most importantly, the main point here is that all these microservices are implemented in such a way that the elements of the control plane are elements of the individual microservices themselves. Still, the Kafka event streaming platform provides the backbone, where it stores the whole history, the state of the world, the source of truth.
So, in this case, events carry information from one place to another, events evolve, and they allow your system to evolve. A few key takeaways from this presentation. Events are becoming an API: a self-describing way of thinking that allows a system to be modeled in the flavor of domain-driven design, with thought given to how these events can evolve. Decouple the system into events in order to provide a stream of events, rather than having silos of data records where things go in, rows get inserted into tables, then the tables get updated, and now we don’t know what the history was or how a record ended up there. With this strategy, business goals need to be taken into account: not only technical aspects but also the things the business is interested in. And most importantly, automate these things, and apply modern testing technologies for that. Testcontainers is amazing.
In general, container technology is changing the way we deal with software, not only in production but also for testing and development. And bounded contexts, the small individual pieces of work: a larger monolithic system can be broken down into small microservices, each with its own small bounded context. Look for these patterns when implementing: a trust plane, a control plane, an ops plane, and obviously the core business function. Those patterns allow you to be successful in implementing streaming platforms. So, if you want to learn a little bit more about Kafka and the world of event streaming, there’s a website called developer.confluent.io, where we have a set of tools, examples, podcasts, and videos, everything to help you learn Kafka. And there are a bunch of books you can read if you’re interested in developing event-driven microservices and, in general, in changing your perspective on how you want to think about data in your organization. So thank you so much for your time. I really enjoyed talking to you about event streaming platforms. And let’s get to Q&A right now. Thank you.