Flink in a Nutshell

Javier Ramos
9 min read · Feb 10, 2019

In this post I will try to explain why Flink is gaining so much attention. I will review Flink from the AI and DevOps points of view.

Introduction

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. WOW, okay, what does this mean? Well, basically it can reliably process huge amounts of data in real time, and it does it really fast. The next question is: why do I need huge amounts of data and processing power? Let’s rewind a bit…
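To make that definition a bit more concrete, here is a minimal sketch of a Flink streaming job. This is my own illustrative example, not something from the Flink documentation: it assumes the Flink DataStream API is on the classpath and that lines of text arrive on a local socket. The job keeps a running, per-word count that updates as each event arrives.

```java
// Minimal sketch, assuming flink-streaming-java is on the classpath and a text
// source is listening on localhost:9999 (e.g. started with `nc -lk 9999`).
// Class and source names are illustrative only.
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class StreamingWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // An unbounded stream: lines keep arriving and the job never "finishes"
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        lines
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                    for (String word : line.toLowerCase().split("\\W+")) {
                        if (!word.isEmpty()) {
                            out.collect(Tuple2.of(word, 1));
                        }
                    }
                }
            })
            .keyBy(t -> t.f0) // partition the stream by word
            .sum(1)           // Flink keeps the running count as managed, fault-tolerant state
            .print();

        env.execute("Streaming word count");
    }
}
```

The interesting part is that the running counts are state Flink manages for you: it is checkpointed and restored on failure, which is what “stateful computations over unbounded streams” means in practice.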

I decided to write this post after reading this great explanation about AI. I already talked about streaming platforms and AI in the past, and both are related. AI is a broad topic, but most business use cases revolve around Machine Learning classification algorithms that help to predict “things” and enhance customer engagement. So AI is just a thing labeler: you present the data to the model and the model adds labels. This is not as fancy as you may have thought, but it is extremely useful.

Common tasks like recognizing someone are extremely difficult to program into a computer. What AI does instead is take a lot of examples that already have labels, train a model on them and then use that model to make predictions. In other words, you let the computer figure out the best way to recognize patterns in the data so that it minimizes the error and makes good predictions. This simple model allows you to enhance your platform and add features like real-time suggestions, chat bots, recommendation systems, etc. It creates an immersive customer experience for your platform; bear in mind that nowadays only the companies that put the customer before their product will thrive, and big data and AI are the main drivers of customer engagement.

You may be asking: what does this have to do with Flink? First, AI requires a lot of data, hence the rise of big data. Second, most AI programs need to react to data feeds in real time (click streams, Internet of Things, etc.). So, on the one hand you have big data and on the other real-time streams. Customers want the info now, not the next day; it is crucial to move from batch to real-time processing to achieve a better user experience, and this is giving rise to new…
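As a sketch of what reacting to such a feed can look like, here is a hypothetical click-stream job (the “userId,page” input format, the socket source and the class name are my own assumptions): instead of waiting for a nightly batch, it emits a count of clicks per page every minute.

```java
// Hypothetical click-stream sketch; a real deployment would read from Kafka,
// Kinesis, etc. instead of a socket. Each incoming line is assumed to be "userId,page".
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class ClicksPerPage {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.socketTextStream("localhost", 9999)
            .map(new MapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public Tuple2<String, Integer> map(String event) {
                    String page = event.split(",")[1]; // naive parsing, fine for a sketch
                    return Tuple2.of(page, 1);
                }
            })
            .keyBy(t -> t.f0)                                          // group by page
            .window(TumblingProcessingTimeWindows.of(Time.minutes(1))) // results every minute, not next day
            .sum(1)
            .print();

        env.execute("Clicks per page, per minute");
    }
}
```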
