In this article I will provide a quick introduction to Azure Functions and Terraform with a hands-on example which is quite easy to follow.
There are many articles out there about Azure Functions, but most fail to explain how to automate their deployment for a real-world scenario using CI/CD pipelines.
This article will not get into the details of creating a specific CI/CD pipeline, but will instead focus on automating the infrastructure and code needed to deploy a Function App in Azure using a simple example. …
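As a taste of the kind of code such a Function App would run, here is a minimal sketch of a Python HTTP-triggered function. This is only an illustration under assumptions: the article's example may use a different language or programming model, the function name and response are hypothetical, and the classic Python model shown here also needs a function.json binding file alongside it.

```python
import azure.functions as func

# Minimal HTTP-triggered Azure Function (classic Python programming model).
# The Function App itself, its storage account and hosting plan are the pieces
# that the Terraform configuration would provision.
def main(req: func.HttpRequest) -> func.HttpResponse:
    # Read an optional "name" query parameter and reply with a greeting
    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!", status_code=200)
```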
In this article, I will discuss the implications of running Spark with Cassandra compared to the most common use case, which is using a deep storage system such as S3 or HDFS.
The goal is to understand the internals of Spark and Cassandra so you can write your code as efficiently as possible and really utilize the power of these two great tools.
I will give you some tips regarding Spark tuning and Cassandra optimizations so you can maximize performance and minimize costs. I assume you already have basic knowledge of Spark and Cassandra.
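For context on what "running Spark with Cassandra" looks like in code, here is a minimal PySpark sketch using the DataStax spark-cassandra-connector. The contact point, keyspace and table names are hypothetical, and the connector package has to be added to your Spark job yourself.

```python
from pyspark.sql import SparkSession

# Assumes the DataStax connector is on the classpath, e.g.
#   spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.12:<version> ...
spark = (
    SparkSession.builder
    .appName("spark-cassandra-demo")
    .config("spark.cassandra.connection.host", "127.0.0.1")  # hypothetical contact point
    .getOrCreate()
)

# The connector splits the scan by Cassandra token ranges, which is where the
# data-locality and partitioning considerations discussed in the article come in.
df = (
    spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(keyspace="my_keyspace", table="my_table")  # hypothetical names
    .load()
)

df.show(5)
```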
First, let’s review the different deployment…
Since I got involved in DevOps and Big Data, I’ve been using two excellent but quite different programming languages: Go and Scala.
Scala is an older and more mature programming language that has found its niche in areas such as concurrent programming and Big Data processing. Go, on the other hand, is a newer, simpler language created by Google to overcome the criticisms of C++, designed with multi-core processors in mind.
Both are great languages that can achieve great performance for concurrent applications and stream processing, but their designs are quite different. In this article, I will try…
Big Data is complex; I have written quite a bit about the vast ecosystem and the wide range of options available. One aspect that is often ignored but critical is managing the execution of the different steps of a big data pipeline. Quite often, the choice of framework or the design of the execution process is deferred to a later stage, causing many issues and delays in the project.
You should design your pipeline orchestration early on to avoid issues during the deployment stage. …
In my previous article, I barely touched on the concept of visualization tools. The front end is a critical part of your data pipeline since it is the visible part of your analytical platform; no matter how good your data pipeline is, it needs reliable and performant visualization tools to achieve its purpose: providing meaningful insights so stakeholders can make important data-driven decisions.
In this article I will give a quick overview of the different visualization options available for your processed data after running your data pipeline. I will focus on open source solutions, which are a cheaper and more portable option…
This article is based on my previous article “Big Data Pipeline Recipe” where I tried to give a quick overview of all aspects of the Big Data world.
This article goes into a bit more detail about the different aspects that need to be taken into account when you start your Big Data journey. Quite often, architects focus on the tech stack and technical details without paying much attention to the most important aspect of this type of solution: the data.
I will try to summarize everything you need to take into account when dealing with large amounts of data…
In this article, I will discuss the implications of running Cassandra with Spark.
The goal is to understand the internals of Spark and Cassandra so you can write your code as efficiently as possible and really utilize the power of these two great tools.
I will give you some tips regarding Cassandra optimizations with Spark so you can maximize performance and minimize costs. I assume you already have basic knowledge of Spark and Cassandra.
This is an extract from my previous article, which I recommend reading after this one.
In this article I will give a gentle introduction to Graph Databases using Azure Cloud Platform.
I will start by giving a quick intro to graph databases explaining their use cases and the pros and cons.
Then, we will move to a practical example using Azure Cosmos DB with the Apache TinkerPop Gremlin API. The goal is to show how easy it is to create a serverless graph database in Azure to model data as graphs.
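As a hint of what the practical part looks like, here is a minimal sketch using the gremlinpython driver against a Cosmos DB Gremlin endpoint. The account, database, graph and key below are placeholders, and Cosmos DB currently expects the GraphSON 2.0 serializer.

```python
from gremlin_python.driver import client, serializer

# Placeholder endpoint and credentials: replace with your Cosmos DB account,
# database, graph name and primary key. Depending on how the graph was created,
# vertices may also need a partition-key property.
gremlin_client = client.Client(
    "wss://<account>.gremlin.cosmos.azure.com:443/",
    "g",
    username="/dbs/<database>/colls/<graph>",
    password="<primary-key>",
    message_serializer=serializer.GraphSONSerializersV2d0(),
)

# Create a vertex, then traverse the graph and print the matching names
gremlin_client.submit("g.addV('person').property('name', 'Alice')").all().result()
print(gremlin_client.submit("g.V().hasLabel('person').values('name')").all().result())
```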
The goal of this article is to introduce a key task in NLP: Named Entity Recognition (NER). The aim is to extract common entities from a text corpus, for example detecting persons, places, medicines or dates within a given text such as an email or a document.
NER is a technique within the vast NLP field, which is itself part of Machine Learning, which in turn belongs to the parent field of AI.
In this hands-on article, we will use the spaCy library to train a deep learning model based on neural networks…
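To make the task concrete before getting to training, here is a minimal sketch of entity extraction with a pretrained spaCy pipeline. The article itself trains a custom model; this assumes the en_core_web_sm model has been downloaded, and the sample sentence is made up.

```python
import spacy

# Pretrained English pipeline; install it first with:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple is opening a new office in Madrid next Monday, according to Tim Cook.")

# Each detected entity exposes its text span and a label such as ORG, GPE or DATE
for ent in doc.ents:
    print(ent.text, ent.label_)
```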
The goal of this post is to dig a bit deeper into the internals of Apache Spark to get a better understanding of how Spark works under the hood, so we can write optimal code that maximizes parallelism and minimizes data shuffles.
This is an extract from my previous article which I recommend reading after this one. I assume you already have basic knowledge of Spark.
In Spark, you write code that transforms the data; this code is lazily evaluated and, under the hood, converted into a query plan which gets materialized when you call an action such as…
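A minimal PySpark sketch of that behaviour (the DataFrame and column names are just for illustration): transformations only build up the query plan, and nothing runs until an action such as count() is called.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lazy-eval-demo").getOrCreate()

df = spark.range(1_000_000)                           # transformation: nothing executes yet
even = df.filter(F.col("id") % 2 == 0)                # still only building the query plan
doubled = even.withColumn("double", F.col("id") * 2)  # another lazy transformation

doubled.explain()        # inspect the plan Spark has built so far
print(doubled.count())   # count() is an action: the plan is materialized here

spark.stop()
```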