Tools Resources

DZone's Featured Tools Resources

Is Podman a Drop-In Replacement for Docker?

By Gunter Rotsaert CORE

In many places, you can read that Podman is a drop-in replacement for Docker. But is it as easy as it sounds? In this blog, you will start with a production-ready Dockerfile and execute the Podman commands just like you would do when using Docker. Let’s investigate whether this works without any problems! Introduction Podman is a container engine, just as Docker is. Podman, however, is a daemonless container engine, and it runs containers by default as rootless containers. This is more secure than running containers as root. The Docker daemon can also run as a non-root user nowadays. Podman advertises on its website that Podman is a drop-in replacement for Docker. Just add alias docker=podman , and you will be fine. Let’s investigate whether it is that simple. In the remainder of this blog, you will try to build a production-ready Dockerfile for running a Spring Boot application. You will run it as a single container, and you will try to run two containers and have some inter-container communication. In the end, you will verify how volumes can be mounted. One of the prerequisites for this blog is using a Linux operating system. Podman is not available for Windows. The sources used in this blog can be found at GitHub. The Dockerfile you will be using runs a Spring Boot application. It is a basic Spring Boot application containing one controller which returns a hello message. Build the jar: Shell $ mvn clean verify Run the jar: Shell $ java -jar target/mypodmanplanet-0.0.1-SNAPSHOT.jar Check the endpoint: Shell $ curl http://localhost:8080/hello Hello Podman! The Dockerfile is based on a previous blog about Docker best practices. The file 1-Dockerfile-starter can be found in the Dockerfiles directory. Dockerfile FROM eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f AS builder WORKDIR application ARG JAR_FILE COPY target/${JAR_FILE} app.jar RUN java -Djarmode=layertools -jar app.jar extract FROM eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f WORKDIR /opt/app RUN addgroup --system javauser && adduser -S -s /usr/sbin/nologin -G javauser javauser COPY --from=builder application/dependencies/ ./ COPY --from=builder application/spring-boot-loader/ ./ COPY --from=builder application/snapshot-dependencies/ ./ COPY --from=builder application/application/ ./ RUN chown -R javauser:javauser . USER javauser ENTRYPOINT ["java", "org.springframework.boot.loader.JarLauncher"] Prerequisites Prerequisites for this blog are: Basic Linux knowledge, Ubuntu 22.04 is used during this post; Basic Java and Spring Boot knowledge; Basic Docker knowledge; Installation Installing Podman is quite easy. Just run the following command. Shell $ sudo apt-get install podman Verify the correct installation. Shell $ podman --version podman version 3.4.4 \You can also install podman-docker, which will create an alias when you use docker in your commands. It is advised to wait for the conclusion of this post before you install this one. Build Dockerfile The first thing to do is to build the container image. Execute from the root of the repository the following command. Shell $ podman build . --tag mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT -f Dockerfiles/1-Dockerfile-starter --build-arg JAR_FILE=mypodmanplanet-0.0.1-SNAPSHOT.jar [1/2] STEP 1/5: FROM eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f AS builder [2/2] STEP 1/10: FROM eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f Error: error creating build container: short-name "eclipse-temurin@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f" did not resolve to an alias and no unqualified-search registries are defined in "/etc/containers/registries.conf" This returns an error while retrieving the base image. The error message refers to /etc/containers/registries.conf. The following is stated in this file. Plain Text # For more information on this configuration file, see containers-registries.conf(5). # # NOTE: RISK OF USING UNQUALIFIED IMAGE NAMES # We recommend always using fully qualified image names including the registry # server (full dns name), namespace, image name, and tag # (e.g., registry.redhat.io/ubi8/ubi:latest). Pulling by digest (i.e., # quay.io/repository/name@digest) further eliminates the ambiguity of tags. # When using short names, there is always an inherent risk that the image being # pulled could be spoofed. For example, a user wants to pull an image named # `foobar` from a registry and expects it to come from myregistry.com. If # myregistry.com is not first in the search list, an attacker could place a # different `foobar` image at a registry earlier in the search list. The user # would accidentally pull and run the attacker's image and code rather than the # intended content. We recommend only adding registries which are completely # trusted (i.e., registries which don't allow unknown or anonymous users to # create accounts with arbitrary names). This will prevent an image from being # spoofed, squatted or otherwise made insecure. If it is necessary to use one # of these registries, it should be added at the end of the list. To conclude, it is suggested to use a fully qualified image name. This means that you need to change the lines containing: Dockerfile eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f Into: Dockerfile docker.io/eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f You just add docker.io/ to the image name. A minor change, but already one difference compared to Docker. The image name is fixed in file 2-Dockerfile-fix-shortname, so let’s try building the image again. Shell $ podman build . --tag mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT -f Dockerfiles/2-Dockerfile-fix-shortname --build-arg JAR_FILE=mypodmanplanet-0.0.1-SNAPSHOT.jar [1/2] STEP 1/5: FROM docker.io/eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f AS builder Trying to pull docker.io/library/eclipse-temurin@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f... Getting image source signatures Copying blob 72ac8a0a29d6 done Copying blob f56be85fc22e done Copying blob f8ed194273be done Copying blob e5daea9ee890 done [2/2] STEP 1/10: FROM docker.io/eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f Error: error creating build container: writing blob: adding layer with blob "sha256:f56be85fc22e46face30e2c3de3f7fe7c15f8fd7c4e5add29d7f64b87abdaa09": Error processing tar file(exit status 1): potentially insufficient UIDs or GIDs available in user namespace (requested 0:42 for /etc/shadow): Check /etc/subuid and /etc/subgid: lchown /etc/shadow: invalid argument Now there is an error about potentially insufficient UIDs or GIDs available in the user namespace. More information about this error can be found here. It is very well explained in that post, and it is too much to repeat all of this in this post. The summary is that the image which is trying to be pulled, has files owned by UIDs over 65.536. Due to that issue, the image would not fit into rootless Podman’s default UID mapping, which limits the number of UIDs and GIDs available. So, how to solve this? First, check the contents of /etc/subuid and /etc/subgid. In my case, the following is the output. For you, it will probably be different. Shell $ cat /etc/subuid admin:100000:65536 $ cat /etc/subgid admin:100000:65536 The admin user listed in the output has 100.000 as the first UID or GID available, and it has a size of 65.536. The format is user:start:size. This means that the admin user has access to UIDs or GIDs 100.000 up to and including 165.535. My current user is not listed here, and that means that my user can only allocate 1 UID en 1 GID for the container. That 1 UID/GID is already taken for the root user in the container. If a container image needs an extra user, there will be a problem, as you can see above. This can be solved by adding UIDs en GIDs for your user. Let’s add values 200.000 up to and including 265.535 to your user. Shell $ sudo usermod --add-subuids 200000-265535 --add-subgids 200000-265535 <replace with your user> Verify the contents of both files again. The user is added to both files. Shell $ cat /etc/subgid admin:100000:65536 <your user>:200000:65536 $ cat /etc/subuid admin:100000:65536 <your user>:200000:65536 Secondly, you need to run the following command. Shell $ podman system migrate Try to build the image again, and now it works. Shell $ podman build . --tag mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT -f Dockerfiles/2-Dockerfile-fix-shortname --build-arg JAR_FILE=mypodmanplanet-0.0.1-SNAPSHOT.jar [1/2] STEP 1/5: FROM docker.io/eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f AS builder Trying to pull docker.io/library/eclipse-temurin@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f... Getting image source signatures Copying blob f56be85fc22e done Copying blob f8ed194273be done Copying blob 72ac8a0a29d6 done Copying blob e5daea9ee890 done Copying config c74d412c3d done Writing manifest to image destination Storing signatures [1/2] STEP 2/5: WORKDIR application --> d4f0e970dc1 [1/2] STEP 3/5: ARG JAR_FILE --> ca97dcd6f2a [1/2] STEP 4/5: COPY target/${JAR_FILE} app.jar --> 58d88cfa511 [1/2] STEP 5/5: RUN java -Djarmode=layertools -jar app.jar extract --> 348cae813a4 [2/2] STEP 1/10: FROM docker.io/eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f [2/2] STEP 2/10: WORKDIR /opt/app --> 4118cdf90b5 [2/2] STEP 3/10: RUN addgroup --system javauser && adduser -S -s /usr/sbin/nologin -G javauser javauser --> cd11f346381 [2/2] STEP 4/10: COPY --from=builder application/dependencies/ ./ --> 829bffcb6c7 [2/2] STEP 5/10: COPY --from=builder application/spring-boot-loader/ ./ --> 2a93f97d424 [2/2] STEP 6/10: COPY --from=builder application/snapshot-dependencies/ ./ --> 3e292cb0456 [2/2] STEP 7/10: COPY --from=builder application/application/ ./ --> 5dd231c5b51 [2/2] STEP 8/10: RUN chown -R javauser:javauser . --> 4d736e8c3bb [2/2] STEP 9/10: USER javauser --> d7a96ca6f36 [2/2] STEP 10/10: ENTRYPOINT ["java", "org.springframework.boot.loader.JarLauncher"] [2/2] COMMIT mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT --> 567fd123071 Successfully tagged localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT 567fd1230713f151950de7151da82a19d34f80af0384916b13bf49ed72fd2fa1 Verify the list of images with Podman just like you would do with Docker: Shell $ podman images REPOSITORY TAG IMAGE ID CREATED SIZE localhost/mydeveloperplanet/mypodmanplanet 0.0.1-SNAPSHOT 567fd1230713 2 minutes ago 209 MB Is Podman a drop-in replacement for Docker for building a Dockerfile? No, it is not a drop-in replacement because you needed to use the fully qualified image name for the base image in the Dockerfile, and you needed to make changes to the user namespace in order to be able to pull the image. Besides these two changes, building the container image just worked. Start Container Now that you have built the image, it is time to start a container. Shell $ podman run --name mypodmanplanet -d localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT The container has started successfully. Shell $ podman ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 27639dabb573 localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT 18 seconds ago Up 18 seconds ago mypodmanplanet You can also inspect the container logs. Shell $ podman logs mypodmanplanet . ____ _ __ _ _ /\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \ ( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \ \\/ ___)| |_)| | | | | || (_| | ) ) ) ) ' |____| .__|_| |_|_| |_\__, | / / / / =========|_|==============|___/=/_/_/_/ :: Spring Boot :: (v3.0.5) 2023-04-22T14:38:05.896Z INFO 1 --- [ main] c.m.m.MyPodmanPlanetApplication : Starting MyPodmanPlanetApplication v0.0.1-SNAPSHOT using Java 17.0.6 with PID 1 (/opt/app/BOOT-INF/classes started by javauser in /opt/app) 2023-04-22T14:38:05.898Z INFO 1 --- [ main] c.m.m.MyPodmanPlanetApplication : No active profile set, falling back to 1 default profile: "default" 2023-04-22T14:38:06.803Z INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http) 2023-04-22T14:38:06.815Z INFO 1 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat] 2023-04-22T14:38:06.816Z INFO 1 --- [ main] o.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/10.1.7] 2023-04-22T14:38:06.907Z INFO 1 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext 2023-04-22T14:38:06.910Z INFO 1 --- [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 968 ms 2023-04-22T14:38:07.279Z INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path '' 2023-04-22T14:38:07.293Z INFO 1 --- [ main] c.m.m.MyPodmanPlanetApplication : Started MyPodmanPlanetApplication in 1.689 seconds (process running for 1.911) Verify whether the endpoint can be accessed. Shell $ curl http://localhost:8080/hello curl: (7) Failed to connect to localhost port 8080 after 0 ms: Connection refused That’s not the case. With Docker, you can inspect the container to see which IP address is allocated to the container. Shell $ podman inspect mypodmanplanet | grep IPAddress "IPAddress": "", It seems that the container does not have a specific IP address. The endpoint is also not accessible at localhost. The solution is to add a port mapping when creating the container. Stop the container and remove it. Shell $ podman stop mypodmanplanet mypodmanplanet $ podman rm mypodmanplanet 27639dabb5730d3244d205200a409dbc3a1f350196ba238e762438a4b318ef73 Start the container again, but this time with a port mapping of internal port 8080 to an external port 8080. Shell $ podman run -p 8080:8080 --name mypodmanplanet -d localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT Verify again whether the endpoint can be accessed. This time it works. Shell $ curl http://localhost:8080/hello Hello Podman! Stop and remove the container before continuing this blog. Is Podman a drop-in replacement for Docker for running a container image? No, it is not a drop-in replacement. Although it was possible to use exactly the same commands as with Docker, you needed to explicitly add a port mapping. Without the port mapping, it was not possible to access the endpoint. Volume Mounts Volume mounts and access to directories and files outside the container and inside a container often lead to Permission Denied errors. In a previous blog, this behavior is extensively described for the Docker engine. It is interesting to see how this works when using Podman. You will map an application.properties file in the container next to the jar file. The Spring Boot application will pick up this application.properties file. The file configures the server port to port 8082, and the file is located in the directory properties in the root of the repository. Properties files server.port=8082 Run the container with a port mapping from internal port 8082 to external port 8083 and mount the application.properties file into the container directory /opt/app where also the jar file is located. The volume mount has the property ro in order to indicate that it is a read-only file. Shell $ podman run -p 8083:8082 --volume ./properties/application.properties:/opt/app/application.properties:ro --name mypodmanplanet localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT Verify whether the endpoint can be accessed and whether it works. Shell $ curl http://localhost:8083/hello Hello Podman! Open a shell in the container and list the directory contents in order to view the ownership of the file. Shell $ podman exec -it mypodmanplanet sh /opt/app $ ls -la total 24 drwxr-xr-x 1 javauser javauser 4096 Apr 15 10:33 . drwxr-xr-x 1 root root 4096 Apr 9 12:57 .. drwxr-xr-x 1 javauser javauser 4096 Apr 9 12:57 BOOT-INF drwxr-xr-x 1 javauser javauser 4096 Apr 9 12:57 META-INF -rw-r--r-- 1 root root 16 Apr 15 10:24 application.properties drwxr-xr-x 1 javauser javauser 4096 Apr 9 12:57 org With Docker, the file would have been owned by your local system user, but with Podman, the file is owned by root. Let’s check the permissions of the file on the local system. Shell $ ls -la total 12 drwxr-xr-x 2 <myuser> domain users 4096 apr 15 12:24 . drwxr-xr-x 8 <myuser> domain users 4096 apr 15 12:24 .. -rw-r--r-- 1 <myuser> domain users 16 apr 15 12:24 application.properties As you can see, the file on the local system is owned by <myuser>. This means that your host user, who is running the container, is seen as a user root inside of the container. Open a shell in the container and try to change the contents of the file application.properties. You will notice that this is not allowed because you are a user javauser. Shell $ podman exec -it mypodmanplanet sh /opt/app $ vi application.properties /opt/app $ whoami javauser Stop and remove the container. Run the container, but this time with property U instead of ro. The U suffix tells Podman to use the correct host UID and GID based on the UID and GID within the container to change the owner and group of the source volume recursively. Shell $ podman run -p 8083:8082 --volume ./properties/application.properties:/opt/app/application.properties:U --name mypodmanplanet localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT Open a shell in the container, and now the user javauser is the owner of the file. Shell $ podman exec -it mypodmanplanet sh /opt/app $ ls -la total 24 drwxr-xr-x 1 javauser javauser 4096 Apr 15 10:41 . drwxr-xr-x 1 root root 4096 Apr 9 12:57 .. drwxr-xr-x 1 javauser javauser 4096 Apr 9 12:57 BOOT-INF drwxr-xr-x 1 javauser javauser 4096 Apr 9 12:57 META-INF -rw-r--r-- 1 javauser javauser 16 Apr 15 10:24 application.properties drwxr-xr-x 1 javauser javauser 4096 Apr 9 12:57 org On the local system, a different UID and GID than my local user have taken ownership. Shell $ ls -la properties/ total 12 drwxr-xr-x 2 <myuser> domain users 4096 apr 15 12:24 . drwxr-xr-x 8 <myuser> domain users 4096 apr 15 12:24 .. -rw-r--r-- 1 200099 200100 16 apr 15 12:24 application.properties This time, changing the file on the local system is not allowed, but it is allowed inside the container for user javauser. Is Podman a drop-in replacement for Docker for mounting volumes inside a container? No, it is not a drop-in replacement. The file permissions function is a bit different than with the Docker engine. You need to know the differences in order to be able to mount files and directories inside containers. Pod Podman knows the concept of a Pod, just like a Pod in Kubernetes. A Pod allows you to group containers. A Pod also has a shared network namespace, and this means that containers inside a Pod can connect to each other. More information about container networking can be found here. This means that Pods are the first choice for grouping containers. When using Docker, you will use Docker Compose for this. There exists something like Podman Compose, but this deserves a blog in itself. Let’s see how this works. You will set up a Pod running two containers with the Spring Boot application. First, you need to create a Pod. You also need to expose the ports you want to be accessible outside of the Pod. This can be done with the -p argument. And you give the Pod a name, hello-pod in this case. Shell $ podman pod create -p 8080-8081:8080-8081 --name hello-pod When you list the Pod, you notice that it already contains one container. This is the infra container. This infra container holds the namespace in order that containers can connect to each other, and it enables starting and stopping containers in the Pod. The infra container is based on the k8s.gcr.io/pause image. Shell $ podman pod ps POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS dab9029ad0c5 hello-pod Created 3 seconds ago aac3420b3672 1 $ podman ps --all CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES aac3420b3672 k8s.gcr.io/pause:3.5 4 minutes ago Created 0.0.0.0:8080-8081->8080-8081/tcp dab9029ad0c5-infra Create a container mypodmanplanet-1 and add it to the Pod. By means of the --env argument, you change the port of the Spring Boot application to port 8081. Shell $ podman create --pod hello-pod --name mypodmanplanet-1 --env 'SERVER_PORT=8081' localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT env Start the Pod. Shell $ podman pod start hello-pod Verify whether the endpoint can be reached at port 8081 and verify that the endpoint at port 8080 cannot be reached. Shell $ curl http://localhost:8081/hello Hello Podman! $ curl http://localhost:8080/hello curl: (56) Recv failure: Connection reset by peer Add a second container mypodmanplanet-2 to the Pod, this time running at the default port 8080. Shell $ podman create --pod hello-pod --name mypodmanplanet-2 localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT Verify the Pod status. It says that the status is Degraded. Shell $ podman pod ps POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS dab9029ad0c5 hello-pod Degraded 9 minutes ago aac3420b3672 3 Take a look at the containers. Two containers are running, and a new container is just created. That is the reason the Pod has the status Degraded. Shell $ podman ps --all CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES aac3420b3672 k8s.gcr.io/pause:3.5 11 minutes ago Up 2 minutes ago 0.0.0.0:8080-8081->8080-8081/tcp dab9029ad0c5-infra 321a62fbb4fc localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT env 3 minutes ago Up 2 minutes ago 0.0.0.0:8080-8081->8080-8081/tcp mypodmanplanet-1 7b95fb521544 localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT About a minute ago Created 0.0.0.0:8080-8081->8080-8081/tcp mypodmanplanet-2 Start the second container and verify the Pod status. The status is now Running. Shell $ podman start mypodmanplanet-2 $ podman pod ps POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS dab9029ad0c5 hello-pod Running 12 minutes ago aac3420b3672 3 Both endpoints can now be reached. Shell $ curl http://localhost:8080/hello Hello Podman! $ curl http://localhost:8081/hello Hello Podman! Verify whether you can access the endpoint of container mypodmanplanet-1 from within mypodmanplanet-2. This also works. Shell $ podman exec -it mypodmanplanet-2 sh /opt/app $ wget http://localhost:8081/hello Connecting to localhost:8081 (127.0.0.1:8081) saving to 'hello' hello 100% |***********************************************************************************************************************************| 13 0:00:00 ETA 'hello' saved Cleanup To conclude, you can do some cleanup. Stop the running Pod. Shell $ podman pod stop hello-pod The Pod has the status Exited now. Shell $ podman pod ps POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS dab9029ad0c5 hello-pod Exited 55 minutes ago aac3420b3672 3 All containers in the Pod are also exited. Shell $ podman ps --all CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES aac3420b3672 k8s.gcr.io/pause:3.5 56 minutes ago Exited (0) About a minute ago 0.0.0.0:8080-8081->8080-8081/tcp dab9029ad0c5-infra 321a62fbb4fc localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT env 48 minutes ago Exited (143) About a minute ago 0.0.0.0:8080-8081->8080-8081/tcp mypodmanplanet-1 7b95fb521544 localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT 46 minutes ago Exited (143) About a minute ago 0.0.0.0:8080-8081->8080-8081/tcp mypodmanplanet-2 Remove the Pod. Shell $ podman pod rm hello-pod The Pod and the containers are removed. Shell $ podman pod ps POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS $ podman ps --all CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES Conclusion The bold statement that Podman is a drop-in replacement for Docker is not true. Podman differs from Docker on certain topics like building container images, starting containers, networking, volume mounts, inter-container communication, etc. However, Podman does support many Docker commands. The statement should be Podman is an alternative to Docker. This is certainly true. It is important for you to know and understand the differences before switching to Podman. After this, it is definitely a good alternative. More

Guide To Selecting the Right GitOps Tool - Argo CD or Flux CD

By Jyoti Sahoo

Kubernetes adoption is on the rise, which has proportionately increased the adoption of tools that perform deployments specifically into Kubernetes. But Kubernetes deployments are challenging if one relies on traditional deployment practices. It is complex and time-consuming, especially for large and complex applications. To simplify deployment into Kubernetes, developers prefer to use GitOps, a pull-based deployment mechanism. GitOps is made possible with tools like Argo CD and Flux CD, which can automate the deployment process into Kubernetes. We will understand the difference and similarities between Argo CD and Flux CD on the concepts below: Installation Interface Application Deployment Deployment strategies Feedback notification Security Multi-tenancy What Is Flux CD? Flux CD is an open-source Continuous Delivery tool for automated Git-based Kubernetes deployments. It uses a declarative approach to manage Kubernetes deployments using Git repositories to store application configurations. It syncs the Kubernetes clusters with the Git configuration repository. Flux CD also provides several other features, such as the ability to perform automated rollbacks and to integrate with Helm and Kustomize to provide a more flexible deployment process. It also allows users to perform deployment strategies like Canary and Blue-Green deployments. Flux Architecture and Components Flux comprises a set of Kubernetes controllers. Kubernetes controllers handle the lifecycle of objects in Kubernetes. Infrastructure and workload dependency is inbuilt, which lets you sequence your prerequisites before an application is deployed. For example, a database must be up and running before the application gets deployed to ensure the application works fine. Source Controllers Source controllers monitor a version-controlled repository like GitHub for detecting changes in the application’s source code and configuration. On detecting changes source controller initiates a deployment process to update the application's deployment manifests and apply the changes to the Kubernetes cluster. It can also trigger automated tests and perform security scans based on validation to ensure consistency and reliability. Kustomize Controller The Kustomize controller manages the deployment of Kubernetes resources using Kustomize. The Kustomize controller in Flux CD monitors the source repository for change and automatically applies the appropriate updates and code upgrades to the base manifests to get the desired configuration. Helm Controller In Flux CD, the Helm controller is a component that manages the deployment of Kubernetes resources using Helm, a package manager for Kubernetes. The Helm controller in Flux CD monitors a Helm chart repository for changes, detects changes, and automatically deploys the updated chart to the Kubernetes cluster. It can also roll back deployment to a previous version in case of errors or issues. Notification Controller In Flux CD, the Notification controller is a component that sends notifications based on events that occur within the Flux CD system. The Notification controller can send messages to various channels, such as Slack, email, or other messaging platforms, to inform team members about application deployment status changes. Image Reflector Controller In Flux CD, the Image Reflector controller is a component that automatically updates the Kubernetes manifests with the latest available container image from a registry. The Image Reflector controller monitors the specified container image registry for new photos and updates the Kubernetes manifests with the latest image tag. Image Automation Controller The Image Automation controller monitors a container image repository for new versions of images and automatically triggers the update of the application deployment manifests with the new image tag. How Flux CD Works As Flux also works on the concept of GitOps, it can automatically deploy the changes in the application found in the Git Repository to the respective Kubernetes clusters ensuring up-to-date applications. Developers and DevOps must update Kubernetes manifests and push them to the source Git Repository. The current configuration so=red in the memory cached pod is frequently synced with the repository using a Kubernetes operator for changes. The Flux container compares that with the pod's existing configuration in cached memory. Any mismatch detected will trigger a set of kubectl apply/delete commands to bring the cluster to the latest configuration. The latest configuration gets stored as cache data, which is continuously tracked for changes. What Is Argo CD? ArgoCD is a Continuous Delivery (CD) tool that has gained popularity in DevOps for performing application delivery onto Kubernetes. It relies on a deployment method known as GitOps. GitOps is a mechanism that can pull the latest code and application configuration from the last known Git commit version and deploy it directly into Kubernetes resources. The wide adoption and popularity are because ArgoCD helps developers manage infrastructure and applications lifecycle in one platform. Read here to Understand the architecture and components of Argo CD. Comparison of Flux CD With Argo CD Both Argo CD and Flux CD are powerful tools for managing Kubernetes deployments. Let us analyze some key differences and similarities between Argo CD and Flux CD: How To Differ in Installation Flux and Argo must be installed on the Kubernetes cluster you want to manage. Here are the steps on how they differ in installation. Argo CD Installation Installation of Argo CD is relatively straightforward. Hands-on with CLI will be enough to keep Argo CD up and running. The standard stores involved in installation are: Creation of a namespace `Argocd` and use of the `kubectl` command to apply the package downloaded from GitHub For a hardened CLI user, one need not install the UI and make Argo CD work with the essential components. As an external IP is not configured on the Argo CD API, the user has to configure one to access it from the outside. Setting up Ingress is necessary to perform advanced touring and rollback strategies. Argo CD wasn't designed to be run in an air-gapped environment, but there are ways to perform air-gapped installations. For Airgapping Argo CD, we have to use HELM charts. Flux CD Installation The Flux CD installation is similar to Argo CD and will be done on a CLI. Download the executable binaries from GitHub. Using the flux bootstrap command, a user can install Flux on a Kubernetes cluster and configure it to manage itself from a Git repository. For Air-Gapped installation of flux without access to the internet, Flux binaries and the container images need to be downloaded from a system with internet access. Unlike ArgoCD, Flux can run in an offline environment off the shelf. How Do They Differ in Interface? A web-based UI is necessary for most Developers and DevOps teams as business demands them to perform operations faster and with shorter learning curves. UI becomes a necessity when teams prefer low-code operations. Whereas a UI adds value to day-to-day operations, it is essential that the UI can provide end-to-end visibility of the system and the proper controls for a better user experience. Argo CD Interface Argo CD provides a web-based user interface that makes managing and monitoring your deployments easy. The GUI dashboard enables users to perform various tasks, from application onboarding to deployment into Kubernetes clusters, without accessing the CLI. Though limited capability for advanced troubleshooting and monitoring of Kubernetes resources, most jobs can be quickly done on Argo CD UI. All deployment-related information is available in a single dashboard, saving the users from logging in to each cluster separately to perform deployment operations. Image Source If you are looking for a much more advanced dashboard for Kubernetes management where you can perform all Kubernetes applications lifecycle management without using the CLI or kubectl command, explore the Kubernetes dashboard by Devtron. Flux CD Interface Flux CD does not provide a user interface and requires you to use the command line interface to manage your deployments. So a user has to rely on the CLI command to manage the deployment lifecycle. This increases the learning curve of using the Flux CD, making Argo CD more popular among users. There are some unofficial Flux web UI projects, but the projects are stuck, and it doesn't add any value to the CLI. Experimental Flux CD GUI Image Source How Do They Differ in Application Deployment? Argo CD and Flux CD support Kubernetes objects/manifest (YAML), Kustomize, and Helm. But both tools are fundamentally different in how they apply those manifests. Let us understand how they differ. Application Deployment in Flux CD Flux uses Kustomize Controller heavily that performs all the lifecycle operations of Kubernetes manifests of applying the update and delete by syncing with the source repository. Without Kustomize, Flux generates its own Kustomization.yaml to address dependencies. Flux also has native support for Helm, which enables it to perform Helm-based releases. It provides Flux with the advantage of using hem chart libraries from marketplaces. Flux CD can connect and manage only one repository per instance. This also puts a limitation on its multicluster abilities.l Fluc can work inside a cluster. For every new cluster, one has to run a different instance of Flux. Application Deployment in Argo CD Argo CD has Kustomize and raw Kubernetes YAML. While Flux needed Kustomize to modify manifest files, Argo doesn't need to use Kustomize. It can perform deployments without Kustomoize. But this sometimes creates an issue of sequencing builds, and some apps may fail to start because their resources aren't yet available. Though it may seem counterintuitive, its creators made this architecture such, to exercise far greater control over its resources. Argo CD performs sync operations in a sequence of steps and is broadly classified into three phases pre-sync, sync, and post-sync. The sequencing allows Argo to ensure resources are healthy before syncing subsequent resources. Argo CD also has native support for Helm, but the approach for application deployment is different. While Flux uses the `golang` library directly, Argo uses the Helm charts and converts them into Argo CD sync waves and hooks. This internal configuration prevents users from using the HELM CLI for clusters. Argo CD provides far more valuable capabilities for multicluster management. It can connect multiple Git repositories to one cluster. Can manage multiple clusters in one instance Image Source How Do They Differ in Providing Multi-Tenancy? In Kubernetes, architecture is centralized across namespaces, which doesn't support a first-class multitenancy architecture. This is built to improve security and fairness and manage noisy neighbors. But the latest Kubernetes releases have features that can help users achieve multitenancy. Large teams and organizations prefer multitenancy because it offers them a cost-effective solution for infrastructure with reduced operational overheads. Multi-Tenancy in Flux CD Inherently Flux CD doesn’t have a multi-tenancy configuration. As new instances of Flux CD for every cluster and that instance can only watch one git repository, it is logically impossible to host separate applications on the same cluster. To achieve multitenancy, users must set up a complex map of applications and environments/namespaces, which will be synced from different git repositories by different instances of Flux CD. This also allows admins to set up distinct roles and permissions for managing various applications and resources in the same cluster. This is not scalable as it will become a management headache. Multi-Tenancy in Argo CD Argo CD architecture supports multitenancy natively. But it isn't available on the core package for users who don't want to use such a feature or do not have the need. Argo CD supports RBAC and SSO natively, which is crucial to limit resource operations by different users on the same cluster. It is the primary requirement to enable. Multi-tenancy. Similarities in Deployment Strategies Argo CD provides rollout strategies such as Canary, Bluegreen, and Progressive deployments. Flux also provides some standard Canary and Bue green releases, but the options to configure them as per need are limited. Deployment strategies are essential in reducing risk while releasing; let us understand how they differ in providing these critical features. To perform these deployment strategies in Flux, developers must use Flagger. Argo CD relies on Argo rollouts. Using ingresses like Envoy API gateway, NGINX, Traefik, or service mesh Istio, Linkered, and AWS App Mesh, allows users of Argo and Flux to perform run tie switching or traffic splitting. Both tools provide essential metrics like CPU and throughout data for validating canary or blue-green releases. Argo and Flux support a range of metrics analysis, such as Prometheus, Datadog, and New Relic. While these deployment strategies are possible, assessing a release before deployment into production requires understanding the new releases' behavior in a real-time environment. APM data is essential to perform behavior analysis enabled by CRDs in ArgoCD. Bit Flux CD lacks a feature to provide a detailed assessment of new releases. This allows Argo CD to be proactive and detect new releases before significant issues are seen in production. Both support rollbacks to older versions instantly. Similarities in Feedback and Notifications Mechanisms Alerting and instant notifications on the right channel are essential. While executing CI/CD pipelines, these notifications send out important information about status updates, metrics, health checks, risks, etc. Most users are active on messaging platforms, so they must be notified through these channels. Flux can deliver alerts using alerting and outgoing notifications. Argo CD uses Argo CD Notifications via its plugin mechanism to provide alerts and updates. Security Concurrence GitOs core concept is to observe Git as the sole source of truth for application and configuration data. In the case of security, one has to divert from this logic of storing all data on Git. Secrets are sensitive data like encoded tokens, passwords, etc., that should not be placed in Git repositories. Flux and Argo CD have their way of managing these secrets, but it is highly recommended to use external secrets management. While Flux has limited accessibility management capabilities and follows that of the Kubernetes RBAC, Argo CD has built-in SSO and its own RBAC configuration making access control easy for large teams. Similarities in Achieving Extensibility Argo CD integrates with several Kubernetes tools, such as Helm and Kustomize, to provide a more flexible deployment process. Flux CD also integrates with these tools but does not provide as many integration options as Argo CD. Argo CD and Flux are pretty modular, and through API servers inbuilt into their architecture, webhooks, and extension are feasible. This allows users to build and execute more bespoke activities suites yo ones team’s needs. Conclusion Both the GitOps tools are resourceful while managing Kubernetes deployments. While the absence of a UI makes Flux a little tricky for new engineers, it is still a preferable tool where the requirements are not high and one needs a cost and resource-effective solution. Arog CD is more prevalent among budding developers because it provides a holistic GitOps solution. Flux is limited to only Kubenrtes deployments and configuration management as code. More

Build a Simple Chat Server With gRPC in .Net Core

By Okosodo Victor

Understanding Dependencies...Visually!

By Scott Sosna CORE

Auditing Tools for Kubernetes

By Vasilii Kulazhenkov

Introduction To Git

Imagine you're working on a critical project, pouring hours of effort into writing code, only to accidentally delete a crucial file. Panic sets in as you realize there's no way to retrieve the previous version. But wait! Introducing Git, the superhero of version control systems. With Git, you can effortlessly track changes, revert to previous versions, collaborate seamlessly with teammates, and even branch out to experiment without fear of irreversible consequences. Git saves the day by empowering developers to confidently navigate the complex world of software development, ensuring smooth workflows and protecting precious code from the clutches of accidental deletions. Git is the most popular tool. Git is a version-control system for tracking changes in computer files and coordinating work on those files among multiple people. It is primarily used for source-code management in software development, but it can be used to keep track of changes in any set of files. How It Works Creating Repository: You can create a repository using the command git init. Navigate to your project folder and enter the command git init to initialize a git repository for your project on the local system. Git Initialization Making Changes: Once the directory has been initialized, you can check the status of the files, whether they are being tracked by a git or not, using the command git status. Status of Git Repository Since no files are being tracked right now, let us stage these files. For that, use the command git add, which will track all the files in the project folder. Git add Once the files or changes have been staged, we are ready to commit them to our repository. We can commit the files using the command git commit -m "custom message." Git Commit Syncing Repositories: Once everything is ready on our locally, we can start pushing our changes to the remote repository. Copy your repository link and paste it in the command git remote add origin "<URL to repository>." To push the changes to your repository, enter the command git push origin <branch-name>. In our case branch is master, hence git push origin master. This command will then prompt for username and password, enter the values, and hit enter. Git Push Your local repository is now synced with the remote repository on GitHub. Remote Repository Similarly, if we want to download the remote repository to our local system, we can use the command git clone <URL>. This folder will create a folder with the repository name and download all the contents of the repository inside this folder. Git Clone The git pull command is also used for pulling the latest changes from the repository; unlike git clone, this command can only work inside an initialized git repository. This command is used when you are already working in the cloned repository, and want to pull the latest changes, that others might have pushed to the remote repository git pull <URL>. Git Pull Until now, you have seen how we can work on Git. But now, imagine multiple developers working on the same repository or project. To handle the workspace of multiple developers, we use branches. To create a branch from an existing branch, we use the command git branch <new-branch-name>. Similarly, to delete the branch git branch -D <branch-name>. To switch to the new branch, use the command git checkout <branch-name>. Want to check the log for every commit detail in your repository? You can accomplish that using the command git log. Git log Want to save your work without committing the code? Git has got you covered. This can be helpful when you want to switch branches but do not want to save your changes to the git repository. To stash your staged files without committing, just type in git stash. If you want to stash your untracked files as well, type git stash -u. Once you are back and want to retrieve working, type in git stash pop. Git Stash git revert command helps you in reverting a commit to a previous version git revert <commit-id>. <commit-id> can be obtained from the output of git log. Git Revert git diff command helps us in checking the differences between 2 versions of a file. git diff <commit-id of version x> <commit-id of version y> Git Diff In conclusion, Git is an essential tool for developers and teams, providing efficient and reliable version control capabilities that facilitate collaboration, track changes, and simplify the management of code and files throughout a project's lifecycle. It's versatility and robust features make it a valuable asset in software development and other industries where version control is crucial.

By Pradeep Gopalgowda

Is Apache Kafka Providing Real Message Ordering?

One of Apache Kafka’s most known mantras is “it preserves the message ordering per topic-partition,” but is it always true? In this blog post, we’ll analyze a few real scenarios where accepting the dogma without questioning it could result in unexpected and erroneous sequences of messages. Basic Scenario: Single Producer We can start our journey with a basic scenario: a single producer sending messages to an Apache Kafka topic with a single partition, in sequence, one after the other. In this basic situation, as per the known mantra, we should always expect correct ordering. But is it true? Well… it depends! The Network Is Not Equal In an ideal world, the single-producer scenario should always result in correct ordering. But our world isn’t perfect! Different network paths, errors, and delays could mean that a message gets delayed or lost. Let’s imagine the situation below: a single producer sending three messages to a topic: Message 1, for some reason, finds a long network route to Apache Kafka Message 2 finds the quickest network route to Apache Kafka Message 3 gets lost in the network Even in this basic scenario, with only one producer, we could get an unexpected series of messages on the topic. The end result on the Kafka topic will show only two events being stored, with the unexpected ordering 2, 1. If you think about it, it’s the correct ordering from the Apache Kafka point of view: a topic is only a log of information, and Apache Kafka will write the messages to the log depending on when it “senses” the arrival of a new event. It’s based on Kafka ingestion time and not on when the message was created (event time). Acks and Retries But not all is lost! If we look into the producing libraries (aiokafka being an example), we have ways to ensure that messages are delivered properly. First of all, to avoid the problem with the message 3 in the above scenario, we could define a proper acknowledgment mechanism. The acks producer parameter allows us to define what confirmation of message reception we want to have from Apache Kafka. Setting this parameter to 1 will ensure that we receive an acknowledgment from the primary broker responsible for the topic (and partition). Setting it to all will ensure that we receive the ack only if both the primary and the replicas correctly store the message, thus saving us from problems when only the primary receives the message and then fails before propagating it to the replicas. Once we set a sensible ack, we should set the possibility to retry sending the message if we don’t receive a proper acknowledgment. Differently from other libraries (kafka-python being one of them), aiokafka will retry sending the message automatically until the timeout (set by the request_timeout_ms parameter) has been exceeded. With acknowledgment and automatic retries, we should solve the problem of the message 3. The first time it is sent, the producer will not receive the ackTherefore, after the retry_backoff_ms interval, it will send the message 3 again. Max In-Flight Requests However, if you watch the end result in the Apache Kafka topic closely, the resulting ordering is not correct: we sent 1,2,3 and got 2,1,3 in the topic… how to fix that? The old method (available in kafka-python) was to set the maximum in-flight request per connection: the number of messages we allow to be “in the air” at the same time without acknowledgment. The more messages we allow in the air at the same time, the more risk of getting out-of-order messages. When using kafka-python, if we absolutely needed to have a specific ordering in the topic, we were forced to limit the max_in_flight_requests_per_connection to 1. Basically, supposing that we set the ack parameter to at least 1, we were waiting for an acknowledgment of every single message (or batch of messages if the message size is less than the batch size) before sending the following one. The absolute correctness of ordering, acknowledgment, and retries comes at the cost of throughput. The smaller amount of messages we allow to be “in the air” at the same time, the more acks we need to receive, and the fewer overall messages we can deliver to Kafka in a defined timeframe. Idempotent Producers To overcome the strict serialization of sending one message at a time and waiting for acknowledgment, we can define idempotent producers. With an idempotent producer, each message gets labeled with both a producer ID and a serial number (a sequence maintained for each partition). This composed ID is then sent to the broker alongside the message. The broker keeps track of the serial number per producer and topic/partition. Whenever a new message arrives, the broker checks the composed ID, and if, within the same producer, the value is equal to the previous number + 1, then the new message is acknowledged. Otherwise, it is rejected. This provides a guarantee of the global ordering of messages allowing a higher number of in-flight requests per connection (maximum of 5 for the Java client). Increase Complexity With Multiple Producers So far, we imagined a basic scenario with only one producer, but the reality in Apache Kafka is that often the producers will be multiple. What are the little details to be aware of if we want to be sure about the end ordering result? Different Locations, Different Latency Again, the network is not equal, and with several producers located in possibly very remote positions, the different latency means that the Kafka ordering could differ from the one based on event time. Unfortunately, the different latencies between different locations on Earth can’t be fixed. Therefore, we will need to accept this scenario. Batching, an Additional Variable To achieve higher throughput, we might want to batch messages. With batching, we send messages in “groups,” minimizing the overall number of calls and increasing the payload to overall message size ratio. But, in doing so, we can again alter the ordering of events. The messages in Apache Kafka will be stored per batch, depending on the batch ingestion time. Therefore, the ordering of messages will be correct per batch, but different batches could have different ordered messages within them. Now, with both different latencies and batching in place, it seems that our global ordering premise would be completely lost… So, why are we claiming that we can manage the events in order? The Savior: Event Time We understood that the original premise about Kafka keeping the message ordering is not 100% true. The ordering of the messages depends on the Kafka ingestion time and not on the event generation time. But what if the ordering based on event time is important? Well, we can’t fix the problem on the production side, but we can do it on the consumer side. All the most common tools that work with Apache Kafka have the ability to define which field to use as event time, including Kafka Streams, Kafka Connect with the dedicated Timestamp extractor single message transformation (SMT), and Apache Flink®. Consumers, when properly defined, will be able to reshuffle the ordering of messages coming from a particular Apache Kafka topic. Let’s analyze the Apache Flink example below: CREATE TABLE CPU_IN ( hostname STRING, cpu STRING, usage DOUBLE, occurred_at BIGINT, time_ltz AS TO_TIMESTAMP_LTZ(occurred_at, 3), WATERMARK FOR time_ltz AS time_ltz - INTERVAL '10' SECOND ) WITH ( 'connector' = 'kafka', 'properties.bootstrap.servers' = '', 'topic' = 'cpu_load_stats_real', 'value.format' = 'json', 'scan.startup.mode' = 'earliest-offset' ) In the above Apache Flink table definition, we can notice: occurred_at: the field is defined in the source Apache Kafka topic in unix time (datatype is BIGINT). time_ltz AS TO_TIMESTAMP_LTZ(occurred_at, 3): transforms the unix time into the Flink timestamp. WATERMARK FOR time_ltz AS time_ltz - INTERVAL '10' SECOND defines the new time_ltz field (calculated from occurred_at) as the event time and defines a threshold for late arrival of events with a maximum of 10 seconds delay. Once the above table is defined, the time_ltz field can then be used to correctly order events and define aggregation windows, making sure that all events within the accepted latency are included in the calculations. The - INTERVAL '10' SECOND defines the latency of the data pipeline and is the penalty we need to include to allow the correct ingestion of late-arriving events. Please note, however, that the throughput is not impacted. We can have as many messages flowing in our pipeline as we want, but we’re “waiting 10 seconds” before calculating any final KPI in order to make sure we include in the picture all the events in a specific timeframe. An alternative approach that works only if the events contain the full state is to keep a certain key (hostname and cpu in the above example) the maximum event time reached so far, and only accept changes where the new event time is greater than the maximum. Wrapping Up The concept of ordering in Kafka can be tricky, even if we only include a single topic with a single partition. This post shared a few common situations that could result in an unexpected series of events. Luckily options like limiting the number of messages in flight, or using idempotent producers, can help achieve an ordering in line with expectations. In the case of multiple producers and the unpredictability of network latency, the option available is to fix the overall ordering on the consumer side by properly handling the event time that needs to be specified in the payload. Some further readings: Kafka Streams event time Check out the Timestamp router SMT in Kafka Connect

By Francesco Tisiot

Simplify Kubernetes Resource Management With Sveltos, Carvel ytt, and Flux

Managing Kubernetes add-ons can be a challenging task, especially when dealing with complex deployments and frequent configuration changes. In this article, we will explore how Sveltos and Carvel ytt can work together to simplify Kubernetes resource management. Sveltos is a powerful Kubernetes add-on management tool, while Carvel ytt is a templating and patching tool for YAML files. We will delve into the integration of Carvel ytt with Sveltos using the ytt controller, enabling seamless deployment and configuration management. Introducing Sveltos Sveltos is an open-source project tool that simplifies the process of managing and deploying add-ons to Kubernetes clusters. It provides a comprehensive solution for installing, configuring, and managing add-ons, making it easier to enhance the functionality and capabilities of Kubernetes.Sveltos provides support for Helm charts, Kustomize, and resource YAMLs. To know more about Sveltos, this article delves into the management of Kubernetes add-ons using Sveltos. This other article focuses on deploying add-ons as a result of events. An Overview of Carvel Ytt Carvel ytt is a tool that is part of the Carvel suite. Its main purpose is to facilitate the generation and management of YAML files based on templates. With ytt, you can easily create and modify YAML files by leveraging templates and data values. This enables you to have a flexible and dynamic approach to configuration management within Kubernetes environments. Unlike Helm and other similar templating tools that treat YAML templates purely as text templates, ytt takes advantage of the inherent language structure of YAML. This means that ytt understands the underlying structure of YAML configurations and utilizes comments to annotate those structures. As a result, ytt goes beyond traditional text templating and becomes a YAML structure-aware templating solution. This unique feature alleviates the need for developers to ensure the structural validity of their generated YAML configurations and makes the process of writing templates much more straightforward. Integrating Carvel ytt With Sveltos via ytt Controller To harness the capabilities of Carvel ytt with Sveltos, we have developed the ytt controller. The ytt controller acts as a bridge between Sveltos and Carvel ytt, enabling the processing of ytt files and making the output accessible for Sveltos. In order to utilize the ytt controller, a Kubernetes Custom Resource Definition (CRD) called YttSource was introduced. By creating instances of YttSource, you can specify the sources of ytt files through various options such as Flux Sources (GitRepository/OCIRepository/Bucket), ConfigMap, or Secret. The integration process involves the following steps: 1) Install the ytt controller Shell [sourcecode language="bash"] kubectl apply -f https://raw.githubusercontent.com/gianlucam76/ytt-controller/main/manifest/manifest.yaml [/sourcecode] 2) Using GitRepository as a source: YAML [sourcecode language="yaml"] apiVersion: extension.projectsveltos.io/v1alpha1 kind: YttSource metadata: name: yttsource-flux spec: namespace: flux-system name: flux-system kind: GitRepository path: ./deployment/ [/sourcecode] Flux is utilized to synchronize the ytt-examples GitHub repository, which contains the ytt files. The YttSource is instructing ytt controller to get ytt files from Flux GitRepository. The ytt controller automatically detects changes in the repository and invokes the ytt module to process the files. The resulting output is stored in the Status section of the YttSource instance. 3) Sveltos can then utilize its template feature to deploy the generated Kubernetes resources to the managed cluster. YAML [sourcecode language="yaml"] apiVersion: config.projectsveltos.io/v1alpha1 kind: ClusterProfile metadata: name: deploy-resources spec: clusterSelector: env=fv templateResourceRefs: - resource: apiVersion: extension.projectsveltos.io/v1alpha1 kind: YttSource name: yttsource-flux namespace: default identifier: YttSource policyRefs: - kind: ConfigMap name: info namespace: default --- apiVersion: v1 kind: ConfigMap metadata: name: info namespace: default annotations: projectsveltos.io/template: "true" # add annotation to indicate Sveltos content is a template data: resource.yaml: | {{ (index .MgtmResources "YttSource").status.resources } [/sourcecode] Shell [sourcecode language="bash"] kubectl exec -it -n projectsveltos sveltosctl-0 -- ./sveltosctl show addons +-------------------------------------+-----------------+-----------+----------------------+---------+-------------------------------+------------------+ | CLUSTER | RESOURCE TYPE | NAMESPACE | NAME | VERSION | TIME | CLUSTER PROFILES | +-------------------------------------+-----------------+-----------+----------------------+---------+-------------------------------+------------------+ | default/sveltos-management-workload | :Service | staging | sample-app | N/A | 2023-05-22 08:00:28 -0700 PDT | deploy-resources | | default/sveltos-management-workload | apps:Deployment | staging | sample-app | N/A | 2023-05-22 08:00:28 -0700 PDT | deploy-resources | | default/sveltos-management-workload | :Secret | staging | application-settings | N/A | 2023-05-22 08:00:28 -0700 PDT | deploy-resources | +-------------------------------------+-----------------+-----------+----------------------+---------+-------------------------------+------------------+ [/sourcecode] For detailed information on the ytt controller and its usage with ConfigMap/Secret, please refer to the Sveltos documentation. This documentation provides comprehensive insights into the ytt controller and offers guidance on integrating it with ConfigMap and Secret resources. Conclusion By integrating Carvel ytt with Sveltos using the ytt controller, we can greatly simplify Kubernetes resource management. This powerful combination enables clean and efficient configuration management, seamless deployment of resources, and effortless synchronization of changes. Sveltos empowers DevOps teams to focus on their core tasks while providing a unified and intuitive interface for managing Kubernetes infrastructure effectively. Carvel ytt enhances the deployment process by enabling declarative configuration management and ensuring consistency across deployments. Together, Sveltos and Carvel ytt create a robust solution for managing Kubernetes resources with ease and efficiency.

By Gianluca Mardente

Knowing and Valuing Apache Kafka’s ISR (In-Sync Replicas)

To get more clarity about ISR in Apache Kafka, we should first carefully examine the replication process in the Kafka broker. In short, replication means having multiple copies of our data spread across multiple brokers. Maintaining the same copies of data in different brokers makes possible the high availability in case one or more brokers go down or are untraceable in a multi-node Kafka cluster to server the requests. Because of this reason, it is mandatory to mention how many copies of data we want to maintain in the multi-node Kafka cluster while creating a topic. It is termed a replication factor, and that’s why it can’t be more than one while creating a topic on a single-node Kafka cluster. The number of replicas specified while creating a topic can be changed in the future based on node availability in the cluster. On a single-node Kafka cluster, however, we can have more than one partition in the broker because each topic can have one or more partitions. The Partitions are nothing but sub-divisions of the topic into multiple parts across all the brokers on the cluster, and each partition would hold the actual data(messages). Internally, each partition is a single log file upon which records are written in an append-only fashion. Based on the provided number, the topic internally split into the number of partitions at the time of creation. Thanks to partitioning, messages can be distributed in parallel among several brokers in the cluster. Kafka scales to accommodate several consumers and producers at once by employing this parallelism technique. This partitioning technique enables linear scaling for both consumers and providers. Even though more partitions in a Kafka cluster provide a higher throughput but with more partitions, there are pitfalls too. Briefly, more file handlers would be created if we increase the number of partitions as each partition maps to a directory in the file system in the broker. Now it would be easy for us to understand better the ISR as we have discussed replication and partitions of Apache Kafka above. The ISR is just a partition’s replicas that are “in sync” with the leader, and the leader is nothing but a replica that all requests from clients and other brokers of Kafka go to it. Other replicas that are not the leader are termed followers. A follower that is in sync with the leader is called an ISR (in-sync replica). For example, if we set the topic’s replication factor to 3, Kafka will store the topic-partition log in three different places and will only consider a record to be committed once all three of these replicas have verified that they have written the record to the disc successfully and eventually send back the acknowledgment to the leader. In a multi-broker (multi-node) Kafka cluster (please click here to read how a multi-node Kafka cluster can be created), one broker is selected as the leader to serve the other brokers, and this leader broker would be responsible to handle all the read and write requests for a partition while the followers (other brokers) passively replicate the leader to achieve the data consistency. Each partition can only have one leader at a time and handles all reads and writes of records for that partition. The Followers replicate leaders and take over if the leader dies. By leveraging Apache Zookeeper, Kafka internally selects the replica of one broker’s partition, and if the leader of that partition fails (due to an outage of that broker), Kafka chooses a new ISR (in-sync replica) as the new leader. When all of the ISRs for a partition write to their log, the record is said to have been “committed,” and the consumer can only read committed records. The minimum in-sync replica count specifies the minimum number of replicas that must be present for the producer to successfully send records to a partition. Even though the high number of minimum in-sync replicas gives a higher persistence but there might be a repulsive effect, too, in terms of availability. The data availability automatically gets reduced if the minimum number of in-sync replicas won’t be available before publishing. The minimum number of in-sync replicas indicates how many replicas must be available for the producer to send records to a partition successfully. For example, if we have a three-node operational Kafka cluster with minimum in-sync replicas configuration as three, and subsequently, if one node goes down or unreachable, then the rest other two nodes will not be able to receive any data/messages from the producers because of only two active/available in sync replicas across the brokers. The third replica, which existed on the dead or unavailable broker, won’t be able to send the acknowledgment to the leader that it was synced with the latest data like how the other two live replicas did on the available brokers in the cluster. Hope you have enjoyed this read. Please like and share if you feel this composition is valuable.

By Gautam Goswami CORE

Orange Pi Cluster With Docker Swarm and MariaDB

Building a cluster of single-board mini-computers is an excellent way to explore and learn about distributed computing. With the scarcity of Raspberry Pi boards, and the prices starting to get prohibitive for some projects, alternatives such as Orange Pi have gained popularity. In this article, I’ll show you how to build a (surprisingly cheap) 4-node cluster packed with 16 cores and 4GB RAM to deploy a MariaDB replicated topology that includes three database servers and a database proxy, all running on a Docker Swarm cluster and automated with Ansible. This article was inspired by a member of the audience who asked my opinion about Orange Pi during a talk I gave in Colombia. I hope this completes the answer I gave you. What Is a Cluster? A cluster is a group of computers that work together to achieve a common goal. In the context of distributed computing, a cluster typically refers to a group of computers that are connected to each other and work together to perform computation tasks. Building a cluster allows you to harness the power of multiple computers to solve problems that a single computer cannot handle. For example, a database can be replicated in multiple nodes to achieve high availability—if one node fails, other nodes can take over. It can also be used to implement read/write splitting to make one node handle writes, and another reads in order to achieve horizontal scalability. What Is Orange Pi Zero2? The Orange Pi Zero2 is a small single-board computer that runs on the ARM Cortex-A53 quad-core processor. It has 512MB or 1GB of DDR3 RAM, 100Mbps Ethernet, Wi-Fi, and Bluetooth connectivity. The Orange Pi Zero2 is an excellent choice for building a cluster due to its low cost, small size, and good performance. The only downside I found was that the Wi-Fi connection didn’t seem to perform as well as with other single-board computers. From time to time, the boards disconnect from the network, so I had to place them close to a Wi-Fi repeater. This could be a problem with my setup or with the boards. I’m not entirely sure. Having said that, this is not a production environment, so it worked pretty well for my purposes. What You Need Here are the ingredients: Orange Pi Zero2: I recommend the 1GB RAM variant and try to get at least 4 of them. I recently bought 4 of them for €30 each. Not bad at all! Give it a try! MicroSD cards: One per board. Try to use fast ones — it will make quite a difference in performance! I recommend at least 16GB. For reference, I used SanDisk Extreme Pro Micro/SDXC with 32GB, which offers a write speed of 90 MB/s and reads at 170 MB/s. A USB power hub: To power the devices, I recommend a dedicated USB power supply. You could also just use individual chargers, but the setup will be messier and require a power strip with as many outlets as devices as you have. It’s better to use a USB multi-port power supply. I used an Anker PowerPort 6, but there are also good and cheaper alternatives. You’ll have to Google this too. Check that each port can supply 5V and at least 2.4A. USB cables: Each board needs to be powered via a USB-C port. You need a cable with one end of type USB-C and the other of the type your power hub accepts. Bolts and nuts: To stack up the boards. Heat sinks (optional): These boards can get hot. I recommend getting heat sinks to help with heat dissipation. Materials needed for building an Orange Pi Zero2 cluster Assembling the Cluster One of the fun parts of building this cluster is the physical assembly of the boards on a case or some kind of structure that makes them look like a single manageable unit. Since my objective here is to keep the budget as low as possible, I used cheap bolts and nuts to stack the boards one on top of the other. I didn’t find any ready-to-use cluster cases for the Orange Pi Zero2. One alternative is to 3D-print your own case. When stacking the boards together, keep an eye on the antenna placement. Avoid crushing the cable, especially if you installed heat sinks. An assembled Orange Pi Zero2 cluster with 4 nodes Installing the Operating System The second step is to install the operating system on each microSD card. I used Armbian bullseye legacy 4.9.318. Download the file and use a tool like balenaEtcher to make bootable microSD cards. Download and install this tool on your computer. Select the Armbian image file and the drive that corresponds to the micro SD card. Flash the image and repeat the process for each micro SD card. Configuring Orange Pi WiFi Connection (Headless) To configure the Wi-Fi connection, Armbian includes the /boot/armbian_first_run.txt.template file which allows you to configure the operating system when it runs for the first time. The template includes instructions, so it’s worth checking. You have to rename this file to armbian_first_run.txt. Here’s what I used: Plain Text FR_general_delete_this_file_after_completion=1 FR_net_change_defaults=1 FR_net_ethernet_enabled=0 FR_net_wifi_enabled=1 FR_net_wifi_ssid='my_connection_id>' FR_net_wifi_key='my_password' FR_net_wifi_countrycode='FI' FR_net_use_static=1 FR_net_static_gateway='192.168.1.1' FR_net_static_mask='255.255.255.0' FR_net_static_dns='192.168.1.1 8.8.8.8' FR_net_static_ip='192.168.1.181' Use your own Wi-Fi details, including connection name, password, country code, gateway, mask, and DNS. I wasn’t able to read the SD card from macOS. I had to use another laptop with Linux on it to make the changes to the configuration file on each SD card. To mount the SD card on Linux, run the following command before and after inserting the SD card and see what changes: Shell sudo fdisk -l I created a Bash script to automate the process. The script accepts as a parameter the IP to set. For example: Shell sudo ./armbian-setup.sh 192.168.1.181 I run this command on each of the four SD cards changing the IP address from 192.168.1.181 to 192.168.1.184. Connecting Through SSH Insert the flashed and configured micro SD cards on each board and turn the power supply on. Be patient! Give the small devices time to boot. It can take several minutes the first time you boot them. An Orange Pi cluster running Armbian Use the ping command to check whether the devices are ready and connected to the network: Shell ping 192.168.1.181 Once they respond, connect to the mini-computers through SSH using the root user and the IP address that you configured. For example: Shell ssh root@192.168.1.181 The default password is: Plain Text 1234 You’ll be presented with a wizard-like tool to complete the installation. Follow the steps to finish the configuration and repeat the process for each board. Installing Ansible Imagine you want to update the operating system on each machine. You’ll have to log into a machine and run the update command and end the remote session. Then repeat for each machine in the cluster. A tedious job even if you have only 4 nodes. Ansible is an automation tool that allows you to run a command on multiple machines using a single call. You can also create a playbook, a file that contains commands to be executed in a set of machines defined in an inventory. Install Ansible on your working computer and generate a configuration file: Shell sudo su ansible-config init --disabled -t all > /etc/ansible/ansible.cfg exit In the /etc/ansible/ansible.cfg file, set the following properties (enable them by removing the semicolon): Plain Text host_key_checking=False become_allow_same_user=True ask_pass=True This will make the whole process easier. Never do this in a production environment! You also need an inventory. Edit the /etc/ansible/hosts file and add the Orange Pi nodes as follows: Plain Text ############################################################################## # 4-node Orange Pi Zero 2 cluster ############################################################################## [opiesz] 192.168.1.181 ansible_user=orangepi hostname=opiz01 192.168.1.182 ansible_user=orangepi hostname=opiz02 192.168.1.183 ansible_user=orangepi hostname=opiz03 192.168.1.184 ansible_user=orangepi hostname=opiz04 [opiesz_manager] opiz01.local ansible_user=orangepi [opiesz_workers] opiz[02:04].local ansible_user=orangepi In the ansible_user variable, specify the username that you created during the installation of Armbian. Also, change the IP addresses if you used something different. Setting up a Cluster With Ansible Playbooks A key feature of a computer cluster is that the nodes should be somehow logically interconnected. Docker Swarm is a container orchestration tool that will convert your arrangement of Orange Pi devices into a real cluster. You can later deploy any kind of server software. Docker Swarm will automatically pick one of the machines to host the software. To make the process easier, I have created a set of Ansible playbooks to further configure the boards, update the packages, reboot or power off the machines, install Docker, set up Docker Swarm, and even install a MariaDB database with replication and a database cluster. Clone or download this GitHub repository: Shell git clone https://github.com/alejandro-du/orange-pi-zero-cluster-ansible-playbooks.git Let’s start by upgrading the Linux packages on all the boards: Shell ansible-playbook upgrade.yml --ask-become-pass Now configure the nodes to have an easy-to-remember hostname with the help of Avahi, and configure the LED activity (red LED activates on SD card activity): Shell ansible-playbook configure-hosts.yml --ask-become-pass Reboot all the boards: Shell ansible-playbook reboot.yml --ask-become-pass Install Docker: Shell ansible-playbook docker.yml --ask-become-pass Set up Docker Swarm: Shell ansible-playbook docker-swarm.yml --ask-become-pass Done! You have an Orange Pi cluster ready for fun! Deploying MariaDB on Docker Swarm I have to warn you here. I don’t recommend running a database on container orchestration software. That’s Docker Swarm, Kubernetes, and others. Unless you are willing to put a lot of effort into it. This article is a lab. A learning exercise. Don’t do this in production! Now let’s get back to the fun… Run the following to deploy one MariaDB primary server, two MariaDB replica servers, and one MaxScale proxy: Shell ansible-playbook mariadb-stack.yml --ask-become-pass The first time you do this, it will take some time. Be patient. SSH into the manager node: Shell ssh orangepi@opiz01.local Inspect the nodes in the Docker Swarm cluster: Shell docker node ls Inspect the MariaDB stack: Shell docker stack ps mariadb A cooler way to inspect the containers in the cluster is by using the Docker Swarm Visualizer. Deploy it as follows: Shell docker service create --name=viz --publish=9000:8080 --constraint=node.role==manager --mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock alexellis2/visualizer-arm:latest On your working computer, open a web browser and go to this URL. You should see all the nodes in the cluster and the deployed containers. Docker Swarm Visualizer showing MariaDB deployed MaxScale is an intelligent database proxy with tons of features. For now, let’s see how to connect to the MariaDB cluster through this proxy. Use a tool like DBeaver, DbGate, or even a database extension for your favorite IDE. Create a new database connection using the following connection details: Host: opiz01.local Port: 4000 Username: user Password: password Create a new table: MariaDB SQL USE demo; CREATE TABLE messages( id INT PRIMARY KEY AUTO_INCREMENT, content TEXT NOT NULL ); Insert some data: MariaDB SQL INSERT INTO messages(content) VALUES ("It works!"), ("Hello, MariaDB"), ("Hello, Orange Pi"); When you execute this command, MaxScale sends it to the primary server. Now read the data: MariaDB SQL SELECT * FROM messages; When you execute this command, MaxScale sends it to one of the replicas. This division of reads and writes is called read-write splitting. The MaxScale UI showing a MariaDB cluster with replication and read-write splitting You can also access the MaxScale UI. Use the following credentials: Username: admin Password: mariadb Watch the following video if you want to learn more about MaxScale and its features. You won’t regret it!

By Alejandro Duarte CORE

Continuous Integration in AWS Using Jenkins Pipelines: Best Practices and Strategies

Developing and releasing new software versions is an ongoing process that demands careful attention to detail. The ability to monitor and analyze the entire process is critical for identifying any potential issues and implementing effective corrective measures. The concept of continuous integration becomes relevant at this point. By adopting a continuous integration approach, software development teams can carefully monitor each stage of the development process and conduct an in-depth analysis of the outcomes. This facilitates the early detection and diagnosis of potential issues, enabling developers to make necessary adjustments and improve the overall development process. In other words, continuous integration provides a systematic way of identifying problems and continuously enhancing software quality, ultimately leading to a better end product. The focus of this post is on exploring the benefits of continuous integration in software development. Specifically, we will delve into the practical aspects of implementing continuous integration using Jenkins, a popular automation tool, and share valuable insights on how this approach can help optimize and streamline your software development process. By the end of this post, you will have a better understanding of how continuous integration can improve your workflow and help you build better software more efficiently. From the Development Team’s Perspective, What Initiates Continuous Integration? With continuous integration, the development team initiates the process by pushing code changes to the repository, which triggers an automated pipeline to build, test, and deploy the updated software version. This streamlines the development cycle, leading to faster feedback and higher-quality software. A structured workflow aims to establish a standardized order of operations for developers, ensuring that subsequent versions of the software are built according to the software development life cycle defined by management. Here are some primary benefits of continuous integration: Version control - With continuous integration, developers can easily track production versions and compare the performance of different versions during development. In addition, the ability to roll back to a previous version is also available, should any production issues arise. Quality assurance - Developers can test their versions on a staging environment, demonstrating how the new version performs in an environment similar to production. Instead of running the version on their local machine, which may not be comparable to the real environment, developers can define a set of tests, including unit tests and integration tests, among others, that will take the new version through a predefined workflow. This testing process serves as their signature, ensuring the new version is safe to be deployed in a production environment. Scheduled triggering - Developers no longer need to manually trigger their pipeline or define a new pipeline for each new project. As a DevOps team, it is our responsibility to create a robust system that attaches to each project its own pipeline. Whether it is a common pipeline with slight changes to match the project or the same pipeline, developers can focus on writing code while continuous integration takes care of the rest. Scheduling an automatic triggering (for example, every morning or evening) ensures that the current code in GitHub is always ready for release. Jenkins in the Era of Continuous Integration To establish the desired pipeline workflow, we will deploy Jenkins and design a comprehensive pipeline that emphasizes version control, automated testing, and triggers. Prerequisite A virtual machine with a Docker engine Containerizing Jenkins To simplify the deployment of our CI/CD pipelines, we will deploy Jenkins in a Docker container. Deployment of Jenkins: docker run -d \ --name jenkins -p 8080:8080 -u root -p 50000:50000 \ -v /var/run/docker.sock:/var/run/docker.sock \ naturalett/jenkins:2.387-jdk11-hello-world Validate the Jenkins container: docker ps | grep -i jenkins Retrieve the Jenkins initial password: docker exec jenkins bash -c -- 'cat /var/jenkins_home/secrets/initialAdminPassword' Connect to Jenkins on the localhost (http://localhost:8080/). Building a Continuous Integration Pipeline I chose to utilize Groovy in Jenkins pipelines due to its numerous benefits: Groovy is a scripting language that is straightforward to learn and utilize. Groovy offers features that enable developers to write code that is concise, readable, and maintainable. Groovy's syntax is similar to Java, making it easier for Java developers to adopt. Groovy has excellent support for working with data formats commonly used in software development. Groovy provides an efficient and effective way to build robust and flexible CI/CD pipelines in Jenkins. The Four Phases of Our Pipeline Phase 1: The Agent To ensure that our code is built with no incompatible dependencies, each pipeline requires a virtual environment. In the following phase, we create an agent (virtual environment) in a Docker container. As Jenkins is also running in a Docker container, we'll mount the Docker socket to enable agent execution. pipeline { agent { docker { image 'docker:19.03.12' args '-v /var/run/docker.sock:/var/run/docker.sock' } } ... ... ... } Phase 2: The History of Versions We recognize the importance of versioning in software development, which allows developers to monitor code changes and evaluate software performance to make informed decisions about rolling back to a previous version or releasing a new one. In the subsequent phase, we generate a Docker image from our code and assign it a tag based on our predetermined set of definitions. For example: Date — Jenkins Build Number — Commit Hash pipeline { agent { ... } stages { stage('Build') { steps { script { def currentDate = new java.text.SimpleDateFormat("MM-dd-yyyy").format(new Date()) def shortCommit = sh(returnStdout: true, script: "git log -n 1 --pretty=format:'%h'").trim() customImage = docker.build("naturalett/hello-world:${currentDate}-${env.BUILD_ID}-${shortCommit}") } } } } } Upon completion of the previous phase, a Docker image of our code has been successfully created and is now available for use in our local environment. docker image | grep -i hello-world Phase 3: The Test In order to ensure that a new release version meets all functional and requirements tests, testing is a critical step. In the following stage, we execute tests against the Docker image that was generated in the previous stage and contains the potential next release. pipeline { agent { ... } stages { stage('Test') { steps { script { customImage.inside { sh """#!/bin/bash cd /app pytest test_*.py -v --junitxml='test-results.xml'""" } } } } } } Phase 4: The Scheduling Trigger Automating the pipeline trigger is crucial in allowing developers to concentrate on writing code while ensuring the stability and readiness of the next release. We accomplish this by setting up a morning schedule that automatically triggers the pipeline as the development team begins their workday. pipeline { agent { ... } triggers { // https://crontab.guru cron '00 7 * * *' } stages { ... } } An End-to-End Pipeline of the Process The pipeline execution process has been made simple by incorporating a pre-defined pipeline into Jenkins. You can get started by initiating the "my-first-pipeline" Jenkins job. The Agent stage creates a virtual environment used for the pipeline. The Trigger stage is responsible for automatic scheduling in the pipeline. The Clone stage is responsible for cloning the project repository. The Build stage involves creating a Docker image for the project. (To access the latest commit and other Git features, we install the Git package.) The Test stage involves performing tests on our Docker image. pipeline { agent { docker { image 'docker:19.03.12' args '-v /var/run/docker.sock:/var/run/docker.sock' } } triggers { // https://crontab.guru cron '00 7 * * *' } stages { stage('Clone') { steps { git branch: 'main', url: 'https://github.com/naturalett/hello-world.git' } } stage('Build') { steps { script { sh 'apk add git' def currentDate = new java.text.SimpleDateFormat("MM-dd-yyyy").format(new Date()) def shortCommit = sh(returnStdout: true, script: "git log -n 1 --pretty=format:'%h'").trim() customImage = docker.build("naturalett/hello-world:${currentDate}-${env.BUILD_ID}-${shortCommit}") } } } stage('Test') { steps { script { customImage.inside { sh """#!/bin/bash cd /app pytest test_*.py -v --junitxml='test-results.xml'""" } } } } } } Summary We have gained a deeper understanding of how Continuous Integration (CI) fits into our daily work and have obtained practical experience with essential pipeline workflows.

By lidor ettinger

Apache Kafka + Apache Flink = Match Made in Heaven

Apache Kafka and Apache Flink are increasingly joining forces to build innovative real-time stream processing applications. This blog post explores the benefits of combining both open-source frameworks, shows unique differentiators of Flink versus Kafka, and discusses when to use a Kafka-native streaming engine like Kafka Streams instead of Flink. The Tremendous Adoption of Apache Kafka and Apache Flink Apache Kafka became the de facto standard for data streaming. The core of Kafka is messaging at any scale in combination with a distributed storage (= commit log) for reliable durability, decoupling of applications, and replayability of historical data. Kafka also includes a stream processing engine with Kafka Streams. And KSQL is another successful Kafka-native streaming SQL engine built on top of Kafka Streams. Both are fantastic tools. In parallel, Apache Flink became a very successful stream-processing engine. The first prominent Kafka + Flink case study I remember is the fraud detection use case of ING Bank. The first publications came up in 2017, i.e., over five years ago: "StreamING Machine Learning Models: How ING Adds Fraud Detection Models at Runtime with Apache Kafka and Apache Flink." This is just one of many Kafka fraud detection case studies. One of the last case studies I blogged about goes in the same direction: "Why DoorDash migrated from Cloud-native Amazon SQS and Kinesis to Apache Kafka and Flink." The adoption of Kafka is already outstanding. And Flink gets into enterprises more and more, very often in combination with Kafka. This article is no introduction to Apache Kafka or Apache Flink. Instead, I explore why these two technologies are a perfect match for many use cases and when other Kafka-native tools are the appropriate choice instead of Flink. Top Reasons Apache Flink Is a Perfect Complementary Technology for Kafka Stream processing is a paradigm that continuously correlates events of one or more data sources. Data is processed in motion, in contrast to traditional processing at rest with a database and request-response API (e.g., a web service or a SQL query). Stream processing is either stateless (e.g., filter or transform a single message) or stateful (e.g., an aggregation or sliding window). Especially state management is very challenging in a distributed stream processing application. A vital advantage of the Apache Flink engine is its efficiency in stateful applications. Flink has expressive APIs, advanced operators, and low-level control. But Flink is also scalable in stateful applications, even for relatively complex streaming JOIN queries. Flink's scalable and flexible engine is fundamental to providing a tremendous stream processing framework for big data workloads. But there is more. The following aspects are my favorite features and design principles of Apache Flink: Unified streaming and batch APIs Connectivity to one or multiple Kafka clusters Transactions across Kafka and Flink Complex Event Processing Standard SQL support Machine Learning with Kafka, Flink, and Python But keep in mind that every design approach has pros and cons. While there are a lot of advantages, sometimes it is also a drawback. Unified Streaming and Batch APIs Apache Flink's DataStream API unifies batch and streaming APIs. It supports different runtime execution modes for stream processing and batch processing, from which you can choose the right one for your use case and the characteristics of your job. In the case of SQL/Table API, the switch happens automatically based on the characteristics of the sources: all bounded events go into batch execution mode; at least one unbounded event means STREAMING execution mode. The unification of streaming and batch brings a lot of advantages: Reuse of logic/code for real-time and historical processing Consistent semantics across stream and batch processing A single system to operate Applications mixing historical and real-time data processing This sounds similar to Apache Spark. But there is a significant difference: Contrary to Spark, the foundation of Flink is data streaming, not batch processing. Hence, streaming is the default execution runtime mode in Apache Flink. Continuous stateless or stateful processing enables real-time streaming analytics using an unbounded stream of events. Batch execution is more efficient for bounded jobs (i.e., a bounded subset of a stream) for which you have a known fixed input and which do not run continuously. This executes jobs in a way that is more reminiscent of batch processing frameworks, such as MapReduce in the Hadoop and Spark ecosystems. Apache Flink makes moving from a Lambda to Kappa enterprise architecture easier. The foundation of the architecture is real-time, with Kafka as its heart. But batch processing is still possible out-of-the-box with Kafka and Flink using consistent semantics. Though, this combination will likely not (try to) replace traditional ETL batch tools, e.g., for a one-time lift-and-shift migration of large workloads. Connectivity to One or Multiple Kafka Clusters Apache Flink is a separate infrastructure from the Kafka cluster. This has various pros and cons. First, I often emphasize the vast benefit of Kafka-native applications: you only need to operate, scale, and support one infrastructure for end-to-end data processing. A second infrastructure adds additional complexity, cost, and risk. However, imagine a cloud vendor taking over that burden, so you consume the end-to-end pipeline as a single cloud service. With that in mind, let's look at a few benefits of separate clusters for the data hub (Kafka) and the stream processing engine (Flink): Focus on data processing in a separate infrastructure with dedicated APIs and features independent of the data streaming platform. More efficient streaming pipelines before hitting the Kafka Topics again; the data exchange happens directly between the Flink workers. Data processing across different Kafka topics of independent Kafka clusters of different business units. If it makes sense from a technical and organizational perspective, you can connect directly to non-Kafka sources and sinks. But be careful, this can quickly become an anti-pattern in the enterprise architecture and create complex and unmanageable "spaghetti integrations". Implement new fail-over strategies for applications. I emphasize Flink is usually NOT the recommended choice for implementing your aggregation, migration, or hybrid integration scenario. Multiple Kafka clusters for hybrid and global architectures are the norm, not an exception. Flink does not change these architectures. Kafka-native replication tools like MirrorMaker 2 or Confluent Cluster Linking are still the right choice for disaster recovery. It is still easier to do such a scenario with just one technology. Tools like Cluster Linking solve challenges like offset management out-of-the-box. Transactions Across Kafka and Flink Workloads for analytics and transactions have very unlike characteristics and requirements. The use cases differ significantly. SLAs are very different, too. Many people think that data streaming is not built for transactions and should only be used for big data analytics. However, Apache Kafka and Apache Flink are deployed in many resilient, mission-critical architectures. The concept of exactly-once semantics (EOS) allows stream processing applications to process data through Kafka without loss or duplication. This ensures that computed results are always accurate. Transactions are possible across Kafka and Flink. The feature is mature and battle-tested in production. Operating separate clusters is still challenging for transactional workloads. However, a cloud service can take over this risk and burden. Many companies already use EOS in production with Kafka Streams. But EOS can even be used if you combine Kafka and Flink. That is a massive benefit if you choose Flink for transactional workloads. So, to be clear: EOS is not a differentiator in Flink (vs. Kafka Streams), but it is an excellent option to use EOS across Kafka and Flink, too. Complex Event Processing With FlinkCEP The goal of complex event processing (CEP) is to identify meaningful events in real-time situations and respond to them as quickly as possible. CEP does usually not send continuous events to other systems but detects when something significant occurs. A common use case for CEP is handling late-arriving events or the non-occurrence of events. The big difference between CEP and event stream processing (ESP) is that CEP generates new events to trigger action based on situations it detects across multiple event streams with events of different types (situations that build up over time and space). ESP detects patterns over event streams with homogenous events (i.e. patterns over time). Pattern matching is a technique to implement either pattern but the features look different. FlinkCEP is an add-on for Flink to do complex event processing. The powerful pattern API of FlinkCEP allows you to define complex pattern sequences you want to extract from your input stream. After specifying the pattern sequence, you apply them to the input stream to detect potential matches. This is also possible with SQL via the MATCH_RECOGNIZE clause. Standard SQL Support Structured Query Language (SQL) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS). However, it is so predominant that other technologies like non-relational databases (NoSQL) and streaming platforms adopt it, too. SQL became a standard of the American National Standards Institute (ANSI) in 1986 and the International Organization for Standardization (ISO) in 1987. Hence, if a tool supports ANSI SQL, it ensures that any 3rd party tool can easily integrate using standard SQL queries (at least in theory). Apache Flink supports ANSI SQL, including the Data Definition Language (DDL), Data Manipulation Language (DML), and Query Language. Flink’s SQL support is based on Apache Calcite, which implements the SQL standard. This is great because many personas, including developers, architects, and business analysts, already use SQL in their daily job. The SQL integration is based on the so-called Flink SQL Gateway, which is part of the Flink framework allowing other applications to interact with a Flink cluster through a REST API easily. User applications (e.g., Java/Python/Shell program, Postman) can use the REST API to submit queries, cancel jobs, retrieve results, etc. This enables a possible integration of Flink SQL with traditional business intelligence tools like Tableau, Microsoft Power BI, or Qlik. However, to be clear, ANSI SQL was not built for stream processing. Incorporating Streaming SQL functionality into the official SQL standard is still in the works. The Streaming SQL working group includes database vendors like Microsoft, Oracle, and IBM, cloud vendors like Google and Alibaba, and data streaming vendors like Confluent. More details: "The History and Future of SQL: Databases Meet Stream Processing". Having said this, Flink supports continuous sliding windows and various streaming joins via ANSI SQL. There are things that require additional non-standard SQL keywords but continuous sliding windows or streaming joins, in general, are possible. Machine Learning with Kafka, Flink, and Python In conjunction with data streaming, machine learning solves the impedance mismatch of reliably bringing analytic models into production for real-time scoring at any scale. I explored ML deployments within Kafka applications in various blog posts, e.g., embedded models in Kafka Streams applications or using a machine learning model server with streaming capabilities like Seldon. PyFlink is a Python API for Apache Flink that allows you to build scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines, and ETL processes. If you’re already familiar with Python and libraries such as Pandas, then PyFlink makes it simpler to leverage the full capabilities of the Flink ecosystem. PyFlink is the missing piece for an ML-powered data streaming infrastructure, as almost every data engineer uses Python. The combination of Tiered Storage in Kafka and Data Streaming with Flink in Python is excellent for model training without the need for a separate data lake. When To Use Kafka Streams Instead of Apache Flink Don't underestimate the power and use cases of Kafka-native stream processing with Kafka Streams. The adoption rate is massive, as Kafka Streams is easy to use. And it is part of Apache Kafka. To be clear: Kafka Streams is already included if you download Kafka from the Apache website. Kafka Streams Is a Library, Apache Flink Is a Cluster The most significant difference between Kafka Streams and Apache Flink is that Kafka Streams is a Java library, while Flink is a separate cluster infrastructure. Developers can deploy the Flink infrastructure in session mode for bigger workloads (e.g., many small, homogenous workloads like SQL queries) or application mode for fewer bigger, heterogeneous data processing tasks (e.g., isolated applications running in a Kubernetes cluster). No matter your deployment option, you still need to operate a complex cluster infrastructure for Flink (including separate metadata management on a ZooKeeper cluster or an etcd cluster in a Kubernetes environment). TL;DR: Apache Flink is a fantastic stream processing framework and a top #5 Apache open-source project. But it is also complex to deploy and difficult to manage. Benefits of Using the Lightweight Library of Kafka Streams Kafka Streams is a single Java library. This adds a few benefits: Kafka-native integration supports critical SLAs and low latency for end-to-end data pipelines and applications with a single cluster infrastructure instead of operating separate messaging and processing engines with Kafka and Flink. Kafka Streams apps still run in their VMs or Kubernetes containers, but high availability and persistence are guaranteed via Kafka Topics. Very lightweight with no other dependencies (Flink needs S3 or similar storage as the state backend) Easy integration into testing/CI/DevOps pipelines Embedded stream processing into any existing JVM application, like a lightweight Spring Boot app or a legacy monolith built with old Java EE technologies like EJB. Interactive Queries allow leveraging the state of your application from outside your application. The Kafka Streams API enables your applications to be queryable. Flink's similar feature "queryable state" is approaching the end of its life due to a lack of maintainers. Kafka Streams is well-known for building independent, decoupled, lightweight microservices. This differs from submitting a processing job into the Flink (or Spark) cluster; each data product team controls its destiny (e.g., don’t depend on the central Flink team for upgrades or get forced to upgrade). Flink's application mode enables a similar deployment style for microservices. But: Kafka Streams and Apache Flink Live In Different Parts of a Company Today, Kafka Streams and Flink are usually used for different applications. While Flink provides an application mode to build microservices, most people use Kafka Streams for this today. Interactive queries are available in Kafka Streams and Flink, but it got deprecated in Flink as there is not much demand from the community. These are two examples that show that there is no clear winner. Sometimes Flink is the better choice, and sometimes Kafka Streams makes more sense. "In summary, while there certainly is an overlap between the Streams API in Kafka and Flink, they live in different parts of a company, largely due to differences in their architecture and thus we see them as complementary systems." That's the quote of a "Kafka Streams vs. Flink comparison" article written in 2016 (!) by Stephan Ewen, former CTO of Data Artisans, and Neha Narkhede, former CTO of Confluent. While some details changed over time, this old blog post is still pretty accurate today and a good read for a more technical audience. The domain-specific language (DSL) of Kafka Streams differs from Flink but is also very similar. How are both characteristics possible? It depends on who you ask. This (legitimate) subject for debate often segregates Kafka Streams and Flink communities. Kafka Streams has Stream and Table APIs. Flink has DataStream, Table, and SQL API. I guess 95% of use cases can be built with both technologies. APIs, infrastructure, experience, history, and many other factors are relevant for choosing the proper stream processing framework. Some architectural aspects are very different in Kafka Streams and Flink. These need to be understood and can be a pro or con for your use case. For instance, Flink's checkpointing has the advantage of getting a consistent snapshot, but the disadvantage is that every local error always stops the whole job and everything has to be rolled back to the last checkpoint. Kafka Streams does not have this concept. Local errors can be recovered locally (move the corresponding tasks somewhere else; the task/threads without errors just continue normally). Another example is Kafka Streams' hot standby for high availability versus Flink's fault-tolerant checkpointing system. Kafka + Flink = A Powerful Combination for Stream Processing Apache Kafka is the de facto standard for data streaming. It includes Kafka Streams, a widely used Java library for stream processing. Apache Flink is an independent and successful open-source project offering a stream processing engine for real-time and batch workloads. The combination of Kafka (including Kafka Streams) and Flink is already widespread in enterprises across all industries. Both Kafka Streams and Flink have benefits and tradeoffs for stream processing. The freedom of choice of these two leading open-source technologies and the tight integration of Kafka with Flink enables any kind of stream processing use case. This includes hybrid, global, and multi-cloud deployments, mission-critical transactional workloads, and powerful analytics with embedded machine learning. As always, understand the different options and choose the right tool for your use case and requirements. What is your favorite for streaming processing, Kafka Streams, Apache Flink, or another open-source or proprietary engine? In which use cases do you leverage stream processing? Let’s connect on LinkedIn and discuss it!

By Kai Wähner CORE

Terraform Explained in Five Minutes

What Is Terraform? Terraform is an open-source “Infrastructure as Code” tool created by HashiCorp. A declarative coding tool, Terraform enables developers to use a high-level configuration language called HCL (HashiCorp Configuration Language) to describe the desired “end-state” cloud or on-premises infrastructure for running an application. It then generates a plan for reaching that end-state and executes the plan to provision the infrastructure. Because Terraform uses a simple syntax, can provision infrastructure across multiple clouds and on-premises data centers, and can safely and efficiently re-provision infrastructure in response to configuration changes, it is currently one of the most popular infrastructure automation tools available. If your organization plans to deploy a hybrid cloud or multi-cloud environment, you’ll likely want or need to get to know Terraform. Why Infrastructure as Code (IaC)? To better understand the advantages of Terraform, it helps to first understand the benefits of Infrastructure as Code (IaC). IaC allows developers to codify infrastructure in a way that makes provisioning automated, faster, and repeatable. It’s a key component of Agile and DevOps practices such as version control, continuous integration, and continuous deployment. Infrastructure as code can help with the following: Improve speed: Automation is faster than manually navigating an interface when you need to deploy and/or connect resources. Improve reliability: If your infrastructure is large, it becomes easy to misconfigure a resource or provision services in the wrong order. With IaC, the resources are always provisioned and configured exactly as declared. Prevent configuration drift: Configuration drift occurs when the configuration that provisioned your environment no longer matches the actual environment. (See ‘Immutable infrastructure’ below.) Support experimentation, testing, and optimization: Because Infrastructure as Code makes provisioning new infrastructure so much faster and easier, you can make and test experimental changes without investing lots of time and resources, and if you like the results, you can quickly scale up the new infrastructure for production. Why Terraform? There are a few key reasons developers choose to use Terraform over other Infrastructure as Code tools: Open source: Terraform is backed by large communities of contributors who build plugins to the platform. Regardless of which cloud provider you use, it’s easy to find plugins, extensions, and professional support. This also means Terraform evolves quickly, with new benefits and improvements added consistently. Platform agnostic: Meaning you can use it with any cloud services provider. Most other IaC tools are designed to work with a single cloud provider. Immutable infrastructure: Most Infrastructure as Code tools create mutable infrastructure, meaning the infrastructure can change to accommodate changes such as a middleware upgrade or new storage server. The danger with mutable infrastructure is configuration drift — as the changes pile up, the actual provisioning of different servers or other infrastructure elements ‘drifts’ further from the original configuration, making bugs or performance issues difficult to diagnose and correct. Terraform provisions immutable infrastructure, which means that with each change to the environment, the current configuration is replaced with a new one that accounts for the change, and the infrastructure is reprovisioned. Even better, previous configurations can be retained as versions to enable rollbacks if necessary or desired. Terraform Modules Terraform modules are small, reusable Terraform configurations for multiple infrastructure resources that are used together. Terraform modules are useful because they allow complex resources to be automated with reusable, configurable constructs. Writing even a very simple Terraform file results in a module. A module can call other modules — called child modules — which can make assembling configuration faster and more concise. Modules can also be called multiple times, either within the same configuration or in separate configurations. Terraform Providers Terraform providers are plugins that implement resource types. Providers contain all the code needed to authenticate and connect to a service — typically from a public cloud provider — on behalf of the user. You can find providers for the cloud platforms and services you use, add them to your configuration, and then use their resources to provision infrastructure. Providers are available for nearly every major cloud provider, SaaS offering, and more, developed and/or supported by the Terraform community or individual organizations. Refer to the Terraform documentation for a detailed list. Terraform vs. Kubernetes Sometimes, there is confusion between Terraform and Kubernetes and what they actually do. The truth is that they are not alternatives and actually work effectively together. Kubernetes is an open-source container orchestration system that lets developers schedule deployments onto nodes in a compute cluster and actively manages containerized workloads to ensure that their state matches the users’ intentions. Terraform, on the other hand, is an Infrastructure as Code tool with a much broader reach, letting developers automate complete infrastructure that spans multiple public clouds and private clouds. Terraform can automate and manage Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), or even Software-as-a-Service (SaaS) level capabilities and build all these resources across all those providers in parallel. You can use Terraform to automate the provisioning of Kubernetes — particularly managed Kubernetes clusters on cloud platforms — and to automate the deployment of applications into a cluster. Terraform vs. Ansible Terraform and Ansible are both Infrastructure as Code tools, but there are a couple of significant differences between the two: While Terraform is purely a declarative tool (see above), Ansible combines both declarative and procedural configurations. In the procedural configuration, you specify the steps, or the precise manner, in which you want to provision infrastructure to the desired state. Procedural configuration is more work, but it provides more control. Terraform is open source; Ansible is developed and sold by Red Hat. IBM and Terraform IBM Cloud Schematics is IBM’s free cloud automation tool based on Terraform. IBM Cloud Schematics allows you to fully manage your Terraform-based infrastructure automation so you can spend more time building applications and less time building environments.

By Pradeep Gopalgowda

How To Install CMAK, Apache Kafka, Java 18, and Java 19 [Video Tutorials]

This article explains how to install the following in video tutorials: CMAK (Cluster manager for Apache Kafka) or Kafka manager Apache Kafka on Windows or Windows 10/11 Java 18 on Windows 10/11, JDK installation Java 19 on Windows 10/11, JDK installation Java JDK 19 on Amazon EC2 Instance or Linux operating system Apache Kafka on Amazon EC2 Instance or Linux operating system How to Install and Use CMAK (Cluster Manager for Apache Kafka) or Kafka Manager Learn how to install and use CMAK. This is a tool for managing Apache Kafka clusters that allow us to view all the topics, partitions, numbers of offsets, and which are assigned to what and all topics, etc. Apache Kafka on Windows or Windows 10/11 This video tutorial will cover the following topics: How to install Apache Kafka on the Windows operating system Hands-on Kafka using CLI Install and run Kafka on Windows Kafka installation How to download and set up Kafka in Windows Java 18 on Windows The following shows how to install Java 18 on a Windows operating system. Java 19 on Windows Here, learn how to download and install Java 19 on the Windows operating system. Java JDK 19 on Amazon EC2 or Linux Operating Systems This video tutorial explains how to install Java JDK 19 on Amazon EC2 or Linux operating systems. Apache Kafka on Amazon EC2 Instance or Linux Operating System This video tutorial explains the following: How to install Apache Kafka on Amazon EC2 Instance or Linux operating system Hands-on Kafka using CLI Install and Run Kafka on the Linux operating system Kafka installation How to download and set up Kafka in the Linux operating system

By Ram N

Building a Flask Web Application With Docker: A Step-by-Step Guide

Flask is a popular web framework for building web applications in Python. Docker is a platform that allows developers to package and deploy applications in containers. In this tutorial, we'll walk through the steps to build a Flask web application using Docker. Prerequisites Before we begin, you must have Docker installed on your machine. You can download the appropriate version for your operating system from the official Docker website. Additionally, it would help if you had a basic understanding of Flask and Python. Creating a Flask Application The first step is to create a Flask application. We'll create a simple "Hello, World!" application for this tutorial. Create a new file called app.py and add the following code: Python from flask import Flask app = Flask(__name__) @app.route('/') def hello(): return 'Hello, World!' Save the file and navigate to its directory in a terminal. Creating a Dockerfile The next step is to create a Dockerfile. A Dockerfile is a script that describes the environment in which the application will run. We'll use the official Python 3.8 image as the base image for our Docker container. FROM python:3.8-slim-buster: This sets the base image for our Docker container to the official Python 3.8 image. WORKDIR /app: This sets the working directory inside the container to /app. COPY requirements.txt .: This copies the requirements.txt file from our local machine to the /app directory inside the container. RUN pip install --no-cache-dir -r requirements.txt: This installs the dependencies listed in requirements.txt. COPY . .: This copies the entire local directory to the /app directory inside the container. CMD [ "python", "app.py" ]: This sets the command to run when the container starts to python app.py. Create a new file called Dockerfile and add the following code: Dockerfile FROM python:3.8-slim-buster # Set the working directory WORKDIR /app # Install dependencies COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt # Copy the application code COPY . . # Run the application CMD [ "python", "app.py" ] Save the Dockerfile and navigate to its directory in a terminal. Building the Docker Image The next step is to build a Docker image from the Dockerfile. Run the following command to build the image: Python docker build -t my-flask-app . This command builds an image named my-flask-app from the Dockerfile in the current directory. The . at the end of the command specifies that the build context is the current directory. Starting the Docker Container Now that we have a Docker image, we can start a container from it. Run the following command to start a new container from the my-flask-app image and map port 5000 on the host to port 5000 in the container: Python docker run -p 5000:5000 my-flask-app This command starts a new container from the my-flask-app image and maps port 5000 on the host to port 5000 in the container. Testing the Flask Application Finally, open your web browser and navigate to http://localhost:5000. You should see the "Hello, World!" message displayed in your browser, indicating that the Flask application is running inside the docker application. Customizing the Flask Application You can customize the Flask application by modifying the app.py file and rebuilding the Docker image. For example, you could modify the hello function to return a different message: Python @app.route('/') def hello(): return 'Welcome to my Flask application!' Save the app.py file and rebuild the Docker image using the docker build command from earlier. Once the image is built, start a new container using the docker run command from earlier. When you navigate to http://localhost:5000, you should see the updated message displayed in your browser. Advantages Docker simplifies the process of building and deploying Flask applications, as it provides a consistent and reproducible environment across different machines and operating systems. Docker allows for easy management of dependencies and versions, as everything needed to run the application is contained within the Docker image. Docker facilitates scaling and deployment of the Flask application, allowing for the quick and easy creation of new containers. Disadvantages Docker adds an additional layer of complexity to the development and deployment process, which may require additional time and effort to learn and configure. Docker may not be necessary for small or simple Flask applications, as the benefits may not outweigh the additional overhead and configuration. Docker images and containers can take up significant disk space, which may concern applications with large dependencies or machines with limited storage capacity. Conclusion In this tutorial, we've walked through the steps to build a Flask web application using Docker. We've created a simple Flask application, written a Dockerfile to describe the environment in which the application will run, built a Docker image from the Dockerfile, started a Docker container from the image, and tested the Flask application inside the container. With Docker, you can easily package and deploy your Flask application in a consistent and reproducible manner, making it easier to manage and scale your application.

By Joseph owino

Tools

DZone's Featured Tools Resources

Top Tools Experts

The Latest Tools Topics