This is a question that I hear on a fairly regular basis, not just internally but from external customers as well. So it's one that I would like to help you walk through so that you can figure out what makes sense in your organization, and I think the answer is probably going to surprise you a little bit. Probably the most important thing to understand is that this isn't a "versus" question. You don't have to have one or the other. As a matter of fact, I would argue, and I think many people would agree, that SRE is actually an essential component of DevOps: a good, properly implemented DevOps method leads to the necessity of SRE when it comes to deployment. They are two sides of the same coin, and that naturally causes a little bit of confusion. DevOps is the development methodology; it's all about integrating your development teams and your operations teams. It's about knocking down the silos between them and ensuring that everybody is singing from the same songbook, and that's very important. SRE, by contrast, is in charge of automating all of the things and making sure that you never go down.

Two Sides of the Same Coin

These really are two parts of the same group, so let's look at the differences, because they do have some. Probably the first and largest one is core development. The DevOps folks, particularly your developers, are doing the core development. They are answering the question, "What do we want to do?" They are working with product, sales, and marketing to design, develop, and deploy. SRE, on the other hand, is not working on the core development; they are working on the implementation of the core. They are working on the deployment, and they are constantly giving feedback to that core development group to say, "Hey, something that you designed isn't working exactly the way that you think it is." If you want to think about it this way: DevOps decides what we develop; SRE decides how we deploy, maintain, and run it to solve the problem. It's theoretical versus practical. Ideally, they're talking to each other every day, because SRE should be logging defects and tickets back with development. Still, and probably most importantly, they need to understand that they have the same goals. These groups should never be aligned against one another, so they have to have a common understanding.

Now for probably the most important part: failure. Failure is not necessarily failure; it's just a way of life. It doesn't matter what you deploy or how well the deployment goes; failure will happen. There is a failure budget, or error budget, within which things will go wrong. When it comes to failure, the SRE team is going to anticipate it, monitor it, log it, and record everything, and ideally they can identify a failure before it happens. They're going to have predictive analytics that say, "All right, this thing is going to go bad based on what we've seen before." So SRE is responsible for mitigating some of those failures through monitoring, logging, and the preemptive work. SRE is also going to lead all of your post-failure incident management.
They're going to get you through the incident itself, and then they're going to hot wash it (run the immediate post-incident review). When that's done, you have to get Dev online, because these are the people who are going to solve the core problem; some RCAs might be solved by SRE internally. The SRE team will then integrate the fix into their monitoring and logging efforts to make sure we don't end up in another RCA for the same kind of problem. There are different skill sets here. Core development, the DevOps side, is made up of people who really love writing software. SRE has a little bit more of an investigative mindset: you have to be willing to go and do the analysis, figure out what went wrong, and automate everything. But there's a lot that they have in common. Everyone should be writing automation; everyone should get rid of toil as much as possible, because we just don't have time for manual tasks. Computers are not great at thinking on their own, but if you need the same thing done repeatedly, you can't beat them for that. So automation is key, even if the mindset differs slightly: DevOps will automate deployments, tasks, and features, while SRE will automate redundancy and turn manual tasks into programmatic ones to keep the stack up.
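To make the error budget mentioned earlier concrete, here is a minimal sketch of the arithmetic, assuming a simple availability SLO; the target and request counts are illustrative, not from any particular service:

```python
# Minimal error-budget arithmetic, assuming a 99.9% availability SLO and
# simple request counts; the numbers are illustrative only.

SLO_TARGET = 0.999  # fraction of requests that must succeed

def error_budget_remaining(total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent (can go negative)."""
    allowed_failures = total_requests * (1 - SLO_TARGET)
    if allowed_failures == 0:
        return 0.0
    return 1 - (failed_requests / allowed_failures)

if __name__ == "__main__":
    # 10 million requests this month with 4,000 failures: the budget allows
    # 10,000 failures, so 60% of it is still left to "spend" on releases.
    print(f"{error_budget_remaining(10_000_000, 4_000):.0%} of the error budget remains")
```

When the remaining budget approaches zero, the usual SRE practice is to slow feature releases and spend the time on reliability work instead.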
The industry-wide move to continuous integration (CI) build and test presents a challenge for the integration world due to the number and variety of resource dependencies involved, such as databases, MQ-based services, REST endpoints, etc. While it is quite common to automate testing using dedicated services (a "test" or "staging" database, for example), the fixed number of these services limits the number of builds that can be tested and therefore limits the agility of integration development. Containerization provides a way to increase CI build scalability without compromising quality by allowing a database to be created for each run and then deleted again after testing is complete. This does not require the integration solution to be deployed into containers in production and is compatible with deploying to integration nodes: only the CI pipeline needs to be able to use containers, and only for dependent services. Quick summary: start a containerized database, configure an ACE policy, and run tests; a working example can be found at ot4i.

Background

Integration flows often interact with multiple external services such as databases, MQ queue managers, etc., and testing the flows has historically required live services to be available. This article focuses on databases, but other services follow the same pattern. A common pattern in integration development relies on development staff building applications and doing some level of testing locally, followed by checking the resulting source into a source-code management system (Git, SVN, etc.). The source is built into a BAR file in a build pipeline and then deployed to an integration node for further testing, followed by promotion to the next stage, and so on. While the names and number of the stages differ between organizations, the overall picture looks something like this:

This style of deployment pipeline allows organizations to ensure their applications behave as expected and interact with other services correctly, but it does not usually allow changes to be delivered both quickly and safely. The key bottlenecks tend to be in the test stages of the pipeline, with build times a less common source of delays. While it is possible to speed up delivery by cutting back on testing (risky) or adding large numbers of QA staff (expensive), the industry has tended towards a different solution: continuous integration builds with finer-grained testing at earlier stages to catch errors quickly. With a CI pipeline enabled and automated finer-grained testing added, the picture changes to ensure more defects are found early on. QA and the other stages are still essential, but they do not see the same level of simple coding bugs and merge failures that might have been seen with the earlier pipelines; such bugs should be found in earlier stages, leaving the QA teams able to focus on the more complex scenarios and performance testing that might be harder to achieve earlier. A simple CI pipeline (which could be Jenkins, Tekton, or many other tools) might look something like this:

Note that (as discussed above) there would usually be environments to the right of the pipeline, such as staging or pre-prod, that are not shown in order to keep the diagram simple. The target of the pipeline could be containers or integration nodes, with both shown in the diagram; the DB2 database used by the pipeline could also be in either infrastructure.
The pipeline steps labeled "Unit Test" and "Integration Test" are self-explanatory (with the services used by integration testing not shown), but "Component Test" is more unusual. The term "component test" was used in the ACE product development pipeline to mean "unit tests that use external services" and is distinct from integration testing because component tests focus on only one service. See ACE unit and component tests for a discussion of the difference between test styles in integration. This pipeline benefits from being able to shift testing "left," with more testing automated and running faster: the ACE v12 test capabilities (see JUnit support for flow testing for details) allow developers to run the tests on their own laptops from the toolkit as well as having the tests run automatically in the pipeline, and this can dramatically reduce the time required to validate new or modified code. This approach is widely used in other languages and systems, relying heavily on unit testing to achieve better outcomes, and can also be used in integration. This includes the use of component tests to verify interactions with services, resulting in large numbers of quick-to-run tests that cover all the required use cases. However, while shifting left is an improvement, it is still limited by the availability of live services to call during the tests. As development agility becomes more important and build/test cycles become more frequent due to mandatory security updates as well as code changes, the need to further speed up testing becomes more pressing. While it is possible to do this with pre-provisioned infrastructure (for example, creating a larger fixed set of databases to be used in testing), there are still limits on how much testing can be performed at one time: providing enough services for all tests to run in parallel might be theoretically possible but cost-prohibitive, and the next section describes a cheaper solution.

On-Demand Database Provisioning

While the Wikipedia article on shift-left testing says the "transition to traditional shift-left testing has largely been completed," this does not appear to be true in the integration world (and is debatable in the rest of the industry). As the point of a lot of integration flows is to connect systems together, the availability of these systems for test purposes is a limiting factor in how far testing can be shifted left in practice. Fortunately, it is possible to run many services in containers, and these services can then be created on demand for a pipeline run. Using a DB2 database as an example, the pipeline picture above would now look as follows:

This pipeline differs from the previous picture in that it now includes creating a database container for use by tests. This requires the database to be set up (schemas, tables, etc. created) either during the test or in advance, but once the scripts and container images are in place, the result can be scaled without database administrators needing to create dedicated resources. Note that the target could still be either integration nodes or containers. Creating a new database container every time means that there will never be data left in tables from previous runs, nor any interference from other tests run by other pipeline invocations at the same time; the tests will be completely isolated from each other in both space and time.
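As a rough sketch of this per-run lifecycle, the following script drives the Docker CLI directly; the image name, credentials, port, and test command follow the DB2 Community Edition container documentation and common conventions, but treat them as assumptions to adapt rather than a reference implementation:

```python
# Sketch: create a throwaway database for one pipeline run, run the tests,
# then delete it again. Assumes the Docker CLI is on PATH; image name,
# credentials, and port follow the DB2 Community Edition container docs
# but should be verified for your environment.
import socket
import subprocess
import time

def start_db_container() -> str:
    """Start a detached DB2 container and return its ID."""
    result = subprocess.run(
        ["docker", "run", "-d", "--privileged",
         "-e", "LICENSE=accept",
         "-e", "DB2INST1_PASSWORD=test-only-password",
         "-e", "DBNAME=TESTDB",
         "-p", "50000:50000",
         "ibmcom/db2"],
        check=True, capture_output=True, text=True)
    return result.stdout.strip()

def wait_for_port(host: str, port: int, timeout: int = 600) -> None:
    """Block until the database port accepts connections (or time out)."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=5):
                return
        except OSError:
            time.sleep(5)
    raise TimeoutError(f"{host}:{port} never became reachable")

container_id = start_db_container()
try:
    # An open port does not always mean full initialization; a real pipeline
    # might also watch the container logs for a completion message.
    wait_for_port("localhost", 50000)
    subprocess.run(["mvn", "verify"], check=True)  # your component-test command
finally:
    # Delete the container, and any garbage data, whether tests passed or not.
    subprocess.run(["docker", "rm", "-f", container_id], check=True)
```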
Access credentials can also be single-use, and multiple databases can be created if needed for different tests that need greater isolation (including integration testing). While isolation may not seem relevant if the tests do nothing but trigger reads from a database, the benefits become apparent when inserting new data into the database: a new database will always be in a clean state when testing starts, so there is no need to keep track of entries to clean up after testing is complete. This is especially helpful when the flow code under test is faulty and inserts incorrect data or otherwise misbehaves, as (hopefully) tests will fail, and the whole database (including the garbage data) will be deleted at the end of the run. While it might be possible to run cleanup scripts with persistent databases to address these problems, temporary databases eliminate the issue entirely (along with the effort required to write and maintain cleanup scripts).

More Test Possibilities

Temporary databases combined with component testing also make new styles of testing feasible, especially in the error-handling area. It can be quite complicated to trigger the creation of invalid database table contents from external interfaces (the outer layers of the solution will hopefully refuse to accept the data in most cases), and yet the lower levels of code (common libraries or sub-flows) should be written to handle error situations where the database contains unexpected data (which could come from code bugs in other projects unrelated to integration). Writing a targeted component test that drives the lower level of code using a temporary database with invalid data (either pre-populated or created by the test code) allows error-handling code to be validated automatically in an isolated way. Isolated component testing of this sort lowers the overall cost of a solution over time: without automated error testing, the alternatives tend to be either manually testing the code once and then hoping it carries on working (fast but risky) or spending a lot of developer time inspecting code and conducting thought experiments ("what happens if this happens and then that happens?") before changing any of the code (slow but safer). Targeted testing with on-demand service provisioning allows solution development to be faster and safer simultaneously. The underlying technology that enables this is containerization and the resulting ease with which databases and other services can be instantiated when running tests. This does not require Kubernetes; almost any container technology would work (Docker, Windows containers, etc.), as containers are significantly simpler than VMs when it comes to on-demand service provisioning. Cloud providers can also offer on-demand databases, and those would also be an option as long as the startup time is acceptable; the only critical requirements are that the database be dynamic and network-visible.

Startup Time Considerations

On-demand database containers clearly provide isolation, but what about the time taken to start the container? If it takes too long, the pipeline might be slowed down rather than sped up, and it might consume more resources (CPU, memory, disk, etc.) than before. Several factors affect how long a startup will take and how much of a problem it is. The choice of database (DB2, Postgres, etc.) makes a lot of difference, with some database containers taking a few seconds to start while others take several minutes.
This is not usually something that can be changed for existing applications, though for new use cases it might be a factor in the choice. It is possible to test with a different type of database than is used in production, but this seriously limits the value of the tests. The amount of setup needed (tables, stored procedures, etc.) to create a useful database once the container has started also matters. This could be managed by the tests themselves in code, but it is normally better to use the existing database scripts responsible for creating production or test databases (especially if the database admins also run CI builds). Using real scripts helps ensure the database looks as it should, but it requires more work up front before the tests can start. Available hardware resources can also make a big difference, especially if multiple databases are needed to isolate tests; this, too, is affected by the choice of database, as some databases are more resource-intensive than others. Finally, the number of tests to be run and how long they take affect how much the startup time actually matters. For a pipeline with ten minutes of database testing, a startup time of one minute is less problematic than it would be for a pipeline with only thirty seconds of testing.

Some of these issues can be mitigated with a small amount of effort: database container images can be built in advance, configured with the correct tables, and then stored (using docker commit if needed) as a pre-configured image that will start more quickly during pipeline runs. The database can also be started at the beginning of the pipeline so it has a chance to start while the compile and unit test phases are running; the example in the ACE demo pipeline repo (see below) does this with a DB2 container.
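The image-baking mitigation just mentioned might look roughly like this; the image tags, DDL script, and DB2 commands are illustrative assumptions rather than a reference implementation:

```python
# Sketch: bake the schema into a pre-configured image once, in advance, so
# pipeline runs start a container that already has its tables and stored
# procedures. Image tags, the DDL script, and the exec command are
# illustrative assumptions; verify against your own image and tooling.
import subprocess
import time

BASE_IMAGE = "ibmcom/db2"
PRECONFIGURED_TAG = "registry.example.com/ci/db2-testdb:latest"

cid = subprocess.run(
    ["docker", "run", "-d", "--privileged",
     "-e", "LICENSE=accept",
     "-e", "DB2INST1_PASSWORD=test-only-password",
     "-e", "DBNAME=TESTDB", BASE_IMAGE],
    check=True, capture_output=True, text=True).stdout.strip()

def wait_for_setup(timeout: int = 900) -> None:
    """Poll the container logs until first-time setup reports completion
    (log text per the DB2 container docs; verify for your image version)."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        logs = subprocess.run(["docker", "logs", cid],
                              capture_output=True, text=True).stdout
        if "Setup has completed" in logs:
            return
        time.sleep(10)
    raise TimeoutError("database setup did not complete in time")

wait_for_setup()

# Apply the same DDL the database admins use for real databases, so the
# test database matches production structure (hypothetical script name).
subprocess.run(["docker", "cp", "create_tables.sql", f"{cid}:/tmp/"], check=True)
subprocess.run(["docker", "exec", cid, "su", "-", "db2inst1", "-c",
                "db2 connect to TESTDB && db2 -tvf /tmp/create_tables.sql"],
               check=True)

# Freeze the configured state as a new image and publish it for pipelines.
subprocess.run(["docker", "commit", cid, PRECONFIGURED_TAG], check=True)
subprocess.run(["docker", "push", PRECONFIGURED_TAG], check=True)
subprocess.run(["docker", "rm", "-f", cid], check=True)
```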
Limitations

While on-demand databases are useful for functional testing, performance testing is harder: database containers are likely to share hardware resources with other containers, and IO may be unpredictable at times. Security may also be hard to validate, depending on the security configuration of the production databases. These styles of testing may be better left to later environments that use pre-provisioned resources, but the earlier pipeline stages should have found most of the functional coding errors by then. To be most effective, on-demand provisioning requires scripts to create database objects. These may not always be available if the database has been built up manually over time, though with the industry's moves towards database CI, this should be less of a problem in the future.

Integration Node Deployments

Although the temporary database used for testing in the pipeline is best run as a container, this does not mean that the pipeline must also end with deployment into container infrastructure such as Kubernetes. The goal of the earlier pipeline stages is to find errors in code and configuration as quickly as possible, and this can be achieved even if the application will run in production on an integration node. The later deployment environments (such as pre-prod) should match the production deployment topology as closely as possible, but better pipeline-based testing further left should mean that fewer bugs are found in the later environments: simpler code and configuration issues should be caught much earlier. This will enhance agility overall even if the production topology remains unchanged. In fact, it is often better to improve the earlier pipeline stages first, as improved development efficiency can allow more time for work such as containerization.

Example of Dynamic Provisioning

The ACE demo pipeline on OT4i (search for "ace demo pipeline") has been extended to include the use of on-demand database provisioning. The demo pipeline uses Tekton to build, test, and deploy a database application (see description here), and the component tests can use a DB2 container during the pipeline run. The pipeline uses DB2 Community Edition as the database container (see DB2 docs) and can run the IBM-provided container as-is because no database objects need to be set up before the tests run (tables are created by the tests). Due to the startup time for the container, the database is started in the background before the build and unit test step, and the pipeline will wait, if needed, for the database to finish starting before running the component tests. A shutdown script is started on a timer in the database container to ensure that it does not keep running if the pipeline is destroyed for any reason; this is less of a concern in a demo environment where resources are free (and limited!) but would be important in other environments. Note that the DB2 Community Edition license is intended for development use but still has all the capabilities of the production-licensed code (see DB2 docs here), and as such is a good way to validate database code; other databases may require licenses (or be completely free to use).

Summary

CI pipelines for integration applications face challenges due to the large number of service interactions, but these challenges can be eased by the use of on-demand service provisioning. This is especially true when combined with targeted testing using component-level tests on subsections of a solution, allowing faster and safer development cycles. The approach is helped by the widespread availability of databases and other services in containers that can be used in pipelines without requiring a wholesale move to containers in production. Used appropriately, the resulting shift of testing to the left has the potential to help many integration organizations develop high-quality solutions with less effort, even without a wholesale move to containers in all environments.
Containerization has changed how many businesses and organizations develop and deploy applications. A report by Gartner indicated that by 2022, more than 75% of global organizations would be running containerized applications in production, up from less than 30% in 2020. However, while containers come with many benefits, they remain a source of cyberattack exposure if not appropriately secured. Previously, cybersecurity meant safeguarding a single "perimeter." By introducing new layers of complexity, containers have rendered this concept outdated. Containerized environments have many more abstraction levels, which necessitates specific tools to interpret, monitor, and protect these new applications.

What Is Container Security?

Container security is the use of a set of tools and policies to protect containers from potential threats that could affect an application, its infrastructure, system libraries, runtime, and more. Container security involves implementing a secure environment for the container stack, which consists of the following:

- Container image
- Container engine
- Container runtime
- Registry
- Host
- Orchestrator

Most software professionals automatically assume that Docker and the Linux kernel are secure from malware; that is a risky assumption.

Top 5 Container Security Best Practices

1. Host and OS Security

Containers provide isolation from the host, although they share kernel resources. Often overlooked, this aspect makes it more difficult, but not impossible, for an attacker to compromise the OS through a kernel exploit and gain root access to the host. Hosts that run your containers need their own set of security controls, starting with keeping the underlying host operating system up to date, for example by running the latest version of the container engine. Ideally, you will also set up monitoring to be alerted to any vulnerabilities on the host layer. Also, choose a "thin OS," which will speed up application deployment and reduce the attack surface by removing unnecessary packages and keeping the OS as minimal as possible. Essentially, in a production environment, there is no need to let a human admin SSH to the host to apply configuration changes. Instead, it is best to manage all hosts through IaC with Ansible or Chef, for instance. This way, only the orchestrator has ongoing access to run and stop containers.

2. Container Vulnerability Scans

Regular vulnerability scans of your containers and hosts should be carried out to detect and fix potential threats that attackers could use to access your infrastructure. Some container registries provide this kind of feature: when your image is pushed to the registry, it is automatically scanned for potential vulnerabilities. One way to be proactive is to set up a vulnerability scan in your CI pipeline, adopting the "shift left" philosophy of implementing security early in your development cycle; Trivy would be an excellent choice to achieve this (see the sketch at the end of this article). If you are trying to set up this kind of scan on your on-premises nodes, Wazuh is a solid option that will log every event and verify events against multiple CVE (Common Vulnerabilities and Exposures) databases.
3. Container Registry Security

Container registries provide a convenient and centralized way to store and distribute images, and it is common to find organizations storing thousands of images in their registries. Since the registry is so important to the way a containerized environment works, it must be well protected. Therefore, investing time in monitoring and preventing unauthorized access to your container registry is something you should consider.

4. Kubernetes Cluster Security

Another action you can take is to reinforce security around your container orchestration, such as preventing risks from over-privileged accounts or attacks over the network. Following the least-privilege access model and protecting pod-to-pod communications would limit the damage done by an attack. A tool we would recommend in this case is Kube Hunter, a penetration-testing tool that allows you to run a variety of tests on your Kubernetes cluster so you can start taking steps to improve its security. You may also be interested in Kubescape, which is similar to Kube Hunter; it scans your Kubernetes cluster, YAML files, and Helm charts to provide you with a risk score.

5. Secrets Security

A container or Dockerfile should not contain any secrets (certificates, passwords, tokens, API keys, etc.), and still we often see secrets hard-coded into the source code, images, or build process. Choosing a secret management solution will allow you to store secrets in a secure, centralized vault.

Conclusion

These are some of the proactive security measures you can take to protect your containerized environments. This is vital because Docker has only been around for a relatively short time, which means its built-in management and security capabilities are still maturing. Thankfully, decent security in a containerized environment can be achieved with multiple tools, such as the ones listed in this article.
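As a concrete illustration of the CI scanning gate described in the vulnerability-scans practice above, here is a minimal sketch wrapping Trivy; the image name is a placeholder, and the flags should be checked against your Trivy version:

```python
# Sketch of a shift-left CI gate: scan the freshly built image with Trivy
# and fail the pipeline when HIGH or CRITICAL findings exist. Assumes the
# trivy binary is installed; the image name is a placeholder.
import subprocess
import sys

IMAGE = "registry.example.com/myapp:latest"

result = subprocess.run(
    ["trivy", "image",
     "--severity", "HIGH,CRITICAL",
     "--exit-code", "1",  # makes trivy exit non-zero on matching findings
     IMAGE])

if result.returncode != 0:
    print("Vulnerabilities found: failing the build.")
    sys.exit(1)
print("Image passed the vulnerability gate.")
```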
Shift Left and Shift Right are two terms commonly used in the DevOps world to describe approaches for improving software quality and delivery. These approaches are based on the idea of identifying defects and issues as early as possible in the development process so that teams can address them quickly and efficiently, allowing the software to meet user expectations. Shift Left focuses on early testing and defect prevention, while Shift Right emphasizes testing and monitoring in production environments. In this blog, we will discuss the differences between these two approaches.

The Shift-Left Approach

Shift Left, in DevOps, refers to the practice of moving testing and quality assurance activities earlier in the software development lifecycle. This means that testing is performed as early as possible in the development process, ideally starting during the requirements-gathering phase. Shift Left allows teams to identify and fix defects earlier in the process, which reduces the cost and time required to fix them later in the development cycle. The goal of Shift Left is to ensure that software is delivered with higher quality and at a faster pace. Here are the key aspects of the Shift-Left approach in DevOps:

- Early Involvement: Testing and quality assurance teams are involved early in the development process, meaning testers and developers work together from the beginning rather than waiting until the end.
- Automated Testing: Automation plays a key role in the Shift-Left approach. Test automation tools are used to automate the testing process and ensure that defects are detected early.
- Collaboration: Developers and testers work together to ensure that quality is built into the product from the beginning.
- Continuous Feedback: Defects are identified and fixed as soon as they are discovered, rather than waiting until the end of the SDLC.
- Continuous Improvement: By identifying defects early, the development team can improve the quality of the software and reduce the risk of defects later in the SDLC.

Here are some examples of Shift-Left practices in DevOps:

- Test-Driven Development (TDD): Writing automated tests before writing code to identify defects early in the development process (see the sketch below).
- Code Reviews: Conducting peer reviews of code changes to identify and address defects and improve code quality.
- Continuous Integration (CI): Automating the build and testing of code changes to catch bugs early and ensure that the software is always in a deployable state.
- Static Code Analysis: Using automated tools to analyze code for potential defects, vulnerabilities, and performance issues.
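As a minimal illustration of the TDD practice above, the tests below would be written first and fail until a matching implementation exists; the module and function names are hypothetical:

```python
# Test-first sketch: these tests exist before the implementation does and
# fail until apply_discount is written to satisfy them. The pricing module
# and its function are hypothetical names, not a real library.
import pytest

from pricing import apply_discount

def test_discount_is_applied():
    assert apply_discount(price=100.0, percent=10) == 90.0

def test_negative_percent_is_rejected():
    with pytest.raises(ValueError):
        apply_discount(price=100.0, percent=-5)
```

Once these fail for the right reason, the developer writes just enough of apply_discount to make them pass, which is how defects get caught before they ever reach later stages.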
The Shift-Right Approach

Shift Right in DevOps, on the other hand, refers to the practice of monitoring and testing software in production environments. This approach involves using feedback from production to improve the software development process. By monitoring the behavior of the software in production, teams can identify and resolve issues quickly and gain insights into how the software is used by end users. The goal of Shift Right is to ensure that software is reliable, scalable, and provides a good user experience. This approach involves monitoring production systems, collecting feedback from users, and using that feedback to identify areas for improvement. Here are the key aspects of the Shift-Right approach in DevOps:

- Continuous Monitoring: Continuous monitoring of the production environment helps to identify issues in real time. This includes monitoring system performance, resource utilization, and user behavior.
- Real-World Feedback: Real-world feedback from users is critical to identifying issues that may not have been detected during development and testing. This feedback can be collected through user surveys, social media, and other channels.
- Root Cause Analysis: When issues are identified, root cause analysis is performed to determine the underlying cause. This involves analyzing logs, system metrics, and other data to understand what went wrong.
- Continuous Improvement: Once the root cause has been identified, the DevOps team can work to improve the system. This may involve deploying patches or updates, modifying configurations, or making other changes.

Here are some examples of the Shift-Right approach:

- Monitoring and Alerting: Setting up monitoring tools to collect data on the performance and behavior of the software in production environments, and setting up alerts to notify the team when issues arise.
- A/B Testing: Deploying multiple versions of the software and testing them with a subset of users to determine which version performs better in terms of user engagement or other metrics.
- Production Testing: Testing the software in production environments to identify defects that may only occur in real-world conditions.
- Chaos Engineering: Introducing controlled failures or disruptions into the production environment to test the resilience of the software.

Both the Shift-Left and Shift-Right approaches are important in DevOps, and they are often used together to create a continuous feedback loop that allows teams to improve software delivery. The key is to find the right balance between the two, which can be done by using the right DevOps platform and analyzing business needs.

Understanding the Differences Between Shift Left and Shift Right

Shift Left and Shift Right focus on different stages of the software development and deployment lifecycle. Here are some of the key differences between the two approaches:

Focus: Shift Left focuses on testing and quality assurance activities performed early in the software development lifecycle, while Shift Right focuses on monitoring and testing activities in production environments.

Goals: The goal of Shift Left is to identify and fix defects early in the development process, helping to ensure that software is delivered with higher quality and at a faster pace. The goal of Shift Right is to ensure that software is secure, reliable, scalable, and provides a good user experience.

Activities: Shift-Left activities include unit testing, integration testing, and functional testing, as well as automated testing and continuous integration. Shift-Right activities include monitoring, logging, incident response, and user feedback analysis.

Timing: Shift-Left activities typically occur before the software is deployed, while Shift-Right activities occur after deployment.
Risks: The risks associated with Shift Left relate to the possibility of missing defects that may only be discovered in production environments. The risks associated with Shift Right relate to the possibility of introducing changes that cause production incidents or disrupt the user experience.

Conclusion

Both the Shift-Left and Shift-Right approaches are critical for the success of microservices. We hope that after reading this article, you have a clear idea of what shifting left and shifting right mean. By using Shift Left and Shift Right together, developers can ensure that their microservices are reliable, scalable, and efficient. In addition, these approaches help to ensure that microservices are adopted with security and compliance in mind.
Value streams have been a central tenet of Lean thinking for decades, starting with Toyota and the Lean Manufacturing movement, and are now widely adopted across industries. Despite this, many businesses have yet to harness the full potential of value streams to drive organizational change and achieve greater efficiency and effectiveness. Instead, they may focus narrowly on metrics like team velocity or production pipeline speed, missing the broader picture of the end-to-end system. In modern product development, understanding value streams is crucial to optimizing our ways of working and delivering value to customers. By mapping the path to value, we can gain visibility into our processes and identify improvement areas, such as code deployment bottlenecks or mismatches between personnel and roles. In this blog, we will explore the concept of value stream mapping and its role in actualizing the purpose of DevOps transitions. We'll debunk common myths and misunderstandings around value stream mapping and introduce principles to help you succeed in this activity and beyond. Whether you're a seasoned DevOps practitioner or just starting on your journey, you will want to take advantage of this opportunity to unlock the holy grail of Agile-DevOps value stream hunting.

What Is Value Streaming, and Why Is the Path to Value Streaming Quintessential for Your Agile-DevOps Journey?

Value stream mapping is the process of analyzing and improving the flow of value to customers by mapping out the end-to-end process, from idea to delivery. Understanding value streams and mapping the path to value streaming is essential for any Agile-DevOps journey. Consider a real-life scenario: a software development team struggling to deliver value to customers efficiently. They may focus on completing tasks and meeting deadlines without taking a holistic view of the entire process. Through value stream mapping, they can identify bottlenecks in the development process, such as long wait times for testing or approval, and adjust and streamline the flow of value to customers. Value stream mapping is quintessential to an Agile-DevOps journey because it helps teams understand how their work fits into the larger picture of delivering value to customers. By mapping out the entire process, teams can see where delays occur, where handoffs are inefficient, and where there is room for improvement. Consider a DevOps team struggling to smoothly integrate code changes into the production environment. Through value stream mapping, they may discover that their testing process is overly time-consuming or that there are too many manual steps in the deployment process. By identifying these inefficiencies, they can automate testing and deployment, leading to faster value delivery to customers. By taking a holistic view of the entire process, teams can identify inefficiencies, reduce waste, and deliver customer value more efficiently and effectively. In short, value stream mapping helps organizations identify and eliminate inefficiencies in their processes, leading to faster, more efficient delivery of value to customers. Following are some more examples: A financial services company wants to improve the time it takes to process customer loan applications. Through value stream mapping, they discover that there are long wait times between different departments and multiple handoffs that slow down the process.
By identifying these inefficiencies, they can redesign the process to eliminate unnecessary steps and reduce wait times, resulting in faster loan processing and improved customer satisfaction. A healthcare organization wants to improve patient care by reducing the time it takes for lab results to be processed and returned to the doctor. Through value stream mapping, they discover that there are too many manual steps in the lab testing process and bottlenecks in the information flow between departments. By redesigning the process to automate testing and improve communication, they can reduce the time it takes to process lab results, leading to faster patient diagnosis and treatment. A software development company wants to improve the quality of its code releases. Through value stream mapping, they discover that multiple handoffs between development, testing, and operations teams lead to errors and delays. By redesigning the process to automate testing and improve communication between teams, they can reduce the time it takes to identify and fix bugs, resulting in higher-quality code releases and happier customers.

Embarking on a Lightweight Quest to Value Stream Mapping for Agile-DevOps Teams

A lightweight approach to value stream mapping can help Agile-DevOps teams streamline their processes, improve efficiency, and deliver value to their customers more quickly. By avoiding unnecessary complexity and focusing on the most critical areas of the process, teams can achieve success and stay competitive in today's fast-paced business environment. A lightweight approach means using simple tools and methods to map out your processes instead of getting bogged down in complex and time-consuming activities. This can be particularly beneficial for Agile-DevOps teams, which are often focused on delivering value quickly and efficiently. By taking a lightweight approach, teams can focus on identifying the most critical areas of the process that need improvement and act quickly to address them. A lightweight approach also allows for greater flexibility and agility, which is essential in the fast-paced world of Agile-DevOps. Teams can quickly adapt and adjust their value stream mapping activities as needed to stay aligned with their goals and objectives.

Busting the Myths and Misconceptions: The Truth About Value Streams and Value Stream Mapping

Some common myths and misconceptions about value streams and value stream mapping include the idea that they are only relevant to manufacturing or physical products, that they are too complex and time-consuming to implement, or that they are only helpful for large organizations. The truth is that value streams and value stream mapping can be applied to any industry or process, regardless of size or complexity. They provide a holistic view of the end-to-end process, allowing teams to identify and address bottlenecks, reduce waste, and improve efficiency. Another misconception is that value stream mapping is a one-time activity; in reality, it should be an ongoing process that evolves with the organization's needs and goals. It is also not necessary to understand all the processes completely upfront: it is perfectly acceptable to start with a smaller scope and build on that as needed. By busting these myths and misconceptions, teams can better understand the actual value of value stream mapping and how it can be a valuable tool in their Agile-DevOps journey. They can avoid unnecessary complexity and focus on the critical areas of the process that need improvement. Ultimately, this will lead to a more efficient and effective operation and better customer value delivery.
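In the lightweight spirit described earlier, a first value stream map can be little more than a list of steps with durations; here is a sketch of the classic flow-efficiency calculation over hypothetical loan-processing steps:

```python
# Lightweight value stream arithmetic: flow efficiency is value-adding time
# divided by total lead time. Steps and durations (in hours) are hypothetical.
steps = [
    ("application review",        4,  True),
    ("wait for credit team",      30, False),
    ("credit check",              2,  True),
    ("wait for approval",         40, False),
    ("approval and notification", 1,  True),
]

lead_time = sum(hours for _, hours, _ in steps)
value_add = sum(hours for _, hours, adds_value in steps if adds_value)

print(f"Lead time: {lead_time} h, value-adding: {value_add} h, "
      f"flow efficiency: {value_add / lead_time:.0%}")
# The wait states dominate: the map says to attack the handoffs, not the work.
```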
Unlocking Business Excellence: Maximize the Benefits of Agile-DevOps Value Stream Mapping Using 8 Lean Principles

If you want to take your Agile-DevOps team to the next level, unlocking business excellence with Agile-DevOps value stream mapping and eight Lean principles is the way to go. Value stream mapping (VSM) is a Lean tool that visually represents the process steps required to deliver value to customers. The VSM process identifies bottlenecks, waste, and opportunities for improvement in the value stream. It helps Agile-DevOps teams focus on value-added activities and eliminate non-value-added activities, resulting in reduced lead time, improved quality, and increased customer satisfaction. To maximize the benefits of VSM, Agile-DevOps teams should follow eight Lean principles:

- Define value from the customer's perspective: Identify what your customers consider valuable and focus your efforts on delivering that value.
- Map the value stream: Create a visual representation of the entire value stream, from idea to delivery, to identify inefficiencies and opportunities for improvement.
- Create flow: Eliminate waste and create a smooth workflow through the value stream to improve delivery time.
- Implement pull: Use customer demand to drive work and avoid overproduction.
- Seek perfection: Continuously improve the value stream to eliminate waste and improve efficiency.
- Empower the team: Provide your Agile-DevOps team with the tools, resources, and authority they need to succeed.
- Practice Lean leadership: Create a culture of continuous improvement and empower your team to drive change.
- Respect people: Treat your team members with respect and create a positive work environment that encourages collaboration and innovation.

By implementing these eight Lean principles, Agile-DevOps teams can unlock business excellence and deliver superior customer value.

Deploying the Power of Principles: Succeeding in Value Stream Mapping in a Lightweight Way and the Horizons Beyond

By embracing a lightweight approach and deploying the power of Lean principles, organizations can succeed in value stream mapping and achieve business excellence. The lightweight approach enables organizations to identify areas that need improvement, break down silos, and facilitate collaboration across teams, unlocking the true potential of value stream mapping. It also helps organizations sustain their efforts and continue to make improvements in the long run. By embracing the eight Lean principles above (defining customer value, mapping the value stream, creating flow, implementing pull, seeking perfection, empowering teams, practicing Lean leadership, and respecting people), organizations can continuously improve their value stream and deliver value to their customers. So, if you're looking to unlock the true potential of your Agile-DevOps transition, don't wait; take the first step towards success, start your value stream mapping (VSM) journey today, and take your Agile-DevOps team to the next level!
TestOps is an emerging approach to software testing that combines the principles of DevOps with testing practices. TestOps aims to improve the efficiency and effectiveness of testing by incorporating it earlier in the software development lifecycle and automating as much of the testing process as possible. TestOps teams typically work in parallel with development teams, focusing on ensuring that testing is integrated throughout the development process. This includes testing early and often, using automation to speed up the testing process, and creating a cycle of continuous testing and improvement. TestOps teams also work closely with operations teams to ensure that the software is deployed in a stable and secure environment. In short, TestOps is an approach to software testing that emphasizes collaboration between the testing and operations teams to improve the overall efficiency and quality of the software development and delivery processes.

Place of TestOps in Software Development

The Need for TestOps

DevOps adoption brings several challenges that TestOps helps address:

- Initial Investment: Adopting DevOps requires an initial investment of time, resources, and money. This can be a significant barrier to adoption for some organizations, particularly those with limited budgets or resources.
- Learning Curve: DevOps requires a significant cultural shift in the way teams work together, and it can take time to learn new processes, tools, and techniques. This can be challenging, particularly for organizations with entrenched processes and cultures.
- Security Risks: DevOps practices can increase the risk of security vulnerabilities if security measures are not properly integrated into the development process. This can be particularly problematic in industries with strict security requirements, such as finance and healthcare.
- Automation Dependencies: DevOps relies heavily on automation, which can create dependencies on tools and technologies that may be difficult to maintain or update. This can lead to challenges in keeping up with new technologies or changing requirements.
- Cultural Resistance: DevOps requires a collaborative and cross-functional culture, which may be difficult to achieve in organizations with siloed teams or resistance to change.

Advantages of TestOps

- Continuous Testing: TestOps enables continuous testing, which lets organizations detect defects early in the development process (see the sketch below). This reduces the cost and effort required to fix defects and ensures that software applications can be delivered with high quality.
- Improved Quality: By integrating testing processes into the DevOps pipeline, TestOps ensures that quality is built into software applications from the outset. This reduces the risk of defects and improves the overall quality of the software.
- Greater Efficiency: TestOps enables the automation of testing processes, which can help organizations reduce the time and effort required to test software applications. This can also reduce the costs associated with testing.
- Increased Collaboration: TestOps promotes collaboration between development and testing teams, which can help identify and resolve issues earlier in the development process. This can lead to faster feedback and better communication between teams.
- Faster Time-to-Market: TestOps allows the automation of testing processes, which reduces the time required to test software applications. This enables organizations to release software faster, which can give them a competitive advantage in the marketplace.
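As a small sketch of what the continuous-testing advantage looks like in practice, the step below runs a test suite on every change, publishes a machine-readable report, and gates the pipeline stage on the result; the paths and layout are illustrative:

```python
# Sketch of a continuous-testing pipeline step: run the suite on every
# change, emit a machine-readable report, and gate the stage on the result.
# Assumes pytest is installed; the test path is illustrative.
import subprocess
import sys
import xml.etree.ElementTree as ET

result = subprocess.run(["pytest", "--junitxml=report.xml", "tests/"])

# Summarize the report for the pipeline dashboard (root tag varies by
# pytest version, so handle both layouts).
root = ET.parse("report.xml").getroot()
suite = root if root.tag == "testsuite" else root.find("testsuite")
print(f"tests={suite.get('tests')} failures={suite.get('failures')} "
      f"errors={suite.get('errors')} time={suite.get('time')}s")

sys.exit(result.returncode)  # a non-zero exit fails the pipeline stage
```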
Scope of TestOps in the Future

The scope of TestOps in the future is significant as software development continues to become more complex and fast-paced. Because TestOps combines software testing with DevOps practices, it is becoming increasingly important for organizations to implement TestOps to ensure that they can deliver high-quality software applications to market quickly. Some of the trends likely to shape the future of TestOps include:

- Increasing Adoption of Agile and DevOps Methodologies: Agile and DevOps methodologies are becoming increasingly popular among organizations that want to deliver software applications faster and more efficiently. TestOps is a natural extension of these methodologies and is likely to become an essential component of Agile and DevOps practices.
- Greater Focus on Automation: Automation is a critical aspect of TestOps and will likely become even more important in the future. Automation tools and techniques can help organizations reduce the time and effort required to test software applications while also improving the accuracy and consistency of testing.
- The Growing Importance of Cloud Computing: Cloud computing is becoming increasingly popular among organizations that want to reduce their IT infrastructure costs and improve scalability. TestOps can be implemented in cloud environments and is likely to become even more important as more organizations move their software applications to the cloud.

Overall, the scope of TestOps in the future is vast, and it is likely to become an essential component of software development practices in the coming years.

Conclusion

Is TestOps the future of software testing? Obviously, yes. With the increasing adoption of Agile and DevOps methodologies, there is a growing need for software testing processes that can keep pace with rapid development and deployment cycles. TestOps can help organizations achieve this by integrating testing into the software development lifecycle and ensuring that testing is a continuous and automated process. Furthermore, as more and more software is deployed in cloud environments, TestOps will become even more important in ensuring that applications are secure, scalable, and reliable. In summary, TestOps is a key trend in software testing that is likely to continue growing as organizations look for ways to improve the efficiency and quality of their software development and delivery processes.
Platform engineering is the discipline of building and maintaining a self-service platform for developers. The platform provides a set of cloud-native tools and services to help developers deliver applications quickly and efficiently. The goal of platform engineering is to improve the developer experience (DX) by standardizing and automating most of the tasks in the software delivery lifecycle (SDLC). Instead of context switching between tasks like provisioning infrastructure and managing security, and climbing the learning curve of each tool, developers can focus on coding and delivering the business logic using automated platforms. Platform engineering has an inward-looking perspective, as it focuses on optimizing developers in the organization for better productivity. Organizations benefit greatly from developers working at the optimum level because it leads to faster release cycles. The platform makes this happen by providing everything developers need to get their code into production, so they do not have to wait on other IT teams for infrastructure and tooling. The self-service platform that makes developers' day-to-day activities more effortless and autonomous is called an internal developer platform (IDP).

What Is an Internal Developer Platform (IDP)?

An IDP is a platform comprising self-service cloud-native tools and technologies that developers can use to build, test, deploy, monitor, or do almost anything regarding application development and delivery with as little overhead as possible. Platform engineers or platform teams build it after consulting the developers and understanding their unique challenges and workflows. After discussing and implementing Kubernetes CI/CD pipelines and GitOps solutions for many large hi-tech enterprises, we realized a typical IDP consists of the below five pillars:

- CI/CD platforms for automated deployments (Jenkins, Docker Hub, Argo CD, Devtron, Spinnaker)
- Container orchestration platforms for managing containers (Kubernetes, Nomad, Docker Swarm)
- Security management tools for authentication, authorization, and secret management (HashiCorp Vault, AWS Secrets Manager, Okta Identity Cloud)
- Infrastructure as code (IaC) tools for automated infrastructure provisioning (Terraform, Ansible, Chef, AWS CloudFormation)
- Observability stacks for workload and application visualization across all clusters (Devtron Kubernetes dashboard, Prometheus, Grafana, ELK stack)

The platform team designs the IDP in a way that is easy for developers to use, with a minimal learning curve. IDPs can reduce developers' cognitive load and improve DX by automating repetitive tasks, reducing maintenance overhead, and eliminating the need for endless scripting. An IDP enables development teams to independently manage resources, infrastructure needs, deployments, and rollbacks through self-service. This increases developer autonomy and accountability, reduces dependencies, and streamlines the development cycle.

Why Is Platform Engineering Important?

Platform engineering can help organizations reap several internal (developers) and external (end users) benefits:
- Improved developer experience (DX): The plethora of cloud-native tools increases developers' cognitive load, as it takes a good amount of time to decide which one to use for a specific use case and then master it. Platform engineering solves this and improves DX by providing a simplified, standardized set of tools and services that suit developers' unique workflows.
- Increased productivity: The IDP provides everything developers need to get their code tested and deployed in a self-service manner. This reduces delays in different stages of the SDLC, such as waiting for someone to provision infrastructure. Platform engineering ensures developer productivity by letting developers focus mainly on core development work.
- Standardization by design: IT teams in a typical software organization use a variety of tooling, varying from team to team, and maintaining and keeping track of everything becomes complex. Platform engineering solves this by standardizing the tools and services, and it becomes easier for the platform team to solve any bottlenecks because the platform is identical for every developer.
- Faster releases: The platform team ensures developers are working on delivering the business logic by providing toolchains that are easily consumable, reusable, and configurable. Developers are very productive as a result, and this accelerates time-to-market for features and innovations, reliably and securely.

Implementing a successful platform team in an organization and realizing the above benefits requires following some common principles. Treating the platform as a product is one of them.

Platform as a Product

One of the core principles of platform engineering is productizing the platform. The platform team needs to employ a product management mindset to design and maintain a platform that is not only user-friendly but also meets the expectations and needs of its customers (the app developers). It starts with collecting data points around the problems developers have and identifying which areas to address. This could mean improving deployment frequency, reducing the change failure rate, improving reliability and security, improving DX, etc. It is important to note that building a platform is about building a core product that solves common challenges most teams share. It is not about solving the problems of a single team but about providing the product across multiple teams to solve the same set of problems. For example, if multiple teams require the same piece of infrastructure, it makes sense for the platform team to work on that shared piece and distribute it. This idea of reuse and repeatability is crucial, as it allows for standardization, consistency, and scalability in application delivery. As in product management, the platform team owns the product, chooses certain metrics, and continually takes customer feedback to improve the user experience. The platform's product roadmap evolves with respect to that feedback, accommodating the changing needs and desires of the customers.
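As a toy illustration of the metrics side of this product mindset, the sketch below computes deployment frequency and change failure rate from a hypothetical deployment log:

```python
# Toy platform metrics from a deployment log: deployment frequency and
# change failure rate. The records are hypothetical.
from datetime import date

deployments = [
    {"day": date(2023, 5, 1), "caused_incident": False},
    {"day": date(2023, 5, 2), "caused_incident": True},
    {"day": date(2023, 5, 2), "caused_incident": False},
    {"day": date(2023, 5, 4), "caused_incident": False},
]

days_observed = (max(d["day"] for d in deployments)
                 - min(d["day"] for d in deployments)).days + 1
frequency = len(deployments) / days_observed
failure_rate = (sum(d["caused_incident"] for d in deployments)
                / len(deployments))

print(f"{frequency:.2f} deployments/day, "
      f"{failure_rate:.0%} change failure rate")
```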
It starts with engaging with the developers and understanding their pain points:

Listen to the Customers

Interview developers and different IT teams to understand their engineering landscape and challenges, and to learn what they are optimizing for. They may be trying to build an effective CI/CD pipeline or to implement better access control, among many other challenges around software delivery.

Prioritize

Identify the common challenges most teams share and prioritize solving them over problems individual teams face. For example, if most teams find it hard to store and retrieve secrets securely, it makes sense to prioritize that and solve it for everyone.

Platform Designing

Design the IDP with the tools required to solve those problems for users, along with documentation that enables developers to self-serve resources and infrastructure. In the case above, adopting a secret management tool would solve the challenges around securely managing secrets. Part of platform design also includes writing scripts to automate routine development tasks, such as spinning up new environments and provisioning infrastructure, to reduce errors and friction points in the development flow.

Metrics

Choose specific metrics around the goals to measure the platform's effectiveness. For example, if the goal is to improve DX, the metrics include engagement scores, team feedback, etc. The metrics will change if the goal is instead to reduce the change failure rate or to increase deployment frequency.

Gather Feedback and Maintain the Platform

Continue listening to the customers and watching the metrics. Gather user feedback to add new tools to the platform and optimize for a better user experience. This also includes staying up to date with emerging tools and technologies in the DevOps and cloud infrastructure space and adopting them if necessary.

It is easy to confuse the role of a DevOps engineer or SRE with that of a platform engineer, since they all manage the underlying infrastructure and support software development teams. Although these roles share certain overlapping responsibilities, each differs from the others in its focus.

Platform Engineering vs. DevOps

DevOps is a philosophy that brought a cultural shift to the SDLC to improve software delivery speed and quality. DevOps facilitated collaboration and communication between development and ops teams and accelerated automation to streamline deployments. Platform engineering, a practice rather than a philosophy, can be considered the next iteration of DevOps, as it shares some of DevOps' core principles: collaboration (with Ops), continuous improvement, and automation.

The daily tasks of a platform team and a DevOps team differ in some respects. DevOps engineers use tools and automation to streamline getting code to production, managing it, and observing it with logging and monitoring tools. They mostly work on building an effective CI/CD pipeline. Platform engineers take the tools used by DevOps and integrate them into a shared platform that different IT teams can use at an enterprise level. This eliminates the need for teams to configure and manage infrastructure and tooling on their own, saving significant time, effort, and resources. Platform engineers also create documentation and optimize the platform so developers can self-serve the tools and infrastructure in their workflows. Platform teams are typically required only in mature companies with many different IT teams using complex tools and infrastructure.
Naturally, a dedicated platform team to manage the complexity becomes necessary in such an engineering landscape. The platform team builds and manages the infrastructure, helping DevOps speed up continuous delivery. In startups, however, it is common for the DevOps team to perform platform engineering tasks (configuring Terraform, for example).

Platform Engineering vs. SRE

Site reliability engineers (SREs) focus on ensuring the application is reliable, secure, and always available. They work with developers and Ops teams to create systems and infrastructure that support delivering highly reliable applications. SREs also perform capacity planning and infrastructure scaling, and they manage and respond to incidents so that the platform meets its required service-level objectives (SLOs). Platform engineering, on the other hand, manages complex infrastructure and builds an efficient platform for developers to optimize the SDLC.

While both work on platforms and their roles sound similar, their goals differ. The major difference between platform engineering and SRE is whom they face and cater their services to. SREs face end users and ensure the application is reliable and available for them. Platform engineers face internal developers and focus on improving their developer experience. The daily tasks of both teams follow from these goals. Platform engineering provides the underlying infrastructure for rapid application delivery, while SREs do the same to deliver highly reliable and available applications. SREs work more on troubleshooting and incident response, while platform engineers focus on complex infrastructure and enabling developer self-service.

To achieve their respective goals, SREs and platform teams use different tools in their workflows. SREs mostly use monitoring and logging tools like Prometheus or Grafana to detect anomalies in real time and to set automated alerts. Platform teams work with sets of tools spanning various stages of the software delivery process, such as container orchestration tools, CI/CD pipeline tools, and IaC tools. All in all, SREs and platform teams both work on building reliable and scalable infrastructure, with different goals but with some overlap in the tools they use.

How To Implement Platform Engineering in an Organization

A platform team will not be an immediate requirement in a startup with a few engineers. Once the organization grows to multiple IT teams and starts dealing with complex tooling and infrastructure, it makes sense to have platform engineers manage the complexity.

Create the Role (Head/VP of Engineering)

Top-level engineering leaders like the VP or Head of Engineering usually create the role of a platform engineer when developers spend more time configuring tools and infrastructure than delivering business logic. They typically find that most IT teams are solving the same problems, like spinning up a new environment, which slows the delivery process. The Head of Engineering then defines the scope of platform engineering, identifies the areas of responsibility, and creates the platform engineer role or team.

Create an Internal Developer Platform (Platform Engineers/Team)

The platform engineer starts by cataloging the infrastructure and tools already used in the organization. They then interview developers to understand their challenges and build the internal developer platform with tools and services that solve problems at an enterprise level. They build the platform in a way that is flexible and accommodates different architectures and deployment styles. Platform engineers also create documentation and conduct training sessions to help developers self-serve the platform. It is ideal for platform engineers to have a developer background so they know what it is like to be a developer and understand the challenges better.
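For illustration, below is a minimal sketch of the kind of "new environment" template a platform team might expose through an IDP. It assumes a Kubernetes-based platform and a hypothetical team/stage naming convention; it provisions an isolated namespace with resource guardrails baked in so that developers can self-serve an environment without filing a ticket:

```yaml
# Hypothetical self-service environment template; all names are illustrative.
# Creates an isolated namespace for one team's dev environment...
apiVersion: v1
kind: Namespace
metadata:
  name: team-checkout-dev          # assumed convention: <team>-<stage>
  labels:
    team: checkout
    stage: dev
---
# ...with resource guardrails so a single environment cannot starve the cluster.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: default-quota
  namespace: team-checkout-dev
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    pods: "20"
```

In practice, the platform would generate such manifests from a form, CLI, or pull request, but the principle is the same: the golden path is codified once and reused by every team.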
Onboard Users (Application Developers)

Once the platform is ready, platform engineers onboard application developers. This requires internal marketing: letting teams know about the platform and what it can solve. The best way to onboard users is to pull them to the platform rather than push the platform onto them. This can be done by starting with a small team and helping them overcome a challenge; for example, help a small team optimize their CI/CD pipeline and provide the best experience possible in the process. Word of mouth from early adopters will have a positive ripple effect throughout the organization, which will help onboard more users to the platform.

Platform engineering does not stop at onboarding users. It is a continuous process in which the platform accommodates emerging tools and technologies and the changing needs and requirements of its users.

Conclusion: Platform Engineering With Open-Source Tools

It is important to select an open-source platform built to equip platform engineers with a standardized toolchain that helps developers accelerate software delivery. Devtron is one such platform: it helps developers by automating CI/CD, security, and observability across the end-to-end SDLC.
"DevOps is dead." Well, not exactly. But the DevOps methodology of "you build it, you run it" has been failing development teams for years. On this week's episode of Dev Interrupted, we sit down with Kaspar von Grünberg, founder and CEO of Humanitec. Listen as Kaspar explains the significant cognitive load placed on developers as a result of DevOps practices, how that has caused software engineering to be the only industry since Medieval times not to drive towards specialization, and why platform engineers provide a solution to the outdated DevOps model. Episode Highlights: (2:24) What is platform engineering? (7:05) Should VPEs have a platform team right now? (11:29) Difference between SREs and platform teams (17:14) DevOps is dead (19:11) How scale affects team size (26:12) Standardization of the space (28:08) Kaspar's work at Humanitec (32:30) The future of platform engineering Episode Excerpt: Dan: If I'm a VP of engineering right now, listening to this pod, should I have a platform engineering team? Kaspar: Yes. I mean, no, you can always argue a customer has an interest in you having a plethora of engineering teams, but I am looking at the return on investment of these teams. And there, it's so large, you can gain so much from this. There's so much inefficiency in these workflows that, yes, you definitely should have one. And having a platform engineering team doesn't like sounds, you know, more costly if you want that. It is- take a product manager, halftime if you want. But structure this correctly, structure this as a product, find a couple of people that are responsible for this, take them from different groups, you don't need me to rehire, apply these principles, you know, take this on the structure. And you'll see a very, very fast return with fairly low costs. And so definitely, definitely yes. And I want to get back to one of the absolutely correct things you said. We have these fundamentalists shouting at us. You know, everybody has to do everything in context. Otherwise, you're abstracting people away. And those are these. It's this type of thing you can always say. Everybody could always say that they say never restrict developers, never take away from this. Of course not. But that's not the idea. Like platform engineering is not about taking context away, the contrary holds true. It's about providing context. If you're looking at 700 different script formats, that's not context. That's cognitive overload. You don't win anything. And so that is really like, our industry is the only industry that is not actually driving towards specialization, from the medieval ages to now. Every industry has always specialized. We're the only fucking industry in the world that is actually working against specialization. I already have a problem with these fundamentalist approaches or viewpoints that many of them just have never really worked at scale. And scale for me means, like, production-grade two fifty-three, four or five hundred engineers over a longer period of time. And to believe that in these situations, you can just shift everything to everybody is so insanely naive. And then the next argument I always hear is like, Hey, you build it, you run it, Werner Vogels said so, I mean, let's pause and look at the situation where Werner Vogels said that he said in a blog post, 2006. That guy was in the CTO position for a UX director of research for like 12 months before he was a researcher. That guy had never worked at scale. 
And Amazon's teams, at that point, were a couple of dozen developers. The sentence that guy said in 2006 says nothing about the reality of a bank in the US East with two and a half thousand developers that are drowning in policy. It's just naive to say that. It doesn't make sense.

Join Kaspar and me at PlatformCon 2023 on June 8th and 9th; it's virtual and free!
Argo CD is a continuous delivery (CD) tool that has gained popularity in DevOps for performing application delivery onto Kubernetes. It relies on a deployment method known as GitOps. GitOps is a mechanism that pulls the latest code and application configuration from the last known Git commit and deploys it directly into Kubernetes resources. Argo CD's wide adoption and popularity stem from the fact that it helps developers manage infrastructure and the application lifecycle in one platform. To understand Argo CD in depth, we must first understand how continuous delivery works for most application teams and how it differs from GitOps.

What Is Continuous Delivery?

Continuous delivery (CD) is a software delivery process/framework that enables developers to continuously push their code changes to production servers through a push-based pipeline, without directly accessing the infrastructure. This process reduces the time it takes code to reach end users and improves release velocity. It is a repeatable process that enables scaling your application to address the growing demands of end users.

How Does Continuous Delivery Work?

Continuous delivery is the part of the application delivery lifecycle that deploys the application, post-build, onto infrastructure. In this context, our infrastructure will be Kubernetes, without the use of Argo CD. If we use Jenkins, the process is as follows: To release a feature upgrade or a bug fix to the end users, a pre-configured continuous integration (CI) pipeline builds a Docker image as per the Dockerfile. Then, a script pushes it to the configured Docker registry. To move this image into Kubernetes, the deployment.yaml file is updated with the new image name and tag so that the cluster fetches the latest image from the Docker registry.
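As a rough sketch of that last step, and assuming a hypothetical application and registry, this is the kind of deployment.yaml whose image reference the CI pipeline rewrites on every release:

```yaml
# Illustrative deployment.yaml: the CI pipeline bumps the image tag below on
# every successful build, then applies the file to the cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v1.4.2  # <- tag updated by CI
```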
Challenges With CD Into Kubernetes

Before the rise of Kubernetes, most application teams enabled CD with tools like Jenkins and Spinnaker. The architecture of Kubernetes is complex, and these tools could not deploy into it efficiently and without errors. These issues made tools like Jenkins challenging to use with Kubernetes. The main challenges of traditional CD when deploying into Kubernetes:

Tools Installation and Management

One needs to install tools like kubectl and Helm, which adds to the operational activities.

Accessing Kubernetes Clusters

One must configure access management in the CD tool to authorize it against Kubernetes clusters and execute changes. If the Kubernetes clusters run on a cloud provider, those credentials must also be configured and shared outside the cluster.

Security and Operational Challenges

The number of configurations rises in proportion to the number of clusters, which increases operational overhead. And while increased operational overhead may be manageable, sharing cluster credentials with external services and tools puts the system's security at risk.

Issues While Scaling Infrastructure

Each team needs its own set of Kubernetes cluster credentials so that its users can access their specific application resources in a cluster. When operating Kubernetes at scale, a CD tool like Jenkins that deploys into multiple clusters must be reconfigured for each new cluster.

Lack of Visibility

When a CD tool deploys an application into Kubernetes without GitOps or a Kubernetes-native CI/CD pipeline, the tool loses visibility into the deployment after applying the configuration in the deployment.yaml files. Once the kubectl command has been executed, the team must wait until someone reports an incident, and the status of the execution remains unclear.

Continuous delivery into Kubernetes can be made efficient with Argo CD, which works on the principle of GitOps. Before looking at GitOps itself, let us compare push- and pull-based CI/CD.

What Is GitOps and How Is It Different From Traditional CD?

Traditional CD and GitOps differ on the core principle of push- versus pull-based deployments. Most CI/CD processes work on a push mechanism: things move to their respective destination at the trigger of an event. For example, when developers have finished writing code, they must execute a set of commands to move the code onto the server for deployment. In a Kubernetes environment, the developer has to configure the clusters with tools like kubectl and Helm in the pipeline to apply changes.

Argo CD is a CD tool that uses a pull-based mechanism. In a pull-based CD mechanism, the destination triggers an event to pull the data from the source (Git) and deploy it at the destination. Argo CD, which resides inside the cluster for reasons explained later in this article, pulls the most recent verified version of the code into the cluster for deployment. This model has many benefits, such as improved security and ease of use. This pull-based mechanism is called GitOps, where a source code management system like Git is treated as the single source of truth for application and configuration data.

How Does Argo CD Work?

Argo CD works in a reversed flow compared to push-style deployments. This mechanism enables Argo CD to run from inside a Kubernetes cluster. Kubernetes faces challenges with the traditional CD mechanism because CI/CD tools like Jenkins sit outside the cluster, whereas Argo CD sits inside it. From inside the cluster, Argo CD pulls changes from Git and applies them to the cluster it resides in. By sitting inside the cluster and pulling changes, instead of having changes pushed in like older-generation tools, Argo CD prevents sensitive information from being exposed to tools outside the Kubernetes cluster and environment.

Argo CD can be set up in two simple steps (a minimal example of the second step is sketched below):

Deploy the Argo CD agent to the cluster.
Configure Argo CD to track a Git repository for changes.

When Argo CD detects changes, it automatically applies them to the Kubernetes cluster. When developers commit new code to the Git repository, the automated CI pipeline starts the build process and builds the container image. Then, as configured, the CI pipeline pushes the image and updates the Kubernetes manifest files: the pipeline writes the new image name and version into the deployment.yaml file. Argo CD tracks this update, pulls the image, and deploys it onto the target cluster. When the Kubernetes cluster is ready, Argo CD reports the status of the application and confirms that the synchronization is complete and correct. Argo CD also works in the other direction, monitoring changes in the Kubernetes cluster and discarding them if they don't match the current configuration in Git.
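Here is a minimal sketch of that second step: an Argo CD Application manifest that points the in-cluster agent at a Git repository. The repository URL, path, and namespaces are assumptions for the example:

```yaml
# Illustrative Argo CD Application: "track this Git repo and deploy what it
# describes into the my-app namespace of the local cluster."
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd            # namespace where Argo CD itself is installed
spec:
  project: default
  source:
    repoURL: https://github.com/example/app-config.git  # hypothetical config repo
    targetRevision: main       # branch, tag, or commit to track
    path: k8s                  # directory containing the manifests
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: my-app
```

Once this Application is applied, Argo CD continuously compares the manifests at that path with the live cluster state and reports whether the two are in sync.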
Best Practices To Follow While Using Argo CD

Keep separate Git repositories for application source code and application configuration.
Keep a separate Git repository for system configuration.

Why Separate Repositories?

The main reason for having separate repositories for application source code and application configuration is that configuration lives not only in the deployment file but also in the ConfigMaps, Secrets, storage definitions, Services, and other resources Kubernetes uses. These files change independently of the source code. When a developer or DevOps engineer wants to change a service.yaml file, which is application configuration and not part of the software code, they would otherwise have to run the whole CI pipeline to sync that change into production, even though the CI pipeline should only need to run when the code changes. Clubbing the application configuration and software code together makes the setup complex and inefficient. With separate repositories, as soon as DevOps changes the configuration in the Git repository, Argo CD becomes aware of the change, because it constantly monitors the repo, and updates the destination cluster.

Benefits of Using Argo CD

K8s configurations defined as code in the Git repository
Config files not applied manually from the individual laptops of developers/DevOps
Updates traceable as tags, branches, or pinned manifest versions at Git commits
One single interface for updating the cluster
Git as the single source of truth
No untraceable kubectl apply commands
Version-controlled changes with an audit trail
Single sign-on (SSO) with providers such as GitLab, GitHub, Microsoft, OAuth2, OIDC, LinkedIn, LDAP, and SAML 2.0
Support for webhooks triggering actions in GitLab, GitHub, and Bitbucket

Easy Rollbacks

If a new code commit is pushed to the Git repository, the changes are applied to the cluster by Argo CD auto-sync, and the cluster then fails, DevOps can revert to the previous working state from the list of last known good repository versions, much like a Windows restore point. One also need not go through the laborious process of manually reverting every cluster and cleaning up, as Argo CD does all of that by itself.

Avoid Snowflake Clusters With Argo CD

Argo CD watches both the Git repository and the cluster. Any time a change happens in the Git repository or the cluster, Argo CD compares the two states to check for any misconfigurations or differences. If the actual state of the cluster does not match the desired state defined in the Git repo, Argo CD becomes active and quickly syncs the cluster to the state described in Git. So if someone makes manual changes in the cluster, Argo CD detects them and syncs back to the desired state, overriding the manual changes (the sync policy that enables this behavior is sketched below). The automatic override keeps the system stable and guarantees that the Git repository is the only source of truth at any point in time. It also provides full transparency into the cluster and prevents it from becoming a snowflake cluster.

Recreate Clusters From Scratch

When a cluster completely crashes and has to be rebuilt from scratch, Argo CD can be pointed at the Git repository where the complete cluster configuration is defined, and it will recreate the same state as before. This is a fully autonomous process: developers and DevOps do not have to worry about disaster recovery and post-crash clean-up. It is possible because Argo CD accepts the cluster configuration as code, in a declarative way.
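The drift-reverting behavior described above is opt-in. As a minimal sketch, the automated sync policy added under spec: of the Application manifest shown earlier might look like this (the field names follow the Argo CD Application spec; enabling them is a deliberate choice, not a default):

```yaml
# Added under spec: of an Application manifest.
syncPolicy:
  automated:
    selfHeal: true   # revert manual changes made directly in the cluster
    prune: true      # delete cluster resources that were removed from Git
```

With selfHeal and prune enabled, Git remains the single source of truth: drift is reverted and orphaned resources are cleaned up automatically.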
What Does Argo CD Provide Over Standard GitOps Benefits?

Better Team Collaboration With Easy Access Control

Production clusters must have limited access, and only some of your team members should be allowed access. To configure different access rules for these clusters, Argo CD can gate changes behind pull-request approvals for authorized developers and engineers. This helps manage cluster permissions without the need to create cluster roles and user accounts on Kubernetes. Non-human users, like CI/CD tools and other peripheral tools in the DevOps ecosystem outside the cluster, can be configured in Argo CD, which resides inside the cluster. This architecture ensures that those credentials remain inside the cluster, making the system robust and secure.

Argo CD Architecture Overview

Argo CD is a Kubernetes-native CD tool that supports and reads various Kubernetes manifest formats, such as plain YAML, Ksonnet, Jsonnet, Helm charts, and Kustomize. It can follow updates to branches or tags, or be pinned to a specific version of the manifests at a Git commit. The Argo CD control plane consists of three main components:

Application Controller
API Server
Repository Service

Application Controller

Argo CD can detect changes in Git and sync the cluster with the repo thanks to the application controller component. This enables syncing out-of-date or modified destination configurations back to the last approved Git version. The application controller syncs against the local cache created by the repository service rather than the Git repository itself, because that is a less resource-intensive process. The application controller can also be configured to accept direct changes to code and configuration at the destination without reverting to the last known configuration in Git; this authority should be granted only to selected team members. When such a direct change is made, the controller notifies DevOps and developers about the difference in configuration so they can update the Git repositories.

API Server

The API server, like the Kubernetes API server, is a service that exposes the components of Kubernetes and Argo CD to interfaces like a CLI, a web GUI, or third-party tools. The APIs are primarily used for functions like application deployment and management, executing rollbacks and other user-defined actions, managing cluster credentials (stored as Kubernetes Secrets), enforcing RBAC, and handling Git webhooks.

Repository Service

Accessing the Git repository is always time-consuming for Argo CD, since every access means a pull over the network. Hence, the internal repository service keeps a local cache of the Git repositories and application manifests on the Kubernetes cluster. It is responsible for generating and returning Kubernetes manifests from input data such as the repository URL, Git revision (branch, tag, or commit), application path, and template-specific settings; an example of such settings is sketched below.
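For instance, when the tracked path contains a Helm chart, the template-specific settings appear on the Application's source, and the repository service renders the chart into plain Kubernetes manifests. A minimal sketch (repository and file names are assumptions):

```yaml
# Illustrative Application source for a Helm chart: the repository service
# renders the chart with the given values file into plain Kubernetes manifests.
source:
  repoURL: https://github.com/example/charts.git   # hypothetical chart repo
  targetRevision: main
  path: charts/my-app
  helm:
    valueFiles:
      - values-prod.yaml     # template-specific settings for this environment
```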
DevSecOps, in layman's language, is a combined form of software development, security, and software operations. According to Gartner's research, "It is estimated that at least 95% of cloud security failures through 2022 will be the fault of the enterprise." Therefore, while developing any application, the developer must not leave loose ends that could make an enterprise vulnerable to such attacks. In the same spirit, DevSecOps means understanding the software and learning to code while also learning to operate and maintain that code. It is essential to keep in mind that a single security breach can cause customers to lose confidence in any business, so it is vital to prioritize and maintain rigorous security measures.

DevSecOps involves integrating security into both application development and operations, promoting collaboration between teams, and leveraging automation and tooling to construct robust and secure applications. In the DevSecOps approach, security is addressed proactively during the development process rather than as an afterthought. Security testing and bug fixing are integrated into the development cycle to detect security vulnerabilities early in the software development life cycle. This approach facilitates innovation, boosts developer velocity, and allows for speedy release cycles while maintaining a focus on security.

DevSecOps has proven beneficial for achieving faster development, more rapid feature releases, and the implementation of agile practices. By integrating security into the development process from the start, DevSecOps helps reduce the risk of security breaches and other cybersecurity threats that can be costly to organizations in terms of reputation, legal liability, and financial losses. DevSecOps also promotes a culture of collaboration and communication between teams, which can lead to better alignment of security and development objectives. It can also help organizations meet compliance requirements, as security is integrated into the development process from the beginning. Overall, DevSecOps is essential for any organization that wants to build secure, reliable, and high-quality software products that meet the needs of its customers while minimizing security risks.

Despite the progress enterprises have made in adopting modern business practices, such as transitioning to cloud providers and utilizing agile frameworks, DevSecOps is frequently overlooked by stakeholders as an organizational priority. There is often no clear framework for DevSecOps initiatives that executives can readily support. Gartner forecasts that through 2022, 75% of DevOps initiatives will fail to meet expectations due to difficulties with organizational learning and implementing change.

Adopting DevSecOps

Implementing DevSecOps in an enterprise is a long-term undertaking that spans multiple years, and initiating it early can be advantageous for the organization in the long run. While there is a wealth of resources available to promote awareness of DevOps and its benefits, there is comparatively little information on DevSecOps and a comprehensive framework organizations can use to smoothly integrate it into their operations. Below are some of the key steps to keep in mind as you begin your DevSecOps journey.

1. Do We Need DevSecOps?

The initial step in embarking on a DevSecOps journey is to gain a complete understanding of what DevSecOps entails and why it is necessary.
Once this comprehension is established, the focus should shift to assessing who will benefit from the adoption of DevSecOps and how. This will necessitate a thorough evaluation of the business use case, available resources, and the organization's current pain points. During this phase, it is essential to be transparent about any existing technical debt, defects, and bugs, as this will aid in identifying areas for improvement and opportunities to pinpoint the root cause of defects. This process will expose gaps in current applications and processes and provide insights for evaluating the available opportunities.

2. How Do We Promote/Champion It?

The next step is identifying a group of ambassadors or champions who are aligned with and committed to the DevSecOps mission within the organization. This presents an opportunity to recruit enthusiastic individuals who can bring fresh perspectives. Multiple channels should be used to promote the opportunity throughout the organization to attract a diverse group of individuals, including engineers, operations personnel, security specialists, testers, and managers. Ideal candidates are motivated, eager to learn, adaptable to ambiguity, and adept team players. This cohort will help bring the DevSecOps mission to life and advocate for the cause, facilitating wider DevSecOps adoption throughout the enterprise.

3. DevSecOps Strategy

To implement an organization-wide change with DevSecOps, creating a DevSecOps strategy is crucial. This involves instilling a "security-first" mindset in everyone involved and incorporating security best practices from the beginning. When developing the strategy, it is important to consider priorities and the cost in time and resources, and to ensure that efforts are time-bound for successful implementation. The strategy serves to establish alignment on what needs to be achieved as part of the DevSecOps adoption.

4. Leadership Buy-In

Leadership and executives hold significant responsibility in promoting and embracing DevSecOps, and it is crucial to obtain their buy-in so that no other business objective or key result hinders the adoption of DevSecOps in the organization. This is where you present the DevSecOps strategy to leadership and inform them of the initial setup costs in terms of time, money, and resources. It is also an opportunity to educate them about the long-term benefits and their impact on the organization.

5. Implementation Phase

The real work begins in the execution phase, where time is of the essence. It is crucial to start small, gather feedback, and iterate. During this phase, evaluate the available tools that can help expedite the adoption process (a minimal example of wiring such a tool into a pipeline is sketched after these steps).

6. Success Criteria and Measurement

Feedback is a crucial aspect of growth in life, starting from childhood, where continuous feedback from our parents helps us learn and improve. Similarly, in a DevSecOps environment, it is essential to have a system in place for constant feedback to facilitate continuous improvement. Tools such as alarms, dashboards, and monitoring alerts can help audit applications and detect issues proactively. Additionally, establish an ongoing feedback mechanism, such as quarterly Agile retrospectives or employee surveys. It is essential to have governance and guardrails in place to measure the success of DevSecOps adoption.
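As a minimal sketch of what "security integrated into the development cycle" can look like in practice, here is a hypothetical CI job, using GitHub Actions and the open-source Trivy scanner as an assumed toolchain, that scans every change and fails the build on serious findings:

```yaml
# Hypothetical CI workflow: scan the repository on every push and pull request,
# and fail the build on critical or high-severity vulnerabilities.
name: devsecops-scan
on: [push, pull_request]

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Scan source tree and dependencies for known vulnerabilities
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: fs          # filesystem scan of the checked-out repo
          severity: CRITICAL,HIGH
          exit-code: '1'         # a non-zero exit code fails the pipeline
```

Because the scan runs on every change, vulnerabilities surface early in the SDLC rather than in a late security review.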
The steps above serve as a roadmap to enable organizations to successfully implement DevSecOps and create secure software right from the outset. Short-term investments in DevSecOps can yield long-term benefits, such as the ability to release better, faster, and more secure products to customers. Continuous feedback mechanisms facilitate ongoing improvement and iteration. By approaching DevSecOps this way, organizations can avoid the common pitfalls associated with it and ensure that the integration of DevSecOps represents a cultural shift in their development processes rather than a one-time effort.
Boris Zaikin
Senior Software Cloud Architect,
Nordcloud GmbH
Pavan Belagatti
Developer Advocate,
Harness
Nicolas Giron
Site Reliability Engineer (SRE),
KumoMind
Alireza Chegini
DevOps Architect / Azure Specialist,
Smartwyre