Cloud computing has opened a Pandora's box of new issues compared to the good old on-premises world. Chief among them, I believe, is data residency, or data location:

Data localization or data residency law requires data about a nation's citizens or residents to be collected, processed, and/or stored inside the country, often before being transferred internationally. Such data is usually transferred only after meeting local privacy or data protection laws, such as giving the user notice of how the information will be used and obtaining their consent. Data localization builds upon the concept of data sovereignty that regulates certain data types by the laws applicable to the data subjects or processors. While data sovereignty may require that records about a nation's citizens or residents follow its personal or financial data processing laws, data localization goes a step further in requiring that initial collection, processing, and storage first occur within the national boundaries. In some cases, data about a nation's citizens or residents must also be deleted from foreign systems before being removed from systems in the data subject's nation.

- Data localization

Data residency is an essential consideration for solution architects. Two critical scenarios can impact data residency requirements:

Legal requirements: Certain countries have laws that require data to be stored within their territory. For example, China has strict data residency laws, which can impact how businesses operating in the country store data. It is essential to be aware of such regulations and ensure compliance.

Jurisdictional challenges: Laws and regulations governing data privacy and security vary between countries. This gap can create challenges for cloud providers in managing data storage and access across multiple countries, as they need to comply with the laws and regulations of each jurisdiction. For instance, FISA and the Patriot Act in the US allow US actors to access the data of EU citizens, even though the GDPR protects them. As a cloud service provider responsible for upholding EU regulations, you could be liable under EU law if such access happens.

Most cloud providers offer this capability; e.g., Google. However, it assumes the provider has data centers in the desired location; Google, for example, has none in China. In this post, I want to offer a couple of options to handle this requirement.

Where To Compute the Location?

In this section, I'll list a couple of places where the location can be computed.

In the Code

We can manage data residency at the code level. It's the most flexible option, but because it requires writing code, it's also the most error-prone. Specifics depend on your tech stack, but it goes like this:

Get the request.
Optionally, query additional data.
Establish where the data should go.
Write the data to the computed location.

Here's what it looks like, where X and Y correspond to different countries:
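To make this concrete, here is a minimal Java sketch of the code-level approach, assuming one JDBC DataSource per country (the X and Y above) and that the country code has already been derived from the request. The class, table, and column names are hypothetical illustrations, not tied to any framework:

Java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

public class ResidencyAwareWriter {

    // One DataSource per country code, e.g., "X" and "Y"; wiring is assumed
    private final Map<String, DataSource> dataSourcesByCountry;

    public ResidencyAwareWriter(Map<String, DataSource> dataSourcesByCountry) {
        this.dataSourcesByCountry = dataSourcesByCountry;
    }

    public void saveUser(String countryCode, String id, String name) throws SQLException {
        // Establish where the data should go
        DataSource target = dataSourcesByCountry.get(countryCode);
        if (target == null) {
            throw new IllegalArgumentException("No database configured for country " + countryCode);
        }
        // Write the data to the computed location
        try (Connection connection = target.getConnection();
             PreparedStatement insert = connection.prepareStatement(
                     "INSERT INTO users (id, name) VALUES (?, ?)")) {
            insert.setString(1, id);
            insert.setString(2, name);
            insert.executeUpdate();
        }
    }
}

A sharding driver such as Apache ShardingSphere automates precisely this routing decision, which is what the next option describes.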
In a Library/Framework

From an architectural point of view, the driver approach is similar to the one above. However, the code doesn't compute the final location; the library/framework offers sharding:

A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard is held on a separate database server instance, to spread load. Some data within a database remains present in all shards, but some appear only in a single shard. Each shard (or server) acts as the single source for this subset of data.

- Shard (database architecture)

In this approach, the application knows about all countries' database URLs and needs to keep track of them. For example, the Apache ShardingSphere project provides a JVM database driver with such sharding capabilities. One can configure the driver to write data to a shard depending on a key; i.e., the location.

Apache ShardingSphere is an ecosystem to transform any database into a distributed database system, and enhance it with sharding, elastic scaling, encryption features & more. The project is committed to providing a multi-source heterogeneous, enhanced database platform and further building an ecosystem around the upper layer of the platform. Database Plus, the design philosophy of Apache ShardingSphere, aims at building the standard and ecosystem on the upper layer of the heterogeneous database. It focuses on how to make full and reasonable use of the computing and storage capabilities of existing databases rather than creating a brand new database. It attaches greater importance to the collaboration between multiple databases instead of the database itself.

- Apache ShardingSphere

In a Proxy

The proxy approach is similar to the library/framework approach above; the difference is that a library runs inside the application, while a proxy is a dedicated component. The responsibility of keeping track of the databases now falls on the proxy's shoulders. Apache ShardingSphere provides both alternatives: a JDBC driver and a proxy.

In the API Gateway

Another approach is to compute the location in the API Gateway. Interestingly enough, keeping track of upstreams is already part of a Gateway's regular responsibilities.

How To Compute the Location

This section considers what we need to compute the location. Let's examine the case of an HTTP request from a client to a system. The system as a whole, regardless of the specific component, needs to compute the location to know where to save the data. Two cases can arise: either the HTTP request carries enough information to compute where to send the data, or it doesn't. In the former case, the information may come from a client cookie, from hidden fields that a previous request set on a web page, or from any other approach. In the latter case, the system needs additional data.

If the request is self-sufficient, it's better to forward it to the correct country as early in the request-processing chain as possible; i.e., from the API Gateway. Conversely, if the data requires enrichment, it should be done as close to the data-enrichment component as possible.

Imagine a situation where the data location is based on a user's place of residence. We could set a cookie to store this information, or data that allows us to compute it client-side. Every request would carry the data, and we wouldn't need additional data to compute the location. However, storing sensitive data on the client is a risk, as malicious actors could tamper with it. To improve security, we could keep everything server-side, but then the system would need to request additional data to compute the location on every request.

A Tentative Proposal

We can reconcile the best of both worlds at the cost of additional complexity. Remember that everything is a trade-off: your mileage may vary depending on your context. The following is a good first draft that you can refine later.

The first request can hit any of the two (or more) endpoints. The response adds metadata: which Gateway to query on subsequent requests, plus enough data to compute the location.
On the second request, the client queries the correct Gateway. Note that for resiliency purposes, there can be two layers. The request also carries the data necessary to compute the location. The Gateway receives the request, computes the location, and forwards the request to the app in the same location. The app receives the request and does what it's supposed to do using a sharding-friendly library. The library computes the location again. In most cases, it yields the current location; if the data has been tampered with or the topology has changed since the initial computation, it switches to the correct location. In the latter case, we incur a performance penalty; however, our design makes this unlikely.

The main principle is to select the correct location as early as possible, while leaving the option to move to the right location if needed. Here's the sequence diagram of the second call, with location metadata already added:

Conclusion

In this post, we looked at data residency and designed a draft architecture to implement it. In the next post, we will delve into the technical details.

To Go Further

Foreign Intelligence Surveillance Act
Patriot Act
GDPR
Apache ShardingSphere
In this blog, I would like to share a few best practices for creating Highly Available (HA) applications in Mule 4 from an infrastructure perspective ONLY (CloudHub in this article refers to CloudHub 1.0 ONLY). Most of the configuration details (only those relevant to HA) shared here are taken from MuleSoft documentation/articles/blogs.

1. Horizontal Scaling

Scaling horizontally means adding more servers so that the load is distributed across multiple nodes. Scaling horizontally usually requires more effort than vertical scaling, but it is easier to scale indefinitely once set up.

CloudHub (CH)

Add multiple CloudHub worker nodes. CloudHub (CH) provides high availability (HA) and disaster recovery against application and hardware failures. CH uses Amazon AWS for its cloud infrastructure, so availability depends on Amazon. Availability and deployments in CH are separated into different regions, which in turn map to the corresponding Amazon regions. If an Amazon region goes down, the applications within that region are unavailable and are not automatically replicated to other regions.

Deploy on Multiple Workers (Single Region)

In order to achieve HA, add multiple CH workers to your Mule application so that it scales horizontally. CH automatically distributes multiple workers for the same application for maximum reliability. When you deploy your application to two or more workers, the HTTP load-balancing service distributes requests across these workers, allowing you to scale your services horizontally. Requests are distributed on a round-robin basis.

Note: HTTP load balancing can be implemented by an internal reverse proxy server. Requests to the application (domain) URL http://appname.cloudhub.io are automatically load-balanced between all the application's worker URLs. In case of an AZ failure, all requests are served by the Mule application deployed in the other AZ in the same region.

Deploy on Multiple Regions (Also for Disaster Recovery (DR))

An application deployed to multiple regions provides a better disaster recovery (DR) strategy along with HA. Generally, a region never goes down, but it may fail during natural calamities. If you have a DR requirement, you can deploy the application to multiple regions and implement the CloudHub load balancer or any external load balancer (see below).

DR implementation by deploying the Mule application into multiple regions and setting up a load balancer

Enable CloudHub HA Features

CloudHub's High Availability (HA) features provide scalability, workload distribution, and added reliability to your applications on CloudHub. This functionality is powered by CloudHub's scalable load-balancing service, worker scale-out, and persistent queues features.

Note: You can enable HA features on a per-application basis using the Anypoint Runtime Manager console when either deploying a new application or redeploying an existing application.

Worker Scale-Out

Refer to 'Deploy on Multiple Workers (Single Region)' above.

Persistent Queues

Persistent queues ensure zero message loss and let you distribute workloads across a set of workers. If your application is deployed to more than one worker, persistent queues allow communication between workers and workload distribution.

How To Enable CloudHub HA Features

You can enable and disable either or both of the above CloudHub HA features in one of two ways:

When you deploy an application to CloudHub for the first time using the Runtime Manager console.
By accessing the Deployment tab in the Runtime Manager console for a previously deployed application.

Steps To Follow

Click an application to see the overview, and click Manage Application.
Next to Workers, select options from the drop-down menus to define the number and type of workers assigned to your application.
Click Settings, and select the Persistent Queues checkbox to enable queue persistence.
If your application is already deployed, redeploy it for the new settings to take effect.

On-Premises

Create multiple standalone Mule runtime instances (nodes) and then create a cluster out of them. When you deploy applications on-premises, you are responsible for the installation and configuration of the Mule runtime instances that run your Mule applications. Mule Enterprise Edition supports scalable clustering to provide high availability (HA) for deployed applications.

A cluster is a set of Mule runtime engines that acts as a unit. In other words, a cluster is a virtual server composed of multiple nodes (Mule runtime engines). The nodes in a cluster communicate and share information through a distributed shared memory grid. This means that the data is replicated across memory in different machines.

Cluster Setup in On-Premises Setup

Single Data Center Multi-Node Cluster

By default, clustering Mule runtime engines ensures high availability (HA). If a Mule runtime engine node becomes unavailable due to failure or planned downtime, another node in the cluster can assume the workload and continue to process existing events and messages. The following figure illustrates the processing of incoming messages by a cluster of two nodes. The load is balanced across nodes: Node 1 processes message 1 while Node 2 simultaneously processes message 2.

Processing of incoming messages by a cluster (when NO failure occurs)

If one node fails, the other available nodes pick up the work of the failing node. As shown in the following figure, if Node 2 fails, Node 1 processes both message 1 and message 2.

Processing of incoming messages by a cluster (when failure occurs)

How To Add Servers/Nodes to Runtime Manager (Applicable to Deployment Options Where the Control Plane Is either Anypoint Platform or On-Prem)

Prerequisites:

Your enterprise license is current.
You are running Mule 3.6.0 or later, and API Gateway 2.1 or later.
If you want to download the Runtime Manager (RM) Agent, you must have an Enterprise support account.
If the server is already registered with another Runtime Manager instance, remove that registration first.

Notes:

For Mule 3.6.x, install the Runtime Manager agent.
For Mule 3.7.x and later Mule 3.x versions, you can optionally update the Runtime Manager agent to the latest version to take advantage of all bug fixes and new features.
For info on RM Agent installation, refer here.

Steps To Follow

In order to add a Mule server to Runtime Manager so that you can manage it, you must first register it with the Runtime Manager agent.

Note: Use the amc_setup script to configure the Runtime Manager agent to communicate with Runtime Manager.

From Anypoint Platform, select Runtime Manager.
Click Servers in the left menu.
Click the Add Server button.
Enter a name for your server.

Notes: Server names can contain up to 60 alphanumeric characters (a-z, A-Z, 0-9), periods (.), hyphens (-), and underscores (_), but not spaces or other special characters. Runtime Manager supports Unicode characters in server names.
The server name must be unique in the environment, but it can be the same for the same organization in different environments.

Runtime Manager generates the amc_setup command. This command includes the server name you specified (server-name) and the registration token (token) required to register Mule in your environment. The registration token includes your organization ID and the current environment.

The arrow shows the amc_setup command in the Add Server window

Click Copy command to copy the amc_setup command. This button appears only if the server name you specify is valid.
In a terminal window, change to the $MULE_HOME/bin directory of the Mule instance that you're registering.
Paste the command on the command line, and include any other parameters on the amc_setup command line (see the example at the end of this subsection). For more info on parameters, refer here.
If your environment requires all outbound calls to go through a proxy, specify proxy settings in either the $MULE_HOME/conf/mule-agent.yml file or the $MULE_HOME/conf/wrapper.conf file.

When the amc_setup command completes successfully, you see the below messages:

Mule Agent configured successfully
Connecting to Access Management to extract client_id and client_secret
Credentials extracted correctly, updating wrapper conf file

After the script completes successfully, the name of your server appears on the Servers tab of Runtime Manager with a status of Created.

Note: If the server was running when you ran the amc_setup script, restart the server to reconnect with Runtime Manager.
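For illustration, running the copied command looks roughly like the following; the token and server name below are hypothetical placeholders, and you should always paste the exact command generated by Runtime Manager:

cd $MULE_HOME/bin
./amc_setup -H <registration-token> my-mule-server-01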
Setup Two Data Centers (Primary-Secondary) Multi-Node Clusters (Also Applicable for Disaster Recovery (DR))

To achieve high availability (HA), you can select from two deployment topology options: a two-data-center architecture (see below) or a three-data-center architecture. You would set up your Mule application the same way in a highly available cluster configuration in both the primary and the secondary data centers.

Use Active-Active Configuration ONLY

This configuration provides higher availability with minimal human involvement. Requests are served from both data centers. You should configure the load balancer with appropriate timeout and retry logic to automatically route requests to the second data center if a failure occurs in the first data center. The benefits of an Active-Active configuration are a reduced recovery time objective (RTO) and recovery point objective (RPO).

Two Data Centers Multi-Node Clusters in Active-Active Configuration

For the RPO requirement, data synchronization between the two active data centers must be extremely timely to allow seamless request flow.

Use an External Load Balancer

When Mule clusters are used to serve TCP requests (where TCP includes SSL/TLS, UDP, Multicast, HTTP, and HTTPS), some load balancing is needed to distribute the requests among the clustered instances. There are various software load balancers available. Many hardware load balancers can also route both TCP and HTTP or HTTPS traffic.

RTF (Runtime Fabric)

Runtime Fabric Clusters

These are clusters at the node (worker) level using Kubernetes (K8s). They differ from Mule runtime clusters (on-prem) in that they lack the following clustering features:

Distributed shared memory
Shared state
Shared VM queues

In RTF clusters, Mule runtimes are not aware of each other. They are comparable to a CloudHub environment with two or more workers.

Add More Workers or Controller Nodes to the Cluster

This management of nodes has to be carried out on the Kubernetes side, not in the MuleSoft control plane (Anypoint Platform Runtime Manager). There are two main ways to add nodes to the cluster:

The kubelet on a node self-registers to the control plane.
Manually add a Node object using kubectl.

After you create a Node object, or the kubelet on a node self-registers, the control plane checks whether the new Node object is valid.

Note: For more info, refer to the Kubernetes documentation.

Add More Pod Replicas to the Same Worker Node

This management of pod replicas can be carried out in Anypoint Platform Runtime Manager.

Steps To Follow

Log in to Anypoint Platform using your Enterprise Mule credentials, and go to Runtime Manager.
Select your application.
Click the Deployment Target tab.
Locate the Replicas drop-down and change the number of replicas to the desired number.

Runtime Manager screen where the number of pod replicas can be changed

Select the option Run in Runtime Cluster Mode.
Click Apply Changes.
Click View Status.

Status screen after pod replica creation

2. Load Balancing

Load balancers are one or more specialized servers that intercept requests intended for a group of (backend) systems, distributing traffic among them for optimum performance. If one backend system fails, these load balancers automatically redirect incoming requests to the other systems. By distributing incoming requests across multiple systems, you enable the still-operational systems to take over when one system fails, hence achieving High Availability (HA).

CloudHub (CH)

The Shared Load Balancer (SLB) is available within a region and serves all the CH customers within that region, so you have no control over it. Instead, create an Anypoint VPC first and then a Dedicated Load Balancer (DLB) within the VPC.

Create a Dedicated Load Balancer (DLB) Within an Anypoint Virtual Private Cloud (VPC)

Prerequisites:

Ensure that your profile is authorized to perform this action by adding the CloudHub Network Administrator permission to the profile of the organization where you are creating the load balancer.
Create an Anypoint Virtual Private Cloud (Anypoint VPC) in the organization where you want to create a load balancer.
Create at least one certificate and private key for your certificate.

There are three ways to create and configure a DLB for your Anypoint VPC:

Using Runtime Manager
Using the CLI
Using the CloudHub REST API

Steps To Follow (Using Runtime Manager)

From Anypoint Platform, click Runtime Manager.
Click Load Balancers > Create Load Balancer.
Enter a name for your load balancer.

Note: The CloudHub DLB name must be unique across all DLBs defined in Anypoint Platform (by all MuleSoft customers).

Select a target Anypoint VPC from the drop-down list.
Specify the amount of time the DLB waits for a response from the Mule application in the Timeout in Seconds field. The default value is 300 seconds.
Add any allow-listed classless inter-domain routing (CIDR) blocks as required. The IP addresses you specify here are the only IP addresses that can access the load balancer. The default value is 0.0.0.0/0.
Select the inbound HTTP mode for the load balancer. This property specifies the behavior of the load balancer when receiving an HTTP request. Values to use for HA:

On: Accepts the inbound request on the default SSL endpoint using the HTTP protocol.
Redirect: Redirects the request to the same URL using the HTTPS protocol.
Specify options:

Enable Static IPs specifies the use of static IPs, which persist when the DLB restarts.
Keep URL encoding specifies that the DLB passes only the %20 and %23 characters as is. If you deselect this option, the DLB decodes the encoded part of the request URI before passing it to the CloudHub worker.
Support TLS 1.0 specifies support for TLS 1.0 between the client and the DLB.
Upstream TLS 1.2 specifies forced TLS 1.2 between the DLB and the upstream CloudHub worker.
Forward Client Certificate specifies that the DLB forwards the client certificate to the CloudHub worker.

Add a certificate:

Click Add Certificate.

The arrow shows the Add Certificate option on the Create Load Balancer page

On the Create Load Balancer | Add certificate page, select Choose File to upload both the public key and private key files.
If you want to add a client certificate, click Choose File to upload the file.
If you want to add URL mapping rules, click the > icon to display the options.

The arrow shows the expand icon on the Create Load Balancer | Add certificate page

If you add more than one URL mapping rule, order the rules in the list according to the priority in which they should be applied.
Click Add New Rule, and then specify the input path, target app, output path, and protocol.
Click Save Certificate.
Click Create Load Balancer.

On-Premises

When on-prem Mule clusters are used to serve TCP requests (where TCP includes SSL/TLS, UDP, Multicast, HTTP, and HTTPS), some external load balancing is needed to distribute the requests among the clustered instances. There are various third-party software load balancers available; two of them, which MuleSoft recommends, are:

NGINX, an open-source HTTP server and reverse proxy. You can use NGINX's HttpUpstreamModule for HTTP(S) load balancing.
The Apache web server, which can also be used as an HTTP(S) load balancer.

Many hardware load balancers can also route both TCP and HTTP or HTTPS traffic. I will not go into more detail, as the choice is at your/the customer's discretion. Refer to the documentation of whichever load balancer you choose for configuration details.

RTF (Runtime Fabric)

Deploy Ingress Controllers (for Both Internal and External Calls)

Runtime Fabric supports any ingress controller that is compatible with your Kubernetes environment and supports a deployment model where a separate ingress resource is created per application deployment. In general, most off-the-shelf ingress controllers support this model. The ingress controller is not part of the product offering: customers need to choose and configure the ingress controller on self-managed RTF (BYOK). Runtime Fabric creates a separate Ingress resource for each application deployed. The default ingress controller in any cloud (AWS, Azure, or GCP) creates a separate HTTP(S) load balancer for each Ingress resource in the K8s cluster.

Prerequisite(s)

The RTF cluster is up and running.

Steps To Follow

Create the internal and external ingress controllers.

kubectl apply -f nginx-ingress.yaml
kubectl apply -f nginx-ingress-internal-v2.yaml

Validate that the ingress controllers are created with external and internal IPs.

kubectl get services --all-namespaces

Validate the IPs created.

Create the ingress templates.

kubectl apply -f ingress-template.yaml
kubectl apply -f ingress-template-internal.yaml

Note: For info on how to create templates, refer here.

Validate that the Ingress resources are created.

kubectl get ing --all-namespaces

Deploy your Mule application.

In the Ingress section of the deployed application, you should be able to add both internal and external endpoints.

Ingress section of the RM screen where we can add endpoints

External endpoint testing: execute a cURL command from your own machine. Internal endpoint testing: a good way of testing is spawning a container inside the K8s cluster, accessing it, and then executing a cURL command from there (see the sketch below).
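As an illustration, the two tests could look like the following; the hostnames are hypothetical placeholders for the endpoints you configured:

# External test, from your own machine
curl -v https://myapp.example.com/api/ping

# Internal test: spawn a throwaway pod inside the K8s cluster and curl from it
kubectl run curl-test --rm -it --image=curlimages/curl -- sh
# ...then, inside the pod:
curl -v http://myapp.internal.example.com/api/ping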
Deploy External Load Balancers

If you want your Mule applications to be externally accessible, you must add an external load balancer to your K8s cluster. Load balancers create a gateway for external connections to access your cluster, provided that the user knows the load balancer's IP address and the application's port number. When running multiple ingress controllers, you must have an external load balancer outside Runtime Fabric in front of each of the ingress controllers. The external load balancer must support TCP load balancing and must be configured with a server pool containing the IP addresses of each controller. A health check must be configured on the external load balancer, listening on port 443.

This configuration of the external load balancer provides the following:

High availability.
Protection against failure.
Automatic failover (if a replica of the internal load balancer restarts or is evicted and rescheduled on another controller).

Note: For info on which external load balancer to use (as per MuleSoft recommendations), refer to 'Load Balancing, On-Premises' above.

3. Managing Server Groups (Applicable to the Hybrid Deployment Option Only: Runtime Plane Is On-Prem and Control Plane Is Anypoint Platform)

A server group is a set of servers that act as a single deployment target for applications so that you don't have to deploy applications to each server individually. Unlike clusters, application instances in a server group run in isolation from the application instances running on the other servers in the group. Deploying applications to servers in server groups provides redundancy, so you can restore applications more seamlessly and quickly, with less downtime, hence High Availability (HA).

Create Server Groups

Prerequisites

All servers in a server group must be running the same version of the Mule runtime engine and the same version of the Runtime Manager agent.
You can create a server group with servers in the Running, Created, or Disconnected status, but you cannot have a server group with mixed statuses.
In order to add a server on which existing applications are currently running to a server group, you must first stop and delete the applications from the server.

Add Servers to Runtime Manager (RM)

Refer to 'How To Add Servers/Nodes to Runtime Manager' above.

Create Server Groups

To deploy applications to a server group, you can either add the servers to Runtime Manager first and then create the server group, or you can create the group and add servers to it later.

Steps To Follow

From Anypoint Platform, select Runtime Manager.
Select Servers in the left menu.
Click the Create Group button.
In the Create Server Group page, enter the name of the server group. Group names can contain between 3 and 40 alphanumeric characters (a-z, A-Z, 0-9) and hyphens (-). They cannot start or end with a hyphen and cannot contain spaces or other special characters.
Select the servers to include in your new server group.
Click the Create Group button.

Note: The new server group appears in the Servers list. The servers themselves no longer appear in the Servers list.
To see the list of servers in the group, click the server group name.

Add Servers to a Server Group

Prerequisites

At least one server is configured.
The server group is created.
No applications are deployed on the server that you are adding to the server group.

Steps To Follow

From Anypoint Platform, select Runtime Manager.
Select Servers in the left menu.
Click Group in the Type column to display the details pane.
Click the Add Server button.

The arrow shows the Add Server button in the details pane

A list of available servers appears. Select the servers to add to the group, and click the Add Servers button.

Note: The servers no longer appear in the Servers list. To see the list of servers in the server group, click the group name.

4. Managing Clusters (Applicable to the On-Premises Deployment Model Only)

A cluster is a set of up to eight servers that act as a single deployment target and high-availability processing unit. Unlike in server groups, application instances in a cluster are aware of each other, share common information, and synchronize statuses. If one server fails, another server takes over processing applications. A cluster can run multiple applications.

Note: Before creating a cluster, you must create the Mule runtime engine instances and add the Mule servers to Anypoint Runtime Manager.

Features That Help in Achieving HA

Clusters Management

To deploy applications to a cluster, you can either add the servers to Runtime Manager first and then create the cluster, or you can create the cluster and add servers to it later. There are two ways to create and manage clusters:

Using Runtime Manager
Manually, using a configuration file

Prerequisites/Restrictions

Do not mix cluster management tools.
All nodes in a cluster must have the same Mule runtime engine and Runtime Manager agent versions.
Manual cluster configuration is not synced to Anypoint Runtime Manager, so any change you make in the platform overrides the cluster configuration files. To avoid this scenario, use only one method for creating and managing your clusters: either manual configuration or configuration using Anypoint Runtime Manager.
If you are using a cumulative patch release, such as 4.3.0-20210322, all instances of Mule must be the same cumulative patch version.

How To Create Clusters Manually

Ensure that the node is not running; that is, the Mule runtime server is stopped.
Create a file named mule-cluster.properties inside the node's $MULE_HOME/.mule directory.
Edit the file with parameter = value pairs, one per line (see below).

Note: mule.clusterId and mule.clusterNodeId must be in the properties file.

Properties files
...
mule.cluster.nodes=192.168.10.21,192.168.10.22,192.168.10.23
mule.cluster.multicastenabled=false
mule.clusterId=<Cluster_ID>
mule.clusterNodeId=<Cluster_Node_ID>
...

Repeat this procedure for all Mule servers that you want in the cluster.
Start the Mule servers in the nodes.

How To Manage Clusters Manually

Manual management of a cluster is only possible for clusters that have been manually created.

Stop the node's Mule server.
Edit the node's mule-cluster.properties as desired, then save the file.
Restart the node's Mule server.

How To Create Clusters in Runtime Manager

To deploy applications to a cluster, you can either add the servers to Runtime Manager first and then create the cluster, or you can create the cluster and add servers to it later.

Prerequisites

Servers cannot contain any previously deployed applications.
Servers cannot belong to another cluster or server group.
Multicast servers can be in the Running or Disconnected state. Unicast servers must be in the Running state.
All servers in a cluster must be running the same Mule runtime engine version (including the monthly update) and the same Runtime Manager agent version.

Steps To Follow

From Anypoint Platform, select Runtime Manager.
Select Servers in the left menu.
Click the Create Cluster button.

The arrow shows the Create Cluster button on the Servers page

In the Create Cluster page, enter the name for the cluster.

Note: Cluster names can contain between 3 and 40 alphanumeric characters (a-z, A-Z, 0-9) and hyphens (-). They cannot start or end with a hyphen and cannot contain spaces or other special characters.

Select Unicast or Multicast.
Select the servers to include in your new cluster.
Click the Create Cluster button.

Note: The new cluster appears in the Servers list. The servers themselves no longer appear in the Servers list. To see the list of servers in the cluster, click the cluster name.

Add Servers to Runtime Manager

Refer to 'How To Add Servers/Nodes to Runtime Manager' above.

Multicast Clusters

Enable multicast for your clusters to achieve better HA. A multicast cluster comprises servers that automatically detect each other. Servers that are part of a multicast cluster must be on the same network segment.

Advantages

The server status doesn't need to be Running to configure it as a node in a cluster.
You can add nodes to the cluster dynamically without restarting the cluster.

Ports/IP Address Configuration Prerequisites

If you configure your cluster via Runtime Manager and you use the default ports, keep TCP ports 5701, 5702, and 5703 open. If you configure custom ports instead, keep the custom ports open.
Ensure communication between nodes is open through port 5701.
Keep UDP port 54327 open.
Enable the multicast IP address: 224.2.2.3. (See the sketch below for an example firewall configuration.)
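As an illustration only (this is an assumption about your environment, not MuleSoft-documented setup), on a Linux node running firewalld, opening these ports could look like:

firewall-cmd --permanent --add-port=5701-5703/tcp
firewall-cmd --permanent --add-port=54327/udp
firewall-cmd --reload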
Data Grid (In-Memory)

An in-memory data grid (IMDG) is a set of networked/clustered nodes that pool together their random-access memory (RAM) to let applications share data with other applications running in the cluster. Mule clusters have a shared memory based on Hazelcast IMDG. There are different ways in which Mule shares information between nodes, thereby achieving HA:

Some transports share information to ensure a single node is polling for resources. For example, in a cluster, there is a single FTP inbound endpoint polling for files.
VM queues are shared between the nodes, so if you put a message in a VM queue, potentially any node can grab it and continue processing the message.
The Object Store is shared between the cluster nodes.

For more info, refer here.

Data Sharing/Data Replication

Data sharing is the ability to distribute the same sets of data resources to multiple applications while maintaining data fidelity across all entities consuming the data. Data replication is a method of copying data to ensure that all information stays identical in real time between all data resources. The nodes in a cluster communicate and share information through a distributed shared memory grid (the IMDG described above). This means that the data is replicated across memory in different machines.

Data Sharing/Data Replication via IMDG

Fault Tolerance (FT)

Fault tolerance means ensuring recovery from the failure of an underlying component. By default, clustering Mule runtime engines ensures high availability (HA). If a Mule runtime engine node becomes unavailable due to failure or planned downtime, another node in the cluster can assume the workload and continue to process existing events and messages.

Note: For more info, refer to 'Single Data Center Multi-Node Cluster' above under 'Horizontal Scaling, On-Premises.'

Also, JMS can be used to achieve HA and FT by routing messages through JMS queues. In this case, each message is routed through a JMS queue whenever it moves from component to component.

Active-Active Configuration

Refer to 'Use Active-Active Configuration ONLY' above under 'Horizontal Scaling, On-Premises.'

Quorum Management

The quorum feature is only valid for components that use the Object Store. When managing a manually configured cluster, you can set a minimum quorum of machines required for the cluster to be operational. When a network is partitioned, clusters are available by default. However, by setting a minimum quorum size, you can configure your cluster to reject updates that do not pass a minimum threshold. This helps you achieve better consistency and protects your cluster in case of an unexpected loss of one of your nodes, hence HA.

Under normal circumstances, if a node were to die in the cluster, you may still have enough memory available to store your data, but the number of threads available to process requests would be reduced, as you now have fewer nodes available, and the partition threads in the cluster could quickly become overwhelmed. This could lead to:

Clients left without threads to process their requests.
The remaining members of the cluster becoming so overwhelmed with requests that they are unable to respond and are forced out of the cluster on the assumption that they are dead.

To protect the rest of the cluster in the event of member loss, you can set a minimum quorum size to stop concurrent updates to your nodes and throw a QuorumException whenever the number of active nodes in the cluster is below your configured value.

Note: To enable quorum, place the mule.cluster.quorumsize property in the cluster configuration file {MULE_HOME}/.mule/mule-cluster.properties, and define the minimum number of nodes for the cluster to remain in an operational state. Ensure you catch QuorumExceptions when configuring a quorum size for your cluster, and then decide what to do: send an email, stop a process, perform some logging, activate retry strategies, etc.

Primary Node

In the Active-Active model, there is no primary node; however, one of the nodes acts as the primary polling node. This means that sources can be configured to be used only by the primary polling node so that no other node reads messages from that source. This feature works differently depending on the source type:

Scheduler source: runs only on the primary polling node.
Any other source: defined by the primaryNodeOnly attribute. Check each connector's documentation for the default value of primaryNodeOnly for that connector.

XML
<flow name="jmsListener">
    <jms:listener config-ref="config" destination="listen-queue" primaryNodeOnly="true"/>
    <logger message="#[payload]"/>
</flow>

Object Store Persistence

You can persistently store JDBC data in a central system that is accessible to all cluster nodes when using the Mule runtime engine on-premises. The following relational database systems are supported:

MySQL 5.5+
PostgreSQL 9
Microsoft SQL Server 2014

To enable object store persistence, create a database and define its configuration values in the {MULE_HOME}/.mule/mule-cluster.properties file:

mule.cluster.jdbcstoreurl: JDBC URL for the connection to the database
mule.cluster.jdbcstoreusername: Database username
mule.cluster.jdbcstorepassword: Database user password
mule.cluster.jdbcstoredriver: JDBC driver class name
mule.cluster.jdbcstorequerystrategy: SQL dialect

Two tables are created per object store: one table stores data, and another stores partitions. An example configuration follows.
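For illustration, a PostgreSQL-backed configuration could look like the following; the URL, credentials, and dialect value are placeholder assumptions, so check the MuleSoft documentation for the exact values for your database:

Properties files
mule.cluster.jdbcstoreurl=jdbc:postgresql://db.internal:5432/mule_objectstore
mule.cluster.jdbcstoreusername=mule
mule.cluster.jdbcstorepassword=changeme
mule.cluster.jdbcstoredriver=org.postgresql.Driver
mule.cluster.jdbcstorequerystrategy=postgresql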
Recommendations

Create a dedicated database/schema that will only be used for the JDBC store.
The database user needs permission to create objects in the database (DDL): CREATE and DROP for tables; and to access and manage the objects it creates (DML): INSERT, UPDATE, DELETE, and SELECT.
Always keep in mind that the data storage needs to be hosted in a centralized DB reachable from all nodes.
Don't use more than one database per cluster.
Some relational databases have constraints regarding the name length of tables. Use the mule.cluster.jdbcstoretableNametransformerstrategy property to transform long table names into shorter values.

The persistent object store uses a database connection pool based on the ComboPooledDataSource Java class. The Mule runtime engine does not set any explicit values for the connection pool behavior; the standard configuration uses the default value for each property. For example, the default value for maxIdleTime is 0, which means that idle connections never expire and are not removed from the pool. Idle connections remain connected to the database in an idle state.

You can configure the connection pool behavior by passing your desired parameter values to the runtime, using either of the following options:

Pass multiple parameters in the command line when starting Mule:

$MULE_HOME/bin/mule start \
-M-Dc3p0.maxIdleTime=<value> \
-M-Dc3p0.maxIdleTimeExcessConnections=<value>

Replace <value> with your desired value in milliseconds.

Add multiple lines to the $MULE_HOME/conf/wrapper.conf file:

Properties files
wrapper.java.additional.<n>=-Dc3p0.maxIdleTime=<value>
wrapper.java.additional.<n>=-Dc3p0.maxIdleTimeExcessConnections=<value>

Replace <n> with the next highest sequential value from the wrapper.conf file.

Conclusion

This is an effort to collate good practices for achieving High Availability (HA) in one place. Mule developers keen on building highly available applications can refer to the exhaustive list above.

Thank you for reading! I hope you find this article helpful. Please don't forget to like and share, and feel free to share your thoughts in the comments section. If you're interested, please go through my previous blogs on best practices:

To build high-performant Mule applications, here.
To build highly reliable Mule applications, here.
In today's digital age, the reliability and availability of software systems are critical to the success of businesses. Downtime or performance issues can have serious consequences, including financial loss and reputational damage. Therefore, it is essential for organizations to ensure that their systems are resilient and can withstand unexpected failures or disruptions. One approach to achieving this is chaos engineering.

What Is Chaos Engineering?

Chaos engineering is a practice that involves intentionally introducing failures or disruptions into a system to test its resilience and identify weaknesses. By simulating real-world failure scenarios, chaos engineering helps organizations proactively identify and address potential issues before they occur in production. This approach can help organizations build more resilient systems, reduce downtime, and improve overall performance.

Steps Involved in Chaos Engineering

The chaos engineering process involves several steps. First, teams must identify the critical components of the system and the potential failure modes that could impact those components. Next, they must design and execute experiments that simulate these failure modes and measure the impact on the system, as the sketch below illustrates. Finally, teams must analyze the results of the experiments and use the insights gained to improve the system's resilience.
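To make the experiment step concrete, here is a minimal, framework-free Java sketch of fault injection around a dependency call; the class name and parameters are hypothetical illustrations, not a specific chaos tool's API:

Java
import java.util.Random;
import java.util.function.Supplier;

// Wraps a dependency call and randomly injects latency or failure,
// so you can observe how the caller copes (retries, timeouts, fallbacks).
public class ChaosWrapper {

    private static final Random RANDOM = new Random();
    private final double failureRate;  // fraction of calls that fail, e.g., 0.1
    private final long maxLatencyMs;   // upper bound for the injected delay

    public ChaosWrapper(double failureRate, long maxLatencyMs) {
        this.failureRate = failureRate;
        this.maxLatencyMs = maxLatencyMs;
    }

    public <T> T call(Supplier<T> target) throws InterruptedException {
        // Simulate a slow dependency with random latency
        Thread.sleep((long) (RANDOM.nextDouble() * maxLatencyMs));
        // Simulate an outright failure for a fraction of the calls
        if (RANDOM.nextDouble() < failureRate) {
            throw new IllegalStateException("Injected failure (chaos experiment)");
        }
        return target.get();
    }
}

Running such a wrapper against a critical call path in a staging environment, while watching dashboards and alerts, is the essence of the experiment-and-measure loop described above.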
Benefits of Chaos Engineering

One of the key benefits of chaos engineering is that it helps organizations identify and address potential issues before they occur in production. By intentionally introducing failures into a system, teams can identify weaknesses and areas for improvement. For example, if an experiment reveals that the system is not resilient to a particular type of failure, the team can take steps to address this weakness and improve overall system resilience.

Another benefit of chaos engineering is that it can help organizations reduce downtime and improve system availability. By identifying and addressing potential issues proactively, teams can prevent unexpected failures and disruptions that could impact system availability. This helps organizations maintain business continuity and avoid financial loss or reputational damage.

How Chaos Engineering Helps Organizations

Chaos engineering can also help organizations improve overall system performance. By testing the system's resilience under different conditions, teams can identify bottlenecks or performance issues, and then optimize their systems accordingly.

To implement chaos engineering, organizations must adopt a culture of experimentation and embrace failure as a learning opportunity. This requires a shift in mindset from one that views failure as a negative outcome to one that recognizes failure as a natural part of the learning process. By embracing failure and learning from it, teams can continuously improve their systems and build more resilient and reliable software.

Conclusion

Chaos engineering is a powerful practice that can help organizations build more resilient systems, reduce downtime, and improve overall performance. By intentionally introducing failures and disruptions into a system, teams can identify weaknesses and areas for improvement and proactively address potential issues before they occur in production. With a commitment to chaos engineering and a culture of experimentation, organizations can build more resilient and reliable software systems that withstand unexpected failures and disruptions.
Network virtualization has been one of the most significant advancements in the field of networking in recent years. It is a technique that allows the creation of multiple virtual networks, each with its own set of policies, services, and security mechanisms, on top of a single physical network infrastructure. Network virtualization helps to optimize network resources, reduce operational costs, and increase flexibility and agility in network deployment and management. In this article, we will delve deeper into the concept of network virtualization, its benefits, and the various technologies and protocols used in its implementation.

What Is Network Virtualization?

Network virtualization is the process of decoupling the network's logical functions from its physical infrastructure to create multiple virtual networks on a shared physical network. The idea is to allow multiple tenants or applications to share the same physical infrastructure while maintaining their own isolated logical networks with dedicated resources and policies. This enables the creation of a highly efficient and flexible network that can meet the needs of different users and applications.

Virtualization has been widely used in the IT industry for many years, primarily in the server and storage domains. Network virtualization extends the same concept to the networking domain, allowing multiple logical networks to be created on top of a single physical network infrastructure. It provides a layer of abstraction that separates the logical network from the physical infrastructure, enabling the logical network to be configured and managed independently of the physical network.

Benefits of Network Virtualization

Network virtualization has several benefits, including:

Resource Optimization

Network virtualization enables the efficient use of network resources by allowing multiple logical networks to share the same physical infrastructure. This reduces the need for dedicated physical networks for each application or user, leading to lower costs and better utilization of resources.

Improved Agility and Flexibility

Network virtualization makes it easier to create, manage, and modify logical networks as the needs of users and applications change. This enables network administrators to respond quickly to changing business requirements and deploy new applications and services more rapidly.

Better Security

Network virtualization provides better security by creating isolated logical networks that can be secured independently. This reduces the risk of security breaches spreading across the entire network and enhances overall network security.

Simplified Network Management

Network virtualization simplifies network management by enabling central management of multiple logical networks. This reduces the need for complex manual configuration and ensures consistency across the network.

Improved Network Scalability and Flexibility

Network virtualization allows organizations to easily scale their network resources up or down to meet changing demands. Virtual networks can be created and configured quickly and easily, without the need for additional physical network devices.

Reduced Network Complexity

Network virtualization simplifies network design and management, reducing the complexity of the underlying physical network infrastructure. This makes it easier to manage and troubleshoot network issues.
Cost Savings

Network virtualization reduces the need for additional physical network devices, leading to lower capital and operational costs.

Approaches to Network Virtualization

There are different approaches to implementing network virtualization, including:

Overlay Networks

Overlay networks create virtual networks on top of the existing physical network infrastructure, using tunneling protocols such as VXLAN or GRE to encapsulate virtual network traffic within the physical network. Overlay networks provide a simple and scalable approach to network virtualization, enabling the creation of virtual networks without the need for additional physical network devices. However, they can introduce additional network latency and may require additional network bandwidth.

VLANs

VLANs provide a simple approach to network segmentation, allowing different virtual networks to be created on a single physical network infrastructure. VLANs use tagging to identify virtual network traffic, enabling network administrators to isolate and secure virtual network traffic. However, VLANs have limitations in terms of scalability and flexibility, as they are limited to a maximum of 4096 VLANs per network. A small configuration sketch of both approaches follows.
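For a concrete feel of the difference between the two approaches, here is a minimal Linux sketch using iproute2: a VLAN sub-interface for tagged segmentation and a VXLAN interface for an overlay tunnel. The interface names, IDs, and addresses are hypothetical examples:

# VLAN: create a tagged sub-interface (VLAN ID 10) on eth0
ip link add link eth0 name eth0.10 type vlan id 10
ip addr add 192.168.10.1/24 dev eth0.10
ip link set eth0.10 up

# VXLAN: create an overlay interface (VNI 100) tunneling to a remote host
ip link add vxlan100 type vxlan id 100 dev eth0 remote 10.0.0.2 dstport 4789
ip addr add 172.16.0.1/24 dev vxlan100
ip link set vxlan100 up

Note that the VXLAN network identifier (VNI) is 24 bits wide, which is what lifts the 4096-segment limit of VLANs mentioned above.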
Software-Defined Networking (SDN)

SDN separates the network control plane from the data plane, enabling network administrators to manage network resources centrally using software-defined controllers. SDN provides a flexible and scalable approach to network virtualization, enabling the creation of virtual networks that can be configured and managed centrally. SDN also enables automated network configuration and management, reducing the complexity of network management and improving network scalability and flexibility.

Network Functions Virtualization (NFV)

NFV enables the creation of network services as virtualized network functions (VNFs) running on generic hardware rather than proprietary network devices. NFV provides a flexible and scalable approach to network services, enabling organizations to deploy and scale functions such as firewalls, load balancers, and routers on demand.

Applications of Network Virtualization

Network virtualization has several applications in different areas, including:

Data Centers

Network virtualization is widely used in data centers to create virtual networks for different applications, departments, and tenants. This enables the efficient utilization of network resources, as well as improved security and isolation between different applications and tenants.

Cloud Computing

Network virtualization is an essential component of cloud computing, as it enables the creation of virtual networks that connect different cloud services, applications, and users. This allows for the efficient sharing of resources, as well as improved security and isolation between different cloud tenants.

Internet Service Providers

Network virtualization is also used by internet service providers to create virtual networks for different customers, departments, and applications. This enables the efficient utilization of network resources, as well as improved security and isolation between different customers and applications.

Telecommunications

Network virtualization is also used in the telecommunications industry to create virtual networks for different services and applications, such as voice, data, and video. This enables the efficient sharing of network resources, as well as improved security and isolation between different services and applications.

Challenges of Network Virtualization

While network virtualization offers several benefits, it also presents some challenges, including:

Complexity

Network virtualization can be complex to set up and manage, requiring specialized skills and knowledge. This can increase the cost and complexity of network operations.

Performance

Network virtualization can impact network performance, especially in terms of latency and throughput. This can affect the user experience and application performance.

Compatibility

Network virtualization may not be compatible with all network hardware and software, requiring specialized hardware and software that support it.

Conclusion

Network virtualization is a revolutionary technology that has transformed the way we manage and utilize network resources. By enabling the creation of multiple virtual networks on top of a single shared physical infrastructure, it increases efficiency, flexibility, and scalability.
In today's fast-paced digital world, application performance has become critical in delivering a seamless user experience. Users expect applications to be lightning-fast and responsive, no matter the complexity of the task at hand. To meet these expectations, developers constantly look for ways to improve their applications' performance. One solution that has gained popularity in recent years is the integration of MicroStream and Redis. By combining these two cutting-edge technologies, developers can create ultrafast applications that deliver better results. In this post, we will explore the benefits of this integration and how developers can get started with this powerful combination.

MicroStream is a high-performance, in-memory persistence engine designed to improve application performance. MicroStream can store data in memory without needing a mapper or conversion process. This means that developers can work with objects directly, without worrying about the mapping process, saving around 90% of the computing power that would otherwise be consumed by mapping.

One of the critical advantages of MicroStream is its speed. By storing data in memory, MicroStream allows faster read and write operations, resulting in improved application performance. MicroStream's data structure is optimized for in-memory storage, enhancing its speed and efficiency. This makes it an ideal solution for applications that require fast response times and high throughput.

Another advantage of MicroStream is its simplicity. With MicroStream, developers can work with objects directly without dealing with the complexities of SQL databases or other traditional persistence solutions. This makes development faster and more efficient, allowing developers to focus on creating great applications instead of struggling with complex data management.

MicroStream's speed, simplicity, and efficiency make it an ideal solution for modern application development. By eliminating the need for a mapper or conversion process, MicroStream saves valuable computing power and resources, resulting in significant cost savings for developers. And with its optimized data structure and in-memory storage capabilities, MicroStream delivers fast and reliable performance, making it a powerful tool for building high-performance applications.

Enough of the theory: let's move to the next section, where we can finally see both technologies working together in an application.

Database Integration

In this article, we will explore the integration of MicroStream and Redis by creating a simple project using Jakarta EE. With the new Jakarta persistence specifications for data and NoSQL, it is now possible to combine the strengths of MicroStream and Redis in a single project. Our project will demonstrate how to use MicroStream as the persistence engine for our data and Redis as a cache for frequently accessed data. Combining these two technologies can create an ultrafast and scalable application that delivers better results.

We will walk through setting up our project, configuring MicroStream and Redis, and integrating them with Jakarta EE. We will also provide tips and best practices for working with these technologies and demonstrate how they can be used to create powerful and efficient applications. Overall, this project serves as a practical example of using MicroStream and Redis together, combined with Jakarta EE, to create high-performance applications.
Whether you are a seasoned developer or just starting, this project will provide valuable insights and knowledge for working with these cutting-edge technologies. The project is a Maven project where the first step is to add the dependencies (besides CDI) for the MicroStream Jakarta Data integration and the Redis connector: XML

<dependency>
    <groupId>expert.os.integration</groupId>
    <artifactId>microstream-jakarta-data</artifactId>
    <version>${microstream.data.version}</version>
</dependency>
<dependency>
    <groupId>one.microstream</groupId>
    <artifactId>microstream-afs-redis</artifactId>
    <version>${microstream.version}</version>
</dependency>

The next step is creating both the entity and the repository; in our scenario, we'll create a Book entity with Library as a repository collection. Java

@Entity
public class Book {

    @Id
    private String isbn;

    @Column("title")
    private String title;

    @Column("year")
    private int year;

    // Constructor and getter used by the sample below
    public Book(String isbn, String title, int year) {
        this.isbn = isbn;
        this.title = title;
        this.year = year;
    }

    public String getTitle() {
        return title;
    }
}

@Repository
public interface Library extends CrudRepository<Book, String> {

    List<Book> findByTitle(String title);
}

The final step before running is to create the Redis configuration, where we'll override the default StorageManager to use the Redis integration, highlighting that MicroStream can integrate with several storage targets such as MongoDB, Hazelcast, SQL databases, etc. Java

@Alternative
@Priority(Interceptor.Priority.APPLICATION)
@ApplicationScoped
class RedisSupplier implements Supplier<StorageManager> {

    private static final String REDIS_PARAMS = "microstream.redis";

    @Override
    @Produces
    @ApplicationScoped
    public StorageManager get() {
        // Read the Redis connection string (e.g., "redis://localhost:6379") from MicroProfile Config
        Config config = ConfigProvider.getConfig();
        String redis = config.getValue(REDIS_PARAMS, String.class);
        BlobStoreFileSystem fileSystem = BlobStoreFileSystem.New(
                RedisConnector.Caching(redis));
        return EmbeddedStorage.start(fileSystem.ensureDirectoryPath("microstream_storage"));
    }

    public void close(@Disposes StorageManager manager) {
        manager.close();
    }
}

Done, we're ready to go! For this sample, we'll use plain Java SE; however, you can do the same with MicroProfile and Jakarta EE microservices. Java

try (SeContainer container = SeContainerInitializer.newInstance().initialize()) {
    Book book = new Book("123", "Effective Java", 2002);
    Book book2 = new Book("1234", "Effective Java", 2019);
    Book book3 = new Book("1235", "Effective Java", 2022);
    Library library = container.select(Library.class).get();
    library.saveAll(List.of(book, book2, book3));
    List<Book> books = library.findByTitle(book.getTitle());
    System.out.println("The books: " + books);
    System.out.println("The size: " + books.size());
}

Conclusion In conclusion, MicroStream integration with multiple databases is a promising approach to designing high-performance data management systems. This project explores various integration techniques to connect MicroStream with databases such as MySQL, MongoDB, Oracle, and PostgreSQL. The system will be designed and implemented using a combination of programming languages such as Java, Python, and JavaScript. The project will also provide documentation, training materials, and benchmark tests to ensure the system meets the specified requirements and delivers user value. By leveraging the power of MicroStream technology and integrating it with different databases, organizations can build robust, scalable, and efficient data management systems that can handle large amounts of data and complex data structures. This approach can give organizations a competitive edge by enabling them to process data faster, make better-informed decisions, and enhance operational efficiency.
Overall, MicroStream integration with multiple databases is a promising approach that can benefit organizations in various industries. With the right design, implementation, and testing, organizations can leverage this approach to build data management systems that meet their unique business needs and drive success. Reference: Source code
Caches are very useful software components that all engineers must know. It is a transversal component that applies to all the tech areas and architecture layers, such as operating systems, data platforms, backend, frontend, and other components. In this article, we are going to describe what a cache is and explain specific use cases focusing on the frontend and client side. What Is a Cache? A cache can be defined in a basic way as an intermediate memory between the data consumer and the data producer that stores and provides data that will be accessed many times by the same or different consumers. It is a layer that is transparent to the data consumer, apart from the improved performance. Usually, the reusability of the data provided by the data producer is the key to taking advantage of the benefits of a cache. Performance is the other reason to use a cache system, such as an in-memory database, to provide a high-performance solution with low latency, high throughput, and concurrency. For example, how many people query the weather on a daily basis, and how many times do they repeat the same query? Let's suppose that there are 1,000 people in New York consulting the weather and 50% repeat the same query twice per day. In this scenario, if we can store the result of the first query as close as possible to the user's device, we achieve two benefits: we improve the user experience because the data is provided faster, and we reduce the number of queries to the data producer/server side. In this example, 500 of the 1,500 daily queries would be served from the cache, cutting server-side traffic by a third. The output is a better user experience and a solution that will support more concurrent users on the platform. At a high level, there are two caching strategies that we can apply in a complementary way: Client/Consumer Side: The cached data is stored on the consumer or user side, usually in the browser's memory when we are talking about web solutions (also called a private cache). Server/Producer Side: The cached data is stored in the components of the data producer architecture. Caches, like any other solution, have a series of advantages that we are going to summarize: Application performance: They provide faster response times because they can serve data more quickly. Reduced load on the server side: When we apply a cache in front of a system and reuse a piece of data, we avoid queries/requests to the following layer. Scalability and cost improvement: As data caching gets closer to the consumer, we increase the scalability and performance of the solution at a lower cost. Components closer to the client side are more scalable and cheaper for three main reasons: These components are focused on performance and availability but offer weaker consistency. They hold only part of the information: the data used most by the users. In the case of the browser's local cache, there is no cost for the data producer. The big challenges of caching are data consistency and data freshness, which means how the data is kept synchronized and up to date across the organization. Depending on the use case, we will have more or fewer restrictions, because caching images is very different from caching inventory stock or sales behavior. Client-Side Caches Speaking about the client-side cache, we can have different types of cache that we are going to analyze a little bit in this article: HTTP Caching: This caching type is an intermediate cache system, as it depends partially on the server. Cache API: This is a browser API that allows us to cache requests in the browser.
Custom Local Cache: The front-end app controls the cache storage, expiration, invalidation, and update. HTTP Caching It caches the HTTP requests for any resource (CSS, HTML, images, video, etc.) in the browser, and the browser manages everything related to storage, expiration, validation, fetching, etc. From the application's point of view, it is almost transparent: the app makes a request in a regular way, and the browser does all the "magic." Caching is controlled by using HTTP headers: on the server side, cache-specific headers are added to the HTTP response, for example: "Expires: Tue, 30 Jul 2023 05:30:22 GMT." The browser then knows this resource can be cached, and the next time the client (application) requests the same resource, if the request time is before the expiration date, the request will not be made, and the browser will return the local copy of the resource. HTTP headers also allow you to set the way responses are distinguished, as the same URL can generate different responses (and their cache should be handled in a different way). For example, in an API endpoint that returns some data (i.e., http://example.com/my-data), we could use the request header Accept to specify whether we want the response in JSON or CSV, etc. Therefore, the cache should be stored along with the request header(s) the response depends on. For that, the server should set a response header such as Vary: Accept to let the browser know the cache depends on that value. There are a lot of different headers to control the cache flow and behavior, but it is not the goal of this article to go deep into them. They will probably be addressed in another article. As we mentioned before, this caching type needs the server to set the resources' expiration, validation, etc. So this is not a pure frontend caching method or type, but it's one of the simplest ways to cache the resources the front-end application uses, and it is complementary to the other methods we will mention down below. Related to this cache type, as it is an intermediate cache, we can even delegate it to a "piece" between the client and the server; for example, a CDN, a reverse proxy (for example, Varnish), etc. Cache API It is quite similar to the HTTP caching method, but in this case, we control which requests are stored in or extracted from the cache. We have to manage the cache expiration ourselves (and it's not easy, because these caches were designed to live "forever"). Even though these APIs are available in windowed contexts, they are very much oriented toward usage in a worker context. This cache is well suited to offline applications. On the first request, we can fetch and cache all the resources needed (images, CSS, JS, etc.), allowing the application to work offline. It is very useful in mobile applications, for example, with the use of maps for our GPS systems in addition to weather data. This allows us to have all the information for our hiking route even if we have no connection to the server. One example of how it works in a windowed context:

const url = 'https://catfact.ninja/breeds'

caches.open('v1').then((cache) => {
  // Look for a cached response for this URL
  cache.match(url).then((res) => {
    if (res) {
      console.log('it is in cache')
      res.json().then((data) => console.log(data))
    } else {
      console.log('it is NOT in cache')
      // Fetch from the network and store a copy under the same key
      fetch(url).then((res) => {
        cache.put(url, res.clone())
      })
    }
  })
})

Custom Local Cache In some cases, we will need more control over the cached data and its invalidation (not just expiration). Cache invalidation is more than just checking the max-age of a cache entry. Imagine the weather app we mentioned above.
This app allows the users to update the weather to reflect the real weather in a place. The app needs to do a request per city and transform the temperature values from ºF to ºC (this is a simple example; calculations can be more expensive in other use cases). To avoid making requests to the server (even if they are cached), we can do all the requests the first time, put all the data together in a data structure convenient for us, and store it, for example, in the browser's IndexedDB, LocalStorage, SessionStorage, or even in memory (not recommended). The next time we want to show the data, we can get it from the cache: not just the resource data but even the computation we did, saving network and computation time. We can control the expiration of the caches by storing the issue time next to the cached data (see the sketch at the end of this article), and we can also control the cache invalidation. Imagine now that the user updates the weather for a city in their browser. We can just invalidate the cache and do the requests and calculations next time, or go further and update our local cache with the new data. Or, another user can change the value, and the server will send an event to notify all clients of the change. For example, using WebSockets, our front-end application can listen for these events and invalidate the cache or just update it. This kind of cache requires work on our side to check the caches, handle events that can invalidate or update them, etc., but it fits very well in a hexagonal architecture where the data is consumed from the API using a port adapter (repository) that can listen for domain events to react to the changes and invalidate or update some caches. This is not a generic cache solution. We need to think about whether it fits our use case, as it requires work on the front-end application side to invalidate the caches or to emit and handle data change events. In most cases, HTTP caching is enough. Conclusion Having a cache solution and a good caching strategy should be a must in any software architecture; without one, our solution will be incomplete and probably not optimized. Caches are our best friends, mostly in high-performance scenarios. It may seem that the technical cache invalidation process is the challenge, but the biggest challenge is to understand the business scenarios and use cases to identify the requirements in terms of data freshness and consistency that allow us to design and choose the best strategy. We will talk about other cache approaches for databases, backend, and in-memory databases in the next articles.
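To make the issue-time pattern mentioned above concrete, here is a minimal, illustrative sketch of such a cache. It is written in Java for consistency with the rest of the report's code examples (in a browser, the store would be IndexedDB or LocalStorage rather than a map); the class and its names are hypothetical, not a library API.

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A minimal issue-time-based cache: each entry remembers when it was stored,
// and reads past the time-to-live are treated as misses (and evicted lazily).
public class TtlCache<K, V> {

    private record Entry<T>(T value, Instant issuedAt) {}

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final Duration ttl;

    public TtlCache(Duration ttl) {
        this.ttl = ttl;
    }

    public void put(K key, V value) {
        store.put(key, new Entry<>(value, Instant.now()));
    }

    public V get(K key) {
        Entry<V> e = store.get(key);
        if (e == null) {
            return null; // miss: the caller fetches and recomputes, then calls put()
        }
        if (Instant.now().isAfter(e.issuedAt().plus(ttl))) {
            store.remove(key); // expired: invalidate lazily on read
            return null;
        }
        return e.value();
    }

    // Explicit invalidation, e.g., triggered by a WebSocket data-change event
    public void invalidate(K key) {
        store.remove(key);
    }
}

The same shape works for the weather example: the key is the city, the value is the already-converted temperature data, and a data-change event from the server simply calls invalidate(city).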
For reference: Check out my previous article where I discuss connection pool high availability, "Connection Pool High Availability With CockroachDB and PgCat." Motivation The load balancer is a core piece of architecture for CockroachDB. Given its importance, I'd like to discuss the methods to overcome single-point-of-failure (SPOF) scenarios. High-Level Steps Start CockroachDB and HAProxy in Docker Run a workload Demonstrate fault tolerance Conclusion Step-By-Step Instructions Start CockroachDB and HAProxy in Docker I have a Docker Compose environment with all of the necessary services here. Primarily, we are adding a second instance of HAProxy and overriding the ports so they do not overlap with the existing load balancer in the base Docker Compose file. I am in the middle of refactoring my repo to remove redundancy and decided to split up my Compose files into a base docker-compose.yml, with any additional services in their own YAML files.

lb2:
  container_name: lb2
  hostname: lb2
  build: haproxy
  ports:
    - "26001:26000"
    - "8082:8080"
    - "8083:8081"
  depends_on:
    - roach-0
    - roach-1
    - roach-2

To follow along, you must start the Compose environment with the command:

docker compose -f docker-compose.yml -f docker-compose-lb-high-availability.yml up -d --build

You will see the following list of services:

✔ Network cockroach-docker_default Created 0.0s
✔ Container client2 Started 0.4s
✔ Container roach-1 Started 0.7s
✔ Container roach-0 Started 0.6s
✔ Container roach-2 Started 0.5s
✔ Container client Started 0.6s
✔ Container init Started 0.9s
✔ Container lb2 Started 1.1s
✔ Container lb Started

The diagram below depicts the entire cluster topology. Run a Workload At this point, we can connect to one of the clients and initialize the workload. I am using tpcc, as it's a good workload to demonstrate write and read traffic.

cockroach workload fixtures import tpcc --warehouses=10 'postgresql://root@lb:26000/tpcc?sslmode=disable'

Then we can start the workload from both client containers. Load Balancer 1:

cockroach workload run tpcc --duration=120m --concurrency=3 --max-rate=1000 --tolerate-errors --warehouses=10 --conns 30 --ramp=1m --workers=100 'postgresql://root@lb:26000/tpcc?sslmode=disable'

Load Balancer 2:

cockroach workload run tpcc --duration=120m --concurrency=3 --max-rate=1000 --tolerate-errors --warehouses=10 --conns 30 --ramp=1m --workers=100 'postgresql://root@lb2:26000/tpcc?sslmode=disable'

You will see output similar to this.

488.0s 0 1.0 2.1 44.0 44.0 44.0 44.0 newOrder
488.0s 0 0.0 0.2 0.0 0.0 0.0 0.0 orderStatus
488.0s 0 2.0 2.1 11.0 16.8 16.8 16.8 payment
488.0s 0 0.0 0.2 0.0 0.0 0.0 0.0 stockLevel
489.0s 0 0.0 0.2 0.0 0.0 0.0 0.0 delivery
489.0s 0 2.0 2.1 15.2 17.8 17.8 17.8 newOrder
489.0s 0 1.0 0.2 5.8 5.8 5.8 5.8 orderStatus

The logs for each instance of HAProxy will show something like this:

192.168.160.1:60584 [27/Apr/2023:14:51:39.927] stats stats/<STATS> 0/0/0 28724 LR 2/2/0/0/0 0/0
192.168.160.1:60584 [27/Apr/2023:14:51:39.927] stats stats/<STATS> 0/0/816 28846 LR 2/2/0/0/0 0/0
192.168.160.1:60584 [27/Apr/2023:14:51:40.744] stats stats/<STATS> 0/0/553 28900 LR 2/2/0/0/0 0/0
192.168.160.1:60584 [27/Apr/2023:14:51:41.297] stats stats/<STATS> 0/0/1545 28898 LR 2/2/0/0/0 0/0
192.168.160.1:60582 [27/Apr/2023:14:51:39.927] stats stats/<NOSRV> -1/-1/61858 0 CR 2/2/0/0/0 0/0

HAProxy exposes a web UI on port 8081. Since we have two instances of HAProxy, I exposed the second instance at port 8083. Demonstrate Fault Tolerance We can now start terminating the HAProxy instances to demonstrate failure tolerance.
Let's start with instance 1.

docker kill lb
lb

The workload will start producing error messages.

7 17:41:18.758669 357 workload/pgx_helpers.go:79 [-] 60 + RETURNING d_tax, d_next_o_id]
W230427 17:41:18.758737 357 workload/pgx_helpers.go:123 [-] 61 error preparing statement. name=new-order-1 sql=
W230427 17:41:18.758737 357 workload/pgx_helpers.go:123 [-] 61 + UPDATE district
W230427 17:41:18.758737 357 workload/pgx_helpers.go:123 [-] 61 + SET d_next_o_id = d_next_o_id + 1
W230427 17:41:18.758737 357 workload/pgx_helpers.go:123 [-] 61 + WHERE d_w_id = $1 AND d_id = $2
W230427 17:41:18.758737 357 workload/pgx_helpers.go:123 [-] 61 + RETURNING d_tax, d_next_o_id unexpected EOF
142.0s 3 0.0 0.2 0.0 0.0 0.0 0.0 delivery
142.0s 3 0.0 2.2 0.0 0.0 0.0 0.0 newOrder
142.0s 3 0.0 0.2 0.0 0.0 0.0 0.0 orderStatus
142.0s 3 0.0 2.2 0.0 0.0 0.0 0.0 payment

Our workload is still running using the HAProxy 2 connection. Let's bring instance 1 back up:

docker start lb

Notice the client reconnects and continues with the workload.

335.0s 1780 0.0 0.1 0.0 0.0 0.0 0.0 stockLevel
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
336.0s 1780 0.0 0.1 0.0 0.0 0.0 0.0 delivery
336.0s 1780 7.0 1.1 19.9 27.3 27.3 27.3 newOrder
336.0s 1780 0.0 0.1 0.0 0.0 0.0 0.0 orderStatus
336.0s 1780 2.0 1.0 10.5 11.0 11.0 11.0 payment
336.0s 1780 0.0 0.1 0.0 0.0 0.0 0.0 stockLevel
337.0s 1780 0.0 0.1 0.0 0.0 0.0 0.0 delivery
337.0s 1780 7.0 1.1 21.0 32.5 32.5 32.5 ne

The number of executed statements goes up once the second client successfully reconnects. We can now do the same with the second instance. Similarly, the workload reports errors that it can't find the lb2 host.

0.0 0.2 0.0 0.0 0.0 0.0 stockLevel
I230427 17:48:28.239032 403 workload/pgx_helpers.go:79 [-] 188 pgx logger [error]: connect failed logParams=map[err:lookup lb2 on 127.0.0.11:53: no such host]
I230427 17:48:28.267355 357 workload/pgx_helpers.go:79 [-] 189 pgx logger [error]: connect failed logParams=map[err:lookup lb2 on 127.0.0.11:53: no such host]

And we can observe the dip in the statement count. We can bring it back up:

docker start lb2

One thing we can improve on is starting the workload with both connection strings. This will allow each client to fall back to the other pgurl even when one of the HAProxy instances is down. What we have to do is stop both clients and restart them with both connection strings.

cockroach workload run tpcc --duration=120m --concurrency=3 --max-rate=1000 --tolerate-errors --warehouses=10 --conns 30 --ramp=1m --workers=100 'postgresql://root@lb:26000/tpcc?sslmode=disable' 'postgresql://root@lb2:26000/tpcc?sslmode=disable'

I am going to do that one client at a time so that the workload does not exit completely. At no point in this experiment have we lost the ability to read/write to and from the cluster. Let's shut down one of the HAProxy instances again and see the impact.

docker kill lb
lb

I'm now seeing errors across both clients, but both clients are still executing.
.817268 1 workload/cli/run.go:548 [-] 85 error in stockLevel: lookup lb on 127.0.0.11:53: no such host
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
156.0s 49 0.0 0.2 0.0 0.0 0.0 0.0 delivery
156.0s 49 1.0 2.1 31.5 31.5 31.5 31.5 newOrder
156.0s 49 0.0 0.2 0.0 0.0 0.0 0.0 orderStatus
156.0s 49 1.0 2.0 12.1 12.1 12.1 12.1 payment
156.0s 49 0.0 0.2 0.0 0.0 0.0 0.0 stockLevel
I230427 17:55:58.558209 354 workload/pgx_helpers.go:79 [-] 86 pgx logger [error]: connect failed logParams=map[err:lookup lb on 127.0.0.11:53: no such host]
I230427 17:55:58.698731 346 workload/pgx_helpers.go:79 [-] 87 pgx logger [error]: connect failed logParams=map[err:lookup lb on 127.0.0.11:53: no such host]
I230427 17:55:58.723643 386 workload/pgx_helpers.go:79 [-] 88 pgx logger [error]: connect failed logParams=map[err:lookup lb on 127.0.0.11:53: no such host]
I230427 17:55:58.726639 370 workload/pgx_helpers.go:79 [-] 89 pgx logger [error]: connect failed logParams=map[err:lookup lb on 127.0.0.11:53: no such host]
I230427 17:55:58.789717 364 workload/pgx_helpers.go:79 [-] 90 pgx logger [error]: connect failed logParams=map[err:lookup lb on 127.0.0.11:53: no such host]
I230427 17:55:58.841283 418 workload/pgx_helpers.go:79 [-] 91 pgx logger [error]: connect failed logParams=map[err:lookup lb on 127.0.0.11:53: no such host]

We can bring it back up and watch the workload recover. Conclusion Throughout the experiment, we've not lost the ability to read and write to the database. There were dips in traffic, but that's expected. The lesson here is to provide a highly available configuration in which clients can fall back across multiple connections; the sketch below shows the same idea for an ordinary application client.
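For completeness, here is a minimal, hypothetical Java sketch of that idea for application code, assuming the PostgreSQL JDBC driver (CockroachDB speaks the PostgreSQL wire protocol) and the two HAProxy hostnames used above. The driver's multi-host URL syntax tries each host in turn until one accepts the connection; this is an illustration, not part of the cockroach workload tooling.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FailoverClient {

    public static void main(String[] args) throws Exception {
        // Both HAProxy instances in one URL: the driver tries lb first
        // and falls back to lb2 if the first connection attempt fails.
        String url = "jdbc:postgresql://lb:26000,lb2:26000/tpcc?sslmode=disable&user=root";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT count(*) FROM warehouse")) {
            if (rs.next()) {
                System.out.println("warehouses: " + rs.getLong(1));
            }
        }
    }
}

Note that the fallback applies when a connection is established or re-established; connections already open against a killed load balancer will still fail and must be retried, which is what the --tolerate-errors flag absorbs in the workload runs above.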
In this series on simulating and troubleshooting performance problems in Scala, let's discuss how to make threads go into a blocked state. A thread will enter a blocked state when it cannot acquire a lock on an object because another thread already holds the lock on the same object and doesn't release it. Scala Blocked Thread Program Here is a sample program, which would make threads go into a blocked state.

package com.yc

class BlockedApp {
}

object BlockedApp {

  def main(args: Array[String]): Unit = {
    for (counter <- 1 to 10) {
      new AppThread().start()
    }
  }

  def action(): Unit = {
    this.synchronized {
      while (true)
        Thread.sleep(600000)
    }
  }

  class AppThread extends Thread {
    override def run(): Unit = BlockedApp.action()
  }
}

The sample program contains the BlockedApp object. In its main() method, 10 new threads are created and started. In the AppThread class, there is a run() method that invokes BlockedApp.action(). In this BlockedApp.action() method, the thread is put to continuous sleep (i.e., the thread repeatedly sleeps for 10 minutes at a time). But if you notice, the body of the action() method is wrapped in a synchronized block, which can be executed by only one thread at a time. If any other thread tries to execute the action() method while the previous thread is still working on it, then the new thread will be put in the blocked state. In this case, 10 threads are launched to execute the action() method. However, only one thread will acquire the lock and execute this method, and the remaining 9 threads will be put in a blocked state. NOTE: If threads are in a BLOCKED state for a prolonged period, then the application may become unresponsive. How To Diagnose Blocked Threads You can diagnose blocked threads either through a manual or an automated approach. Manual Approach In the manual approach, you need to capture thread dumps as the first step. A thread dump shows all the threads that are in memory and their code execution paths. You can capture a thread dump using one of the 8 options mentioned here. An important criterion, however, is that you need to capture the thread dump right when the problem is happening (which might be tricky to do). Once the thread dump is captured, you need to manually transfer the thread dump from your production servers to your local machine and analyze it using thread dump analysis tools like fastThread or samurai. Automated Approach On the other hand, you can also use the yCrash open-source script, which captures 360-degree data (GC log, 3 snapshots of thread dump, heap dump, netstat, iostat, vmstat, top, top -H, etc.) right when the problem surfaces in the application stack and analyzes it instantly to generate a root cause analysis report. We used the automated approach. Below is the root cause analysis report generated by the yCrash tool, highlighting the source of the problem. yCrash reporting the transitive dependency graph of 9 BLOCKED threads yCrash prints a transitive dependency graph that shows which threads are getting blocked and who is blocking them. In this transitive graph, you can see 'Thread-0' blocking 9 other threads. If you click on the thread names in the graph, you can see the stack trace of that particular thread. yCrash reporting the stack trace of 9 threads that are in a BLOCKED state Here is the screenshot that shows the stack trace of the 9 threads which are in a blocked state. From the stack trace, you can observe that the threads are stuck on the 'com.yc.BlockedApp$.action(BlockedApp.scala:17)' method.
Equipped with this information, one can easily identify the root cause of the blocked threads; for a complementary programmatic check, see the sketch below. Video To see the visual walk-through of this post, check out the video accompanying the original article.
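As a complement to thread dumps, the JVM's management API can flag blocked threads programmatically. Here is a minimal, illustrative Java sketch using the standard java.lang.management API (not part of the yCrash tooling); since Scala runs on the JVM, it works just as well against a program like BlockedApp when run in the same JVM.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BlockedThreadReporter {

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Dump all threads, including the monitors and synchronizers they hold or wait on.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                System.out.printf("%s is BLOCKED on %s held by %s%n",
                        info.getThreadName(), info.getLockName(), info.getLockOwnerName());
            }
        }
    }
}

Run from a monitoring thread inside the same JVM as BlockedApp, this would report the 9 threads blocked on the lock held by 'Thread-0'.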
CPU isolation and efficient system management are critical for any application that requires low-latency, high-performance computing. These measures are especially important for high-frequency trading systems, where split-second decisions on buying and selling stocks must be made. To achieve this level of performance, such systems require dedicated CPU cores that are free from interruptions by other processes, together with wider system tuning. In modern production environments, there are numerous hardware and software hooks that can be adjusted to improve latency and throughput. However, finding the optimal settings for a system can be challenging, as it requires navigating a multidimensional search space. To accomplish this efficiently, it is necessary to understand the tuning landscape and to use tools and strategies that facilitate effective changes. Moreover, managing Java processes can be more difficult due to the number of auxiliary threads which are spawned by the JVM, even for logically single-threaded applications. The scheduling of these threads is critical to minimizing jitter and achieving optimal performance. In this article, we will explore the strengths and weaknesses of the standard solutions for controlling CPU isolation for low-latency applications under Linux, and how we at Chronicle Software developed Chronicle Tune to address the inherent trade-offs of these solutions. Using the isolcpus Linux Configuration isolcpus is a Linux boot command-line option that allows an explicit list of CPUs to be excluded from consideration by the Linux scheduler. This option provides very effective isolation; however, the problem is that it does not respect CPU ranges. For example, when you use taskset or sched_setaffinity to specify a range of allowed CPUs for a pinned process, only the first CPU in the allowed range is utilized, regardless of the number of threads in the process. Thus, when controlling thread placement under isolcpus, every thread requires explicit management, and particular care must be taken to avoid scheduling conflicts from auxiliary and/or child threads. Another major disadvantage of isolcpus is that the configuration is fixed once a host has started, so changes to the configuration require a reboot of the system. Using Cgroups or Csets An alternative that provides a similar level of isolation and allows for dynamically changing configuration is Linux cgroups. cgroups is a feature in the Linux kernel that enables administrators to limit, allocate, and prioritize system resources such as CPU, memory, disk I/O, and network bandwidth among processes or groups of processes. This can help prevent one application from monopolizing system resources, resulting in poor performance or instability. cset is a utility that is used specifically to manage CPU affinity and placement for groups of tasks. By defining csets, administrators can assign specific CPUs or CPU cores to particular tasks or groups of tasks, ensuring that those tasks have dedicated CPU resources and minimizing interference from other tasks. This can be especially useful in high-performance computing environments, where minimizing contention and maximizing performance is critical. Both cgroups and csets enable specific cpuset groupings to be defined, with processes confined to run within one particular group. Figure 1. A comparison of how threads can be managed with isolcpus and cgroups. isolcpus allows the management of individual threads but prevents the use of flexible CPU groups.
One drawback is that cgroups are primarily designed to work at the process level and are a less natural tool to use when the targeted control of individual threads is important. As touched on earlier, this can be a particular problem for Java applications, given the relatively large number of auxiliary threads started by the JVM. Even though many of these threads only run occasionally, they can still generate enough jitter to impact the high percentiles of any latency-sensitive application in the same group. A further complication when using cgroups is the absence of support from standard calls like taskset and sched_setaffinity, making it more challenging to combine cgroups with low-level libraries: moving processes between groups requires the use of specialized calls. Making the Procedure Better and Automatic Since there are drawbacks with both isolcpus and cgroups/csets, plus they can be time-consuming to configure, we developed software to make tuning and managing a system simpler and more transparent. Chronicle Tune blends features from isolcpus and cgroups/csets together with bespoke functionality to simplify CPU and system tuning, allowing changes to be applied dynamically without the need for reboots. Chronicle Tune can be especially useful for Java applications, where careful separation and control of application and background threads is essential for achieving the best performance. Chronicle Tune facilitates optimal process placement and control, helping to ensure fewer and shorter interrupts and allowing threads to be dynamically migrated. Figure 2. Comparison of Chronicle Tune, isolcpus, and cgroups How Much Could Tuning Improve Performance? For businesses seeking to improve their performance down to the nanosecond, it's crucial to understand how much of a difference tuning can make. To this end, we conducted a practical test to evaluate the impact of Chronicle Tune on the performance of Chronicle Queue. Specifically, we measured the write-to-read latency for 256-byte messages at a rate of 100,000 messages per second. Our testing shows that while Chronicle Tune had an effect even at lower percentiles, from around the 99.9th percentile onwards, the benefits of significantly reduced jitter became increasingly apparent, showing the machine running in a much cleaner, more optimal configuration. Figure 3. Write-to-read latency of Chronicle Queue exchanging 256-byte messages @ 100k msgs/s. Can Shrink-Wrapped Software Be as Efficient as Manual Tuning? Using ready-made software is certainly more convenient than tuning manually. So how effective is Chronicle Tune in comparison? To investigate this, we have a tool that measures the jitter experienced by a spinning, pinned thread (closely representing a typical latency-sensitive application thread), and Figure 4 below shows the results of a comparison between isolcpus and Chronicle Tune. This plot shows that while the total number of jitter events is slightly lower for isolcpus (as might be expected, given isolcpus integrates directly with the scheduler), the worst outliers are, in fact, lower with Chronicle Tune (6 µs vs. 14 µs), on account of the additional system tuning Chronicle Tune applies beyond just CPU isolation. Chronicle Tune achieves this with a simple, transparent configuration, which can be adjusted without the need for reboots. Figure 4. Average number of delays per hour, grouped by length of the delay. The statistics were gathered during a 91-second jitter test run.
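Returning to the earlier point about controlling individual Java threads: whatever isolation mechanism is in place, the application still has to place its critical threads onto the reserved CPUs. As a minimal illustration (this uses the open-source OpenHFT Java-Thread-Affinity library, not Chronicle Tune itself, and the reserved-CPU configuration mentioned is an assumption for the example), a latency-critical thread can pin itself to an isolated core like this:

import net.openhft.affinity.AffinityLock;

public class PinnedWorker {

    public static void main(String[] args) {
        // Acquire a CPU from the set the library is allowed to hand out
        // (configurable, e.g., via the affinity.reserved system property).
        AffinityLock lock = AffinityLock.acquireLock();
        try {
            System.out.println("Running pinned on CPU " + lock.cpuId());
            // ... latency-critical busy-spin work here ...
        } finally {
            lock.release(); // return the CPU so other threads can use it
        }
    }
}

Note that this addresses thread placement only; the wider system tuning (IRQ steering, kernel settings, and so on) that Chronicle Tune automates still needs to be handled separately.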
Conclusion The standard solutions for controlling CPU isolation for low-latency applications under Linux are isolcpus and cgroups/csets. However, they each have their downsides and can be awkward to use. Chronicle Tune simplifies the process of system tuning: it manages low-latency, low-jitter tasks, schedules threads separately from processes, adjusts allocations dynamically at runtime, manages interrupt requests (IRQs) efficiently, and performs whole-system optimization covering SSD, disk, memory, and network. All of this is achieved using a simple, transparent configuration that can be adjusted dynamically, without the need for a reboot to take effect.
Welcome back to this series all about file uploads for the web. In the previous posts, we covered things we had to do to upload files on the front end, things we had to do on the back end, and optimizing costs by moving file uploads to object storage.

Upload files with HTML
Upload files with JavaScript
Receive uploads in Node.js (Nuxt.js)
Optimize storage costs with Object Storage
Optimize performance with a CDN
Secure uploads with malware scans

Today, we'll do more architectural work, but this time it'll be focused on optimizing performance. Recap of Object Storage Solution By now, we should have an application that stores uploaded files somewhere in the world. In my case, it's an Object Storage bucket from Akamai cloud computing services, and it lives in the us-southeast-1 region. So when I upload a cute photo of Nugget making a big ol' yawn, I can access it at austins-bucket.us-southeast-1.linodeobjects.com/files/nugget.jpg. Nugget is a super cute dog. Naturally, a lot of people are going to want to see this. Unfortunately, this photo is hosted in the us-southeast-1 region, so anyone living far away from that region has to wait longer before their eyes can feast on this beast. Latency sucks. And that's why CDNs exist. What Is a CDN? CDN stands for "content delivery network," and it's a connected network of computers that are globally distributed and can store copies of the same files so that when a user makes a request for a specific file, it can be served from the computer nearest to the user. By using a CDN, the distance a request must travel is reduced, thereby resolving requests faster, regardless of a user's location. Here's a WebPageTest result for that photo of Nugget. The request was made from their servers in Japan, and it took 1.1 seconds for the request to complete. Instead of serving the file directly from my Object Storage bucket, I can set up a CDN in front of my application to cache the photo all over the world. So users in Tokyo will get the same photo but served from their nearest CDN location (which is probably in Tokyo), and users in Toronto are going to get that same file but served from their nearest CDN location (which is probably in Toronto). This can have significant performance implications. Let's look at that same request, but served from behind a CDN. The new WebPageTest results still show the same photo of Nugget, and the request still originated from Tokyo, but this time it only took 0.2 seconds: a fraction of the time! When the request is made for this image, the CDN checks whether it already has a cached version. If it does, it can respond immediately. If it doesn't, it can go fetch the original file from Object Storage, then save a cached version for any future requests. Note: the numbers reported above are from a single test. They may vary depending on network conditions. The Compounding Returns of CDNs The example above focused on improving the delivery speeds of uploaded files. In that context, I was only dealing with a single image that is uploaded to an Object Storage bucket. It shows almost a full-second improvement in response times, which is great, but things get even better when you consider other types of assets. CDNs are great for any static asset (CSS, JavaScript, fonts, images, icons, etc.), and by putting one in front of my application, all the other static files automatically get cached as well. This includes the files that Nuxt.js generates in the build process and which are hosted on the application server.
This is especially relevant when you consider the "critical rendering path" and render-blocking resources like CSS, JavaScript, or fonts. When a webpage loads, as the browser comes across a render-blocking resource, it will pause parsing and go download the resource before it continues (hence "render-blocking"). So any latency that affects a single asset may also impact the performance of other assets further down the network cascade. This means the performance improvements from a CDN are compounding. Nice! So is this about showing cute photos of my dog to more people even faster, or is it about helping you make your applications run faster? YES! Whatever motivates you to build faster websites, including a CDN as part of your application infrastructure is a crucial step if you plan on serving customers from more than one region. Connect Akamai CDN to Object Storage I want to share how I set up Akamai with Object Storage because I didn't find much information on the subject, and I'd like to help anyone that's looking for a solution. If it doesn't apply to your use case, feel free to skip this section. Akamai is the largest CDN provider in the world, with something like 300,000 servers across 4,000 locations. It's used by some of the largest companies in the world, but most enterprise clients don't like sharing which tools they use, so it's hard to find Akamai-related content. (Note: You will need an Akamai account and access to your DNS editor.) In the Akamai Control Center, I created a new property using the Ion Standard product, which is great for general-purpose CDN delivery. After clicking Create Property, you'll be prompted to choose whether to use the setup wizard to guide you through creating the property, or you can go straight to the Property Manager settings for the new property. I chose the latter. In the Property Manager, I had to add a new hostname in the Property Hostnames section. I added the hostname for my application. This is the URL where users will find your application. In my case, it was "uploader.austingil.com". Part of this process also requires setting up an SSL certificate for the hostname. I left the default value selected for Enhanced TLS. With all that set up, Akamai will show me the following Property Hostname and Edge Hostname. We'll come back to these later when it's time to make DNS changes.

Property Hostname: uploader.austingil.com
Edge Hostname: uploader.austingil.com-v2.edgekey.net

Next, I had to set up the actual property's behavior, which meant editing the Default Rule under the Property Configuration Settings. Specifically, I had to point the Origin Server Hostname to the domain where my origin server will live. In my DNS, I created a new A record pointing origin-uploader.austingil.com to my origin server's IP address, then added a CNAME record that points uploader.austingil.com to the Edge Hostname provided by Akamai.

A: origin-uploader.austingil.com -> origin server IP
CNAME: uploader.austingil.com -> uploader.austingil.com-v2.edgekey.net

This lets me build out my CDN configuration and test it as needed, only sending traffic through the CDN when I'm ready. Finally, to serve files in my Object Storage instance through Akamai, I created a new rule based on the blank rule template. I set the rule criteria to apply to all requests going to the /files/* sub-route. The rule behavior is set up to rewrite the request's Origin Server Hostname and change it to my Object Storage location: npm.us-southeast-1.linodeobjects.com.
This way, any request that goes to uploader.austingil.com/files/nugget.jpeg is served through the CDN, but the file originates from the Object Storage location. And when you load the application, all the static assets generated by Nuxt are served from the CDN as well. All other requests are passed through Akamai and forwarded to origin-uploader.austingil.com, which points to the origin server. So that’s how I’ve configured Akamai CDN to sit in front of my application. Hopefully, it all made sense, but if you have questions, feel free to ask me. To Sum Up Today we looked at what a CDN is, the role it plays in reducing network latency, and how to set up Akamai CDN with Object Storage. But this is just the tip of the iceberg. There’s a whole world of tweaking CDN configuration to get even more performance. There are also a lot of other performance and security features a CDN can offer beyond just static file caching: web application firewalls, faster network path resolution, DDoS protection, bot mitigation, edge compute, automated image and video optimization, malware scanning, request security headers, and more. My colleague, Mike Elissen, also covers some great security topics on his blog. The most important thing that I wanted to convey today is that using a CDN improves file delivery performance by caching content close to the user. I hope you’re enjoying the series so far and plan on sticking around until the end. We’ll continue next time by looking at ways to protect our servers from malicious file uploads. Thank you so much for reading. If you liked this article, and want to support me, the best ways to do so are to share it and follow me on Twitter.
Joana Carvalho
Performance Engineer,
Postman
Greg Leffler
Observability Practitioner, Director,
Splunk
Ted Young
Director of Open Source Development,
LightStep
Eric D. Schabell
Director Technical Marketing & Evangelism,
Chronosphere