
Member "wire-server-2021-10-01/docs/developer/cassandra-interaction.md" (4 Oct 2021, 8704 Bytes) of package /linux/misc/wire-server-2021-10-01.tar.gz:


Writing code interacting with cassandra


Anti-pattern: Using full table scans in production code

Queries such as SELECT some_field FROM some_table; are full table scans. Cassandra is not optimized for such queries: even with a small amount of data, a single full table scan can degrade the performance of the whole cluster. One such query made our staging environment unusable. Luckily, it was caught and fixed before it reached production.

Suggested alternative: Design your tables in a way to make use of a primary key, and always make use of a WHERE clause: SELECT some_field FROM some_table WHERE some_key = ?.

In some rare circumstances you might not easily think of a good primary key. In this case, you could for instance use a single default value that is hardcoded: SELECT some_field FROM some_table WHERE some_key = 1. We use this strategy in the meta table which stores the cassandra version migration information, and we use it for a default idp. some_field might be of type set, which allows you to have some guarantees. See the implementation of unique claims and the note on guarantees of CQL sets for more information on sets.

Anti-pattern: Using IN queries on a field in the partition key

Larger IN queries lead to performance problems. See https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/

The preferred way to do such a lookup is to issue queries that each operate on a single key, and to run those queries concurrently. One way to do this is with the [pooledMapConcurrentlyN](https://hoogle.zinfra.io/file/root/.stack/snapshots/x86_64-linux/e2cc9ab01ac828ffb6fe45a45d38d7ca6e672fb9fe95528498b990da673c5071/8.8.4/doc/unliftio-0.2.13/UnliftIO-Async.html#v:pooledMapConcurrentlyN) function. To be conservative, you can use N=8 or N=32. We've done this in other places and have not seen problematic performance yet. For an optimization of N, see the section further below.
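As a minimal sketch of this pattern, assuming the `unliftio` package is available: `lookupOne` below is a hypothetical stand-in for a single-key query (it does not talk to cassandra), and `lookupMany` is a hypothetical helper that runs one lookup per key with at most 8 requests in flight.

```haskell
import UnliftIO.Async (pooledMapConcurrentlyN)

-- Hypothetical stand-in for a single-partition query such as
--   SELECT some_field FROM some_table WHERE some_key = ?
-- A real implementation would run the query against cassandra instead.
lookupOne :: Int -> IO (Maybe String)
lookupOne key =
  pure (if even key then Just ("value-" <> show key) else Nothing)

-- Run one single-key lookup per key, with at most 8 lookups in flight,
-- instead of a single `WHERE some_key IN (...)` query.
-- The results come back in the same order as the input keys.
lookupMany :: [Int] -> IO [(Int, Maybe String)]
lookupMany keys =
  pooledMapConcurrentlyN 8 (\k -> (,) k <$> lookupOne k) keys
```

The bound (8 here) caps how many requests hit the cluster at once, which is the main difference from spawning one thread per key.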

Anti-pattern: Designing for a lot of deletes or updates

Cassandra works best for write-once read-many scenarios.

Read e.g.

Understanding more about cassandra

primary partition clustering keys

Confused about primary key, partition key, and clustering key? See e.g. this post or this one

optimizing parallel request performance

See the thoughts in https://github.com/wireapp/wire-server/pull/1345#discussion_r567829234. Measuring overall and per-request performance and trying out different settings of N might be worthwhile if increasing read or write performance is critical.
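A rough way to compare settings is to time the whole batch and each individual request for several values of N. The sketch below assumes the `unliftio` package; `fakeQuery`, `timed` and `measure` are hypothetical names, and `fakeQuery` only sleeps for 10ms to simulate latency, so the numbers it produces say nothing about a real cluster.

```haskell
import Control.Concurrent (threadDelay)
import GHC.Clock (getMonotonicTime)
import UnliftIO.Async (pooledMapConcurrentlyN)

-- Hypothetical stand-in for a single-key query: sleeps 10ms, returns the key.
fakeQuery :: Int -> IO Int
fakeQuery k = threadDelay 10000 >> pure k

-- Run an action and return its result together with the elapsed seconds.
timed :: IO a -> IO (a, Double)
timed act = do
  start <- getMonotonicTime
  r <- act
  end <- getMonotonicTime
  pure (r, end - start)

-- Run one request per key with at most n in flight.
-- Returns (overall wall-clock seconds, mean per-request seconds).
-- Assumes a non-empty list of keys.
measure :: Int -> [Int] -> IO (Double, Double)
measure n keys = do
  (perRequest, overall) <-
    timed (pooledMapConcurrentlyN n (fmap snd . timed . fakeQuery) keys)
  pure (overall, sum perRequest / fromIntegral (length perRequest))
```

For example, `mapM_ (\n -> measure n [1 .. 100] >>= print) [1, 8, 32]` prints one pair of timings per setting. Against a real cluster, a larger N typically lowers overall time until per-request latency starts to rise.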

Cassandra schema migrations

Backwards compatible schema changes

Most cassandra schema changes are backwards compatible, or should be designed to be so. Looking at the changes under services/{brig,spar,galley,gundeck}/schema you'll find this to be mostly the case.

The general deployment setup for services interacting with cassandra makes the following assumption:

So usually with these safeguards in place, and backwards-compatible changes, we have the following:

If this order (apply schema first; then deploy code) is not safeguarded, then there will be code running in e.g. production which issues queries such as SELECT my_new_field FROM my_new_table even though the field or table doesn't exist yet, leading to 500 server errors for as long as the mismatch between applied schema and code version persists.
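One way to picture such a safeguard is a check at service startup that compares the schema version the code requires with the version currently applied. The following is an illustration only, under the assumption that schema versions are integers that only grow; `StartupDecision` and `checkSchemaVersion` are hypothetical names and not necessarily how wire-server enforces the order.

```haskell
-- Outcome of the hypothetical startup check.
data StartupDecision
  = Start
  | Refuse String
  deriving (Eq, Show)

-- Decide whether code that requires schema version `required` may start
-- against a database whose applied schema version is `applied`.
checkSchemaVersion :: Int -> Int -> StartupDecision
checkSchemaVersion required applied
  | applied >= required = Start
  | otherwise =
      Refuse
        ( "schema version "
            <> show applied
            <> " is older than required version "
            <> show required
        )
```

With a check like this, code deployed before its migration fails fast at startup instead of serving 500 errors at query time.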

Backwards incompatible schema changes

When a schema migration is not backwards compatible, for example ALTER TABLE my_table DROP my_column, the reverse problem exists:

During a deployment:

What to do about backwards incompatible schema changes

Options from most to least desirable: