Let's make a deal, Kabam


Comments

  • DNA3000 Member, Guardian Posts: 19,301
    Mofugger said:


    You're barking up the wrong tree. DNA has more intimate knowledge of the game and how it works than anyone on here aside from maybe Jax or Miike.

    I mean, I've got 25+ years of experience building persistence layers for CMS, data lakes, and cloud-scale analytics and metrics. This sort of thing is in my group of trees. Everything I said is accurate with respect to how this sort of data should be managed at scale. If DNA has actually looked at the underlying data model, then sure...he knows it's the wrong implementation.
    I don't think there's a "should be managed" in this case. There are pros and cons to going flat (trading space for time) and to going normalized (trading joins for space); both shapes are sketched at the end of this post. It depends greatly on how your infrastructure works.

    I myself haven't directly seen the implementation. I only know what Kabam has stated in the past. However, colloquial statements Kabam has made about the internals of how the game works have sometimes been oversimplified to the point of being inaccurate, so I cannot say with absolute certainty that what I described matches the implementation.

    It is entirely possible that the back end is in fact normalized, and the problem with crystal runaway is not the impact on database size, but rather the slow accumulation of inefficient uncached calls to crystal ownership tables. The game client doesn't "feel" like it is doing that, but it could be: the fact that signature stones work the way they do (or rather did: I think that was fixed) was also pretty strange to me.

    Is it possible the game implementation is not the best it could be? Absolutely. I've seen very little in the way of standardization for infrastructure components or designs in the games industry. Everyone does everything their own way. And experienced database engineers aren't generally lining up to create game database backends when they can make triple the salary literally anywhere else.
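    To make the tradeoff described above concrete, here is a minimal sketch of the two shapes the data could take: a flat per-account counter layout versus a normalized one. It is purely illustrative; the table and column names are invented, SQLite via Python is used only for brevity, and nothing here is a claim about how the game's backend is actually structured.

    ```python
    import sqlite3

    # Purely illustrative; names are invented and say nothing about the real backend.
    db = sqlite3.connect(":memory:")

    # Flat / denormalized: one counter column per crystal type on every account row.
    # Reads are a single row fetch, but every account stores every counter (even zeros),
    # and a new crystal type means widening the table.
    db.execute("""
        CREATE TABLE account_flat (
            account_id     INTEGER PRIMARY KEY,
            phc_count      INTEGER NOT NULL DEFAULT 0,
            cavalier_count INTEGER NOT NULL DEFAULT 0
            -- ...one column for every crystal type that has ever existed
        )
    """)

    # Normalized: accounts, crystal types, and a join table keyed by both.
    # Storage scales with actual ownership (a zero balance is simply a missing row),
    # but reading an account's crystals now involves a join, or a cache in front of one.
    db.executescript("""
        CREATE TABLE account (account_id INTEGER PRIMARY KEY);
        CREATE TABLE crystal_type (crystal_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE account_crystal (
            account_id INTEGER NOT NULL REFERENCES account(account_id),
            crystal_id INTEGER NOT NULL REFERENCES crystal_type(crystal_id),
            quantity   INTEGER NOT NULL,
            PRIMARY KEY (account_id, crystal_id)
        );
    """)
    ```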
  • DNA3000 Member, Guardian Posts: 19,301

    Mofugger said:

    DNA3000 said:


    Imagine there's a 32-bit counter for the amount of a certain kind of crystal. That's four bytes on every single game account. Not every current active player, but every account that has ever existed. Hundreds of millions of game accounts, each storing that value, even if it is zero. Remove that crystal altogether, and that's potentially gigabytes of space being freed in the game's databases. But only if *every* crystal is opened on *every* game account, even game accounts no longer being played, and then that crystal is deleted from the game (or the space is reused for new crystals, which is essentially the same thing).

    You're describing a completely denormalized database (assuming RDBMS). I really expect that this isn't how they structured the data. The more reasonable approach is a crystals table, a users table, and a join table with FKs to crystal PK and user PK. As long as any one user has a type of crystal, that row needs to remain in the crystals table taking up a small, but constant, number of bits. In this scenario, the number of total bits consumed for storage IS directly related to the number of users with that type of crystal. If the number goes down to zero, then the association is removed from the join table instead of being left as zero.

    This is the preferred way not just for storage (including index size), but for query execution plans and response time.

    You're barking up the wrong tree. DNA has more intimate knowledge of the game and how it works than anyone on here aside from maybe Jax or Miike.
    Just because someone is educated on a topic doesn't mean people who also know a thing or two just shouldn't discuss it with them.
    I agree. If someone has some specific background that makes them qualified to question or contradict anything I say, they should feel free to do so. If they are in fact knowledgeable, then either I'm wrong and they're right, or I'm right and they're wrong, and either way there would be more confidence in the information.

    What @Mofugger said was essentially correct, and I wouldn't disagree on at least the broad terms. You'd expect a modern, large-scale distributed database to be implemented in the way he or she described most of the time. I'd insert a disclaimer about real-time systems being a common exception, but even there I'd say there are ways to implement optimization layers on top (I would probably use very chunky normalization in this case: pure crystal-by-crystal normalization seems very inefficient to me). The one question he forgot to ask himself was: if it was done that way, why would you need to do crystal cleanups? No architecture remains high-performing at all scales, but the numbers involved for the game are below what I would think Google Cloud could accommodate with an optimized architecture foundationally similar to what he's describing.

    So either it is the way I described and they are running into scaling problems related to size, or it is the way Mofugger described and they are running into issues of having insufficient or no optimization layer to hide the joins, or it is something else entirely and Kabam is not describing the problem in a technically accurate way.
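    As one concrete example of the kind of "optimization layer to hide the joins" mentioned above, here is a toy read-through cache over a normalized layout. The schema reuses the hypothetical names from the earlier sketch, lru_cache stands in for a real cache tier, and none of this describes the game's actual implementation.

    ```python
    from functools import lru_cache
    import sqlite3

    # Toy read-through cache over a hypothetical normalized schema; illustrative only.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE crystal_type (crystal_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE account_crystal (
            account_id INTEGER NOT NULL,
            crystal_id INTEGER NOT NULL REFERENCES crystal_type(crystal_id),
            quantity   INTEGER NOT NULL,
            PRIMARY KEY (account_id, crystal_id)
        );
    """)

    @lru_cache(maxsize=100_000)
    def crystal_counts(account_id: int) -> tuple:
        """Joined read of one account's crystal counts; repeat reads never touch the database."""
        rows = db.execute(
            """SELECT ct.name, ac.quantity
                 FROM account_crystal ac
                 JOIN crystal_type ct ON ct.crystal_id = ac.crystal_id
                WHERE ac.account_id = ?""",
            (account_id,),
        ).fetchall()
        return tuple(rows)  # immutable, so it is safe to hand out cached copies

    def grant_crystal(account_id: int, crystal_id: int, amount: int) -> None:
        """Write path: upsert the ownership row, then drop stale cache entries."""
        with db:
            db.execute(
                """INSERT INTO account_crystal (account_id, crystal_id, quantity)
                   VALUES (?, ?, ?)
                   ON CONFLICT (account_id, crystal_id) DO UPDATE SET quantity = quantity + ?""",
                (account_id, crystal_id, amount, amount),
            )
        # lru_cache can only clear everything; a real layer would invalidate per account key.
        crystal_counts.cache_clear()
    ```

    Even in the toy the tradeoff is visible: reads become cheap after the first hit, but every write now has to think about invalidation.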
  • ahmynuts Member Posts: 7,212 ★★★★★
    DNA3000 said:

    So either it is the way I described and they are running into scaling problems related to size, or it is the way Mofugger described and they are running into issues of having insufficient or no optimization layer to hide the joins, or it is something else entirely and Kabam is not describing the problem in a technically accurate way.
    I think both of you can be correct at the same time. I mean, from my experience no two companies do things the same way (for some reason; I've seen some genuinely baffling decisions made). It could be a case where, towards the beginning, Kabam even did things a different way from what both you and Mofugger outlined. Maybe they didn't think the game would grow to be the biggest, most complex fighting game of all time (it is, right? I can't think of another off the top of my head), and the structure they had in place at the beginning was too integral to the running of the game to be changed, so they've just had to work with it. Similar to what happened a few years back with their Unity shenanigans. My specialty isn't in database systems and I haven't worked with them directly very often, so I could be off the mark.
  • DNA3000 Member, Guardian Posts: 19,301
    ahmynuts said:

    I think both of you can be correct at the same time. I mean, from my experience no two companies do things the same way (for some reason; I've seen some genuinely baffling decisions made). It could be a case where, towards the beginning, Kabam even did things a different way from what both you and Mofugger outlined. Maybe they didn't think the game would grow to be the biggest, most complex fighting game of all time (it is, right? I can't think of another off the top of my head), and the structure they had in place at the beginning was too integral to the running of the game to be changed, so they've just had to work with it. Similar to what happened a few years back with their Unity shenanigans. My specialty isn't in database systems and I haven't worked with them directly very often, so I could be off the mark.
    In the closing days of an MMO I used to play, I "acquired" admin access to one of the test shards. One of the things I discovered was that many of the organically added features the game had been enhanced with came with a lot of caveats regarding how they were used. For example, there was a mechanism that could add a new ability to any player character. This was intended to be used on instanced maps only. When used in a zone that could contain a hundred players or more, it would crash the entire shard. It turns out the problem was that even though this feature would work perfectly fine no matter how many players it was used on, if it took more than a fraction of a second it would trip an interlock intended as a safety feature to prevent runaway scripted events (a toy version of that interlock is sketched at the end of this post). However, the trip surfaced as an unhandled exception, because no one thought to consider what would happen if this feature was used on more than 24 players at a time.

    (What I never bothered to figure out was why this took more than a millisecond per player in the first place).

    That's what pretty much every game looks like eventually. They look like Overseer's costume. They are not what people would guess them to be, and they are not what people assume they would be from seeing other things. They are games, and games are weird.
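    For readers curious what the interlock described above might look like in the abstract, here is a loose, hypothetical sketch: a per-event time budget that raises an exception nothing catches. The names, the budget, and the per-player cost are all invented; this is not the MMO's actual code.

    ```python
    import time

    class ScriptBudgetExceeded(Exception):
        """Raised when a scripted event blows its time budget; nothing below catches it."""

    def apply_power(player: str) -> None:
        # Stand-in for per-player work that is slower than anyone expected.
        time.sleep(0.02)

    def grant_power_to_zone(players: list, budget_seconds: float = 0.25) -> None:
        """Apply a scripted effect to every player, tripping the interlock if it runs long."""
        start = time.monotonic()
        for player in players:
            apply_power(player)
            if time.monotonic() - start > budget_seconds:
                # Fine on a small instanced map; in a hundred-player zone this fires,
                # and an uncaught exception here is how a safety feature takes down the shard.
                raise ScriptBudgetExceeded(f"scripted event exceeded {budget_seconds}s budget")
    ```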
  • Mofugger Member Posts: 84
    DNA3000 said:


    So either it is the way I described and they are running into scaling problems related to size, or it is the way Mofugger described and they are running into issues of having insufficient or no optimization layer to hide the joins, or it is something else entirely different and Kabam is not describing the problem in a technically accurate way.

    I agree with much of your thought process. The game experience skews to the individual user, not aggregate groups. You can see this in screens like the Summoner Leaderboard, which seem to be slower (and slower than scalar functions account for). This indicates some purposeful denormalization of user data. The logical thing to denormalize is summary data like stash count, crystal count, and inventory...the top-level rollup information.

    The reason I felt less certain about denormalizing down to individual crystals is not the storage tradeoff, but the constant schema updating that would be required (contrasted in the sketch below) and the eventual hard column-count limits of the underlying data store. In something like Postgres that ceiling can be as high as 1600 columns, but it depends heavily on column types and tuple layout, so it is realistically much lower, and it gets inefficient once rows exceed the page size and reads have to spill out to TOAST.

    Optimization is hard and interesting. I could drone on about the CAP theorem and why you can only ever have two of consistency, availability, and partition tolerance, even though everyone always wants all three. In Kabam's case I'd have to believe immediate consistency is non-negotiable for most or all of the rewards-related data. So you can't take cheap optimizations like materialized views or separate read/write tables. That left me feeling like they start to hit the limitations of joined data. But I've seen absolutely nothing concrete to validate any of that.
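    A small sketch of the schema-churn point above, using the same hypothetical table names as earlier: in a flat per-crystal layout every new crystal type is DDL against an ever-wider table, while in a normalized layout it is just another row. Illustrative only, with SQLite standing in for whatever store is actually used.

    ```python
    import sqlite3

    db = sqlite3.connect(":memory:")

    # Flat layout: a new crystal type widens the table, marching toward the engine's
    # hard column ceiling (on the order of 1600 columns in Postgres, realistically far fewer).
    db.execute("CREATE TABLE account_flat (account_id INTEGER PRIMARY KEY, phc_count INTEGER DEFAULT 0)")
    db.execute("ALTER TABLE account_flat ADD COLUMN new_event_crystal_count INTEGER DEFAULT 0")

    # Normalized layout: a new crystal type is ordinary data; no schema change at all.
    db.execute("CREATE TABLE crystal_type (crystal_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    db.execute("INSERT INTO crystal_type (name) VALUES ('new_event_crystal')")
    ```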
    DNA3000 said:


    And experienced database engineers aren't generally lining up to create game database backends when they can make triple the salary literally anywhere else.

    Utterly sad, and utterly true. This is an industry that has some really interesting data problems for both producer and consumer. But the industry also can't (or won't...I don't know firsthand) pay market rate for that talent.
  • Mofugger Member Posts: 84
    I happen to enjoy this thread immensely, but I doubt it will win any "Best of" awards in this forum :D
  • DNA3000 Member, Guardian Posts: 19,301
    Mofugger said:

    In Kabam's case I'd have to believe immediate consistency is non-negotiable for most or all of the rewards-related data. So you can't take cheap optimizations like materialized views or separate read/write tables.

    Well, in that case here's one that I think will melt your brain in particular. There used to be a very long-standing bug, which you may not be aware of, where if you tried to apply a large number of sig stones to a champion and your game client crashed during the process, you would infrequently find on restarting that some random number of sig stones had been used and some random number of sig levels applied to the target champion. The brain-melting part: those two numbers would not be remotely the same. In other words, it was possible to try to apply 100 sig stones, crash, and restart to find the game had deducted twelve sig stones from your inventory and increased the champion's sig level by 82.

    This should be fixed now. But think about what the implementation must have looked like for that to be a not-rare possibility. Even most badly mistaken implementations can't do that. I wouldn't expect full ACID guarantees here, but that's just bonkers.
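    To show what atomicity buys in this situation, here is a hedged sketch with hypothetical tables (SQLite only for brevity, not the game's code): spending stones and raising sig levels as separately committed writes versus doing both in one transaction. The contrast only shows why a single transaction can't strand the two totals; as the post says, whatever produced numbers as far apart as twelve and 82 must have been doing something stranger still.

    ```python
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE inventory (account_id INTEGER PRIMARY KEY, sig_stones INTEGER NOT NULL);
        CREATE TABLE champion  (champ_id   INTEGER PRIMARY KEY, sig_level  INTEGER NOT NULL);
        INSERT INTO inventory VALUES (1, 100);
        INSERT INTO champion  VALUES (7, 0);
    """)

    def apply_sig_stones_fragile(count: int) -> None:
        """Two separately committed writes per stone: a crash mid-loop leaves the totals out of step."""
        for _ in range(count):
            db.execute("UPDATE inventory SET sig_stones = sig_stones - 1 WHERE account_id = 1")
            db.commit()
            db.execute("UPDATE champion SET sig_level = sig_level + 1 WHERE champ_id = 7")
            db.commit()

    def apply_sig_stones_atomic(count: int) -> None:
        """Both updates in one transaction: either both totals move by count, or neither does."""
        with db:  # sqlite3 connections commit on success and roll back on error
            db.execute("UPDATE inventory SET sig_stones = sig_stones - ? WHERE account_id = 1", (count,))
            db.execute("UPDATE champion SET sig_level = sig_level + ? WHERE champ_id = 7", (count,))
    ```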