I am a Senior Software Architect of 12 years...can you guys hire me? This is sad because I can get AWS or GCP support on the phone in a matter of minutes at my job. If GCP is down, then you guys are going to get quite a big reimbursement, so spread the love.
What's strange is that you guys are blaming Google, when they have no indication of a service disruption today: https://status.cloud.google.com/
so what's the prob?
The GCP status board only lists outages and critical disruptions. It does not list functional problems that are workload-specific.
A while back I had an issue with an EC2 hosted cluster. The client was getting bursts of low IOPS to Windows guests for a long time. This caused an invisible, unreported, zero event log occurrence of volume shadow snapshot block clearing backlog to occur, which I can't even find articles about anyone ever troubleshooting. This caused all volume management to stall, which meant no VSS anything, no disk management anything, nada, zip, zilch. Touching anything involving volume management would hang the volume management services and require a reboot, which did not fix the underlying problem. We only figured out the problem when we cloned the instances and left them idle during troubleshooting, and after two *days* the idle clones automagically fixed themselves when not under load. No one can explain to me what can take that long to automatically resolve.
The world is a complicated place.
Yea, but that's a flaw with the infrastructure's architecture you designed, not the provider like they are blaming. You should have had a failover cluster in that situation, and you would have had time to figure out the problem.
This game has been plagued by long outages and scaling issues for years, and it's not that hard to build a redundant, scalable architecture at all levels. Hell, they make enough money in a week to solve this problem. That's the beauty of AWS and GCP. Sure, they'll have to pay more for the redundancy, but you don't have to deal with this BS all the time.
Yes and no. Yes, ultimately the responsibility is with the original architecture designers. But those guys might be long gone and once you have a flawed design, organically growing it over time tends to make it more vulnerable to problems, and there's no longer any actual human being to directly blame.
I've designed resilient systems for a long time, and long enough to be comfortable saying almost no one is any good at it. I don't know why, all I know is if you're not the CTO of Netflix you're probably doing something dumb. But that's a separate issue from how hard it is to fix an issue once you have it.
The idea that you just need two and you'll be fine is a common thought in the industry. It is wrong. If redundancy has never bitten you in the butt before, consider yourself lucky. But I don't think the thread will survive a debate about the proper implementation of cluster quorum.
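For anyone wondering why "just run two" falls short, here is a minimal sketch of the arithmetic, assuming a plain strict-majority quorum rule. The post does not say which quorum scheme was meant, so the rule and the function names `has_quorum` and `tolerated_failures` are illustrative assumptions, not any particular cluster product's API.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """Assumed rule: a partition may keep serving only with a strict majority of votes."""
    return reachable_votes > total_votes // 2


def tolerated_failures(total_votes: int) -> int:
    """Largest number of voters that can be lost while a strict majority remains."""
    return (total_votes - 1) // 2


if __name__ == "__main__":
    # Compare a two-node pair against three- and five-node clusters.
    for nodes in (2, 3, 5):
        survivors = nodes - 1
        print(
            f"{nodes} nodes: tolerates {tolerated_failures(nodes)} failure(s); "
            f"quorum after losing one node: {has_quorum(survivors, nodes)}"
        )
```

Under that assumed rule, a two-node cluster that loses one node (or splits down the middle) leaves each side with one vote out of two, which is not a majority, so neither side can safely continue on its own. Three voters is the smallest count that survives a single loss.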
Dude, I have been in the industry for 12 years as it evolved. My current employer has to maintain 99.9999% uptime for our clients or else we have to reimburse them a huge amount. I do know that legacy design is hard to get rid of once it's been built out, but there is no excuse with this much time available. Companies often neglect infrastructure when they are busy building out new features and code to add value to their products, and we know Kabam isn't reinventing what was handed to them. You would think stability would be a huge priority in that case.
Netmarble is struggling with the same issues since they purchased Kabam. With the tools out there today (Docker, Fargate, ELB, multi-region) it IS that easy. You are describing problems with single EC2 instances when that is outdated thinking. We just redesigned our outdated architecture and it took 2 weeks with 2 people working on it to incorporate solutions for these exact problems, and we even improved it so that it is using Terraform so that we can easily plan changes to the infrastructure and apply them using code. And we added monitoring and observability to replace outdated event logs and dashboards.
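To put the 99.9999% figure in perspective, here is a rough sketch of the downtime budget that an availability target implies. The 365-day year and the helper name `downtime_budget_seconds` are assumptions for illustration; real SLAs define their own measurement windows and exclusions.

```python
from decimal import Decimal

# Assumption: a non-leap, 365-day year as the measurement window.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_budget_seconds(
    availability_percent: str, period_seconds: int = SECONDS_PER_YEAR
) -> Decimal:
    """Return the downtime, in seconds, permitted by an availability target.

    The target is passed as a string so Decimal keeps it exact.
    """
    unavailable_fraction = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    return unavailable_fraction * period_seconds


if __name__ == "__main__":
    for target in ("99.9", "99.99", "99.999", "99.9999"):
        budget = downtime_budget_seconds(target)
        print(f"{target}% uptime allows about {budget:.1f} seconds of downtime per year")
```

By this arithmetic, six nines leaves roughly half a minute of downtime per year, while three nines leaves several hours.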
@NikoBravo what's really funny, this whole situation inspired me to watch Fight Club, lol. I'm leaning towards Tyler Durden and Project Mayhem blowing up Google's servers.
Quake! I think. I hate the idea of answering this question because I'm always worried people are going to yell at me for making a horrible choice. Hah.
Fell in love with Quake on Agents of S.H.I.E.L.D. and was stoked to see her added to the game. Would have to agree with you, she is my favorite character in game as well. Wolverine is right next to her tho. Grew up watching the 90's X-Men series, which was amazing.
Any idea when the servers are coming back up? Kinda burnt out on Call Of Duty... But if it comes down to it I may pop that in and play it while I wait LoL
Well, there should at least be a single button to ask for help in arena instead of asking for help by tapping every single champ every single time, because it's quite time consuming. If it was a single button for, like, asking help for all 3, that would be awesome. ┐(︶▽︶)┌
@Kabam Miike that's a tough call for me lol I LOVE Ghost Rider I wanted him in the game for months and when he was released i flipped out lol ! But I do love Magik she was the first champ I ever awakened.. When we got the Generic Gems after Scarlett and Thor got nerfed.. So I'm split now as to if I want to r5 a 5* GR or Magik... Imo GR has best Regen in the game but Magik has best power control
That's a hard one. Magik was my first 5/50, and she helped me go so far when I needed her, and I still use her a lot! My 5-Star Magik is not awakened, but her Power Lock on Sp2 is still one of my go to mechanics in game for when I'm having trouble with an Opponent that has strong special attacks, and was a main stay of my AW attack team for a long time.
I find it somewhat disrespectful that you guys only choose the fun off-topic posts to respond to when people ask legitimate questions like the guy on the last page asking about how Domino's sp2 works.
I mean, don't get me wrong, I don't know how much you get paid or if you even make OT for this. Maybe, you hate that you have to be there on Friday night and want to turn it into fun social hour where people suck up to you for ego stroking. Honestly, I don't care, but I mean you have the time to at least answer some people's questions.
You can have fun AND be productive. Otherwise, why not just go home and post updates when the devs actually doing the work email you telling you status updates?
We've replied to a lot of on topic posts. And DNA replied to you pretty thoroughly regarding some of your server questions. Frankly, just because you would like more technical details does not always mean that we can provide them to you. You seem really intent on arguing about some of this stuff, and I completely understand your frustration - you have every right to be angry with us right now. But please don't come in here and try to derail the situation.
A lot of other people are really enjoying the dialogue in here and I'm sorry if you're not, but we are going to continue engaging with folks in this thread in this manner because it's helping a lot. There is also zero reason or justification for trying to bring what we get paid into this conversation, it's not relevant or constructive.
No one is 'sucking up' to us, we aren't here for an 'ego stroke', we are trying to make the best of a bad situation.
I find it awesome that you guys have taken time out of (What I presume to be) your long weekend! Awesome that you guys could stay through our trying times! BTW, any update on expanding the Content Creator program to non-Youtubers such as fan pages and bots? You guys made a comment a while back during the original announcement, but don’t know if that’s still a thing in development.
We've replied to a lot of on topic posts. And DNA replied to you pretty thoroughly regarding some of your server questions. Frankly, just because you would like more technical details does not always mean that we can provide them to you.
A lot of other people are really enjoying the dialogue in here and I'm sorry if you're not, but we aren't going to stop because one person would prefer a different communication style.
Thanks @adora & @Kabam Miike for the fun discussion. I've got to get to bed it's 2:30AM on the East Coast & I've got work in the morning.... hopefully the game will be up then. Good night everyone 🍺🍻🥃🍹🍸🍷🌛
And yes, you can learn from companies like Netflix, who may experience an outage once or twice a year (and I use them more than MCOC and have never seen one). They actually learn from their mistakes and post the details for all to learn from: https://medium.com/netflix-techblog/lessons-netflix-learned-from-the-aws-outage-deefe5fd0c04
Hrrrrm. I would love 90's Cartoon Storm. Squirrel Girl would be rad. Jubilee! Shuri from Black Panther. Valkyrie from Thor.
That means you are in some alliance?
Hmm starting to see a bias against the men of marvel tsk tsk
Jubilee is a definite yes for me.
maybe something to think about around Halloween?
God, I remember that sound. Did a course through Dial-Up. I thought I was so cool for being connected. Lol.
Why not Cloak and Dagger to support the new Freeform/Hulu series?
Or Jessica Jones?
We have two Iron Fists for crying out loud.
Ha ha ha ha no. At this point, the only option to continue this particular conversation exceeds even the loose parameters of this thread.