S1E6: Disaster Recovery and Backups - Just Because You're Paranoid Doesn't Mean They Aren't After You
Disaster recovery and backups.
Jim Timberman, Managed Services Managing Director
Chadd Wheat, Principal Consultant and Team Manager
Speaker 1: ASCII Anything, a podcast presented by Moser Consulting. Join us every Wednesday to find out which of Moser's more than 200 resident experts we'll be talking to and what they're focused on at the moment. Trends, security, setup, ASCII Anything and we'll give you our best answers.
Angel Leon: Hello everyone. And welcome to another edition of ASCII Anything, presented by Moser Consulting. I'm your host Angel Leon, Moser's HR advisor. And this week we have some repeat offenders with us. We're bringing back Jim Timberman and Chadd Wheat to talk about a subject we sort of glossed over during our first conversation. We'll be talking about disaster recovery and backups for your business. Should you have a plan? What should that plan entail? How often should you back up your data? All of that and much more with our resident experts, Jim Timberman and Chadd Wheat. Gentlemen, thank you very much for joining me once again on ASCII Anything. How are you guys today?
Jim Timberman: Great. Wonderful. Thanks for having us back.
Chadd Wheat: Yeah.
Angel Leon: All right, awesome. Well, the last time we spoke, we provided our audience a lot of great information regarding managing their IT help desk. But along the way, we may have stumbled into something that could have been made into its own conversation. And that's what we're here to do today, as we're going to talk about disaster recovery and backups. I was doing some research about this topic yesterday. I understand that disaster recovery plans are key for any IT expert and sometimes they should be created in conjunction with that of a business continuity plan. What do you guys think about that?
Jim Timberman: To answer that, that's not necessarily true. When we think of disaster recovery, a lot of people just normally think of, oh, my building got hit by a tornado or it got wiped out in a fire or something happened and I can't get into my building. Those are situations where continuity plans do come into play: how are your folks going to work, et cetera. Disaster recovery could just be, essentially, that I've been breached, servers have gone down and you need to spin up new ones, applications have gone down and you need to get them back up and running. If you're looking at disaster recovery that way, business continuity just becomes a small component of it, meaning what are you going to do if you can't get into your building? Well, in today's cloud environment, a lot of that has been answered. And actually, what's really kind of funny is that even with COVID you could look at that as, okay, my business continuity plan has to be in place because my workers are remote and away. That being said, yeah, there is some correlation there, but we're more focused on the disaster side of things.
Chadd Wheat: Yeah, there's several scenarios, Angel, to consider. Like Jim said, the one we're sort of living through right now is the COVID crisis where people really had to discover or implement their business continuity. But like Jim mentioned, instead of getting hit by a tornado or a natural disaster, there's things such as disgruntled employees sabotaging critical systems, and there's people finding out account credentials on your site through social engineering and the dark web. There's a whole gamut of different situations that can occur like that.
Jim Timberman: To Chadd's point there of having those situations where you have been breached or compromised, not just from a disaster perspective, the backup piece of that comes into play pretty quickly. I'll talk a little bit about our process when we deal with ransomware or any kind of breach, as it relates to some endpoint detection tools that are out there to protect devices. If that happens, we're really restoring from a backup. We're taking the most current backup we have, building a new device, building a new server and getting that set up and then restoring the backup to that device. And then any kind of compromised piece has been removed from the network. That's where the backup piece becomes really important, and a lot of companies today really don't think about it as, wow, I really should be thinking that way and I need to access this data quickly. And we're seeing more and more of that based on the way the world has kind of changed today.
Angel Leon: Talking about those recovery strategies, I was reading a little bit yesterday about IT systems, applications and data and that these recoveries should include networks, servers, desktops, laptops, wireless devices, data and connectivity. Speaking about backups and all of that, that I just mentioned, how should a company determine what to back up? What do you think would be needed to be included in that?
Chadd Wheat: It's really a question of risk versus reward, Angel, because some companies may do an entire one for one backup. And I think we've got an example of that, but it all depends on, again, sort of their budget and pain tolerance, because you can back up everything but obviously that's going to be the most extensive and costly solution from a system standpoint and from a personnel standpoint. Other people only back up critical servers. Some people back up those plus their development servers and any intellectual data type thing. It really depends. It's like an insurance policy. How far do you want to go? What is your tolerance if something does happen, the risk and reward?
Jim Timberman: And to add to that, and Chadd is spot on with that, you need to prioritize: what are your critical applications and systems? And then from that, you want to start saying, "Okay, what do I need to back up with that? And what's the level I need to get to?" And through that, the first thing everyone says is, "Hey, let's take email out of it," because email technically is the most mission critical application that any company has. And if they don't tell you that, they're lying, because that's where it starts. Communication is made through that. With a lot of folks that haven't moved to an Office or Microsoft 365 platform or a cloud based structure on that, that's a lot of what needs to come up, having to get that email server up and running and getting things going and then looking at your core ERPs, your CRMs, et cetera and then looking through that. But the bigger question on that is, okay, not so much what's mission critical from an application perspective, it's also looking at what devices need to come into play. Because in this wireless world we live in, it's easy to connect. Having a network firewall configuration may not be as critical because if I can get this server up and get it connected, then I can get folks to VPN to it. They can get into it from anywhere, which we're seeing today. That becomes critical. And then, through that, the next phase is looking at, okay, how much data do you need? And at what point are you willing to accept that you're going to be behind? Or may not be able to recover from that?
Chadd Wheat: Yeah. And there's a difference between becoming functional once again and becoming fully functional back to where you were before the disaster occurred. And again, large companies may be concerned about the loss of income, some measure it even in minutes or hours, as opposed to smaller companies, which may be able to tolerate a longer period of outage.
Angel Leon: Let me ask you along those same lines, because you guys just brought up a lot of good points, but when creating a disaster recovery plan and we're going to get back to backups here in a second, what contingencies should be considered?
Jim Timberman: That's a great question. The first thing we normally look at is where's this going to live? When you look at disaster recovery, the piece that it's going to start from is your backups. We were kind of talking about this earlier about the components of separating business continuity and disaster recovery, because business continuity is really looked at as my building. I can't access my building, therefore, my workers, I have to go set up workstations somewhere else. I have to be in a remote location. A good example here in the Midwest would be that a tornado took my building out. Okay, well with that, I need to find a place for my employees to sit and work at. Companies have things like, hey, we've got suites in a hotel. We've got a partner down the street that has given us office space. We're a tenant within this property management company and they'll put us up in another building of theirs that's vacant, stuff like that. And then through that, they've got to walk through all the different pieces that have to happen in order to get that remote, new location up and running. Telephones, internet connection, et cetera. A lot of that has to be defined. When we talk about disaster recovery, that could mean my search server has died. There's a bad core in it. There's a bad disk. It can't be functional so I have to go start up another one, move things over to that and begin to work again.
Chadd Wheat: And again, sorry to interrupt, Jim. Again, that's all changed in the last decade or so with the rise of the internet, really, because several, several years ago, I won't say how old I am, I worked for a company that had pre-rented a warehouse and they had workstations and everything set up so if a disaster took out the main headquarters, they could get their essential personnel over there to work in this warehouse and it would be basically ready to go from the start. But with the internet era and connectivity the way it is, and again, using COVID as an example, people can work from almost anywhere. Now, you don't want everybody, especially your secure systems, maybe to do that, but it's really changed the way business continuity and disaster recovery have developed over the years.
Jim Timberman: That is true. And then, you put into that the fact that you've got larger bandwidth so that your risk of how much data are you willing to lose becomes greater. And we normally sit down with clients and talk about disaster recovery and business continuity and backups. The first two questions that we ask are always, "Well, how much money are you willing to budget toward this? And how much data are you willing to lose?" Because you could build the most elaborate backup scheme and tools out there. There's different platforms and environments to do this within, but that becomes really expensive. You have what we would call a hot backup, which means that we've got a replicated environment sitting somewhere else that is actually running in parallel to your production environment: replication of data, everything configured the same, anything that happens over here happens over there. It's either in the office or possibly in another location, whether that would be in a colo or out in the cloud or wherever, and they're running in parallel so if something happens here, they just instantly flip over and they're up and running. And then there's a catch up game once the other ones come back up. Those are really expensive, depending on what you want to do, but your risk is pretty small. And there's a lot of larger companies out there that are doing what's called follow the sun kind of backups and recovery, where they basically have these multiple redundant systems that are all over the world at different data centers so if anything should happen in that region, it just flips over to the next one or the next one so that if anything ever happens, we're covered.
Chadd Wheat: That's sort of how the AWS and Azure work, for example. They have data centers located across the globe and when something happens, they can isolate it and move to the next one.
Jim Timberman: And even in that too, from there, you kind of start to scale down into what is the best solution for that for that company of our clients. A lot of that is again, based on cost and how much data they're willing to lose. And there's a multitude of different products and services out there to help support a lot of those.
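Editor's note: the two questions Jim describes (how much can you budget, and how much data can you afford to lose) can be sketched as a small selection routine. This is only an illustration under stated assumptions: the option names, backup intervals, and monthly costs below are hypothetical and do not come from the conversation or from any real product.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BackupOption:
    name: str
    interval_minutes: int  # time between backups; 0 models continuous replication
    monthly_cost: float    # hypothetical figure, for illustration only


def worst_case_loss_minutes(option: BackupOption) -> int:
    """Data written right after a backup is unprotected until the next one runs."""
    return option.interval_minutes


def cheapest_option(
    options: Sequence[BackupOption],
    max_loss_minutes: int,
    monthly_budget: float,
) -> Optional[BackupOption]:
    """Return the cheapest option that fits both limits, or None if nothing fits."""
    candidates = [
        o for o in options
        if worst_case_loss_minutes(o) <= max_loss_minutes
        and o.monthly_cost <= monthly_budget
    ]
    return min(candidates, key=lambda o: o.monthly_cost, default=None)


# Hypothetical menu, from cheapest and riskiest to most expensive and safest.
OPTIONS = [
    BackupOption("nightly backup", 24 * 60, 200.0),
    BackupOption("hourly snapshots", 60, 900.0),
    BackupOption("hot standby (replicated)", 0, 5000.0),
]

choice = cheapest_option(OPTIONS, max_loss_minutes=120, monthly_budget=1000.0)
print(choice.name if choice else "no option fits; revisit budget or tolerance")
```

When the function returns None, the budget and the tolerated loss cannot both be met, which is the point in the conversation where one of the two answers has to change.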
Angel Leon: Jim, you mentioned data recovery expenses and that ties into the next question that I'm going to ask, because it has to do with this topic specifically with recovery plans. What are some of the common ways hackers use to try and gain access to information? What do you think are some of the sneakier ways that they use to get that data? Because at the end of the day, talking about data, expenses, if you want your data to be safe, I think it's safe to say that for lack of a better term, that it's worth the investment, it's worth the expense, correct?
Jim Timberman: Yes. That is true. Data encryption is the key to that. Ensuring that it's encrypted in movement, it's encrypted on the drive. That's the key. Chadd, do you have anything you want to add to that?
Chadd Wheat: Yeah. One of the ways people don't realize is with all the social platforms today, hackers can social engineer and they can look for certain data on people's public profiles. There's the intentional kind where they're phishing through an email that says, "Oh, click here." And they can get information that way, but people have to realize that anything you put on the internet, especially if it's related to your business, can be social engineered and can give hackers certain bits of information that they may need to break into your system, both personal and professional systems.
Jim Timberman: And that also brings up another point. A lot of protection is done around individuals: PII data, your personal data, your bank accounts, et cetera. But one of the things a lot of companies forget about, time and time again, is their intellectual capital. What they do, how they do it, their processes. If they're in the manufacturing world or pharma or any kind of manufacturing, there's patents that are pending and information there that they're maintaining and holding private for future release on the product or service, et cetera, et cetera. Well, what ends up happening a lot of times is there's so much of a wall built around this HIPAA data and PII, personal data, don't let your social security numbers get out. Because that seems to be where everything's noticed quickly because it impacts individuals. But what a lot of people forget about is that, hey, if they open the doors up a little bit on their intellectual capital, that makes it a lot easier to get in so now you get into that kind of corporate espionage where we're stealing patents and stealing data and being able to get to market quicker than others. A lot of the time, they're spending more time focusing on that personal data and forget that they have intellectual capital that they need to protect as well.
Chadd Wheat: Yeah. Another example of that, Jim, is we've seen this and heard this from some of our competitors, for example, is some companies will ask for a consultation or an assessment and depending on how detailed you get, they can use that information and just cut you out of it, really, to implement the suggestions and the infrastructure you have. Typically, in our case, we're careful not to give the exact details, but more of a rough form. But I know I've heard that happening several times where someone will ask for three proposals, they'll take the best one and run with it themselves rather than engaging the consulting firm.
Jim Timberman: Oh yeah. Yes. That happens a lot.
Angel Leon: Speaking of, in a case like this, in case of an emergency, how often and what should we test to ensure that all of our data is safe?
Chadd Wheat: There's several types of disaster recovery tests, Angel. One is simply a paper test where the team sits down and reads through and annotates your plans and makes sure that they seem sound on paper. It's sort of like the white boarding. There's a walkthrough test where you get your stakeholders together, identify any issues or gaps that they see there. And these are all sorts of theoretical tests. Moving on to the more practical ones, there are simulations you can go through. If you have for example, you simulate a disaster and have the team actually act like, but not perform the steps you're going to do. Now, where we start getting into actual system down and testing your systems, you can do a parallel test where you've got, as Jim mentioned before, one to one environment and you pretty much pull the plug on one and make sure it's running. Can also be called a cut over test. And these things are important. And a lot of companies we know, they'll go through the first theoretical parts, but they never truly test the systems where they're running and they pull the plug and see if it'll cut over successfully. I think that the timing of that and how often do you do that is also important.
Jim Timberman: Yeah. And to that point of the timing of it and also what it is that you're going to test from a DR perspective, we always try to do testing twice a year. We'll do what we call a sample test. We have one or two applications or environments that we'll take down, spin up the new environment and let that run for a few days, a week, maybe two, depending on how comfortable we feel and then switch it back over so that we can see that we've got the redundancy in place and that we're not losing anything between that transaction. And then at the end of the year, usually the other time, semiannually, the second test we would do would be a bigger, larger sampling where, again, we are taking multiple applications and environments down and then spinning the backups back up there and then letting that run again for a couple, three weeks to a month and then dropping things back over. And in some instances too, we may actually look at that and say, "Okay hey, we're going to make the backup production, let that stay till the next test." That may run for six months and then in turn, what was the old production will stay the backup and then we switch things over. And a lot of that's based on what we've architected with the client.
Chadd Wheat: And sometimes it can, again, depend on the size of the company you're talking about. For example, a small office versus a multi-departmental, divisional manufacturing firm, which may be more comfortable testing individual departments rather than the whole enterprise.
Jim Timberman: Correct. Yep.
Angel Leon: Interesting. And Chadd, you mentioned in one of your answers earlier about timing. Let me ask you both this question. How do you know when it's time to modernize your methods? For example, at what point does a cloud migration make sense? How can a company identify their tipping point?
Jim Timberman: That's a tough question to really answer. There's a lot of companies that have fear of the cloud and the fact that it's, I still need to see, I need to open up a door and I need to see blinking lights. That makes a lot of IT directors feel safe. And if I can't see it, I can't touch it and really, it's hard for them to wrap their minds around the fact that there is a giant data center somewhere that these servers are living in. And you just have a little box of it. A lot of times, when we look at clients that would justify getting them what I would call cloud ready, we look at their spend. Okay, what are you spending today? What's your environment look like? How old is it? And what are you spending to maintain that? And those are the ones that we look at to say, "They're definitely ready for cloud readiness or they're ready for cloud movement." Because of the fact that there's a lot of cost savings there and with that, we also look at what are they processing? And what are their hours? How much compute time is actually being spent on average? If it's a 9:00 to 5:00 organization and they've got servers running, running, running, there's a lot of cost there. Well, if I can migrate them over and set the compute time to kind of shut things down after a certain period of time where I'm not paying for that, that cost drops a little bit. It's kind of almost like a buy versus lease analogy because you really don't own anything in the cloud.
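Editor's note: Jim's point about a 9:00 to 5:00 organization paying for servers that run around the clock comes down to simple arithmetic. The sketch below assumes a flat hourly compute rate and ignores storage, licensing, and data transfer charges; the rate used is hypothetical and is not a quote from any provider.

```python
HOURS_PER_WEEK = 24 * 7


def weekly_compute_cost(hourly_rate: float, hours_running: float) -> float:
    """Cost of one server for a week, billed only for the hours it is running."""
    return hourly_rate * hours_running


def weekly_savings_from_schedule(
    hourly_rate: float,
    business_hours_per_day: float = 8,
    business_days_per_week: int = 5,
) -> float:
    """Difference between running all week and running only during business hours."""
    always_on = weekly_compute_cost(hourly_rate, HOURS_PER_WEEK)
    scheduled = weekly_compute_cost(
        hourly_rate, business_hours_per_day * business_days_per_week
    )
    return always_on - scheduled


# Hypothetical rate of 0.50 per hour for a single server.
rate = 0.50
print("always on:", weekly_compute_cost(rate, HOURS_PER_WEEK))
print("9 to 5 only:", weekly_compute_cost(rate, 8 * 5))
print("weekly savings:", weekly_savings_from_schedule(rate))
```

With these assumed numbers the server is billed for 40 of the 168 hours in a week, so most of the always-on cost goes away. Real bills include charges that do not stop when a machine is shut down, so treat the result as an upper bound on the savings.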
Angel Leon: Interesting.
Jim Timberman: But the benefit you get from it is all your updates. You're not having to worry about continually expanding on licenses because it's all included within that uptime. You still have to build the machines out there, but you're increasing your chances for better uptime and better performance.
Chadd Wheat: A good way to identify maybe when it's time is to bring in an outside firm to do an audit or a security assessment of your systems. And frankly, as Jim sort of pointed out, when you identify there's certain risks in uptime and loss of capital, then you really need to upgrade. And typically any kind of security or IT systems audit could identify the most at risk systems and companies may be able to do it more piecemeal, like transferring all of their email, for example, an easy way to get it on for example, Microsoft Office 365. They migrate their email systems and then other systems as they're identified as vulnerable to a disaster scenario.
Angel Leon: Shameless plug, that's something that we would do, correct?
Jim Timberman: Correct. Yep. We've done a number of what we call cloud readiness assessments and audits in that. And that's usually when we onboard a client, it's one of the first things we really look at is what their environment is, what they have, what is it doing? What's critical? And then look for ways to improve that, not just from a performance perspective, but also from a cost side of things too. A great example of that is we had a client that had two file servers. They had literally close to each one of them had probably close to a terabyte of data on it. There was no applications running on it, it was just file folders. They were just storing documents on it. And what documents are out there? How are you managing those? That kind of leads into kind of a document management type discussion. But at the end of the day, it became, if something ever happened to that, there's key documents out there, key data out there that they need to have so if anything happened to that server, they could lose that.
Chadd Wheat: We touched on that a little bit earlier, Jim, when you talked about disaster scenarios where hardware fails or CPU fails on a box, but Angel, we also have to consider the age and dating of your software versions. We've had clients before that were running such an old version of software because their systems were dependent on it, that the manufacturer, the vendor had stopped supporting it. That opens up a huge gap. What do you do if you have a disaster and the vendor says, "Hey sorry, we end of lifed this two years ago." That's another, I guess, vulnerability that people need to look at very, very seriously.
Jim Timberman: Yep. That is true. There's been a number of times where we've come in and there's applications or tools that they have that if moved, they're probably not going to work again.
Chadd Wheat: Biggest example, I guess, is Y2K that everybody probably remembers. Some systems weren't going to work after that.
Angel Leon: Jim, let me go back to something you said at the beginning of this answer about the cloud versus IT managers still wanting to see those servers in a room somewhere. Is there a benefit to either or? What can you tell us about that?
Jim Timberman: In all honesty, there is a benefit to the cloud and there's still some benefits to having stuff on prem. A lot of it is just based on preference and cost. There's a cost savings, capital expense versus operating expense, CapEx versus OpEx expenditures, depending on how your organization looks at having equipment and the depreciation of that and the liability of that.
Chadd Wheat: Yeah, simplifying it, it's almost like that lease versus buy scenario. You're going to pay more to lease the equipment and the compute time, however, your vendor's going to be in charge of making sure that you have the sufficient power, that you'll be able to scale up without having to spend a lot of capital on hardware upgrades, software upgrades, infrastructure upgrades.
Jim Timberman: Yep. Definitely true. And it's a great way to look at things, it's a buy versus lease situation. And you can also look to other things, there's the hybrid solutions out there that say, "Hey, some things will live out in the cloud, some things will live on prem." That's us coming in and looking at what is there and what makes sense to go out there. As for the benefits of higher uptimes and achieving those five nines and performance and all that, your chances of getting that are better in the cloud versus being on prem, due to the fact that you have the ability to spin things up quickly and shut them off quickly. A good example would be if it's the end of the year, your operations are running at full capacity and consuming a lot of your bandwidth and compute in your environment, but you need to run these end of the year reports and it's going to hog up even more. Well in those situations, you could spin things up quickly, run those reports, when they're done, shut it down and still be able to maintain your performance levels.
Chadd Wheat: Right. And most cloud based providers also have a way that you can just spin those up, not even owning them where you can sort of borrow them for a fee obviously, but you can spin it up during times of high productivity and then they go offline again when you're not at capacity anymore.
Jim Timberman: Yeah. And there's also that other piece too, of moving data around from a retention perspective, instead of having things all sitting on here, just taking up space, taking up space. In the cloud, you can move that around to less expensive storage where you still have access to it, but it's not doing anything, just sitting there. You're not running applications, it's just sitting there, but you can bring it back in as needed. A lot of times you'll have to pay for the movement of that data.
Chadd Wheat: It's a near time as opposed to real time data.
Angel Leon: It's interesting because it could work like its own disaster recovery because you could have data stored on the premises, but then you could also have part of it in the cloud, which should help you like you were saying, Jim, spin it up quickly if you need it. Whereas if something were to happen in that physical device, then you have that backup, if you will, or you have more data in the cloud. That's very interesting. Gentlemen, I want to end this by asking you a question to see if you guys have any disaster recovery stories that you'd like to share with us, something that you can share with us about maybe a situation where you ran into your career, where you had to do something with that.
Chadd Wheat: You always give us the really good questions.
Jim Timberman: That's true. Trying to think of a good example.
Chadd Wheat: Let me think of a few. Some of the major ones. I don't know, Jim, if you have any personal experience, but it's when somebody's website gets hacked and they have to restore quickly, especially somebody who's making money off the web. One example, Angel, and I'm sure probably everybody listening to this can relate, is localized power outages, where they're doing construction on infrastructure near your business park or site and without warning or advance notice, all of a sudden your building goes dead. For cloud people, that doesn't affect their computing power on the cloud, except it does affect your local workforce. Now in situations like that, like for example, our company Moser Consulting, it's pretty easy to get over. Everybody goes home and works remotely and everything is pretty much how they left it. But that's probably a scenario that most users are familiar with where they're working along and then suddenly at 3:30 in the afternoon, bam, lights are off, everything's off.
Angel Leon: I think that's a very common scenario, Chadd. I agree.
Jim Timberman: To Chadd's point too, we just run into a lot of recovery situations where, in the environment, a disk has gone bad. A lot of times we'll have to build a server and we'll need that. That kind of becomes the plan. As Chadd had mentioned, we walk through those scenarios, those tabletop exercises where you're like, hey, if this happens, what do you do? And we have that happen a lot more than we'd like to just because hardware, it's like anything, some parts fail, things break, it's IT. Sometimes things are out of our control. As much as we try to track the breadcrumbs to prevent it from happening, it will happen. We don't take it lightly. And we do take backups very seriously in that we monitor those on a daily basis, throughout the day, just to ensure that they're running and they happen so if they fail, we need to make sure we're on top of it to make sure it's recovered and get it going so we don't lose any of that data. And we do take disaster recovery very seriously because our responsibility to our clients is to ensure they're up and running, whether that would be just in the day to day operations or in a major catastrophe or in situations like we have now with COVID. They need to be productive and that's our responsibility.
Chadd Wheat: Yeah. And I think it all comes down, Angel, this whole conversation, to a quote most often attributed to Benjamin Franklin, "If you fail to plan, you're planning to fail."
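Editor's note: the daily backup monitoring Jim mentions can be pictured as a freshness check on each job's last successful run. The sketch below is an assumption about how such a check might look, not a description of Moser's tooling; the job names, timestamps, and the 24 hour threshold are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def stale_backups(
    last_success: Dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> List[str]:
    """Return names of jobs that never succeeded or last succeeded too long ago."""
    return sorted(
        name
        for name, finished in last_success.items()
        if finished is None or now - finished > max_age
    )


# Hypothetical job history: None means no successful run is on record.
jobs = {
    "file-server": datetime(2021, 1, 2, 1, 0),
    "mail": datetime(2020, 12, 31, 1, 0),
    "erp": None,
}

for name in stale_backups(jobs, now=datetime(2021, 1, 2, 12, 0)):
    print(f"backup needs attention: {name}")
```

Running a check like this several times a day surfaces a failed job while the previous good backup is still recent, which is what keeps the potential data loss small.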
Angel Leon: That is a good segue to the ending of this episode, Chadd and Jim. Thank you very much for another wonderful episode today where we learned a lot about disaster recovery and backups for your organization.
Chadd Wheat: You're very welcome.
Angel Leon: Gentlemen, Thank you once again.
Jim Timberman: All right. Thank you.
Angel Leon: And that's it for this week's edition of ASCII Anything, presented by Moser Consulting. We hope you enjoyed this conversation about disaster recovery and we'd love if you would join us next week when we continue to dive deeper with our resident experts and what they're currently working on. Until then, so long, everybody.