Our Partners close more business.

Use these powerful resources to win more business, faster, with less effort.  
Call 877-411-2220 x121 for personal support with any opportunity.


How To Use This Tool:  

To find answers to common RFP and RFI questions, select a tag or search for terms like "security" or "performance". Common questions and answers are grouped together in one record. Follow the tag links to refine your search. Supporting downloads and documentation are available below.

Please log in to obtain download access to additional supporting documentation. Registered users can also contribute to the database. You can request access by Contacting Us.

© Omegabit LLC, 2023


Content tagged "dr".

Backups

Q:

When do you back up?

How often do you back up?

Do you conduct backups of user-level information, system-level information, and information system documentation (including security-related documentation), and protect the confidentiality, integrity, and availability of backup information at storage locations?

Are the backup and restore processes and procedures reviewed, and backed-up information validated, at least annually?

What is the backup schedule and retention on these systems?

If there is an issue, what is the process for a restore?

Can you elaborate on the offsite archive?

RPO/RTO expectations and testing schedule?


A:

 

In the case of most failures, Omegabit provides full redundancy and fault tolerance at the primary host facility as a function of its private cloud infrastructure. Full disaster recovery is initiated only in the event of a catastrophic facilities failure.

In the event of a catastrophic failure of the physical plant, Client services will fail over to one of our secondary DR locations. Omegabit can backhaul traffic between private NOCs/POPs or route directly from the DR location and, depending on the nature of the failure, can activate BGP to re-route public IPs.

A DR process TOC is available for review on request (much of it is redacted for security).

The standard SLA terms apply. The formal commitments for critical faults are summarized below (see the SLA for more details):

An initial response time within 2 hours is promised for Severity I issues, and 4 hours for Severity II issues, whether notification comes from an automated alert or customer contact.
(Actual response time is typically <15 minutes for critical issues.)

For non-catastrophic events (e.g., equipment or primary storage failure), an RTO not to exceed 12 hours is promised, with an RPO not to exceed 24 hours[1].

[1] Assumes worst case; in a typical failure scenario, our redundant cloud infrastructure can tolerate a failure (e.g., a server node, switch path, or disk failure) transparently or with minor administrative intervention, with recovery in <1 hr and no loss of data.

For catastrophic events requiring comprehensive relocation of service to a separate hosting facility, an RTO not to exceed 48 hours is promised, with an RPO not to exceed two weeks (15 days)[2].
[2] Special terms and retention policies are available on request. Assumes a worst-case disaster recovery scenario from offsite archives; the RPO in this "catastrophic" scenario is more typically <48 hrs, from near-line backup.
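
To make these targets concrete, here is a minimal Python sketch (illustrative only, not part of any Omegabit tooling) that checks a measured recovery against the summarized figures above: RTO of 12 hours / RPO of 24 hours for non-catastrophic events, and RTO of 48 hours / RPO of 15 days for catastrophic ones.

from datetime import timedelta

# Hypothetical sketch: the thresholds are transcribed from the summarized SLA
# terms above; the tier names and function are illustrative only.
SLA_TARGETS = {
    "non_catastrophic": {"rto": timedelta(hours=12), "rpo": timedelta(hours=24)},
    "catastrophic": {"rto": timedelta(hours=48), "rpo": timedelta(days=15)},
}

def meets_sla(event_type, measured_rto, measured_rpo):
    """Return True if a recovery met the summarized RTO/RPO promises."""
    target = SLA_TARGETS[event_type]
    return measured_rto <= target["rto"] and measured_rpo <= target["rpo"]

# Typical node failure: recovered in under an hour with no data loss.
print(meets_sla("non_catastrophic", timedelta(minutes=45), timedelta(0)))  # True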

Please see the supplied copy of the SOW and the sections on backups and Support Ticket and Escalation Procedures for more details. 

This is what is promised out of the box (OOTB). Omegabit can accommodate additional requirements around these expectations as a special request, including hot DR failover, but substantial additional costs will apply for both DXP licensing and infrastructure.

What is offered OOTB is typically the best balance of cost and protection, practically speaking.  If you require more, we'll support it.



Summary:

Backup snapshots of the entire VM stack are performed every 2 hours, and offsite archiving of those backups to a second remote physical location is continuous. Retention is 48 hours for the 2-hour snapshots, 30 days for dailies, and 16 weeks for weeklies. We can accommodate longer retention if necessary. Some of these retention policies affect RPO. For PCI, you may want logs retained for up to 1 year; that can be accomplished through application design or by relying on our backups. We recommend using both strategies, depending on your reporting needs.


Backups should be considered for disaster recovery purposes only. Our retention policy is variable and based upon data volume. Depending upon the environment, rollbacks to the previous day, several days, or several weeks are available, but snapshots become sparser between periods, so a specific point-in-time recovery may not be possible. We are typically able to restore back up to several weeks, depending upon the total size of your store.
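
As an illustration of why an exact point-in-time restore may not be possible, the following Python sketch models the retention schedule summarized above (2-hour snapshots kept for 48 hours, dailies for 30 days, weeklies for 16 weeks) and finds the newest retained snapshot at or before a requested restore point. The tier layout and snapshot alignment are simplifying assumptions for illustration, not a description of the actual Nimble/VMware implementation.

from datetime import datetime, timedelta

# Assumed tier model: each tier keeps snapshots taken at a fixed interval
# for a fixed window, aligned to the current time.
TIERS = [
    {"interval": timedelta(hours=2), "kept_for": timedelta(hours=48)},
    {"interval": timedelta(days=1),  "kept_for": timedelta(days=30)},
    {"interval": timedelta(weeks=1), "kept_for": timedelta(weeks=16)},
]

def nearest_retained_snapshot(now, requested):
    """Newest retained snapshot taken at or before the requested restore point."""
    age = now - requested
    for tier in TIERS:
        # Whole intervals needed to reach or pass the requested time.
        k = age // tier["interval"]
        if k * tier["interval"] < age:
            k += 1
        snapshot_age = k * tier["interval"]
        if snapshot_age <= tier["kept_for"]:
            return now - snapshot_age
    return None  # Older than the longest retention window.

now = datetime(2023, 6, 1, 12, 0)
# A restore point 3 days back falls outside the 2-hour tier, so the closest
# retained snapshot is the daily taken just before it.
print(nearest_retained_snapshot(now, now - timedelta(days=3, hours=5)))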

Backups are automated using a combination of VMware and Nimble Storage technologies.

Backups are comprehensive and cover all aspects of internal and Client operations, using a VM snapshot-based approach for rapid, transparent backup and recovery.

Backup and recovery procedures are exercised several times per month as a function of normal operations, and to support the snapshots and rollbacks that customers use as part of their normal development activities.

 

If there is an issue, what is the process for a restore?

We can restore any VM snapshot on file in a matter of minutes (it usually takes about 20 minutes to mount the backup partition and re-fire the image). Recovering items inside that image is a matter of logging in and parsing the necessary data (files, database backups, etc.). Typically, if there is a need to restore, we will recover all the dependent VM nodes and simply restart them. In most cases, a recovery is performed at the customer's request (e.g., they accidentally stepped on something), so we will do a whole or partial restore based on that need. The restore process is always as "informed" as possible, so that we are not arbitrarily rolling you back to some point in time without understanding the goals. In some cases, a partial restore is sufficient; we will help inform this decision based on your goals.
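
The following is a hypothetical outline of that flow in Python. Every helper here is a stand-in stub with an invented name, not a real VMware or Nimble Storage API; it only illustrates the whole-versus-partial restore decision described above.

# Illustrative stubs only; these do not correspond to any real tooling.
def mount_backup_partition(snapshot_id):
    print(f"mounting backup partition for {snapshot_id} (~20 minutes in practice)")
    return f"/mnt/restore/{snapshot_id}"

def restart_vm_nodes(nodes, partition):
    for node in nodes:
        print(f"re-firing {node} from {partition}")

def extract_items(partition, items):
    for item in items:
        print(f"copying {item} out of {partition}")

def restore(snapshot_id, dependent_nodes, scope="whole", items=None):
    """Whole restore restarts all dependent VM nodes from the snapshot image;
    partial restore mounts the image and copies out specific files or DB dumps."""
    partition = mount_backup_partition(snapshot_id)
    if scope == "whole":
        restart_vm_nodes(dependent_nodes, partition)
    else:
        extract_items(partition, items or [])

# Example: a customer accidentally deleted content, so a partial restore of the
# document library and a database dump is enough.
restore("snap-2023-06-01T10", ["web01", "db01"], scope="partial",
        items=["document_library/", "lportal.sql.gz"])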

 

Can you elaborate on the offsite archive?

Backups and archives are performed at the SAN level (Nimble Storage + VMware APIs). Backups are cloned as archives to a redundant SAN at our Los Angeles location at One Wilshire/CoreSite, where we operate secondary/backup and DR infrastructure.
 

What is the DR process you have in place?

I've attached a TOC for our DR process (much of it is redacted for security). In the case of most failures, we have full redundancy at the primary host facility. In the case of a catastrophic failure of the physical plant, we would fail over to one of our secondary locations; in your case, One Wilshire in Los Angeles. We have the ability to backhaul traffic or route directly from LA and, depending on the nature of the failure, can activate BGP to re-route public IPs.
 

RPO/RTO expectations and testing schedule?

Recovery testing is scheduled quarterly but exercised much more frequently as a function of supporting our production customers.

The short answer is that we can recover from most failures transparently, or at least automatically by failover in our cloud.  All network and cloud infrastructure is fully redundant.  

Your Liferay portal may or may not be redundant depending on the setup; it would have to be clustered. Presently, we are not discussing an application cluster, so the vhost nodes themselves are a single point of failure.

If a physical server fails, the vhost will automatically restart on another server and rejoin service, typically within minutes. System failures that require intervention may take longer to resolve, but we are typically responding within 15 minutes.

If a network layer were to fail, the failover is usually transparent due to redundancy.

Practically speaking, our reaction is very fast and we will respond aggressively to any interruption or degradation in service, at any hour.  
 


