Our Partners close more business.

Use these powerful resources to win more business, faster, with less effort.  
Call 877-411-2220 x121 for personal support with any opportunity.

How To Use This Tool:  

To find answers to common RFP and RFI questions, select a tag or search for terms like "security" or "performance". Common questions and answers are grouped together in one record. Follow the tag links to refine your search. Supporting downloads and documentation are available below.

Please log in to obtain download access to additional supporting documentation. Registered users can also contribute to the database. You can request access by Contacting Us.

© Omegabit LLC, 2023

Content with tag disaster recovery.

Backups

Q:

When do you back up?

How often do you back up?

Do you conduct backups of user-level information, system-level information, and information system documentation, including security-related documentation, and protect the confidentiality, integrity, and availability of backup information at storage locations?

Are the backup and restore process and procedures reviewed, and backed up information validated at least annually?

What is the backup schedule and retention on these systems?

If there is an issue, what is the process for a restore?

Can you elaborate on the offsite archive?

RPO/RTO expectations and testing schedule?


A:

 

In the case of most failures Omegabit features full redundancy and fault tolerance at the primary host facility as a function of the private cloud infrastructure.  Full Disaster Recovery is only initiated in the event of a catastrophic facilities failure.

In the event of a catastrophic failure of the physical plant, Client services will fail over to one of our secondary DR locations. Omegabit can backhaul traffic between private NOCs/POPs or route directly from the DR location and, depending on the nature of the failure, can activate BGP to re-route public IPs.

A DR process TOC is available for review on request (much of it is redacted for security).

The standard SLA terms apply. The formal promises for critical faults are (summarized - see SLA for more details):

An initial response time within 2 hours is promised for Severity I issues, and 4 hours for Severity II issues, regardless of notification by automated alert or customer contact.
(actual response time is typically <15 minutes for critical issues)
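
As a rough illustration of the initial-response promises above, the sketch below computes the latest time by which a first response is promised for a given severity. The names `RESPONSE_WINDOWS` and `response_deadline` are hypothetical and not part of any Omegabit tooling; the SLA itself remains the authoritative source.

```python
from datetime import datetime, timedelta

# Initial response windows as summarized above (see the SLA for full terms).
RESPONSE_WINDOWS = {
    "I": timedelta(hours=2),   # Severity I
    "II": timedelta(hours=4),  # Severity II
}


def response_deadline(detected_at, severity):
    """Return the latest promised time of initial response for an issue.

    detected_at: when the issue was raised (automated alert or customer contact).
    severity: "I" or "II", matching the keys of RESPONSE_WINDOWS.
    """
    return detected_at + RESPONSE_WINDOWS[severity]


if __name__ == "__main__":
    raised = datetime(2023, 1, 1, 22, 0)
    print(response_deadline(raised, "I"))
```

The same lookup applies whether the issue was raised by an automated alert or by customer contact, since the promise above does not distinguish between the two.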

For non-catastrophic events, e.g. equipment or primary storage failure, an RTO not to exceed 12 hours is promised, with an RPO not to exceed 24 hours[1]. 

[1] Assumes worst-case; in a typical failure scenario, our redundant cloud infrastructure can tolerate a failure (e.g. a server node, switch path, or disk failure) transparently or with minor administrative intervention, and recover in <1hr with no loss of data.

For catastrophic events requiring comprehensive relocation of service to a separate hosting facility, an RTO not to exceed 48 hours is promised, with an RPO not to exceed 2 weeks[2] (15 days).
[2] Special terms and retention policies are available on request.  Assumes worst-case disaster recovery scenario from offsite archives; the RPO in this "catastrophic" scenario is more typically <48hrs, from near-line backup.
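
When answering an RFP, it can help to check a prospect's stated recovery requirements against these out-of-the-box promises. The following is a minimal sketch under stated assumptions: the names `SLA_PROMISES` and `meets_requirement` are hypothetical, and the catastrophic RPO is entered as 15 days, the larger of the two figures quoted above.

```python
from datetime import timedelta

# Worst-case promises summarized above; the SLA/SOW is authoritative.
SLA_PROMISES = {
    # Equipment or primary storage failure at the primary facility.
    "non_catastrophic": {"rto": timedelta(hours=12), "rpo": timedelta(hours=24)},
    # Full relocation to a separate hosting facility.
    # Assumption: "2 weeks (15 days)" is modeled as 15 days.
    "catastrophic": {"rto": timedelta(hours=48), "rpo": timedelta(days=15)},
}


def meets_requirement(event_class, required_rto, required_rpo):
    """Return True if the promised RTO and RPO are within the required limits."""
    promise = SLA_PROMISES[event_class]
    return promise["rto"] <= required_rto and promise["rpo"] <= required_rpo


if __name__ == "__main__":
    # A prospect asking for a 24-hour RTO and RPO after an equipment failure.
    print(meets_requirement("non_catastrophic", timedelta(hours=24), timedelta(hours=24)))
```

A `False` result does not mean the requirement cannot be supported, only that it falls outside the standard terms and would need to be handled as a special request.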

Please see the supplied copy of the SOW and the sections on backups and Support Ticket and Escalation Procedures for more details. 

This is what is promised OOTB.  Omegabit can accommodate any additional requirements around these expectations as a special request, including hot DR failover, but substantial additional costs will apply for both DXP licensing and infrastructure.

What is offered OOTB is typically the best balance of cost and protection, practically speaking.  If you require more, we'll support it.


Summary:

Backup snapshots of the entire VM stack are performed every 2 hours, and those backups are archived continuously to a second, remote physical location.  Retention is 48 hours for 2-hour snapshots, 30 days for dailies, and 16 weeks for weeklies.  We can accommodate longer retention if necessary.  Some of these retention policies impact RPO.  For PCI, you may want logs retained for up to 1 year; that can be accomplished through application design or by relying on our backups.  We recommend using both strategies, depending on your reporting needs.

Backups should be considered for disaster recovery purposes only.  Our retention policy is variable and based upon data volume.  Depending upon the environment, rollbacks to the previous day, several days, or weeks are available, but with sporadic snapshots between periods.  Therefore, a specific point-in-time recovery may not be possible.  We are typically able to restore backward up to several weeks, depending upon the total size of your store.
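
To make the effect of retention on restore granularity concrete, here is a minimal sketch that maps the age of a desired restore point to the snapshot interval still on file. It assumes the standard tiers quoted in the summary (2-hour snapshots for 48 hours, dailies for 30 days, weeklies for 16 weeks); actual retention varies with data volume, and the names `RETENTION_TIERS` and `restore_granularity` are hypothetical.

```python
from datetime import timedelta

# (snapshot interval, retention window), finest tier first.
# Assumption: the standard tiers from the summary; real retention may differ.
RETENTION_TIERS = [
    (timedelta(hours=2), timedelta(hours=48)),
    (timedelta(days=1), timedelta(days=30)),
    (timedelta(weeks=1), timedelta(weeks=16)),
]


def restore_granularity(age):
    """Return the finest snapshot interval retained for a restore point of this age.

    Returns None when the requested point is older than every retention window.
    """
    for interval, window in RETENTION_TIERS:
        if age <= window:
            return interval
    return None


if __name__ == "__main__":
    for age in (timedelta(hours=5), timedelta(days=10), timedelta(weeks=20)):
        print(age, "->", restore_granularity(age))
```

The further back a restore reaches, the coarser the available snapshots become, which is why an exact point-in-time recovery cannot be guaranteed.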

Backups are automated using a combination of VMware and Nimble Storage technologies.

Backups are comprehensive and cover all aspects of internal and Client operations, using a VM snapshot-based approach for rapid, transparent backup and recovery.

Backup and recovery procedures are exercised several times per month as a function of normal operations, and to support the snapshots and rollbacks that customers request as part of their normal development activities.

 

If there is an issue, what is the process for a restore?

We can restore any VM snapshot on file in a matter of minutes (it usually takes about 20 minutes to mount the backup partition and re-fire the image).  Recovering items inside that image is a matter of logging in and extracting the necessary data (files, DB backups, etc.).  Typically, if there is a need to restore, we will recover all the dependent VM nodes and simply restart them.  In most cases a recovery is performed at the customer's request (e.g., they accidentally stepped on something), so we'll do a whole or partial restore based on that need.  The restore process is always "informed" as best as possible, so that we are not arbitrarily rolling you back to some point in time without understanding the goals.  In some cases a partial restore is sufficient, and we will help to inform this decision based on your goals.

 

Can you elaborate on the offsite archive?

Backups and archives are performed at the SAN level (Nimble Storage + VMware APIs).  Backups are cloned as archives to a redundant SAN at our LA location at One Wilshire/CoreSite, where we operate secondary/backup and DR infrastructure.
 

What is the DR process you have in place?

I've attached a TOC for our DR process (much of it is redacted for security).  In the case of most failures, we have full redundancy at the primary host facility.  In the case of a catastrophic failure of the physical plant, we would fail over to one of our secondary locations; in your case, One Wilshire in Los Angeles.  We have the ability to backhaul traffic or route directly from LA and, depending on the nature of the failure, can activate BGP to re-route public IPs.
 

RPO/RTO expectations and testing schedule?

Recovery testing is scheduled quarterly but exercised much more frequently as a function of supporting our production customers.

The short answer is that we can recover from most failures transparently, or at least automatically by failover in our cloud.  All network and cloud infrastructure is fully redundant.  

Your Liferay portal may or may not be redundant depending on the setup; it would have to be clustered.  Presently, we are not discussing an application cluster, so the vhost nodes themselves are a single point of failure.

If a physical server fails, the vhost will automatically restart on another server and automatically rejoin service (minutes, typically).  System failures that require intervention may take longer to resolve, but we are typically responding within 15 minutes.  

If a network layer were to fail, it is usually transparent due to redundancy. 

Practically speaking, our reaction is very fast and we will respond aggressively to any interruption or degradation in service, at any hour.  
 




Backup Testing

Q:

Provide Evidence of last BC/DR test and results.

Do you have Backup and DR test and results?


A:

Passed with no successful exploits or exceptions; May 2017; details cannot be divulged due to its proprietary and sensitive nature.

Omegabit features a comprehensive and robust high-availability host infrastructure with redundancy and alternate-location disaster recovery capabilities, including multiple layers of data backup and archive. This includes daily local SAN and backup snapshots, and offsite archives. See SOW-SLA for standard terms, which can be adjusted to meet the specific needs of this implementation. The last tests (May 2017) passed with no successful exploits or exceptions; details cannot be divulged due to their proprietary and sensitive nature.




Disaster Recovery

Q:

Is there a plan for Incident Response?

Do you have a Disaster Recovery Document?

Do you have policy and procedures which document your business continuity (BC) and disaster recovery (DR)?

Do you have BC/DR plans that assure the continuity of service and products provided to meet client's RTO and/or RPO?

Are roles and responsibilities documented in the contingency plans?

Do you conduct business impact analysis at least annually?

Do you provide contingency training to your staff according to assigned roles and responsibilities at least annually?

Have you conducted BC/DR tests/exercises on this system with all appropriate parties in the last 12 months and revised the plans to address changes and problems encountered during implementation and testing?

Is the system included in your organization's business continuity and disaster recovery (BC/DR) plan?

In terms of crash and DR Omegabit offers multiple redundant layers of protection including but not limited to:

What type of business continuity and disaster recovery options are included as part of this solution? Is this part of the standard services?

How are the backup data stored?


A:

This is documented in Omegabit Internal Operations Wiki.

This is documented in Omegabit Disaster Recovery Handbook, Section 1.1 to 1.4 and Section 2.3

ref: Omegabit Disaster Recovery Plan TOC

Yes. Per agreed upon SLA. 

Yes. ref: Omegabit Disaster Recovery Plan TOC

Yes. ref: Omegabit Disaster Recovery Plan TOC

Yes. ref: Omegabit Disaster Recovery Plan TOC, Omegabit Operations Portal, and Training curriculums

Yes. The DR plan was recently exercised and updated in Q2 of 2017. A certified statement can be provided by executive management certifying this, provided the vetting proceeds to the next round.

ref: Omegabit Disaster Recovery Plan TOC

In the event that a high-availability portal configuration is required, redundant nodes of the HA configuration will be purposefully isolated to discrete server and backend infrastructure as a complement to that logical HA configuration, to the benefit of higher reliability and faster recovery under various logical/physical architecture failure scenarios.

Omegabit operates comprehensive SNMP and service level monitoring of all configured hosts and services.  Triggers are adjustable and set by default to detect failures as well as symptoms of imminent failure.  Monitor alerts are responded to by live personnel, 24x7x365, and acted upon according to severity, per the terms of our SLA.

The core physical host infrastructure is inherently HA in terms of disk arrays, storage and network paths, physical servers, switching, etc.  Omegabit operates a modern VMware-based infrastructure.  In the case of most physical failures, services are designed to continue transparently with no observable interruption to operations.  In the case of logical failures, the VM, JVM, and Liferay backend service configuration is proposed as an HA setup, to practical limits.  If a higher level of resilience is required than is proposed, we are able to accommodate that as additional scope.  Disaster Recovery (DR) is an inherent component of the regular day-to-day operations performed by Omegabit, as a core function of the hosting operations supplied to all tenants.

Omegabit offers multiple redundant layers of protection including but not limited to:

● Logical and physical redundancy at the VMWare, JVM, repository and other critical layers of the runtime environment stack

● Warm-spare redundant Liferay architecture (proposed)

● Server failover capability

● Rapid nearline backup recovery

● Comprehensive off-site DR for catastrophic failure

Backup snapshots of the entire VM stack are performed every 2 hours, and those backups are archived continuously to a second physical location.  Retention is 48 hours for 2-hour snapshots, 30 days for dailies, and 16 weeks for weeklies.  We can accommodate longer retention if necessary.  Some of these retention policies impact RPO.

For PCI, you may want logs retained for up to 1 year; that can be accomplished through application design or by relying on our backups.  We recommend using both strategies, depending on your reporting needs.

Backups should be considered for disaster recovery purposes only.  Our retention policy is variable and based upon data volume.  Depending upon the environment, rollbacks to the previous day, several days, or weeks are available, but with sporadic snapshots between periods.  Therefore, a specific point-in-time recovery may not be possible.  We are typically able to restore backward up to several weeks, depending upon the total size of your store.

 

Omegabit can provide additional backup and archival services to meet specific requirements on a needs basis.  Please contact your sales representative for more information.

 

Omegabit features a comprehensive alternate-site disaster recovery plan that includes regular off-site archives using Omegabit owned and managed equipment.  Backup to the public cloud (e.g. Amazon) is optional, but requires special arrangement and may not be compatible with some PII/HIPAA requirements.  Specific features for disaster recovery vary by tier of service; please see the SOW for complete details on RTO/RPO times and obligations.

 




Event and Incident Response

Q:

Is there a plan for Incident Response?

Do you employ an automated mechanism to integrate audit review, analysis, and reporting process to support incident response, continuous monitoring, contingency planning, and audit?


A:

This is documented in Omegabit Disaster Recovery Handbook, Section 1.1 to 1.4 and Section 2.3. ref: Omegabit Disaster Recovery Plan TOC

We consider any intrusion to be a "Severity I" class event, which, per our SLA, requires a response time of <2hrs.  In reality, our 24x7 operations staff will respond to any Severity I event the instant it is detected.  I've attached a boilerplate/generic copy of our SLA and am happy to produce a named/completed version for a given stack if of benefit (this one calls out "sticker" and a la carte pricing and is OK to share as-is).


In the event of any detected incident, including a known breach or ongoing security intrusion, performance degradation, major change in performance profile, etc., notice is provided to the customer as soon as practical, typically within minutes, provided we are not actively triaging and responding to the issue.  We consider the ownership of security-related issues to be a joint responsibility with the application sponsor because, in many cases, the risk may be specific to the customer's implementation of the portal, and we must be in communication with the owner to permanently resolve the issue, if not to triage and block it temporarily.  A 24hr point of contact with the customer is preferred for emergency incident notification, which can include email and telephone contact depending on severity and customer preference.

 

Yes, for specific audit log event analysis we use a combination of tools including but not limited to Splunk and Dynatrace SaaS; additional software subscription and fees apply for a dedicated implementation of reporting facilities for Client use and self-reporting.




Hosting Provider - Security Documentation

Q:

Provide security documentation for your proposed solution. This should include security diagrams and other documentation such as architecture, policies, procedures, and compliance with laws SSAE-16, HIPAA, SOX, FedRAMP, etc. Security patches and software upgrades should be current, and backup procedures for remote files and databases should be put in place. Third party software integration should be verified. Please attach the Data Center Security Guide, including but not limited to: • Physical, Admin and Technical Security Controls; • Data Breach Notification Procedures ; • Security Program; and • System Upgrade Policy.


A:

Omegabit facilities and operations are SOC-2 certified in direct cooperation with its facilities partner, Digital West Networks, Inc.  A copy of the primary facility's SOC-2 certification is included with this submission, and supplemental references for secondary operations are available on request.  Omegabit also features secondary Disaster Recovery and Point of Presence operations at facilities managed in partnership with Digital West and industry-leading suppliers Equinix and CoreSite, featuring the most modern and compliant plant and core operations available in support of application-specific certifications.  Omegabit directly owns and controls all infrastructure extending from the Internet drops (servers, firewalls, edge switching, storage, etc.) and relies on its partner facilities for high-availability cooling, power, and physical plant security, as well as emergency hands-on operations.

 

Omegabit follows extensive security protocols modeled on industry standards and best practices for PCI, FERPA, FedRAMP, and similar compliance. This includes emphasis on traditional IT and host infrastructure security for Internet providers, as well as specialized training relating to custom application designs and the implementation of Liferay, specifically. Many practices are modeled after requirements established from its broad base of customers operating sensitive applications for finance, healthcare, government, education, and similar purposes. Omegabit is able to support most any compliance requirement and typically will establish operational policies that are considerate of best practices and the specific requirements of the Customer.

 

A Table of Contents (TOC) outlining procedural content has been provided for reference; full content is obfuscated due to its proprietary and sensitive nature. The following outlines are provided: "Omegabit Disaster Recovery Plan TOC", "Omegabit IT Security TOC", and the "Omegabit Employee Handbook TOC". Collectively, these documents cover many of the issues identified in this list. Other items are covered by our Operational Wikis. A supplemental document titled "Omegabit Information Security Questionnaire" is also included, which addresses the most common questions concerning overall security practices, capabilities, and options.  Omegabit is also able to maintain custom policies and procedures for customers with special needs, e.g. PCI or similar compliance. A sample policy statement titled "Omegabit Operations Policy Guidelines and Recommendations - Redacted Generic" has been provided as an example of one maintained with a PCI-compliant tenant.




Risk Management - Plan & Documentation

Q:

Is there a formal and documented process for addressing identified risk (e.g. tracking risk ownership, action plans and milestones)?

Do you have an enterprise-wide risk management program that designates individuals to fulfill specific roles and responsibilities within the organizational risk management process?

Third-party Oversight or Risk Management Plan?

Are risk findings/issues tracked and reported, and are appropriate remediation actions taken in an appropriate amount of time on an ongoing basis?


A:

ref: Omegabit Internal Operations Wiki and Customer Environment Ticketed Request system, Omegabit Operations Portal, Omegabit IT Security Handbook

ref: Omegabit Disaster Recovery Plan TOC

Due to the nature of our business and services, Risk Management is an inherent part of our DR planning lifecycle and includes business factors including finance, infrastructure, personnel, liabilities, etc. A quarterly assessment of these risks is performed as part of our regular strategic planning lifecycle. This information is proprietary.

Liferay executes regular security assessments and publishes hotfixes and notifications concerning newly discovered threats within the Liferay framework. It is the responsibility of the Client or application sponsor to determine the applicability of these risks and to integrate published fixes into any custom-built software. As the runtime manager, Omegabit assumes responsibility to assist with the deployment of any/all compatible security-related patches or changes to the Liferay runtime and its supporting components (OS, DB, Web acceleration, etc.; the "stack", collectively), which are provided or approved for use by the application sponsor. As this relates to hosting and runtime operations, notifications are also provided concerning any relevant security or stability-related risk or action. This is addressed in the hosting SLA. Circumstances relating to security or other immediate threat are escalated and responded to with the highest internal priority.

Important Note: Most host providers will NOT monitor or respond proactively to risks at the OS level or inside the Liferay application container. This is a noteworthy and unique benefit of Omegabit Liferay Enterprise Portal Hosting services, which monitors and assumes responsibility for ALL layers of the application infrastructure and Liferay runtime, and maintains specific operational awareness and sensitivity to the purpose and compliance requirements of its Clients' hosted environments. Omegabit monitors, manages, and responds to all relevant threat conditions, malicious or otherwise, proactively, at all layers of the infrastructure on behalf of its Client tenants.



Posted on 8/8/21 4:25 AM.