Introduction
These articles describe our hosting infrastructure and services. You'll also find information about our support SLAs, planned maintenance, deployments, and disaster recovery options.
Environment and Infrastructure
Environment Names
We use standard naming conventions across all of our hosting environments.
Environment | Description |
---|---|
Local | A developer's personal workspace, running on their own machine. It may connect to a shared development environment database, iCM or API Server |
Development | A GOSS-hosted environment that developers can use for new functionality and fixes. This environment may be unstable and is likely to change regularly. This is a place for active development and is not suitable for formal testing |
Test | A stable environment used by everyone to test the latest development work. We have dedicated test environments so work can be tested while other work goes on in the development environment |
Pre-Production | A replica of the production environment used as a staging area. This environment should be kept in sync with the production environment and serves as a trial run for deployments. It may also be used to replicate support issues found in the production environment |
Production | The live, public-facing website/intranet |
DR | A Disaster Recovery environment |
Infrastructure
The GOSS Cloud Services infrastructure is powered by various public cloud providers including Amazon Web Services and Google Cloud Platform (both in the London regions), enabling us to deliver a cloud-agnostic, high performance, high security, and most of all, highly reliable platform for the delivery of client services. All data is stored, processed and managed in the UK and we are a UK-based service provider.
The infrastructure allows us to seamlessly increase the resources for a particular service or expand the resilience and capacity of a service by adding more virtual machines to the environment. Tenants of the platform are fully segregated using logical security controls, dedicated private networks and dedicated resource reservations.
Please note that additional storage can be purchased if required.
Scaling
Services are scaled to meet client transactional volumetric requirements and anticipated growth, in line with the suggested fair usage policy, as detailed in the Service Definition Document. If volumes exceed fair usage, the service can be scaled up to cope with additional demand by moving to the next appropriate service level. By using analysis tools, each individual server and service is carefully calibrated to achieve optimum efficiency and performance.
Availability
We offer two SLAs, covering your production environments:
- 99.99% for network availability
- 99.95% for website availability
Non-production environments are not covered by the above SLAs. We offer service credits should availability fail to meet these SLAs - refer to your Service Terms and Conditions document.
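To put these percentages in context, the following sketch converts an availability target into the downtime it permits over a period. The 30-day month and 365-day year are assumptions made purely for illustration; how availability is actually measured, and over what period, is defined in your Service Terms and Conditions.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int) -> float:
    """Return the downtime (in minutes) permitted by an availability target."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_percent / 100)


# Illustrative periods only: a 30-day month and a 365-day year.
for label, target in (("network", 99.99), ("website", 99.95)):
    monthly = allowed_downtime_minutes(target, 30)
    yearly = allowed_downtime_minutes(target, 365)
    print(f"{label}: {target}% allows about {monthly:.1f} min/month, {yearly:.1f} min/year")
```

Under these assumptions, 99.99% works out to roughly 4.3 minutes per 30-day month and 99.95% to roughly 21.6 minutes.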
Energy Efficient Hosting
To reduce the environmental impact of our hosting infrastructure we have implemented policies that will power-down servers when they are not being used - primarily overnight. The following table outlines our standard schedules.
You can request a change to the times below by contacting a member of the support team.
Environment | Available | Notes |
---|---|---|
Production | Always on | We do not power-down production environments. These servers will be available as per our published SLAs |
Pre-production | 07:00 - 19:00 weekdays | The pre-production staging area is available during extended business hours |
Test | 07:00 - 19:00 weekdays | Test servers are available during extended business hours |
Development | On demand - 19:00 | Development servers are, by default, powered-down. They are enabled during active project work, during the upgrade/patching process, and while resolving support issues. Development servers can be enabled with the relevant permission levels via the MyGOSS platform as and when they are needed. Alternatively, a GOSS support technician, developer, or member of the hosting team can enable the development servers on request. |
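As a rough illustration of the standard schedules in the table, the sketch below checks whether an environment would be powered on at a given moment. The function name and the choice to treat 19:00 as the first powered-down minute are assumptions. It does not model custom schedules agreed with the support team, or the temporary power-on during backup windows.

```python
from datetime import datetime, time

# Standard schedules from the table above. Development is omitted because it
# is powered up on demand rather than on a fixed schedule.
WEEKDAY_START = time(7, 0)
WEEKDAY_END = time(19, 0)
SCHEDULED_ENVIRONMENTS = {"pre-production", "test"}


def is_scheduled_on(environment: str, moment: datetime) -> bool:
    """Return True if the environment is powered on under the standard schedule."""
    env = environment.lower()
    if env == "production":
        return True  # production is never powered down
    if env in SCHEDULED_ENVIRONMENTS:
        is_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6
        return is_weekday and WEEKDAY_START <= moment.time() < WEEKDAY_END
    raise ValueError(f"No fixed schedule for environment: {environment}")
```

For example, `is_scheduled_on("Test", datetime(2024, 1, 8, 9, 0))` checks a Monday morning and returns `True`, while the same call for a Saturday returns `False`.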
PCI DSS Compliance
Please note that we are not currently assessed against the Payment Card Industry Data Security Standard. All payment actions are performed via third party hosted payment providers. Form payment fields redirect users to those payment providers and no payment details (card number etc) are entered into or stored by our platform or hosting infrastructure.
Security
All of our internet-facing infrastructure has been assessed and certified under the Cyber Essentials Plus scheme developed by the UK National Cyber Security Centre.
Our information security management systems are certified as meeting the ISO 27001 information security standard.
We carry out penetration testing on an annual basis and encourage clients to perform their own as required.
DDoS Protection
We make use of AWS Shield Standard and rate limiting features to mitigate DDoS attacks.
Backup & Restore
Overview
The data for Production environments is backed up on a daily basis (see Disaster Recovery for enhanced options). Backups from other environments are made twice a week. Backups are encrypted and stored locally in the cloud provider's native object store (such as AWS S3).
Data backed up and archived by our own backup jobs includes, but is not limited to, the databases, file repositories, configuration data, SSL certificates and other files or data required to rebuild the environment from scratch in the case of a disaster.
All backups are retained for a maximum of 30 days.
Backup Window
The scheduled window for backups is between midnight and 6am. In some cases database backups may be scheduled in advance of the normal backup window, between 6pm and midnight.
Environments not running are temporarily switched on during their backup window in order for the backup to complete. This means that, regardless of the power schedule of an environment, its scheduled backups are still taken.
Impact & Monitoring
We aim to keep the impact of backup jobs to a minimum. However, due to the nature of large file transfers and other operations, some impact on end user response times is to be expected during the backup window. We do not expect any downtime or impact on functionality whilst backups are taken.
The status of backup jobs is monitored, and failures are investigated during working days as part of a daily checklist. We may re-run backup jobs during the daytime if the impact is deemed to be minimal.
Production Environments
In addition to the above, backups for Production environments are also replicated to another cloud provider as an offsite replica. For servers that are part of Production environments, native cloud provider full disk snapshots (for example AWS EBS snapshots) are taken as well.
Other
In addition to our normal backups as described above, web server access logs are stored locally for 90 days.
Restores
Disaster Recovery
In the unlikely event of a disaster where a Production environment is unavailable, backups will be used to recover the environment at a secondary cloud provider.
For more information see Disaster Recovery.
Environment Restore
GOSS can restore the data from an environment back to the same environment, or other environments in the same sequence (for example restore Production backup to the Development environment). During a restore all the data is restored, however some configuration may be different from the source environment:
- IP restriction configuration will be backed up and restored from the original environment data. This ensures that sites that should be restricted are still restricted after a restore has taken place
- Subsite URLs and Aliases will be backed up and restored from the original environment data. This ensures that sites can still be accessed under their original URLs after a restore has taken place
- SSL certificates will not be restored from the original environment
A restore requires downtime - while a restore takes place to an environment it will be inaccessible.
The process for this type of restore is automated but a checklist is followed afterwards to ensure everything completed successfully.
Individual Files
File level restores can be completed by GOSS when needed. For example, restoring individual media items may be possible without restoring all of the environment data. The process for this is bespoke and will be considered per request.
Monitoring & Alerts
Server Performance Metrics
Agent based monitoring is used for measuring utilisation of various resources at a server level, for example CPU, memory, disk and network usage. The agents are also used to probe the local applications and ensure 3rd party dependencies are reachable, such as dependencies accessed via a site to site VPN.
Important metrics such as CPU load statistics are collected every 1 minute. Thresholds for alerting are configured for most metrics.
Currently Zabbix is used for this purpose. Agents are deployed automatically and standardised templates are attached to hosts automatically.
Application Performance Metrics
We have started to introduce application level monitoring by integrating OpenMetrics libraries into our own applications and releases. This includes, but is not limited to, HTTP response code metrics, request latency metrics and internal application statistics. This provides the flexibility to narrow down performance issues to specific areas of our stack, or specific templates in the website.
All metrics are collected and alerting rules are evaluated every 15 seconds. Currently Prometheus and Thanos are used for the server side, and language specific libraries are implemented client side.
Cloud Native Metrics
In each cloud provider we utilise, we also configure monitoring and alerts in their native monitoring service for aspects of the platform we cannot monitor ourselves. In the case of AWS, CloudWatch is used.
External Monitoring
In addition to our own monitoring solutions we also implement external probes using StatusCake to provide independent assurance of availability. External probes are run every 1 minute.
Alerting
All environments generate alerts for configured thresholds around capacity, latency and availability. Some alerts will trigger automated restarts of services in an attempt to automatically resolve issues without intervention.
For Production environments alerts are escalated automatically. After 10 minutes an SMS is sent to the on call engineer, with reminders sent every 5 minutes if an alert has not been resolved or acknowledged. An automated phone call and escalation to a backup engineer will also happen after 25 minutes if the alert still hasn't been resolved or acknowledged.
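The escalation timings above can be sketched as a simple timeline. This is an illustration only, not our alerting configuration: the function name and labels are hypothetical, and it assumes that SMS reminders continue at five minute intervals after the phone call at 25 minutes.

```python
def escalation_events(minutes_unacknowledged: int) -> list[tuple[int, str]]:
    """List the escalation steps reached for a Production alert that has been
    unresolved and unacknowledged for the given number of minutes."""
    events = []
    # First SMS at 10 minutes, then a reminder every 5 minutes.
    for minute in range(10, minutes_unacknowledged + 1, 5):
        label = "SMS to on-call engineer" if minute == 10 else "SMS reminder"
        events.append((minute, label))
    # Phone call and escalation to the backup engineer at 25 minutes.
    if minutes_unacknowledged >= 25:
        events.append((25, "Phone call and escalation to backup engineer"))
    return sorted(events)


for minute, step in escalation_events(30):
    print(f"{minute:>3} min: {step}")
```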
Client Alert and Metrics Access
As it currently stands we do not provide direct access to any of the metric data we collect, except for the external StatusCake availability dashboard.
All alerting is internal to us. We will inform you of any alerts that caused a significant impact on availability.
Scheduled Maintenance
Overview
During a weekly scheduled maintenance window various changes are automatically made to environments, including but not limited to:
- Operating system updates and patches are applied, including service packs where relevant
- Infrastructure level application updates are applied, for example monitoring agents
- GOSS Platform components and dependencies are updated, for example iCM, API Server workers, Java, etc
- The GOSS Platform standardised site is updated, which includes the latest releases of all site/framework components and products (note that not all clients are on the standardised site pipeline)
- Configuration changes such as SSL hardening will be applied
During the scheduled maintenance, applications, services and hosts may be restarted which can result in a brief disruption to the running services depending on the configuration of your hosted environment and the nature of changes applied.
Web applications are updated during a dedicated Monday morning maintenance window. These are handled separately from wider application maintenance tasks, to prevent unexpected maintenance overlap issues. The web applications are responsible for the functional display of your subsites/content to the internet. Things like the iCM application and the various services supporting it are managed by the global scheduled maintenance patterns already described.
Scheduling
Scheduled maintenance jobs run between 00:00 and 05:59 on the following days of the week.
Environment | Day |
---|---|
Development | Friday |
Test | Thursday |
Pre-Production | Wednesday |
Production | Tuesday |
DR | Tuesday |
Web Applications - All environments | Monday |
Staging
All application versions and operating system updates are staged between environments to ensure adequate testing happens before updating Production.
The schedule above ensures that there will always be six days between a given job being performed and it progressing to the next environment. For example, a fix deployed to a development environment on a Friday won't be promoted to the test environment until the following Thursday.
This results in a total lead time between development and production of 18 days, giving ample time for testing. If an issue is found with any application or update the promotion of the versions to the next environment will be delayed whilst the issue is being investigated.
Web applications follow a seven day rolling schedule between environments. For example a new Web Application release deployed to a Development environment on a Monday morning won't be promoted to the Test environment until the following Monday. By default, this results in an average lead time between development and production of 21 days, however this can be lowered to a minimum of 17 days if a release is required throughout the week.
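The lead times quoted above follow directly from the weekday schedule. As a worked example, the sketch below walks a change from a Friday Development run through to Production. The helper names are hypothetical, and it assumes no promotion is delayed by an issue found during testing.

```python
from datetime import date, timedelta

# Weekday numbers follow Python's convention: Monday is 0, Sunday is 6.
MAINTENANCE_DAY = {
    "Development": 4,     # Friday
    "Test": 3,            # Thursday
    "Pre-Production": 2,  # Wednesday
    "Production": 1,      # Tuesday
}


def next_on_weekday(after: date, weekday: int) -> date:
    """Return the first date strictly after `after` that falls on `weekday`."""
    days_ahead = (weekday - after.weekday() - 1) % 7 + 1
    return after + timedelta(days=days_ahead)


def promotion_dates(development_run: date) -> dict[str, date]:
    """Walk a change from Development through to Production."""
    dates = {"Development": development_run}
    current = development_run
    for environment in ("Test", "Pre-Production", "Production"):
        current = next_on_weekday(current, MAINTENANCE_DAY[environment])
        dates[environment] = current
    return dates


schedule = promotion_dates(date(2024, 1, 5))  # an arbitrary Friday
lead_time = (schedule["Production"] - schedule["Development"]).days
print(lead_time)  # 18
```

The same helper gives the web application figure: each Monday-to-Monday promotion is seven days, so three promotions total 21 days.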
Weekly and Daily Checks
Every working day an engineer is assigned to run through daily checks, which include but are not limited to:
- Checking scheduled jobs such as backups, maintenance and automated DR jobs for any failures
- Investigating low priority informational alerts
View Scheduling Information
You can see an overview of product versions, deployment histories, and scheduled deployments by logging into MyGOSS and following the link to "My Platform".
Logging & Debugging
Overview
We use several mechanisms to log events and debug our applications:
- Log shipping and centralised log aggregation is used to index and process log files
- Distributed tracing is used to collect request information across the application stack
Logging
Log files written locally to the servers are shipped to a central log aggregation cluster. This includes but is not limited to:
- Web server access and error logs
- Framework, iCM, API Server and other application logs
- Mail relay logs
- Windows events
We currently use Filebeat and Winlogbeat for this. Within Elasticsearch, log files are indexed and split into specific indexes depending on the application, log file and permission requirements.
Select GOSS staff have access to search through and visualise the data through Kibana to troubleshoot issues.
Distributed Tracing
Our application stack implements Jaeger Tracing libraries to provide an implementation of OpenTracing. This helps us trace requests through our stack and narrow down where errors or latency are being introduced.
Select GOSS staff have access to search through the tracing data.
Support Operations
Platform Support
Day to day support is managed by technicians on our support desk, overseen by the Service Desk Manager.
Hours
08:00 - 18:00, Monday to Friday, excluding bank holidays in England
Support Channels
We use Jira for ongoing support management. See Raising Support Tickets in Jira for more information about raising support tickets. Our support team receive immediate alerts for all new support tickets raised which are quickly triaged for impact and urgency.
You can also access support by emailing ticket@support.gossinteractive.com. For more general advice and guidance, our Slack Community can be a useful place to ask.
For critical incidents during support hours, you can contact the support team on 0844 880 3637.
Support Service Levels and Priorities
Issue resolution will be in line with the targets below. Severity levels are assigned by the support desk.
Severity | Priority | Target Response | Feedback Frequency | Temporary Fix | Permanent Fix |
---|---|---|---|---|---|
Critical | P1 | 2 working hours | Every 2 working hours | 24 hours | 5 working days |
Major | P2 | 5 working hours | Daily | 3 working days | 10 working days |
Minor | P3 | 1 working day | Weekly | 10 working days | 30 working days |
Priority 1 - Critical
Extremely urgent issues, for example complete loss of system or something that prevents users from effectively using a significant element of the system.
Priority 2 - Major
Issues that materially affect usage or prevent a majority of users from using the system effectively.
Priority 3 - Minor
Anything which affects part of the system but does not materially affect its operation and allows users to continue using the system. Low priority reported issues may, at GOSS's sole discretion, be categorised as non-support matters and addressed during appropriate product backlog development.
Service Exclusions
The following services may incur additional charges:
- Out-of-hours upgrade, patch and update management
- Out-of-hours deployment management
- Assistance to investigate or resolve non fault-related issues
- Consultancy
- Training
- Investigation of issues not attributable to GOSS, for example, misuse of the software/service, issues caused by 3rd party systems not managed by GOSS, etc
Please note that support is provided to, and support tickets can only be raised by, users who have received training on the GOSS Platform and products. Questions and advice around implementation and configuration are out of scope of our support contracts.
Non-Production Environments
Support service levels and priorities cover production environments. Non-live environments, for example development and test environments, are not covered by our SLAs. Any issues in these environments should be addressed as part of ongoing work.
Hosting Out of Hours Support
All production and DR environments are provided with 24/7/365 third line support by our network engineers. Pre-production, test and development environments are managed and maintained during GOSS business hours.
When you host with us we will provide you with out of hours contact details.
Releases and Deployments
Direct access to production, pre-production and DR servers is restricted to members of the GOSS hosting team. Deployments to these environments are performed via automated processes and/or release management tools.
Release Management
We use release management tools to control artifact releases into a hosted repository, which includes versioned releases of our products and web applications.
Applications are deployed using continuous integration mechanisms. Releases to pre-production and production environments are carried out in a phased and controlled manner via sequenced build pipelines. Our build pipelines are also responsible for applying the correct configuration for each environment, which is stored separately from released artifacts using an encrypted secrets management tool.
Releases that include "content", for example forms or workflow models created via the CMS, follow the same structured rollout through environments.
GOSS Project and Support Releases
These deployments are scheduled by either a GOSS project manager or support technician. Approval is required from you via the change control process before a deployment can be made.
Deployment schedules and frequency vary according to your service plan.
Client Code
We may allow your own developers access to the web application and repositories to carry out your own development work. Where such access is granted, your developers will have the ability to push snapshot builds to your development environment, and submit merge requests. Deployments beyond your development environment are subject to our standard change control process.
We reserve the right to charge for any time required to fix and resolve problems caused by deployments of your code.
GOSS Standard Change Control Process
The change control process covers any work that requires a code deployment or change to your development, test, pre-production, production and DR environments.
Project Work and Support Fixes
Changes are managed by a GOSS project manager or support technician as appropriate. Change control documents are completed by the relevant GOSS developer and will detail:
- The changes being deployed
- The reason for the changes and appropriate references
- The deployment process
- The testing process once the change has been made
- The rollback process should the change not work as expected
- The risk of the change being deployed
Once approved a date will be scheduled for the changes to be made. Changes are deployed between 09:00 and 15:30, Monday to Friday, for pre-production and DR environments, and between 09:00 and 15:30 Monday to Thursday for production environments.
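The deployment windows above can be expressed as a small check, for example when proposing a date in a change control document. This is a minimal sketch: the names are hypothetical, 15:30 is assumed to be inclusive, and bank holidays are not taken into account.

```python
from datetime import datetime, time

WINDOW_START = time(9, 0)
WINDOW_END = time(15, 30)

# Last permitted weekday per environment (Monday is 0): Friday is 4, Thursday is 3.
LAST_DEPLOY_WEEKDAY = {"pre-production": 4, "dr": 4, "production": 3}


def in_deployment_window(environment: str, moment: datetime) -> bool:
    """Return True if `moment` falls inside the environment's deployment window."""
    last_weekday = LAST_DEPLOY_WEEKDAY[environment.lower()]
    return (moment.weekday() <= last_weekday
            and WINDOW_START <= moment.time() <= WINDOW_END)
```

For instance, 10:00 on a Friday is inside the window for pre-production and DR environments but outside it for production.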
General Planned Maintenance
Planned maintenance change controls include:
- The date and time of the planned work
- Any potential downtime
- Any content freeze requirements
- The changes being deployed
- The reason for the changes
- The deployment process
- The testing process once the change has been made
- The rollback process should the change not work as expected
- The risk of the change being deployed
Where possible, we provide at least seven days' notice for any planned maintenance and, if there is a risk of an outage, will try to schedule the work outside of business hours to minimise impact.
Client Specific Maintenance
Maintenance tasks specific to a single client environment are managed in the same way as platform support fixes.
Emergency Work
Emergency and urgent maintenance work can be carried out without notice, at such times as we feel reasonable.
If emergency work or maintenance is to take place during normal support hours prior notice will be given by telephone or email.
If emergency work or maintenance is to take place outside of these hours notification will be given via telephone or email on the next working day.
Client-Hosted Servers
We are not responsible for the maintenance or management of any non-hosted servers. Support for your own development and test servers is excluded from the GOSS Digital Platform Support and Maintenance contract. Any requests for support, upgrades, patches and update deployments will be chargeable.
Complaints and Escalations
Please see www.gossinteractive.com/complaints for our complaints and escalation policy.
Disaster Recovery
You can pair any hosting service with any of our DR options. Please refer to your contract or managed service agreement for the level that applies.
DR Options
We offer the following DR packages. Note that these are the maximum times we set for responding to a disaster, and we will always work to restore services as quickly as possible.
Timescales and Objectives
RTO - Recovery Time Objective. This is the maximum length of time it will take to restore the service.
RPO - Recovery Point Objective. The maximum age of the data that will be restored (i.e. how frequently the service is backed up).
Name | RTO (hours) | RPO (hours) |
---|---|---|
Basic | Best Endeavours | 24 |
Standard | 10 | 24 |
Corporate | 6 | 12 |
Premium | 4 | 8 |
Always On | 2 | 4 |
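When comparing packages against your own recovery requirements, the table can be treated as a simple lookup. The sketch below assumes the figures are in hours (consistent with the nightly backup giving an RPO of 24) and represents the Basic package's "Best Endeavours" RTO as `None`, so it never satisfies a guaranteed RTO requirement. The function name is hypothetical.

```python
# Objectives taken from the table above, assumed to be in hours. Basic has no
# guaranteed RTO ("Best Endeavours"), represented here as None.
DR_PACKAGES = [
    ("Basic", None, 24),
    ("Standard", 10, 24),
    ("Corporate", 6, 12),
    ("Premium", 4, 8),
    ("Always On", 2, 4),
]


def packages_meeting(max_rto_hours: float, max_rpo_hours: float) -> list[str]:
    """Return names of packages whose RTO and RPO are within the given limits."""
    matches = []
    for name, rto, rpo in DR_PACKAGES:
        if rto is not None and rto <= max_rto_hours and rpo <= max_rpo_hours:
            matches.append(name)
    return matches


print(packages_meeting(6, 12))  # ['Corporate', 'Premium', 'Always On']
```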
Basic
The basic package is available to all of our hosting offerings. It uses a nightly backup which will be manually restored by one of our engineers. We do not offer a guaranteed RTO as other DR options will take priority.
Standard
As with the Basic option, services are recovered from the previous night's backup. As well as guaranteeing an RTO, this option runs a daily "verify" job which restores the backup to a dynamically generated environment, and performs automated tests of the restored services before closing down.
Corporate and Premium
These options offer different levels of replication to a dedicated DR environment. Unlike Basic and Standard, where the restored services are taken from the nightly backup, these options allow you to assess the level of recovery your services need and pick a suitable RTO and RPO.
Always On
As well as enhanced RTO and RPO, Always On allows you to access a read-only version of your DR hosting environment on request. This option offers the most robust defence against any disaster.
In the Event of a Disaster
Should DR services need to be invoked our support and hosting teams will keep you informed throughout the process and work to exceed our guaranteed objectives. To successfully resolve a disaster situation, additional input will be required from you or your network teams.
DNS
We do not manage your DNS. In the event of a disaster we will provide you with the new IP addresses of your various environments. It will be up to you to update your DNS records to point to the DR infrastructure.
We highly recommend you familiarise yourself with how and by whom your DNS is managed.
Depending on the situation, moving to a DR environment may mean a change of mail servers. This could mean that the addresses used to send email from your environments need to go through an address or domain verification process, as described in Verifying Sender Email Addresses.
Data Reporting Service
The Data Reporting Service (DRS) database is a read-only replica of your production database available specifically for reporting purposes. It exists to remove any possible impact on your production database that could arise from the queries run by third party services. It also allows you to have direct database access, which is not possible in your production environment.
Setup and Access
The DRS database is not part of our standard hosting provision. If you'd like to use the service, speak to your account manager.
Once set up your project manager will provide you with connection details and credentials of the SQL user. The database can be accessed over your usual production VPN connection - there is no access outside of the VPN.
The service provides read-only access to a complete copy of your production database.
Data Restoration Interval
By default your production database is replicated to your DRS database every 24 hours, soon after midnight. If you'd like to change the frequency, let us know.
Limitations and Support
This provision should not be deemed a business critical service. We will use best endeavours to ensure that data is no more than 24 hours old.
Support will be provided between the hours of 8:00 and 18:00 Monday to Friday excluding UK Bank Holidays and all tickets will be prioritised as "P3". There will be no availability SLA or service credit provision for this service.
As with all services potentially involving Personally Identifiable Information, you should take guidance from your Data Protection Officer in respect to undertaking a Data Privacy Impact Assessment (DPIA) as part of the procurement of this service.
Other than providing a copy of the database and access to it, we cannot provide support for the types of query you may want to run against it.