Security Concept

Information Access

Our applications and services run in a multi-tenant, distributed environment. Rather than segregating each customer’s data onto a single server or cluster, all data (including our own) is distributed across a shared infrastructure composed of many homogeneous machines located in AWS data centers.

Our services store user data in a variety of distributed storage technologies: object stores for unstructured data, such as Amazon S3 (Simple Storage Service), and distributed databases for structured data, such as Amazon DynamoDB.

Our application infrastructure is service-oriented. Instead of a few monolithic applications, we operate many small services, often referred to as microservices. This requires that requests coming from other components are authenticated and authorized. Service-to-service authentication is based on an open standard (JSON Web Token, JWT). These authentication tokens are issued by an authentication infrastructure built into our production platform to broker authenticated channels between application services. In turn, trust between instances of this authentication broker is derived from X.509 host certificates that are issued to each service host by the AWS Certificate Manager (ACM) certificate authority. Service-to-infrastructure requests (such as a connection to our storage layer) are authenticated using the AWS Signature Version 4 signing process, which carries the identity of the service making the request all the way down to the infrastructure layer.
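
To illustrate the second mechanism, the sketch below shows how a request to the storage layer could be signed with AWS Signature Version 4 using botocore (the AWS SDKs apply this signing automatically); the region, table name, and key are illustrative placeholders, not our production configuration.

```python
# Minimal sketch of AWS Signature Version 4 signing for a service-to-infrastructure
# request (here: a DynamoDB GetItem call). Region, table, and key are placeholders.
import botocore.session
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

session = botocore.session.get_session()
credentials = session.get_credentials()  # e.g., the calling service's IAM role

request = AWSRequest(
    method="POST",
    url="https://dynamodb.eu-central-1.amazonaws.com/",
    data=b'{"TableName": "example-table", "Key": {"pk": {"S": "123"}}}',
    headers={
        "X-Amz-Target": "DynamoDB_20120810.GetItem",
        "Content-Type": "application/x-amz-json-1.0",
    },
)

# The signature embeds the caller's identity; the storage layer authorizes the
# request against that identity's IAM policy before returning any data.
SigV4Auth(credentials, "dynamodb", "eu-central-1").add_auth(request)
print(request.headers["Authorization"])
```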

For example, when an end-user visits a web application and has been authenticated, the user is in possession of a short-lived JWT that proves their identity and contains an identifier for the web application for which the token was issued (the audience in JWT terms). The front-end then uses the JWT to make an API request to an application back-end. The back-end authenticates this request and only processes it if the token’s audience identifies an authorized front-end application. If authorized, the application back-end makes an API call to a storage layer (such as DynamoDB) to retrieve the requested data. The storage layer again authenticates and authorizes the request, and only processes it if the requester (the service back-end) is authorized to access the data store in question.
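
As an illustration, the back-end’s token check might look like the following minimal sketch using the PyJWT library; the public key, audience, and issuer values are placeholders rather than our actual configuration.

```python
# Minimal sketch of back-end JWT validation with PyJWT; key, audience, and
# issuer values are illustrative placeholders, not our production configuration.
import jwt  # PyJWT

def authenticate_request(token: str, issuer_public_key: str) -> dict:
    try:
        # Signature, expiry, issuer, and audience are verified here. A token
        # issued for a different front-end (wrong audience) is rejected.
        return jwt.decode(
            token,
            issuer_public_key,
            algorithms=["RS256"],
            audience="web-frontend",            # the application this API serves
            issuer="https://auth.example.com",  # our token-issuing broker
        )
    except jwt.InvalidTokenError as exc:
        raise PermissionError("request not authenticated") from exc
```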

Administrative access by our engineers to production environments is similarly controlled. A centralized group and role management system is used to define and control engineers’ access to production services, using an extension of the above-mentioned security protocol. The engineer exchanges the authentication token issued by our corporate identity management system for short-lived, personal AWS credentials that are scoped down to least-privileged access. Issuance of personal credentials is in turn guarded by two-factor authentication.
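
The credential exchange can be pictured with the following sketch, assuming the corporate identity token is an OIDC token accepted by AWS STS; the role ARN, session name, and duration are placeholders.

```python
# Sketch of exchanging a corporate identity token for short-lived, scoped-down,
# personal AWS credentials via STS; role ARN and token source are placeholders.
import boto3

def personal_aws_credentials(oidc_token: str, engineer: str) -> dict:
    sts = boto3.client("sts")
    response = sts.assume_role_with_web_identity(
        RoleArn="arn:aws:iam::123456789012:role/engineer-least-privilege",
        RoleSessionName=engineer,     # ties every API call to an individual
        WebIdentityToken=oidc_token,  # issued only after two-factor authentication
        DurationSeconds=3600,         # credentials expire after one hour
    )
    return response["Credentials"]    # AccessKeyId, SecretAccessKey, SessionToken
```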

Our primary goal is to remove the human component as much as possible from daily operations. Most of our services rely on fully managed infrastructure components provided by AWS. We rely on AWS to operate and maintain the underlying infrastructure, such as servers and networking. If self-managed servers are required for the operation of a service, we follow strict rules for administrative access. Shell access to any server is only available through server-initiated AWS Systems Manager Session Manager (AWS SSM) sessions. Access to this service requires personal, short-lived security credentials. The main benefit is that there is no additional layer of credentials (such as SSH keys) to manage and audit. In addition, all ports for remote administration on the servers (such as port 22 for SSH) can be disabled, reducing the attack surface, and AWS SSM provides an audit trail that is independent of the logs produced by the server itself.
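
A session is normally opened through the AWS CLI or console, but the underlying API call looks roughly like the sketch below; the instance ID is a placeholder, and an interactive shell additionally requires the Session Manager plugin.

```python
# Sketch of opening a shell session through AWS Systems Manager Session Manager.
# The instance ID is a placeholder; in practice the AWS CLI drives this call.
import boto3

ssm = boto3.client("ssm")

# Requires the engineer's short-lived, personal credentials and the ssm:StartSession
# IAM permission; no SSH key and no open inbound port are involved.
session = ssm.start_session(Target="i-0123456789abcdef0")
print(session["SessionId"])  # each session is recorded in an independent audit trail
```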

Access Control

To secure our valuable data assets, we employ a number of authentication and authorization controls that are designed to protect against unauthorized access.

Authentication Controls

We require the use of a unique user ID for each employee. This account is used to identify each person’s activity in our infrastructure, including any access to employee or customer data. This unique account is used for every system. Upon hire, an employee is assigned the user ID by Human Resources and is granted a default set of privileges described below. At the end of a person’s employment, their account’s access is disabled from within the HR system.

Where passwords or passphrases are employed for authentication (e.g., signing in to workstations), systems enforce our password policies, including password expiration, restrictions on password reuse, and sufficient password strength.

We make widespread use of two-factor (2-step) authentication mechanisms, such as one-time password generators and dedicated devices that support the FIDO2 standard. Two-factor authentication is required for access to any infrastructure resource, in both production and development environments. We require third-party applications, such as version control systems, to be integrated with our corporate identity service, which allows us to enforce two-factor authentication with third-party software as well.

Authorization Controls

Access rights and levels are based on an employee’s job function and role, using the concepts of least privilege and need-to-know to match access privileges to defined responsibilities. Our employees are granted only a limited set of default permissions to access company resources, such as email and internal resources. Employees are granted access to certain additional resources based on their specific job function.

Requests for additional access follow a formal process that involves a request and an approval from a data or system owner. Approvals are managed by workflow tools that maintain audit records of all changes. These tools control both the modification of authorization settings and the approval process to ensure consistent application of the approval policies. An employee’s authorization settings are used to control access to all resources.

Accounting

Our policy is to log administrative access to every system and all data. We do not treat environments differently when it comes to logging and audit trails; events must be traceable across all environments so that incidents can be reconstructed. These logs are reviewable by our security staff on an as-needed basis.
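
As one illustration, reviewing a week of administrative actions for a single account in an AWS audit trail might look like the sketch below; CloudTrail is used here as an example log source, and the username and time window are placeholders.

```python
# Illustrative sketch of reviewing an audit trail for one person's administrative
# actions; AWS CloudTrail is shown as an example log source, the user is a placeholder.
from datetime import datetime, timedelta, timezone

import boto3

cloudtrail = boto3.client("cloudtrail")
end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "Username", "AttributeValue": "jane.doe"}],
    StartTime=start,
    EndTime=end,
    MaxResults=50,
)
for event in events["Events"]:
    print(event["EventTime"], event["EventName"])
```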

Personnel Security

Our employees are required to conduct themselves in a manner consistent with the company’s guidelines regarding confidentiality, business ethics, appropriate usage, and professional standards. Upon hire, we will verify an individual’s education and previous employment, and perform internal and external reference checks. Where local labor law or statutory regulations permit, we may also conduct criminal, credit, immigration, and security checks. The extent of background checks is dependent on the desired position.

Upon acceptance of employment, all employees are required to execute a confidentiality agreement and must complete a training course and test that ensure receipt of and compliance with our policies. The confidentiality and privacy of customer information and data are emphasized in the employee handbook and during new employee orientation.

Employees are provided with security training as part of new hire orientation. In addition, each employee is required to read, understand, and take a training course on the company’s Code of Conduct. The code outlines our expectation that every employee will conduct business lawfully, ethically, with integrity, and with respect for each other and the company’s users, partners, and competitors.

Depending on an employee’s job role, additional security training and policies may apply. Employees handling customer data are required to complete necessary requirements in accordance with these policies. Training concerning customer data outlines the appropriate use of data in conjunction with business processes as well as the consequences of violations.

Every employee is responsible for communicating security, conflict of interest and privacy issues to our compliance staff. The company provides confidential reporting mechanisms to ensure that employees can anonymously report any ethics violation they may witness.

Physical and Environmental Security

We have policies and procedures in place to ensure that the physical and environmental security of our cloud providers meets our expectations. Please refer to the AWS Compliance for Data Centers page for details on the physical and environmental security of the data centers where our data is hosted.

Infrastructure Security

Our security policies provide a series of threat prevention and infrastructure management procedures.

Monitoring

Our security monitoring program analyzes information gathered from internal network traffic, employee actions on systems, and outside knowledge of vulnerabilities. At multiple points across our global network, internal traffic is inspected for suspicious behavior, such as the presence of traffic that might indicate botnet connections. This analysis is performed using a combination of open source and commercial tools for traffic capture and parsing. Network analysis is supplemented by examining system logs to identify unusual behavior, such as unexpected activity in former employees’ accounts or attempted access of customer data.

Our security engineers place standing search alerts on public data repositories to look for security incidents that might affect the company’s infrastructure. They review inbound security reports and monitor public mailing lists, blog posts, and web bulletin board systems. Automated network analysis helps determine when an unknown threat may exist and escalates to our security staff. Network analysis is supplemented by automated analysis of system logs.

Vulnerability Management

We employ a dedicated vulnerability management team that is responsible for tracking vulnerabilities and following up on them in a timely manner. Our security team scans for security threats using commercial and in-house-developed tools, automated and manual penetration efforts, quality assurance (QA) processes, software security reviews, and external audits.

Once a legitimate vulnerability requiring remediation has been identified by the security team, it is logged, prioritized according to severity, and assigned an owner. The vulnerability management team tracks such issues and follows up until they can verify that the vulnerability has been remediated.

Dependency Management

Supply chain attacks pose a serious risk to modern application stacks. They include typosquatting (registering a malicious dependency under a misspelled name), name reuse (re-registering a package name that no longer exists), social engineering, and others. This type of attack has proven very effective as developers increasingly rely on public package repositories.

Our goal is to enable our engineers to benefit from public package repositories while ensuring that external packages do not pose a risk to our infrastructure. Packages must be code-reviewed before they are used and their versions must be pinned. All package managers we use must support auditing the installed dependencies for vulnerabilities. Our continuous integration and deployment pipelines are designed to run these audits before any deployment and abort early if any issues are detected. It is the responsibility of the engineer to ensure that a dependency (and its transitive dependencies) is clean. If this cannot be guaranteed, we commit to investing the engineering effort required to create our own packages with similar functionality.
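
A CI step implementing this audit could look like the sketch below; pip-audit and the requirements file are illustrative choices, since any package manager with vulnerability auditing can fill this role.

```python
# Sketch of a CI step that audits pinned dependencies before deployment and aborts
# the pipeline if known vulnerabilities are reported. pip-audit is an illustrative
# choice; any package manager with audit support works the same way.
import subprocess
import sys

result = subprocess.run(
    ["pip-audit", "--requirement", "requirements.txt"],  # pinned versions only
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    print(result.stdout)
    print("Dependency audit failed; aborting deployment.", file=sys.stderr)
    sys.exit(1)
```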

Incident Management

We have an incident management process for security events that may affect the confidentiality, integrity, or availability of our systems or data. This process specifies courses of action and procedures for notification, escalation, mitigation, and documentation. Staff are trained in forensics and handling evidence in preparation for an event, including the use of third-party and proprietary tools. Testing of incident response plans is performed for identified areas, such as systems that store sensitive customer information. These tests take into consideration a variety of scenarios, including insider threats and software vulnerabilities.

When an information security incident occurs, our security staff responds by logging and prioritizing the incident according to its severity. Events that directly impact customers are treated with the highest priority. An individual or team is assigned to remediate the problem, enlisting the help of product and subject-matter experts as appropriate. Our security engineers conduct post-mortem investigations when necessary to determine the root cause of single events and of trends spanning multiple events over time, and to develop new strategies to help prevent the recurrence of similar incidents.

Network Security

We employ multiple layers of defense to help protect the network perimeter from external attacks.

Our network security strategy is composed of the following elements:

  • Control of the size and make-up of the network perimeter. Enforcement of network segregation using firewall and ACL technology.
  • Management of network firewall and ACL rules that employs change management, peer review, and automated testing.
  • Delegation of the operation of network devices to our cloud vendor; no custom network (software) appliances are used.
  • Routing of all external traffic through AWS managed front-end servers (CloudFront, Global Accelerator) that help detect and stop malicious requests.
  • Internal aggregation points to support better monitoring.
  • Examination of logs for exploitation of programming errors (e.g., cross-site scripting) and generation of high-priority alerts if an event is found (see the sketch after this list).
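
The log-examination element can be pictured with the deliberately simplified sketch below; our actual detection combines several commercial and open source layers, and the pattern shown is only an illustration.

```python
# Deliberately simplified sketch of examining access logs for exploitation attempts
# (here: reflected cross-site scripting patterns in request lines). Our production
# tooling combines multiple commercial and open source detection layers.
import re
from typing import Iterable, List

XSS_PATTERN = re.compile(r"<script\b|javascript:|onerror\s*=", re.IGNORECASE)

def scan_access_log(lines: Iterable[str]) -> List[str]:
    """Return log lines that look like injection attempts and warrant an alert."""
    return [line for line in lines if XSS_PATTERN.search(line)]

if __name__ == "__main__":
    with open("access.log", encoding="utf-8") as log:
        for line in scan_access_log(log):
            # In production this raises a high-priority alert instead of printing.
            print("ALERT:", line.strip())
```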

Transport Layer Security

All services use the Hypertext Transfer Protocol Secure (HTTPS) for secure browser and service-to-service connections. Information sent via HTTPS is encrypted from the time it leaves our infrastructure until it is received by the recipient’s client. In collaboration with our cloud provider, we ensure that only secure encryption protocols and ciphers are used.
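
For service-to-service calls that we initiate ourselves, enforcing a modern protocol version can be as simple as the following sketch using Python’s standard library; the endpoint URL is a placeholder.

```python
# Sketch of enforcing a modern TLS configuration for an outbound service-to-service
# call with Python's standard library; the endpoint URL is a placeholder.
import ssl
import urllib.request

context = ssl.create_default_context()            # certificate verification enabled
context.minimum_version = ssl.TLSVersion.TLSv1_2  # reject legacy protocol versions

with urllib.request.urlopen("https://api.example.com/health", context=context) as resp:
    print(resp.status)
```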

Operating System Security

We limit ourselves to operating systems that receive automated updates and patches managed by our cloud provider, and we ensure that the servers are configured to update automatically. Our applications are designed to handle the replacement or reboot of the underlying servers without impacting their availability.

Many services use container technology to further isolate the runtime environment of an application from its infrastructure. All containers are hosted on servers that are fully managed by our cloud provider. Container images are stored in our cloud provider's container registry. This registry conducts automated scans of the stored images on a regular basis and informs us about security vulnerabilities.
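
For illustration, retrieving the scan results for an image might look like the sketch below, using Amazon ECR as an example registry; the repository name and tag are placeholders.

```python
# Sketch of retrieving automated scan results for a container image from the
# registry (Amazon ECR as an example); repository name and tag are placeholders.
import boto3

ecr = boto3.client("ecr")

response = ecr.describe_image_scan_findings(
    repositoryName="example-service",
    imageId={"imageTag": "latest"},
)
# Summarize findings by severity so a deployment can be blocked on HIGH or CRITICAL.
print(response["imageScanFindings"].get("findingSeverityCounts", {}))
```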

Software Security

This section outlines our current approach to software security; it may adapt and evolve in the future.

Software Development Lifecycle

Security is a key component of our design and development process. Our engineering organization does not require Product Development teams to follow a specific software development process; rather, teams choose and implement processes that fit the project’s needs. As such, a variety of software development processes are in use, from Agile Software Development methodologies to more traditional, phased processes. Our security review processes are adapted to work within the chosen framework. Engineering management has defined requirements for project development processes:

  • Peer-reviewed design documentation
  • Adherence to coding style guidelines
  • Peer code review
  • Multi-layered security testing

The above mandates embody our software engineering culture, where key objectives include software quality, robustness, and maintainability. While the primary goal of these mandates is to foster the creation of software artifacts that excel in all aspects of software quality, our experience also suggests that they can reduce the incidence of security flaws and defects in software design and implementation:

  • The existence of adequately detailed design documentation is a prerequisite of the security design review process, since in early project stages it is generally the only available artifact on which to base security evaluations.
  • Many classes of implementation-level security vulnerabilities are fundamentally no different from low-risk, common functional defects. Many implementation-level vulnerabilities are caused by fairly straightforward oversights on the developer’s part.
  • Given developers and code reviewers who are educated with respect to applicable vulnerability patterns and their avoidance, a peer review-based development culture that emphasizes the creation of high-quality code supports a secure code base.

Our engineering teams are incentivized to embrace open source and share tooling with the public through our GitHub organization at https://github.com/emdgroup. This gives us the opportunity to get an external perspective on our development processes and software quality. We encourage collaboration with other engineers across our organization on the development and vetting of reusable components designed and implemented to help software projects avoid certain classes of vulnerabilities.

Security Education

Recognizing the importance of an engineering workforce that is educated with respect to secure coding practices, we maintain an engineering outreach and education program that currently includes:

  • Security training for new engineers.
  • In-depth training in application security for select engineers, with the goal of fostering the development of resident security experts on development project teams.
  • The creation and maintenance of documentation on secure design and coding practices.
  • Targeted, context-sensitive references to documentation and training material. For example, automated vulnerability testing tools provide engineers with references to training and background documentation related to specific bugs or classes of bugs flagged by the tool.
  • Technical presentations on security-related topics.
  • A dedicated chat channel that is intended to keep our engineering workforce abreast of new threats, attack patterns, mitigation techniques, security-related libraries and infrastructure, best practices and guidelines.

Implementation-Level Security Testing and Review

We employ a number of approaches intended to reduce the incidence of implementation-level security vulnerabilities in our products and services:

  • Implementation-level security reviews, which are conducted by members of the security team typically in later stages of product development, aim to validate that a software artifact has protection against relevant security threats. Such reviews typically consist of a re-evaluation of threats and countermeasures identified during security design review, targeted security reviews of security-critical code, selective code reviews to assess code quality from a security perspective, and targeted security testing.
  • Automated testing for flaws in certain relevant vulnerability classes. We use both in-house developed tools and some commercially available tools for this testing.
  • External certification of the implementation of open standards used in our infrastructure. This certification tests our implementation for correctness against the open standard and tests for common security vulnerabilities and weaknesses.
  • Security testing performed by Software Quality Engineers in the context of the project’s overall software quality assessment and testing efforts.

Disaster Recovery and Business Continuity

To minimize service interruption due to hardware failure, natural disaster, or other catastrophe, we implement a number of mitigations to ensure business continuity. This program includes multiple components to minimize the risk of any single point of failure, including the following:

  • Data replication and backup: All application data is replicated to multiple systems within an AWS region (a few miles apart) and in some cases also replicated to other regions (hundreds of miles apart); see the sketch after this list.
  • AWS operates a geographically distributed set of data centers that is designed to maintain service continuity in the event of a disaster or other incident in a single region. High-speed connections between the data centers help to support swift failover. Management of the data centers is also distributed to provide location-independent, around-the-clock coverage, and system administration.
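
As an illustration of the replication building block, enabling cross-region replication for an S3 bucket might look like the sketch below; the bucket names and IAM role are placeholders, and both buckets must have versioning enabled beforehand.

```python
# Sketch of enabling S3 cross-region replication; bucket names and the IAM role
# are placeholders, and versioning must be enabled on both buckets beforehand.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="example-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication",
        "Rules": [
            {
                "ID": "replicate-to-secondary-region",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # replicate all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::example-replica-bucket"},
            }
        ],
    },
)
```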

We conduct regular testing of our Disaster Recovery Plan. During such tests, for example, a disaster in a geographic location or region is simulated by taking IT systems and business and operational processes in that location offline and allowing them to transfer to fail-over locations designated by the Disaster Recovery Plan. During the course of the test, we verify that business and operations functions can operate at the fail-over location, and hidden or unknown dependencies on the offline location are identified and logged for later remediation.

Summary

As described above, we employ a multi-layered security strategy consisting of the components outlined in this document, supporting a platform that is used by many organizations worldwide, including ourselves. We are confident in our ability to provide a secure and stable environment for our customers.
