By Jimmy Ji
Understanding Audit Logging
Audit logging involves recording activity within the software used throughout your organization. These logs capture details such as the event that took place, the time it happened, the user or service involved, and the entity affected. Every modern device on your network, along with your cloud services and applications, generates logs that can be used for auditing purposes. Audit logging is a cornerstone of robust enterprise systems because it is critical for:
- Security monitoring and incident response
- Accountability and data tracking
- Troubleshooting and problem resolution
- Performance analysis
- Compliance and legal requirements
Despite its importance, there is no universal audit logging solution, because each organization's requirements directly influence the design and implementation of such a system.
Evolving our Framework
At Hyperscience, we already had an approach and framework for audit logging, but our goal of FedRAMP High Accreditation and our strategic partnership with Palantir revealed two major areas for improvement: deeper coverage and more detailed logs.
Audit Logging Levels
As we moved forward in our pursuit of FedRAMP High Accreditation, we needed to consider customers who were already using our database-backed audit logs. We did not want to burden them with unnecessary migrations or unexpected volume when their previous system was working perfectly fine.
Therefore, the team decided to split our logging into two levels and let clients choose between them:
- (Classic) Audit Logging
- Enhanced Audit Logging
This post primarily discusses the features of Enhanced Audit Logging.
Expanding Coverage
We set out to log every human action taken in the platform. The most straightforward solution was to intercept and track all HTTP requests to the Hyperscience Hypercell. The caveat is that this would produce duplicate logs for events we had already covered, since a number of audits were already in place. A generalized approach like this would also lose critical application details, because it could not convey the full context of those actions.
Consequently, we adopted a hybrid approach: requests that result in data changes may require additional information, so they have to be manually instrumented by our engineering team. Endpoints that are more navigational in nature (such as a client switching between tabs) can be handled by a custom middleware.
With this implementation, we get the best of both worlds: complete coverage and the essential context for each audit, at the cost of some manual work from our engineering teams to extend coverage for those pivotal endpoints.
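The hybrid approach can be illustrated with a minimal, framework-free sketch. Everything here is an assumption made for illustration: the `Request` type, the `manually_audited` decorator, the route paths, and the in-memory `AUDIT_LOG` are hypothetical names, not the actual Hyperscience implementation. The idea is that the middleware writes a generic entry for any endpoint that has not declared its own, richer audit, which avoids duplicate logs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Request:
    method: str
    path: str
    user: str


# Hypothetical registry of endpoints whose handlers write their own audits.
MANUALLY_AUDITED: Set[str] = set()
# Stand-in for the real audit store.
AUDIT_LOG: List[dict] = []


def manually_audited(path: str):
    """Mark a handler as responsible for emitting its own audit entry."""
    def decorator(handler):
        MANUALLY_AUDITED.add(path)
        return handler
    return decorator


def audit_middleware(routes: Dict[str, Callable[[Request], str]]):
    """Wrap routing so un-instrumented endpoints get a generic audit."""
    def handle(request: Request) -> str:
        response = routes[request.path](request)
        # Skip endpoints that already audit themselves to avoid duplicates.
        if request.path not in MANUALLY_AUDITED:
            AUDIT_LOG.append({
                "action": f"{request.method} {request.path}",
                "user": request.user,
                "source": "middleware",
            })
        return response
    return handle


@manually_audited("/documents/delete")
def delete_document(request: Request) -> str:
    # A data-changing endpoint records application-level context itself.
    AUDIT_LOG.append({
        "action": "document.deleted",
        "user": request.user,
        "source": "manual",
    })
    return "deleted"


def view_tab(request: Request) -> str:
    # A navigational endpoint: no manual audit, the middleware covers it.
    return "ok"


ROUTES = {"/documents/delete": delete_document, "/tabs/submissions": view_tab}
app = audit_middleware(ROUTES)
```

Calling `app(Request("GET", "/tabs/submissions", "alice"))` produces one middleware entry, while a request to `/documents/delete` produces only the manual entry written by the handler.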
Enhancing our Insights
To increase the level of detail, we sought to always include the timestamp, source, identity, and outcome in every log. In other words, for every log we need to understand:
- What action was triggered
- When the action was triggered
- Who triggered the action
- The IP address of the client that triggered the action
- Whether the action succeeded or failed
Most of this information is included when a request is made to our platform. We had to implement logic to extract those details and insert them into newly created model fields that capture these specifications. We were also able to leverage some clever threading tricks to auto-populate as much information as possible, making it as simple as possible for engineers to manually introduce new audits.
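One way such auto-population could work is with thread-local storage, sketched below. This is an assumption about the general technique, not a description of the actual Hyperscience code: the `AuditRecord` fields mirror the list above, and `set_request_context`, `clear_request_context`, and `audit` are hypothetical helper names. It also assumes each request is served on its own thread.

```python
import threading
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Per-thread storage: each request-handling thread sees only its own values.
_request_context = threading.local()


def set_request_context(user: str, ip_address: str) -> None:
    """Called once per request, for example by a middleware (hypothetical)."""
    _request_context.user = user
    _request_context.ip_address = ip_address


def clear_request_context() -> None:
    """Called when the request finishes so values do not leak between requests."""
    vars(_request_context).clear()


@dataclass
class AuditRecord:
    action: str                       # what was triggered
    success: bool                     # outcome
    user: Optional[str] = None        # who triggered it
    ip_address: Optional[str] = None  # source of the request
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)  # when
    )


def audit(action: str, success: bool) -> AuditRecord:
    """Build a record; identity and source come from the current thread."""
    return AuditRecord(
        action=action,
        success=success,
        user=getattr(_request_context, "user", None),
        ip_address=getattr(_request_context, "ip_address", None),
    )
```

With this shape, an engineer adding a new audit only supplies the action and its outcome, for example `audit("submission.created", success=True)`, and the remaining fields are filled in from the request context.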
Considering Rapid Growth
As with most engineering changes that involve data growth, our team needed to consider how we were going to handle larger volumes of audit logs. Historically, our audit logs were stored in the database, an elementary but effective solution designed for durability and queryability. With our new approach, we projected a dramatic increase in logs, and it was no longer practical to save every audit to the database without performance degradation.
To handle such volume, we added two new flags that allow us to send our logs to the database or to the operating system. These configurations let our customers choose where to store their audit logs, minimizing the impact of the added volume.
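A rough sketch of flag-controlled destinations follows. The flag names `log_to_database` and `log_to_system`, the `audit_log` table, and the `AuditWriter` class are all hypothetical; SQLite stands in for the real database, and a plain callable (defaulting to `print`) stands in for whatever operating-system log facility is actually used.

```python
import json
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass
class AuditConfig:
    # Hypothetical flag names; the real configuration keys are not shown here.
    log_to_database: bool = True
    log_to_system: bool = False


class AuditWriter:
    def __init__(
        self,
        config: AuditConfig,
        db: sqlite3.Connection,
        emit: Callable[[str], None] = print,
    ) -> None:
        self.config = config
        self.db = db
        self.emit = emit  # stand-in for the operating-system log sink
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS audit_log (payload TEXT NOT NULL)"
        )

    def write(self, record: dict) -> None:
        """Serialize once, then route to each enabled destination."""
        payload = json.dumps(record, sort_keys=True)
        if self.config.log_to_database:
            self.db.execute(
                "INSERT INTO audit_log (payload) VALUES (?)", (payload,)
            )
            self.db.commit()
        if self.config.log_to_system:
            self.emit(payload)
```

Keeping the routing decision in one place means a deployment that expects high volume can turn off the database destination without touching any call site that writes audits.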
Databases that consistently save large volumes of data can grow rapidly in size. To manage our storage more efficiently, we decided to enable our Audit Log deletion policy by default for new clients. We also adjusted our policies to set the maximum retention to 180 days for SaaS environments.
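As a simplified illustration of a retention policy with a 180-day ceiling, the sketch below deletes rows older than the cutoff. The table name `audit_log`, its `created_at` column (assumed to hold Unix epoch seconds), and the function `purge_expired_audits` are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3
from datetime import datetime, timedelta

# Maximum retention for SaaS environments, as stated above.
MAX_RETENTION_DAYS = 180


def purge_expired_audits(
    db: sqlite3.Connection,
    now: datetime,
    retention_days: int = MAX_RETENTION_DAYS,
) -> int:
    """Delete audits older than the retention window; return rows removed.

    Assumes a hypothetical `audit_log` table whose `created_at` column
    stores Unix epoch seconds.
    """
    # A requested retention above the ceiling is capped at the ceiling.
    days = min(retention_days, MAX_RETENTION_DAYS)
    cutoff = int((now - timedelta(days=days)).timestamp())
    cursor = db.execute("DELETE FROM audit_log WHERE created_at < ?", (cutoff,))
    db.commit()
    return cursor.rowcount
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the policy easy to test with fixed dates.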
Steadfast in New Standards
As our team continues to navigate the changes in our logging philosophy, we need tools to ensure that the new, rigorous requirements are maintained. To that end, we have implemented a variety of internal testing tools that allow us to:
- Require that all non-middleware endpoints are properly audited.
- Ensure that the messaging and content of existing audits are valid.
- Enforce that particular endpoints have manual audits.
Assertions like these set the team up for success and prevent holes in our audit logging framework. Moreover, the team has written a new suite of unit and end-to-end tests, ensuring that our previous efforts do not regress.
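Checks of this kind can be expressed as simple set comparisons over the application's endpoints. The sketch below is only an illustration of the idea: the function names and the way endpoint sets are collected are assumptions, not the internal Hyperscience tooling.

```python
from typing import List, Set


def find_unaudited_endpoints(
    all_endpoints: Set[str],
    middleware_endpoints: Set[str],
    manually_audited: Set[str],
) -> List[str]:
    """Endpoints covered by neither the middleware nor a manual audit."""
    return sorted(all_endpoints - middleware_endpoints - manually_audited)


def find_missing_manual_audits(
    required_manual: Set[str],
    manually_audited: Set[str],
) -> List[str]:
    """Endpoints that must be manually audited but are not."""
    return sorted(required_manual - manually_audited)


def assert_audit_coverage(
    all_endpoints: Set[str],
    middleware_endpoints: Set[str],
    manually_audited: Set[str],
    required_manual: Set[str],
) -> None:
    """Fail loudly, naming every endpoint that breaks a coverage rule."""
    unaudited = find_unaudited_endpoints(
        all_endpoints, middleware_endpoints, manually_audited
    )
    assert not unaudited, f"Endpoints without any audit: {unaudited}"
    missing = find_missing_manual_audits(required_manual, manually_audited)
    assert not missing, f"Endpoints that require a manual audit: {missing}"
```

Run as part of a test suite, a check like this fails as soon as a new endpoint is added without an audit, so the gap is caught before release rather than in production.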
Ready for the Future
Continuous improvement is important to any modern technology platform, especially one used by federal services and other large organizations. As we strive for FedRAMP High Accreditation, Hyperscience is dramatically bolstering the way we track actions across the application. We are committed to developing robust, accountable, and secure systems to deliver hyperautomation that you can trust.