Privacy and AI Ethics Frequently Asked Questions
At Hyperscience, we understand the importance of addressing your questions and concerns regarding data privacy and ethical considerations when it comes to adopting AI solutions. Below, we provide answers to commonly asked questions in alignment with our core Ethical AI principles.
1. What does AI Ethics mean?
AI Ethics refers to the ethical considerations and guidelines that govern the development, deployment, and use of artificial intelligence (AI) technologies. It involves addressing the potential moral, societal, and legal implications of AI systems to ensure that they are developed and used in ways that align with human values, rights, and well-being.
2. What are the ethical risks of an enterprise using AI?
Using AI in enterprises poses ethical risks including bias, privacy breaches, lack of transparency, accountability issues, unintended consequences, job displacement, depersonalization, and security vulnerabilities. These risks can lead to discrimination, loss of privacy, mistrust, and societal harm. Mitigation involves fair data handling, transparent AI design, clear accountability, ongoing monitoring, and adherence to regulations. Ethical AI implementation ensures responsible technology use, trust building, and long-term positive impact.
3. What’s the difference between AI and machine learning (ML) in regards to the ethics conversation?
AI is a broader concept of creating intelligent systems. Machine learning is a specific approach within AI that involves learning from data. Ethical concerns related to AI often revolve around the potential consequences of these systems’ decisions, their transparency, accountability, and overall impact on society. Ethical issues in machine learning primarily center on aspects like bias and fairness in training data, model interpretability, transparency, accountability for decisions made by models, and the potential for unintended consequences due to the complexity of learned patterns.
4. What data does Hyperscience use to train its ML models? Who trains the models?
For our proprietary models (such as our internal transcription model), we rely exclusively on a mix of synthetic data (artificially generated data that mimics the characteristics of real data), open-source data, and data from our data partnerships where sensitive and/or personal information has been redacted. We do not use any data from customers without their express written consent.
- Internal models are trained by ML experts who are also the guarantors of data collection and access. We do not have access to models or training data from any customer without their explicit approval.
- Upon request, we sometimes train models for customers either through our annotation service, or for debugging purposes. In both cases, the data is not used by Hyperscience for any other purpose, and is deleted as soon as the request is completed.
5. How does Hyperscience ensure its ML models are unbiased?
Hyperscience’s proprietary models are trained and evaluated on a large collection of datasets to ensure we have extensive and diverse coverage. This includes data from various industries, broad demographic and geographic groups, and diverse types of documents to create a more universally applicable and unbiased machine learning model. This multi-faceted approach to data collection and training ensures that our models are not confined to specific domains, thus minimizing the risk of biased outcomes.
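To illustrate what checking for biased outcomes can look like in practice, here is a minimal sketch (generic Python, not Hyperscience's internal tooling) that compares model accuracy across subgroups of an evaluation set; the record fields and the gap threshold are assumptions made for this example.

```python
# Illustrative sketch only: per-group accuracy comparison to surface potential bias.
# The record fields ("group", "label", "prediction") and the 5-point gap threshold
# are assumptions for this example, not Hyperscience internals.
from collections import defaultdict

def accuracy_by_group(records):
    """Return accuracy per subgroup for records shaped like
    {"group": str, "label": str, "prediction": str}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["group"]] += 1
        if r["prediction"] == r["label"]:
            correct[r["group"]] += 1
    return {g: correct[g] / total[g] for g in total}

def flag_gaps(per_group, max_gap=0.05):
    """Flag when the accuracy gap between best and worst subgroup exceeds max_gap."""
    if not per_group:
        return False
    return max(per_group.values()) - min(per_group.values()) > max_gap

eval_records = [
    {"group": "region_a", "label": "invoice", "prediction": "invoice"},
    {"group": "region_b", "label": "invoice", "prediction": "receipt"},
]
scores = accuracy_by_group(eval_records)
print(scores, "review needed:", flag_gaps(scores))
```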
Additionally, many of our models are designed for users to fine-tune against their own data sources to ensure that the models can understand data for their specific context. As an extra control, we also have human-in-the-loop supervision and QA workflows that allow customers to frequently review model performance and tune model behavior. These advanced workflows ensure customers are fully aware of the model results including any bias or errors that may have been introduced to promptly correct and mitigate going forward.
As for the use of large language models (LLMs), Hyperscience does not rely on any LLMs/Gen-AI within our core data processing workflows. We do, however, enable customers to leverage LLMs/Gen-AI to perform additional tasks against the data that we’ve processed. First, we have an out-of-the-box integration with two well-known LLMs on the market, GPT-4 and Llama 2, which customers can enable. Llama 2 is an open-source model that is available within the Hyperscience SaaS solution or that customers can install alongside their on-premise Hyperscience solution. GPT-4 is a cloud-based service; customers must use their own license agreement with OpenAI to enable it. With either model, Hyperscience helps organizations establish “guardrails” around the use of LLMs, with intuitive supervision interfaces through which human employees can review the outputs, ensuring accuracy and precision and mitigating the risks associated with the use of these models.
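As a purely illustrative sketch of this “guardrail” pattern (generic Python, not the actual Hyperscience Flows or Block API), the idea is to validate LLM output and route anything that fails the check to a human reviewer before it is accepted; every function name below is a hypothetical placeholder.

```python
# Illustrative "guardrail" pattern only: route LLM output through validation and,
# if it fails, to a human reviewer before it is accepted. call_llm and
# request_human_review are hypothetical stand-ins, not Hyperscience APIs.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to GPT-4, Llama 2, or any other model."""
    return "DRAFT ANSWER"

def passes_checks(output: str, required_terms: list[str]) -> bool:
    """A trivial automated check; real deployments would validate schema, policy, etc."""
    return all(term.lower() in output.lower() for term in required_terms)

def request_human_review(prompt: str, output: str) -> str:
    """Placeholder for a supervision step where an employee edits or approves output."""
    return output  # the reviewer's (possibly corrected) version

def guarded_completion(prompt: str, required_terms: list[str]) -> str:
    output = call_llm(prompt)
    if not passes_checks(output, required_terms):
        output = request_human_review(prompt, output)
    return output

print(guarded_completion("Summarize the invoice total.", ["total"]))
```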
6. What are Hyperscience best practices to ensure a fair and ethical practice and use of AI/ML?
Our ML experts carefully review licenses and data sources for every model used in the platform, with the support of our privacy compliance team. They also partner with our product experts on privacy-by-design initiatives to ensure the safety of any customer data stored by our platform through personal information deletion, and secure platform access.
The Hyperscience Platform continuously oversees the performance of the models deployed in production environments and gives customers the ability to decide on the level of human involvement for any deployed model. Additionally, we provide robust tools and alert systems to mitigate unpredictable model behavior.
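As one hedged illustration of what such an alert might look like (the thresholds and alert hook are assumptions, not the platform’s actual monitoring implementation), a simple check can compare recent model confidence against a historical baseline.

```python
# Illustrative sketch only: alert when a model's average confidence over a recent
# window drops well below its historical baseline. The baseline, tolerance, and
# alert hook are assumptions, not Hyperscience's actual monitoring implementation.
from statistics import mean

def should_alert(recent_confidences, baseline=0.92, tolerance=0.10):
    """True if recent average confidence falls more than `tolerance` below baseline."""
    if not recent_confidences:
        return False
    return mean(recent_confidences) < baseline - tolerance

recent = [0.95, 0.64, 0.70, 0.58, 0.61]
if should_alert(recent):
    print("ALERT: model confidence drifting; route more work to human review.")
```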
All of our annotations (either internal, or as-a-service) are done by expert keyers who are fairly compensated for their work. We do not outsource annotations to unethical actors.
7. Does Hyperscience use customer data to train or improve its services?
Hyperscience SaaS customers can allow Hyperscience to use customer data to improve services. These improvements can relate to models in use by customers, including models for particular document types, or other product performance improvements.
In cases where Hyperscience SaaS customers agree to allow us to collect and use de-identified data to improve our services, the data must be de-identified (PII replaced with synthetic data) prior to being used.
Access to the de-identified data by Hyperscience employees is limited and tracked. Even when customer data is PII-free, only a minimal set of engineers will be allowed to access the de-identified data.
A Hyperscience SaaS customer’s data is eligible for redaction and replacement with synthetic data only if all conditions below are met:
- Customer has not opted out of the clause allowing Hyperscience to use their de-identified data
- The customer is US-based and their data is not subject to data protection regulations (such as GDPR or similar)
- Data processed by the customer is not subject to HIPAA (or similar) regulation
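To make the de-identification step described in this answer concrete, here is a minimal, purely illustrative sketch of replacing detected PII with synthetic stand-ins; the regular expressions and synthetic values are assumptions, not Hyperscience’s actual redaction pipeline.

```python
# Minimal de-identification sketch: replace detected PII with synthetic stand-ins
# before any data is used for service improvement. The regex patterns and synthetic
# values are illustrative assumptions, not Hyperscience's redaction pipeline.
import re

PATTERNS = {
    "ssn": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "000-00-0000"),
    "email": (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "person@example.com"),
    "phone": (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "555-555-0100"),
}

def deidentify(text: str) -> str:
    """Replace every matched pattern with its synthetic substitute."""
    for pattern, synthetic in PATTERNS.values():
        text = pattern.sub(synthetic, text)
    return text

print(deidentify("Reach Jane at jane.doe@corp.com or 212-555-1234, SSN 123-45-6789."))
```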
8. Does Hyperscience share models across different customers?
Hyperscience ships with a number of out-of-the-box models (e.g. structured classification, transcription) that are used across all customers. Models that customers train in their own environment (e.g. semi-structured classification, field identification, table identification) are not shared with other customers. If a customer has made data available for model training in the scenario described in answer 7, models trained on this data may be made available to other customers. For example, a customer is processing invoices and Hyperscience has redacted and synthesized some of these invoices. This training set may be supplemented with more training data (e.g. invoices from other customers, similarly redacted and synthesized) and a model will be trained. Customers may have access to the model that is trained, but not the underlying invoices that were used to train the model.
9. Will my training data be used by other customers of Hyperscience?
Models that are trained on redacted and synthesized customer data may be used by other customers, but the training data itself does not “travel” with the model. Training data will only be available to the customer who generated it and a limited set of Hyperscience employees. As in the invoice example in question 8 above, other customers may gain access to a model trained on redacted and synthesized data, but never to the underlying documents that were used to train it.
10. How does Hyperscience use Generative AI? Does your platform send information to 3rd-party LLMs?
- The Hyperscience platform provides our customers and partners with the necessary enterprise stack to operationalize AI. With Flow Studio, our low-code development environment, customers can easily use 3rd-party LLMs combined with our human supervision interfaces, to guarantee the required accuracy and to ensure they can trust the outputs.
- In our October 2023 product release, we included two new Blocks that provide easy access to GPT and Llama as part of Hyperscience Flows. You can use the power of these LLMs, combined with Hyperscience’s world-class classification and extraction capabilities and its supervision interfaces, to make sure the outputs produced by Generative AI meet the quality standards and brand image of your organization.
- With the Llama Block, customers can deploy Generative AI within their own air-gapped environment, so that sensitive data never leaves their trusted environment.
- We are currently also experimenting with the use of Out-Of-The-Box Generative Document AI Models for specific use cases as an innovative way to simplify the ML pipeline, improve processing time, and reduce time-to-value (TTV) so that customers can start seeing results even faster.
- It’s important to note that when using 3rd-party public APIs such as ChatGPT, customers are strongly advised not to send any PII or sensitive information as part of their prompts (an illustrative pre-send check is sketched after this list).
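The sketch below is one hedged illustration of such a pre-send check; the patterns are simplistic assumptions rather than a complete PII detector, and production systems would rely on a dedicated PII-detection service.

```python
# Illustrative pre-send check only: block prompts that appear to contain PII before
# they reach a third-party API such as ChatGPT. Patterns are simplistic assumptions;
# production systems would use a proper PII-detection service.
import re

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),       # long digit run (card-like)
]

def safe_to_send(prompt: str) -> bool:
    """True only if no PII-like pattern is found in the prompt."""
    return not any(p.search(prompt) for p in PII_PATTERNS)

prompt = "Summarize this contract clause about payment terms."
if safe_to_send(prompt):
    print("OK to send to the external LLM.")
else:
    print("Blocked: prompt appears to contain PII.")
```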
11. What has Hyperscience done to address the requirements of EU AI Act, the principles set out in the UK Government White Paper, and the ICO Guidance on the processing of personal data using AI technologies?
Tracking legislative developments
Hyperscience actively monitors legislative developments relating to AI in Europe and how they will apply to our solutions. We are taking steps to ensure our solutions are developed – and can be used by our customers – in an ethical, transparent, trustworthy, and ultimately compliant manner.
AI Ethics Board
We have appointed an AI Ethics Board, staffed with senior individuals across key areas of our business, to ensure a holistic approach to meeting AI ethics imperatives at all stages of a solution’s lifecycle.
AI Ethics Principles
We have proactively developed a series of Ethical AI Core Principles aimed at ensuring that our AI solutions are:
- Human Centered and Socially Beneficial
- Fair and Accountable
- Explainable and Transparent
- Secure and Safe
Our core principles are designed to align with fundamental aspects of the principles proposed by both the EU and UK – we plan to monitor and adapt these principles as legislators settle on requirements in Europe.
For more about our commitment to Ethical AI, please click here.
Data Privacy and Security
A key element of creating trustworthy AI is and will remain adherence to existing European privacy and data protection laws – ensuring high standards of data quality, data integrity and data security.
Hyperscience has developed a targeted privacy compliance programme based on existing requirements in Europe, including those set out in the EU’s and UK’s implementations of the General Data Protection Regulation.
12. What is the ‘classification’ of Hyperscience products for the purposes of the EU AI Act?
The EU has adopted a targeted, risk-based approach to regulate AI systems differentially depending on the risk category they fall into:
- Unacceptable risk – these systems are prohibited.
- High risk – these systems must comply with strict obligations before they can be put on the market in Europe.
- Limited risk – these systems have only relatively limited transparency obligations.
- Minimal or no risk – these systems – which are noted by the European Commission to make up the “vast majority of AI systems currently used in the EU” – are largely unregulated by the AI Act.
We anticipate that Hyperscience’s solutions will fall into the “minimal or no risk” category. Despite this fact, we remain committed to ensuring adherence to our Core Principles to develop safe, transparent and trustworthy AI solutions.
13. How does Hyperscience align with the strategic AI Policies from the US Government?
In October 2023, the White House published the Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Hyperscience is committed to upholding the highest standards in the development and deployment of artificial intelligence (AI) technologies. In line with recent US government guidance pertaining to AI, we want to assure our stakeholders that our company complies with these new guidelines. We continue to keep a close eye on any further regulatory developments.