Recent claims by threat actors that they obtained an OmniGPT backend database highlight the risks of entering sensitive data into AI chatbot platforms, where any input could potentially be revealed to other users or exposed in a breach.
OmniGPT has yet to respond to the claims, which were posted on the BreachForums leak site, but Cyble dark web researchers analyzed the exposed data.
Cyble researchers detected potentially sensitive and critical data in the files, ranging from personally identifiable information (PII) to financial information, access credentials, tokens, and API keys. The researchers did not attempt to validate the credentials but based their analysis on the potential severity of the leak if the threat actors’ claims are confirmed to be valid.
OmniGPT Hacker Claims
OmniGPT integrates several well-known large language models (LLMs) into a single platform, including Google Gemini, ChatGPT, Claude Sonnet, Perplexity, DeepSeek and DALL-E, making it a convenient way to access a range of LLM tools.

The threat actors, who posted under aliases that included Gloomer and SyntheticEmotions, claimed that the data “contains all messages between the users and the chatbot of this site as well as all links to the files uploaded by users and also 30k user emails. You can find a lot of useful information in the messages such as API keys and credentials and many of the files uploaded to this site are very interesting because sometimes, they contain credentials/billing information.”

The data analyzed by Cyble includes four files:
- Files.txt, which contains links to files uploaded by users, some of which contain highly sensitive data
- Messages.txt, which contains user prompt data
- User_Email_Only.txt, a list of user email addresses found within the data
- UserID_Phone_Number.txt, which contains a combination of user email addresses and phone numbers
Analysis of Claimed OmniGPT Data Leak
Cyble found that the email and phone number file (UserID_Phone_Number.txt) contained PII such as email addresses and phone numbers; the emails could be used in phishing attacks, spam, and identity theft, while the phone numbers could enable harassment, targeted scams, or social engineering.
Some of the email addresses appear to belong to organizational domains such as educational institutions or corporations, revealing potential associations with businesses or institutions and increasing the risk of spear phishing attacks for those organizations.
The User_Email_Only.txt file contains numerous email addresses, which can serve as personal identifiers. Although there are no associated full names or physical addresses, these emails can still be linked to individuals, and some of the addresses belong to organizational domains. The risk of phishing attacks is high, especially if these email addresses are cross-referenced with other leaks, and email hijacking is also a possibility if the leaked addresses are reused across multiple platforms, Cyble said in its analysis.
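As a rough illustration of that kind of cross-referencing, a defender triaging the leak could check whether exposed addresses already appear in known breaches through the Have I Been Pwned v3 API. The API key, the example address, and the delay below are placeholders, not details from the leak:

```python
# Sketch: look up leaked addresses in prior breaches via the
# Have I Been Pwned v3 API (requires a paid API key and a
# descriptive user-agent). HIBP_API_KEY is a placeholder.
import json
import time
import urllib.error
import urllib.parse
import urllib.request

HIBP_API_KEY = "YOUR_HIBP_API_KEY"

def breaches_for(email: str) -> list:
    url = ("https://haveibeenpwned.com/api/v3/breachedaccount/"
           + urllib.parse.quote(email))
    req = urllib.request.Request(url, headers={
        "hibp-api-key": HIBP_API_KEY,
        "user-agent": "leak-triage-sketch",
    })
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)   # breaches the address appears in
    except urllib.error.HTTPError as err:
        if err.code == 404:          # 404 means no known breaches
            return []
        raise

for addr in ["user@example.com"]:    # stand-in for the leaked list
    print(addr, [b["Name"] for b in breaches_for(addr)])
    time.sleep(6)                    # stay under HIBP rate limits
```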
Messages.txt, which contains user prompt data, reveals critical security issues if the leak is valid. These include:
- Bearer tokens and OpenAI API keys exposed in messages, potentially allowing unauthorized access to APIs.
- Database credentials, including username, password, and database name, which could provide attackers with full access to internal databases.
- Credential leaks, which include email addresses and passwords that could lead to account compromises or unauthorized access.
- Table structure, which includes sensitive database schema information, potentially revealing critical business logic or proprietary data.
- Code base and endpoint leaks, which could enable attackers to analyze system vulnerabilities.
- Payment card details, such as credit card numbers, CVVs, and expiry dates, that could put users at risk for financial theft and fraud.
The exposure of tokens or API keys could provide an easy entry point for attackers to abuse the system and perform unauthorized actions, Cyble said, while exposing database credentials could allow attackers to potentially access and modify sensitive business data.
Cyble’s analysis said the file “involves a range of sensitive information, including credentials, tokens, and financial details. The severity of the leak is high, as it includes exploitable access points like database credentials, API keys, and payment information.”
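For defenders who want to gauge whether similar secrets lurk in their own chat exports, a minimal regex sweep along the following lines could flag likely credentials for review. The patterns are illustrative approximations rather than a complete secret-scanning ruleset, and the Messages.txt name simply mirrors the leaked file:

```python
# Sketch: flag likely secrets (API keys, bearer tokens, database URIs,
# card numbers) in a chat-log export. Patterns are illustrative only.
import re

PATTERNS = {
    "openai_api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
    "bearer_token":   re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]{20,}", re.IGNORECASE),
    "db_uri":         re.compile(r"\b\w+://\w+:[^@\s]+@[\w.-]+(?::\d+)?/\w+"),
    "card_number":    re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
}

def scan(path: str) -> None:
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, 1):
            for label, rx in PATTERNS.items():
                for match in rx.finditer(line):
                    # Print only a redacted prefix of each hit, so the
                    # scanner's own output doesn't re-expose the secret.
                    print(f"{path}:{lineno} {label}: {match.group(0)[:8]}[redacted]")

scan("Messages.txt")  # file name taken from the claimed leak
```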
Files.txt, which contains links to files uploaded by users, points to a range of sensitive file types, some of which include:
- Document files containing sensitive information, such as resumes and organizational records
- Access credentials to databases
- API documentation detailing production-level API tokens, payload data, and expected outputs
- .pdf, .doc, and .csv files containing sensitive business information
- Project integration files with production-level data, including API keys, tokens, and user input payloads
The exposure of these files could lead to data breaches, unauthorized access to business systems, and exploitation of API vulnerabilities, while the exposed production-level details increase the risk of system manipulation, unauthorized access, and exploitation of business-critical operations. If valid, these leaks pose a serious risk to both user privacy and organizational security, Cyble said.
Safe LLM Use
Information entered into an LLM can become part of a larger data lake, and the right prompt may be all it takes to reveal that data to other parties in unintended ways. For that reason alone, sensitive data should not be entered into LLMs.
If an organization decides that there is value in extracting LLM insights from sensitive data, then the data should be masked, minimized, or anonymized, with other sensitive data handling controls applied as appropriate.
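As an illustration, a minimal pre-prompt redaction pass might look like the sketch below. The regex patterns and the send_to_llm() hook are placeholder assumptions; production pipelines would rely on dedicated PII-detection tooling rather than a handful of regular expressions:

```python
# Sketch: mask obvious PII and credentials before a prompt leaves the
# organization. Patterns are illustrative, not a production ruleset.
import re

# Credential patterns run first so the number-based rules below
# don't split a key or token before it can be redacted.
REDACTIONS = [
    (re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"), "[API_KEY]"),
    (re.compile(r"\bBearer\s+\S{20,}", re.IGNORECASE), "[TOKEN]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d ()-]{7,}\d"), "[PHONE]"),
]

def mask(prompt: str) -> str:
    for pattern, placeholder in REDACTIONS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

def send_to_llm(prompt: str) -> str:
    # Placeholder for whatever chat API the organization uses;
    # the point is that mask() runs before any prompt is sent.
    raise NotImplementedError

print(mask("Reach jane@corp.example at +1 555 123 4567, key sk-abc123xyz789abc123xyz789"))
# -> Reach [EMAIL] at [PHONE], key [API_KEY]
```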
Organizations should address the use of LLMs and chatbots at the policy level, monitor for compliance, and limit inputs as much as possible while extracting needed insights.