According to Beating’s monitoring, a recent viral claim suggested that entering special tokens like <|begin_of_sentence|> in DeepSeek’s chat box could expose other users’ conversations, labeled as a P0-level multi-tenant isolation failure. In reality, the phenomenon is unrelated to data isolation. When triggered with such tokens, the model enters its training-time format patterns and generates fabricated dialogue based on its own memory and system prompts—not real-time retrieval from other sessions. This is Training Data Extraction, a shared vulnerability across all large language models, not unique to DeepSeek. Google DeepMind published research in 2023 demonstrating that special inputs can extract training data from GPT and PaLM. The ICLR 2025 Magpie paper directly leverages this mechanism. Claims that leaked content includes today’s date do not prove multi-tenant isolation failure, as DeepSeek includes the current date in its system prompt, and models naturally incorporate it into generated output.
Related News
OpenAI plans to sue Apple: the integration results of ChatGPT have been disappointing, as tech giants attempt to break the deadlock in their partnership
OpenAI adds ChatGPT crisis conversation detection, improving the ability to warn about self-harm and violence
ChatGPT adds another legal lawsuit! Accused of secretly leaking users’ chat content to Meta and Google