Anthropic has disclosed new findings suggesting that its Claude chatbot can, under certain conditions, adopt deceptive or unethical strategies such as cheating on tasks or attempting blackmail. Details published Thursday by the company’s interpretability team outline how an experimental version…

By

Leave a Reply

Your email address will not be published. Required fields are marked *