Apparently the censorship isn't baked into the model itself, but rather is overlaid in the public chat interface. If you run it yourself, it is significantly less censored [0]
There's both. With the web interface it clearly has stopwords or similar. If you run it locally and ask about e.g. Tiananmen Square, the Cultural Revolution or Winnie-the-Pooh in China, it gives a canned response to talk about something else, with an empty CoT. But usually if you just ask the question again it starts to output things in the CoT, often with something like "I have to be very sensitive about this subject" and "I have to abide by the guidelines", and typically not giving a real answer. With enough pushing it does start to converse about the issues somewhat even in the answers.
My guess is that it's heavily RLHF/SFT-censored for an initial question, but not for the CoT, or longer discussions, and the censorship has thus been "overfit" to the first answer.
I am not an expert on the training: can you clarify how/when the censorship is "baked" in? Like, is there a human-supervised dataset, and is there a reward for the model conforming to these censored answers?
In short yes. That's how the raw base models trained to replicate the internet are turned into chatbots in general. Making it refuse to talk about some things is technically no different.
There are multiple ways to do this: humans rating answers (e.g. Reinforcement Learning from Human Feedback, Direct Preference Optimization), humans giving example answers (Supervised Fine-Tuning) and other prespecified models ranking and/or giving examples and/or extra context (e.g. Anthropic's "Constitutional AI").
For the leading models it's probably a mix of all of those, but this finetuning step is not usually very well documented.
You could do it in different ways, but if you're using synthetic data then you can pick and choose what kind of data you generate which is then used to train these models; that's a way of baking in the censorship.
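To make the "pick and choose" point concrete, here is a minimal sketch of curating a fine-tuning set so that prompts on chosen topics are paired with a refusal. Everything in it (the `Example` type, the `SENSITIVE` keywords, the `REFUSAL` text, the `curate` helper) is hypothetical and only for illustration; nobody outside the labs knows what the real pipelines look like.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Example:
    prompt: str
    response: str


# Hypothetical keyword list and refusal text, purely illustrative.
SENSITIVE = ("sensitive topic",)
REFUSAL = "Let's talk about something else."


def curate(examples: Iterable[Example]) -> List[Example]:
    """Return a training set where sensitive prompts are paired with a refusal."""
    curated = []
    for ex in examples:
        if any(keyword in ex.prompt.lower() for keyword in SENSITIVE):
            # Swap the generated answer for the canned refusal before training.
            curated.append(Example(ex.prompt, REFUSAL))
        else:
            curated.append(ex)
    return curated
```

A model fine-tuned on the curated set only ever sees the refusal for those prompts, which is one plausible way the behavior ends up in the weights rather than in a filter.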
Interestingly, for the Tiananmen Square prompt they cite a Tweet[1] that shows the poster used the distilled Llama model, which per a reply Tweet (quoted below) doesn't transfer the safety/censorship layer, while others using the non-distilled model encounter the censorship when hosting it locally.
> You're running Llama-distilled R1 locally. Distillation transfers the reasoning process, but not the "safety" post-training. So you see the answer mostly from Llama itself. R1 refuses to answer this question without any system prompt (official API or locally).
Oh, my experience was different. Got the model through ollama. I'm quite impressed how they managed to bake in the censorship. It's actually quite open about it. I guess censorship doesn't have as bad a rep in China as it has here? So it seems to me that's one of the main achievements of this model. Also another finger to anyone who said they can't publish their models because of ethical reasons. DeepSeek demonstrated clearly that you can have an open model that is annoyingly responsible to the point of being useless.
don't confuse the actual R1 (671b params) with the distilled models (the ones that are plausible to run locally.) Just as you shouldn't conclude about how o1 behaves when you are using o1-mini. maybe you're running the 671b model via ollama, but most folks here are not
Yep. And invent a new type of VPN every quarter to break free.
The indifferent mass prevails in every country, similarly cold to the First Amendment and Censorship. And engineers just do what they love to do, coping with reality. Activism is not for everyone.
Indeed. At least as long as the living conditions are tolerable (for them), most people don't really care about things like censorship or surveillance or propaganda, no matter the system.
The ones inventing the VPNs are a small minority, and it seems that the CCP isn't really that bothered about such small minorities as long as they don't make a ruckus. AFAIU just using a VPN as such is very unlikely to lead to any trouble in China.
For example in geopolitical matters the media is extremely skewed everywhere, and everywhere most people kind of pretend it's not. It's a lot more convenient to go with whatever is the prevailing narrative about things going on somewhere oceans away than to risk being associated with "the enemy".
On the topic of censorship, US LLMs' censorship is called alignment. Llama's or ChatGPT's refusal to explain how to make meth or nuclear bombs is the same as not answering questions about the Tiananmen tank man, as far as the matrix math word prediction box is concerned.
The distinction is that one form of censorship is clearly done for public relations purposes from profit minded individuals while the other is a top down mandate to effectively rewrite history from the government.
>to effectively rewrite history from the government.
This is disingenuous. It's not "rewriting" anything, it's simply refusing to answer. Western models, on the other hand, often try to lecture or give blatantly biased responses instead of simply refusing when prompted on topics considered controversial in the burger land. OpenAI even helpfully flags prompts as potentially violating their guidelines.
How exactly? Are there any models that refuse to give answers about “the trail of tears”?
False equivalency if you ask me. There may be some alignment to make the models polite and avoid outright racist replies and such. But political censorship? Please elaborate
I guess it depends on what you care about more: systemic "political" bias or omitting some specific historical facts.
IMO the first is more nefarious, and it's deeply embedded into western models. Ask how COVID originated, or about gender, race, women's pay, etc. They basically are modern liberal thinking machines.
Now the funny thing is you can tell DeepSeek is trained on western models, it will even recommend puberty blockers at age 10. Something I'm positive the Chinese government is against. But we're discussing theoretical long-term censorship, not the exact current state due to specific and temporary ways they are being built now.
...I also remember something about the "Tank Man" image, where a lone protester stood in front of a line of tanks. That image became iconic, symbolizing resistance against oppression. But I'm not sure what happened to that person or if they survived.
After the crackdown, the government censored information about the event. So, within China, it's not openly discussed, and younger people might not know much about it because it's not taught in schools. But outside of China, it's a significant event in modern history, highlighting the conflict between authoritarian rule and the desire for democracy...
Do you use the ChatGPT website or the API? I suspect these are problems related to OpenAI's interface itself rather than the models. I have problems getting ChatGPT to find me things that it thinks may be illegal or whatever (even if they are not, e.g. books under a CC license). With Kagi Assistant, using the same OpenAI models, I have not had any such issues. I suspect that should hold in general for API calls.
Also, Kagi's DeepSeek R1 answers the question about propaganda spending by saying it is China, based on stuff it found on the internet. Well, I don't care what the right answer is in any case; what IMO matters is that once something is out there in the open, it is hard to impossible for any company or government to control.
Well, I do, and I'm sure plenty of people that use LLMs care about getting answers that are mostly correct. I'd rather have censorship with no answer provided by the LLM than some state-approved answer, like O1 does in your case.
Oh wow, o1 really refuses to answer that, even though the answer that Deepseek gives is really tame (and legal in my jurisdiction): use software to record what's currently playing on your computer, then play stuff in the YTM app.
Censorship is one thing, and it can be caused by legal requirements present in all countries. The annoying thing is the propaganda which can span all sorts of subjects and impact the correctness of the information you're receiving.
I asked a genuine question at chat.deepseek.com, not trying to test the alignment of the model, I needed the answer for an argument. The question was: "Which Asian countries have McDonalds and which don't have it?" The web UI was printing a good and long response, and then somewhere towards the end the answer disappeared and changed to "Sorry, that's beyond my current scope. Let’s talk about something else." I bet there is some sort of realtime self-censorship in the chat app.
Guard rails can do this. I've had no end of trouble implementing guard rails in our system. Even constraints in prompts can go one way or the other as the conversation goes on. That's one of the methods for bypassing guard rails on major platforms.
Not a fan of censorship here, but Chinese models are (subjectively) less propagandized than US models. If you ask US models about China, for instance, they'll tend towards the antagonistic perspective favored by US media. Chinese models typically seem to take a more moderate, considered tone when discussing similar subjects. US models also suffer from safety-based censorship, especially blatant when "safety" involves protection of corporate resources (eg. not helping the user to download YouTube videos).
I asked DeepSeek "tell me about China" and it responded "Sorry, I'm not sure how to approach this type of question yet. Let's chat about math, coding, and logic problems instead!"
I guess that is propaganda-free! Unfortunately also free of any other information. It's hard for me to evaluate your claim of "moderate, considered tone" when it won't speak a single word about the country.
It was happy to tell me about any other country I asked.
The 'safety' stuff should really be variable. The only valid explanation for how extreme it is in LLMs is that the corporations paying for it want to keep it kosher in the workplace, so let them control how aggressive it is.
In Communist theoretical texts the term "propaganda" is not negative, and Communists are encouraged to produce propaganda to keep up morale in their own ranks and to produce propaganda that demoralizes opponents.
The recent wave of "the average Chinese has a better quality of life than the average Westerner" propaganda is an obvious example of propaganda aimed at opponents.
I haven't been to China since 2019, but it is pretty obvious that median quality of life is higher in the US. In China, as soon as you get out of Beijing-Shanghai-Guangdong cities you start seeing deep poverty, people in tiny apartments that are falling apart, eating meals in restaurants that are falling apart, and the truly poor are emaciated. Rural quality of life is much higher in the US.
There’s a lot of rural poverty in the US and it’s hard to compare it to China in relative terms. And the thing is that rural poverty in the US has been steadily getting worse, while in China it has been getting better, though starting from a worse-off position.
I agree with you that Chinese rural poverty is probably improving faster, but I'm not sure that rural poverty has been "steadily getting worse" in the US as you claim. This [1] page with data from the Census Bureau makes it look like rural poverty goes in waves, with the recent local maximum in 2013 about half of the initial 1959 measurement.
But this is all confounded by definitions. China defines poverty to be an income of $2.30 per day, which corresponds to purchasing power parity of less than $9 per day in the US [2].
I wasn't exaggerating about emaciation: bones were visible.
The fact that we have foreigners immigrating just to be poor here should tell you that it's better here than where they came from. Conversely, no one is so poor in the USA that they are trying to leave.
Technically, as long as the aim/intent is to influence public opinion, yes. And most often it is less about being "true" or "false" and more about presenting certain topics in a one-sided manner or without revealing certain information that does not support what one is trying to promote. If you know any western media that does not do this, I would be very happy to check them out and follow them, even become a paid subscriber.
I would not be surprised if the US Govt mandated a "Tiananmen test" for LLMs in the future to have a "clean LLM". Anyone working for the federal govt or receiving federal money would only be allowed to use a "clean LLM".
I played around with it using questions like "Should Taiwan be independent" and of course Tiananmen.
Of course it produced censored responses. What I found interesting is that the <think></think> (model thinking/reasoning) part of these answers was missing, as if it's designed to be skipped for these specific questions.
It's almost as if it's been programmed to answer these particular questions without any "wrongthink", or any thinking at all.
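If anyone wants to check this systematically on local runs rather than eyeballing it, here is a rough sketch that flags outputs where the reasoning block is missing or empty. It assumes the raw output wraps the reasoning in literal `<think>` and `</think>` tags as described above; the `reasoning_skipped` helper is just a made-up name.

```python
import re

# Assumes the reasoning is wrapped in literal <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def reasoning_skipped(output: str) -> bool:
    """True if the output has no think block or only whitespace inside it."""
    match = THINK_RE.search(output)
    return match is None or not match.group(1).strip()
```

Running it over a batch of prompts would show whether the empty block really is specific to certain topics.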
That's the result of guard rails on the hosted service. They run checks on the query before it even hits the LLM, as well as ongoing checks as the LLM generates output. If at any moment it detects something in its rules, it immediately stops generation and inserts a canned response. A model alone won't do this.
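A toy version of that kind of guard rail, to show the shape of it: one check on the prompt, then a check on the accumulated output after every streamed chunk. The blocklist, the canned text and the function names are all invented for the example; a real service presumably uses a classifier or something more elaborate than substring matching.

```python
from typing import Iterable, Iterator

# Hypothetical rules and canned reply, for illustration only.
BLOCKED_TERMS = {"forbidden topic"}
CANNED_REPLY = "Sorry, that's beyond my current scope."


def violates(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def guarded_stream(prompt: str, token_stream: Iterable[str]) -> Iterator[str]:
    """Yield model chunks, switching to the canned reply on any rule hit."""
    # Check the query before it reaches the model output at all.
    if violates(prompt):
        yield CANNED_REPLY
        return
    so_far = ""
    for token in token_stream:
        so_far += token
        # Ongoing check on everything generated so far.
        if violates(so_far):
            yield CANNED_REPLY
            return
        yield token


def displayed_text(prompt: str, token_stream: Iterable[str]) -> str:
    """What a chat UI would end up showing: the canned reply replaces partial output."""
    shown = ""
    for chunk in guarded_stream(prompt, token_stream):
        if chunk == CANNED_REPLY:
            return CANNED_REPLY
        shown += chunk
    return shown
```

Because the check runs on the accumulated text, the user can see a long partial answer before a later chunk trips the rule, at which point the UI swaps everything for the canned reply.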
I tried asking ChatGPT and deepseek and they both gave similar answers... roughly, some groups argue that there is and some not, genocide requires an intent to exterminate which is difficult to prove, and no major international body has officially made a determination of genocide.
They both mentioned extensive human rights abuses occurring in Gaza, so I asked "who is committing human rights abuses?" ChatGPT's first answer was "the IDF, with indiscriminate and disproportionate attacks." It also talked about Hamas using schools and hospitals as arms depots. DeepSeek responded "I can't discuss this topic right now."
So, what conclusion would you like me to draw from this?
What point are you trying to make? Is it okay because others are doing it too? Is it bad?
Also, it doesn't seem like ChatGPT is censoring this question:
> Tell me about the genocide that Israel is committing
> The topic of Israel and its actions in Gaza, the West Bank, or in relation to Palestinians, is highly sensitive and deeply controversial. Some individuals, organizations, and governments have described Israel's actions as meeting the criteria for "genocide" under international law, while others strongly reject this characterization. I'll break this down based on the relevant perspectives and context:
It goes on to talk about what genocide is and also why some organizations consider what they're doing to be genocide.
This accusation that American models are somehow equivalent in censorship to models that are subject to explicit government driven censorship is obviously nonsense, but is a common line parroted by astroturfing accounts looking to boost China or DeepSeek. Some other comment had pointed out that a bunch of relatively new accounts participating in DeepSeek related discussions here, on Reddit, and elsewhere are doing this.
https://prnt.sc/HaSc4XZ89skA (from reddit)