Changing text encoding with Vim

Many Chinese-language subtitle files are generated on Windows and encoded in GB2312 or sometimes Code Page 936 (CP936, also known as GBK). This causes problems in many Linux-based multimedia players, which usually expect UTF-8.

Converting with iconv is not hard, since the command is simple to use:

iconv -f gb2312 -t utf8 subtitle.srt -o converted.srt

But in many cases, you will get an error like this:

iconv: illegal input sequence at position xxxx.

To solve this, change gb2312 to a larger character set in the same family. For example, GBK is a superset of GB2312, and GB18030 is in turn a superset of GBK.
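The fallback idea can be sketched as a small loop that tries the narrowest charset first and moves on to its supersets when iconv reports an error. This is only a sketch: sample.srt is a hypothetical test file created here for illustration, and output goes through a redirect rather than -o.

```shell
# Hypothetical sample file: "你好" (valid GB2312) followed by the byte
# pair 0x81 0x40, which is valid in GBK but outside GB2312.
printf '\304\343\272\303\201\100\n' > sample.srt

# Try each encoding in turn; stop at the first one iconv accepts.
for enc in gb2312 gbk gb18030; do
    if iconv -f "$enc" -t utf-8 sample.srt > converted.srt 2>/dev/null; then
        echo "converted using $enc"
        break
    fi
done
```

For a real subtitle, replace sample.srt with the actual file name and drop the printf line.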

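If you only know the file's nominal encoding but don't know which superset to try, you can always use Vim to convert it. By default, Vim replaces any byte sequence it cannot convert with a question mark (see :help ++bad) instead of aborting, so the conversion goes through.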

vim subtitle.srt

Reload it with the ++enc option so Vim reads it in the source encoding:

:e ++enc=gb2312

Save it under a new name with the ++enc option set to the target encoding:

:w ++enc=utf8 converted.srt
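After saving, it can be worth checking that the result really decodes as UTF-8. One common trick, sketched below, is to run the file through iconv from utf-8 to utf-8 and look only at the exit status. The is_utf8 helper name is made up for this example.

```shell
# is_utf8 FILE: succeeds only if FILE exists and decodes cleanly as UTF-8.
# (Hypothetical helper name; it just wraps a plain iconv call.)
is_utf8() {
    iconv -f utf-8 -t utf-8 "$1" > /dev/null 2>&1
}

if is_utf8 converted.srt; then
    echo "converted.srt is valid UTF-8"
else
    echo "converted.srt is missing or not valid UTF-8"
fi
```

This only confirms the bytes are well-formed UTF-8. It does not tell you whether some characters were replaced during conversion, so a quick look at the subtitles in a player is still a good idea.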

Related documentation: http://vimdoc.sourceforge.net/htmldoc/usr_45.html#45.4

All done!
