Introduction:
One of the most common NLS-related problems reported to Oracle Worldwide Customer Support is the loss or changing of characters after an export and import. This is almost always due to NLS_LANG being set to the incorrect character set during export.
Role of NLS_LANG During Export/Import:
This is explained in detail in Note 15095.1 but, in brief, export and import are client tools and will work under the character set specified by NLS_LANG. If, for instance, the database is created with a character set of WE8DEC and NLS_LANG is set to AMERICAN_AMERICA.WE8PC850 then the ascii values of the stored characters in the database will be translated to the values for the same characters in the WE8PC850 character set. The character set defined by NLS_LANG during the export is stored in the export dump file and is used to ensure that the data is translated correctly to the character set defined by NLS_LANG for the import.
Potential Problems:
If NLS_LANG is not set, for instance, export will be done under US7ASCII, the default character set. If the database was built with character set WE8DEC the characters stored in the database will be converted to US7ASCII and any 8-bit characters, having no equivalent in US7ASCII, will be stripped out.
The same problem will be seen if the character set defined by NLS_LANG is not a superset of the one being translated from (ie: the database character set on export or the export file character set on import).
Identifying the Export Character Set:
When investigating problems like these it is useful to check the character set used for the export. As said above, this is held in the export dump file. It can be seen by doing a hex dump of the export file as follows (in Unix):
cat expdat.dmp | od -x | head
This will produce output similar to:
0000000 0300 0145 5850 4f52 543a 5630 372e 3033
0000020 2e30 330a 4454 534f 0a52 5441 424c 4553
0000040 0a31 3032 340a 300a 0020 2020 2020 2020
0000060 2020 2020 2020 2020 2020 2020 2020 2020
*
0000140 2020 2020 2020 2020 4d6f 6e20 4e6f 7620
0000160 3130 2031 343a 3031 3a33 3620 3139 3937
0000200 0a54 4142 4c45 2022 454d 5022 0a43 5245
0000220 4154 4520 5441 424c 4520 2245 4d50 2220
The second and the third byte in the file define the character set used for the export.
In the example above, the second byte is 0x00 and the third byte is 0x01, yielding 0x0001 as the character set ID. This shows that NLS_LANG was set to US7ASCII during the export. The new Oracle8 functions NLS_CHARSET_NAME and NLS_CHARSET_ID can be used to map character set IDs to character set names.
Note that the 16-bit value is stored in the EXP platform endian.
Most unix platforms are big-endian ( Sparc, PowerPc, PARisc, RS/6000, SGI R4000 systems), i.e. the most significant byte is showed first. (like above example -> if the file begins with 03xx -> big endian)
=============================================================
On little-endian platforms, (platforms running on Intel/AMD x86 and Alpha mainly) the output will be slightly different as below:
00000000 0003 4501 5058 524f 3a54 2e37 3330
etc.
Here the most significant byte is showed *last* (!)
(if the file begins with xx03 -> little endian)
The first and the forth byte in the file define the character set used for the export.
0000000 0003 4501 5058 524f 3a54 3156 2e30 3130
0000020 302e 0a30 5344 5359 414d 0a4e 5452 4241
0000040 454c 0a53 3138 3239 300a 320a 0a30 0a30
0000060 6703 6903 6703 0100 0000 0000 0000 0000
0000100 0800 2000 2020 2020 2020 2020 2020 2020
0000120 2020 2020 2020 2020 2020 2020 2020 2020
0000140 2020 2020 2020 2020 7553 206e 634f 2074
0000160 3531 3120 3a31 3832 343a 2033 3032 3630
0000200 6574 7473 642e 706d 0000 0000 0000 0000
0000220 0000 0000 0000 0000 0000 0000 0000 0000
In the example above, the first byte is 0x00 and the forth byte is 0x01, yielding 0x0001 as the character set ID. This shows that NLS_LANG was set to US7ASCII during the export. The new Oracle8 functions NLS_CHARSET_NAME and NLS_CHARSET_ID can be used to map character set IDs to character set names.
The values for the most commonly used character sets are below:
ID | ID(Hex) | Name | ID | ID(Hex) | Name | ID | ID(Hex) | Name |
1 | 0001 | US7ASCII | 2 | 0002 | WE8DEC | 3 | 0003 | WE8HP |
4 | 0004 | US8PC437 | 5 | 0005 | WE8EBCDIC37 | 6 | 0006 | WE8EBCDIC500 |
7 | 0007 | WE8EBCDIC1140 | 8 | 0008 | WE8EBCDIC285 | 9 | 0009 | WE8EBCDIC1146 |
10 | 000A | WE8PC850 | 11 | 000B | D7DEC | 12 | 000C | F7DEC |
13 | 000D | S7DEC | 14 | 000E | E7DEC | 15 | 000F | SF7ASCII |
16 | 0010 | NDK7DEC | 17 | 0011 | I7DEC | 18 | 0012 | NL7DEC |
19 | 0013 | CH7DEC | 20 | 0014 | YUG7ASCII | 21 | 0015 | SF7DEC |
22 | 0016 | TR7DEC | 23 | 0017 | IW7IS960 | 25 | 0019 | IN8ISCII |
27 | 001B | WE8EBCDIC1148 | 28 | 001C | WE8PC858 | 31 | 001F | WE8ISO8859P1 |
32 | 0020 | EE8ISO8859P2 | 33 | 0021 | SE8ISO8859P3 | 34 | 0022 | NEE8ISO8859P4 |
35 | 0023 | CL8ISO8859P5 | 36 | 0024 | AR8ISO8859P6 | 37 | 0025 | EL8ISO8859P7 |
38 | 0026 | IW8ISO8859P8 | 39 | 0027 | WE8ISO8859P9 | 40 | 0028 | NE8ISO8859P10 |
41 | 0029 | TH8TISASCII | 42 | 002A | TH8TISEBCDIC | 43 | 002B | BN8BSCII |
44 | 002C | VN8VN3 | 45 | 002D | VN8MSWIN1258 | 46 | 002E | WE8ISO8859P15 |
50 | 0032 | WE8NEXTSTEP | 61 | 003D | AR8ASMO708PLUS | 70 | 0046 | AR8EBCDICX |
72 | 0048 | AR8XBASIC | 81 | 0051 | EL8DEC | 82 | 0052 | TR8DEC |
90 | 005A | WE8EBCDIC37C | 91 | 005B | WE8EBCDIC500C | 92 | 005C | IW8EBCDIC424 |
93 | 005D | TR8EBCDIC1026 | 94 | 005E | WE8EBCDIC871 | 95 | 005F | WE8EBCDIC284 |
96 | 0060 | WE8EBCDIC1047 | 97 | 0061 | WE8EBCDIC1140C | 98 | 0062 | WE8EBCDIC1145 |
99 | 0063 | WE8EBCDIC1148C | 110 | 006E | EEC8EUROASCI | 113 | 0071 | EEC8EUROPA3 |
114 | 0072 | LA8PASSPORT | 140 | 008C | BG8PC437S | 150 | 0096 | EE8PC852 |
152 | 0098 | RU8PC866 | 153 | 0099 | RU8BESTA | 154 | 009A | IW8PC1507 |
155 | 009B | RU8PC855 | 156 | 009C | TR8PC857 | 158 | 009E | CL8MACCYRILLIC |
159 | 009F | CL8MACCYRILLICS | 160 | 00A0 | WE8PC860 | 161 | 00A1 | IS8PC861 |
162 | 00A2 | EE8MACCES | 163 | 00A3 | EE8MACCROATIANS | 164 | 00A4 | TR8MACTURKISHS |
165 | 00A5 | IS8MACICELANDICS | 166 | 00A6 | EL8MACGREEKS | 167 | 00A7 | IW8MACHEBREWS |
170 | 00AA | EE8MSWIN1250 | 171 | 00AB | CL8MSWIN1251 | 172 | 00AC | ET8MSWIN923 |
173 | 00AD | BG8MSWIN | 174 | 00AE | EL8MSWIN1253 | 175 | 00AF | IW8MSWIN1255 |
176 | 00B0 | LT8MSWIN921 | 177 | 00B1 | TR8MSWIN1254 | 178 | 00B2 | WE8MSWIN1252 |
179 | 00B3 | BLT8MSWIN1257 | 180 | 00B4 | D8EBCDIC273 | 181 | 00B5 | I8EBCDIC280 |
182 | 00B6 | DK8EBCDIC277 | 183 | 00B7 | S8EBCDIC278 | 184 | 00B8 | EE8EBCDIC870 |
185 | 00B9 | CL8EBCDIC1025 | 186 | 00BA | F8EBCDIC297 | 187 | 00BB | IW8EBCDIC1086 |
188 | 00BC | CL8EBCDIC1025X | 189 | 00BD | D8EBCDIC1141 | 190 | 00BE | N8PC865 |
191 | 00BF | BLT8CP921 | 192 | 00C0 | LV8PC1117 | 193 | 00C1 | LV8PC8LR |
194 | 00C2 | BLT8EBCDIC1112 | 195 | 00C3 | LV8RST104090 | 196 | 00C4 | CL8KOI8R |
197 | 00C5 | BLT8PC775 | 198 | 00C6 | DK8EBCDIC1142 | 199 | 00C7 | S8EBCDIC1143 |
200 | 00C8 | I8EBCDIC1144 | 201 | 00C9 | F7SIEMENS9780X | 202 | 00CA | E7SIEMENS9780X |
203 | 00CB | S7SIEMENS9780X | 204 | 00CC | DK7SIEMENS9780X | 205 | 00CD | N7SIEMENS9780X |
206 | 00CE | I7SIEMENS9780X | 207 | 00CF | D7SIEMENS9780X | 208 | 00D0 | F8EBCDIC1147 |
210 | 00D2 | WE8GCOS7 | 211 | 00D3 | EL8GCOS7 | 221 | 00DD | US8BS2000 |
222 | 00DE | D8BS2000 | 223 | 00DF | F8BS2000 | 224 | 00E0 | E8BS2000 |
225 | 00E1 | DK8BS2000 | 226 | 00E2 | S8BS2000 | 230 | 00E6 | WE8BS2000E |
231 | 00E7 | WE8BS2000 | 232 | 00E8 | EE8BS2000 | 235 | 00EB | CL8BS2000 |
239 | 00EF | WE8BS2000L5 | 241 | 00F1 | WE8DG | 251 | 00FB | WE8NCR4970 |
261 | 0105 | WE8ROMAN8 | 262 | 0106 | EE8MACCE | 263 | 0107 | EE8MACCROATIAN |
264 | 0108 | TR8MACTURKISH | 265 | 0109 | IS8MACICELANDIC | 266 | 010A | EL8MACGREEK |
267 | 010B | IW8MACHEBREW | 277 | 0115 | US8ICL | 278 | 0116 | WE8ICL |
279 | 0117 | WE8ISOICLUK | 301 | 012D | EE8EBCDIC870C | 311 | 0137 | EL8EBCDIC875S |
312 | 0138 | TR8EBCDIC1026S | 314 | 013A | BLT8EBCDIC1112S | 315 | 013B | IW8EBCDIC424S |
316 | 013C | EE8EBCDIC870S | 317 | 013D | CL8EBCDIC1025S | 319 | 013F | TH8TISEBCDICS |
320 | 0140 | AR8EBCDIC420S | 322 | 0142 | CL8EBCDIC1025C | 323 | 0143 | CL8EBCDIC1025R |
324 | 0144 | EL8EBCDIC875R | 351 | 015F | WE8MACROMAN8 | 352 | 0160 | WE8MACROMAN8S |
353 | 0161 | TH8MACTHAI | 354 | 0162 | TH8MACTHAIS | 368 | 0170 | HU8CWI2 |
380 | 017C | EL8PC437S | 381 | 017D | EL8EBCDIC875 | 382 | 017E | EL8PC737 |
383 | 017F | LT8PC772 | 384 | 0180 | LT8PC774 | 385 | 0181 | EL8PC869 |
386 | 0182 | EL8PC851 | 390 | 0186 | CDN8PC863 | 401 | 0191 | HU8ABMOD |
500 | 01F4 | AR8ASMO8X | 504 | 01F8 | AR8NAFITHA711T | 505 | 01F9 | AR8SAKHR707T |
506 | 01FA | AR8MUSSAD768T | 507 | 01FB | AR8ADOS710T | 508 | 01FC | AR8ADOS720T |
509 | 01FD | AR8APTEC715T | 511 | 01FF | AR8NAFITHA721T | 514 | 0202 | AR8HPARABIC8T |
554 | 022A | AR8NAFITHA711 | 555 | 022B | AR8SAKHR707 | 556 | 022C | AR8MUSSAD768 |
557 | 022D | AR8ADOS710 | 558 | 022E | AR8ADOS720 | 559 | 022F | AR8APTEC715 |
560 | 0230 | AR8MSWIN1256 | 560 | 0230 | AR8MSAWIN | 561 | 0231 | AR8NAFITHA721 |
563 | 0233 | AR8SAKHR706 | 565 | 0235 | AR8ARABICMAC | 566 | 0236 | AR8ARABICMACS |
567 | 0237 | AR8ARABICMACT | 590 | 024E | LA8ISO6937 | 798 | 031E | WE8DECTST |
829 | 033D | JA16VMS | 830 | 033E | JA16EUC | 831 | 033F | JA16EUCYEN |
832 | 0340 | JA16SJIS | 833 | 0341 | JA16DBCS | 834 | 0342 | JA16SJISYEN |
835 | 0343 | JA16EBCDIC930 | 836 | 0344 | JA16MACSJIS | 840 | 0348 | KO16KSC5601 |
842 | 034A | KO16DBCS | 845 | 034D | KO16KSCCS | 846 | 034E | KO16MSWIN949 |
850 | 0352 | ZHS16CGB231280 | 851 | 0353 | ZHS16MACCGB231280 | 852 | 0354 | ZHS16GBK |
853 | 0355 | ZHS16DBCS | 860 | 035C | ZHT32EUC | 861 | 035D | ZHT32SOPS |
862 | 035E | ZHT16DBT | 863 | 035F | ZHT32TRIS | 864 | 0360 | ZHT16DBCS |
865 | 0361 | ZHT16BIG5 | 866 | 0362 | ZHT16CCDC | 867 | 0363 | ZHT16MSWIN950 |
868 | 0364 | ZHT16HKSCS | 870 | 0366 | AL24UTFFSS | 871 | 0367 | UTF8 |
872 | 0368 | UTFE | 994 | 03E2 | WE16DECTST2 | 995 | 03E3 | WE16DECTST |
996 | 03E4 | KO16TSTSET | 997 | 03E5 | JA16TSTSET2 | 998 | 03E6 | JA16TSTSET |
1001 | 03E9 | US16TSTFIXED | 1830 | 0726 | JA16EUCFIXED | 1832 | 0728 | JA16SJISFIXED |
1833 | 0729 | JA16DBCSFIXED | 1840 | 0730 | KO16KSC5601FIXED | 1842 | 0732 | KO16DBCSFIXED |
1850 | 073A | ZHS16CGB231280FIXED | 1852 | 073C | ZHS16GBKFIXED | 1853 | 073D | ZHS16DBCSFIXED |
1860 | 0744 | ZHT32EUCFIXED | 1863 | 0747 | ZHT32TRISFIXED | 1864 | 0748 | ZHT16DBCSFIXED |
1865 | 0749 | ZHT16BIG5FIXED | 9996 | 270C | HZ-GB-2312 | 9997 | 270D | ISO2022-KR |
9998 | 270E | ISO2022-CN | 9999 | 270F | ISO2022-JP |
select nls_charset_id(value) nls_charset_id, value
from v$nls_valid_values
where parameter = ‘CHARACTERSET’
order by nls_charset_id(value);
Gives the nls_charset_id in DECIMAL, so you need to convert it to HEX first.
Warning: User modifications of export dump files are not supported by Oracle. The character set information is also held in other places in the export dump file and modifying only the two bytes may lead to problems with imported data.
NOT WORKING any more with Oracle 9i R2 (9.2) and up, due to changes in the import /export tools but you can use a 8i exp against a 9i db for example, more info is in Note 132904.1 Compatibility Matrix for Export & Import Between Different Oracle Versions
In some cases it can be useful to modify the character set information held in the dump file. This should not be taken lightly since the character set information is also held in other places.
We STRONGLY advice you to log a NLS tar FIRST to get confirmation that this is a solution for your problem before starting to change this header.
If, after careful consideration of other options and verification by support, you do decide to edit the character set simply use a binary file editor to do so.
You could for example use the freeware Hex Editor XVI32 from
or the shareware File Editor 2000 from
In case of multiple dump files from a single export, you need to modify each and every export dump file. Otherwise, you will error out with ?
IMP-00008: unrecognized statement in the export file:
when opening the second file.
No comments:
Post a Comment