Unicode

Programming in Modern Fortran

Modern Fortran has only limited support for Unicode, and the language provides no UTF-8 string type. As a workaround, a program can use the character kind of the Universal Coded Character Set (UCS) defined in ISO 10646, if the compiler supports it, and combine it with UTF-8 encoded output. The Fortran 2003 standard specifies this kind as UCS-4. Unlike the 16-bit form UCS-2 (the predecessor of UTF-16), UCS-4 is not limited to the code points of the Basic Multilingual Plane. It also covers the supplementary planes (> U+FFFF).
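The intrinsic function selected_char_kind() returns -1 for any character set the compiler does not support. A program can therefore check whether the ISO 10646 kind is available before relying on it. The following minimal sketch prints the kind values of ASCII and ISO 10646. The file name kinds.f90 is only illustrative. With GNU Fortran, the values should be 1 and 4. A compiler without ISO 10646 support reports -1.

! kinds.f90
program kinds
    implicit none
    ! Kind parameters of the requested character sets (-1 if unsupported).
    integer, parameter :: ascii = selected_char_kind('ASCII')
    integer, parameter :: ucs2  = selected_char_kind('ISO_10646')

    print '(a, i0)', 'ASCII kind:     ', ascii
    print '(a, i0)', 'ISO 10646 kind: ', ucs2

    ! Warn if the compiler lacks the ISO 10646 character set.
    if (ucs2 < 0) then
        print '(a)', 'ISO 10646 characters are not supported by this compiler.'
    end if
end program kinds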

The following example prints the Unicode character ☻ (black smiling face) of code point U+263B. Run the compiled binary in a terminal with Unicode support, such as XTerm or sakura.

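! unicode.f90
program main
    use, intrinsic :: iso_fortran_env, only: output_unit
    implicit none
    ! Kind parameter of the ISO 10646 character set (-1 if unsupported).
    integer, parameter :: ucs2 = selected_char_kind('ISO_10646')
    character(kind=ucs2, len=:), allocatable :: str

    ! The escape sequence \u263B requires the -fbackslash flag.
    str = ucs2_'Unicode character: \u263B'

    ! Reopen standard output with UTF-8 encoding.
    open (output_unit, encoding='utf-8')
    print '(a)', str
end program main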

Build and run the executable with:

$ gfortran13 -fbackslash -o unicode unicode.f90
$ ./unicode
Unicode character: ☻

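The source code does not compile with Flang 7, as it has no support for ISO 10646. GNU Fortran needs the -fbackslash compiler flag to interpret escape sequences like \u263B in string literals. Without this flag, the program has to create the character manually from its code point with the intrinsic function char(). For instance, the code point can be given as a BOZ literal: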

str = ucs2_'Unicode character: ' // char(int(z'263B'), kind=ucs2)

Or, simply pass the decimal value of the code point (U+263B = 9787) instead of a BOZ literal:

str = ucs2_'Unicode character: ' // char(9787, kind=ucs2)
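Because char() accepts any integer expression, a program can also compute code points. The following minimal sketch follows the example above, although the program name chess and the file name are only illustrative. It appends the twelve chess symbols U+2654 to U+265F to a string in a loop. It does not need the -fbackslash flag.

! chess.f90
program chess
    use, intrinsic :: iso_fortran_env, only: output_unit
    implicit none
    integer, parameter :: ucs2 = selected_char_kind('ISO_10646')
    character(kind=ucs2, len=:), allocatable :: str
    integer :: i

    ! Start with an empty string of the ISO 10646 kind.
    str = ucs2_''

    ! Append the chess symbols U+2654 (white king) to U+265F (black pawn).
    do i = int(z'2654'), int(z'265F')
        str = str // char(i, kind=ucs2)
    end do

    ! Reopen standard output with UTF-8 encoding and print the string.
    open (output_unit, encoding='utf-8')
    print '(a)', str
end program chess

If you build it with GNU Fortran and run it in a Unicode terminal, it should print ♔♕♖♗♘♙♚♛♜♝♞♟.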

