caseness

Descripción del contenido de la página

Palabras en Forth para convertir caracteres y cadenas de texto de minúsculas a mayúsculas o al contrario.

Etiquetas:

Este pequeño programa en Forth con Gforth añade varias palabras para convertir de mayúsculas a minúsculas, y al contrario, caracteres y cadenas de texto. Dichas palabras las implementé de forma nativa en mi Forth 5mx y necesitaba una versión en Forth para poder ejecutar algunas fuentes en otros sistemas Forth.

El programa construye y usa una tabla de búsqueda que contiene la cualidad de cada carácter (mayúscula o minúscula). La transformación se hace cambiando el estado del sexto bitio del octeto (lo que, como se explica en el código fuente, no funciona con unos pocos caracteres). Este método por tanto sólo sirve para juegos de caracteres con codificaciones de un octeto por carácter.

Una solución mejor, que no tendría las dos limitaciones citadas, usaría una tabla de sustitución completa, con todas las parejas de caracteres.

Código fuente

CR .( caseness )

\ Copyright (C) 2007 Marcos Cruz (http://programandala.net)
\ License/Licencia/Permesilo: http://programandala.net/license

\ This Forth program implements the following common use words:
\   tolower toupper lower upper

\ My own Forth system, Forth 5mx, has those words as native,
\ but I needed a high level version
\ to use my Forth programs in other Forth systems that don't have them.

\ My approach is not perfect or complete (in fact it's too complex) 
\ but it suited my needs:
\ a lookup array stores the "caseness" of every letter
\ and the change is done simply changing the 6th bit of the char.
\ In ISO-8859-1 and Windows-1252 this works for all letters 
\ but the following pairs:
\   Y/y with diaeresis.
\   S/s with caron.
\   Z/z with caron.
\   OE/oe.
\ A much simpler approach would be to store the pair of every char 
\ (or zero if it doesn't have one); then all chars could be treated.
\ This program works only with single byte encodings.

\ Written in ANS Forth. 

\ 2007-12-08 First version.
\ 2007-12-15 Improved: two IF structures removed.

char u constant uppercase-id  \ caseness id
char l constant lowercase-id  \ caseness id
256 constant #chars

create caseness-array #chars chars allot
caseness-array #chars chars erase

: caseness@  ( c -- u ) caseness-array + c@  ;
: caseness!  ( u c -- )  caseness-array + c!  ;

: all-caseness!  ( u1 u2 u3 -- )

  \ Store a caseness id into several char positions in the caseness array.

  \ u1 = caseness id
  \ u2 = first char
  \ u3 = last char

  \ 2007-12-08

  1+ swap  do
    dup i caseness!
  loop  drop

  ;

\ Init all the caseness:
uppercase-id char A char Z all-caseness!
lowercase-id char a char z all-caseness!
uppercase-id 192 222 all-caseness!  \ from "A grave" to uppercase thorn
lowercase-id 224 254 all-caseness!  \ from "a grave" to lowercase thorn

false [if]  \ obsolete, not used

: change-case  ( c1 -- c2 )

  \ Change the case of a char.

  \ 2007-12-08

  32 xor

  ;

[then]

: ?change-case  ( c1 f -- c2 )

  \ Change the case of a char if needed.
  \ Only some chars can not be translated this way.

  \ c1 = char to change
  \ f = change char?
  \ c2 = char changed if needed

  \ 2007-12-15

  32 and xor

  ;

[undefined] tolower [if]

: uppercase?  ( u -- flag )  caseness@ uppercase-id =  ;

: tolower  ( c1 -- c2 | c1 )

  \ Change a char to lower case if possible.

  \ 2007-12-08 First version.
  \ dup uppercase? if  change-case  then

  \ 2007-12-08 Second version, without control structures.
  dup uppercase? ?change-case

  ;

[then]

[undefined] toupper [if]

: lowercase?  ( u -- flag )  caseness@ lowercase-id =  ;

: toupper  ( c1 -- c2 | c1 )

  \ Change a char to upper case if possible.

  \ 2007-12-08 First version.
  \ dup lowercase? if  change-case  then

  \ 2007-12-15 Second version, without control structures.
  dup lowercase? ?change-case

  ;

[then]

[undefined] lower [if]

: lower  ( c-addr u -- )

  \ Change the string to lower case.

  \ 2007-12-08

  over + swap  do
    i c@ tolower i c!
  loop

  ;

[then]

[undefined] upper [if]

: upper  ( c-addr u -- )

  \ Change the string to upper case.

  \ 2007-12-08

  over + swap  do
    i c@ toupper i c!
  loop

  ;

[then]

true [if]  \ used for debugging

: .caseness-array  ( -- )

  \ Show the content of the caseness array.
  \ Just for debugging.

  \ 2007-12-08 First version.
  \ 2010-01-25 Better printout.

  #chars 0  do
    i caseness@ ?dup  if
      i dup . [char] ( emit emit [char] ) emit [char] : emit emit [char] , emit space
    then
  loop

  ;

[then]

.( caseness ok!)

Descargas