charfreq
Description of the page content
Utility written in Forth that counts the frequency of the characters in a text.
charfreq counts every character that occurs in a text and reports the relative frequency of each one. It was a precursor to dvorakizer, a more elaborate tool with a related goal.
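A minimal usage sketch, assuming the source below has been saved as charfreq.fs and that a one-byte-per-character text file called sample.txt exists (both file names are hypothetical). The main word takes the file name as an address and length pair, as its stack comment states:

```forth
\ Load the program (hypothetical file name).
S" charfreq.fs" INCLUDED

\ Print one report line per character code, 0 to 255.
\ The input file name is hypothetical as well.
S" sample.txt" charfreq
```

Each report line holds the character code, the character itself in parentheses when it is printable, a tab, and the frequency as a fraction with six decimals; a character that makes up half of the file is reported as 0.500000.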
Source code
\ --------------------------------------------------------------
CR .( charfreq )
\ (C) 2008 Marcos Cruz (programandala.net)
\ License: http://programandala.net/license
\ Program written in ANS Forth.
\ --------------------------------------------------------------
\ History
\ 2008-05-11 Start. First working version.
\ --------------------------------------------------------------
\ To do
\ Filtering flags: accents, digits...
\ --------------------------------------------------------------
\ Init
MARKER --charfreq--
DECIMAL
\ --------------------------------------------------------------
\ File management
VARIABLE input-file-id
: open-input ( c-addr u -- )
R/O BIN OPEN-FILE ABORT" open file error"
input-file-id !
;
: close-input ( -- )
input-file-id @ CLOSE-FILE ABORT" close file error"
;
\ --------------------------------------------------------------
\ Data space
256 CONSTANT #characters
CREATE counts #characters CELLS ALLOT
: clear-counts ( -- )
counts #characters CELLS ERASE
;
: character>count-address ( b -- a )
\ b = character
\ a = character count address
CELLS counts +
;
: character>count ( b -- u )
\ b = character
\ u = character count
character>count-address @
;
\ --------------------------------------------------------------
\ Data input
VARIABLE character
: read-character ( -- u )
\ u = number of characters read (0 at the end of the file)
character 1 input-file-id @ READ-FILE
ABORT" read file error"
;
VARIABLE counted-characters
: init-count ( -- )
clear-counts
0 counted-characters !
;
: count-character ( -- )
\ READ-FILE stores a single character, so it is fetched with C@.
1 character C@ character>count-address +!
1 counted-characters +!
;
: count-characters ( -- )
init-count
BEGIN read-character
WHILE count-character
REPEAT
;
\ --------------------------------------------------------------
\ Report
9 VALUE field-separator \ tab character
: .field-separator ( -- )
field-separator EMIT
;
: .character-code ( b -- )
\ b = character
3 .R
;
: .character ( b -- )
." (" EMIT ." )"
;
: printable? ( b -- f )
\ b = character
\ f = is the character code BL (space) or higher?
BL < 0=
;
: .character-field ( b -- )
\ b = character
DUP .character-code
DUP printable? IF .character ELSE DROP THEN
;
: count>frequency ( u -- d )
\ u = character count
\ d = character frequency, scaled by 1000000
\ 1 MAX avoids a division by zero when the input file is empty.
1000000 counted-characters @ 1 MAX */MOD NIP S>D
;
: character>frequency ( b -- d )
\ b = character
\ d = character frequency
character>count count>frequency
;
: .frequency ( d -- )
\ d = character frequency
<# # # # # # # [CHAR] . HOLD #S #> TYPE
;
: .frequency-field ( b -- )
\ b = character
character>frequency .frequency
;
: .report-record ( b -- )
\ b = character
CR DUP .character-field .field-separator .frequency-field
;
: .report ( -- )
#characters 0 DO I .report-record LOOP
;
\ --------------------------------------------------------------
\ Main
: charfreq ( c-addr u -- )
\ Calculate character frequencies in a text file.
\ The text encoding has to be one byte per character.
\ c-addr u = file name
open-input count-characters .report close-input
;
.( charfreq ok!) CR
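
The fixed-point arithmetic used by count>frequency and .frequency can be tried on its own. The sketch below is an illustration, not part of the program: the helper .scaled is a hypothetical name, and the figures (25 occurrences among 200 characters) are made up for the example.

```forth
DECIMAL

\ .scaled is a hypothetical helper that combines both steps.
: .scaled ( count total -- )
  \ Scale the ratio count/total by 1000000 and keep the quotient.
  1000000 SWAP */MOD NIP S>D
  \ Convert six digits, insert a decimal point, then the integer part.
  <# # # # # # # [CHAR] . HOLD #S #> TYPE
;

25 200 .scaled  \ prints 0.125000
```

Scaling by 1000000 before dividing keeps six decimal places with integer arithmetic only, and */MOD uses a double-cell intermediate product, so the multiplication does not overflow a single cell.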