charfreq

Description of the page content

Utility written in Forth to count the frequency of the characters in a text.

charfreq counts every character that occurs in a text. It was a precursor to a more elaborate tool with a related goal: dvorakizer.
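
A minimal usage sketch, assuming the listing below has been saved as `charfreq.fs` and that `sample.txt` is a text file with a one-byte-per-character encoding in the current directory. Both file names are hypothetical, and the Forth system is assumed to provide the File-Access word set:

```forth
\ Hypothetical file names: adjust both to your own setup.
S" charfreq.fs" INCLUDED    \ compile the program
S" sample.txt" charfreq     \ print one report line per byte value, 0 to 255
```

Each report line holds the character code in a three-column field, the character itself in parentheses when its code is 32 or higher, a tab, and the relative frequency as a fraction with six decimal places.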

Source code

\ --------------------------------------------------------------
CR .( charfreq )

\ (C) 2008 Marcos Cruz (programandala.net)
\ License: http://programandala.net/license

\ Program written in ANS Forth.

\ --------------------------------------------------------------
\ History

\ 2008-05-11 Start. First working version.

\ --------------------------------------------------------------
\ To do

\ Filtering flags: accents, digits...

\ --------------------------------------------------------------
\ Init

MARKER --charfreq--

DECIMAL

\ --------------------------------------------------------------
\ File management

VARIABLE input-file-id

: open-input  ( c-addr u -- )
  R/O BIN OPEN-FILE ABORT" open file error"
  input-file-id !
  ;

: close-input  ( -- )
  input-file-id @ CLOSE-FILE ABORT" close file error"
  ;

\ --------------------------------------------------------------
\ Data space

256 CONSTANT #characters

CREATE counts  #characters CELLS ALLOT

: clear-counts  ( -- )
  counts #characters CELLS ERASE
  ;

: character>count-address  ( b -- a )
  \ b = character
  \ a = character count address
  CELLS counts +
  ;

: character>count  ( b -- u )
  \ b = character
  \ u = character count
  character>count-address @
  ;

\ --------------------------------------------------------------
\ Data input

VARIABLE character

: read-character  ( -- u )
  \ u = number of bytes read (0 at the end of the file)
  character 1 input-file-id @ READ-FILE
  ABORT" read file error"
  ;

VARIABLE counted-characters

: init-count  ( -- )
  clear-counts
  0 counted-characters !
  ;

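: count-character  ( -- )
  \ Only the first byte of `character` is filled by READ-FILE,
  \ so it is fetched with C@ rather than @.
  1 character C@ character>count-address +!
  1 counted-characters +!
  ;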

: count-characters  ( -- )
  init-count
  BEGIN  read-character
  WHILE  count-character
  REPEAT
  ;

\ --------------------------------------------------------------
\ Report

9 VALUE field-separator  \ tab character

: .field-separator  ( -- )
  field-separator EMIT
  ;

: .character-code  ( b -- )
  \ b = character
  3 .R
  ;

: .character  ( b -- )
  ."  (" EMIT  ." )"
  ;

: printable? ( b -- f )
  \ b = character
  \ f = is the character code BL (32) or higher?
  BL < 0=
  ;

: .character-field ( b -- )
  \ b = character
  DUP .character-code
  DUP printable?  IF  .character  ELSE  DROP  THEN
  ;

: count>frequency  ( u -- d )
  \ u = character count
  \ d = character frequency, scaled by 1000000
  \ `1 MAX` avoids a division by zero when the file is empty.
  1000000 counted-characters @ 1 MAX */ S>D
  ;

: character>frequency  ( b -- d )
  \ b = character
  \ d = character frequency
  character>count count>frequency
  ;

: .frequency  ( d -- )
  \ d = character frequency, scaled by 1000000
  \ Print it as a fraction with six decimal places.
  <# # # # # # # [CHAR] . HOLD #S #> TYPE
  ;

: .frequency-field ( b -- )
  \ b = character
  character>frequency .frequency
  ;

: .report-record  ( b -- )
  \ b = character
  CR DUP .character-field  .field-separator  .frequency-field
  ;

: .report  ( -- )
  #characters 0  DO  I .report-record  LOOP
  ;

\ --------------------------------------------------------------
\ Main

: charfreq  ( c-addr u  -- )
  \ Calculate character frequencies in a text file.
  \ The text encoding has to be one byte per character.
  \ c-addr u = file name
  open-input  count-characters .report  close-input
  ;

.(  charfreq ok!) CR
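
The report shows fractional frequencies using only integer arithmetic: the count is scaled by 1000000 with `*/`, and pictured numeric output places the decimal point. A standalone sketch of the same technique follows. The word `.ratio` is hypothetical and not part of the program, and cells of at least 32 bits are assumed:

```forth
DECIMAL

: .ratio  ( u1 u2 -- )
  \ Print u1/u2 as a fraction with six decimal places.
  \ u2 must not be zero.
  1000000 SWAP */ S>D    \ u1 * 1000000 / u2, with a double-cell intermediate
  <# # # # # # # [CHAR] . HOLD #S #> TYPE
  ;

25 200 .ratio  \ 25 * 1000000 / 200 = 125000, printed as 0.125000
```

Because `*/` keeps a double-cell intermediate product, the multiplication by 1000000 does not overflow for large counts as long as the final quotient fits in one cell.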


Downloads