[comp.lang.pascal] Unused chars

OEYO8722%TREARN.BITNET@uga.cc.uga.edu ( Hur AKDULGER) (09/06/90)
!> Date: Wed, 5 Sep 90 07:37:29 -0500
!> From: convex!graham@uxc.cso.uiuc.edu (Marv Graham)

!> In article <24386@adm.BRL.MIL> you write:
!> >and now we're converting the most repeated words (greater than 1)
!> >to unused chars. Unused characters are (K)(L)(M)(N)(O)(P) in the OURRING.

!> Suppose there are no unused characters?

!> Marv Graham; Convex Computer Corp.  {uunet,sun,uiucdcs,allegra}!convex!graham

I received private mail from Mr. M. Graham today.
It was private mail, but I'll answer his question here.

I don't think we can suppose that, because in my case it is impossible.

I am coding a DICTIONARY (English-Turkish, Turkish-English).
The English alphabet has 26 letters (upper + lower case = 52 chars),
and our (Turkish) alphabet has 29 letters (upper + lower case = 58 chars).
23 letters are the same in both alphabets.
26 - 23 = 3 (three English letters, Q, W and X, are not used in the Turkish alphabet).
29 - 23 = 6 (six of our letters are not used in the English alphabet).

Total = 2 * (23 + 6 + 3) = 64 characters (capital letters + small letters).
The special characters are "~.:,;-'12345678910!&?<>()" and the square brackets
(I can't type square brackets on my terminal).

New total = 64 + 25 = 89 characters.
An 8-bit character set has 255 codes besides code 0.
So the number of unused characters can be 255 - 89 = 166.
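
The letter counts above can be checked with a few set operations. This is only a sketch in Python (the post itself has no code); the Turkish alphabet string is typed in from the standard 29-letter alphabet, and the figures 25 and 255 are taken from the post as given, not recounted.

```python
# Lowercase alphabets as sets of characters.
ENGLISH = set("abcdefghijklmnopqrstuvwxyz")
TURKISH = set("abcçdefgğhıijklmnoöprsştuüvyz")  # assumption: standard 29 letters

common = ENGLISH & TURKISH        # letters shared by both alphabets
only_english = ENGLISH - TURKISH  # not used in Turkish
only_turkish = TURKISH - ENGLISH  # not used in English

# Upper and lower case of every distinct letter.
letters = 2 * len(ENGLISH | TURKISH)

SPECIALS = 25      # figure from the post
TOTAL_CODES = 255  # figure from the post

used = letters + SPECIALS
unused = TOTAL_CODES - used

print(len(common), sorted(only_english), len(only_turkish))
print(letters, used, unused)
```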

My algorithm is not useful in general.
I think the data file of the dictionary contains at most 89 different characters.
It's a special case.
I use it only in my DICTIONARY program.

--------- Now we're developing a new approach ----------------------
An algorithm that converts two bytes to one byte is better than one that
converts one word (string) to one byte.

Here is why, on the dictionary file (unused chars (A)(B)(C)(D)):

One word (string) to one byte algorithm (every string occurs only once, so nothing is replaced).

String             Count   New String         Result
----------------   -----   ----------------   ----------------------
HOP                  1     HOP                Unchanged (same size)
HOPE                 1     HOPE                        "
HOPEFUL              1     HOPEFUL                     "
HOPEFULLY            1     HOPEFULLY                   "
HOPEFULNESS          1     HOPEFULNESS                 "
HOPELESS             1     HOPELESS                    "
HOPELESSLY           1     HOPELESSLY                  "
HOPELESSNESS         1     HOPELESSNESS                "


Two bytes to one byte algorithm (each string is cut into 2-byte pairs from the left).

String             Count   Change
----------------   -----   ----------------
HO                    8    HO ---> A
PE                    7    PE ---> B
FU                    3    FU ---> C
LL                    1    no change
LN                    1       "
ES                    1       "
LE                    3    LE ---> D
SS                    4    SS ---> F
LY                    1    no change
NE                    1       "

String                New String
----------------      ----------------   ----------------------
HOP                    AP                3  bytes will be 2 bytes
HOPE                   AB                4    "    "    " 2   "
HOPEFUL                ABCL              7    "    "    " 4   "
HOPEFULLY              ABCLLY            9    "    "    " 6   "
HOPEFULNESS            ABCLNESS          11   "    "    " 8   "
HOPELESS               ABDF              8    "    "    " 4   "
HOPELESSLY             ABDFLY            10   "    "    " 6   "
HOPELESSNESS           ABDFNEF           12   "    "    " 7   "

Our header string is "HOAPEBFUCLEDSSF". (15 bytes)

size of input strings : 64
size of output strings + header : 39 + 15 = 54
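
The two-byte tables above can be reproduced with a short sketch. It is in Python only for brevity, and it assumes what the tables imply: each string is cut into 2-byte pairs from the left, a pair gets a code only if it occurs more than once, and codes are handed out in order of first appearance. The names split_pairs, build_table and encode are made up for this illustration.

```python
from collections import Counter

WORDS = ["HOP", "HOPE", "HOPEFUL", "HOPEFULLY", "HOPEFULNESS",
         "HOPELESS", "HOPELESSLY", "HOPELESSNESS"]
CODES = "ABCDF"  # code bytes used in the post's example


def split_pairs(word):
    """Cut a word into 2-byte pieces from the left; an odd last byte stays alone."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def build_table(words, codes):
    """Count full pairs and give a code to every pair seen more than once."""
    counts = Counter()
    for word in words:
        counts.update(p for p in split_pairs(word) if len(p) == 2)
    # Counter keeps first-seen order (Python 3.7+), so codes follow that order.
    repeated = [pair for pair, n in counts.items() if n > 1]
    if len(repeated) > len(codes):
        raise ValueError("not enough code bytes for all repeated pairs")
    return dict(zip(repeated, codes)), counts


def encode(word, table):
    """Replace each coded pair by its one-byte code; leave other pieces as they are."""
    return "".join(table.get(piece, piece) for piece in split_pairs(word))


table, counts = build_table(WORDS, CODES)
encoded = [encode(word, table) for word in WORDS]
header = "".join(pair + code for pair, code in table.items())

for word, short in zip(WORDS, encoded):
    print(f"{word:14s} {short:10s} {len(word):2d} -> {len(short)}")
print("header:", header, len(header), "bytes")
print("input:", sum(map(len, WORDS)),
      "output + header:", sum(map(len, encoded)) + len(header))
```

Using a fixed cut from the left keeps the sketch close to the tables; a real packer could also try overlapping pairs, which this sketch does not attempt.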

That's not a bad result, because every "HO" in our data file becomes "A".
The size of a header entry doesn't change (3 bytes: the pair and its code).
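
Going back from the packed strings to the originals only needs the header. A minimal sketch, assuming the header is a plain run of 3-byte entries (pair, then code) as in "HOAPEBFUCLEDSSF", and that a code byte never appears as ordinary data in the packed text (which is the reason for picking unused chars). parse_header and decode are illustrative names, and Python is used only for brevity.

```python
HEADER = "HOAPEBFUCLEDSSF"  # from the post: 2-byte pair + 1-byte code, repeated


def parse_header(header):
    """Turn the header into a {code: pair} lookup table."""
    if len(header) % 3 != 0:
        raise ValueError("header length must be a multiple of 3")
    return {header[i + 2]: header[i:i + 2] for i in range(0, len(header), 3)}


def decode(text, codes):
    """Expand every code byte to its pair; copy all other bytes unchanged."""
    return "".join(codes.get(ch, ch) for ch in text)


codes = parse_header(HEADER)
for packed in ["AP", "ABCLNESS", "ABDFNEF"]:
    print(packed, "->", decode(packed, codes))
```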

If we have 166 unused chars and use all of them,
our header string will be 166 * 3 = 498 bytes.
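
A rough way to see what the header costs: under the scheme in this post, replacing a pair that occurs n times saves n bytes in the data and adds 3 bytes to the header, so the net gain is n - 3. The Python sketch below applies that to the five coded pairs of the HOPE example, with counts recomputed from the eight words by cutting each one into pairs from the left. net_saving is a made-up helper name.

```python
ENTRY_SIZE = 3  # bytes per header entry: 2-byte pair + 1-byte code


def net_saving(count, entry_size=ENTRY_SIZE):
    """Bytes gained by coding a pair that occurs `count` times."""
    return count - entry_size


# Pair counts for the eight HOPE words (fixed 2-byte cut from the left).
COUNTS = {"HO": 8, "PE": 7, "FU": 3, "LE": 3, "SS": 4}

total = sum(net_saving(n) for n in COUNTS.values())
full_header = 166 * ENTRY_SIZE  # header size if all 166 unused chars are used

for pair, n in COUNTS.items():
    print(pair, n, net_saving(n))
print("total saving:", total, "full header:", full_header)
```

A pair that occurs three times or fewer gains nothing, and the total of 10 bytes agrees with 64 - 54.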


I'm trying a Huffman tree; it's a good method, but I can't code it...
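
For what it is worth, here is a minimal sketch of how a Huffman code table can be built, in Python instead of Pascal for brevity. It is the generic textbook construction (repeatedly merge the two least frequent groups), not the poster's program. huffman_codes, encode_bits and decode_bits are illustrative names, and the bits are kept as "0"/"1" text instead of being packed into bytes.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return {symbol: bit string} for the symbols in text."""
    freq = Counter(text)
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:  # only one distinct symbol: give it a 1-bit code
        return {sym: "0" for sym in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # two lightest groups
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode_bits(text, codes):
    """Concatenate the code of every symbol."""
    return "".join(codes[ch] for ch in text)


def decode_bits(bits, codes):
    """Read bits until they match a code; valid because no code is a prefix of another."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


word = "HOPELESSNESS"
codes = huffman_codes(word)
bits = encode_bits(word, codes)
print(codes)
print(len(bits), "bits instead of", 8 * len(word))
```

The code table (or the frequencies) still has to be stored with the data, the same way the 3-byte header entries are stored in the pair scheme.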



Hur AKDULGER...........