Friday, August 03, 2007

Making Strings Out of Numbers in C

A while back I was writing a C program and found I needed to convert numbers (integers) into strings. The language has no built-in facility for this. While sprintf() can be used, it has some drawbacks. Sprintf() only works with binary, octal, or hexadecimal bases, and the calling function has to calculate how much space each number needs. In any case, it didn't fit the bill at the time.

So I wrote itoa.c, which does the job. The file also contains a test wrapper with a main() function which may be useful in its own right. It will convert a decimal number on the command line to its representation in any base from 2 to 62.

One feature you may like or dislike is itoa's use of malloc() to get space for the resulting string. Itoa figures out how much space to allocate, which makes for twice the work for it but may save a large hassle in the calling function. I think that's an example of robustness, since it works correctly for any base, but your mileage, etc.

How do I know it works correctly for any base from 2 to 62? By inductive proof. The length for a string whose value is less than its base is 1 plus a null char for C, so the length is 2. So suppose the algorithm works for some base k. Since the same work is being done* when actually filling in the string with ritoa(), we see that working for base k means it works for base k+1. The boundary at 62 also works, so the algorithm is correct.

As always with C, be sure to garbage collect with free() when you no longer need the string.


/*
* Itoa.c
* Copyright 2004, Heal Consulting
* Published under the General Public License. See it at http://gnu.org
*/
/*
A sleeker version is available at:
http://www.cs.princeton.edu/courses/archive/fall96/cs126/examples/itoa.c
*/

#include /* only needed for main, fprintf error*/

static char *ItoaDigits =
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/*
* ritoa
* recursive itoa
*/
long int
ritoa(long int val, long int topval, char *s, int base)
{
long int n = val / base;
if (n > 0)
topval = ritoa(n, topval, s+1, base);
else
*(s+1) = '\0';
*s = ItoaDigits[ topval % base ];
return(topval / base);
}

/*
* itoa
* - mallocs a string of the right length
* - calls ritoa to fill in the string for a given base
*/
char * itoa(long int val, int base)
{
int len;
char *s,*buf;
long int t = val;
for (len=2; t; t /= base) len++ ; /* quickie log_base */
/* printf("len; %d\n", len); */

if((buf = (char *) malloc(len)) == NULL)
{fprintf(stderr,"out of memory in itoa\n"); exit(1);}
s = buf;
if (val < 0)
{
*s++ = '-';
val = -val;
};
len = (int) ritoa(val, val, s, base);
return(buf);
}

/*
* The test program may actually be useful.
* Converts a decimal number (long int) to a base you supply.
*/
int
main(int argc, char *argv[])
{
long int num = 0;
char *s;
int base = 10;
int maxlen = strlen(ItoaDigits);

if (argc >2) { base = atoi(argv[2]); };
if (argc >1) { num = atol(argv[1]); };
if ((argc <= 1) || (base > maxlen ) || (base < 2))
{
printf("Usage:\n\n%s [number [base]]\n", argv[0]);
printf("where number is in decimal and 2 <= base <= %d\n", maxlen);
exit(1);
}
s = itoa(num, base);
puts(s);
return(0);
}


After I wrote it, I found a slightly sleeker (base 2 through 16 only) version from an old Princeton CS class*.


* No, I didn't go to Princeton. That's why I waved my hands in the proof.

4 comments:

lucas said...

Hi, nice post! Your code doesn't handle well the LONG_MIN of "limits.h" I think you should use an unsigned to handle negative numbers. This happens because negative numbers reach 2^(sizeoflong-1) and positive numbers reach 2^(sizeoflong-1)-1
the 0 makes this asymmetric, so when you do
if (num < 0)
num = -num
if num is equal to LONG_MIN you have an overflow.

Loren Heal said...

You are correct that there is a boundary bug there.

But I don't see how to use an unsigned (u_long) to fix the problem without resorting to special case code. Perhaps you can show me the obvious.

I've been toying with a generalization of this algorithm to convert between arbitrary bases. If I do, I'll include the required LONG_MIN fix.

lucas said...

Hi I think that the solution would be to:

- add an unsigned var in the itoa function
that becomes val or -val depending if val is positive or negative and pass the unsiged var to ritoa instead of val.

- and the rtoa declaration should be:
long int
ritoa(unsigned long int val, unsigned long int topval, char *s, int base)

Doing this change works fine for LONG_MIN, I did not test it much though.

Loren Heal said...

Yes, I see now. I'll work on it and post and update.