Spoofing Those Bitly Alpha-Numeric Keys in MySQL

2011/3/26 15:21
Recently I was tasked with making an internal "link shortener", a la Bit.ly's. For those that are unfamiliar with the concept, if you have an exceedingly long url that's difficult to remember i.e. http://us.rd.yahoo.com/dailynews/external/ap_av/av_ap3_wl/92a15245e85f7ca01a3a65b88ede5075/39997189;_ylt=AvuGi5zx_KMlvJdMHmFncMoUewgF;_ylu=X3oDMTFhanRoYjYwBHBvcwM5BHNlYwN5bl9yXzNzbG90X3ZpZGVvBHNsawN2aWQtZWQtbGluaw--/*http://news.yahoo.com/video/world-15749633/24062366 You can use a shortener to make it more manageable. Bitly compressed the example URL above to: http://bit.ly/g25djH much easier to remember, much easier to distribute.

So I wanted similar functionality. I want to be able to generate a short, unique key to represent a long url. Most relational databases already use some form of base-ten integers (i.e. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... so forth) for keys on their tables. This is pretty well established, there are either auto increment support or sequences depending on the database that will keep track of these values for you. I could use these keys for shortening, but in the offshoot chance this feature took off it wouldn't be too long before the integer numbers would get pretty long themselves, plus its generally advisable to mask the real table ids from the public eye.

So back to that url that bit.ly generated for me: http://bit.ly/g25djH. "g25djH" is obviously a key that translates to a row in a database somewhere that holds the original long url. Given that databases prefer numeric keys, what's to make of bit.ly's alpha numeric version? By using upper and lower case letters in their keys, they save a ton of space.

To wit: with a decimal representation of a number, each digit represents 10^x. If you add lowercase letters (a-z), each digit represents 36^x. Much larger values can be represented with fewer digits. So the standard ASCII character set provides for a couple hundred or so characters with which to construct an alpha numeric key, but the ten digits and twenty six lower-case letters are enough for my needs, and this example.

But still, how should you leverage a database to store these keys? You could make a varchar column a primary key and store these alpha-numeric values, but that wouldn't be practical as the dataset grew not to mention if you wanted to use this value as a foreign key elsewhere. You could have a standard int-type primary key and a second key for the alpha-numeric representation, this is closer to ideal but the extra column is wasted space.

The best solution, in my opinion, is to make a function that will convert the standard base 10 integer into a base 36 value. This is essentially the same process as converting a decimal to hexidecimal value, but with a whole slew of characters. Therefore you get your condensed alpha-numeric keys while only using the standard int-type primary key.

This is functionality I'd like the db to have so I can translate keys directly in the sql. To accomplish this I've written two functions, one to convert an int to an alpha numeric key, and the other to convert an alpha numeric key to an int:

DELIMITER $$
DROP FUNCTION IF EXISTS ConvertIntToKey$$
CREATE FUNCTION ConvertIntToKey(intval INT )
RETURNS VARCHAR(100)
DETERMINISTIC
BEGIN
        DECLARE i INT;
        DECLARE d INT;
        DECLARE r INT;
        DECLARE dMoinsR INT;
        DECLARE ch CHAR(1);
        DECLARE result VARCHAR(200);


        SET result = '';
        SET d = intval;

        REPEAT
        SET r = MOD(d, 36);
        SET ch = SUBSTR("0123456789abcdefghijklmnopqrstuvwxyz", (r+1), 1);
        SET result = CONCAT( ch, result);
        SET dMoinsR = d - r;
        SET d = (dMoinsR / 36);
        UNTIL (dMoinsR = 0)
        END REPEAT;

        RETURN result;
END$$
DELIMITER ;

DELIMITER $$
DROP FUNCTION IF EXISTS ConvertKeyToInt$$
CREATE FUNCTION ConvertKeyToInt( keyval VARCHAR(100) )
RETURNS INT
DETERMINISTIC
BEGIN
        DECLARE backwards VARCHAR(100);
        DECLARE keylen INT;
        DECLARE intval INT;
        DECLARE i INT;
        DECLARE rootval INT;



        SET backwards = REVERSE(keyval);
        SET keylen = CHAR_LENGTH(keyval);
        SET i = 0;
        SET intval = 0;
        SET rootval = 0;


        WHILE i < keylen DO
        SET rootval = POW(36, i) * (LOCATE(SUBSTR(backwards, (i+1), 1), "0123456789abcdefghijklmnopqrstuvwxyz") - 1);
        SET intval = intval + rootval;
        SET i = i + 1;
        END WHILE;

        RETURN intval;
END$$
DELIMITER ;

Because these are functions--not procedures--you can slot these right into a SQL query.

So the user types in http://your.bitly.clone/aba1

"aba1" is the key. Your SQL query would look like:

SELECT su.id, ConvertIntToKey(su.id), su.full_url
FROM stored_urls su
WHERE su.id = ConvertKeyToInt(“aba1”);

Also, credit where credit's due, I basically modified the hex to decimal conversion up on wikipedia: http://en.wikipedia.org/wiki/Hexadecimal#Converting_from_other_bases

update 2014/4/6: MySQL offers the conv function to convert numbers between bases. It supports base 2 through base 36.