Function for creating unique strings for urls

I wrote this function to create unique URLs for my site. Using these URLs, visitors could demo site functionality without having to create a login. The function is intended to return a unique string of characters that can be added to a URL like this:



http://www.mysite.com?temp_user=dd5vbrk4ax4v4o09l2bm



My function had the following criteria:



  1. Every call must generate a unique string of alpha numeric characters.

  2. Most important: Zero chance of string duplication.

  3. It should be 20 characters long (secure, but not awkwardly long.)

  4. Repeat calls should reflect a high degree of dissimilarity.

  5. Character usage should appear as random as possible.

Here is my code



    function url_code()
    {
        $letters = "1234567890abcdefghijklmnopqrstuvwxyz";
        $microtime = microtime();
        $time_array = explode(' ', substr($microtime,2));
        $code = '';

        $time_array[0] = substr($time_array[0],0,6);
        $time_array[0] .= strrev($time_array[0]);
        $time_array[2] = strrev($time_array[1]);

        for($i = 0; $i <10; $i++)
        {
            $num = ((int)$time_array[1][$i]);
            $num += ((int)$time_array[2][$i])*10;
            $num += ((int)$time_array[0][$i])*100;

            $mult = (int)($num / 36);
            $mod = $num - ($mult * 36);

            $code .= $letters[$mult].$letters[$mod];
        }

        return $code;
    }




Please let me know if there are any simple improvements I could make or if I missed any serious problems with string repetition.



One interesting note on character randomness: currently, the characters generated by the $mult variable (the first, third, fifth, and other odd-numbered characters in the code string) will never be t, u, v, w, x, y, or z. How can I fix this so that [t-z] are also used?



Possible fix (all characters will be used. However, [8-9a-s] are twice as likely to appear as the other characters, [0-7t-z]):



    $mult = (int)($num / 36);
    $mod = $num - ($mult * 36);
    if($mult < 9 && mt_rand(0,1)) $mult += 27;

    $code .= $letters[$mult].$letters[$mod];


Note, Scalability issue found: If this function is run on more than one server, there is a tiny chance that two users could send a request at exactly the same time and get a duplicated code.







      asked 11 hours ago by Hoytman (edited 9 hours ago)



          3 Answers


          Algorithm




          1. Every call must generate a unique string of alpha numeric characters.

          2. Most important: Zero chance of string duplication.



          "zero chance" is really hard to achieve, if it's even possible.
          Your algorithm is essentially based on the current time in microseconds.
          Since systems may adjust their clock, for example with NTP,
          the same microsecond may happen again,
          resulting in a non-unique string.



          Striving for a unique random string is a known challenge,
          see for example the uniqid function.
          The documentation offers some recommendations for cryptographic security,
          and I think it might be a good idea to incorporate uniqid into your algorithm.




          4. Repeat calls should reflect a high degree of dissimilarity.



          If I call the function twice in a row I get values with a similar prefix, for example:




          qf1p1dojax7n7m08no1f
          qf1p1drbj0nbnahjqg1f

          gp70pxdfghnbnaetckpa
          gp70pxg8j0q4q3hlfcpa



          This is because the microseconds parts are too similar in consecutive calls.
          You seem to be using the microseconds part as a 6-digit random number.
          You would get better results with a proper pseudo-random generator,
          for example one of the recommendations from the uniqid documentation.
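
          As a rough sketch of that suggestion: the uniqid documentation points to random_int() and random_bytes() for cryptographically secure values. The helper name random_url_code() and the alphabet below are assumptions made for illustration, not part of the original code. Note that random output makes duplicates extremely unlikely, not impossible.

```php
<?php
// Hypothetical helper: builds a code from a cryptographically secure source.
function random_url_code(int $length = 20): string
{
    $letters = '0123456789abcdefghijklmnopqrstuvwxyz'; // assumed alphabet
    $max = strlen($letters) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() draws uniformly from the inclusive range [0, $max]
        $code .= $letters[random_int(0, $max)];
    }
    return $code;
}
```

          Consecutive calls share no prefix, because no part of the output depends on the clock.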



          Readability



          The implemented algorithm is quite straightforward:



          • Prepare 3 strings of at least 10 digits:

            • a 12-digit string from the first 6 decimal digits of the microseconds elapsed since the current timestamp seconds, concatenated with its reverse

            • a 10-digit string from the current timestamp

            • a 10-digit string from the reverse of the current timestamp


          • Count from 0 until 9 to generate a 20-letter string:

            • Create a 3-digit number using i-th positions from the prepared strings

            • Encode the 3-digit number as a base-36 number, so that it becomes a 2-letter string

            • Concatenate the 2-letter encoded strings to get a 20-letter string


          Unfortunately this flow of logic is not easy to see in the implementation,
          because the variable names are not helping.



          • Instead of the array $time_array,
            it would be better to use 3 variables with descriptive names.


          • If you extract the big logical steps to functions, for example one to generate a 3-digit number, and another to encode a 3-digit number in base-36, the code could read almost like a story.
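
          The two suggestions above could look something like the following sketch. It keeps the original algorithm and alphabet, but the function names (three_digit_number, encode_pair, url_code_from) are made up for illustration. Passing the microtime() string in as a parameter is an extra assumption that makes the logic easy to check with a fixed input.

```php
<?php
// Combine the i-th digits of the three prepared strings into a number 0..999.
function three_digit_number(string $seconds, string $secondsReversed, string $micros, int $i): int
{
    return (int)$seconds[$i]
        + (int)$secondsReversed[$i] * 10
        + (int)$micros[$i] * 100;
}

// Encode a number as two characters of the given alphabet.
function encode_pair(int $num, string $letters): string
{
    $base = strlen($letters);
    return $letters[intdiv($num, $base)] . $letters[$num % $base];
}

// $microtime has the shape returned by microtime(): "0.12345600 1533456789"
function url_code_from(string $microtime): string
{
    $letters = '1234567890abcdefghijklmnopqrstuvwxyz';

    [$fraction, $seconds] = explode(' ', substr($microtime, 2));
    $micros = substr($fraction, 0, 6);
    $micros .= strrev($micros);           // 12 digits
    $secondsReversed = strrev($seconds);  // 10 digits

    $code = '';
    for ($i = 0; $i < 10; $i++) {
        $num = three_digit_number($seconds, $secondsReversed, $micros, $i);
        $code .= encode_pair($num, $letters);
    }
    return $code;
}

function url_code(): string
{
    return url_code_from(microtime());
}
```

          This version still has all the uniqueness and similarity problems discussed above; it only changes how the steps read.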


          Technique



          Instead of $mod = $num - ($mult * 36) you could simply use the modulo operator: $mod = $num % 36.



          Instead of hard-coding 36, which has to be the length of $letters,
          it would be better to store it in a variable, and derive its value from the length of $letters.
          That way, if one day you change the alphabet (for example, by adding capital letters),
          you won't have to remember to replace every 36 with the new value:
          the program will "just work".
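
          A small check of both points, as a sketch: the loop below compares the original arithmetic with intdiv() and the % operator, using a base derived from the alphabet, for every value the loop in url_code() can produce (0 to 999). The variable names $base and $mismatches are illustrative.

```php
<?php
$letters = '1234567890abcdefghijklmnopqrstuvwxyz';
$base = strlen($letters); // 36 for this alphabet

// Count inputs where the two formulations disagree.
$mismatches = 0;
for ($num = 0; $num <= 999; $num++) {
    // Original arithmetic with the hard-coded 36
    $multOld = (int)($num / 36);
    $modOld  = $num - ($multOld * 36);

    // Suggested arithmetic with the derived base
    $mult = intdiv($num, $base);
    $mod  = $num % $base;

    if ($mult !== $multOld || $mod !== $modOld) {
        $mismatches++;
    }
}
```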



          Instead of implementing base-36 encoding yourself,
          you could use the existing base_convert function.
          It's not 100% the same, because it uses a slightly different alphabet (0-9 followed by a-z),
          and you would need to convert the number parameter to a string.
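
          For illustration, a minimal sketch of that approach. The helper name encode_pair_base36() is an assumption; the padding step is needed because base_convert() does not emit leading zeros, so values below 36 would otherwise yield a single character.

```php
<?php
// Encode a number 0..1295 as exactly two base-36 characters (0-9, a-z).
function encode_pair_base36(int $num): string
{
    // base_convert() takes and returns strings
    $encoded = base_convert((string)$num, 10, 36);
    return str_pad($encoded, 2, '0', STR_PAD_LEFT);
}
```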






          • This is a nitpicky comment, but I'm making it to clear up my (mis)understanding... I believe NTP won't repeat a millisecond time. Instead, I think it slowly adjusts the computer clock toward the "correct" time, so that the values are always monotonic. There's also the monotonic clock, which avoids time zone changes. As a side note, you would have to ensure that your generator function takes longer than the granularity of the clock. Please correct me if I am wrong.
            – sudo rm -rf slash
            7 hours ago











          • @sudorm-rfslash I think that's a very valuable comment, thank you!
            – janos
            7 hours ago

















          I would rather use the built-in md5 function, which should be faster.
          If the result is too long, the string could be shortened.
          In the end, there is always some probability that the same string is generated twice, so the longer the string, the lower that probability.






          • I considered md5, but I wanted zero chance that there could be duplicate strings.
            – Hoytman
            10 hours ago










          • I just updated the question to contain this criterion.
            – Hoytman
            10 hours ago










          • Could you not generate the md5, then check the site if the string was allocated before? You must store this somewhere...
            – Dean Meehan
            53 mins ago
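
          A minimal sketch of the generate-then-check idea from the comment above. The $allocated array is a hypothetical stand-in for wherever the site stores issued codes, and allocate_code() is an illustrative name. With a real database, a unique constraint on the column would still be needed, since two requests could pass the check at the same moment.

```php
<?php
// $allocated stands in for the site's storage of issued codes (hypothetical).
function allocate_code(array &$allocated, callable $generate): string
{
    do {
        $code = $generate();
    } while (isset($allocated[$code])); // retry on the rare collision

    $allocated[$code] = true;
    return $code;
}

$allocated = [];
$code = allocate_code($allocated, function (): string {
    // md5 of random bytes, shortened to 20 hexadecimal characters
    return substr(md5(random_bytes(16)), 0, 20);
});
```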

















          Don't reinvent the wheel. PHP already has a uniqid() function that does this. Like your function, uniqid() is just a way of encoding the time, with no randomness involved: the results will look dissimilar, but they are not cryptographically unguessable. If you are running the application across multiple servers, it would probably be a good idea to call it with a server-specific prefix; otherwise collisions would be possible. You might also want to incorporate the process ID somewhere, if you have multiple instances of the application that can run simultaneously on one server.
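
          As a sketch of that suggestion, assuming a per-server label is available (the $serverId parameter and the function name server_unique_id() are hypothetical):

```php
<?php
// $serverId is a hypothetical per-server label, e.g. read from configuration.
function server_unique_id(string $serverId): string
{
    // Prefix with the server label and the process ID, then let uniqid()
    // append its time-based part (13 hexadecimal characters by default).
    $prefix = $serverId . getmypid();
    return uniqid($prefix);
}
```

          The result is longer than 20 characters for most prefixes, and it remains guessable, since every part of it is derived from the server, the process, and the clock.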



          If you truly want the string to be unique, then follow the wisdom of the standards-makers, and use a UUID. PHP does not have a UUID function, but Andrew Moore has written a recipe for generating standard UUIDs in the comments. (Unfortunately, standard UUIDs would be 36 characters long.)
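
          For reference, one common way to build a version-4 (random) UUID from random_bytes() is sketched below. This is not necessarily the recipe mentioned above, and uuid_v4() is an illustrative name. A random UUID makes collisions extremely unlikely rather than impossible.

```php
<?php
// Generate a version-4 UUID string such as xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx.
function uuid_v4(): string
{
    $bytes = random_bytes(16);

    // Set the version field (high nibble of byte 6) to 4
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the variant field (top two bits of byte 8) to binary 10
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // 32 hex characters, split into 8 groups of 4, joined as 8-4-4-4-12
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
```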






          • I like the idea of including getmypid()
            – Hoytman
            3 hours ago










          Your Answer




          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "196"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: false,
          noModals: false,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );








           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200962%2ffunction-for-creating-unique-strings-for-urls%23new-answer', 'question_page');

          );

          Post as a guest






























          3 Answers
          3






          active

          oldest

          votes








          3 Answers
          3






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          4
          down vote













          Algorithm




          1. Every call must generate a unique string of alpha numeric characters.

          2. Most important: Zero chance of string duplication.



          "zero chance" is really hard to achieve, if it's even possible.
          Your algorithm is essentially based on the current time in microseconds.
          Since systems may adjust their clock, for example with NTP,
          the same microsecond may happen again,
          resulting in a non-unique string.



          Striving for a unique random string is known challenge,
          see for example the uniqid function.
          The documentation offers some recommendations for cryptographic security,
          and I think it might be a good idea to incorporate uniqid into your algorithm.




          1. Repeat calls should reflect a high degree of dissimilarity.



          If I call the function twice in a row I get values with a similar prefix, for example:




          qf1p1dojax7n7m08no1f
          qf1p1drbj0nbnahjqg1f

          gp70pxdfghnbnaetckpa
          gp70pxg8j0q4q3hlfcpa



          This is because in the microseconds parts are too similar in consecutive calls.
          You seem to be using the microseconds part as a 6-digit random number.
          You would get better results with a proper pseudo random generator,
          for example one of the recommendations from uniqid.



          Readability



          The implemented algorithm is quite straightforward:



          • Prepare 3 strings of at least 10-digits:

            • a 12-digit string from the first 6 decimal digits of the microseconds elapsed since the current timestamp seconds, concatenated with its reverse

            • a 10-digit string from the current timestamp

            • a 10-digit string from the reverse of the current timestamp


          • Count from 0 until 9 to generate a 20-letter string:

            • Create a 3-digit number using i-th positions from the prepared strings

            • Encode the 3-digit number as a base-36 number, so that it becomes a 2-letter string

            • Concatenate the 2-letter encoded strings to get a 20-letter string


          Unfortunately this flow of logic is not easy to see in the implementation,
          because the variable names are not helping.



          • Instead of the array $time_array,
            it would be better to use 3 variables with descriptive names.


          • If you extract the big logical steps to functions, for example one to generate a 3-digit number, and another to encode a 3-digit number in base-36, the code could read almost like a story.


          Technique



          Instead of $mod = $num - ($mult * 36) you could simply use the modulo operator: $mod = $num % 36.



          Instead of hard-coding 36, which has to be the length of $letters,
          it would be better to store it in a variable, and derive its value from the length of $letters.
          So that if one day you change the alphabet (for example add capital letters too),
          then you won't have to remember to replace all the 36 with the new value,
          the program will "just work".



          Instead of implementing base-36 encoding yourself,
          you could use the existing base_convert function.
          It's not 100% the same, because it uses slightly different alphabet,
          and you would need to convert the number parameter to string.






          share|improve this answer

















          • 1




            This is a nitpicky comment but I'm making it to clear up my (mis)understanding... I believe NTP won't repeat a millisecond time. Instead, I think it slowly adjusts to computer clock to the "correct" time such that the values are always monotonic. There's also monatomic clock to avoid time zone changes. As a side note, you would have to ensure that your generator function takes longer than the granularity of clock. Please correct me if I am wrong
            – sudo rm -rf slash
            7 hours ago











          • @sudorm-rfslash I think that's a very valuable comment, thank you!
            – janos
            7 hours ago














          up vote
          4
          down vote













          Algorithm




          1. Every call must generate a unique string of alpha numeric characters.

          2. Most important: Zero chance of string duplication.



          "zero chance" is really hard to achieve, if it's even possible.
          Your algorithm is essentially based on the current time in microseconds.
          Since systems may adjust their clock, for example with NTP,
          the same microsecond may happen again,
          resulting in a non-unique string.



          Striving for a unique random string is known challenge,
          see for example the uniqid function.
          The documentation offers some recommendations for cryptographic security,
          and I think it might be a good idea to incorporate uniqid into your algorithm.




          1. Repeat calls should reflect a high degree of dissimilarity.



          If I call the function twice in a row I get values with a similar prefix, for example:




          qf1p1dojax7n7m08no1f
          qf1p1drbj0nbnahjqg1f

          gp70pxdfghnbnaetckpa
          gp70pxg8j0q4q3hlfcpa



          This is because in the microseconds parts are too similar in consecutive calls.
          You seem to be using the microseconds part as a 6-digit random number.
          You would get better results with a proper pseudo random generator,
          for example one of the recommendations from uniqid.



          Readability



          The implemented algorithm is quite straightforward:



          • Prepare 3 strings of at least 10-digits:

            • a 12-digit string from the first 6 decimal digits of the microseconds elapsed since the current timestamp seconds, concatenated with its reverse

            • a 10-digit string from the current timestamp

            • a 10-digit string from the reverse of the current timestamp


          • Count from 0 until 9 to generate a 20-letter string:

            • Create a 3-digit number using i-th positions from the prepared strings

            • Encode the 3-digit number as a base-36 number, so that it becomes a 2-letter string

            • Concatenate the 2-letter encoded strings to get a 20-letter string


          Unfortunately this flow of logic is not easy to see in the implementation,
          because the variable names are not helping.



          • Instead of the array $time_array,
            it would be better to use 3 variables with descriptive names.


          • If you extract the big logical steps to functions, for example one to generate a 3-digit number, and another to encode a 3-digit number in base-36, the code could read almost like a story.


          Technique



          Instead of $mod = $num - ($mult * 36) you could simply use the modulo operator: $mod = $num % 36.



          Instead of hard-coding 36, which has to be the length of $letters,
          it would be better to store it in a variable, and derive its value from the length of $letters.
          So that if one day you change the alphabet (for example add capital letters too),
          then you won't have to remember to replace all the 36 with the new value,
          the program will "just work".



          Instead of implementing base-36 encoding yourself,
          you could use the existing base_convert function.
          It's not 100% the same, because it uses slightly different alphabet,
          and you would need to convert the number parameter to string.






          share|improve this answer

















          • 1




            This is a nitpicky comment but I'm making it to clear up my (mis)understanding... I believe NTP won't repeat a millisecond time. Instead, I think it slowly adjusts to computer clock to the "correct" time such that the values are always monotonic. There's also monatomic clock to avoid time zone changes. As a side note, you would have to ensure that your generator function takes longer than the granularity of clock. Please correct me if I am wrong
            – sudo rm -rf slash
            7 hours ago











          • @sudorm-rfslash I think that's a very valuable comment, thank you!
            – janos
            7 hours ago












          up vote
          4
          down vote










          up vote
          4
          down vote









          Algorithm




          1. Every call must generate a unique string of alpha numeric characters.

          2. Most important: Zero chance of string duplication.



          "zero chance" is really hard to achieve, if it's even possible.
          Your algorithm is essentially based on the current time in microseconds.
          Since systems may adjust their clock, for example with NTP,
          the same microsecond may happen again,
          resulting in a non-unique string.



          Striving for a unique random string is known challenge,
          see for example the uniqid function.
          The documentation offers some recommendations for cryptographic security,
          and I think it might be a good idea to incorporate uniqid into your algorithm.




          1. Repeat calls should reflect a high degree of dissimilarity.



          If I call the function twice in a row I get values with a similar prefix, for example:




          qf1p1dojax7n7m08no1f
          qf1p1drbj0nbnahjqg1f

          gp70pxdfghnbnaetckpa
          gp70pxg8j0q4q3hlfcpa



          This is because in the microseconds parts are too similar in consecutive calls.
          You seem to be using the microseconds part as a 6-digit random number.
          You would get better results with a proper pseudo random generator,
          for example one of the recommendations from uniqid.



          Readability



          The implemented algorithm is quite straightforward:



          • Prepare 3 strings of at least 10-digits:

            • a 12-digit string from the first 6 decimal digits of the microseconds elapsed since the current timestamp seconds, concatenated with its reverse

            • a 10-digit string from the current timestamp

            • a 10-digit string from the reverse of the current timestamp


          • Count from 0 until 9 to generate a 20-letter string:

            • Create a 3-digit number using i-th positions from the prepared strings

            • Encode the 3-digit number as a base-36 number, so that it becomes a 2-letter string

            • Concatenate the 2-letter encoded strings to get a 20-letter string


          Unfortunately this flow of logic is not easy to see in the implementation,
          because the variable names are not helping.



          • Instead of the array $time_array,
            it would be better to use 3 variables with descriptive names.


          • If you extract the big logical steps to functions, for example one to generate a 3-digit number, and another to encode a 3-digit number in base-36, the code could read almost like a story.


          Technique



          Instead of $mod = $num - ($mult * 36) you could simply use the modulo operator: $mod = $num % 36.



          Instead of hard-coding 36, which has to be the length of $letters,
          it would be better to store it in a variable, and derive its value from the length of $letters.
          So that if one day you change the alphabet (for example add capital letters too),
          then you won't have to remember to replace all the 36 with the new value,
          the program will "just work".



          Instead of implementing base-36 encoding yourself,
          you could use the existing base_convert function.
          It's not 100% the same, because it uses slightly different alphabet,
          and you would need to convert the number parameter to string.






          share|improve this answer













          Algorithm




          1. Every call must generate a unique string of alpha numeric characters.

          2. Most important: Zero chance of string duplication.



          "zero chance" is really hard to achieve, if it's even possible.
          Your algorithm is essentially based on the current time in microseconds.
          Since systems may adjust their clock, for example with NTP,
          the same microsecond may happen again,
          resulting in a non-unique string.



          Striving for a unique random string is known challenge,
          see for example the uniqid function.
          The documentation offers some recommendations for cryptographic security,
          and I think it might be a good idea to incorporate uniqid into your algorithm.




          1. Repeat calls should reflect a high degree of dissimilarity.



          If I call the function twice in a row I get values with a similar prefix, for example:




          qf1p1dojax7n7m08no1f
          qf1p1drbj0nbnahjqg1f

          gp70pxdfghnbnaetckpa
          gp70pxg8j0q4q3hlfcpa



          This is because the microseconds parts are too similar in consecutive calls.
          You seem to be using the microseconds part as a 6-digit random number.
          You would get better results with a proper pseudo-random generator,
          for example one of the recommendations from the uniqid documentation.
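A minimal sketch of that suggestion, assuming PHP 7 or later: each character is drawn independently with random_int(), one of the functions the uniqid documentation points to for cryptographically secure values. The function name random_url_code and the order of the alphabet are illustrative choices, not part of the original code. Randomness makes duplicates extremely unlikely (there are 36^20, roughly 10^31, possible strings), but it does not make them impossible.

```php
<?php
// Hypothetical helper: returns $length characters drawn uniformly
// from a 36-symbol alphabet using a cryptographically secure source.
function random_url_code(int $length = 20): string
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyz';
    $maxIndex = strlen($alphabet) - 1;

    $code = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() is uniform over the inclusive range [0, $maxIndex]
        $code .= $alphabet[random_int(0, $maxIndex)];
    }
    return $code;
}
```

Because every position is independent of the clock, consecutive calls no longer share a prefix, and every character of the alphabet is equally likely in every position.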



          Readability



          The implemented algorithm is quite straightforward:



          • Prepare 3 strings of at least 10 digits:

            • a 12-digit string from the first 6 decimal digits of the fractional (microseconds) part of the current time, concatenated with its reverse

            • a 10-digit string from the current timestamp in seconds

            • a 10-digit string from the reverse of the current timestamp


          • Count from 0 to 9 to generate a 20-letter string:

            • Create a 3-digit number using the i-th positions of the prepared strings

            • Encode the 3-digit number as a base-36 number, so that it becomes a 2-letter string

            • Concatenate the 2-letter encoded strings to get a 20-letter string


          Unfortunately this flow of logic is not easy to see in the implementation,
          because the variable names are not helping.



          • Instead of the array $time_array,
            it would be better to use 3 variables with descriptive names.


          • If you extract the big logical steps to functions, for example one to generate a 3-digit number, and another to encode a 3-digit number in base-36, the code could read almost like a story.
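The two suggestions above could be sketched as follows. This is one possible refactoring, not the only one: the helper names three_digit_number and encode_pair and the variable names are invented for illustration, and the output is intended to match the original algorithm (it needs PHP 7 for intdiv and the type declarations).

```php
<?php
// Hypothetical helper: combines the i-th digits of the three prepared
// strings into one number between 0 and 999.
function three_digit_number(string $micros, string $seconds, string $reversedSeconds, int $i): int
{
    return (int)$seconds[$i]
        + (int)$reversedSeconds[$i] * 10
        + (int)$micros[$i] * 100;
}

// Hypothetical helper: encodes a number below 1000 as two letters.
function encode_pair(int $number): string
{
    $letters = '1234567890abcdefghijklmnopqrstuvwxyz';
    $base = strlen($letters);
    return $letters[intdiv($number, $base)] . $letters[$number % $base];
}

function url_code(): string
{
    // microtime() returns a string such as "0.12345600 1533373622"
    list($fraction, $seconds) = explode(' ', microtime());

    $micros = substr($fraction, 2, 6);   // first 6 decimal digits
    $micros .= strrev($micros);          // 12 digits in total
    $reversedSeconds = strrev($seconds);

    $code = '';
    for ($i = 0; $i < 10; $i++) {
        $code .= encode_pair(three_digit_number($micros, $seconds, $reversedSeconds, $i));
    }
    return $code;
}
```

With the steps named, the loop in url_code() reads much like the bullet list above, and each helper can be tested on its own.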


          Technique



          Instead of $mod = $num - ($mult * 36) you could simply use the modulo operator: $mod = $num % 36.



          Instead of hard-coding 36, which has to be the length of $letters,
          it would be better to store it in a variable and derive its value from the length of $letters.
          Then, if one day you change the alphabet (for example to add capital letters),
          you won't have to remember to replace every 36 with the new value;
          the program will "just work".
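A small sketch covering both points, assuming the $letters string from the question: the base is derived with strlen(), and the loop checks that the modulo operator gives the same result as the original subtraction for every possible 3-digit input. The variable names $base, $modOriginal, $modOperator and $mismatches are illustrative.

```php
<?php
$letters = '1234567890abcdefghijklmnopqrstuvwxyz';
$base = strlen($letters);   // 36 for this alphabet; follows any change to $letters

$mismatches = 0;
for ($num = 0; $num <= 999; $num++) {
    $mult = (int)($num / $base);
    $modOriginal = $num - ($mult * $base);   // arithmetic used in the question
    $modOperator = $num % $base;             // modulo operator
    if ($modOriginal !== $modOperator) {
        $mismatches++;
    }
}
echo $mismatches, "\n";   // prints 0
```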



          Instead of implementing base-36 encoding yourself,
          you could use the existing base_convert function.
          It's not 100% the same, because it uses a slightly different alphabet,
          and you would need to convert the number parameter to a string.
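A minimal sketch of that replacement. The helper name encode_pair_base36 is hypothetical. Two differences from the original code are worth noting: base_convert uses the alphabet 0-9a-z (the question's $letters starts with 1 and puts 0 after 9), and it does not pad its result, so str_pad is added here to keep every pair two characters wide.

```php
<?php
// Hypothetical helper: encodes a number below 1296 (36 * 36)
// as exactly two base-36 characters.
function encode_pair_base36(int $number): string
{
    // base_convert() expects the number as a string
    $encoded = base_convert((string)$number, 10, 36);
    return str_pad($encoded, 2, '0', STR_PAD_LEFT);
}

echo encode_pair_base36(999), "\n";   // prints rr
echo encode_pair_base36(35), "\n";    // prints 0z
```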


















          answered 9 hours ago









          janos








          • This is a nitpicky comment, but I'm making it to clear up my (mis)understanding... I believe NTP won't repeat a millisecond time. Instead, I think it slowly adjusts the computer clock to the "correct" time such that the values are always monotonic. There's also the monotonic clock, which avoids time zone changes. As a side note, you would have to ensure that your generator function takes longer than the granularity of the clock. Please correct me if I am wrong.
            – sudo rm -rf slash
            7 hours ago











          • @sudorm-rfslash I think that's a very valuable comment, thank you!
            – janos
            7 hours ago





































          I would rather use the built-in md5 function, which should be faster.
          If the result is too long, the string could be shortened.
          In the end, there is always some probability that the same string will be generated twice, so the longer the string, the lower that probability.
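A rough sketch of this idea, under assumptions the answer does not state: the input to md5 here is 16 random bytes from random_bytes() (PHP 7 or later), and the function name md5_url_code is invented. The result contains only hexadecimal characters, and truncating the hash raises the collision probability compared with the full 32 characters.

```php
<?php
// Hypothetical helper: hashes random input and keeps the first $length
// hexadecimal characters of the 32-character md5 digest.
function md5_url_code(int $length = 20): string
{
    return substr(md5(random_bytes(16)), 0, $length);
}
```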


















          answered 10 hours ago









          user9181232












          • I considered md5, but I wanted zero chance that there could be duplicate strings.
            – Hoytman
            10 hours ago










          • I just updated the question to contain this criterion.
            – Hoytman
            10 hours ago










          • Could you not generate the md5, then check the site if the string was allocated before? You must store this somewhere...
            – Dean Meehan
            53 mins ago
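The check suggested in the last comment could look like the sketch below. An in-memory array stands in for whatever storage the site actually uses, and the names allocate_unique_code, $allocated and $store are hypothetical. With a real database, a unique index on the column plus a retry on insert failure would be needed to avoid a race between the check and the write.

```php
<?php
// Hypothetical helper: keeps generating candidates until one is found
// that has not been allocated before, then records it.
function allocate_unique_code(array &$allocated, callable $generate): string
{
    do {
        $code = $generate();
    } while (isset($allocated[$code]));

    $allocated[$code] = true;
    return $code;
}

// Usage: 10 random bytes become a 20-character hexadecimal string.
$store = [];
$first = allocate_unique_code($store, function () {
    return bin2hex(random_bytes(10));
});
```

Uniqueness then comes from the stored record of issued codes rather than from the generator, so any reasonably random generator will do.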







































          Don't go about reinventing the wheel. PHP already has a uniqid() function that does that. Like your function, uniqid() is just a way of encoding the time, with no randomness involved: the results will look dissimilar, but they are not cryptographically unguessable. It would probably be a good idea to call it with a server-specific prefix if you are running the application across multiple servers; otherwise collisions would be possible. You might also want to incorporate the process ID somewhere, if you have multiple instances of the application that can run simultaneously on one server.
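A small sketch of that combination. The function name prefixed_uniqid and the server identifier "web1" are made up for illustration; uniqid() and getmypid() are the built-in functions. The result is longer than 20 characters and is still derived from the clock, so it should not be treated as unguessable.

```php
<?php
// Hypothetical helper: prefixes uniqid() with a per-server identifier
// and the current process ID.
function prefixed_uniqid(string $serverId): string
{
    $prefix = $serverId . '-' . getmypid() . '-';
    // Without more_entropy, uniqid() appends 13 characters to the prefix.
    return uniqid($prefix);
}

$id = prefixed_uniqid('web1');
```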



          If you truly want the string to be unique, then follow the wisdom of the standards-makers, and use a UUID. PHP does not have a UUID function, but Andrew Moore has written a recipe for generating standard UUIDs in the comments. (Unfortunately, standard UUIDs would be 36 characters long.)
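For reference, a version 4 (random) UUID can be sketched in a few lines, assuming PHP 7 or later for random_bytes(). This is a generic illustration, not the recipe referred to above, and the function name uuid_v4 is hypothetical.

```php
<?php
// Hypothetical helper: builds a random (version 4) UUID string.
function uuid_v4(): string
{
    $bytes = random_bytes(16);

    // Set the version field to 4 (high nibble of byte 6).
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the variant field to binary 10xx (top bits of byte 8).
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // 32 hex characters split into 8 groups of 4, joined as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
```

The result is 36 characters long including the four hyphens, which matches the length noted above.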


















          answered 5 hours ago









          200_success












          • I like the idea of including getmypid()
            – Hoytman
            3 hours ago




























           
