Working With Number – Infinity Multiplication – Beyond Memory – Ruby

In the previous chapter of “Infinity Multiplication”, we had discussed and demonstrated a productional version of InfiX that is capable of multiplying two large strings of digits together. However, the strings of digits were limited by memory. In a normal usage environment, the majority of us would not go beyond multiplying numbers that are beyond what memory could store. Take into account that it only require around 500 Mb of memory to multiply two strings that contained 100 millions digits per string.

That is for normal usage, but what if we want to multiply numbers that contained more than one billion digits together? What if we want to multiply the largest number the computer can ever multiply. To be able to multiply numbers that so large and is beyond the capability of what memory can handle, we first must find a solution to our formula so that our formula does not bound to memory limitation. This is chapter four of the “Infinity Multiplication” subject with the name “Beyond Memory”. This is the Ruby version.

Take note that the source code at the end of this article is a prototype. The program itself is capable of producing results for multiplying very large numbers together and the program does not require memory to operate. Nevertheless, the program does require disk space for storing the temporary files and the final result value. Although this version of InfiXF is a prototype, it is very powerful. I actually tested calculated a number with 25000 digits lengthwise multiply by another number that is 25000 digits lengthwise and verify the answer with the InfiX and I also verify the answer with other big numbers calculator. The next test for the prototype which I have not done as of right now is to calculate a billion digits multiply by another billion digits. Nevertheless, InfiXF prototype did go through some extensive testing for smaller numbers multiplication between the length of forty and 390 digits. Nevertheless, in theory, infiXF can actually calculate digits that can be 1/5 the size of the available space on the hard drive.

Beyond Memory – Reaching For Billion of Digits

When we are multiplying numbers that are beyond what memory can store, the first challenge we would have is how to store the numbers. Since the numbers are now larger than the capability of what memory can store, we can only store each number in a file. The second challenge we would face is that we can’t load the complete file into memory since they are larger than memory. In my formula, since we can’t load the entire file into memory we will read the numbers that are stored in multiple files piece by piece and process the multiplication formula for the numbers also piece by piece. In context, this formula would be the same formula as the previous chapter of “Infinity Multiplication”. Instead of working with the numbers that were stored in string format, we are now working with numbers that are stored in a file format.

When storing a number that can contain million of digits long in a file, the first issue we would have is the format of how the digits are stored. In my formula, the digits would be stored as a single straight line in ASCII code without any formatting characters. This formula also built for calculating the decimal placement for the numbers. A negative or positive sign can be appended to the beginning of the file.

When we are working in a file base system, we can’t simply trim off the positive or negative sign in the beginning of a file. Nor can we remove the binary block that held the decimal. Since we are reading the files piece by piece, if we keep the decimal and the negative or positive sign intact we would have to have to evaluate each block of digits that we are grabbing from the files. For this example let imagine that we are grabbing nine digits at a time from a file that name A. If we were to grab nine digits from the file and the block does contain the decimal, we would have to remove the decimal and get an extra digit to have consistency with the number of digits we are grabbing for each calculation in our formula. Besides that, we would also have to keep track of how many digits that are behind the decimal for our number in the event if there was a decimal in our number. Thus in my formula, before processing the files that contained the numbers we wanted to multiply, we first transfer the digits contents of the files into temporary files. In the temporary file, we would only store the digits without the decimal nor the plus or negative sign if there were any.

The first step for the procedure of this step would be to read the first character at the beginning of our A file to check whether if that position does indeed contain a negative or a plus sign. If the position does contain a negative or a plus sign our isaNeg variable would be set to true or a value of “1”. Since we read the first position already and the file is still open, our pointer would be moved to the second position in the file. If the file did contain a positive or a negative sign it would be ok to read the second position and start grabbing digits to transfer to our temporary file. Nevertheless, if the file did not contain a positive or a negative sign we would be missing out on a digit, thus we would have to reset the pointer back to the zero position of the file.

For the purpose of this prototype, I set the amount of position we are reading from A file was set at 50 in a variable with the named perReadTmp. This variable can be changed to read more or less. You can probably read more than 10,000 positions at a time. For the first time we read the file to get the digits out, we would have to remove any positive and plus sign from the string. We do not need to chomp the string if our file is only stored by our program. Nevertheless, when a file is edited with a text editor, a text editor would usually add a format character to the end of the line. Thus I would chomp the block of digits we read in the event where it is at the end of the line and there is a formatting character.

When we grab a block of digits from the A file, we would check to see if there is a decimal and the position of the decimal. If it is the block we found the decimal in, we would subtract the length of the block we read minus the decimal position then minus one to know how many digits are behind the decimal. We have to minus one because positioning in Ruby start at zero, so the length is always larger than the last position by one. If we has already found the decimal and there are more block of digits to read, the length of each additional block of digits would be added to the total of the amount of digit that we would require to have after the decimal in our result value. When we found the decimal we would also remove the decimal from the variable that currently held the value that we just read from file A.

After removing the negative sign or positive sign and or the decimal if there was in our block of digits, we would transfer only the digits to a temporary file. The name of the temporary file for our A file is defined by the atmp variable. This code block below demonstrates the above procedures. What procedures we were applying to file A will also apply to file B.

The first challenge we face for applying the multiplication formula of each block of digits in number A to each block of digits in number B in a text file format would be when we are writing the result to a file. When information is appended to a file, we can only append new information to the end of the file and not the beginning of the file. However, for multiplication, we are reading the string on the right side first. When translated to how we read from a file, we are also reading from the back of the file and calculating results for digits block from the end of the file to the beginning of the file. Therefore when calculating and producing a result value we would also produce the result value from the right to the left, which in term also mean that we would need to append new result value before the result value that already contained in the answer file.

For a multiplication procedure, the maximum digits in a result value can only be the total of length of number A plus the total length of number B. By knowing this we can pre-create an answer file full of zeroes that is as large as the total amount of digits in number A plus the total amount of digits in number B. I added some extra digits bases on the amount of digit we are calculating per cycle for the event where there are variations in the last block of digits. As similar to the previous chapter the pad0 variable would be required to make up for the missing length of our digits block for each calculation. The below code will create a temporary answer file full of zeroes.

Similar to the previous chapter of InfiX, we would first read from file B. We would read from the end of file B until we read the zero position in B file, which would be the beginning of the file. Since we are reading the file backward, we would need to keep track of our index count. We could use a wrapper library to also achieve reading the file backward. Nevertheless, I chose to not incorporate any additional library into the infiXF program. To get the position of where we are in the file, we first need to know many times the loop has executed. I started the loopidxB with the value one. This is because when we are getting the position to read from, we need to subtract B file length to the result value of the loop count times the amount of digit we are multiplying at a time. For this program, we are calculating nine digits at a time. Our first position we are reading from is not the end position of the file, but nine digits away from the end of the file. We would read from that position to the end of the file.

For the last block of digits that is at the beginning of the file, the number of digits can vary and may not match the total amount of digits we are working with at one time. Thus if our read position turned into a negative value, we would convert the value to zero. Since zero is when the loop will end, the loop will also exit after the zero block is calculated. To get the amount of digit out of B file when we are reading B file, we use the digit variable.

However, when our read position turned into a negative value we would be reset at zero. Which mean we would be grabbing more digits than needed. Therefore if our position turned into a negative we would have to add the negative value to the “digit” variable’s value. For example, if we were reading at position five, our read position would be a negative four. We have to change the negative value to zero so that our read does not go out of bound. Since we changed our read position to the zero position, if we were to grab nine digits from that position we would be grabbing four extra digits that we already grab from the previous loop. Nevertheless, if we take nine plus negative four, we would have a five value. That is the amount of digit we would need to grab. The code block below demonstrates the above procedures for Ruby.

Before we are moving into the procedure inside the loop for reading A file, these are the variables that need to be declared inside the loop that read B file but outside of the loop that read A file. Remainder and leftover have to be reset. Position read for A file and loop index count for A file would have to reset after each time we loop through the digits in A file.

The formula for getting the leftover value and the remainder value would still be the same as the previous chapters. The only difference this time is when we are adding values to our temporary answer file. To get the current value that already held inside our temporary answer file, we would apply the same method of grabbing the digits from B file. When we are writing the new result value to the temporary answer file, we would write the value to the exact spot where we just grabbed the value from the temporary answer file. Therefore our write and read position for our temporary answer file is the same.

To calculate the current position of where we are working within the temporary file, we add the total amount of how many times loop A and B have executed and minus the value to one. We have to minus one because both our loop A and B started at one, and we do not move over one until our loop A finished executing the first loop. The value of the previous step would then be multiply by the amount of digit the program was set to work with. We would then subtract the length of the temporary answer file to this value to get the position we are working with. The code block below demonstrates the above procedures.

After looping through the digits in A file, if there are any leftover or remainder we would add this value to the next position in our answer file. Because the final leftover and remainder would always be before the current values that are being held in the answer file. We do not need to get a value from the temporary answer file.

After finished calculating the multiplication for all the digits in A file multiple to all the digits in B file, we would have an answer value in our temporary answer file. Nevertheless, this answer value is not formatted and does not contain a decimal or a negative sign. Thus our next challenge would be to convert the value in our temporary answer file to the true result. Because we can’t simply inject a negative sign to the front of the file or inject a decimal in the middle of the file or remove trailing and leading zeroes, we would have to create a new file and append values from our temporary answer file. This new file I would call the final answer file.

The first character we would append to the final answer file is a negative sign if file A and file B is different in negative and positive base value. Secondly, we would add all the digits that are supposed to be before the decimal to our final answer file from our temporary answer file. To get the position for the amount of digit that supposed to be before the decimal, we would subtract the temporary answer file’s length to the amount of the total digits that we require to have after the decimal. For the demonstration purpose of this article, the amount of position we are reading each time for temporary reading was set at 50 in the perReadTmp variable.

When we are reading the block of digits from the temporary answer file, we will trim off leading zeroes for the first time we read. If after trimming off all the leading zeroes and the block we just read became zero in length, we would trim off leading zeroes for the next block we read. If after trimming off leading zeroes and the block we just read had a length that is greater than zero, we do not trim any more zeroes from subsequence blocks that we read. We would read the temporary answer file until we reach the digits before decimal position. This code block below demonstrate the above procedures.

After adding all the digits before the decimal from our temporary answer file to the final answer file and in the event where zeroes were the only digits before the decimal and we trimmed off all the zeroes, we would have either a negative sign in the final answer file or a zero length answer file. Therefore we would have to add a zero to the final answer file for formatting before we add the decimal to the final answer file.

When we are adding the decimal to the final answer file, we would only add the decimal if the amount of digit after the decimal is more than zero. Otherwise, the decimal would always be added to the final answer file.

After moving all the digits before the decimal from the temporary answer file to the final answer file. We would be moving the digits after the decimal from the temporary answer file to the final answer file. In the event where there are trailing zeroes after the decimal, we would have to remove the trailing zeroes. We can’t simply trim off the zeroes similar to when we were working with digits in string format, we can only truncate the file to the necessary length. To calculate how many zeroes are trailing, we calculate this from the first block we read.

For this example, the readOutputTmp variable is the variable where we hold the value of the current position we are reading from in the temporary answer file. We assign readOutputTmp variable value to our truncatestr variable, we then trim off zeroes in our truncatestr variable. We then minus the length of the readOutputTmp variable to the length of our truncatestr variable. This value is what I call the truncate different. If the truncate different value is equal to the length of the readOutputTmp str. That mean that the string only contained zeroes, therefore if there was a truncate different value from our previous read, we would add that value to the current truncate difference.

For any other circumstances, the truncate different value would be reset after each time we read. Thus we would only trim off the requires amount of zeroes in the final read. Nevertheless, if zeroes were the only digits after the decimal, they would all be truncate off from the final answer file. The truncate different value is first stored in the variable name truncatediff but would finally be stored in the variable name truncatelen. After we writing the value to our final answer file we have to close the file to get the length for our file so that we can truncate the file in the event where we are removing the trailing zeroes from our final answer file.

After getting the truncate length we would truncate the file by getting the current length of the final answer file minus the value that contained in the truncatelen variable. If after truncating off all the unnecessary zeroes and the last character of our final answer file is a decimal we would also truncate off the decimal.

The program below is the full working program. For demonstrating purpose, I also included a random generator that will generate two files. The first file is a.infiX and the second file is b.infiX. There will be a random amount of digit in these two files. Each file will contain a minimum of 40 digits to a maximum of 390 digits. The decimal will be injected into a random position for both files. The program will then calculate the multiplication result from the digits within the two files mentioned. The program will first produce a.infiX.tmp and b.infiX.tmp as temporary files for holding the digits of A and B. The program will also produce a result.infiX.tmp as a temporary answer file for the unprocessed result value. The final answer value will be available inside the result.infiX. All these files would be created inside the same folder as where the program executed.

It took a lot of time and dedication for developing the Infinity formula and the article. Please give respect to the copyright credit. This article is licensed under Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). InfiFX’s source code license is written as comments inside the source code.