PE 0006: Sum Square Difference
The sum of the squares of the first ten natural numbers is,
The square of the sum of the first ten natural numbers is,
Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is
Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.
A Simple Aggregate
If
def SqSumSumSqDifference(n):
sumsquare = sum(x**2 for x in range(1,n+1))
squaresum = sum(list(range(1,n+1)))**2
return squaresum - sumsquare
Even if
Algebraic Simplification
The simple solution does a lot of adding of intermediate values before arriving at an answer. We can get the answer faster just like we did for the algebraic series in problem 1, bypassing the steps in between by deriving a direct computation of the answer.
With the algebraic series we found
TODO elaborate
We can determine what these values are for
TODO equation table for [a,b,c,d]=f(n)
Thus we find the values for these four parameters:
or
int ssss_difference(int n) {
int sum = (n*n + n) / 2
int sumsq = n * (2*n + 1) * (n + 1) / 6
return sum*sum - sumsq
}
Prove it
By induction, if we show that this direct computation holds for
The first part is easy because
The second part follows from the relation between the function for
TODO the above could have its simplification steps animated
No-difference solution
A persistent issue with the previous solution, though it saves us a lot of compute it doesn't avoid a numeric overflow for standard integer types (in most languages[1] ). If the int
value then it will overflow before subtracting the sum-of-squares term, resulting in an incorrect answer.
We can address this, up to a point, but eventually even the correct answer will overflow an int range and we'll need to use a Big-Integer type. Fortunately, most languages have a Big-Int type, but I'll bring it up later with a problem that actually needs to use one.
We might initially try using a signed value, store the smaller (sum of squares) value as its negative, and then add the larger (square of sum) value. But doing that also limits our range to
TODO illustration
That's billions of correct answers that we lose the ability to represent. Instead, we can combine the two together by adding them symbolically and computing them together, instead of computing them separately and taking the difference.
TODO algebra
The resulting code is not much more complicated than the simplest implementation
See Also
Algebraic series (related also to problem 0001)
Solving a system of four equations
Exercises
The derivation above jumps directly to a degree-3 (
) polynomial based on intuition from the representation. Show that an closed-form representation would not be sufficient. If we had guessed instead a degree-4 (
) representation, what would be the initial system of five equations? Does deriving this give us a different answer? (Look into null spaces for more information). For what range of 32-bit integers does the simple solution overflow while the no-difference version give the correct answer? For 64-bit integers? You can try figuring this out algebraicly or by writing some test code.
Reordering the operations of
ssss_difference
in the second code example will lead to incorrect answers (e.g.for the example input of 10). Why does it only work with the / 6
evaluated at the end? (hint: JavaScript is an exception on this one).
Footnotes
Python is a notable exception to this, its default integer type is a bigint (it also has intern-ed representations for the first 100 or so numbers). While other languages often include a BigInt type, they're rarely the default. ↩︎