Skip to content

PE 0006: Sum Square Difference

The sum of the squares of the first ten natural numbers is,

12+22+...+102=385

The square of the sum of the first ten natural numbers is,

(1+2+...+10)2=552=3025

Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is 3025385=2640.

Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.


A Simple Aggregate

If n is small then we don't have to fuss much about it:

python
def SqSumSumSqDifference(n):
  sumsquare = sum(x**2 for x in range(1,n+1))
  squaresum = sum(list(range(1,n+1)))**2
  return squaresum - sumsquare

Even if n is in the millions this is computable in milliseconds.

Algebraic Simplification

The simple solution does a lot of adding of intermediate values before arriving at an answer. We can get the answer faster just like we did for the algebraic series in problem 1, bypassing the steps in between by deriving a direct computation of the answer.

With the algebraic series we found in is n(n+1)/2, a polynomial with an n2 term. By intuition, if we're summing a series of i2 terms we will need a polynomial of the next higher degree, i.e. having an n3 term.

TODO elaborate

We can determine what these values are for n{0,1,2,3} :

TODO equation table for [a,b,c,d]=f(n)

Thus we find the values for these four parameters:

a=1/3b=1/2c=1/6d=0

or f(n)=(n/6)(2n+1)(n+1) .

c
int ssss_difference(int n) {
	int sum = (n*n + n) / 2
	int sumsq = n * (2*n + 1) * (n + 1) / 6

	return sum*sum - sumsq
}

Prove it

By induction, if we show that this direct computation holds for f(1) and that when it holds for f(n) it must also hold for f(n+1), then we've proven that f(n) always holds.

The first part is easy because a+b+c+d=1 was part of our original system of equations, and adding 1/3, 1/2, 1/6 (and 0) does indeed equal 1.

The second part follows from the relation between the function for f(n) and f(n+1), they will always be separated by exactly (n+1)2, by definition. Solving for this equality, we find:

f(n+1)=f(n)+(n+1)2n3/3+3n2/2+13n/6+1=n/6+n2/2+n3/3+n2+2n+1equal

TODO the above could have its simplification steps animated

No-difference solution

A persistent issue with the previous solution, though it saves us a lot of compute it doesn't avoid a numeric overflow for standard integer types (in most languages[1] ). If the sum2 term exceeds the maximum int value then it will overflow before subtracting the sum-of-squares term, resulting in an incorrect answer.

We can address this, up to a point, but eventually even the correct answer will overflow an int range and we'll need to use a Big-Integer type. Fortunately, most languages have a Big-Int type, but I'll bring it up later with a problem that actually needs to use one.

We might initially try using a signed value, store the smaller (sum of squares) value as its negative, and then add the larger (square of sum) value. But doing that also limits our range to 263 instead of 264 (or 31 and 32 on smaller platforms). Does this make a difference?

TODO illustration

That's billions of correct answers that we lose the ability to represent. Instead, we can combine the two together by adding them symbolically and computing them together, instead of computing them separately and taking the difference.

TODO algebra

The resulting code is not much more complicated than the simplest implementation

go

See Also

Algebraic series (related also to problem 0001)

Solving a system of four equations

Exercises

  1. The derivation above jumps directly to a degree-3 ( n3 ) polynomial based on intuition from the n\rarrown2 representation. Show that an n2 closed-form representation would not be sufficient.

  2. If we had guessed instead a degree-4 ( n4 ) representation, what would be the initial system of five equations? Does deriving this give us a different answer? (Look into null spaces for more information).

  3. For what range of 32-bit integers does the simple solution overflow while the no-difference version give the correct answer? For 64-bit integers? You can try figuring this out algebraicly or by writing some test code.

  4. Reordering the operations of ssss_difference in the second code example will lead to incorrect answers (e.g. 2794 for the example input of 10). Why does it only work with the / 6 evaluated at the end? (hint: JavaScript is an exception on this one).

Footnotes


  1. Python is a notable exception to this, its default integer type is a bigint (it also has intern-ed representations for the first 100 or so numbers). While other languages often include a BigInt type, they're rarely the default. ↩︎