Consider a vector $y$ containing the $y$ values of some putative linear relation of the form $y = ax + b$, and matrices $A$ and $B$:
$$A = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_i & 1 \end{pmatrix}$$
$$B = \begin{pmatrix} a \\ b \end{pmatrix}$$
In very basic linear regression theory it is stated that when the residuals are $r = AB - y$, then $SSQ = r^T r = (AB - y)^T (AB - y)$, which makes sense. However, this can then be simplified to $y^T y - 2B^T A^T y + B^T A^T A B$. I don't really understand where the two times $B^T A^T y$ comes from; I would simply get $y^T y - B^T A^T y + ABy^T + B^T A^T A B$ instead. When I tried filling in some numbers for the matrices $A$, $B$ and the vector $y$, I indeed found that $ABy^T$ equals $A^T B^T y$.
What is the reasoning/theory behind this?

If you use linear regression, you only obtain an estimate of $B$. Let $\hat{B}$ be this estimate. The vector of residuals is $r = y - A\hat{B}$. Using your notation, if $A$ is an $n \times p$ matrix, then $AB$ is an $n \times 1$ matrix. We also have that $y^T$ is a $1 \times n$ matrix. This means that $ABy^T$ is an $n \times n$ matrix. Instead of $+ABy^T$, you should have $-y^TAB$, which is a $1 \times 1$ matrix. Since a $1 \times 1$ matrix equals its own transpose, you can then write $y^TAB = (y^TAB)^T = B^TA^Ty$, so the two cross terms combine into $-2B^TA^Ty$.
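Written out in full with the residuals $r = y - A\hat{B}$, the expansion goes as follows; only the last step, merging the two cross terms, relies on the $1 \times 1$ transpose argument:

$$
\begin{aligned}
r^T r &= (y - A\hat{B})^T (y - A\hat{B}) \\
      &= y^T y - y^T A\hat{B} - \hat{B}^T A^T y + \hat{B}^T A^T A\hat{B} \\
      &= y^T y - 2\hat{B}^T A^T y + \hat{B}^T A^T A\hat{B}.
\end{aligned}
$$

The identity can also be checked numerically. The following is a minimal sketch in Python with NumPy (the thread itself contains no code); the data and the coefficients $(a, b)$ are arbitrary hypothetical values. That is fine here because the expansion is an algebraic identity that holds for any $B$ of compatible shape, not only for the least-squares estimate $\hat{B}$.

```python
import numpy as np

# Hypothetical example data: any x, y and any candidate (a, b) will do,
# because the expansion of r^T r is an algebraic identity.
rng = np.random.default_rng(0)
n = 5
x = rng.normal(size=n)
y = rng.normal(size=(n, 1))           # column vector, shape (n, 1)

# Design matrix A with rows [x_i, 1] and coefficient vector B = (a, b)^T
A = np.column_stack((x, np.ones(n)))  # shape (n, 2)
B = np.array([[1.5], [-0.3]])         # shape (2, 1), arbitrary values

r = y - A @ B                         # residuals, shape (n, 1)
ssq_direct = (r.T @ r).item()         # r^T r is a 1x1 matrix

# Expanded form: y^T y - 2 B^T A^T y + B^T A^T A B (all terms are 1x1)
ssq_expanded = (y.T @ y - 2 * B.T @ A.T @ y + B.T @ A.T @ A @ B).item()

# The two cross terms are 1x1 matrices that are transposes of each other
cross1 = (B.T @ A.T @ y).item()
cross2 = (y.T @ A @ B).item()

# By contrast, A B y^T is an outer product of shape (n, n)
outer_shape = (A @ B @ y.T).shape

print(np.isclose(ssq_direct, ssq_expanded))  # True
print(np.isclose(cross1, cross2))            # True
print(outer_shape)                           # (5, 5)
```

The two forms of $r^T r$ agree, the two cross terms are the same scalar, and $ABy^T$ comes out as a $5 \times 5$ matrix, so it cannot be one of the scalar terms in $r^T r$.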

