## 2D Orthogonal projection of vector onto line with numpy yields wrong result

Question

I have 350 document scores that, when I plot them, have this shape:

``````docScores = [(0, 68.62998962), (1, 60.21374512), (2, 54.72480392),
(3, 50.71389389), (4, 49.39723969), ...,
(345, 28.3756237), (346, 28.37126923),
(347, 28.36397934), (348, 28.35762787), (349, 28.34219933)]
``````

I posted the complete array here on `pastebin` (it corresponds to the `dataPoints` list on the code below).

Now, I originally needed to find the `elbow point` of this `L-shape` curve, which I found thanks to this post.

Now, on the following plot, the red vector `p` represents the elbow point. I would like to find the point `x=(?,?)` (the yellow star) on the vector `b` which corresponds to the orthogonal projection of `p` onto `b`.

The red point on the plot is the one I obtain (which is obviously wrong). I obtain it doing the following:

``````b_hat = b / np.linalg.norm(b)    #unit vector of b
proj_p_onto_b = p.dot(b_hat)*b_hat
red_point = proj_p_onto_b + s
``````

Now, if the projection of `p` onto `b` is defined by the its starting and ending point, namely `s` and `x` (the yellow star), it follows that `proj_p_onto_b = x - s`, therefore `x = proj_p_onto_b + s` ?

Did I make a mistake here ?

EDIT : In answer to @cxw, here is the code for computing the elbow point :

``````def findElbowPoint(self, rawDocScores):
dataPoints = zip(range(0, len(rawDocScores)), rawDocScores)
s = np.array(dataPoints[0])
l = np.array(dataPoints[len(dataPoints)-1])
b_vect = l-s
b_hat = b_vect/np.linalg.norm(b_vect)
distances = []
for scoreVec in dataPoints[1:]:
p = np.array(scoreVec) - s
proj = p.dot(b_hat)*b_hat
d = abs(np.linalg.norm(p - proj)) # orthgonal distance between b and the L-curve
distances.append((scoreVec[0], scoreVec[1], proj, d))

elbow_x = max(distances, key=itemgetter(3))[0]
elbow_y = max(distances, key=itemgetter(3))[1]
proj = max(distances, key=itemgetter(3))[2]
max_distance = max(distances, key=itemgetter(3))[3]

red_point = proj + s
``````

EDIT : Here is the code for the plot :

``````>>> l_curve_x_values = [x[0] for x in docScores]
>>> l_curve_y_values = [x[1] for x in docScores]
>>> b_line_x_values = [x[0] for x in docScores]
>>> b_line_y_values = np.linspace(s[1], l[1], len(docScores))
>>> p_line_x_values = l_curve_x_values[:elbow_x]
>>> p_line_y_values = np.linspace(s[1], elbow_y, elbow_x)
>>> plt.plot(l_curve_x_values, l_curve_y_values, b_line_x_values, b_line_y_values, p_line_x_values, p_line_y_values)
>>> red_point = proj + s
>>> plt.plot(red_point[0], red_point[1], 'ro')
>>> plt.show()
``````

Show source

## Answers to 2D Orthogonal projection of vector onto line with numpy yields wrong result ( 2 )

1. First of all, is the point at ~(50, 37) `p` or `s+p`? If `p`, that might be your problem right there! If the Y component of your `p` variable is positive, you won't get the results you expect when you do the dot product.

Assuming that point is `s+p`, if a bit of Post-It scribbling is correct,

``````p_len = np.linalg.norm(p)
p_hat = p / p_len
red_len = p_hat.dot(b_hat) * p_len   # red_len = |x-s|
# because p_hat . b_hat = 1 * 1 * cos(angle) = |x-s| / |p|
red_point = s + red_len * b_hat
``````

Not tested! YMMV. Hope this helps.

2. If you are using the plot to visually determine if the solution looks correct, you must plot the data using the same scale on each axis, i.e. use `plt.axis('equal')`. If the axes do not have equal scales, the angles between lines are distorted in the plot.