
Machine learning algorithms rely heavily on distance measures to make predictions. These algorithms fall under two main categories: classification and regression.

**Regression algorithms** analyze training data to assign weights to various features, which allows them to predict labels for new data. **Classification algorithms** distinguish between different objects by grouping test data based on its distance to the training data. A popular example is the K-Nearest Neighbors (KNN) algorithm.

**KNN and Distance Measures**

The KNN algorithm identifies the closest training data points to a test point and predicts the test point’s label as the majority label among those neighbors. Distance measures play a crucial role in calculating these distances.
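The idea can be sketched in a few lines. This is a minimal illustration, not a production implementation; the function and variable names are my own:

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, test_point, k=3):
    # Manhattan distance from the test point to every training point
    distances = np.abs(train_X - test_point).sum(axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among the k neighbors' labels
    return Counter(train_y[nearest]).most_common(1)[0][0]

train_X = np.array([[0, 0], [1, 1], [9, 9], [10, 10]])
train_y = np.array(["a", "a", "b", "b"])
print(knn_predict(train_X, train_y, np.array([0, 1])))  # → a
```

The test point [0, 1] sits near the two "a" examples, so two of its three nearest neighbors carry label "a" and the vote returns "a".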

**What is Manhattan Distance?**

Manhattan distance, also called Manhattan length, is a distance measure calculated by summing the absolute differences between corresponding coordinates of two points. Imagine a grid layout like city blocks in Manhattan. The Manhattan distance represents the shortest path you’d take to travel between two points on this grid, only moving horizontally or vertically.

Here’s the formula for calculating Manhattan distance in n dimensions:

Distance(x, y) = Σ( |x_i - y_i| ) = |x_1 - y_1| + |x_2 - y_2| + ... + |x_n - y_n|

where:

- x and y represent two points
- i iterates over all dimensions (1 to n)
- x_i and y_i represent corresponding coordinates in each dimension
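For example, applying the formula to the 2-D points x = (1, 2) and y = (4, 6) gives |1 − 4| + |2 − 6| = 3 + 4 = 7. The same computation in NumPy:

```python
import numpy as np

x = np.array([1, 2])
y = np.array([4, 6])
# |1 - 4| + |2 - 6| = 3 + 4 = 7
print(np.abs(x - y).sum())  # → 7
```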

**Advantages of Manhattan Distance**

- **Effective for High-Dimensional Data:** Since it doesn’t involve squaring differences, Manhattan distance avoids amplifying the influence of any single feature. This makes it well-suited for datasets with many features.
- **Considers All Features:** Unlike some distance measures, Manhattan distance incorporates all features, preventing any from being ignored.
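The "no squaring" point can be seen numerically. In this small illustration (values are my own), one feature differs far more than the rest; squaring the differences, as squared Euclidean distance does, lets that single feature dominate much more strongly:

```python
import numpy as np

a = np.array([0.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 10.0])  # one feature differs much more than the others

manhattan = np.abs(a - b).sum()       # 1 + 1 + 10 = 12
euclidean_sq = ((a - b) ** 2).sum()   # 1 + 1 + 100 = 102

# The outlying feature contributes 10/12 ≈ 83% of the Manhattan distance,
# but 100/102 ≈ 98% of the squared Euclidean distance.
print(manhattan, euclidean_sq)
```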

Manhattan distance is a valuable tool in machine learning, particularly for tasks involving high-dimensional data analysis.

**Code**

```python
import numpy as np

def manhattan(train, test, k=10):
    """For each row of the test DataFrame, return the indices of the
    k nearest training rows under Manhattan distance."""
    neighbors = []
    train = train.to_numpy()
    for _, row in test.iterrows():
        row = row.to_numpy()
        # Manhattan distance from this test row to every training row
        distance = np.abs(train - row).sum(-1)
        # Indices of the k smallest distances (unsorted within the k)
        idx = np.argpartition(distance, k)[:k]
        neighbors.append(idx)
    return neighbors
```
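A quick self-contained check of the neighbor-search idea, assuming pandas DataFrames as input. It uses k = 3 rather than the k = 10 above, since `np.argpartition(distance, k)` requires more than k training rows; the sample data is my own:

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({"x": [0, 1, 5, 6, 9], "y": [0, 1, 5, 6, 9]})
test = pd.DataFrame({"x": [0], "y": [1]})

train_arr = train.to_numpy()
row = test.iloc[0].to_numpy()
# Manhattan distance to each training row: [1, 1, 9, 11, 17]
distance = np.abs(train_arr - row).sum(-1)
# Indices of the three nearest training rows (unsorted within the three)
idx = np.argpartition(distance, 3)[:3]
print(sorted(idx))  # → [0, 1, 2]
```

The three training points closest to (0, 1) are the first three rows, as expected.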