
What is Manhattan Distance in machine learning?

by anupmaurya
9 minutes read

Machine learning algorithms rely heavily on distance measures to make predictions. These algorithms fall under two main categories: classification and regression.

  • Regression algorithms analyze training data to assign weights to various features, allowing them to predict continuous values for new data.
  • Classification algorithms distinguish between different objects. They achieve this by assigning a test point to a class based on its distance to labeled training data. A popular example is the K-Nearest Neighbors (KNN) algorithm.

KNN and Distance Measures

The KNN algorithm identifies the training data points closest to a test point and predicts the test point’s label from the majority label among those neighbors. Distance measures play a crucial role in calculating these distances.

What is Manhattan Distance?

Manhattan distance, also called Manhattan length, is a distance measure calculated by summing the absolute differences between corresponding coordinates of two points. Imagine a grid layout like city blocks in Manhattan. The Manhattan distance represents the shortest path you’d take to travel between two points on this grid, only moving horizontally or vertically.

Here’s the formula for calculating Manhattan distance in n dimensions:

Distance(x, y) = Σ( |x_i - y_i| ) = |x_1 - y_1| + |x_2 - y_2| + ... + |x_n - y_n|

where:

  • x and y represent two n-dimensional points
  • i iterates over all dimensions (1 to n)
  • x_i and y_i represent corresponding coordinates in each dimension
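
The formula can be computed directly with NumPy. A minimal sketch (the coordinate values here are made up for illustration):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 0, 3])

# Sum of absolute coordinate differences: |1-4| + |2-0| + |3-3|
distance = np.abs(x - y).sum()
print(distance)  # 5
```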

Advantages of Manhattan Distance

  • Effective for High-Dimensional Data: Unlike Euclidean distance, Manhattan distance does not square the differences, so a single large coordinate difference cannot dominate the total. This makes it well-suited for datasets with many features.
  • Considers All Features: Unlike some distance measures, Manhattan distance incorporates all features, preventing any from being ignored.

Manhattan distance is a valuable tool in machine learning, particularly for tasks involving high-dimensional data analysis.

Code

import numpy as np

def manhattan(train, test, k=10):
    """For each row of `test`, return the indices of the `k`
    training rows nearest to it under Manhattan distance."""
    neighbors = []
    train = train.to_numpy()
    for _, row in test.iterrows():
        row = row.to_numpy()
        # Sum of absolute coordinate differences to every training point
        distance = np.abs(train - row).sum(axis=-1)
        # argpartition finds the k smallest distances without a full sort;
        # the extra argsort orders those k neighbors nearest-first
        idx = np.argpartition(distance, k)[:k]
        neighbors.append(idx[np.argsort(distance[idx])])
    return neighbors
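
To see how such neighbor indices feed a KNN prediction, here is a self-contained sketch with made-up 2-D data (the points, labels, and test point are invented for illustration):

```python
import numpy as np
import pandas as pd

# Made-up data: six labeled training points in 2D and one test point
train = pd.DataFrame([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
labels = np.array(["a", "a", "a", "b", "b", "b"])
test = pd.DataFrame([[5, 4]])

k = 3
# Manhattan distance from the test point to every training point
distance = np.abs(train.to_numpy() - test.to_numpy()[0]).sum(axis=-1)
idx = np.argpartition(distance, k)[:k]

# Majority vote among the k nearest neighbors decides the label
values, counts = np.unique(labels[idx], return_counts=True)
print(values[counts.argmax()])  # "b"
```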
