How to find the three closest (nearest) values within a vector?


I would like to find out the three closest numbers in a vector.
Something like

v = c(10,23,25,26,38,50)
c = findClosest(v,3)
c
23 25 26

I tried sort(colSums(as.matrix(dist(v))))[1:3], which kind of works, but it selects the three numbers with the minimum overall distance to all other elements, not the three numbers closest to each other.

There is already an answer for MATLAB, but I do not know how to translate it to R:

%finds the index with the minimal difference in A
minDiffInd = find(abs(diff(A))==min(abs(diff(A))));
%extract this index, and its neighbor index from A
val1 = A(minDiffInd);
val2 = A(minDiffInd+1);

How to find two closest (nearest) values within a vector in MATLAB?

asked Jul 31 at 8:24 by Terry (edited Jul 31 at 8:46 by Sotos)

If you replace find with which (and use [ for array/matrix indexing), the MATLAB answer will work in R, but it obviously only finds the closest 2. Can you clarify what exactly you mean by "finding the closest values in a vector"? The MATLAB answer only works if the vector is sorted; is that a fair assumption? Your title says "two" but your example uses "3"; which is it? A solution that works for arbitrary n is much harder than one that only works for 2. The MATLAB answer does not extend to more than 2 numbers; is that why you're asking?

– antoine-sac
Jul 31 at 8:34

Hi, yes, I fixed the title. In my case I need three. Following your suggestion, I adapted the MATLAB code and it works, but it only finds the two closest numbers. How should I adapt it to also find the third? The vector can be sorted; it is just a group of numbers, and I have to pick the three closest replicas.

– Terry
Jul 31 at 8:47
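
For reference, the translation described in the comments could look roughly like the sketch below. It is only an illustration: the name closestPair is hypothetical (it does not appear in the thread), and the sketch assumes it is acceptable to sort the vector first. MATLAB's find becomes which, and round brackets become square brackets for indexing.

```r
# Hypothetical helper: R translation of the MATLAB snippet (closest pair only)
closestPair <- function(A) {
  A <- sort(A)                      # adjacent differences only make sense on sorted data
  d <- abs(diff(A))
  minDiffInd <- which(d == min(d))  # MATLAB find() -> R which(); several indices on ties
  # One row per pair: the value at the index and its right-hand neighbour
  cbind(val1 = A[minDiffInd], val2 = A[minDiffInd + 1])
}

v <- c(10, 23, 25, 26, 38, 50)
closestPair(v)
#      val1 val2
# [1,]   25   26
```

As noted in the comments, this only handles pairs; the answers below generalise the idea to n values.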

4 Answers

My assumption is that, for the n nearest values, the only thing that matters is the difference v[i] - v[i - (n-1)] in the sorted vector. That is, we need to find the minimum of diff(x, lag = n - 1L).

findClosest <- function(x, n) {
  x <- sort(x)
  # Start of the window of n sorted values with the smallest span
  x[seq.int(which.min(diff(x, lag = n - 1L)), length.out = n)]
}

findClosest(v, 3L)

[1] 23 25 26
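
To see why the lagged difference is enough, it may help to print the intermediate values for the example vector. With n = 3, diff(x, lag = 2) gives the span (largest minus smallest) of every window of three consecutive sorted values, so the smallest span marks the tightest group. A small illustration, assuming v as defined in the question (the names spans and start are only for this sketch):

```r
v <- c(10, 23, 25, 26, 38, 50)
x <- sort(v)

spans <- diff(x, lag = 2L)   # x[i + 2] - x[i] for each window of 3 values
spans
# [1] 15  3 13 24

start <- which.min(spans)    # the window starting at position 2 has the smallest span
x[seq.int(start, length.out = 3L)]
# [1] 23 25 26
```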

answered Jul 31 at 10:14 by Cole (edited Aug 1 at 3:00)

Let's define "nearest numbers" as "numbers with minimal sum of L1 distances". You can achieve what you want with a combination of diff and a windowed sum.

You could write a much shorter function, but I wrote it step by step to make it easier to follow.

v <- c(10,23,25,26,38,50)

#' Find the n nearest numbers in a vector
#'
#' @param v Numeric vector
#' @param n Number of nearest numbers to extract
#'
#' @details "Nearest numbers" defined as the numbers which minimise the
#' within-group sum of L1 distances.
#' 
findClosest <- function(v, n) {
  # Sort and remove NA
  v <- sort(v, na.last = NA)

  # Compute L1 distances between closest points. We know each point is next to
  # its closest neighbour since we sorted.
  delta <- diff(v)

  # Compute sum of L1 distances on a rolling window with n - 1 elements
  # Why n - 1? Because we are looking at deltas and 2 deltas ~ 3 elements.
  withingroup_distances <- zoo::rollsum(delta, k = n - 1)

  # Now it's simply finding the group with minimum within-group sum
  # and working out the elements
  group_index <- which.min(withingroup_distances)
  element_indices <- group_index + 0:(n-1)

  v[element_indices]
}

findClosest(v, 2)
# 25 26
findClosest(v, 3)
# 23 25 26

answered Jul 31 at 8:47 by antoine-sac

Thanks, I have implemented it and it works great! Thanks also for the explanation, I understood the logic behind it.

– Terry
Jul 31 at 10:50

Interestingly, this solution can very easily be extended to use another norm such as L2 instead of L1, if you want to penalise larger gaps more. For example, (10,20,30) and (50,55,70) are equally near according to L1 (10+10=5+15) but the first group is better according to L2 (10^2+10^2 < 5^2+15^2).

– antoine-sac
Aug 1 at 7:04

Very interesting. Actually, I think I am going to give it a try, since I would like to find the three numbers with minimum variance between them. L1 does not allow that, whereas L2 would let me select the group with minimum variance. Thanks very much!

– Terry
Aug 1 at 16:24

You're welcome, you just have to use delta^2 in the rollsum

– antoine-sac
Aug 1 at 16:29
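
The L2 variant discussed in these comments could be sketched as below. This is an assumption-laden illustration rather than the answerer's code: the function name findClosestPower and its power argument are hypothetical, and the rolling sum is done with cumsum in base R as a stand-in for zoo::rollsum(delta, k = n - 1). The example vector w is chosen so that the L1 and L2 criteria pick different groups.

```r
# Hypothetical variant of findClosest(): power = 1 gives the L1 criterion,
# power = 2 penalises larger gaps more (sum of squared adjacent gaps).
# Assumes n >= 2 and at least n non-NA values.
findClosestPower <- function(v, n, power = 1) {
  v <- sort(v, na.last = NA)
  delta <- diff(v)^power
  # Rolling sum over n - 1 consecutive gaps via cumulative sums
  # (base R stand-in for zoo::rollsum(delta, k = n - 1))
  cs <- c(0, cumsum(delta))
  k <- n - 1
  window_sums <- cs[(k + 1):length(cs)] - cs[1:(length(cs) - k)]
  v[which.min(window_sums) + 0:(n - 1)]
}

w <- c(50, 55, 70, 100, 111, 122)
findClosestPower(w, 3, power = 1)   # gaps 5 + 15 = 20 beats 11 + 11 = 22
# [1] 50 55 70
findClosestPower(w, 3, power = 2)   # 11^2 + 11^2 = 242 beats 5^2 + 15^2 = 250
# [1] 100 111 122
```

Note that the sum of squared adjacent gaps is related to, but not the same as, the variance of the group. If minimum variance is the actual goal, computing var() on each window of the sorted vector would be the more direct check.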

One idea is to use the zoo library to do a rolling operation, i.e.

library(zoo)
m1 <- rollapply(v, 3, by = 1, function(i)c(sum(diff(i)), c(i)))
m1[which.min(m1[, 1]),][-1]
#[1] 23 25 26

Or make it into a function,

findClosest <- function(vec, n) {
  vec1 <- sort(vec)
  # Each row of m1 holds a window's summed gaps followed by its n values
  m1 <- zoo::rollapply(vec1, n, by = 1, function(i) c(sum(diff(i)), c(i)))
  return(m1[which.min(m1[, 1]), ][-1])
}

findClosest(v, 3)
#[1] 23 25 26

answered Jul 31 at 8:46 by Sotos (edited Jul 31 at 8:51)

A base R option: the idea is to first sort the vector, subtract every ith element from the (i + n - 1)th element of the sorted vector, and select the group with the minimum difference.

closest_n_vectors <- function(v, n) {
  v1 <- sort(v)
  # Span of each window of n sorted values; keep the start of the smallest one
  inds <- which.min(sapply(head(seq_along(v1), -(n - 1)), function(x)
    v1[x + n - 1] - v1[x]))
  v1[inds:(inds + n - 1)]
}

closest_n_vectors(v, 3)
#[1] 23 25 26

closest_n_vectors(c(2, 10, 1, 20, 4, 5, 23), 2)
#[1] 1 2

closest_n_vectors(c(19, 23, 45, 67, 89, 65, 1), 2)
#[1] 65 67

closest_n_vectors(c(19, 23, 45, 67, 89, 65, 1), 3)
#[1] 1 19 23

In case of a tie, this returns the group with the smallest values, since which.min returns the index of the first minimum.
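
The tie behaviour can be checked on the last example above, where two groups of three have the same span. The following is a small sketch (the names v2, spans and starts are only for illustration) showing the difference between which.min and listing every tied window with which:

```r
v2 <- sort(c(19, 23, 45, 67, 89, 65, 1))
spans <- diff(v2, lag = 2L)   # span of each window of 3 sorted values
spans
# [1] 22 26 42 22 24

which.min(spans)              # first minimum only
# [1] 1
starts <- which(spans == min(spans))   # every tied window
starts
# [1] 1 4

# List all tied groups, one per row
t(sapply(starts, function(i) v2[i:(i + 2)]))
#      [,1] [,2] [,3]
# [1,]    1   19   23
# [2,]   45   65   67
```

The == comparison is exact, so with non-integer values a tolerance-based comparison might be needed to detect ties reliably.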

BENCHMARKS

Since we have quite a few answers, it is worth benchmarking all the solutions so far:

set.seed(1234)
x <- sample(100000000, 100000)

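# identical() compares only its first two arguments, so collect the results
# and compare each one against the first
results <- list(findClosest_antoine(x, 3), findClosest_Sotos(x, 3),
                closest_n_vectors_Ronak(x, 3), findClosest_Cole(x, 3))
all(vapply(results[-1], identical, logical(1), results[[1]]))
#[1] TRUE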

microbenchmark::microbenchmark(
 antoine = findClosest_antoine(x, 3),
 Sotos = findClosest_Sotos(x, 3), 
 Ronak = closest_n_vectors_Ronak(x, 3),
 Cole = findClosest_Cole(x, 3),
 times = 10
)



#Unit: milliseconds
#    expr      min       lq     mean   median       uq      max neval cld
# antoine  148.751  159.071  163.298  162.581  167.365  181.314    10  b
#   Sotos 1086.098 1349.762 1372.232 1398.211 1453.217 1553.945    10   c
#   Ronak   54.248   56.870   78.886   83.129   94.748  100.299    10 a
#    Cole    4.958    5.042    6.202    6.047    7.386    7.915    10 a

answered Jul 31 at 8:54 by Ronak Shah (edited Jul 31 at 11:27)

@Cole I am not sure about cld either but I have it in my output. Yes, @Rui's solution was not identical. I didn't check that earlier.

– Ronak Shah
Jul 31 at 11:18

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f57286328%2fhow-to-find-the-three-closest-nearest-values-within-a-vector%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

My assumption is that the for the n nearest values, the only thing that matters is the difference between the v[i] - v[i - (n-1)]. That is, finding the minimum of diff(x, lag = n - 1L).

findClosest <- function(x, n) 
 x <- sort(x)
 x[seq.int(which.min(diff(x, lag = n - 1L)), length.out = n)]


findClosest(v, 3L)

[1] 23 25 26

edited Aug 1 at 3:00

answered Jul 31 at 10:14

Cole

2,3451 gold badge1 silver badge9 bronze badges

add a comment |

My assumption is that the for the n nearest values, the only thing that matters is the difference between the v[i] - v[i - (n-1)]. That is, finding the minimum of diff(x, lag = n - 1L).

findClosest <- function(x, n) 
 x <- sort(x)
 x[seq.int(which.min(diff(x, lag = n - 1L)), length.out = n)]


findClosest(v, 3L)

[1] 23 25 26

edited Aug 1 at 3:00

answered Jul 31 at 10:14

Cole

2,3451 gold badge1 silver badge9 bronze badges

add a comment |

My assumption is that the for the n nearest values, the only thing that matters is the difference between the v[i] - v[i - (n-1)]. That is, finding the minimum of diff(x, lag = n - 1L).

findClosest <- function(x, n) 
 x <- sort(x)
 x[seq.int(which.min(diff(x, lag = n - 1L)), length.out = n)]


findClosest(v, 3L)

[1] 23 25 26

edited Aug 1 at 3:00

answered Jul 31 at 10:14

Cole

2,3451 gold badge1 silver badge9 bronze badges

My assumption is that the for the n nearest values, the only thing that matters is the difference between the v[i] - v[i - (n-1)]. That is, finding the minimum of diff(x, lag = n - 1L).

findClosest <- function(x, n) 
 x <- sort(x)
 x[seq.int(which.min(diff(x, lag = n - 1L)), length.out = n)]


findClosest(v, 3L)

[1] 23 25 26

edited Aug 1 at 3:00

answered Jul 31 at 10:14

Cole

2,3451 gold badge1 silver badge9 bronze badges

edited Aug 1 at 3:00

answered Jul 31 at 10:14

Cole

2,3451 gold badge1 silver badge9 bronze badges

answered Jul 31 at 10:14

Cole

2,3451 gold badge1 silver badge9 bronze badges

answered Jul 31 at 10:14

Cole

2,3451 gold badge1 silver badge9 bronze badges

add a comment |

Let's define "nearest numbers" by "numbers with minimal sum of L1 distances". You can achieve what you want by a combination of diff and windowed sum.

You could write a much shorter function but I wrote it step by step to make it easier to follow.

v <- c(10,23,25,26,38,50)

#' Find the n nearest numbers in a vector
#'
#' @param v Numeric vector
#' @param n Number of nearest numbers to extract
#'
#' @details "Nearest numbers" defined as the numbers which minimise the
#' within-group sum of L1 distances.
#' 
findClosest <- function(v, n) 
 # Sort and remove NA
 v <- sort(v, na.last = NA)

 # Compute L1 distances between closest points. We know each point is next to
 # its closest neighbour since we sorted.
 delta <- diff(v)

 # Compute sum of L1 distances on a rolling window with n - 1 elements
 # Why n-1 ? Because we are looking at deltas and 2 deltas ~ 3 elements.
 withingroup_distances <- zoo::rollsum(delta, k = n - 1)

 # Now it's simply finding the group with minimum within-group sum
 # And working out the elements
 group_index <- which.min(withingroup_distances)
 element_indices <- group_index + 0:(n-1)

 v[element_indices]


findClosest(v, 2)
# 25 26
findClosest(v, 3)
# 23 25 26

answered Jul 31 at 8:47

antoine-sac

3,6792 gold badges15 silver badges45 bronze badges

Thanks, I have implemented it and it works great! Thanks also for the explanation, I understood the logic behind it.

– Terry
Jul 31 at 10:50

1

Interestingly, this solution can very easily be extended to use another norm such as L2 instead of L1, if you want to penalise larger gaps more. For example, (10,20,30) and (50,55,70) are equally near according to L1 (10+10=5+15) but the first group is better according to L2 (10^2+10^2 < 5^2+15^2).

– antoine-sac
Aug 1 at 7:04

Very interesting. Actually, I think I am gonna give it a try since I would like to find the three numbers with minimum variance between them. The L1 does not allow them, L2 instead would allow me to select the group with minimum variance. Thanks very much!

– Terry
Aug 1 at 16:24

1

You're welcome, you just have to use delta^2 in the rollsum

– antoine-sac
Aug 1 at 16:29

add a comment |

Let's define "nearest numbers" by "numbers with minimal sum of L1 distances". You can achieve what you want by a combination of diff and windowed sum.

You could write a much shorter function but I wrote it step by step to make it easier to follow.

v <- c(10,23,25,26,38,50)

#' Find the n nearest numbers in a vector
#'
#' @param v Numeric vector
#' @param n Number of nearest numbers to extract
#'
#' @details "Nearest numbers" defined as the numbers which minimise the
#' within-group sum of L1 distances.
#' 
findClosest <- function(v, n) 
 # Sort and remove NA
 v <- sort(v, na.last = NA)

 # Compute L1 distances between closest points. We know each point is next to
 # its closest neighbour since we sorted.
 delta <- diff(v)

 # Compute sum of L1 distances on a rolling window with n - 1 elements
 # Why n-1 ? Because we are looking at deltas and 2 deltas ~ 3 elements.
 withingroup_distances <- zoo::rollsum(delta, k = n - 1)

 # Now it's simply finding the group with minimum within-group sum
 # And working out the elements
 group_index <- which.min(withingroup_distances)
 element_indices <- group_index + 0:(n-1)

 v[element_indices]


findClosest(v, 2)
# 25 26
findClosest(v, 3)
# 23 25 26

answered Jul 31 at 8:47

antoine-sac

3,6792 gold badges15 silver badges45 bronze badges

Thanks, I have implemented it and it works great! Thanks also for the explanation, I understood the logic behind it.

– Terry
Jul 31 at 10:50

1

Interestingly, this solution can very easily be extended to use another norm such as L2 instead of L1, if you want to penalise larger gaps more. For example, (10,20,30) and (50,55,70) are equally near according to L1 (10+10=5+15) but the first group is better according to L2 (10^2+10^2 < 5^2+15^2).

– antoine-sac
Aug 1 at 7:04

Very interesting. Actually, I think I am gonna give it a try since I would like to find the three numbers with minimum variance between them. The L1 does not allow them, L2 instead would allow me to select the group with minimum variance. Thanks very much!

– Terry
Aug 1 at 16:24

1

You're welcome, you just have to use delta^2 in the rollsum

– antoine-sac
Aug 1 at 16:29

add a comment |

Let's define "nearest numbers" by "numbers with minimal sum of L1 distances". You can achieve what you want by a combination of diff and windowed sum.

You could write a much shorter function but I wrote it step by step to make it easier to follow.

v <- c(10,23,25,26,38,50)

#' Find the n nearest numbers in a vector
#'
#' @param v Numeric vector
#' @param n Number of nearest numbers to extract
#'
#' @details "Nearest numbers" defined as the numbers which minimise the
#' within-group sum of L1 distances.
#' 
findClosest <- function(v, n) 
 # Sort and remove NA
 v <- sort(v, na.last = NA)

 # Compute L1 distances between closest points. We know each point is next to
 # its closest neighbour since we sorted.
 delta <- diff(v)

 # Compute sum of L1 distances on a rolling window with n - 1 elements
 # Why n-1 ? Because we are looking at deltas and 2 deltas ~ 3 elements.
 withingroup_distances <- zoo::rollsum(delta, k = n - 1)

 # Now it's simply finding the group with minimum within-group sum
 # And working out the elements
 group_index <- which.min(withingroup_distances)
 element_indices <- group_index + 0:(n-1)

 v[element_indices]


findClosest(v, 2)
# 25 26
findClosest(v, 3)
# 23 25 26

answered Jul 31 at 8:47

antoine-sac

3,6792 gold badges15 silver badges45 bronze badges

Let's define "nearest numbers" by "numbers with minimal sum of L1 distances". You can achieve what you want by a combination of diff and windowed sum.

You could write a much shorter function but I wrote it step by step to make it easier to follow.

v <- c(10,23,25,26,38,50)

#' Find the n nearest numbers in a vector
#'
#' @param v Numeric vector
#' @param n Number of nearest numbers to extract
#'
#' @details "Nearest numbers" defined as the numbers which minimise the
#' within-group sum of L1 distances.
#' 
findClosest <- function(v, n) 
 # Sort and remove NA
 v <- sort(v, na.last = NA)

 # Compute L1 distances between closest points. We know each point is next to
 # its closest neighbour since we sorted.
 delta <- diff(v)

 # Compute sum of L1 distances on a rolling window with n - 1 elements
 # Why n-1 ? Because we are looking at deltas and 2 deltas ~ 3 elements.
 withingroup_distances <- zoo::rollsum(delta, k = n - 1)

 # Now it's simply finding the group with minimum within-group sum
 # And working out the elements
 group_index <- which.min(withingroup_distances)
 element_indices <- group_index + 0:(n-1)

 v[element_indices]


findClosest(v, 2)
# 25 26
findClosest(v, 3)
# 23 25 26

answered Jul 31 at 8:47

antoine-sac

3,6792 gold badges15 silver badges45 bronze badges

answered Jul 31 at 8:47

antoine-sac

3,6792 gold badges15 silver badges45 bronze badges

answered Jul 31 at 8:47

antoine-sac

3,6792 gold badges15 silver badges45 bronze badges

answered Jul 31 at 8:47

antoine-sac

3,6792 gold badges15 silver badges45 bronze badges

Thanks, I have implemented it and it works great! Thanks also for the explanation, I understood the logic behind it.

– Terry
Jul 31 at 10:50

1

Interestingly, this solution can very easily be extended to use another norm such as L2 instead of L1, if you want to penalise larger gaps more. For example, (10,20,30) and (50,55,70) are equally near according to L1 (10+10=5+15) but the first group is better according to L2 (10^2+10^2 < 5^2+15^2).

– antoine-sac
Aug 1 at 7:04

Very interesting. Actually, I think I am gonna give it a try since I would like to find the three numbers with minimum variance between them. The L1 does not allow them, L2 instead would allow me to select the group with minimum variance. Thanks very much!

– Terry
Aug 1 at 16:24

1

You're welcome, you just have to use delta^2 in the rollsum

– antoine-sac
Aug 1 at 16:29

add a comment |

Thanks, I have implemented it and it works great! Thanks also for the explanation, I understood the logic behind it.

– Terry
Jul 31 at 10:50

1

Interestingly, this solution can very easily be extended to use another norm such as L2 instead of L1, if you want to penalise larger gaps more. For example, (10,20,30) and (50,55,70) are equally near according to L1 (10+10=5+15) but the first group is better according to L2 (10^2+10^2 < 5^2+15^2).

– antoine-sac
Aug 1 at 7:04

Very interesting. Actually, I think I am gonna give it a try since I would like to find the three numbers with minimum variance between them. The L1 does not allow them, L2 instead would allow me to select the group with minimum variance. Thanks very much!

– Terry
Aug 1 at 16:24

1

You're welcome, you just have to use delta^2 in the rollsum

– antoine-sac
Aug 1 at 16:29

Thanks, I have implemented it and it works great! Thanks also for the explanation, I understood the logic behind it.

– Terry
Jul 31 at 10:50

Interestingly, this solution can very easily be extended to use another norm such as L2 instead of L1, if you want to penalise larger gaps more. For example, (10,20,30) and (50,55,70) are equally near according to L1 (10+10=5+15) but the first group is better according to L2 (10^2+10^2 < 5^2+15^2).

– antoine-sac
Aug 1 at 7:04

Very interesting. Actually, I think I am gonna give it a try since I would like to find the three numbers with minimum variance between them. The L1 does not allow them, L2 instead would allow me to select the group with minimum variance. Thanks very much!

– Terry
Aug 1 at 16:24

You're welcome, you just have to use delta^2 in the rollsum

– antoine-sac
Aug 1 at 16:29

add a comment |

An idea is to use zoo library to do a rolling operation, i.e.

library(zoo)
m1 <- rollapply(v, 3, by = 1, function(i)c(sum(diff(i)), c(i)))
m1[which.min(m1[, 1]),][-1]
#[1] 23 25 26

Or make it into a function,

findClosest <- function(vec, n) 
 require(zoo)
 vec1 <- sort(vec)
 m1 <- rollapply(vec1, n, by = 1, function(i) c(sum(diff(i)), c(i)))
 return(m1[which.min(m1[, 1]),][-1])


findClosest(v, 3)
#[1] 23 25 26

edited Jul 31 at 8:51

answered Jul 31 at 8:46

Sotos

34.7k5 gold badges19 silver badges45 bronze badges


A base R option: first sort the vector, then for every position i subtract the ith element from the (i + n - 1)th element of the sorted vector, and select the group of n consecutive values with the smallest difference.

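closest_n_vectors <- function(v, n) {
  v1 <- sort(v)
  # Start index of every window of n consecutive sorted values
  starts <- seq_len(length(v1) - n + 1)
  # Spread (last - first) of each window; which.min keeps the first minimum
  inds <- which.min(sapply(starts, function(x) v1[x + n - 1] - v1[x]))
  v1[inds:(inds + n - 1)]
}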

closest_n_vectors(v, 3)
#[1] 23 25 26

closest_n_vectors(c(2, 10, 1, 20, 4, 5, 23), 2)
#[1] 1 2

closest_n_vectors(c(19, 23, 45, 67, 89, 65, 1), 2)
#[1] 65 67

closest_n_vectors(c(19, 23, 45, 67, 89, 65, 1), 3)
#[1] 1 19 23

In case of a tie, this returns the group with the smallest values, because which.min returns the index of the first minimum.
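
The per-window subtraction described above can also be written without sapply by subtracting two shifted slices of the sorted vector. This is only a sketch under that assumption, not one of the benchmarked answers; closest_n_vectorized is a hypothetical name.

```r
# Hypothetical vectorized variant of the sort-and-slide idea
closest_n_vectorized <- function(v, n) {
  stopifnot(n >= 1, n <= length(v))
  v1 <- sort(v)
  k <- length(v1) - n + 1                      # number of windows
  # Spread of window i is v1[i + n - 1] - v1[i], computed for all i at once
  spread <- v1[n:length(v1)] - v1[seq_len(k)]
  start <- which.min(spread)                   # first minimum wins on ties
  v1[start:(start + n - 1)]
}

closest_n_vectorized(c(19, 23, 45, 67, 89, 65, 1), 2)   # 65 67
closest_n_vectorized(c(19, 23, 45, 67, 89, 65, 1), 3)   # 1 19 23 (tie with 45 65 67)
```

In the last call the windows 1 19 23 and 45 65 67 both have a spread of 22, and the first one is returned, matching the tie behaviour noted above.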

BENCHMARKS

Since there are now several answers, it is worth benchmarking all the solutions posted so far.

set.seed(1234)
x <- sample(100000000, 100000)

ref <- findClosest_antoine(x, 3)  # identical() compares only its first two arguments
all(sapply(list(findClosest_Sotos(x, 3), closest_n_vectors_Ronak(x, 3),
                findClosest_Cole(x, 3)), identical, ref))
#[1] TRUE
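
A note on this kind of check: identical() compares only its first two arguments, x and y; the parameters after them (num.eq, single.NA and so on) are options, not further objects to compare. A minimal sketch of a helper that compares every result against the first one, where all_identical is a hypothetical name and the inputs are made-up vectors:

```r
# Hypothetical helper: TRUE only if every element of `results`
# is identical to the first element
all_identical <- function(results) {
  all(vapply(results[-1], identical, logical(1), results[[1]]))
}

all_identical(list(c(1L, 2L), c(1L, 2L), c(1L, 2L)))   # TRUE
all_identical(list(c(1L, 2L), c(1L, 2L), c(1L, 3L)))   # FALSE
```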

microbenchmark::microbenchmark(
 antoine = findClosest_antoine(x, 3),
 Sotos = findClosest_Sotos(x, 3), 
 Ronak = closest_n_vectors_Ronak(x, 3),
 Cole = findClosest_Cole(x, 3),
 times = 10
)



#Unit: milliseconds
#    expr      min       lq     mean   median       uq      max neval cld
# antoine  148.751  159.071  163.298  162.581  167.365  181.314    10  b
#   Sotos 1086.098 1349.762 1372.232 1398.211 1453.217 1553.945    10   c
#   Ronak   54.248   56.870   78.886   83.129   94.748  100.299    10 a
#    Cole    4.958    5.042    6.202    6.047    7.386    7.915    10 a

edited Jul 31 at 11:27

answered Jul 31 at 8:54

Ronak Shah

74k reputation, 11 gold badges, 48 silver badges, 83 bronze badges

@Cole I am not sure about cld either but I have it in my output. Yes, @Rui's solution was not identical. I didn't check that earlier.
– Ronak Shah, Jul 31 at 11:18

