Is there a short way to check uniqueness of values without using 'if' and multiple 'and's?How do I check if there are duplicates in a flat list?Java's TreeSet equivalent in Python?Why is `*l` faster than `set(l)` - python sets (not really only for sets, for all sequences)How do I check whether a file exists without exceptions?What's the canonical way to check for type in Python?How do I return multiple values from a function?Why are Python lambdas useful?Python progression path - From apprentice to gurulist comprehension vs. lambda + filterSpeed comparison with Project Euler: C vs Python vs Erlang vs HaskellWhy is reading lines from stdin much slower in C++ than Python?How to check if the string is empty?Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?
Why didn't Avengers simply jump 5 years back?
Does friction always oppose motion?
Chandra exiles a card, I play it, it gets exiled again
How useful would a hydroelectric power plant be in the post-apocalypse world?
Why was Pan Am Flight 103 flying over Lockerbie?
Advantages of using bra-ket notation
Why isn't UDP with reliability (implemented at Application layer) a substitute of TCP?
What does 'in attendance' mean on an England death certificate?
How does mmorpg store data?
Enterprise Layers and Naming Conventions
Why do movie directors use brown tint on Mexico cities?
Why didn't Caesar move against Sextus Pompey immediately after Munda?
How does the 'five minute adventuring day' affect class balance?
Why will we fail creating a self sustaining off world colony?
What is my external HDD doing?
Why did the Apple IIe make a hideous noise if you inserted the disk upside down?
Is it theoretically possible to hack printer using scanner tray?
Understanding the as-if rule, "the program was executed as written"
Russian equivalents of 能骗就骗 (if you can cheat, then cheat)
Is there a word for the act of simultaneously pulling and twisting an object?
A* pathfinding algorithm too slow
What prevents a US state from colonizing a smaller state?
Is this house-rule removing the increased effect of cantrips at higher character levels balanced?
What's the lunar calendar of two moons
Is there a short way to check uniqueness of values without using 'if' and multiple 'and's?
How do I check if there are duplicates in a flat list?Java's TreeSet equivalent in Python?Why is `*l` faster than `set(l)` - python sets (not really only for sets, for all sequences)How do I check whether a file exists without exceptions?What's the canonical way to check for type in Python?How do I return multiple values from a function?Why are Python lambdas useful?Python progression path - From apprentice to gurulist comprehension vs. lambda + filterSpeed comparison with Project Euler: C vs Python vs Erlang vs HaskellWhy is reading lines from stdin much slower in C++ than Python?How to check if the string is empty?Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?
I am writing some code and I need to compare some values. The point is that none of the variables should have the same value as another. For example:
a=1
b=2
c=3
if a != b and b != c and a != c:
#do something
Now, it is easy to see that in a case of code with more variables, the if
statement becomes very long and full of and
s. Is there a short way to tell Python that no 2 variable values should be the same.
python duplicates
|
show 1 more comment
I am writing some code and I need to compare some values. The point is that none of the variables should have the same value as another. For example:
a=1
b=2
c=3
if a != b and b != c and a != c:
#do something
Now, it is easy to see that in a case of code with more variables, the if
statement becomes very long and full of and
s. Is there a short way to tell Python that no 2 variable values should be the same.
python duplicates
I can't think of any way to do this easily in a 1-liner. But if this is something I needed to do often, I'd write a short function called something likeall_different
which takes an arbitrary number of arguments and tells you if they're all different or not. Then you can just doif all_different(a, b, c):
– Robin Zigmond
Jun 22 at 14:59
6
Try using length of set. That would make it fairly easy.
– susenj
Jun 22 at 15:00
3
If you make a list[a, b, c]
, then this is a duplicate of How do I check if there are duplicates in a flat list?
– wjandrea
Jun 23 at 1:31
1
What else do you know about the values? I don't believe it's even mathematically possible if equality testing is the only operation allowed.
– Mehrdad
Jun 23 at 3:21
Beware trading understandable for clever — creating a set just to throw it away for the purpose of (IMO) abusing its uniqueness properties means when you go look at it in a few months you’ll wonder “why on earth did I make this set?”
– thehole
Jun 25 at 16:45
|
show 1 more comment
I am writing some code and I need to compare some values. The point is that none of the variables should have the same value as another. For example:
a=1
b=2
c=3
if a != b and b != c and a != c:
#do something
Now, it is easy to see that in a case of code with more variables, the if
statement becomes very long and full of and
s. Is there a short way to tell Python that no 2 variable values should be the same.
python duplicates
I am writing some code and I need to compare some values. The point is that none of the variables should have the same value as another. For example:
a=1
b=2
c=3
if a != b and b != c and a != c:
#do something
Now, it is easy to see that in a case of code with more variables, the if
statement becomes very long and full of and
s. Is there a short way to tell Python that no 2 variable values should be the same.
python duplicates
python duplicates
edited Jun 23 at 11:52
Mars
1,6319 silver badges20 bronze badges
1,6319 silver badges20 bronze badges
asked Jun 22 at 14:55
Bunny DavisBunny Davis
1271 silver badge5 bronze badges
1271 silver badge5 bronze badges
I can't think of any way to do this easily in a 1-liner. But if this is something I needed to do often, I'd write a short function called something likeall_different
which takes an arbitrary number of arguments and tells you if they're all different or not. Then you can just doif all_different(a, b, c):
– Robin Zigmond
Jun 22 at 14:59
6
Try using length of set. That would make it fairly easy.
– susenj
Jun 22 at 15:00
3
If you make a list[a, b, c]
, then this is a duplicate of How do I check if there are duplicates in a flat list?
– wjandrea
Jun 23 at 1:31
1
What else do you know about the values? I don't believe it's even mathematically possible if equality testing is the only operation allowed.
– Mehrdad
Jun 23 at 3:21
Beware trading understandable for clever — creating a set just to throw it away for the purpose of (IMO) abusing its uniqueness properties means when you go look at it in a few months you’ll wonder “why on earth did I make this set?”
– thehole
Jun 25 at 16:45
|
show 1 more comment
I can't think of any way to do this easily in a 1-liner. But if this is something I needed to do often, I'd write a short function called something likeall_different
which takes an arbitrary number of arguments and tells you if they're all different or not. Then you can just doif all_different(a, b, c):
– Robin Zigmond
Jun 22 at 14:59
6
Try using length of set. That would make it fairly easy.
– susenj
Jun 22 at 15:00
3
If you make a list[a, b, c]
, then this is a duplicate of How do I check if there are duplicates in a flat list?
– wjandrea
Jun 23 at 1:31
1
What else do you know about the values? I don't believe it's even mathematically possible if equality testing is the only operation allowed.
– Mehrdad
Jun 23 at 3:21
Beware trading understandable for clever — creating a set just to throw it away for the purpose of (IMO) abusing its uniqueness properties means when you go look at it in a few months you’ll wonder “why on earth did I make this set?”
– thehole
Jun 25 at 16:45
I can't think of any way to do this easily in a 1-liner. But if this is something I needed to do often, I'd write a short function called something like
all_different
which takes an arbitrary number of arguments and tells you if they're all different or not. Then you can just do if all_different(a, b, c):
– Robin Zigmond
Jun 22 at 14:59
I can't think of any way to do this easily in a 1-liner. But if this is something I needed to do often, I'd write a short function called something like
all_different
which takes an arbitrary number of arguments and tells you if they're all different or not. Then you can just do if all_different(a, b, c):
– Robin Zigmond
Jun 22 at 14:59
6
6
Try using length of set. That would make it fairly easy.
– susenj
Jun 22 at 15:00
Try using length of set. That would make it fairly easy.
– susenj
Jun 22 at 15:00
3
3
If you make a list
[a, b, c]
, then this is a duplicate of How do I check if there are duplicates in a flat list?– wjandrea
Jun 23 at 1:31
If you make a list
[a, b, c]
, then this is a duplicate of How do I check if there are duplicates in a flat list?– wjandrea
Jun 23 at 1:31
1
1
What else do you know about the values? I don't believe it's even mathematically possible if equality testing is the only operation allowed.
– Mehrdad
Jun 23 at 3:21
What else do you know about the values? I don't believe it's even mathematically possible if equality testing is the only operation allowed.
– Mehrdad
Jun 23 at 3:21
Beware trading understandable for clever — creating a set just to throw it away for the purpose of (IMO) abusing its uniqueness properties means when you go look at it in a few months you’ll wonder “why on earth did I make this set?”
– thehole
Jun 25 at 16:45
Beware trading understandable for clever — creating a set just to throw it away for the purpose of (IMO) abusing its uniqueness properties means when you go look at it in a few months you’ll wonder “why on earth did I make this set?”
– thehole
Jun 25 at 16:45
|
show 1 more comment
5 Answers
5
active
oldest
votes
You can try making sets.
a, b, c = 1, 2, 3
if len(a,b,c) == 3:
# Do something
If your variables are kept as a list, it becomes even more simple:
a = [1,2,3,4,4]
if len(set(a)) == len(a):
# Do something
Here is the official documentation of python sets.
This works only for hashable objects such as integers, as given in the question. For non-hashable objects, see @chepner's more general solution.
This is definitely the way you should go for hashable objects, since it takes O(n) time for the number of objects n. The combinatorial method for non-hashable objects take O(n^2) time.
1
For objects that are not hashable but are comparable, an O(log(N)) solution is to use binary autobalanced trees. See this stackoverflow.com/questions/2710651/…. This is much more efficient than iterating on all pairs of objects which has O(N^2) time complexity.
– Ismael EL ATIFI
Jun 22 at 18:47
6
@IsmaelELATIFI: I think you dropped a factor of N. You can't even put all the values into a tree in O(log(N)) time. Anyway, sorting would be easier in Python, since Python doesn't come with a TreeSet.
– user2357112
Jun 23 at 0:15
1
@user2357112 Yes indeed I meant O(N*log(N)) :) And yes you are right sorting and then removing adjacent duplicates in one pass would have same time complexity and would be easier than finding a TreeSet Python implementation.
– Ismael EL ATIFI
Jun 23 at 14:57
add a comment |
Assuming hashing is not an option, use itertools.combinations
and all
.
from itertools import combinations
if all(x != y for x, y in combinations([a,b,c], 2)):
# All values are unique
8
worth noting that this is quadratic in the number of elements, while the hashable solution (if available) is linear
– Ant
Jun 23 at 18:38
4
Downvote from me for an unnecessarily slow (quadratic as opposed to linear, as noted by @Ant) solution. It'll be fine for single-digit-sized collections, but not much more. The set-based solution is faster, shorter, and requires no libraries. On the upside, this works for unhashable items, and might consume less memory (depending on the implementation ofcombinations
).
– Aasmund Eldhuset
Jun 23 at 19:30
1
@AasmundEldhuset "It'll be fine for single-digit-sized collections, but not much more" is an exaggeration. If you're running it in a tight loop, that could be true, but if you're not it'll be fine for at least several hundred elements; my computer, at least, can run this on 1000 unique elements in about 50ms, more than fast enough if it isn't being executed repeatedly or in time-critical sections of code.
– insert_name_here
Jun 24 at 5:48
1
If you do have a large set of items AND the likeliness of doubles is high, you're better off not using all. Just have the for loop and break out as soon as you find a double.
– Falc
Jun 24 at 7:41
2
Doesn'tall
also short-circuit?
– Mees de Vries
Jun 24 at 9:18
|
show 1 more comment
It depends a bit on the kind of values that you have.
If they are well-behaved and hashable then you can (as others already pointed out) simply use a set
to find out how many unique values you have and if that doesn't equal the number of total values you have at least two values that are equal.
def all_distinct(*values):
return len(set(values)) == len(values)
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
Hashable values and lazy
In case you really have lots of values and want to abort as soon as one match is found you could also lazily create the set. It's more complicated and probably slower if all values are distinct but it provides short-circuiting in case a duplicate is found:
def all_distinct(*values):
seen = set()
seen_add = seen.add
last_count = 0
for item in values:
seen_add(item)
new_count = len(seen)
if new_count == last_count:
return False
last_count = new_count
return True
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
However if the values are not hashable this will not work because set
requires hashable values.
Unhashable values
In case you don't have hashable values you could use a plain list to store the already processed values and just check if each new item is already in the list:
def all_distinct(*values):
seen = []
for item in values:
if item in seen:
return False
seen.append(item)
return True
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
all_distinct([1, 2], [2, 3], [3, 4]) # True
all_distinct([1, 2], [2, 3], [1, 2]) # False
This will be slower because checking if a value is in a list requires to compare it to each item in the list.
A (3rd-party) library solution
In case you don't mind an additional dependency you could also use one of my libraries (available on PyPi and conda-forge) for this task iteration_utilities.all_distinct
. This function can handle both hashable and unhashable values (and a mix of these):
from iteration_utilities import all_distinct
all_distinct([1, 2, 3]) # True
all_distinct([1, 2, 2]) # False
all_distinct([[1, 2], [2, 3], [3, 4]]) # True
all_distinct([[1, 2], [2, 3], [1, 2]]) # False
General comments
Note that all of the above mentioned approaches rely on the fact that equality means "not not-equal" which is the case for (almost) all built-in types but doesn't necessarily be the case!
However I want to point out chepners answers which doesn't require hashability of the values and doesn't rely on "equality means not not-equal" by explicitly checking for !=
. It's also short-circuiting so it behaves like your original and
approach.
Performance
To get a rough idea about the performance I'm using another of my libraries (simple_benchmark
)
I used distinct hashable inputs (left) and unhashable inputs (right). For hashable inputs the set-approaches performed best, while for unhashable inputs the list-approaches performed better. The combinations
-based approach seemed slowest in both cases:
I also tested the performance in case there are duplicates, for convenience I regarded the case when the first two elements were equal (otherwise the setup was identical to the previous case):
from iteration_utilities import all_distinct
from itertools import combinations
from simple_benchmark import BenchmarkBuilder
# First benchmark
b1 = BenchmarkBuilder()
@b1.add_function()
def all_distinct_set(values):
return len(set(values)) == len(values)
@b1.add_function()
def all_distinct_set_sc(values):
seen = set()
seen_add = seen.add
last_count = 0
for item in values:
seen_add(item)
new_count = len(seen)
if new_count == last_count:
return False
last_count = new_count
return True
@b1.add_function()
def all_distinct_list(values):
seen = []
for item in values:
if item in seen:
return False
seen.append(item)
return True
b1.add_function(alias='all_distinct_iu')(all_distinct)
@b1.add_function()
def all_distinct_combinations(values):
return all(x != y for x, y in combinations(values, 2))
@b1.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, range(size)
r1 = b1.run()
r1.plot()
# Second benchmark
b2 = BenchmarkBuilder()
b2.add_function(alias='all_distinct_iu')(all_distinct)
b2.add_functions([all_distinct_combinations, all_distinct_list])
@b2.add_arguments('number of unhashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [[i] for i in range(size)]
r2 = b2.run()
r2.plot()
# Third benchmark
b3 = BenchmarkBuilder()
b3.add_function(alias='all_distinct_iu')(all_distinct)
b3.add_functions([all_distinct_set, all_distinct_set_sc, all_distinct_combinations, all_distinct_list])
@b3.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [0, *range(size)]
r3 = b3.run()
r3.plot()
# Fourth benchmark
b4 = BenchmarkBuilder()
b4.add_function(alias='all_distinct_iu')(all_distinct)
b4.add_functions([all_distinct_combinations, all_distinct_list])
@b4.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [[0], *[[i] for i in range(size)]]
r4 = b4.run()
r4.plot()
Is comparing the length of the set to check whether theadd
ed element was new really the best (fastest?) way? It seems pretty cumbersome.
– Bergi
Jun 24 at 10:52
1
@Bergi Yes t's cumbersome. But checking forin
and doing theadd
invokes the hashing and (internal) find-slot-for-hash operation twice. Depending on the value to add and the state of the set it can be significantly faster to justadd
it and check the length. Maybe I'm missing something obvious but at least in my benchmarks theset.add
+len(set)
approach was faster thanin set
andset.add
.
– MSeifert
Jun 24 at 10:55
add a comment |
The slightly neater way is to stick all the variables in a list, then create a new set from the list. If the list and the set aren't the same length, some of the variables were equal, since sets can't contain duplicates:
vars = [a, b, c]
no_dupes = set(vars)
if len(vars) != len(no_dupes):
# Some of them had the same value
This assumes the values are hashable; which they are in your example.
2
This assumes the values are hashable.
– chepner
Jun 22 at 15:23
@chepner Yep. They're using numbers in their question. Added a note at the bottom though.
– Carcigenicate
Jun 22 at 15:25
add a comment |
You can use all
with list.count
as well, it is reasonable, may not be the best, but worth to answer:
>>> a, b, c = 1, 2, 3
>>> l = [a, b, c]
>>> all(l.count(i) < 2 for i in l)
True
>>> a, b, c = 1, 2, 1
>>> l = [a, b, c]
>>> all(l.count(i) < 2 for i in l)
False
>>>
Also this solution works with unhashable objects in the list.
A way that only works with hashable objects in the list:
>>> a, b, c = 1, 2, 3
>>> l = [a, b, c]
>>> len(*l) == len(l)
True
>>>
Actually:
>>> from timeit import timeit
>>> timeit(lambda: *l, number=1000000)
0.5163292075532642
>>> timeit(lambda: set(l), number=1000000)
0.7005311807841572
>>>
*l
is faster than set(l)
, more info here.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f56716492%2fis-there-a-short-way-to-check-uniqueness-of-values-without-using-if-and-multip%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can try making sets.
a, b, c = 1, 2, 3
if len(a,b,c) == 3:
# Do something
If your variables are kept as a list, it becomes even more simple:
a = [1,2,3,4,4]
if len(set(a)) == len(a):
# Do something
Here is the official documentation of python sets.
This works only for hashable objects such as integers, as given in the question. For non-hashable objects, see @chepner's more general solution.
This is definitely the way you should go for hashable objects, since it takes O(n) time for the number of objects n. The combinatorial method for non-hashable objects take O(n^2) time.
1
For objects that are not hashable but are comparable, an O(log(N)) solution is to use binary autobalanced trees. See this stackoverflow.com/questions/2710651/…. This is much more efficient than iterating on all pairs of objects which has O(N^2) time complexity.
– Ismael EL ATIFI
Jun 22 at 18:47
6
@IsmaelELATIFI: I think you dropped a factor of N. You can't even put all the values into a tree in O(log(N)) time. Anyway, sorting would be easier in Python, since Python doesn't come with a TreeSet.
– user2357112
Jun 23 at 0:15
1
@user2357112 Yes indeed I meant O(N*log(N)) :) And yes you are right sorting and then removing adjacent duplicates in one pass would have same time complexity and would be easier than finding a TreeSet Python implementation.
– Ismael EL ATIFI
Jun 23 at 14:57
add a comment |
You can try making sets.
a, b, c = 1, 2, 3
if len(a,b,c) == 3:
# Do something
If your variables are kept as a list, it becomes even more simple:
a = [1,2,3,4,4]
if len(set(a)) == len(a):
# Do something
Here is the official documentation of python sets.
This works only for hashable objects such as integers, as given in the question. For non-hashable objects, see @chepner's more general solution.
This is definitely the way you should go for hashable objects, since it takes O(n) time for the number of objects n. The combinatorial method for non-hashable objects take O(n^2) time.
1
For objects that are not hashable but are comparable, an O(log(N)) solution is to use binary autobalanced trees. See this stackoverflow.com/questions/2710651/…. This is much more efficient than iterating on all pairs of objects which has O(N^2) time complexity.
– Ismael EL ATIFI
Jun 22 at 18:47
6
@IsmaelELATIFI: I think you dropped a factor of N. You can't even put all the values into a tree in O(log(N)) time. Anyway, sorting would be easier in Python, since Python doesn't come with a TreeSet.
– user2357112
Jun 23 at 0:15
1
@user2357112 Yes indeed I meant O(N*log(N)) :) And yes you are right sorting and then removing adjacent duplicates in one pass would have same time complexity and would be easier than finding a TreeSet Python implementation.
– Ismael EL ATIFI
Jun 23 at 14:57
add a comment |
You can try making sets.
a, b, c = 1, 2, 3
if len(a,b,c) == 3:
# Do something
If your variables are kept as a list, it becomes even more simple:
a = [1,2,3,4,4]
if len(set(a)) == len(a):
# Do something
Here is the official documentation of python sets.
This works only for hashable objects such as integers, as given in the question. For non-hashable objects, see @chepner's more general solution.
This is definitely the way you should go for hashable objects, since it takes O(n) time for the number of objects n. The combinatorial method for non-hashable objects take O(n^2) time.
You can try making sets.
a, b, c = 1, 2, 3
if len(a,b,c) == 3:
# Do something
If your variables are kept as a list, it becomes even more simple:
a = [1,2,3,4,4]
if len(set(a)) == len(a):
# Do something
Here is the official documentation of python sets.
This works only for hashable objects such as integers, as given in the question. For non-hashable objects, see @chepner's more general solution.
This is definitely the way you should go for hashable objects, since it takes O(n) time for the number of objects n. The combinatorial method for non-hashable objects take O(n^2) time.
edited Jun 24 at 4:18
answered Jun 22 at 14:59
TaegyungTaegyung
1,2758 silver badges23 bronze badges
1,2758 silver badges23 bronze badges
1
For objects that are not hashable but are comparable, an O(log(N)) solution is to use binary autobalanced trees. See this stackoverflow.com/questions/2710651/…. This is much more efficient than iterating on all pairs of objects which has O(N^2) time complexity.
– Ismael EL ATIFI
Jun 22 at 18:47
6
@IsmaelELATIFI: I think you dropped a factor of N. You can't even put all the values into a tree in O(log(N)) time. Anyway, sorting would be easier in Python, since Python doesn't come with a TreeSet.
– user2357112
Jun 23 at 0:15
1
@user2357112 Yes indeed I meant O(N*log(N)) :) And yes you are right sorting and then removing adjacent duplicates in one pass would have same time complexity and would be easier than finding a TreeSet Python implementation.
– Ismael EL ATIFI
Jun 23 at 14:57
add a comment |
1
For objects that are not hashable but are comparable, an O(log(N)) solution is to use binary autobalanced trees. See this stackoverflow.com/questions/2710651/…. This is much more efficient than iterating on all pairs of objects which has O(N^2) time complexity.
– Ismael EL ATIFI
Jun 22 at 18:47
6
@IsmaelELATIFI: I think you dropped a factor of N. You can't even put all the values into a tree in O(log(N)) time. Anyway, sorting would be easier in Python, since Python doesn't come with a TreeSet.
– user2357112
Jun 23 at 0:15
1
@user2357112 Yes indeed I meant O(N*log(N)) :) And yes you are right sorting and then removing adjacent duplicates in one pass would have same time complexity and would be easier than finding a TreeSet Python implementation.
– Ismael EL ATIFI
Jun 23 at 14:57
1
1
For objects that are not hashable but are comparable, an O(log(N)) solution is to use binary autobalanced trees. See this stackoverflow.com/questions/2710651/…. This is much more efficient than iterating on all pairs of objects which has O(N^2) time complexity.
– Ismael EL ATIFI
Jun 22 at 18:47
For objects that are not hashable but are comparable, an O(log(N)) solution is to use binary autobalanced trees. See this stackoverflow.com/questions/2710651/…. This is much more efficient than iterating on all pairs of objects which has O(N^2) time complexity.
– Ismael EL ATIFI
Jun 22 at 18:47
6
6
@IsmaelELATIFI: I think you dropped a factor of N. You can't even put all the values into a tree in O(log(N)) time. Anyway, sorting would be easier in Python, since Python doesn't come with a TreeSet.
– user2357112
Jun 23 at 0:15
@IsmaelELATIFI: I think you dropped a factor of N. You can't even put all the values into a tree in O(log(N)) time. Anyway, sorting would be easier in Python, since Python doesn't come with a TreeSet.
– user2357112
Jun 23 at 0:15
1
1
@user2357112 Yes indeed I meant O(N*log(N)) :) And yes you are right sorting and then removing adjacent duplicates in one pass would have same time complexity and would be easier than finding a TreeSet Python implementation.
– Ismael EL ATIFI
Jun 23 at 14:57
@user2357112 Yes indeed I meant O(N*log(N)) :) And yes you are right sorting and then removing adjacent duplicates in one pass would have same time complexity and would be easier than finding a TreeSet Python implementation.
– Ismael EL ATIFI
Jun 23 at 14:57
add a comment |
Assuming hashing is not an option, use itertools.combinations
and all
.
from itertools import combinations
if all(x != y for x, y in combinations([a,b,c], 2)):
# All values are unique
8
worth noting that this is quadratic in the number of elements, while the hashable solution (if available) is linear
– Ant
Jun 23 at 18:38
4
Downvote from me for an unnecessarily slow (quadratic as opposed to linear, as noted by @Ant) solution. It'll be fine for single-digit-sized collections, but not much more. The set-based solution is faster, shorter, and requires no libraries. On the upside, this works for unhashable items, and might consume less memory (depending on the implementation ofcombinations
).
– Aasmund Eldhuset
Jun 23 at 19:30
1
@AasmundEldhuset "It'll be fine for single-digit-sized collections, but not much more" is an exaggeration. If you're running it in a tight loop, that could be true, but if you're not it'll be fine for at least several hundred elements; my computer, at least, can run this on 1000 unique elements in about 50ms, more than fast enough if it isn't being executed repeatedly or in time-critical sections of code.
– insert_name_here
Jun 24 at 5:48
1
If you do have a large set of items AND the likeliness of doubles is high, you're better off not using all. Just have the for loop and break out as soon as you find a double.
– Falc
Jun 24 at 7:41
2
Doesn'tall
also short-circuit?
– Mees de Vries
Jun 24 at 9:18
|
show 1 more comment
Assuming hashing is not an option, use itertools.combinations
and all
.
from itertools import combinations
if all(x != y for x, y in combinations([a,b,c], 2)):
# All values are unique
8
worth noting that this is quadratic in the number of elements, while the hashable solution (if available) is linear
– Ant
Jun 23 at 18:38
4
Downvote from me for an unnecessarily slow (quadratic as opposed to linear, as noted by @Ant) solution. It'll be fine for single-digit-sized collections, but not much more. The set-based solution is faster, shorter, and requires no libraries. On the upside, this works for unhashable items, and might consume less memory (depending on the implementation ofcombinations
).
– Aasmund Eldhuset
Jun 23 at 19:30
1
@AasmundEldhuset "It'll be fine for single-digit-sized collections, but not much more" is an exaggeration. If you're running it in a tight loop, that could be true, but if you're not it'll be fine for at least several hundred elements; my computer, at least, can run this on 1000 unique elements in about 50ms, more than fast enough if it isn't being executed repeatedly or in time-critical sections of code.
– insert_name_here
Jun 24 at 5:48
1
If you do have a large set of items AND the likeliness of doubles is high, you're better off not using all. Just have the for loop and break out as soon as you find a double.
– Falc
Jun 24 at 7:41
2
Doesn'tall
also short-circuit?
– Mees de Vries
Jun 24 at 9:18
|
show 1 more comment
Assuming hashing is not an option, use itertools.combinations
and all
.
from itertools import combinations
if all(x != y for x, y in combinations([a,b,c], 2)):
# All values are unique
Assuming hashing is not an option, use itertools.combinations
and all
.
from itertools import combinations
if all(x != y for x, y in combinations([a,b,c], 2)):
# All values are unique
edited Jun 25 at 21:46
answered Jun 22 at 15:26
chepnerchepner
275k40 gold badges267 silver badges364 bronze badges
275k40 gold badges267 silver badges364 bronze badges
8
worth noting that this is quadratic in the number of elements, while the hashable solution (if available) is linear
– Ant
Jun 23 at 18:38
4
Downvote from me for an unnecessarily slow (quadratic as opposed to linear, as noted by @Ant) solution. It'll be fine for single-digit-sized collections, but not much more. The set-based solution is faster, shorter, and requires no libraries. On the upside, this works for unhashable items, and might consume less memory (depending on the implementation ofcombinations
).
– Aasmund Eldhuset
Jun 23 at 19:30
1
@AasmundEldhuset "It'll be fine for single-digit-sized collections, but not much more" is an exaggeration. If you're running it in a tight loop, that could be true, but if you're not it'll be fine for at least several hundred elements; my computer, at least, can run this on 1000 unique elements in about 50ms, more than fast enough if it isn't being executed repeatedly or in time-critical sections of code.
– insert_name_here
Jun 24 at 5:48
1
If you do have a large set of items AND the likeliness of doubles is high, you're better off not using all. Just have the for loop and break out as soon as you find a double.
– Falc
Jun 24 at 7:41
2
Doesn'tall
also short-circuit?
– Mees de Vries
Jun 24 at 9:18
|
show 1 more comment
8
worth noting that this is quadratic in the number of elements, while the hashable solution (if available) is linear
– Ant
Jun 23 at 18:38
4
Downvote from me for an unnecessarily slow (quadratic as opposed to linear, as noted by @Ant) solution. It'll be fine for single-digit-sized collections, but not much more. The set-based solution is faster, shorter, and requires no libraries. On the upside, this works for unhashable items, and might consume less memory (depending on the implementation ofcombinations
).
– Aasmund Eldhuset
Jun 23 at 19:30
1
@AasmundEldhuset "It'll be fine for single-digit-sized collections, but not much more" is an exaggeration. If you're running it in a tight loop, that could be true, but if you're not it'll be fine for at least several hundred elements; my computer, at least, can run this on 1000 unique elements in about 50ms, more than fast enough if it isn't being executed repeatedly or in time-critical sections of code.
– insert_name_here
Jun 24 at 5:48
1
If you do have a large set of items AND the likeliness of doubles is high, you're better off not using all. Just have the for loop and break out as soon as you find a double.
– Falc
Jun 24 at 7:41
2
Doesn'tall
also short-circuit?
– Mees de Vries
Jun 24 at 9:18
8
8
worth noting that this is quadratic in the number of elements, while the hashable solution (if available) is linear
– Ant
Jun 23 at 18:38
worth noting that this is quadratic in the number of elements, while the hashable solution (if available) is linear
– Ant
Jun 23 at 18:38
4
4
Downvote from me for an unnecessarily slow (quadratic as opposed to linear, as noted by @Ant) solution. It'll be fine for single-digit-sized collections, but not much more. The set-based solution is faster, shorter, and requires no libraries. On the upside, this works for unhashable items, and might consume less memory (depending on the implementation of
combinations
).– Aasmund Eldhuset
Jun 23 at 19:30
Downvote from me for an unnecessarily slow (quadratic as opposed to linear, as noted by @Ant) solution. It'll be fine for single-digit-sized collections, but not much more. The set-based solution is faster, shorter, and requires no libraries. On the upside, this works for unhashable items, and might consume less memory (depending on the implementation of
combinations
).– Aasmund Eldhuset
Jun 23 at 19:30
1
1
@AasmundEldhuset "It'll be fine for single-digit-sized collections, but not much more" is an exaggeration. If you're running it in a tight loop, that could be true, but if you're not it'll be fine for at least several hundred elements; my computer, at least, can run this on 1000 unique elements in about 50ms, more than fast enough if it isn't being executed repeatedly or in time-critical sections of code.
– insert_name_here
Jun 24 at 5:48
@AasmundEldhuset "It'll be fine for single-digit-sized collections, but not much more" is an exaggeration. If you're running it in a tight loop, that could be true, but if you're not it'll be fine for at least several hundred elements; my computer, at least, can run this on 1000 unique elements in about 50ms, more than fast enough if it isn't being executed repeatedly or in time-critical sections of code.
– insert_name_here
Jun 24 at 5:48
1
1
If you do have a large set of items AND the likeliness of doubles is high, you're better off not using all. Just have the for loop and break out as soon as you find a double.
– Falc
Jun 24 at 7:41
If you do have a large set of items AND the likeliness of doubles is high, you're better off not using all. Just have the for loop and break out as soon as you find a double.
– Falc
Jun 24 at 7:41
2
2
Doesn't
all
also short-circuit?– Mees de Vries
Jun 24 at 9:18
Doesn't
all
also short-circuit?– Mees de Vries
Jun 24 at 9:18
|
show 1 more comment
It depends a bit on the kind of values that you have.
If they are well-behaved and hashable then you can (as others already pointed out) simply use a set
to find out how many unique values you have and if that doesn't equal the number of total values you have at least two values that are equal.
def all_distinct(*values):
return len(set(values)) == len(values)
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
Hashable values and lazy
In case you really have lots of values and want to abort as soon as one match is found you could also lazily create the set. It's more complicated and probably slower if all values are distinct but it provides short-circuiting in case a duplicate is found:
def all_distinct(*values):
seen = set()
seen_add = seen.add
last_count = 0
for item in values:
seen_add(item)
new_count = len(seen)
if new_count == last_count:
return False
last_count = new_count
return True
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
However if the values are not hashable this will not work because set
requires hashable values.
Unhashable values
In case you don't have hashable values you could use a plain list to store the already processed values and just check if each new item is already in the list:
def all_distinct(*values):
seen = []
for item in values:
if item in seen:
return False
seen.append(item)
return True
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
all_distinct([1, 2], [2, 3], [3, 4]) # True
all_distinct([1, 2], [2, 3], [1, 2]) # False
This will be slower because checking if a value is in a list requires to compare it to each item in the list.
A (3rd-party) library solution
In case you don't mind an additional dependency you could also use one of my libraries (available on PyPi and conda-forge) for this task iteration_utilities.all_distinct
. This function can handle both hashable and unhashable values (and a mix of these):
from iteration_utilities import all_distinct
all_distinct([1, 2, 3]) # True
all_distinct([1, 2, 2]) # False
all_distinct([[1, 2], [2, 3], [3, 4]]) # True
all_distinct([[1, 2], [2, 3], [1, 2]]) # False
General comments
Note that all of the above mentioned approaches rely on the fact that equality means "not not-equal" which is the case for (almost) all built-in types but doesn't necessarily be the case!
However I want to point out chepners answers which doesn't require hashability of the values and doesn't rely on "equality means not not-equal" by explicitly checking for !=
. It's also short-circuiting so it behaves like your original and
approach.
Performance
To get a rough idea about the performance I'm using another of my libraries (simple_benchmark
)
I used distinct hashable inputs (left) and unhashable inputs (right). For hashable inputs the set-approaches performed best, while for unhashable inputs the list-approaches performed better. The combinations
-based approach seemed slowest in both cases:
I also tested the performance in case there are duplicates, for convenience I regarded the case when the first two elements were equal (otherwise the setup was identical to the previous case):
from iteration_utilities import all_distinct
from itertools import combinations
from simple_benchmark import BenchmarkBuilder
# First benchmark
b1 = BenchmarkBuilder()
@b1.add_function()
def all_distinct_set(values):
return len(set(values)) == len(values)
@b1.add_function()
def all_distinct_set_sc(values):
seen = set()
seen_add = seen.add
last_count = 0
for item in values:
seen_add(item)
new_count = len(seen)
if new_count == last_count:
return False
last_count = new_count
return True
@b1.add_function()
def all_distinct_list(values):
seen = []
for item in values:
if item in seen:
return False
seen.append(item)
return True
b1.add_function(alias='all_distinct_iu')(all_distinct)
@b1.add_function()
def all_distinct_combinations(values):
return all(x != y for x, y in combinations(values, 2))
@b1.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, range(size)
r1 = b1.run()
r1.plot()
# Second benchmark
b2 = BenchmarkBuilder()
b2.add_function(alias='all_distinct_iu')(all_distinct)
b2.add_functions([all_distinct_combinations, all_distinct_list])
@b2.add_arguments('number of unhashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [[i] for i in range(size)]
r2 = b2.run()
r2.plot()
# Third benchmark
b3 = BenchmarkBuilder()
b3.add_function(alias='all_distinct_iu')(all_distinct)
b3.add_functions([all_distinct_set, all_distinct_set_sc, all_distinct_combinations, all_distinct_list])
@b3.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [0, *range(size)]
r3 = b3.run()
r3.plot()
# Fourth benchmark
b4 = BenchmarkBuilder()
b4.add_function(alias='all_distinct_iu')(all_distinct)
b4.add_functions([all_distinct_combinations, all_distinct_list])
@b4.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [[0], *[[i] for i in range(size)]]
r4 = b4.run()
r4.plot()
Is comparing the length of the set to check whether theadd
ed element was new really the best (fastest?) way? It seems pretty cumbersome.
– Bergi
Jun 24 at 10:52
1
@Bergi Yes t's cumbersome. But checking forin
and doing theadd
invokes the hashing and (internal) find-slot-for-hash operation twice. Depending on the value to add and the state of the set it can be significantly faster to justadd
it and check the length. Maybe I'm missing something obvious but at least in my benchmarks theset.add
+len(set)
approach was faster thanin set
andset.add
.
– MSeifert
Jun 24 at 10:55
add a comment |
It depends a bit on the kind of values that you have.
If they are well-behaved and hashable then you can (as others already pointed out) simply use a set
to find out how many unique values you have and if that doesn't equal the number of total values you have at least two values that are equal.
def all_distinct(*values):
return len(set(values)) == len(values)
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
Hashable values and lazy
In case you really have lots of values and want to abort as soon as one match is found you could also lazily create the set. It's more complicated and probably slower if all values are distinct but it provides short-circuiting in case a duplicate is found:
def all_distinct(*values):
seen = set()
seen_add = seen.add
last_count = 0
for item in values:
seen_add(item)
new_count = len(seen)
if new_count == last_count:
return False
last_count = new_count
return True
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
However if the values are not hashable this will not work because set
requires hashable values.
Unhashable values
In case you don't have hashable values you could use a plain list to store the already processed values and just check if each new item is already in the list:
def all_distinct(*values):
seen = []
for item in values:
if item in seen:
return False
seen.append(item)
return True
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
all_distinct([1, 2], [2, 3], [3, 4]) # True
all_distinct([1, 2], [2, 3], [1, 2]) # False
This will be slower because checking if a value is in a list requires to compare it to each item in the list.
A (3rd-party) library solution
In case you don't mind an additional dependency you could also use one of my libraries (available on PyPi and conda-forge) for this task iteration_utilities.all_distinct
. This function can handle both hashable and unhashable values (and a mix of these):
from iteration_utilities import all_distinct
all_distinct([1, 2, 3]) # True
all_distinct([1, 2, 2]) # False
all_distinct([[1, 2], [2, 3], [3, 4]]) # True
all_distinct([[1, 2], [2, 3], [1, 2]]) # False
General comments
Note that all of the above mentioned approaches rely on the fact that equality means "not not-equal" which is the case for (almost) all built-in types but doesn't necessarily be the case!
However I want to point out chepners answers which doesn't require hashability of the values and doesn't rely on "equality means not not-equal" by explicitly checking for !=
. It's also short-circuiting so it behaves like your original and
approach.
Performance
To get a rough idea about the performance I'm using another of my libraries (simple_benchmark
)
I used distinct hashable inputs (left) and unhashable inputs (right). For hashable inputs the set-approaches performed best, while for unhashable inputs the list-approaches performed better. The combinations
-based approach seemed slowest in both cases:
I also tested the performance in case there are duplicates, for convenience I regarded the case when the first two elements were equal (otherwise the setup was identical to the previous case):
from iteration_utilities import all_distinct
from itertools import combinations
from simple_benchmark import BenchmarkBuilder
# First benchmark
b1 = BenchmarkBuilder()
@b1.add_function()
def all_distinct_set(values):
return len(set(values)) == len(values)
@b1.add_function()
def all_distinct_set_sc(values):
seen = set()
seen_add = seen.add
last_count = 0
for item in values:
seen_add(item)
new_count = len(seen)
if new_count == last_count:
return False
last_count = new_count
return True
@b1.add_function()
def all_distinct_list(values):
seen = []
for item in values:
if item in seen:
return False
seen.append(item)
return True
b1.add_function(alias='all_distinct_iu')(all_distinct)
@b1.add_function()
def all_distinct_combinations(values):
return all(x != y for x, y in combinations(values, 2))
@b1.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, range(size)
r1 = b1.run()
r1.plot()
# Second benchmark
b2 = BenchmarkBuilder()
b2.add_function(alias='all_distinct_iu')(all_distinct)
b2.add_functions([all_distinct_combinations, all_distinct_list])
@b2.add_arguments('number of unhashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [[i] for i in range(size)]
r2 = b2.run()
r2.plot()
# Third benchmark
b3 = BenchmarkBuilder()
b3.add_function(alias='all_distinct_iu')(all_distinct)
b3.add_functions([all_distinct_set, all_distinct_set_sc, all_distinct_combinations, all_distinct_list])
@b3.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [0, *range(size)]
r3 = b3.run()
r3.plot()
# Fourth benchmark
b4 = BenchmarkBuilder()
b4.add_function(alias='all_distinct_iu')(all_distinct)
b4.add_functions([all_distinct_combinations, all_distinct_list])
@b4.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [[0], *[[i] for i in range(size)]]
r4 = b4.run()
r4.plot()
Is comparing the length of the set to check whether theadd
ed element was new really the best (fastest?) way? It seems pretty cumbersome.
– Bergi
Jun 24 at 10:52
1
@Bergi Yes t's cumbersome. But checking forin
and doing theadd
invokes the hashing and (internal) find-slot-for-hash operation twice. Depending on the value to add and the state of the set it can be significantly faster to justadd
it and check the length. Maybe I'm missing something obvious but at least in my benchmarks theset.add
+len(set)
approach was faster thanin set
andset.add
.
– MSeifert
Jun 24 at 10:55
add a comment |
It depends a bit on the kind of values that you have.
If they are well-behaved and hashable then you can (as others already pointed out) simply use a set
to find out how many unique values you have and if that doesn't equal the number of total values you have at least two values that are equal.
def all_distinct(*values):
return len(set(values)) == len(values)
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
Hashable values and lazy
In case you really have lots of values and want to abort as soon as one match is found you could also lazily create the set. It's more complicated and probably slower if all values are distinct but it provides short-circuiting in case a duplicate is found:
def all_distinct(*values):
seen = set()
seen_add = seen.add
last_count = 0
for item in values:
seen_add(item)
new_count = len(seen)
if new_count == last_count:
return False
last_count = new_count
return True
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
However if the values are not hashable this will not work because set
requires hashable values.
Unhashable values
In case you don't have hashable values you could use a plain list to store the already processed values and just check if each new item is already in the list:
def all_distinct(*values):
seen = []
for item in values:
if item in seen:
return False
seen.append(item)
return True
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
all_distinct([1, 2], [2, 3], [3, 4]) # True
all_distinct([1, 2], [2, 3], [1, 2]) # False
This will be slower because checking if a value is in a list requires to compare it to each item in the list.
A (3rd-party) library solution
In case you don't mind an additional dependency you could also use one of my libraries (available on PyPi and conda-forge) for this task iteration_utilities.all_distinct
. This function can handle both hashable and unhashable values (and a mix of these):
from iteration_utilities import all_distinct
all_distinct([1, 2, 3]) # True
all_distinct([1, 2, 2]) # False
all_distinct([[1, 2], [2, 3], [3, 4]]) # True
all_distinct([[1, 2], [2, 3], [1, 2]]) # False
General comments
Note that all of the above mentioned approaches rely on the fact that equality means "not not-equal" which is the case for (almost) all built-in types but doesn't necessarily be the case!
However I want to point out chepners answers which doesn't require hashability of the values and doesn't rely on "equality means not not-equal" by explicitly checking for !=
. It's also short-circuiting so it behaves like your original and
approach.
Performance
To get a rough idea about the performance I'm using another of my libraries (simple_benchmark
)
I used distinct hashable inputs (left) and unhashable inputs (right). For hashable inputs the set-approaches performed best, while for unhashable inputs the list-approaches performed better. The combinations
-based approach seemed slowest in both cases:
I also tested the performance in case there are duplicates, for convenience I regarded the case when the first two elements were equal (otherwise the setup was identical to the previous case):
from iteration_utilities import all_distinct
from itertools import combinations
from simple_benchmark import BenchmarkBuilder
# First benchmark
b1 = BenchmarkBuilder()
@b1.add_function()
def all_distinct_set(values):
return len(set(values)) == len(values)
@b1.add_function()
def all_distinct_set_sc(values):
seen = set()
seen_add = seen.add
last_count = 0
for item in values:
seen_add(item)
new_count = len(seen)
if new_count == last_count:
return False
last_count = new_count
return True
@b1.add_function()
def all_distinct_list(values):
seen = []
for item in values:
if item in seen:
return False
seen.append(item)
return True
b1.add_function(alias='all_distinct_iu')(all_distinct)
@b1.add_function()
def all_distinct_combinations(values):
return all(x != y for x, y in combinations(values, 2))
@b1.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, range(size)
r1 = b1.run()
r1.plot()
# Second benchmark
b2 = BenchmarkBuilder()
b2.add_function(alias='all_distinct_iu')(all_distinct)
b2.add_functions([all_distinct_combinations, all_distinct_list])
@b2.add_arguments('number of unhashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [[i] for i in range(size)]
r2 = b2.run()
r2.plot()
# Third benchmark
b3 = BenchmarkBuilder()
b3.add_function(alias='all_distinct_iu')(all_distinct)
b3.add_functions([all_distinct_set, all_distinct_set_sc, all_distinct_combinations, all_distinct_list])
@b3.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [0, *range(size)]
r3 = b3.run()
r3.plot()
# Fourth benchmark
b4 = BenchmarkBuilder()
b4.add_function(alias='all_distinct_iu')(all_distinct)
b4.add_functions([all_distinct_combinations, all_distinct_list])
@b4.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [[0], *[[i] for i in range(size)]]
r4 = b4.run()
r4.plot()
It depends a bit on the kind of values that you have.
If they are well-behaved and hashable then you can (as others already pointed out) simply use a set
to find out how many unique values you have and if that doesn't equal the number of total values you have at least two values that are equal.
def all_distinct(*values):
return len(set(values)) == len(values)
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
Hashable values and lazy
In case you really have lots of values and want to abort as soon as one match is found you could also lazily create the set. It's more complicated and probably slower if all values are distinct but it provides short-circuiting in case a duplicate is found:
def all_distinct(*values):
seen = set()
seen_add = seen.add
last_count = 0
for item in values:
seen_add(item)
new_count = len(seen)
if new_count == last_count:
return False
last_count = new_count
return True
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
However if the values are not hashable this will not work because set
requires hashable values.
Unhashable values
In case you don't have hashable values you could use a plain list to store the already processed values and just check if each new item is already in the list:
def all_distinct(*values):
seen = []
for item in values:
if item in seen:
return False
seen.append(item)
return True
all_distinct(1, 2, 3) # True
all_distinct(1, 2, 2) # False
all_distinct([1, 2], [2, 3], [3, 4]) # True
all_distinct([1, 2], [2, 3], [1, 2]) # False
This will be slower because checking if a value is in a list requires to compare it to each item in the list.
A (3rd-party) library solution
In case you don't mind an additional dependency you could also use one of my libraries (available on PyPi and conda-forge) for this task iteration_utilities.all_distinct
. This function can handle both hashable and unhashable values (and a mix of these):
from iteration_utilities import all_distinct
all_distinct([1, 2, 3]) # True
all_distinct([1, 2, 2]) # False
all_distinct([[1, 2], [2, 3], [3, 4]]) # True
all_distinct([[1, 2], [2, 3], [1, 2]]) # False
General comments
Note that all of the above mentioned approaches rely on the fact that equality means "not not-equal" which is the case for (almost) all built-in types but doesn't necessarily be the case!
However I want to point out chepners answers which doesn't require hashability of the values and doesn't rely on "equality means not not-equal" by explicitly checking for !=
. It's also short-circuiting so it behaves like your original and
approach.
Performance
To get a rough idea about the performance I'm using another of my libraries (simple_benchmark
)
I used distinct hashable inputs (left) and unhashable inputs (right). For hashable inputs the set-approaches performed best, while for unhashable inputs the list-approaches performed better. The combinations
-based approach seemed slowest in both cases:
I also tested the performance in case there are duplicates, for convenience I regarded the case when the first two elements were equal (otherwise the setup was identical to the previous case):
from iteration_utilities import all_distinct
from itertools import combinations
from simple_benchmark import BenchmarkBuilder
# First benchmark
b1 = BenchmarkBuilder()
@b1.add_function()
def all_distinct_set(values):
return len(set(values)) == len(values)
@b1.add_function()
def all_distinct_set_sc(values):
seen = set()
seen_add = seen.add
last_count = 0
for item in values:
seen_add(item)
new_count = len(seen)
if new_count == last_count:
return False
last_count = new_count
return True
@b1.add_function()
def all_distinct_list(values):
seen = []
for item in values:
if item in seen:
return False
seen.append(item)
return True
b1.add_function(alias='all_distinct_iu')(all_distinct)
@b1.add_function()
def all_distinct_combinations(values):
return all(x != y for x, y in combinations(values, 2))
@b1.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, range(size)
r1 = b1.run()
r1.plot()
# Second benchmark
b2 = BenchmarkBuilder()
b2.add_function(alias='all_distinct_iu')(all_distinct)
b2.add_functions([all_distinct_combinations, all_distinct_list])
@b2.add_arguments('number of unhashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [[i] for i in range(size)]
r2 = b2.run()
r2.plot()
# Third benchmark
b3 = BenchmarkBuilder()
b3.add_function(alias='all_distinct_iu')(all_distinct)
b3.add_functions([all_distinct_set, all_distinct_set_sc, all_distinct_combinations, all_distinct_list])
@b3.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [0, *range(size)]
r3 = b3.run()
r3.plot()
# Fourth benchmark
b4 = BenchmarkBuilder()
b4.add_function(alias='all_distinct_iu')(all_distinct)
b4.add_functions([all_distinct_combinations, all_distinct_list])
@b4.add_arguments('number of hashable inputs')
def argument_provider():
for exp in range(1, 12):
size = 2**exp
yield size, [[0], *[[i] for i in range(size)]]
r4 = b4.run()
r4.plot()
edited Jun 23 at 13:24
answered Jun 23 at 10:53
MSeifertMSeifert
83.7k20 gold badges174 silver badges202 bronze badges
83.7k20 gold badges174 silver badges202 bronze badges
Is comparing the length of the set to check whether theadd
ed element was new really the best (fastest?) way? It seems pretty cumbersome.
– Bergi
Jun 24 at 10:52
1
@Bergi Yes t's cumbersome. But checking forin
and doing theadd
invokes the hashing and (internal) find-slot-for-hash operation twice. Depending on the value to add and the state of the set it can be significantly faster to justadd
it and check the length. Maybe I'm missing something obvious but at least in my benchmarks theset.add
+len(set)
approach was faster thanin set
andset.add
.
– MSeifert
Jun 24 at 10:55
add a comment |
Is comparing the length of the set to check whether theadd
ed element was new really the best (fastest?) way? It seems pretty cumbersome.
– Bergi
Jun 24 at 10:52
1
@Bergi Yes t's cumbersome. But checking forin
and doing theadd
invokes the hashing and (internal) find-slot-for-hash operation twice. Depending on the value to add and the state of the set it can be significantly faster to justadd
it and check the length. Maybe I'm missing something obvious but at least in my benchmarks theset.add
+len(set)
approach was faster thanin set
andset.add
.
– MSeifert
Jun 24 at 10:55
Is comparing the length of the set to check whether the
add
ed element was new really the best (fastest?) way? It seems pretty cumbersome.– Bergi
Jun 24 at 10:52
Is comparing the length of the set to check whether the
add
ed element was new really the best (fastest?) way? It seems pretty cumbersome.– Bergi
Jun 24 at 10:52
1
1
@Bergi Yes t's cumbersome. But checking for
in
and doing the add
invokes the hashing and (internal) find-slot-for-hash operation twice. Depending on the value to add and the state of the set it can be significantly faster to just add
it and check the length. Maybe I'm missing something obvious but at least in my benchmarks the set.add
+ len(set)
approach was faster than in set
and set.add
.– MSeifert
Jun 24 at 10:55
@Bergi Yes t's cumbersome. But checking for
in
and doing the add
invokes the hashing and (internal) find-slot-for-hash operation twice. Depending on the value to add and the state of the set it can be significantly faster to just add
it and check the length. Maybe I'm missing something obvious but at least in my benchmarks the set.add
+ len(set)
approach was faster than in set
and set.add
.– MSeifert
Jun 24 at 10:55
add a comment |
The slightly neater way is to stick all the variables in a list, then create a new set from the list. If the list and the set aren't the same length, some of the variables were equal, since sets can't contain duplicates:
vars = [a, b, c]
no_dupes = set(vars)
if len(vars) != len(no_dupes):
# Some of them had the same value
This assumes the values are hashable; which they are in your example.
2
This assumes the values are hashable.
– chepner
Jun 22 at 15:23
@chepner Yep. They're using numbers in their question. Added a note at the bottom though.
– Carcigenicate
Jun 22 at 15:25
add a comment |
The slightly neater way is to stick all the variables in a list, then create a new set from the list. If the list and the set aren't the same length, some of the variables were equal, since sets can't contain duplicates:
vars = [a, b, c]
no_dupes = set(vars)
if len(vars) != len(no_dupes):
# Some of them had the same value
This assumes the values are hashable; which they are in your example.
2
This assumes the values are hashable.
– chepner
Jun 22 at 15:23
@chepner Yep. They're using numbers in their question. Added a note at the bottom though.
– Carcigenicate
Jun 22 at 15:25
add a comment |
The slightly neater way is to stick all the variables in a list, then create a new set from the list. If the list and the set aren't the same length, some of the variables were equal, since sets can't contain duplicates:
vars = [a, b, c]
no_dupes = set(vars)
if len(vars) != len(no_dupes):
# Some of them had the same value
This assumes the values are hashable; which they are in your example.
The slightly neater way is to stick all the variables in a list, then create a new set from the list. If the list and the set aren't the same length, some of the variables were equal, since sets can't contain duplicates:
vars = [a, b, c]
no_dupes = set(vars)
if len(vars) != len(no_dupes):
# Some of them had the same value
This assumes the values are hashable; which they are in your example.
edited Jun 22 at 15:24
answered Jun 22 at 15:01
CarcigenicateCarcigenicate
20k5 gold badges32 silver badges62 bronze badges
20k5 gold badges32 silver badges62 bronze badges
2
This assumes the values are hashable.
– chepner
Jun 22 at 15:23
@chepner Yep. They're using numbers in their question. Added a note at the bottom though.
– Carcigenicate
Jun 22 at 15:25
add a comment |
2
This assumes the values are hashable.
– chepner
Jun 22 at 15:23
@chepner Yep. They're using numbers in their question. Added a note at the bottom though.
– Carcigenicate
Jun 22 at 15:25
2
2
This assumes the values are hashable.
– chepner
Jun 22 at 15:23
This assumes the values are hashable.
– chepner
Jun 22 at 15:23
@chepner Yep. They're using numbers in their question. Added a note at the bottom though.
– Carcigenicate
Jun 22 at 15:25
@chepner Yep. They're using numbers in their question. Added a note at the bottom though.
– Carcigenicate
Jun 22 at 15:25
add a comment |
You can use all
with list.count
as well, it is reasonable, may not be the best, but worth to answer:
>>> a, b, c = 1, 2, 3
>>> l = [a, b, c]
>>> all(l.count(i) < 2 for i in l)
True
>>> a, b, c = 1, 2, 1
>>> l = [a, b, c]
>>> all(l.count(i) < 2 for i in l)
False
>>>
Also this solution works with unhashable objects in the list.
A way that only works with hashable objects in the list:
>>> a, b, c = 1, 2, 3
>>> l = [a, b, c]
>>> len(*l) == len(l)
True
>>>
Actually:
>>> from timeit import timeit
>>> timeit(lambda: *l, number=1000000)
0.5163292075532642
>>> timeit(lambda: set(l), number=1000000)
0.7005311807841572
>>>
*l
is faster than set(l)
, more info here.
add a comment |
You can use all
with list.count
as well, it is reasonable, may not be the best, but worth to answer:
>>> a, b, c = 1, 2, 3
>>> l = [a, b, c]
>>> all(l.count(i) < 2 for i in l)
True
>>> a, b, c = 1, 2, 1
>>> l = [a, b, c]
>>> all(l.count(i) < 2 for i in l)
False
>>>
Also this solution works with unhashable objects in the list.
A way that only works with hashable objects in the list:
>>> a, b, c = 1, 2, 3
>>> l = [a, b, c]
>>> len(*l) == len(l)
True
>>>
Actually:
>>> from timeit import timeit
>>> timeit(lambda: *l, number=1000000)
0.5163292075532642
>>> timeit(lambda: set(l), number=1000000)
0.7005311807841572
>>>
*l
is faster than set(l)
, more info here.
add a comment |
You can use all
with list.count
as well, it is reasonable, may not be the best, but worth to answer:
>>> a, b, c = 1, 2, 3
>>> l = [a, b, c]
>>> all(l.count(i) < 2 for i in l)
True
>>> a, b, c = 1, 2, 1
>>> l = [a, b, c]
>>> all(l.count(i) < 2 for i in l)
False
>>>
Also this solution works with unhashable objects in the list.
A way that only works with hashable objects in the list:
>>> a, b, c = 1, 2, 3
>>> l = [a, b, c]
>>> len(*l) == len(l)
True
>>>
Actually:
>>> from timeit import timeit
>>> timeit(lambda: *l, number=1000000)
0.5163292075532642
>>> timeit(lambda: set(l), number=1000000)
0.7005311807841572
>>>
*l
is faster than set(l)
, more info here.
You can use all
with list.count
as well, it is reasonable, may not be the best, but worth to answer:
>>> a, b, c = 1, 2, 3
>>> l = [a, b, c]
>>> all(l.count(i) < 2 for i in l)
True
>>> a, b, c = 1, 2, 1
>>> l = [a, b, c]
>>> all(l.count(i) < 2 for i in l)
False
>>>
Also this solution works with unhashable objects in the list.
A way that only works with hashable objects in the list:
>>> a, b, c = 1, 2, 3
>>> l = [a, b, c]
>>> len(*l) == len(l)
True
>>>
Actually:
>>> from timeit import timeit
>>> timeit(lambda: *l, number=1000000)
0.5163292075532642
>>> timeit(lambda: set(l), number=1000000)
0.7005311807841572
>>>
*l
is faster than set(l)
, more info here.
edited Jun 24 at 6:07
answered Jun 24 at 6:01
U9-ForwardU9-Forward
23.1k5 gold badges19 silver badges47 bronze badges
23.1k5 gold badges19 silver badges47 bronze badges
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f56716492%2fis-there-a-short-way-to-check-uniqueness-of-values-without-using-if-and-multip%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
I can't think of any way to do this easily in a 1-liner. But if this is something I needed to do often, I'd write a short function called something like
all_different
which takes an arbitrary number of arguments and tells you if they're all different or not. Then you can just doif all_different(a, b, c):
– Robin Zigmond
Jun 22 at 14:59
6
Try using length of set. That would make it fairly easy.
– susenj
Jun 22 at 15:00
3
If you make a list
[a, b, c]
, then this is a duplicate of How do I check if there are duplicates in a flat list?– wjandrea
Jun 23 at 1:31
1
What else do you know about the values? I don't believe it's even mathematically possible if equality testing is the only operation allowed.
– Mehrdad
Jun 23 at 3:21
Beware trading understandable for clever — creating a set just to throw it away for the purpose of (IMO) abusing its uniqueness properties means when you go look at it in a few months you’ll wonder “why on earth did I make this set?”
– thehole
Jun 25 at 16:45