Finding dictionary keys whose values are duplicates
I currently have a dictionary (Duplicate_combos) whose keys are unique identifying numbers and whose values are two-element lists: a company code and either 'Yes' or 'No' (both currently stored as strings). I am essentially trying to find every pair of keys where the company codes are equal and the second element is 'No' for both.
So if this was my dictionary:
{1234: ['123', 'No'], 1235: ['123', 'No'], 1236: ['123', 'Yes'], 1237: ['124', 'No']}
I would only want to return 1234 and 1235. The code below is what I currently have, and I really need to optimize it: while it works on a small data set, I will need to use it on a much larger one (43,000 lines), and in early testing it has taken 45+ minutes with no sign of finishing.
def open_file():
    in_file = open("./Data.csv", "r")
    blank = in_file.readline()
    titles = in_file.readline()
    titles = titles.strip()
    titles = titles.split(',')
    cost_center = []        # 0
    cost_center_name = []   # 1
    management_site = []    # 15
    sub_function = []       # 19
    LER = []                # 41
    Company_name = []       # 3
    Business_group = []     # 7
    Value_center = []       # 9
    Performance_center = [] # 10
    Profit_center = []      # 11
    total_lines = {}
    for line in in_file:
        line = line.strip()
        line = line.split(',')
        cost_center.append(line[0])
        cost_center_name.append(line[1])
        management_site.append(line[15])
        sub_function.append(line[19])
        LER.append(line[41])
        Company_name.append(line[3])
        Business_group.append(line[7])
        Value_center.append(line[9])
        Performance_center.append(line[10])
        Profit_center.append(line[11])
        # create a dictionary of all the lines, keyed by the unique cost center number
        total_lines[line[0]] = line[1:]
    return (cost_center, cost_center_name, management_site, sub_function, LER,
            Company_name, Business_group, total_lines, titles, Value_center,
            Performance_center, Profit_center)
def find_duplicates(Duplicate_combos):
    Real_duplicates = []
    archive_duplicates = []
    # loop through the dictionary of duplicate combos by the keys
    for key in Duplicate_combos:
        code = Duplicate_combos[key][0]
        for key2 in Duplicate_combos:
            # if the two keys are equal, we are comparing the key to itself,
            # which we don't want to do, so we continue
            if key == key2:
                continue
            # if the company codes are the same and BOTH are not going to be
            # consolidated, we have found a real duplicate
            elif Duplicate_combos[key2][0] == code and Duplicate_combos[key2][1] == 'No' and Duplicate_combos[key][1] == 'No':
                # make sure that we haven't already dealt with this key before
                if key not in archive_duplicates:
                    Real_duplicates.append(key)
                    archive_duplicates.append(key)
                if key2 not in archive_duplicates:
                    Real_duplicates.append(key2)
                    archive_duplicates.append(key2)
                continue
    return Real_duplicates
python time-limit-exceeded dictionary
Comments:

Where does the data for Duplicate_combos come from? The right performance fix would likely involve putting that data into a more appropriate data structure for this task. – 200_success, Jun 3 at 19:57

The data comes from a CSV file that I read in as part of earlier functions. From watching it run, this function seems to be the one that takes significantly longer. – Ben Naylor, Jun 3 at 20:00

In that case, I recommend including the CSV-reading code, as well as an excerpt from the CSV file, so that we can give you proper advice. Also, please fix your indentation. One easy way to post code is to paste it into the question editor, highlight it, and press Ctrl-K to mark it as a code block. – 200_success, Jun 3 at 20:09

I added the open_file function; a lot of what it returns is used elsewhere, so I don't know if it helps. I can't share the data, but from my testing I know everything was being read in correctly. At this point the code works, just really not optimally, and since I haven't had much experience with optimization I was hoping for ideas on how to do that. – Ben Naylor, Jun 3 at 20:17

Interesting! That is a very unconventional way to read a CSV, and now I'm intrigued as to how you make use of those weird lists. You could probably benefit a lot from putting your entire program up for review. – 200_success, Jun 3 at 20:20
asked Jun 3 at 19:52 by Ben Naylor (new contributor); edited Jun 3 at 20:14
3 Answers
It's easier to read code that tuple-unpacks the values in the for loop from dict.items():

for key1, (code1, option1) in Duplicate_combos.items():

archive_duplicates is a duplicate of Real_duplicates; there's no need for it.

It doesn't seem like the output needs to be ordered, so you can just make Real_duplicates a set. This means it won't contain duplicates, and you don't have to loop through it twice each time you want to add a value. This alone speeds up your program from $O(n^3)$ to $O(n^2)$.

Your variable names are quite poor and don't adhere to PEP 8. I have changed them to somewhat generic names, but it'd be better to replace, say, items with what it actually is.
def find_duplicates(items):
    duplicates = set()
    for key1, (code1, option1) in items.items():
        for key2, (code2, option2) in items.items():
            if key1 == key2:
                continue
            elif code1 == code2 and option1 == option2 == 'No':
                duplicates.add(key1)
                duplicates.add(key2)
    return list(duplicates)
You don't need to loop over Duplicate_combos twice. Instead, build a new dictionary grouping the keys by company code, adding a key only if its option is 'No'. After building the new dictionary, iterate over its values and return the keys from every group of two or more.
def find_duplicates(items):
    by_code = {}
    for key, (code, option) in items.items():
        if option == 'No':
            by_code.setdefault(code, []).append(key)
    return [
        key
        for keys in by_code.values()
        if len(keys) >= 2
        for key in keys
    ]
This now runs in $O(n)$ time rather than $O(n^3)$ time.
>>> find_duplicates({
...     101: ['1', 'No'], 102: ['1', 'No'],
...     103: ['1', 'Yes'], 104: ['1', 'No'],
...     201: ['2', 'No'], 202: ['2', 'No'],
...     301: ['3', 'No'], 401: ['4', 'No'],
... })
[101, 102, 104, 201, 202]
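For reference, the grouping step can also be written with collections.defaultdict, which replaces the repeated setdefault call; this is a sketch of the same function, not a further speedup:

```python
from collections import defaultdict

def find_duplicates(items):
    # Group keys by company code, keeping only entries marked 'No'.
    by_code = defaultdict(list)
    for key, (code, option) in items.items():
        if option == 'No':
            by_code[code].append(key)
    # Return the keys from every code shared by two or more of them.
    return [key for keys in by_code.values() if len(keys) >= 2 for key in keys]
```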
Comments:

So this would output all of the keys that have the duplicates, not just one? I was iterating twice in order to compare each element to all the others, so I would get all of the keys that share the duplicate values. – Ben Naylor, Jun 3 at 20:34

@BenNaylor Yes, this would do that. Please see the update with the example showing this. – Peilonrayz, Jun 3 at 20:38

Thank you so much, this really helps! – Ben Naylor, Jun 4 at 12:20
When reading your data, you open a file but never .close() it. Get into the habit of using the with keyword to avoid this issue.

You would also benefit from the csv module to read this file, as it removes boilerplate and handles special cases for you:
import csv

def open_file(filename='./Data.csv'):
    cost_center = []        # 0
    cost_center_name = []   # 1
    management_site = []    # 15
    sub_function = []       # 19
    LER = []                # 41
    Company_name = []       # 3
    Business_group = []     # 7
    Value_center = []       # 9
    Performance_center = [] # 10
    Profit_center = []      # 11
    total_lines = {}
    with open(filename) as in_file:
        next(in_file)  # skip blank line
        reader = csv.reader(in_file, delimiter=',')
        titles = next(reader)  # header row
        for line in reader:
            cost_center.append(line[0])
            cost_center_name.append(line[1])
            management_site.append(line[15])
            sub_function.append(line[19])
            LER.append(line[41])
            Company_name.append(line[3])
            Business_group.append(line[7])
            Value_center.append(line[9])
            Performance_center.append(line[10])
            Profit_center.append(line[11])
            # create a dictionary of all the lines, keyed by the unique cost center number
            total_lines[line[0]] = line[1:]
    return (cost_center, cost_center_name, management_site, sub_function, LER,
            Company_name, Business_group, total_lines, titles, Value_center,
            Performance_center, Profit_center)
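Going a step further, csv.DictReader would let you refer to columns by header name instead of magic indices. This is only a sketch: it assumes the second line of the file holds usable column names, and the column name used in the usage comment is hypothetical.

```python
import csv

def read_rows(filename='./Data.csv'):
    """Yield each data row as a dict keyed by the header names."""
    with open(filename, newline='') as in_file:
        next(in_file)  # skip the leading blank line
        for row in csv.DictReader(in_file):
            yield row

# Hypothetical usage, assuming a column literally named 'Cost Center':
# cost_centers = [row['Cost Center'] for row in read_rows()]
```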
$endgroup$
Comments:

I'd personally use something like columns = zip(*reader) and then define each value once, e.g. cost_center = columns[0]. This would make total_lines a bit more finicky, though. – Peilonrayz, Jun 4 at 10:39
@Peilonrayz When I read LER.append(line[41]) and there are only 10 columns of interest, I'm not sure this is really worth it. – Mathias Ettinger, Jun 4 at 12:46
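For what it's worth, the columns = zip(*reader) idea from the comments above might look like the sketch below; note that in Python 3 zip returns a lazy iterator, so it has to be materialised before indexing (and it assumes every row has the same length):

```python
import csv

def read_columns(filename='./Data.csv'):
    """Return the CSV data transposed into a list of columns."""
    with open(filename, newline='') as in_file:
        next(in_file)  # skip the leading blank line
        next(in_file)  # skip the header line
        reader = csv.reader(in_file)
        # Transpose rows into columns; list() is needed because
        # zip yields tuples lazily in Python 3.
        return [list(col) for col in zip(*reader)]

# Hypothetical usage:
# columns = read_columns()
# cost_center = columns[0]
```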
Doing

def get_dupes(df):
    if sum(df.loc[1] == 'No') < 2:
        return None
    else:
        return list(df.loc[:, df.loc[1] == 'No'].columns)

df.groupby(axis=1, by=df.loc[0]).apply(get_dupes)
got me

0
124            None
123    [1234, 1235]
dtype: object
Your question wasn't quite clear on what you want the output to be if multiple company values have duplicates (e.g. if the input is {1234: ['123', 'No'], 1235: ['123', 'No'], 1236: ['123', 'Yes'], 1237: ['124', 'No'], 1238: ['124', 'No']}, do you want [1234, 1235, 1237, 1238] or [[1234, 1235], [1237, 1238]]?), so you can modify this code accordingly.
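In plain Python, the two output shapes differ only by a final flattening step; a sketch, using hypothetical helper names:

```python
def duplicate_groups(items):
    """Return duplicate keys grouped per company code, e.g. [[1234, 1235], [1237, 1238]]."""
    by_code = {}
    for key, (code, option) in items.items():
        if option == 'No':
            by_code.setdefault(code, []).append(key)
    return [keys for keys in by_code.values() if len(keys) >= 2]

def duplicate_keys(items):
    """Return the same keys as one flat list, e.g. [1234, 1235, 1237, 1238]."""
    return [key for group in duplicate_groups(items) for key in group]
```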
Comments:

You could just take a look at how the current code behaves to understand what output is expected... – Vogel612♦, Jun 4 at 10:05

You have presented an alternative solution, but haven't reviewed the code. Please edit to show what aspects of the question code prompted you to write this version, and in what ways it's an improvement over the original. It may be worth (re-)reading How to Answer. – Toby Speight, Jun 4 at 10:07
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Ben Naylor is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f221609%2ffinding-dictionary-keys-whose-values-are-duplicates%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
It's easier to read code that tuple unpacks the values in the
for
fromdict.items()
.for key1, (code1, option1) in Duplicate_combos.items():
archive_duplicates
is a duplicate ofReal_duplicates
. There's no need for it.It doesn't seem like the output needs to be ordered, and so you can just make
Real_duplicates
a set. This means it won't have duplicates, and you don't have to loop through it twice each time you want to add a value.This alone speeds up your program from $O(n^3)$ to $O(n^2)$.
Your variable names are quite poor, and don't adhere to PEP8. I have changed them to somewhat generic names, but it'd be better if you replace, say,
items
with what it actually is.
def find_duplicates(items):
duplicates = set()
for key1, (code1, option1) in items.items():
for key2, (code2, option2) in items.items():
if key1 == key2:
continue
elif code1 == code2 and option1 == option2 == 'No':
duplicates.add(key1)
duplicates.add(key2)
return list(duplicates)
You don't need to loop over
Duplicate_combos
twice.To do this you need to make a new dictionary grouping by the code. And only adding to it if the option is
'No'
.After building the new dictionary you can iterate over it's values and return ones where the length of values is greater or equal to two.
def find_duplicates(items):
by_code =
for key, (code, option) in items.items():
if option == 'No':
by_code.setdefault(code, []).append(key)
return [
key
for keys in by_code.values()
if len(keys) >= 2
for key in keys
]
This now runs in $O(n)$ time rather than $O(n^3)$ time.
>>> find_duplicates(
101: ['1', 'No'], 102: ['1', 'No'],
103: ['1','Yes'], 104: ['1', 'No'],
201: ['2', 'No'], 202: ['2', 'No'],
301: ['3', 'No'], 401: ['4', 'No'],
)
[101, 102, 104, 201, 202]
$endgroup$
$begingroup$
It's easier to read code that tuple unpacks the values in the for loop from dict.items():

for key1, (code1, option1) in Duplicate_combos.items():

archive_duplicates is a duplicate of Real_duplicates; there's no need for it.

It doesn't seem like the output needs to be ordered, so you can just make Real_duplicates a set. This means it won't contain duplicates, and you don't have to loop through it twice each time you want to add a value. This alone speeds up your program from $O(n^3)$ to $O(n^2)$.

Your variable names are quite poor and don't adhere to PEP 8. I have changed them to somewhat generic names, but it'd be better if you replaced, say, items with what the data actually is.

def find_duplicates(items):
    duplicates = set()
    for key1, (code1, option1) in items.items():
        for key2, (code2, option2) in items.items():
            if key1 == key2:
                continue
            elif code1 == code2 and option1 == option2 == 'No':
                duplicates.add(key1)
                duplicates.add(key2)
    return list(duplicates)

You don't need to loop over Duplicate_combos twice. Instead, build a new dictionary that groups the keys by company code, only adding a key if its option is 'No'. After building that dictionary you can iterate over its values and return the keys from every group whose length is two or more.

def find_duplicates(items):
    by_code = {}
    for key, (code, option) in items.items():
        if option == 'No':
            by_code.setdefault(code, []).append(key)
    return [
        key
        for keys in by_code.values()
        if len(keys) >= 2
        for key in keys
    ]

This now runs in $O(n)$ time rather than $O(n^3)$ time.

>>> find_duplicates({
...     101: ['1', 'No'],  102: ['1', 'No'],
...     103: ['1', 'Yes'], 104: ['1', 'No'],
...     201: ['2', 'No'],  202: ['2', 'No'],
...     301: ['3', 'No'],  401: ['4', 'No'],
... })
[101, 102, 104, 201, 202]
$endgroup$
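As a variant of the grouping approach in the answer above, collections.defaultdict can replace the setdefault call. This is a minimal sketch, not part of the original answer:

```python
from collections import defaultdict

def find_duplicates(items):
    # Group keys by company code, keeping only the entries marked 'No'.
    by_code = defaultdict(list)
    for key, (code, option) in items.items():
        if option == 'No':
            by_code[code].append(key)
    # Flatten every group that holds two or more keys.
    return [key for keys in by_code.values() if len(keys) >= 2 for key in keys]

print(find_duplicates({
    101: ['1', 'No'],  102: ['1', 'No'],
    103: ['1', 'Yes'], 104: ['1', 'No'],
    201: ['2', 'No'],  202: ['2', 'No'],
    301: ['3', 'No'],  401: ['4', 'No'],
}))  # [101, 102, 104, 201, 202]
```

defaultdict avoids creating a fresh empty list on every call the way setdefault's second argument does, but the behaviour is the same; either spelling keeps the whole pass $O(n)$.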
edited Jun 4 at 10:07
answered Jun 3 at 20:24
Peilonrayz
$begingroup$
So this would output all of the keys that have duplicates, not just one? I was iterating twice in order to compare each element to all the others, so I would get all of the keys that share the duplicate values.
$endgroup$
– Ben Naylor
Jun 3 at 20:34
$begingroup$
@BenNaylor Yes, this would do that. Please see the update with the example showing this.
$endgroup$
– Peilonrayz
Jun 3 at 20:38
$begingroup$
Thank you so much, this really really helps!
$endgroup$
– Ben Naylor
Jun 4 at 12:20
$begingroup$
When reading your data, you open a file but never .close() it. You should get into the habit of using the with keyword to avoid this issue.
You would also benefit from the csv module when reading this file, as it removes boilerplate and handles special cases for you:

import csv

def open_file(filename='./Data.csv'):
    cost_center = []        # 0
    cost_center_name = []   # 1
    management_site = []    # 15
    sub_function = []       # 19
    LER = []                # 41
    Company_name = []       # 3
    Business_group = []     # 7
    Value_center = []       # 9
    Performance_center = [] # 10
    Profit_center = []      # 11
    total_lines = {}
    with open(filename) as in_file:
        next(in_file)  # skip blank line
        reader = csv.reader(in_file, delimiter=',')
        for line in reader:
            cost_center.append(line[0])
            cost_center_name.append(line[1])
            management_site.append(line[15])
            sub_function.append(line[19])
            LER.append(line[41])
            Company_name.append(line[3])
            Business_group.append(line[7])
            Value_center.append(line[9])
            Performance_center.append(line[10])
            Profit_center.append(line[11])
            # create a dictionary of all the lines, keyed by the unique cost center number
            total_lines[line[0]] = line[1:]
    return cost_center, cost_center_name, management_site, sub_function, LER, Company_name, Business_group, total_lines, titles, Value_center, Performance_center, Profit_center
$endgroup$
answered Jun 4 at 8:15
Mathias Ettinger
$begingroup$
I'd personally use something like columns = zip(*reader) and then define each value once: cost_center = columns[0]. This would make total_lines a bit more finicky though.
$endgroup$
– Peilonrayz
Jun 4 at 10:39
$begingroup$
@Peilonrayz When I read LER.append(line[41]) and there are only 10 columns of interest, I'm not sure this is really worth it.
$endgroup$
– Mathias Ettinger
Jun 4 at 12:46
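The columns = zip(*reader) idea from the comment above can be sketched as follows; the two-row CSV and the column names here are illustrative stand-ins, not the real data file:

```python
import csv
import io

# A stand-in for the real file: two rows, four columns.
data = "A1,Alpha,X,No\nA2,Beta,Y,No\n"

reader = csv.reader(io.StringIO(data))
# zip(*reader) transposes rows into columns, so each entry
# of `columns` is one whole column of the file.
columns = list(zip(*reader))
cost_center = list(columns[0])
cost_center_name = list(columns[1])
print(cost_center)       # ['A1', 'A2']
print(cost_center_name)  # ['Alpha', 'Beta']
```

Note that this reads the whole file into memory to transpose it, and each column comes back as a tuple of strings; as the reply points out, with only a handful of columns of interest the per-column append loop may be just as clear.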
$begingroup$
Doing

def get_dupes(df):
    if sum(df.loc[1] == 'No') < 2:
        return None
    else:
        return list(df.loc[:, df.loc[1] == 'No'].columns)

df.groupby(axis=1, by=df.loc[0]).apply(get_dupes)

Got me

0
124            None
123    [1234, 1235]
dtype: object

Your question wasn't quite clear on what you want the output to be if there are multiple company values with duplicate values (e.g. if the input is {1234: ['123', 'No'], 1235: ['123', 'No'], 1236: ['123', 'Yes'], 1237: ['124', 'No'], 1238: ['124', 'No']}, do you want [1234, 1235, 1237, 1238] or [[1234, 1235], [1237, 1238]]?), so you can modify this code accordingly.
$endgroup$
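For context, the df this answer assumes can be built straight from the question's dictionary. A sketch, with hypothetical data chosen to match the output shown, applying get_dupes to each company-code group of columns explicitly:

```python
import pandas as pd

# Hypothetical data shaped like the question's Duplicate_combos:
# each key becomes a column, row 0 holds the company code, row 1 the Yes/No flag.
df = pd.DataFrame({
    1234: ['123', 'No'], 1235: ['123', 'No'], 1236: ['123', 'Yes'],
    1237: ['124', 'No'],
})

def get_dupes(group):
    # Keep the keys (column labels) whose flag is 'No',
    # but only when the group has at least two of them.
    if (group.loc[1] == 'No').sum() < 2:
        return None
    return list(group.loc[:, group.loc[1] == 'No'].columns)

# Select the columns belonging to each company code and collect the duplicates.
dupes = {code: get_dupes(df.loc[:, df.loc[0] == code])
         for code in df.loc[0].unique()}
print(dupes)  # {'123': [1234, 1235], '124': None}
```

Selecting column groups with a boolean mask sidesteps groupby(axis=1), which newer pandas releases deprecate, while producing the same per-group results as the answer's one-liner.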
answered Jun 3 at 23:16
Acccumulation
$begingroup$
You could just take a look at how the current code behaves to understand what output is expected...
$endgroup$
– Vogel612♦
Jun 4 at 10:05
$begingroup$
You have presented an alternative solution, but haven't reviewed the code. Please edit to show what aspects of the question code prompted you to write this version, and in what ways it's an improvement over the original. It may be worth (re-)reading How to Answer.
$endgroup$
– Toby Speight
Jun 4 at 10:07
Ben Naylor is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Code Review Stack Exchange!
$begingroup$
Where does the data for Duplicate_combos come from? The right performance fix would likely involve putting that data into a more appropriate data structure for this task.
$endgroup$
– 200_success
Jun 3 at 19:57
$begingroup$
The data comes from a csv file that I read in as part of earlier functions. Based on when I have been running it, this function seems to be the one that is taking significantly longer to run
$endgroup$
– Ben Naylor
Jun 3 at 20:00
$begingroup$
In that case, I recommend including the CSV-reading code, as well as an excerpt from the CSV file, so that we can give you the proper advice. Also, please fix your indentation. One easy way to post code is to paste it into the question editor, highlight it, and press Ctrl-K to mark it as a code block.
$endgroup$
– 200_success
Jun 3 at 20:09
$begingroup$
I added the open file function, a lot of the stuff that is returned is used elsewhere so idk if it helps at all. As for the data, I can't share that but from the testing that I did, I know that everything was being read in correctly and all that. At this point, the code that I have works, just REALLY NOT optimally so that's the main thing that I was looking for. I haven't had too much experience with optimization so I was hoping to get some ideas on how exactly to do that
$endgroup$
– Ben Naylor
Jun 3 at 20:17
$begingroup$
Interesting! That is a very unconventional way to read a CSV, and now I'm intrigued as to how you make use of those weird lists. You could probably benefit a lot from putting your entire program up for review.
$endgroup$
– 200_success
Jun 3 at 20:20