Markov-chain sentence generator in Python
I wrote a Markov-chain-based sentence generator as my first non-trivial Python program. I mainly used C before, so I have probably ignored a lot of Python conventions and features; any advice would be appreciated.
text-gen.py
import sys
import random

class MarkovChain:
    # Class constant that serves as an initial state for the Markov chain
    START = ""

    # The Markov chain is modelled as a directed graph,
    # with the START state acting as the only source,
    # and the transition probabilities as the graph weights.
    #
    # The graph is implemented using an adjacency list,
    # which in turn is implemented as a dictionary of dictionaries.
    #
    # self.adjList is a dictionary keyed by all words of the text
    # or START (the states). For each key/state, it contains
    # another dictionary indexed by the words of the text
    # that succeed the key in the text (the next states in the chain),
    # and for each of those words/next states the dictionary contains
    # the transition probability from the present state to them.
    #
    # This implementation was chosen because of it being easy to code,
    # and offering an easy way to iterate on both the probabilities and
    # the words/next states of each dictionary using items().
    def __init__(self, file):
        self.adjList = {}
        # The totals dictionary is used in calculating the probabilities:
        # for every word in the text/chain state it contains the total
        # number of transitions from it to another state.
        totals = {}
        # Start by inserting the initial state into the structures
        self.adjList[MarkovChain.START] = {}
        totals[MarkovChain.START] = 0
        # prev: Contains the previously encountered word or the START state,
        # initialized to the START state.
        prev = MarkovChain.START
        for line in file:
            for word in line.split():
                # If the word ends with a terminating punctuation mark,
                # ignore the mark, and treat the word as a terminating state,
                # as it does not precede another word in the current sentence.
                # So prev is set to START, in order for the text model
                # to account for the fact that some words start sentences
                # more frequently than others (not all words are next states of START).
                endsTerm = word[-1] == "." or word[-1] == "?" or word[-1] == "!"
                if (endsTerm):
                    word = word[0:-1]
                # If this is the first time the word is encountered,
                # add it to the adjacency list, and initialize its dictionary
                # and transition total.
                if (word not in self.adjList):
                    self.adjList[word] = {}
                    totals[word] = 0
                # If this is the first time the prev->word transition
                # was detected, initialize the prev->word transition frequency to 1,
                # else increment it.
                if (word in self.adjList[prev]):
                    self.adjList[prev][word] += 1
                else:
                    self.adjList[prev][word] = 1
                # There is a prev->word state transition, so increment
                # the total transition number of the prev state.
                totals[prev] += 1
                if (endsTerm):
                    prev = MarkovChain.START
                else:
                    prev = word
        # Using totals, convert the transition frequencies
        # to transition probabilities.
        for word, neighbors in self.adjList.items():
            for name in neighbors:
                neighbors[name] /= totals[word]

    # chooseNextWord: Chooses the next state/word,
    # by sampling the non-uniform transition probability distribution
    # of the current word/state.
    def chooseNextWord(self, curWord):
        # Convert the dict_keys object to a list
        # to use indexing
        nextWords = list(self.adjList[curWord].keys())
        # Sampling is done through linear search.
        for word in nextWords[0:-1]:
            prob = self.adjList[curWord][word]
            roll = random.random()
            if (roll <= prob):
                return word
        # If none of the first N-1 words were chosen,
        # only the last one was left.
        return nextWords[-1]

    # genSentence: Generates a sentence. If a positive
    # limit is not provided by the caller, the sentence grows to
    # an arbitrary number of words, until a terminal state
    # (the last word of a sentence) is reached.
    def genSentence(self, limit = 0):
        sentence = ""
        curWord = self.chooseNextWord(MarkovChain.START)
        sentence += curWord + " "
        if (limit > 0):
            wordsUsed = 1
            while (wordsUsed < limit and self.adjList[curWord]):
                curWord = self.chooseNextWord(curWord)
                sentence += curWord + " "
                wordsUsed += 1
        else:
            while (self.adjList[curWord]):
                curWord = self.chooseNextWord(curWord)
                sentence += curWord + " "
        return sentence

if (__name__ == "__main__"):
    if (len(sys.argv) < 3):
        print("Not enough arguments, run with: python3 text-gen.py <input-filename> <sentence-num>")
        sys.exit(1)
    try:
        with open(sys.argv[1], "r") as f:
            markov = MarkovChain(f)
    except OSError as error:
        print(error.strerror)
        sys.exit(1)
    # Generate and print as many sentences as asked.
    for k in range(0, int(sys.argv[2])):
        print(markov.genSentence(20) + "\n")
Tags: python, python-3.x, statistics, markov-chain
What's the purpose of these chains? Will another program use the result of this? If so, how does it expect the chains to be delivered (formatted)?
– Mast, Aug 14 at 6:57
I'm probably a bit strict when it comes to this, but when I see that 63 lines of code require 53 lines of comments, I think either there is something wrong with the code or some of the comments should be removed.
– ChatterOne, Aug 14 at 13:21
edited Aug 14 at 6:58 by Mast
asked Aug 12 at 18:06 by Hashew
5 Answers
chooseNextWord distorts the probabilities. For example, consider a list of 3 words with the inherent probabilities $\frac{1}{3}$, $\frac{1}{3}$, $\frac{1}{3}$. The first word is selected with probability $\frac{1}{3}$. The second, however, is selected with probability $\frac{2}{9}$ ($\frac{2}{3}$ that the first word was not selected, times $\frac{1}{3}$ that it is selected in the second round). The third one has a $1 - \frac{1}{3} - \frac{2}{9} = \frac{4}{9}$ chance.

A standard approach is to compute the cumulative sums of the probabilities (in the constructor), then roll once and search for the first cumulative value above the roll.

The code may benefit from using defaultdict rather than plain dictionaries. Fewer ifs is better.

Nitpicking: you may want to account for possible typos, such as a space between a word and its terminating punctuation.
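The cumulative-sum approach described above can be sketched as follows; the transition table here is invented for illustration, and `choose_next_word` is a hypothetical replacement for the question's method:

```python
import bisect
import random

# Hypothetical transition probabilities for one state.
transitions = {"the": {"cat": 0.5, "dog": 0.3, "end": 0.2}}

# Precompute the cumulative sums once (this belongs in the constructor).
cumulative = {}
for state, probs in transitions.items():
    words, weights, running = [], [], 0.0
    for w, p in probs.items():
        running += p
        words.append(w)
        weights.append(running)
    cumulative[state] = (words, weights)

def choose_next_word(cur_word):
    """One roll, then a binary search for the first cumulative value above it."""
    words, weights = cumulative[cur_word]
    roll = random.random() * weights[-1]
    return words[bisect.bisect_right(weights, roll)]
```

Unlike the linear scan in chooseNextWord, each word is now drawn with exactly its stored probability, and the search is logarithmic in the number of successors.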
I completely forgot about conditional probability; I will implement sampling using the CDF then. I will look into the rest, thank you.
– Hashew, Aug 12 at 19:00

@Hashew I rolled back your last edit. It is against the CR charter to edit the code after a review was posted, because it invalidates the review. You are welcome to post a separate follow-up question.
– vnp, Aug 12 at 19:26

I think this approach based on conditional probability is correct: the probability that a word is chosen on the condition that all of the previous ones weren't is P = (probability that the word was chosen and the previous weren't) / (probability that the previous ones weren't chosen), which is equal to (probability that the word was chosen) / (probability that the previous weren't). So if I initialize a variable (say notPrevProb) to 1 and subtract the probability of each word once I am done with it, I should be able to produce the correct probabilities. What do you think?
– Hashew, Aug 12 at 19:33

@Hashew I rolled your edit of my answer back. Sorry, but you got the math completely wrong. If you don't trust me, run an experiment.
– vnp, Aug 13 at 3:03

Why are you multiplying the probabilities? The problem calls for conditional probability, while you compute the intersection (leaving aside the fact that you compute the intersection by multiplying when the events are not independent). Using the probabilities I specified (1/3, 1/2, 1) to choose between one of three items, by sequentially testing for roll < prob and repeating the process 100,000 times (for the law of large numbers to take effect), each item is chosen 1/3 of the time, as expected.
– Hashew, Aug 13 at 9:23
endTerms

When a word ends with an endTerm, I think you need to include a START or END symbol in adjList. Most words can appear anywhere in a sentence, so it is unlikely that you can end a sentence only when a word has no follow-on words. Include the START/END symbol in adjList and the Markov process can also end a sentence.

The standard library

collections.defaultdict provides a dictionary that, when you attempt to access a new key, automatically initializes it to a default value. collections.Counter provides a dictionary that counts things. random.choices selects items from a population according to specified weights.

import collections
import random

class MarkovChain:
    START = ""

    def __init__(self, file):
        adjList = collections.defaultdict(collections.Counter)
        # this inserts START into the defaultdict
        adjList[MarkovChain.START]
        prev = MarkovChain.START
        for line in file:
            for word in line.split():
                endsTerm = word[-1] in ('.', '?', '!')
                if endsTerm:
                    word = word[:-1]
                adjList[prev].update([word])
                if endsTerm:
                    # mark the end of a sentence
                    adjList[word].update([MarkovChain.START])
                    prev = MarkovChain.START
                else:
                    prev = word
        # convert defaultdict to a regular dict
        # the values are a tuple: ([follow words], [counts])
        # for use in random.choices() in chooseNextWord()
        self.adjList = {k: (list(v.keys()), list(v.values()))
                        for k, v in adjList.items()}
        #print(self.adjList)

    def chooseNextWord(self, word):
        # random.choices returns a list, hence the [0]
        return random.choices(*self.adjList[word])[0]

    def genSentence(self, limit=0):
        sentence = []
        curWord = MarkovChain.START
        while True:
            curWord = self.chooseNextWord(curWord)
            sentence.append(curWord)
            if 0 < limit < len(sentence) or curWord == MarkovChain.START:
                break
        return ' '.join(sentence).strip()
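The defaultdict(Counter) idea can be seen in isolation on a two-line toy corpus; the corpus text is invented for illustration:

```python
import collections

corpus = ["the cat sat.", "the dog sat."]

adj = collections.defaultdict(collections.Counter)
prev = ""  # the START state
for line in corpus:
    for word in line.split():
        ends_term = word[-1] in ".?!"
        if ends_term:
            word = word[:-1]
        adj[prev][word] += 1          # Counter handles unseen words for us
        prev = "" if ends_term else word

# adj[""] now counts the sentence starters: Counter({'the': 2})
```

No "is this key new?" checks are needed: both the outer defaultdict and the inner Counters initialize missing keys on first access.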
Style

- The long comment at the beginning of the class is good class documentation and should be made into a docstring. The same applies to the functions.
- Avoid variables in all caps, like START, unless they are constants.
- Functions and variables should generally use snake_case if you want to follow Python conventions, as in PEP 8.
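Applied to the question's class, the first two bullets might look like this; only the header is shown, and the docstring wording is a sketch:

```python
class MarkovChain:
    """Markov-chain text model.

    The chain is modelled as a directed graph whose edge weights are
    transition probabilities; START is the only source state.
    """

    START = ""  # kept in caps: it is a class constant

    def gen_sentence(self, limit=0):
        """Generate one sentence, stopping after `limit` words if limit > 0."""
```

The payoff is that help(MarkovChain) and MarkovChain.__doc__ now expose the documentation at runtime, which plain # comments never do.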
Let's look at this one line, which shows quite a few areas where your code could be improved:

if (word in self.adjList[prev]):

- As noted elsewhere, Python has an official style guide, PEP 8, which recommends using lower_case for variables and functions.
- Basically all Python keywords (at least if, elif, else, for, while, in, with, except, return, yield, yield from, and print in Python 2) take an expression as an argument. Expressions do not need to be surrounded by parentheses.
- Whenever you write if X in Y, you should know the time complexity of that statement. For tuple, str and list it is just a linear scan, so it is linear. For set and dict it is (amortized) constant due to the use of hashes.
- adjList is not actually a list, but a dict! This is why Hungarian notation (putting the type in the name) is discouraged and duck typing is encouraged instead. Here you could just call it adjacencies.
- Take an afternoon and work through the Python standard library. I would recommend at least collections (where you can find defaultdict and Counter, as recommended in other answers), itertools (and more_itertools for bonus points, although it is not in the standard library), str, functools, math, random and pathlib as a start.
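The complexity difference in the third bullet is easy to observe directly; the sizes and repeat counts below are arbitrary:

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)

# Searching for an element that is absent forces a full scan of the list,
# while the set answers from a single hash lookup.
needle = -1
list_time = timeit.timeit(lambda: needle in as_list, number=100)
set_time = timeit.timeit(lambda: needle in as_set, number=100)
```

In this script the membership tests are on dicts, so the lookups are already constant time; the point is to know which case you are in when you write `in`.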
For long blocks of informational text at the beginning of a class or method definition, you should use docstrings instead of comments, as per PEP 8. This makes your descriptions automatically available from the help() function.

Example:

def foo(bar):
    """Foo does something.

    This description can span multiple lines.
    """
    return bar
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f225992%2fmarkov-chain-sentence-generator-in-python%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
chooseNextWord distorts the probabilities. For example, consider a list of 3 words with the inherent probabilities $\frac13$, $\frac13$, $\frac13$. The first word is selected with probability $\frac13$. The second, however, is selected with probability $\frac29$ ($\frac23$ that the first word was not selected, times $\frac13$ that it is selected in the second round). The third one has a $1 - \frac13 - \frac29 = \frac49$ chance.
A standard approach is to compute the accumulated sums of the probabilities (in the constructor), then roll once, and search for the first accumulated value above the rolled one.
The code may benefit from using defaultdict rather than plain dictionaries. Fewer ifs is better.
Nitpicking: you may want to account for possible typos, such as a space between a word and the terminating punctuation.
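The accumulated-sums approach can be sketched like this (a minimal standalone helper, not the OP's code; the names and sample data are made up):

```python
import bisect
import itertools
import random

def choose_weighted(words, weights):
    """Roll once, then find the first cumulative sum above the roll."""
    cumulative = list(itertools.accumulate(weights))   # e.g. [1, 2, 3]
    roll = random.random() * cumulative[-1]            # scale roll to the total weight
    return words[bisect.bisect_right(cumulative, roll)]

words = ["alpha", "beta", "gamma"]
picks = [choose_weighted(words, [1, 1, 1]) for _ in range(30_000)]
# with equal weights each word should land near 1/3 (not 1/3, 2/9, 4/9)
print({w: round(picks.count(w) / len(picks), 3) for w in words})
```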
1
I completely forgot about conditional probability, I will implement sampling using the CDF then. I will look into the rest, thank you.
– Hashew
Aug 12 at 19:00
5
@Hashew I rolled back your last edit. It is against the CR charter to edit the code after a review was posted, because it invalidates the review. You are welcome to post a separate follow-up question.
– vnp
Aug 12 at 19:26
1
I think this approach based on conditional probability is correct: the probability that a word is chosen on the condition that all of the previous ones weren't is P = (probability that the word was chosen and the previous ones weren't) / (probability that the previous ones weren't chosen), which is equal to (probability that the word was chosen) / (probability that the previous ones weren't). So if I initialize a variable (say notPrevProb) to 1 and subtract the probability of each word once I am done with it, I should be able to produce the correct probabilities. What do you think?
– Hashew
Aug 12 at 19:33
2
@Hashew I rolled your edit of my answer back. Sorry, but you got the math completely wrong. If you don't trust me, run an experiment.
– vnp
Aug 13 at 3:03
1
Why are you multiplying the probabilities? The problem calls for conditional probability, while you compute the intersection (leaving aside the fact that you compute the intersection by multiplying when the events are not independent). Using the probabilities I specified (1/3, 1/2, 1) to choose one of three items with probability 1/3 each, by sequentially testing for roll < prob and repeating the process 100,000 times (for the law of large numbers to take effect), each item is chosen 1/3 of the time, as expected.
– Hashew
Aug 13 at 9:23
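vnp's suggestion to run an experiment is easy to follow; this throwaway script (not part of the reviewed code) reproduces the distortion described in the answer, where sequentially testing flat 1/3 probabilities with a fresh roll each time yields shares of 1/3, 2/9 and 4/9:

```python
import random

def sequential_pick(probs):
    """Test each item in turn with a fresh roll; fall through to the last one."""
    for i, p in enumerate(probs):
        if random.random() < p:
            return i
    return len(probs) - 1

random.seed(0)
n = 90_000
hits = [0, 0, 0]
for _ in range(n):
    hits[sequential_pick([1/3, 1/3, 1/3])] += 1
print([round(h / n, 3) for h in hits])  # roughly [0.333, 0.222, 0.444]
```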
answered Aug 12 at 18:39, edited Aug 13 at 3:02 – vnp
endTerms
When a word ends with an endTerm, I think you need to include a START or END symbol in adjList. Most words can appear anywhere in a sentence, so it is unlikely that you can end a sentence only when a word has no follow-on words. Include the START/END symbol in adjList and the Markov process can also end a sentence.
the standard library
collections.defaultdict provides a dictionary that automatically initializes a new key to a default value when you attempt to access it.
collections.Counter provides a dictionary that counts things.
random.choices selects items from a population according to specified weights.
import collections
import random

class MarkovChain:
    START = ""

    def __init__(self, file):
        adjList = collections.defaultdict(collections.Counter)
        # this inserts START into the defaultdict
        adjList[MarkovChain.START]
        prev = MarkovChain.START
        for line in file:
            for word in line.split():
                endsTerm = word[-1] in ('.', '?', '!')
                if endsTerm:
                    word = word[:-1]
                adjList[prev].update([word])
                if endsTerm:
                    # mark the end of a sentence
                    adjList[word].update([MarkovChain.START])
                    prev = MarkovChain.START
                else:
                    prev = word
        # convert the defaultdict to a regular dict
        # the values are a tuple: ([follow words], [counts])
        # for use in random.choices() in chooseNextWord()
        self.adjList = {k: (list(v.keys()), list(v.values()))
                        for k, v in adjList.items()}

    def chooseNextWord(self, word):
        # random.choices returns a list, hence the [0]
        return random.choices(*self.adjList[word])[0]

    def genSentence(self, limit=0):
        sentence = []
        curWord = MarkovChain.START
        while True:
            curWord = self.chooseNextWord(curWord)
            sentence.append(curWord)
            if 0 < limit < len(sentence) or curWord == MarkovChain.START:
                break
        return ' '.join(sentence).strip()
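random.choices accepts the ([follow words], [counts]) pair directly as its population and weights arguments; a quick standalone check of the weighted draw (the words and counts here are made up):

```python
import random

random.seed(0)  # reproducible demo
words, counts = ["the", "a", "an"], [5, 3, 1]
picks = random.choices(words, counts, k=9_000)
# "the" carries 5 of the 9 total weight, so it should win roughly 5/9 of draws
print(round(picks.count("the") / len(picks), 2))
```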
answered Aug 13 at 1:37 – RootTwo
Style
- The long comment at the beginning of the class is good class documentation and should be made into a docstring. The same applies to functions.
- Avoid variables in all caps, like START (unless they are constants).
- Functions and variables should generally use snake_case, if you want to follow Python conventions as in PEP 8.
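Applied to the reviewed class, those conventions look roughly like this (an illustrative excerpt, not the full implementation):

```python
class MarkovChain:
    """Generate random sentences from word-adjacency statistics."""  # docstring, not a comment

    START = ""  # all-caps is appropriate here: a class-level constant

    def choose_next_word(self, word):  # snake_case method name per PEP 8
        raise NotImplementedError  # body elided in this sketch

print(MarkovChain.__doc__)
```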
answered Aug 13 at 6:29 – qwr
Let's look at this one line, which shows quite a few areas where your code could be improved:
if (word in self.adjList[prev]):
- As noted elsewhere, Python has an official style guide, PEP 8, which recommends using lower_case for variables and functions.
- Basically all Python keywords (at least if, elif, else, for, while, in, with, except, return, yield, yield from, and print in Python 2) take an expression as an argument. Expressions do not need to be surrounded by parentheses.
- Whenever you do if X in Y you should know the time complexity of that statement. For tuple, str and list it is just a linear scan, so it is linear. For set and dict it is (amortized) constant due to the use of hashes.
- adjList is not actually a list, but a dict! This is why Hungarian notation (putting the type in the name) is discouraged and duck typing is encouraged instead. Here you could just call it adjacencies.
- Take an afternoon and work through the Python standard library. I would recommend at least collections (where you can find defaultdict and Counter, as recommended in other answers), itertools (and more_itertools for bonus points, although it is not in the standard library), str, functools, math, random and pathlib as a start.
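The complexity difference is easy to observe; this throwaway benchmark (the sizes and repeat counts are arbitrary) looks up a worst-case element in both container types:

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)

# membership in a list scans every element; a set hashes straight to it
t_list = timeit.timeit(lambda: (n - 1) in as_list, number=100)
t_set = timeit.timeit(lambda: (n - 1) in as_set, number=100)
print(f"list: {t_list:.4f}s  set: {t_set:.6f}s")
```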
edited Aug 13 at 9:05, answered Aug 13 at 8:43 – Graipher
For long blocks of informational text at the beginning of a class or method definition, you should use docstrings instead of comments, as per PEP 8. This makes your descriptions automatically available from the help() function.
Example:
def foo(bar):
    """Foo does something.

    This description can span multiple lines
    """
    return bar
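The docstring then surfaces wherever introspection looks for it, for example via __doc__ (using the same toy function):

```python
def foo(bar):
    """Foo does something.

    This description can span multiple lines
    """
    return bar

print(foo.__doc__.splitlines()[0])  # → Foo does something.
```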
answered Aug 13 at 14:50 – Delya Erricson
What's the purpose of these chains? Will another program use the result of this? If so, how does it expect the chains to be delivered (formatted)?
– Mast
Aug 14 at 6:57
I'm probably a bit strict when it comes to this, but when I see that 63 lines of code require 53 lines of comments, I think either there is something wrong with the code or some of the comments should be removed.
– ChatterOne
Aug 14 at 13:21