Markov-chain sentence generator in Python
I wrote a Markov-chain-based sentence generator as my first non-trivial Python program. I mainly used C before, so I have probably ignored a lot of Python conventions and features; any advice would be appreciated.
text-gen.py
import sys
import random

class MarkovChain:
    # Class constant that serves as an initial state for the Markov chain
    START = ""

    # The Markov chain is modelled as a directed graph,
    # with the START state acting as the only source,
    # and the transition probabilities as the graph weights.
    #
    # The graph is implemented using an adjacency list,
    # which in turn is implemented as a dictionary of dictionaries.
    #
    # self.adjList is a dictionary keyed by all words of the text
    # or START (the states). For each key/state, it contains
    # another dictionary indexed by the words of the text
    # that succeed the key in the text (the next states in the chain),
    # and for each of those words/next states the dictionary contains
    # the transition probability from the present state to them.
    #
    # This implementation was chosen because of it being easy to code,
    # and offering an easy way to iterate on both the probabilities and
    # the words/next states of each dictionary using items().
    def __init__(self, file):
        self.adjList = {}
        # The totals dictionary is used in calculating the probabilities:
        # for every word in the text/chain state it contains the total
        # number of transitions from it to another state.
        totals = {}
        # Start by inserting the initial state into the structures
        self.adjList[MarkovChain.START] = {}
        totals[MarkovChain.START] = 0
        # prev: Contains the previously encountered word or the START state,
        # initialized to the START state.
        prev = MarkovChain.START
        for line in file:
            for word in line.split():
                # If the word ends with a terminating punctuation mark,
                # ignore the mark, and treat the word as a terminating state,
                # as it does not precede another word in the current sentence.
                # So prev is set to START, in order for the text model
                # to account for the fact that some words start sentences
                # more frequently than others (not all words are next states of START).
                endsTerm = word[-1] == "." or word[-1] == "?" or word[-1] == "!"
                if (endsTerm):
                    word = word[0:-1]
                # If this is the first time the word is encountered,
                # add it to the adjacency list, and initialize its dictionary
                # and transition total.
                if (word not in self.adjList):
                    self.adjList[word] = {}
                    totals[word] = 0
                # If this is the first time the prev->word transition
                # was detected, initialize the prev->word transition frequency to 1,
                # else increment it.
                if (word in self.adjList[prev]):
                    self.adjList[prev][word] += 1
                else:
                    self.adjList[prev][word] = 1
                # There is a prev->word state transition, so increment
                # the total transition number of the prev state.
                totals[prev] += 1
                if (endsTerm):
                    prev = MarkovChain.START
                else:
                    prev = word
        # Using totals, convert the transition frequencies
        # to transition probabilities.
        for word, neighbors in self.adjList.items():
            for name in neighbors:
                neighbors[name] /= totals[word]

    # chooseNextWord: Chooses the next state/word,
    # by sampling the non-uniform transition probability distribution
    # of the current word/state.
    def chooseNextWord(self, curWord):
        # Convert the dict_keys object to a list
        # to use indexing
        nextWords = list(self.adjList[curWord].keys())
        # Sampling is done through linear search.
        for word in nextWords[0:-1]:
            prob = self.adjList[curWord][word]
            roll = random.random()
            if (roll <= prob):
                return word
        # If none of the first N-1 words were chosen,
        # only the last one was left.
        return nextWords[-1]

    # genSentence: Generates a sentence. If a positive
    # limit is not provided by the caller, the sentence grows to
    # an arbitrary number of words, until a terminal state
    # (the last word of a sentence) is reached.
    def genSentence(self, limit = 0):
        sentence = ""
        curWord = self.chooseNextWord(MarkovChain.START)
        sentence += curWord + " "
        if (limit > 0):
            wordsUsed = 1
            while (wordsUsed < limit and self.adjList[curWord]):
                curWord = self.chooseNextWord(curWord)
                sentence += curWord + " "
                wordsUsed += 1
        else:
            while (self.adjList[curWord]):
                curWord = self.chooseNextWord(curWord)
                sentence += curWord + " "
        return sentence

if (__name__ == "__main__"):
    if (len(sys.argv) < 3):
        print("Not enough arguments, run with: python3 text-gen.py <input-filename> <sentence-num>")
        sys.exit(1)
    try:
        with open(sys.argv[1], "r") as f:
            markov = MarkovChain(f)
    except OSError as error:
        print(error.strerror)
        sys.exit(1)
    # Generate and print as many sentences as asked.
    for k in range(0, int(sys.argv[2])):
        print(markov.genSentence(20) + "\n")
Tags: python, python-3.x, statistics, markov-chain
What's the purpose of these chains? Will another program use the result of this? If so, how does it expect the chains to be delivered (formatted)?
– Mast, Aug 14 at 6:57
I'm probably a bit strict when it comes to this, but when I see that 63 lines of code require 53 lines of comments, I think either there is something wrong with the code or some of the comments should be removed.
– ChatterOne, Aug 14 at 13:21
edited Aug 14 at 6:58 by Mast
asked Aug 12 at 18:06 by Hashew
5 Answers
chooseNextWord distorts the probabilities. For example, consider a list of 3 words with the inherent probabilities $\frac{1}{3}$, $\frac{1}{3}$, $\frac{1}{3}$. The first word is selected with probability $\frac{1}{3}$. The second, however, is selected with probability $\frac{2}{9}$ ($\frac{2}{3}$ that the first word was not selected, times $\frac{1}{3}$ that it is selected in the second round). The third one has a $1 - \frac{1}{3} - \frac{2}{9} = \frac{4}{9}$ chance.

A standard approach is to compute the cumulative sums of the probabilities (in the constructor), then roll once and search for the first cumulative value above the roll.

The code may benefit from using defaultdict rather than plain dictionaries. Fewer ifs is better.

Nitpicking: you may want to account for possible typos, such as a space between a word and its terminating punctuation.
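The cumulative-sum approach described above can be sketched as follows; the transition table here is invented for illustration, and `choose_next_word` is a hypothetical replacement for the question's method:

```python
import bisect
import random

# Hypothetical transition probabilities for one state.
transitions = {"the": {"cat": 0.5, "dog": 0.3, "end": 0.2}}

# Precompute the cumulative sums once (this belongs in the constructor).
cumulative = {}
for state, probs in transitions.items():
    words, weights, running = [], [], 0.0
    for w, p in probs.items():
        running += p
        words.append(w)
        weights.append(running)
    cumulative[state] = (words, weights)

def choose_next_word(cur_word):
    """One roll, then a binary search for the first cumulative value above it."""
    words, weights = cumulative[cur_word]
    roll = random.random() * weights[-1]
    return words[bisect.bisect_right(weights, roll)]
```

Unlike the linear scan in chooseNextWord, each word is now drawn with exactly its stored probability, and the search is logarithmic in the number of successors.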
I completely forgot about conditional probability; I will implement sampling using the CDF then. I will look into the rest, thank you.
– Hashew, Aug 12 at 19:00

@Hashew I rolled back your last edit. It is against the CR charter to edit the code after a review was posted, because it invalidates the review. You are welcome to post a separate follow-up question.
– vnp, Aug 12 at 19:26

I think this approach based on conditional probability is correct: the probability that a word is chosen on the condition that all of the previous ones weren't is P = (probability that the word was chosen and the previous weren't) / (probability that the previous ones weren't chosen), which is equal to (probability that the word was chosen) / (probability that the previous weren't). So if I initialize a variable (say notPrevProb) to 1 and subtract the probability of each word once I am done with it, I should be able to produce the correct probabilities. What do you think?
– Hashew, Aug 12 at 19:33

@Hashew I rolled your edit of my answer back. Sorry, but you got the math completely wrong. If you don't trust me, run an experiment.
– vnp, Aug 13 at 3:03

Why are you multiplying the probabilities? The problem calls for conditional probability, while you compute the intersection (leaving aside the fact that you compute the intersection by multiplying when the events are not independent). Using the probabilities I specified (1/3, 1/2, 1) to choose between one of three items, by sequentially testing for roll < prob and repeating the process 100,000 times (for the law of large numbers to take effect), each item is chosen 1/3 of the time, as expected.
– Hashew, Aug 13 at 9:23
endTerms

When a word ends with an endTerm, I think you need to include a START or END symbol in adjList. Most words can appear anywhere in a sentence, so it is unlikely that you can end a sentence only when a word has no follow-on words. Include the START/END symbol in adjList and the Markov process can also end a sentence.

The standard library

collections.defaultdict provides a dictionary that, when you attempt to access a new key, automatically initializes it to a default value. collections.Counter provides a dictionary that counts things. random.choices selects items from a population according to specified weights.

import collections
import random

class MarkovChain:
    START = ""

    def __init__(self, file):
        adjList = collections.defaultdict(collections.Counter)
        # this inserts START into the defaultdict
        adjList[MarkovChain.START]
        prev = MarkovChain.START
        for line in file:
            for word in line.split():
                endsTerm = word[-1] in ('.', '?', '!')
                if endsTerm:
                    word = word[:-1]
                adjList[prev].update([word])
                if endsTerm:
                    # mark the end of a sentence
                    adjList[word].update([MarkovChain.START])
                    prev = MarkovChain.START
                else:
                    prev = word
        # convert defaultdict to a regular dict
        # the values are a tuple: ([follow words], [counts])
        # for use in random.choices() in chooseNextWord()
        self.adjList = {k: (list(v.keys()), list(v.values()))
                        for k, v in adjList.items()}
        #print(self.adjList)

    def chooseNextWord(self, word):
        # random.choices returns a list, hence the [0]
        return random.choices(*self.adjList[word])[0]

    def genSentence(self, limit=0):
        sentence = []
        curWord = MarkovChain.START
        while True:
            curWord = self.chooseNextWord(curWord)
            sentence.append(curWord)
            if 0 < limit < len(sentence) or curWord == MarkovChain.START:
                break
        return ' '.join(sentence).strip()
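The defaultdict(Counter) idea can be seen in isolation on a two-line toy corpus; the corpus text is invented for illustration:

```python
import collections

corpus = ["the cat sat.", "the dog sat."]

adj = collections.defaultdict(collections.Counter)
prev = ""  # the START state
for line in corpus:
    for word in line.split():
        ends_term = word[-1] in ".?!"
        if ends_term:
            word = word[:-1]
        adj[prev][word] += 1          # Counter handles unseen words for us
        prev = "" if ends_term else word

# adj[""] now counts the sentence starters: Counter({'the': 2})
```

No "is this key new?" checks are needed: both the outer defaultdict and the inner Counters initialize missing keys on first access.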
Style

- The long comment at the beginning of the class is good class documentation and should be made into a docstring. The same applies to the functions.
- Avoid variables in all caps, like START, unless they are constants.
- Functions and variables should generally use snake_case if you want to follow Python conventions, as in PEP 8.
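Applied to the question's class, the first two bullets might look like this; only the header is shown, and the docstring wording is a sketch:

```python
class MarkovChain:
    """Markov-chain text model.

    The chain is modelled as a directed graph whose edge weights are
    transition probabilities; START is the only source state.
    """

    START = ""  # kept in caps: it is a class constant

    def gen_sentence(self, limit=0):
        """Generate one sentence, stopping after `limit` words if limit > 0."""
```

The payoff is that help(MarkovChain) and MarkovChain.__doc__ now expose the documentation at runtime, which plain # comments never do.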
Let's look at this one line, which shows quite a few areas where your code could be improved:

if (word in self.adjList[prev]):

- As noted elsewhere, Python has an official style guide, PEP 8, which recommends using lower_case for variables and functions.
- Basically all Python keywords (at least if, elif, else, for, while, in, with, except, return, yield, yield from, and print in Python 2) take an expression as an argument. Expressions do not need to be surrounded by parentheses.
- Whenever you write if X in Y, you should know the time complexity of that statement. For tuple, str and list it is just a linear scan, so it is linear. For set and dict it is (amortized) constant due to the use of hashes.
- adjList is not actually a list, but a dict! This is why Hungarian notation (putting the type in the name) is discouraged and duck typing is encouraged instead. Here you could just call it adjacencies.
- Take an afternoon and work through the Python standard library. I would recommend at least collections (where you can find defaultdict and Counter, as recommended in other answers), itertools (and more_itertools for bonus points, although it is not in the standard library), str, functools, math, random and pathlib as a start.
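The complexity difference in the third bullet is easy to observe directly; the sizes and repeat counts below are arbitrary:

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)

# Searching for an element that is absent forces a full scan of the list,
# while the set answers from a single hash lookup.
needle = -1
list_time = timeit.timeit(lambda: needle in as_list, number=100)
set_time = timeit.timeit(lambda: needle in as_set, number=100)
```

In this script the membership tests are on dicts, so the lookups are already constant time; the point is to know which case you are in when you write `in`.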
For long blocks of informational text at the beginning of a class or method definition, you should use docstrings instead of comments, as per PEP 8. This makes your descriptions automatically available from the help() function.

Example:

def foo(bar):
    """Foo does something.

    This description can span multiple lines.
    """
    return bar
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f225992%2fmarkov-chain-sentence-generator-in-python%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
chooseNextWord distorts the probabilities. For example, consider a list of 3 words with the inherent probabilities $\frac13$, $\frac13$, $\frac13$. The first word is selected with probability $\frac13$. The second, however, is selected with probability $\frac29$ ($\frac23$ that the first word was not selected, times $\frac13$ that it is selected in the second round). The third one has a $1 - \frac13 - \frac29 = \frac49$ chance.
A standard approach is to compute the accumulated sums of the probabilities (in the constructor), then roll once, and search for the first accumulated value above the rolled one.
The code may benefit from using defaultdict rather than plain dictionaries. Fewer ifs is better.
Nitpicking: you may want to account for possible typos, such as a space between a word and the terminating punctuation.
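The accumulated-sums approach can be sketched like this (a minimal standalone helper, not the OP's code; the names and sample data are made up):

```python
import bisect
import itertools
import random

def choose_weighted(words, weights):
    """Roll once, then find the first cumulative sum above the roll."""
    cumulative = list(itertools.accumulate(weights))   # e.g. [1, 2, 3]
    roll = random.random() * cumulative[-1]            # scale roll to the total weight
    return words[bisect.bisect_right(cumulative, roll)]

words = ["alpha", "beta", "gamma"]
picks = [choose_weighted(words, [1, 1, 1]) for _ in range(30_000)]
# with equal weights each word should land near 1/3 (not 1/3, 2/9, 4/9)
print({w: round(picks.count(w) / len(picks), 3) for w in words})
```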
1
I completely forgot about conditional probability, I will implement sampling using the CDF then. I will look into the rest, thank you.
– Hashew
Aug 12 at 19:00
5
@Hashew I rolled back your last edit. It is against the CR charter to edit the code after a review was posted, because it invalidates the review. You are welcome to post a separate follow-up question.
– vnp
Aug 12 at 19:26
1
I think this approach based on conditional probability is correct: the probability that a word is chosen on the condition that all of the previous ones weren't is P = (probability that the word was chosen and the previous ones weren't) / (probability that the previous ones weren't chosen), which is equal to (probability that the word was chosen) / (probability that the previous ones weren't). So if I initialize a variable (say notPrevProb) to 1 and subtract the probability of each word once I am done with it, I should be able to produce the correct probabilities. What do you think?
– Hashew
Aug 12 at 19:33
2
@Hashew I rolled your edit of my answer back. Sorry, but you got the math completely wrong. If you don't trust me, run an experiment.
– vnp
Aug 13 at 3:03
1
Why are you multiplying the probabilities? The problem calls for conditional probability, while you compute the intersection (leaving aside the fact that you compute the intersection by multiplying when the events are not independent). Using the probabilities I specified (1/3, 1/2, 1) to choose one of three items with probability 1/3 each, by sequentially testing for roll < prob and repeating the process 100,000 times (for the law of large numbers to take effect), each item is chosen 1/3 of the time, as expected.
– Hashew
Aug 13 at 9:23
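vnp's suggestion to run an experiment is easy to follow; this throwaway script (not part of the reviewed code) reproduces the distortion described in the answer, where sequentially testing flat 1/3 probabilities with a fresh roll each time yields shares of 1/3, 2/9 and 4/9:

```python
import random

def sequential_pick(probs):
    """Test each item in turn with a fresh roll; fall through to the last one."""
    for i, p in enumerate(probs):
        if random.random() < p:
            return i
    return len(probs) - 1

random.seed(0)
n = 90_000
hits = [0, 0, 0]
for _ in range(n):
    hits[sequential_pick([1/3, 1/3, 1/3])] += 1
print([round(h / n, 3) for h in hits])  # roughly [0.333, 0.222, 0.444]
```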
answered Aug 12 at 18:39, edited Aug 13 at 3:02 – vnp
endTerms
When a word ends with an endTerm, I think you need to include a START or END symbol in adjList. Most words can appear anywhere in a sentence, so it is unlikely that you can end a sentence only when a word has no follow-on words. Include the START/END symbol in adjList and the Markov process can also end a sentence.
the standard library
collections.defaultdict provides a dictionary that automatically initializes a new key to a default value when you attempt to access it.
collections.Counter provides a dictionary that counts things.
random.choices selects items from a population according to specified weights.
import collections
import random

class MarkovChain:
    START = ""

    def __init__(self, file):
        adjList = collections.defaultdict(collections.Counter)
        # this inserts START into the defaultdict
        adjList[MarkovChain.START]
        prev = MarkovChain.START
        for line in file:
            for word in line.split():
                endsTerm = word[-1] in ('.', '?', '!')
                if endsTerm:
                    word = word[:-1]
                adjList[prev].update([word])
                if endsTerm:
                    # mark the end of a sentence
                    adjList[word].update([MarkovChain.START])
                    prev = MarkovChain.START
                else:
                    prev = word
        # convert the defaultdict to a regular dict
        # the values are a tuple: ([follow words], [counts])
        # for use in random.choices() in chooseNextWord()
        self.adjList = {k: (list(v.keys()), list(v.values()))
                        for k, v in adjList.items()}

    def chooseNextWord(self, word):
        # random.choices returns a list, hence the [0]
        return random.choices(*self.adjList[word])[0]

    def genSentence(self, limit=0):
        sentence = []
        curWord = MarkovChain.START
        while True:
            curWord = self.chooseNextWord(curWord)
            sentence.append(curWord)
            if 0 < limit < len(sentence) or curWord == MarkovChain.START:
                break
        return ' '.join(sentence).strip()
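random.choices accepts the ([follow words], [counts]) pair directly as its population and weights arguments; a quick standalone check of the weighted draw (the words and counts here are made up):

```python
import random

random.seed(0)  # reproducible demo
words, counts = ["the", "a", "an"], [5, 3, 1]
picks = random.choices(words, counts, k=9_000)
# "the" carries 5 of the 9 total weight, so it should win roughly 5/9 of draws
print(round(picks.count("the") / len(picks), 2))
```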
answered Aug 13 at 1:37 – RootTwo
Style
- The long comment at the beginning of the class is good class documentation and should be made into a docstring. The same applies to functions.
- Avoid variables in all caps, like START (unless they are constants).
- Functions and variables should generally use snake_case, if you want to follow Python conventions as in PEP 8.
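Applied to the reviewed class, those conventions look roughly like this (an illustrative excerpt, not the full implementation):

```python
class MarkovChain:
    """Generate random sentences from word-adjacency statistics."""  # docstring, not a comment

    START = ""  # all-caps is appropriate here: a class-level constant

    def choose_next_word(self, word):  # snake_case method name per PEP 8
        raise NotImplementedError  # body elided in this sketch

print(MarkovChain.__doc__)
```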
answered Aug 13 at 6:29 – qwr
Let's look at this one line, which shows quite a few areas where your code could be improved:
if (word in self.adjList[prev]):
- As noted elsewhere, Python has an official style guide, PEP 8, which recommends using lower_case for variables and functions.
- Basically all Python keywords (at least if, elif, else, for, while, in, with, except, return, yield, yield from, and print in Python 2) take an expression as an argument. Expressions do not need to be surrounded by parentheses.
- Whenever you do if X in Y you should know the time complexity of that statement. For tuple, str and list it is just a linear scan, so it is linear. For set and dict it is (amortized) constant due to the use of hashes.
- adjList is not actually a list, but a dict! This is why Hungarian notation (putting the type in the name) is discouraged and duck typing is encouraged instead. Here you could just call it adjacencies.
- Take an afternoon and work through the Python standard library. I would recommend at least collections (where you can find defaultdict and Counter, as recommended in other answers), itertools (and more_itertools for bonus points, although it is not in the standard library), str, functools, math, random and pathlib as a start.
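The complexity difference is easy to observe; this throwaway benchmark (the sizes and repeat counts are arbitrary) looks up a worst-case element in both container types:

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)

# membership in a list scans every element; a set hashes straight to it
t_list = timeit.timeit(lambda: (n - 1) in as_list, number=100)
t_set = timeit.timeit(lambda: (n - 1) in as_set, number=100)
print(f"list: {t_list:.4f}s  set: {t_set:.6f}s")
```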
edited Aug 13 at 9:05, answered Aug 13 at 8:43 – Graipher
For long blocks of informational text at the beginning of a class or method definition, you should use docstrings instead of comments, as per PEP 8. This makes your descriptions automatically available from the help() function.
Example:
def foo(bar):
    """Foo does something.

    This description can span multiple lines
    """
    return bar
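The docstring then surfaces wherever introspection looks for it, for example via __doc__ (using the same toy function):

```python
def foo(bar):
    """Foo does something.

    This description can span multiple lines
    """
    return bar

print(foo.__doc__.splitlines()[0])  # → Foo does something.
```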
answered Aug 13 at 14:50 – Delya Erricson
What's the purpose of these chains? Will another program use the result of this? If so, how does it expect the chains to be delivered (formatted)?
– Mast
Aug 14 at 6:57
I'm probably a bit strict when it comes to this, but when I see that 63 lines of code require 53 lines of comments, I think either there is something wrong with the code or some of the comments should be removed.
– ChatterOne
Aug 14 at 13:21