Markov-chain sentence generator in PythonMarkov Chain generatorShakespeare and dictionariesMarkov chain-based word salad generatorMarkov chain text generation in PythonCumulative transition matrix Markov Chain simulationMarkov Chain in PythonSentence generation using Markov ChainsCalculate probability of word occurenceGiven a string and a word dict, find all possible sentences

Network helper class with retry logic on failure

How do I make my image comply with the requirements of this photography competition?

How would a Creature that needs to be seen by Humans evolve?

How to respectfully refuse to assist co-workers with IT issues?

Could George I (of Great Britain) speak English?

Why do all fields in a QFT transform like *irreducible* representations of some group?

Another solution to create a set with two conditions

What does zitch dog mean?

Compelling story with the world as a villain

Numbers Decrease while Letters Increase

"There were either twelve sexes or none."

moon structure ownership

Did anyone try to find the little box that held Professor Moriarty and his wife after the crash?

What would make bones be of different colors?

Was there ever a treaty between 2 entities with significantly different translations to the detriment of one party?

Why is there a difference between predicting on Validation set and Test set?

Improving Performance of an XY Monte Carlo

Can you answer this logic non-math puzzle question correctly?

Can I add a link to my website in a paper I'm coauthoring?

Papers on arXiv solving the same problem at the same time

Duplicate instruments in unison in an orchestra

How do I prevent other wifi networks from showing up on my computer?

Where can/should I, as a high schooler, publish a paper regarding the derivation of a formula?

Is gzip atomic?



Markov-chain sentence generator in Python


Markov Chain generatorShakespeare and dictionariesMarkov chain-based word salad generatorMarkov chain text generation in PythonCumulative transition matrix Markov Chain simulationMarkov Chain in PythonSentence generation using Markov ChainsCalculate probability of word occurenceGiven a string and a word dict, find all possible sentences






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








14












$begingroup$


I wrote a Markov-chain based sentence generator as my first non-trivial Python program. I mainly used C before, so I probably have ignored a lot of Python conventions and features, so any advice would be appreciated.



text-gen.py



import sys
import random

class MarkovChain:
# Class constant that serves as an initial state for the Markov chain
START = ""

# The Markov chain is modelled as a directed graph,
# with the START state acting as the only source,
# and the tranisition probabilities as the graph weights.
#
# The graph is implemented using an adjacency list,
# which in turn is implemented as a dictionary of dictionaries.
#
# self.adjList is a dictionary keyed by all words of the text
# or START (the states). For each key/state, it contains
# another dictionary indexed by the words of the text
# that succeed the key in the text (the next states in the chain),
# and for each of those words/next states the dictionary contains
# the transition probability from the present state to them.
#
# This implementation was chosen because of it being easy to code,
# and offering an easy way to iterate on both the probabilities and
# the words/next states of each dictionary using items().
def __init__(self, file):
self.adjList =

# The totals dictionary is used in calculating the probabilities,
# for every word in the text/chain state it contains the total
# number of transitions from it to another state.
totals =


# Start by insering the initial state to the structures
self.adjList[MarkovChain.START] =
totals[MarkovChain.START] = 0

# prev: Contains the previously encountered word or the START state,
# initialized to the START state.
prev = MarkovChain.START


for line in file:
for word in line.split():
# If the word ends with a terminating punctuation mark,
# ignore the mark, and treat the word as a terminating state as
# it does not preceed another word in the current sentence.
# So prev is set to START, in order for the text model
# to account for the fact that some words start sentences
# more frequently than others (not all words are next states of START).
endsTerm = word[-1] == "." or word[-1] == "?" or word[-1] == "!"
if (endsTerm):
word = word[0:-1]

# If this is the first time the word is encountered,
# add it to the adjacency list, and initialize its dictionary
# and transition total.
if (word not in self.adjList):
self.adjList[word] =
totals[word] = 0

# If this is the first time the prev->word transition
# was detected, initialize the prev->word transition frequency to 1,
# else increment it.
if (word in self.adjList[prev]):
self.adjList[prev][word] += 1
else:
self.adjList[prev][word] = 1

# There is a prev->word state transition, so increment
# the total transition number of the prev state.
totals[prev] += 1

if (endsTerm):
prev = START

# Using total, convert the transition frequencies
# to transition probabilities.
for word, neighbors in self.adjList.items():
for name in neighbors:
neighbors[name] /= totals[word]


# chooseNextWord: Chooses the next state/word,
# by sampling the non uniform transition probability distribution
# of the current word/state.
def chooseNextWord(self, curWord):
# Convert the dict_keys object to a list
# to use indexing
nextWords = list(self.adjList[curWord].keys())

# Sampling is done through linear search.
for word in nextWords[0:-1]:
prob = self.adjList[curWord][word]
roll = random.random()
if (roll <= prob):
return word

# If none of the first N-1 words were chosen,
# only the last one was left.
return nextWords[-1]


# genSentence: Generates a sentence. If a positive
# limit is not provided by the caller, the sentences grow to
# an arbitrary number of words, until the last word of a sentence/a terminal state
# is reached.
def genSentence(self, limit = 0):
sentence = ""

curWord = self.chooseNextWord(MarkovChain.START)
sentence += curWord + " "

if (limit > 0):
wordsUsed = 1
while (wordsUsed < limit and self.adjList[curWord]):
curWord = self.chooseNextWord(curWord)
sentence += curWord + " "
wordsUsed += 1
else:
while (self.adjList[curWord]):
curWord = self.chooseNextWord(curWord)
sentence += curWord + " "

return sentence



if (__name__ == "__main__"):
if (len(sys.argv) < 3):
print("Not enough arguements, run with python3 text-gen.py <input-filename> <sentence-num>")
sys.exit(1)

try:
with open(sys.argv[1], "r") as f:
markov = MarkovChain(f)

except OSError as error:
print(error.strerror)
sys.exit(1)

# Generate and print as many sentences as asked.
for k in range(0, int(sys.argv[2])):
print(markov.genSentence(20) + "n")









share|improve this question











$endgroup$













  • $begingroup$
    What's the purpose of these chains? Will another program use the result of this? If so, how does it expect the chains to be delivered (formatted)?
    $endgroup$
    – Mast
    Aug 14 at 6:57











  • $begingroup$
    I'm probably a bit strict when it comes to this, but when I see that 63 lines of code require 53 lines of comments, I think that either there is something wrong with the code or that some of the comments should be removed
    $endgroup$
    – ChatterOne
    Aug 14 at 13:21

















14












$begingroup$


I wrote a Markov-chain based sentence generator as my first non-trivial Python program. I mainly used C before, so I probably have ignored a lot of Python conventions and features, so any advice would be appreciated.



text-gen.py



import sys
import random

class MarkovChain:
# Class constant that serves as an initial state for the Markov chain
START = ""

# The Markov chain is modelled as a directed graph,
# with the START state acting as the only source,
# and the tranisition probabilities as the graph weights.
#
# The graph is implemented using an adjacency list,
# which in turn is implemented as a dictionary of dictionaries.
#
# self.adjList is a dictionary keyed by all words of the text
# or START (the states). For each key/state, it contains
# another dictionary indexed by the words of the text
# that succeed the key in the text (the next states in the chain),
# and for each of those words/next states the dictionary contains
# the transition probability from the present state to them.
#
# This implementation was chosen because of it being easy to code,
# and offering an easy way to iterate on both the probabilities and
# the words/next states of each dictionary using items().
def __init__(self, file):
self.adjList =

# The totals dictionary is used in calculating the probabilities,
# for every word in the text/chain state it contains the total
# number of transitions from it to another state.
totals =


# Start by insering the initial state to the structures
self.adjList[MarkovChain.START] =
totals[MarkovChain.START] = 0

# prev: Contains the previously encountered word or the START state,
# initialized to the START state.
prev = MarkovChain.START


for line in file:
for word in line.split():
# If the word ends with a terminating punctuation mark,
# ignore the mark, and treat the word as a terminating state as
# it does not preceed another word in the current sentence.
# So prev is set to START, in order for the text model
# to account for the fact that some words start sentences
# more frequently than others (not all words are next states of START).
endsTerm = word[-1] == "." or word[-1] == "?" or word[-1] == "!"
if (endsTerm):
word = word[0:-1]

# If this is the first time the word is encountered,
# add it to the adjacency list, and initialize its dictionary
# and transition total.
if (word not in self.adjList):
self.adjList[word] =
totals[word] = 0

# If this is the first time the prev->word transition
# was detected, initialize the prev->word transition frequency to 1,
# else increment it.
if (word in self.adjList[prev]):
self.adjList[prev][word] += 1
else:
self.adjList[prev][word] = 1

# There is a prev->word state transition, so increment
# the total transition number of the prev state.
totals[prev] += 1

if (endsTerm):
prev = START

# Using total, convert the transition frequencies
# to transition probabilities.
for word, neighbors in self.adjList.items():
for name in neighbors:
neighbors[name] /= totals[word]


# chooseNextWord: Chooses the next state/word,
# by sampling the non uniform transition probability distribution
# of the current word/state.
def chooseNextWord(self, curWord):
# Convert the dict_keys object to a list
# to use indexing
nextWords = list(self.adjList[curWord].keys())

# Sampling is done through linear search.
for word in nextWords[0:-1]:
prob = self.adjList[curWord][word]
roll = random.random()
if (roll <= prob):
return word

# If none of the first N-1 words were chosen,
# only the last one was left.
return nextWords[-1]


# genSentence: Generates a sentence. If a positive
# limit is not provided by the caller, the sentences grow to
# an arbitrary number of words, until the last word of a sentence/a terminal state
# is reached.
def genSentence(self, limit = 0):
sentence = ""

curWord = self.chooseNextWord(MarkovChain.START)
sentence += curWord + " "

if (limit > 0):
wordsUsed = 1
while (wordsUsed < limit and self.adjList[curWord]):
curWord = self.chooseNextWord(curWord)
sentence += curWord + " "
wordsUsed += 1
else:
while (self.adjList[curWord]):
curWord = self.chooseNextWord(curWord)
sentence += curWord + " "

return sentence



if (__name__ == "__main__"):
if (len(sys.argv) < 3):
print("Not enough arguements, run with python3 text-gen.py <input-filename> <sentence-num>")
sys.exit(1)

try:
with open(sys.argv[1], "r") as f:
markov = MarkovChain(f)

except OSError as error:
print(error.strerror)
sys.exit(1)

# Generate and print as many sentences as asked.
for k in range(0, int(sys.argv[2])):
print(markov.genSentence(20) + "n")









share|improve this question











$endgroup$













  • $begingroup$
    What's the purpose of these chains? Will another program use the result of this? If so, how does it expect the chains to be delivered (formatted)?
    $endgroup$
    – Mast
    Aug 14 at 6:57











  • $begingroup$
    I'm probably a bit strict when it comes to this, but when I see that 63 lines of code require 53 lines of comments, I think that either there is something wrong with the code or that some of the comments should be removed
    $endgroup$
    – ChatterOne
    Aug 14 at 13:21













14












14








14


2



$begingroup$


I wrote a Markov-chain based sentence generator as my first non-trivial Python program. I mainly used C before, so I probably have ignored a lot of Python conventions and features, so any advice would be appreciated.



text-gen.py



import sys
import random

class MarkovChain:
# Class constant that serves as an initial state for the Markov chain
START = ""

# The Markov chain is modelled as a directed graph,
# with the START state acting as the only source,
# and the tranisition probabilities as the graph weights.
#
# The graph is implemented using an adjacency list,
# which in turn is implemented as a dictionary of dictionaries.
#
# self.adjList is a dictionary keyed by all words of the text
# or START (the states). For each key/state, it contains
# another dictionary indexed by the words of the text
# that succeed the key in the text (the next states in the chain),
# and for each of those words/next states the dictionary contains
# the transition probability from the present state to them.
#
# This implementation was chosen because of it being easy to code,
# and offering an easy way to iterate on both the probabilities and
# the words/next states of each dictionary using items().
def __init__(self, file):
self.adjList =

# The totals dictionary is used in calculating the probabilities,
# for every word in the text/chain state it contains the total
# number of transitions from it to another state.
totals =


# Start by insering the initial state to the structures
self.adjList[MarkovChain.START] =
totals[MarkovChain.START] = 0

# prev: Contains the previously encountered word or the START state,
# initialized to the START state.
prev = MarkovChain.START


for line in file:
for word in line.split():
# If the word ends with a terminating punctuation mark,
# ignore the mark, and treat the word as a terminating state as
# it does not preceed another word in the current sentence.
# So prev is set to START, in order for the text model
# to account for the fact that some words start sentences
# more frequently than others (not all words are next states of START).
endsTerm = word[-1] == "." or word[-1] == "?" or word[-1] == "!"
if (endsTerm):
word = word[0:-1]

# If this is the first time the word is encountered,
# add it to the adjacency list, and initialize its dictionary
# and transition total.
if (word not in self.adjList):
self.adjList[word] =
totals[word] = 0

# If this is the first time the prev->word transition
# was detected, initialize the prev->word transition frequency to 1,
# else increment it.
if (word in self.adjList[prev]):
self.adjList[prev][word] += 1
else:
self.adjList[prev][word] = 1

# There is a prev->word state transition, so increment
# the total transition number of the prev state.
totals[prev] += 1

if (endsTerm):
prev = START

# Using total, convert the transition frequencies
# to transition probabilities.
for word, neighbors in self.adjList.items():
for name in neighbors:
neighbors[name] /= totals[word]


# chooseNextWord: Chooses the next state/word,
# by sampling the non uniform transition probability distribution
# of the current word/state.
def chooseNextWord(self, curWord):
# Convert the dict_keys object to a list
# to use indexing
nextWords = list(self.adjList[curWord].keys())

# Sampling is done through linear search.
for word in nextWords[0:-1]:
prob = self.adjList[curWord][word]
roll = random.random()
if (roll <= prob):
return word

# If none of the first N-1 words were chosen,
# only the last one was left.
return nextWords[-1]


# genSentence: Generates a sentence. If a positive
# limit is not provided by the caller, the sentences grow to
# an arbitrary number of words, until the last word of a sentence/a terminal state
# is reached.
def genSentence(self, limit = 0):
sentence = ""

curWord = self.chooseNextWord(MarkovChain.START)
sentence += curWord + " "

if (limit > 0):
wordsUsed = 1
while (wordsUsed < limit and self.adjList[curWord]):
curWord = self.chooseNextWord(curWord)
sentence += curWord + " "
wordsUsed += 1
else:
while (self.adjList[curWord]):
curWord = self.chooseNextWord(curWord)
sentence += curWord + " "

return sentence



if (__name__ == "__main__"):
if (len(sys.argv) < 3):
print("Not enough arguements, run with python3 text-gen.py <input-filename> <sentence-num>")
sys.exit(1)

try:
with open(sys.argv[1], "r") as f:
markov = MarkovChain(f)

except OSError as error:
print(error.strerror)
sys.exit(1)

# Generate and print as many sentences as asked.
for k in range(0, int(sys.argv[2])):
print(markov.genSentence(20) + "n")









share|improve this question











$endgroup$




I wrote a Markov-chain based sentence generator as my first non-trivial Python program. I mainly used C before, so I probably have ignored a lot of Python conventions and features, so any advice would be appreciated.



text-gen.py



import sys
import random

class MarkovChain:
# Class constant that serves as an initial state for the Markov chain
START = ""

# The Markov chain is modelled as a directed graph,
# with the START state acting as the only source,
# and the tranisition probabilities as the graph weights.
#
# The graph is implemented using an adjacency list,
# which in turn is implemented as a dictionary of dictionaries.
#
# self.adjList is a dictionary keyed by all words of the text
# or START (the states). For each key/state, it contains
# another dictionary indexed by the words of the text
# that succeed the key in the text (the next states in the chain),
# and for each of those words/next states the dictionary contains
# the transition probability from the present state to them.
#
# This implementation was chosen because of it being easy to code,
# and offering an easy way to iterate on both the probabilities and
# the words/next states of each dictionary using items().
def __init__(self, file):
self.adjList =

# The totals dictionary is used in calculating the probabilities,
# for every word in the text/chain state it contains the total
# number of transitions from it to another state.
totals =


# Start by insering the initial state to the structures
self.adjList[MarkovChain.START] =
totals[MarkovChain.START] = 0

# prev: Contains the previously encountered word or the START state,
# initialized to the START state.
prev = MarkovChain.START


for line in file:
for word in line.split():
# If the word ends with a terminating punctuation mark,
# ignore the mark, and treat the word as a terminating state as
# it does not preceed another word in the current sentence.
# So prev is set to START, in order for the text model
# to account for the fact that some words start sentences
# more frequently than others (not all words are next states of START).
endsTerm = word[-1] == "." or word[-1] == "?" or word[-1] == "!"
if (endsTerm):
word = word[0:-1]

# If this is the first time the word is encountered,
# add it to the adjacency list, and initialize its dictionary
# and transition total.
if (word not in self.adjList):
self.adjList[word] =
totals[word] = 0

# If this is the first time the prev->word transition
# was detected, initialize the prev->word transition frequency to 1,
# else increment it.
if (word in self.adjList[prev]):
self.adjList[prev][word] += 1
else:
self.adjList[prev][word] = 1

# There is a prev->word state transition, so increment
# the total transition number of the prev state.
totals[prev] += 1

if (endsTerm):
prev = START

# Using total, convert the transition frequencies
# to transition probabilities.
for word, neighbors in self.adjList.items():
for name in neighbors:
neighbors[name] /= totals[word]


# chooseNextWord: Chooses the next state/word,
# by sampling the non uniform transition probability distribution
# of the current word/state.
def chooseNextWord(self, curWord):
# Convert the dict_keys object to a list
# to use indexing
nextWords = list(self.adjList[curWord].keys())

# Sampling is done through linear search.
for word in nextWords[0:-1]:
prob = self.adjList[curWord][word]
roll = random.random()
if (roll <= prob):
return word

# If none of the first N-1 words were chosen,
# only the last one was left.
return nextWords[-1]


# genSentence: Generates a sentence. If a positive
# limit is not provided by the caller, the sentences grow to
# an arbitrary number of words, until the last word of a sentence/a terminal state
# is reached.
def genSentence(self, limit = 0):
sentence = ""

curWord = self.chooseNextWord(MarkovChain.START)
sentence += curWord + " "

if (limit > 0):
wordsUsed = 1
while (wordsUsed < limit and self.adjList[curWord]):
curWord = self.chooseNextWord(curWord)
sentence += curWord + " "
wordsUsed += 1
else:
while (self.adjList[curWord]):
curWord = self.chooseNextWord(curWord)
sentence += curWord + " "

return sentence



if (__name__ == "__main__"):
if (len(sys.argv) < 3):
print("Not enough arguements, run with python3 text-gen.py <input-filename> <sentence-num>")
sys.exit(1)

try:
with open(sys.argv[1], "r") as f:
markov = MarkovChain(f)

except OSError as error:
print(error.strerror)
sys.exit(1)

# Generate and print as many sentences as asked.
for k in range(0, int(sys.argv[2])):
print(markov.genSentence(20) + "n")






python python-3.x statistics markov-chain






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Aug 14 at 6:58









Mast

7,9157 gold badges38 silver badges92 bronze badges




7,9157 gold badges38 silver badges92 bronze badges










asked Aug 12 at 18:06









HashewHashew

1011 silver badge6 bronze badges




1011 silver badge6 bronze badges














  • $begingroup$
    What's the purpose of these chains? Will another program use the result of this? If so, how does it expect the chains to be delivered (formatted)?
    $endgroup$
    – Mast
    Aug 14 at 6:57











  • $begingroup$
    I'm probably a bit strict when it comes to this, but when I see that 63 lines of code require 53 lines of comments, I think that either there is something wrong with the code or that some of the comments should be removed
    $endgroup$
    – ChatterOne
    Aug 14 at 13:21
















  • $begingroup$
    What's the purpose of these chains? Will another program use the result of this? If so, how does it expect the chains to be delivered (formatted)?
    $endgroup$
    – Mast
    Aug 14 at 6:57











  • $begingroup$
    I'm probably a bit strict when it comes to this, but when I see that 63 lines of code require 53 lines of comments, I think that either there is something wrong with the code or that some of the comments should be removed
    $endgroup$
    – ChatterOne
    Aug 14 at 13:21















$begingroup$
What's the purpose of these chains? Will another program use the result of this? If so, how does it expect the chains to be delivered (formatted)?
$endgroup$
– Mast
Aug 14 at 6:57





$begingroup$
What's the purpose of these chains? Will another program use the result of this? If so, how does it expect the chains to be delivered (formatted)?
$endgroup$
– Mast
Aug 14 at 6:57













$begingroup$
I'm probably a bit strict when it comes to this, but when I see that 63 lines of code require 53 lines of comments, I think that either there is something wrong with the code or that some of the comments should be removed
$endgroup$
– ChatterOne
Aug 14 at 13:21




$begingroup$
I'm probably a bit strict when it comes to this, but when I see that 63 lines of code require 53 lines of comments, I think that either there is something wrong with the code or that some of the comments should be removed
$endgroup$
– ChatterOne
Aug 14 at 13:21










5 Answers
5






active

oldest

votes


















15













$begingroup$


  • chooseNextWord distorts the probabilities.



    For example, consider a list of 3 words with the inherent probabilities $frac13$, $frac13$, $frac13$. The first word is selected with the probability $frac13$. The second, however is selected with probability $frac29$ ($frac23$ that the first word was not selected, times $frac13$ that it is selected in the second round). The third one has $1 - frac13 - frac29 = frac49$ chance.



    A standard approach is to compute an accumulated sums of probabilities (in the constructor), then to choose a word roll once, and search for a value just above the rolled one.



  • The code may benefit from using defaultdict rather than a plain dictionaries. Lesser ifs is better.


  • Nitpicking. You may want to account for possible typos, such as a space between a word and a terminating punctuation.






share|improve this answer











$endgroup$










  • 1




    $begingroup$
    I completely forgot about conditional probability, I will implement sampling using the CDF then. I will look into the rest, thank you.
    $endgroup$
    – Hashew
    Aug 12 at 19:00







  • 5




    $begingroup$
    @Hashew I rolled back your last edit. It is against the CR chapter to edit the code after a review was posted, because it invalidates the review. You are welcome to post a separate follow-up question.
    $endgroup$
    – vnp
    Aug 12 at 19:26






  • 1




    $begingroup$
    I think this approach based on conditional probability is correct: The probability that a word is chosen on the condition that all of the previous ones weren't is P = (probability that the word was chosen and the previous weren't) / (probability that the previous ones weren't chosen) which is equal to (probability that the word was chosen) / (probability that the previous weren't). So if I initialize a variable (say notPrevProb) to 1 and subtract the probability of each word once I am done with it, I should be able to produce the correct probabilities. What do you think?
    $endgroup$
    – Hashew
    Aug 12 at 19:33







  • 2




    $begingroup$
    @Hashew I rolled your edit of my answer back. Sorry, but you got the math completely wrong. If you don't trust me, run an experiment.
    $endgroup$
    – vnp
    Aug 13 at 3:03






  • 1




    $begingroup$
    Why are you multiplying the probabilities? The problem calls for conditional probability, while you compute the intersection (leaving aside the fact that you compute the intersection by multiplying when the events are not independent). By using the probabilities I specified (1/3, 1/2, 1) to choose between one of three items with probability 1/3, by sequentially testing for roll < prob, and repeating the process 100.000 times (for the Law of llarge numbers to take effect), the number of times each item is chosen is 1/3 of the total number of choices, as expected.
    $endgroup$
    – Hashew
    Aug 13 at 9:23


















6













$begingroup$

endTerms



When a word ends with an endTerm, think you need to include an START or END symbol in adjList. Most words can appear anywhere in a sentence. So it is unlikely that you can end a sentence only when words don't have any follow-on words. Include the START/END symbol in adjList and the Markov process can also end a sentence.



the standard library



collections.defaultdict provides a dictionary that when you attempt to access a new key automatically initializes the new key to a default value.



collections.Counter provides a dictionary that counts things.



random.choices selects items from a population according to specified weights.



import collections
import random

class MarkovChain:
START = ""

def __init__(self, file):
adjList = collections.defaultdict(collections.Counter)

# this inserts START into the defaultdict
adjList[MarkovChain.START]

prev = MarkovChain.START

for line in file:
for word in line.split():
endsTerm = word[-1] in ('.', '?', '!')

if (endsTerm):
word = word[:-1]

adjList[prev].update([word])

if endsTerm:
# mark the end of a sentence
adjList[word].update([MarkovChain.START])
prev = MarkovChain.START
else:
prev = word

# convert defaultdict to a regular dict
# the values are a tuple: ([follow words], [counts])
# for use in random.choices() in chooseNextWord()
self.adjList = k:(list(v.keys()), list(v.values()))
for k,v in adjList.items()

#print(self.adjList)


def chooseNextWord(self, word):
# random.choices returns a list, hence the [0]
return random.choices(*self.adjList[word])[0]


def genSentence(self, limit = 0):
sentence = []
curWord = MarkovChain.START

while True:
curWord = self.chooseNextWord(curWord)
sentence.append(curWord)

if 0 < limit < len(sentence) or curWord == MarkovChain.START:
break

return ' '.join(sentence).strip()





share|improve this answer









$endgroup$






















    4













    $begingroup$

    Style



    • The long comment at the beginning of the class is good class documentation and should be made into a docstring. The same applies for functions.

    • Avoid variables in all caps, like START (unless they are constants).

    • Functions and variables should generally use snake-case, if you want to follow Python conventions as in PEP 8.





    share|improve this answer









    $endgroup$






















      4













      $begingroup$

      Let's look at this one line, which shows quite a few areas where your code could be improved:



      if (word in self.adjList[prev]):


      • As noted elsewhere, Python has an official style-guide, PEP8, which recommends using lower_case for variables and functions.

      • Basically all Python keywords (at least if, elif, else, for, while, in, with, except, return, yield, yield from, print in Python 2) take an expression as argument. Expressions do not need to be surrounded by parentheses.

      • Whenever you do if X in Y you should know what the time complexity of that statement is. For tuple, str and list it is just a linear scan, so it is linear. For set and dict it is (amortized) constant due to the use of hashes.


      • adjList is not actually a list, but a dict! This is why Hungarian notation (putting the type in the name) is discouraged and in addition duck typing is encouraged. Here you could just call it adjacancies.

      • Take an afternoon and work through the Python standard library. I would recommend at least collections (where you can find defaultdict and Counter as recommend Ed in other answers, itertools (and more_itertools for bonus points, although it is not in the standard library), str, functools, math, random and pathlib as a start.





      share|improve this answer











      $endgroup$






















        4













        $begingroup$

        For long blocks of informational text at the beginning of a class or method definition, you should use docstrings instead of comments, as per PEP 8. This makes your descriptions automatically available from the help() function.



        Example:



        def foo(bar):
        """Foo does something.
        This description can span multiple lines
        """
        return bar





        share|improve this answer









        $endgroup$

















          Your Answer






          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "196"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f225992%2fmarkov-chain-sentence-generator-in-python%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          5 Answers
          5






          active

          oldest

          votes








          5 Answers
          5






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          15













          $begingroup$


          • chooseNextWord distorts the probabilities.



            For example, consider a list of 3 words with the inherent probabilities $frac13$, $frac13$, $frac13$. The first word is selected with the probability $frac13$. The second, however is selected with probability $frac29$ ($frac23$ that the first word was not selected, times $frac13$ that it is selected in the second round). The third one has $1 - frac13 - frac29 = frac49$ chance.



            A standard approach is to compute an accumulated sums of probabilities (in the constructor), then to choose a word roll once, and search for a value just above the rolled one.



          • The code may benefit from using defaultdict rather than a plain dictionaries. Lesser ifs is better.


          • Nitpicking. You may want to account for possible typos, such as a space between a word and a terminating punctuation.






          share|improve this answer











          $endgroup$










          • 1




            $begingroup$
            I completely forgot about conditional probability, I will implement sampling using the CDF then. I will look into the rest, thank you.
            $endgroup$
            – Hashew
            Aug 12 at 19:00







          • 5




            $begingroup$
            @Hashew I rolled back your last edit. It is against the CR chapter to edit the code after a review was posted, because it invalidates the review. You are welcome to post a separate follow-up question.
            $endgroup$
            – vnp
            Aug 12 at 19:26






          • 1




            $begingroup$
            I think this approach based on conditional probability is correct: The probability that a word is chosen on the condition that all of the previous ones weren't is P = (probability that the word was chosen and the previous weren't) / (probability that the previous ones weren't chosen) which is equal to (probability that the word was chosen) / (probability that the previous weren't). So if I initialize a variable (say notPrevProb) to 1 and subtract the probability of each word once I am done with it, I should be able to produce the correct probabilities. What do you think?
            $endgroup$
            – Hashew
            Aug 12 at 19:33







          • 2




            $begingroup$
            @Hashew I rolled your edit of my answer back. Sorry, but you got the math completely wrong. If you don't trust me, run an experiment.
            $endgroup$
            – vnp
            Aug 13 at 3:03






          • 1




            $begingroup$
            Why are you multiplying the probabilities? The problem calls for conditional probability, while you compute the intersection (leaving aside the fact that you compute the intersection by multiplying when the events are not independent). By using the probabilities I specified (1/3, 1/2, 1) to choose between one of three items with probability 1/3, by sequentially testing for roll < prob, and repeating the process 100.000 times (for the Law of llarge numbers to take effect), the number of times each item is chosen is 1/3 of the total number of choices, as expected.
            $endgroup$
            – Hashew
            Aug 13 at 9:23















          15













          $begingroup$


          • chooseNextWord distorts the probabilities.



            For example, consider a list of 3 words with the inherent probabilities $frac13$, $frac13$, $frac13$. The first word is selected with the probability $frac13$. The second, however is selected with probability $frac29$ ($frac23$ that the first word was not selected, times $frac13$ that it is selected in the second round). The third one has $1 - frac13 - frac29 = frac49$ chance.



            A standard approach is to compute an accumulated sums of probabilities (in the constructor), then to choose a word roll once, and search for a value just above the rolled one.



          • The code may benefit from using defaultdict rather than a plain dictionaries. Lesser ifs is better.


          • Nitpicking. You may want to account for possible typos, such as a space between a word and a terminating punctuation.






          share|improve this answer











          $endgroup$










          • 1




            $begingroup$
            I completely forgot about conditional probability, I will implement sampling using the CDF then. I will look into the rest, thank you.
            $endgroup$
            – Hashew
            Aug 12 at 19:00







          • 5




            $begingroup$
            @Hashew I rolled back your last edit. It is against the CR chapter to edit the code after a review was posted, because it invalidates the review. You are welcome to post a separate follow-up question.
            $endgroup$
            – vnp
            Aug 12 at 19:26






          • 1




            $begingroup$
            I think this approach based on conditional probability is correct: The probability that a word is chosen on the condition that all of the previous ones weren't is P = (probability that the word was chosen and the previous weren't) / (probability that the previous ones weren't chosen) which is equal to (probability that the word was chosen) / (probability that the previous weren't). So if I initialize a variable (say notPrevProb) to 1 and subtract the probability of each word once I am done with it, I should be able to produce the correct probabilities. What do you think?
            $endgroup$
            – Hashew
            Aug 12 at 19:33







          • 2




            $begingroup$
            @Hashew I rolled your edit of my answer back. Sorry, but you got the math completely wrong. If you don't trust me, run an experiment.
            $endgroup$
            – vnp
            Aug 13 at 3:03






          • 1




            $begingroup$
            Why are you multiplying the probabilities? The problem calls for conditional probability, while you compute the intersection (leaving aside the fact that you compute the intersection by multiplying when the events are not independent). By using the probabilities I specified (1/3, 1/2, 1) to choose between one of three items with probability 1/3, by sequentially testing for roll < prob, and repeating the process 100.000 times (for the Law of llarge numbers to take effect), the number of times each item is chosen is 1/3 of the total number of choices, as expected.
            $endgroup$
            – Hashew
            Aug 13 at 9:23













          15














          15










          15







          $begingroup$


          • chooseNextWord distorts the probabilities.



            For example, consider a list of 3 words with the inherent probabilities $frac13$, $frac13$, $frac13$. The first word is selected with the probability $frac13$. The second, however is selected with probability $frac29$ ($frac23$ that the first word was not selected, times $frac13$ that it is selected in the second round). The third one has $1 - frac13 - frac29 = frac49$ chance.



            A standard approach is to compute an accumulated sums of probabilities (in the constructor), then to choose a word roll once, and search for a value just above the rolled one.



          • The code may benefit from using defaultdict rather than a plain dictionaries. Lesser ifs is better.


          • Nitpicking. You may want to account for possible typos, such as a space between a word and a terminating punctuation.






          share|improve this answer











          $endgroup$




          • chooseNextWord distorts the probabilities.



            For example, consider a list of 3 words with the inherent probabilities $frac13$, $frac13$, $frac13$. The first word is selected with the probability $frac13$. The second, however is selected with probability $frac29$ ($frac23$ that the first word was not selected, times $frac13$ that it is selected in the second round). The third one has $1 - frac13 - frac29 = frac49$ chance.



            A standard approach is to compute an accumulated sums of probabilities (in the constructor), then to choose a word roll once, and search for a value just above the rolled one.



          • The code may benefit from using defaultdict rather than a plain dictionaries. Lesser ifs is better.


          • Nitpicking. You may want to account for possible typos, such as a space between a word and a terminating punctuation.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Aug 13 at 3:02

























          answered Aug 12 at 18:39









          vnpvnp

          43k2 gold badges36 silver badges110 bronze badges




          43k2 gold badges36 silver badges110 bronze badges










          • 1




            $begingroup$
            I completely forgot about conditional probability, I will implement sampling using the CDF then. I will look into the rest, thank you.
            $endgroup$
            – Hashew
            Aug 12 at 19:00







          • 5




            $begingroup$
            @Hashew I rolled back your last edit. It is against the CR chapter to edit the code after a review was posted, because it invalidates the review. You are welcome to post a separate follow-up question.
            $endgroup$
            – vnp
            Aug 12 at 19:26






          • 1




            $begingroup$
            I think this approach based on conditional probability is correct: The probability that a word is chosen on the condition that all of the previous ones weren't is P = (probability that the word was chosen and the previous weren't) / (probability that the previous ones weren't chosen) which is equal to (probability that the word was chosen) / (probability that the previous weren't). So if I initialize a variable (say notPrevProb) to 1 and subtract the probability of each word once I am done with it, I should be able to produce the correct probabilities. What do you think?
            $endgroup$
            – Hashew
            Aug 12 at 19:33







          • 2




            $begingroup$
            @Hashew I rolled your edit of my answer back. Sorry, but you got the math completely wrong. If you don't trust me, run an experiment.
            $endgroup$
            – vnp
            Aug 13 at 3:03






          • 1




            $begingroup$
            Why are you multiplying the probabilities? The problem calls for conditional probability, while you compute the intersection (leaving aside the fact that you compute the intersection by multiplying when the events are not independent). By using the probabilities I specified (1/3, 1/2, 1) to choose between one of three items with probability 1/3, by sequentially testing for roll < prob, and repeating the process 100.000 times (for the Law of llarge numbers to take effect), the number of times each item is chosen is 1/3 of the total number of choices, as expected.
            $endgroup$
            – Hashew
            Aug 13 at 9:23












          • 1




            $begingroup$
            I completely forgot about conditional probability, I will implement sampling using the CDF then. I will look into the rest, thank you.
            $endgroup$
            – Hashew
            Aug 12 at 19:00







          • 5




            $begingroup$
            @Hashew I rolled back your last edit. It is against the CR chapter to edit the code after a review was posted, because it invalidates the review. You are welcome to post a separate follow-up question.
            $endgroup$
            – vnp
            Aug 12 at 19:26






          • 1




            $begingroup$
            I think this approach based on conditional probability is correct: The probability that a word is chosen on the condition that all of the previous ones weren't is P = (probability that the word was chosen and the previous weren't) / (probability that the previous ones weren't chosen) which is equal to (probability that the word was chosen) / (probability that the previous weren't). So if I initialize a variable (say notPrevProb) to 1 and subtract the probability of each word once I am done with it, I should be able to produce the correct probabilities. What do you think?
            $endgroup$
            – Hashew
            Aug 12 at 19:33







          • 2




            $begingroup$
            @Hashew I rolled your edit of my answer back. Sorry, but you got the math completely wrong. If you don't trust me, run an experiment.
            $endgroup$
            – vnp
            Aug 13 at 3:03






          • 1




            $begingroup$
            Why are you multiplying the probabilities? The problem calls for conditional probability, while you compute the intersection (leaving aside the fact that you compute the intersection by multiplying when the events are not independent). By using the probabilities I specified (1/3, 1/2, 1) to choose between one of three items with probability 1/3, by sequentially testing for roll < prob, and repeating the process 100.000 times (for the Law of llarge numbers to take effect), the number of times each item is chosen is 1/3 of the total number of choices, as expected.
            $endgroup$
            – Hashew
            Aug 13 at 9:23







          1




          1




          $begingroup$
          I completely forgot about conditional probability, I will implement sampling using the CDF then. I will look into the rest, thank you.
          $endgroup$
          – Hashew
          Aug 12 at 19:00





          $begingroup$
          I completely forgot about conditional probability, I will implement sampling using the CDF then. I will look into the rest, thank you.
          $endgroup$
          – Hashew
          Aug 12 at 19:00





          5




          5




          $begingroup$
          @Hashew I rolled back your last edit. It is against the CR chapter to edit the code after a review was posted, because it invalidates the review. You are welcome to post a separate follow-up question.
          $endgroup$
          – vnp
          Aug 12 at 19:26




          $begingroup$
          @Hashew I rolled back your last edit. It is against the CR chapter to edit the code after a review was posted, because it invalidates the review. You are welcome to post a separate follow-up question.
          $endgroup$
          – vnp
          Aug 12 at 19:26




          1




          1




          $begingroup$
          I think this approach based on conditional probability is correct: The probability that a word is chosen on the condition that all of the previous ones weren't is P = (probability that the word was chosen and the previous weren't) / (probability that the previous ones weren't chosen) which is equal to (probability that the word was chosen) / (probability that the previous weren't). So if I initialize a variable (say notPrevProb) to 1 and subtract the probability of each word once I am done with it, I should be able to produce the correct probabilities. What do you think?
          $endgroup$
          – Hashew
          Aug 12 at 19:33





          $begingroup$
          I think this approach based on conditional probability is correct: The probability that a word is chosen on the condition that all of the previous ones weren't is P = (probability that the word was chosen and the previous weren't) / (probability that the previous ones weren't chosen) which is equal to (probability that the word was chosen) / (probability that the previous weren't). So if I initialize a variable (say notPrevProb) to 1 and subtract the probability of each word once I am done with it, I should be able to produce the correct probabilities. What do you think?
          $endgroup$
          – Hashew
          Aug 12 at 19:33





          2




          2




          $begingroup$
          @Hashew I rolled your edit of my answer back. Sorry, but you got the math completely wrong. If you don't trust me, run an experiment.
          $endgroup$
          – vnp
          Aug 13 at 3:03




          $begingroup$
          @Hashew I rolled your edit of my answer back. Sorry, but you got the math completely wrong. If you don't trust me, run an experiment.
          $endgroup$
          – vnp
          Aug 13 at 3:03




          1




          1




          $begingroup$
          Why are you multiplying the probabilities? The problem calls for conditional probability, while you compute the intersection (leaving aside the fact that you compute the intersection by multiplying when the events are not independent). By using the probabilities I specified (1/3, 1/2, 1) to choose between one of three items with probability 1/3, by sequentially testing for roll < prob, and repeating the process 100.000 times (for the Law of llarge numbers to take effect), the number of times each item is chosen is 1/3 of the total number of choices, as expected.
          $endgroup$
          – Hashew
          Aug 13 at 9:23




          $begingroup$
          Why are you multiplying the probabilities? The problem calls for conditional probability, while you compute the intersection (leaving aside the fact that you compute the intersection by multiplying when the events are not independent). By using the probabilities I specified (1/3, 1/2, 1) to choose between one of three items with probability 1/3, by sequentially testing for roll < prob, and repeating the process 100.000 times (for the Law of llarge numbers to take effect), the number of times each item is chosen is 1/3 of the total number of choices, as expected.
          $endgroup$
          – Hashew
          Aug 13 at 9:23













          6













          $begingroup$

          endTerms



          When a word ends with an endTerm, think you need to include an START or END symbol in adjList. Most words can appear anywhere in a sentence. So it is unlikely that you can end a sentence only when words don't have any follow-on words. Include the START/END symbol in adjList and the Markov process can also end a sentence.



          the standard library



          collections.defaultdict provides a dictionary that when you attempt to access a new key automatically initializes the new key to a default value.



          collections.Counter provides a dictionary that counts things.



          random.choices selects items from a population according to specified weights.



          import collections
          import random

          class MarkovChain:
          START = ""

          def __init__(self, file):
          adjList = collections.defaultdict(collections.Counter)

          # this inserts START into the defaultdict
          adjList[MarkovChain.START]

          prev = MarkovChain.START

          for line in file:
          for word in line.split():
          endsTerm = word[-1] in ('.', '?', '!')

          if (endsTerm):
          word = word[:-1]

          adjList[prev].update([word])

          if endsTerm:
          # mark the end of a sentence
          adjList[word].update([MarkovChain.START])
          prev = MarkovChain.START
          else:
          prev = word

          # convert defaultdict to a regular dict
          # the values are a tuple: ([follow words], [counts])
          # for use in random.choices() in chooseNextWord()
          self.adjList = k:(list(v.keys()), list(v.values()))
          for k,v in adjList.items()

          #print(self.adjList)


          def chooseNextWord(self, word):
          # random.choices returns a list, hence the [0]
          return random.choices(*self.adjList[word])[0]


          def genSentence(self, limit = 0):
          sentence = []
          curWord = MarkovChain.START

          while True:
          curWord = self.chooseNextWord(curWord)
          sentence.append(curWord)

          if 0 < limit < len(sentence) or curWord == MarkovChain.START:
          break

          return ' '.join(sentence).strip()





          share|improve this answer









          $endgroup$



















            6













            $begingroup$

            endTerms



            When a word ends with an endTerm, think you need to include an START or END symbol in adjList. Most words can appear anywhere in a sentence. So it is unlikely that you can end a sentence only when words don't have any follow-on words. Include the START/END symbol in adjList and the Markov process can also end a sentence.



            the standard library



            collections.defaultdict provides a dictionary that when you attempt to access a new key automatically initializes the new key to a default value.



            collections.Counter provides a dictionary that counts things.



            random.choices selects items from a population according to specified weights.



            import collections
            import random

            class MarkovChain:
            START = ""

            def __init__(self, file):
            adjList = collections.defaultdict(collections.Counter)

            # this inserts START into the defaultdict
            adjList[MarkovChain.START]

            prev = MarkovChain.START

            for line in file:
            for word in line.split():
            endsTerm = word[-1] in ('.', '?', '!')

            if (endsTerm):
            word = word[:-1]

            adjList[prev].update([word])

            if endsTerm:
            # mark the end of a sentence
            adjList[word].update([MarkovChain.START])
            prev = MarkovChain.START
            else:
            prev = word

            # convert defaultdict to a regular dict
            # the values are a tuple: ([follow words], [counts])
            # for use in random.choices() in chooseNextWord()
            self.adjList = k:(list(v.keys()), list(v.values()))
            for k,v in adjList.items()

            #print(self.adjList)


            def chooseNextWord(self, word):
            # random.choices returns a list, hence the [0]
            return random.choices(*self.adjList[word])[0]


            def genSentence(self, limit = 0):
            sentence = []
            curWord = MarkovChain.START

            while True:
            curWord = self.chooseNextWord(curWord)
            sentence.append(curWord)

            if 0 < limit < len(sentence) or curWord == MarkovChain.START:
            break

            return ' '.join(sentence).strip()





            share|improve this answer









            $endgroup$

















              6














              6










              6







              $begingroup$

              endTerms



              When a word ends with an endTerm, think you need to include an START or END symbol in adjList. Most words can appear anywhere in a sentence. So it is unlikely that you can end a sentence only when words don't have any follow-on words. Include the START/END symbol in adjList and the Markov process can also end a sentence.



              the standard library



              collections.defaultdict provides a dictionary that when you attempt to access a new key automatically initializes the new key to a default value.



              collections.Counter provides a dictionary that counts things.



              random.choices selects items from a population according to specified weights.



              import collections
              import random

              class MarkovChain:
              START = ""

              def __init__(self, file):
              adjList = collections.defaultdict(collections.Counter)

              # this inserts START into the defaultdict
              adjList[MarkovChain.START]

              prev = MarkovChain.START

              for line in file:
              for word in line.split():
              endsTerm = word[-1] in ('.', '?', '!')

              if (endsTerm):
              word = word[:-1]

              adjList[prev].update([word])

              if endsTerm:
              # mark the end of a sentence
              adjList[word].update([MarkovChain.START])
              prev = MarkovChain.START
              else:
              prev = word

              # convert defaultdict to a regular dict
              # the values are a tuple: ([follow words], [counts])
              # for use in random.choices() in chooseNextWord()
              self.adjList = k:(list(v.keys()), list(v.values()))
              for k,v in adjList.items()

              #print(self.adjList)


              def chooseNextWord(self, word):
              # random.choices returns a list, hence the [0]
              return random.choices(*self.adjList[word])[0]


              def genSentence(self, limit = 0):
              sentence = []
              curWord = MarkovChain.START

              while True:
              curWord = self.chooseNextWord(curWord)
              sentence.append(curWord)

              if 0 < limit < len(sentence) or curWord == MarkovChain.START:
              break

              return ' '.join(sentence).strip()





              share|improve this answer









              $endgroup$



              endTerms



              When a word ends with an endTerm, think you need to include an START or END symbol in adjList. Most words can appear anywhere in a sentence. So it is unlikely that you can end a sentence only when words don't have any follow-on words. Include the START/END symbol in adjList and the Markov process can also end a sentence.



              the standard library



              collections.defaultdict provides a dictionary that when you attempt to access a new key automatically initializes the new key to a default value.



              collections.Counter provides a dictionary that counts things.



              random.choices selects items from a population according to specified weights.



              import collections
              import random

              class MarkovChain:
              START = ""

              def __init__(self, file):
              adjList = collections.defaultdict(collections.Counter)

              # this inserts START into the defaultdict
              adjList[MarkovChain.START]

              prev = MarkovChain.START

              for line in file:
              for word in line.split():
              endsTerm = word[-1] in ('.', '?', '!')

              if (endsTerm):
              word = word[:-1]

              adjList[prev].update([word])

              if endsTerm:
              # mark the end of a sentence
              adjList[word].update([MarkovChain.START])
              prev = MarkovChain.START
              else:
              prev = word

              # convert defaultdict to a regular dict
              # the values are a tuple: ([follow words], [counts])
              # for use in random.choices() in chooseNextWord()
              self.adjList = k:(list(v.keys()), list(v.values()))
              for k,v in adjList.items()

              #print(self.adjList)


              def chooseNextWord(self, word):
              # random.choices returns a list, hence the [0]
              return random.choices(*self.adjList[word])[0]


              def genSentence(self, limit = 0):
              sentence = []
              curWord = MarkovChain.START

              while True:
              curWord = self.chooseNextWord(curWord)
              sentence.append(curWord)

              if 0 < limit < len(sentence) or curWord == MarkovChain.START:
              break

              return ' '.join(sentence).strip()






              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered Aug 13 at 1:37









              RootTwoRootTwo

              1,5943 silver badges9 bronze badges




              1,5943 silver badges9 bronze badges
























                  4













                  $begingroup$

                  Style



                  • The long comment at the beginning of the class is good class documentation and should be made into a docstring. The same applies for functions.

                  • Avoid variables in all caps, like START (unless they are constants).

                  • Functions and variables should generally use snake-case, if you want to follow Python conventions as in PEP 8.





                  share|improve this answer









                  $endgroup$



















                    4













                    $begingroup$

                    Style



                    • The long comment at the beginning of the class is good class documentation and should be made into a docstring. The same applies for functions.

                    • Avoid variables in all caps, like START (unless they are constants).

                    • Functions and variables should generally use snake-case, if you want to follow Python conventions as in PEP 8.





                    share|improve this answer









                    $endgroup$

















                      4














                      4










                      4







                      $begingroup$

                      Style



                      • The long comment at the beginning of the class is good class documentation and should be made into a docstring. The same applies for functions.

                      • Avoid variables in all caps, like START (unless they are constants).

                      • Functions and variables should generally use snake-case, if you want to follow Python conventions as in PEP 8.





                      share|improve this answer









                      $endgroup$



                      Style



                      • The long comment at the beginning of the class is good class documentation and should be made into a docstring. The same applies for functions.

                      • Avoid variables in all caps, like START (unless they are constants).

                      • Functions and variables should generally use snake-case, if you want to follow Python conventions as in PEP 8.






                      share|improve this answer












                      share|improve this answer



                      share|improve this answer










                      answered Aug 13 at 6:29









                      qwrqwr

                      3382 silver badges11 bronze badges




                      3382 silver badges11 bronze badges
























                          4













                          $begingroup$

                          Let's look at this one line, which shows quite a few areas where your code could be improved:



                          if (word in self.adjList[prev]):


                          • As noted elsewhere, Python has an official style-guide, PEP8, which recommends using lower_case for variables and functions.

                          • Basically all Python keywords (at least if, elif, else, for, while, in, with, except, return, yield, yield from, print in Python 2) take an expression as argument. Expressions do not need to be surrounded by parentheses.

                          • Whenever you do if X in Y you should know what the time complexity of that statement is. For tuple, str and list it is just a linear scan, so it is linear. For set and dict it is (amortized) constant due to the use of hashes.


                          • adjList is not actually a list, but a dict! This is why Hungarian notation (putting the type in the name) is discouraged and in addition duck typing is encouraged. Here you could just call it adjacancies.

                          • Take an afternoon and work through the Python standard library. I would recommend at least collections (where you can find defaultdict and Counter as recommend Ed in other answers, itertools (and more_itertools for bonus points, although it is not in the standard library), str, functools, math, random and pathlib as a start.





                          share|improve this answer











                          $endgroup$



















                            4













                            $begingroup$

                            Let's look at this one line, which shows quite a few areas where your code could be improved:



                            if (word in self.adjList[prev]):


                            • As noted elsewhere, Python has an official style-guide, PEP8, which recommends using lower_case for variables and functions.

                            • Basically all Python keywords (at least if, elif, else, for, while, in, with, except, return, yield, yield from, print in Python 2) take an expression as argument. Expressions do not need to be surrounded by parentheses.

                            • Whenever you do if X in Y you should know what the time complexity of that statement is. For tuple, str and list it is just a linear scan, so it is linear. For set and dict it is (amortized) constant due to the use of hashes.


                            • adjList is not actually a list, but a dict! This is why Hungarian notation (putting the type in the name) is discouraged and in addition duck typing is encouraged. Here you could just call it adjacancies.

                            • Take an afternoon and work through the Python standard library. I would recommend at least collections (where you can find defaultdict and Counter as recommend Ed in other answers, itertools (and more_itertools for bonus points, although it is not in the standard library), str, functools, math, random and pathlib as a start.





                            share|improve this answer











                            $endgroup$

















                              4














                              4










                              4







                              $begingroup$

                              Let's look at this one line, which shows quite a few areas where your code could be improved:



                              if (word in self.adjList[prev]):


                              • As noted elsewhere, Python has an official style-guide, PEP8, which recommends using lower_case for variables and functions.

                              • Basically all Python keywords (at least if, elif, else, for, while, in, with, except, return, yield, yield from, print in Python 2) take an expression as argument. Expressions do not need to be surrounded by parentheses.

                              • Whenever you do if X in Y you should know what the time complexity of that statement is. For tuple, str and list it is just a linear scan, so it is linear. For set and dict it is (amortized) constant due to the use of hashes.


                              • adjList is not actually a list, but a dict! This is why Hungarian notation (putting the type in the name) is discouraged and in addition duck typing is encouraged. Here you could just call it adjacancies.

                              • Take an afternoon and work through the Python standard library. I would recommend at least collections (where you can find defaultdict and Counter as recommend Ed in other answers, itertools (and more_itertools for bonus points, although it is not in the standard library), str, functools, math, random and pathlib as a start.





                              share|improve this answer











                              $endgroup$



                              Let's look at this one line, which shows quite a few areas where your code could be improved:



                              if (word in self.adjList[prev]):


                              • As noted elsewhere, Python has an official style-guide, PEP8, which recommends using lower_case for variables and functions.

                              • Basically all Python keywords (at least if, elif, else, for, while, in, with, except, return, yield, yield from, print in Python 2) take an expression as argument. Expressions do not need to be surrounded by parentheses.

                              • Whenever you do if X in Y you should know what the time complexity of that statement is. For tuple, str and list it is just a linear scan, so it is linear. For set and dict it is (amortized) constant due to the use of hashes.


                              • adjList is not actually a list, but a dict! This is why Hungarian notation (putting the type in the name) is discouraged and in addition duck typing is encouraged. Here you could just call it adjacancies.

                              • Take an afternoon and work through the Python standard library. I would recommend at least collections (where you can find defaultdict and Counter as recommend Ed in other answers, itertools (and more_itertools for bonus points, although it is not in the standard library), str, functools, math, random and pathlib as a start.






                              share|improve this answer














                              share|improve this answer



                              share|improve this answer








                              edited Aug 13 at 9:05

























                              answered Aug 13 at 8:43









                              GraipherGraipher

                              30k6 gold badges47 silver badges104 bronze badges




                              30k6 gold badges47 silver badges104 bronze badges
























                                  4













                                  $begingroup$

                                  For long blocks of informational text at the beginning of a class or method definition, you should use docstrings instead of comments, as per PEP 8. This makes your descriptions automatically available from the help() function.



                                  Example:



                                  def foo(bar):
                                  """Foo does something.
                                  This description can span multiple lines
                                  """
                                  return bar





                                  share|improve this answer









                                  $endgroup$



















                                    4













                                    $begingroup$

                                    For long blocks of informational text at the beginning of a class or method definition, you should use docstrings instead of comments, as per PEP 8. This makes your descriptions automatically available from the help() function.



                                    Example:



                                    def foo(bar):
                                    """Foo does something.
                                    This description can span multiple lines
                                    """
                                    return bar





                                    share|improve this answer









                                    $endgroup$

















                                      4














                                      4










                                      4







                                      $begingroup$

                                      For long blocks of informational text at the beginning of a class or method definition, you should use docstrings instead of comments, as per PEP 8. This makes your descriptions automatically available from the help() function.



                                      Example:



                                      def foo(bar):
                                      """Foo does something.
                                      This description can span multiple lines
                                      """
                                      return bar





                                      share|improve this answer









                                      $endgroup$



                                      For long blocks of informational text at the beginning of a class or method definition, you should use docstrings instead of comments, as per PEP 8. This makes your descriptions automatically available from the help() function.



                                      Example:



                                      def foo(bar):
                                      """Foo does something.
                                      This description can span multiple lines
                                      """
                                      return bar






                                      share|improve this answer












                                      share|improve this answer



                                      share|improve this answer










                                      answered Aug 13 at 14:50









                                      Delya ErricsonDelya Erricson

                                      412 bronze badges




                                      412 bronze badges






























                                          draft saved

                                          draft discarded
















































                                          Thanks for contributing an answer to Code Review Stack Exchange!


                                          • Please be sure to answer the question. Provide details and share your research!

                                          But avoid


                                          • Asking for help, clarification, or responding to other answers.

                                          • Making statements based on opinion; back them up with references or personal experience.

                                          Use MathJax to format equations. MathJax reference.


                                          To learn more, see our tips on writing great answers.




                                          draft saved


                                          draft discarded














                                          StackExchange.ready(
                                          function ()
                                          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f225992%2fmarkov-chain-sentence-generator-in-python%23new-answer', 'question_page');

                                          );

                                          Post as a guest















                                          Required, but never shown





















































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown

































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown







                                          Popular posts from this blog

                                          Get product attribute by attribute group code in magento 2get product attribute by product attribute group in magento 2Magento 2 Log Bundle Product Data in List Page?How to get all product attribute of a attribute group of Default attribute set?Magento 2.1 Create a filter in the product grid by new attributeMagento 2 : Get Product Attribute values By GroupMagento 2 How to get all existing values for one attributeMagento 2 get custom attribute of a single product inside a pluginMagento 2.3 How to get all the Multi Source Inventory (MSI) locations collection in custom module?Magento2: how to develop rest API to get new productsGet product attribute by attribute group code ( [attribute_group_code] ) in magento 2

                                          Category:9 (number) SubcategoriesMedia in category "9 (number)"Navigation menuUpload mediaGND ID: 4485639-8Library of Congress authority ID: sh85091979ReasonatorScholiaStatistics

                                          Magento 2.3: How do i solve this, Not registered handle, on custom form?How can i rewrite TierPrice Block in Magento2magento 2 captcha not rendering if I override layout xmlmain.CRITICAL: Plugin class doesn't existMagento 2 : Problem while adding custom button order view page?Magento 2.2.5: Overriding Admin Controller sales/orderMagento 2.2.5: Add, Update and Delete existing products Custom OptionsMagento 2.3 : File Upload issue in UI Component FormMagento2 Not registered handleHow to configured Form Builder Js in my custom magento 2.3.0 module?Magento 2.3. How to create image upload field in an admin form