


How to slice a string input at a certain unknown index

















A string is given as input, and it always contains a question that I want to extract (e.g. "What is your name?"). The problem is that the question is always surrounded by unneeded text.

The input could be (but is not limited to) any of the following:

1- "eo000 ATATAT EG\n\nWhat is your name?\nkgda dasflkjas\n"
2- "What is your\nlastname and email?\ndasf?lkjas"
3- "askjdmk.\nGiven your skills\nhow would you rate yourself?\nand your name? dasf?"

(Notice that in the third input, the question starts with the word "Given" and ends with "yourself?".)

The inputs above were generated by the pytesseract OCR library, which scans an image and converts it into text.

I only want to extract the question from the garbage input and nothing else.

I tried using the string method find('?', 1) to get the index of the end of the question (assuming for now that the first question mark is always the end of the question and not part of the text I don't want). But I can't figure out how to get the index of the first letter of the question. I tried looping in reverse to find the first \n before the question mark, but the question doesn't always have a \n before its first letter.



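    def extractQuestion(input):
        # Index of the first '?' (assumed to end the question)
        index_end_q = input.find('?', 1)
        index_first_letter_of_q = 0  # TODO: find where the question starts
        # Slice up to and including the question mark
        question = input[index_first_letter_of_q:index_end_q + 1]
        return question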











  • I think more examples may help determine if there is any invariant property about the start of the question to hook into.

    – Andrew Allen
    Jul 6 at 9:45






  • I think this TODO is the TODO of humanity right now because you'll need to make your program understand human language in order to properly solve this, and this task remains largely unsolved now.

    – ForceBru
    Jul 6 at 9:45






  • You said the input could be, but not limited to, the following. Well, what is it limited to? Maybe you can tell us where you're getting these inputs and we may be able to provide a solution that navigates around having messy inputs in the first place.

    – user10987432
    Jul 6 at 9:55

















python






asked Jul 6 at 9:29 by LinkCoder, edited Jul 6 at 20:29 by Peter Mortensen












2 Answers

One way to find the index of the question's first word is to search for the first word that has an actual meaning (I assume you're interested in English words). You can do that with pyenchant:

    #!/usr/bin/env python

    import enchant

    GLOSSARY = enchant.Dict("en_US")

    def isWord(word):
        # check() returns True if the word is in the dictionary
        return GLOSSARY.check(word)

    sentences = [
        "eo000 ATATAT EG\n\nWhat is your name?\nkgda dasflkjas\n",
        "What is your\nlastname and email?\ndasf?lkjas",
        "\nGiven your skills\nhow would you rate yourself?\nand your name? dasf?"]

    for sentence in sentences:
        # Report the first whitespace-separated token that is a dictionary word
        for i, w in enumerate(sentence.split()):
            if isWord(w):
                print('index: {} => {}'.format(i, w))
                break

The code above prints:

    index: 3 => What
    index: 0 => What
    index: 0 => Given
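
The index printed above counts words, while slicing needs a character offset. A minimal sketch of one way to get there is shown below. It assumes a tiny hypothetical word set (`KNOWN_WORDS`) as a stand-in for the dictionary check so that it runs without extra packages, and the helper names `is_word` and `extract_question` are illustrative only. It uses `re.finditer` to walk the tokens together with their character positions, then `str.find` to locate the next question mark.

```python
import re

# Hypothetical stand-in for a dictionary lookup such as pyenchant's check();
# this tiny word set only exists so the sketch runs without extra packages.
KNOWN_WORDS = {"what", "is", "your", "name", "given", "skills", "how"}

def is_word(token):
    """Return True if the token, stripped of punctuation, is a known word."""
    return token.strip("?.,!").lower() in KNOWN_WORDS

def extract_question(text):
    """Slice from the first known word up to and including the next '?'."""
    for match in re.finditer(r"\S+", text):
        if is_word(match.group()):
            start = match.start()          # character index of that word
            end = text.find("?", start)    # first '?' at or after the start
            if end == -1:
                return None
            return text[start:end + 1]
    return None

print(extract_question("eo000 ATATAT EG\n\nWhat is your name?\nkgda dasflkjas\n"))
```

Like the word-index version, this still picks the wrong start whenever the garbage before the question happens to contain a valid word.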





  • Yes, this would be a great solution for the problem, but what if the input BEFORE the question is also a valid English word? I updated the question, btw.

    – LinkCoder
    Jul 6 at 9:49











  • @LinkCoder Then the problem is much more complicated than the one you initially described. NLTK might help you recognize "logical" sentences, but as far as I know that's not an easy problem to solve (and I think it hadn't been solved by the time I posted this answer).

    – game0ver
    Jul 6 at 9:53












  • Alright, then assuming the input before the question is garbage, how could I find the index of the first letter of the question within the whole input? With that index I can slice it.

    – LinkCoder
    Jul 6 at 9:59











  • @LinkCoder That's easy: you can use Python's built-in str.find() method.

    – game0ver
    Jul 6 at 10:04











  • Ok I get it now thanks for the effort that you have put into your answer :)

    – LinkCoder
    Jul 6 at 10:19



















You could try a regular expression like \b[A-Z][a-z][^?]+\?, meaning:

  • the start of a word \b with an upper-case letter [A-Z] followed by a lower-case letter [a-z],

  • then a sequence of non-question-mark characters [^?]+,

  • followed by a literal question mark \?.

This can still produce some false positives or misses, e.g. if a question actually starts with an acronym, or if there is a name in the middle of the question, but for your examples it works quite well.

    >>> tests = ["eo000 ATATAT EG\n\nWhat is your name?\nkgda dasflkjas\n",
    ...          "What is your\nlastname and email?\ndasf?lkjas",
    ...          "\nGiven your skills\nhow would you rate yourself?\nand your name? dasf?"]

    >>> import re
    >>> p = r"\b[A-Z][a-z][^?]+\?"
    >>> [re.search(p, t).group() for t in tests]
    ['What is your name?',
     'What is your\nlastname and email?',
     'Given your skills\nhow would you rate yourself?']

If that's one blob of text, you can use findall instead of search:

    >>> text = "\n".join(tests)
    >>> re.findall(p, text)
    ['What is your name?',
     'What is your\nlastname and email?',
     'Given your skills\nhow would you rate yourself?']

Actually, this also seems to work reasonably well for questions with names in them:

    >>> t = "asdGARBAGEasd\nHow did you like St. Petersburg? more stuff with ?"
    >>> re.search(p, t).group()
    'How did you like St. Petersburg?'
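
For use on a single string, the pattern can be wrapped in a small helper. The sketch below is only one possible arrangement: the names `QUESTION_RE` and `extract_question` are made up for illustration, and collapsing the OCR line breaks into spaces is an assumption about what the caller wants. It also guards against `re.search` returning `None`, which would otherwise raise an `AttributeError` on `.group()` when the input contains no match.

```python
import re

# Same pattern as in the answer: a capitalised word start, then everything
# up to and including the first question mark.
QUESTION_RE = re.compile(r"\b[A-Z][a-z][^?]+\?")

def extract_question(text):
    """Return the first question-like span in text, or None if none is found."""
    match = QUESTION_RE.search(text)
    if match is None:          # re.search returns None when nothing matches
        return None
    # Collapse newlines and repeated spaces left over from the OCR output.
    return " ".join(match.group().split())

print(extract_question(
    "askjdmk.\nGiven your skills\nhow would you rate yourself?\nand your name? dasf?"))
```

Because `[^?]` already matches newline characters, no extra flag is needed for questions that span several lines.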





  • Thanks for the effort that you put into your answer. I have a question though: How can I do the exact thing that you did in your answer on a single string variable?

    – LinkCoder
    Jul 6 at 10:19











  • @LinkCoder Wouldn't that be exactly what I did in the lower code with text?

    – tobias_k
    Jul 6 at 10:59











  • Yes, I figured it out. I just used re.search(regex_pattern, input, flags=re.S).group().replace("\n", " "), so thanks for that. Btw, is there a regex that doesn't miss names but does the same thing as what you did above?

    – LinkCoder
    Jul 6 at 11:04











  • Actually, I think it should work okay if there are names in the question, as long as the start of the question is the first word in the sentence starting with an upper-case letter.

    – tobias_k
    Jul 6 at 11:06











  • Ok I think I understand it thanks very much for your effort and time :)

    – LinkCoder
    Jul 6 at 11:11
























pyenchant answer: answered Jul 6 at 9:42 by game0ver, edited Jul 6 at 18:50 by Peter Mortensen












  • Yes this would be a great solution for the problem, but what if the input BEFORE the question is also a valid english word? I updated the question btw

    – LinkCoder
    Jul 6 at 9:49











  • @LinkCoder Then the problem is much more complicated from the one you initially described. Then maybe NLTK be of some help to help you recognize "logical" sentences, but as far as I know that's not an easy problem to solve (and I think it hasn't been solved by the time I posted this answer).

    – game0ver
    Jul 6 at 9:53












  • Alright then assuming the input before question is garbage, then how could I find the index of the first letter of the question of the whole input because then I can slice it.

    – LinkCoder
    Jul 6 at 9:59











  • @LinkCoder that's easy, you can use the python built-in find() function.

    – game0ver
    Jul 6 at 10:04











  • Ok I get it now thanks for the effort that you have put into your answer :)

    – LinkCoder
    Jul 6 at 10:19

















  • Yes this would be a great solution for the problem, but what if the input BEFORE the question is also a valid english word? I updated the question btw

    – LinkCoder
    Jul 6 at 9:49











  • @LinkCoder Then the problem is much more complicated from the one you initially described. Then maybe NLTK be of some help to help you recognize "logical" sentences, but as far as I know that's not an easy problem to solve (and I think it hasn't been solved by the time I posted this answer).

    – game0ver
    Jul 6 at 9:53












  • Alright then assuming the input before question is garbage, then how could I find the index of the first letter of the question of the whole input because then I can slice it.

    – LinkCoder
    Jul 6 at 9:59











  • @LinkCoder that's easy, you can use the python built-in find() function.

    – game0ver
    Jul 6 at 10:04











  • Ok I get it now thanks for the effort that you have put into your answer :)

    – LinkCoder
    Jul 6 at 10:19
















Yes this would be a great solution for the problem, but what if the input BEFORE the question is also a valid english word? I updated the question btw

– LinkCoder
Jul 6 at 9:49





Yes this would be a great solution for the problem, but what if the input BEFORE the question is also a valid english word? I updated the question btw

– LinkCoder
Jul 6 at 9:49













@LinkCoder Then the problem is much more complicated from the one you initially described. Then maybe NLTK be of some help to help you recognize "logical" sentences, but as far as I know that's not an easy problem to solve (and I think it hasn't been solved by the time I posted this answer).

– game0ver
Jul 6 at 9:53






@LinkCoder Then the problem is much more complicated from the one you initially described. Then maybe NLTK be of some help to help you recognize "logical" sentences, but as far as I know that's not an easy problem to solve (and I think it hasn't been solved by the time I posted this answer).

– game0ver
Jul 6 at 9:53














Alright then assuming the input before question is garbage, then how could I find the index of the first letter of the question of the whole input because then I can slice it.

– LinkCoder
Jul 6 at 9:59





Alright then assuming the input before question is garbage, then how could I find the index of the first letter of the question of the whole input because then I can slice it.

– LinkCoder
Jul 6 at 9:59













@LinkCoder that's easy, you can use the python built-in find() function.

– game0ver
Jul 6 at 10:04





@LinkCoder that's easy, you can use the python built-in find() function.

– game0ver
Jul 6 at 10:04













Ok I get it now thanks for the effort that you have put into your answer :)

– LinkCoder
Jul 6 at 10:19





Ok I get it now thanks for the effort that you have put into your answer :)

– LinkCoder
Jul 6 at 10:19













6














You could try a regular expression like b[A-Z][a-z][^?]+?, meaning:



  • The start of a word b with an upper case letter [A-Z] followed by a lower case letter [a-z],

  • then a sequence of non-questionmark-characters [^?]+,

  • followed by a literal question mark ?.

This can still have some false positives or misses, e.g. if a question actually starts with an acronym, or if there is a name in the middle of the question, but for you examples it works quite well.



>>> tests = ["eo000 ATATAT EGnnWhat is your name?nkgda dasflkjasn",
"What is yournlastname and email?ndasf?lkjas",
"nGiven your skillsnhow would you rate yourself?nand your name? dasf?"]

>>> import re
>>> p = r"b[A-Z][a-z][^?]+?"
>>> [re.search(p, t).group() for t in tests]
['What is your name?',
'What is yournlastname and email?',
'Given your skillsnhow would you rate yourself?']


If that's one blob of text, you can use findall instead of search:



>>> text = "n".join(tests)
>>> re.findall(p, text)
['What is your name?',
'What is yournlastname and email?',
'Given your skillsnhow would you rate yourself?']


Actually, this also seems to work reasonably well for questions with names in them:



>>> t = "asdGARBAGEasdnHow did you like St. Petersburg? more stuff with ?" 
>>> re.search(p, t).group()
'How did you like St. Petersburg?'





share|improve this answer

























  • Thanks for the effort that you put into your answer. I have a question though: How can I do the exact thing that you did in your answer on a single string variable?

    – LinkCoder
    Jul 6 at 10:19











  • @LinkCoder Wouldn't that be exactly what I did in the lower code with text?

    – tobias_k
    Jul 6 at 10:59











  • Yes I figured it out. I just used re.search(regex_pattern, input, flags=re.S).group().replace("n", " ") so thanks for that. Btw is there a regex that doesn't miss names but does the same thing as what you did above?

    – LinkCoder
    Jul 6 at 11:04











  • Actually, I think it should work okay if there are names in the question, as long as the start of the question is the first word in the sentence starting with an upper-case letter.

    – tobias_k
    Jul 6 at 11:06











  • Ok I think I understand it thanks very much for your effort and time :)

    – LinkCoder
    Jul 6 at 11:11















6





































































edited Jul 6 at 11:10

























answered Jul 6 at 9:43









tobias_k

61.3k, 9 gold badges, 73 silver badges, 116 bronze badges












  • Thanks for the effort that you put into your answer. I have a question though: How can I do the exact thing that you did in your answer on a single string variable?

    – LinkCoder
    Jul 6 at 10:19











  • @LinkCoder Wouldn't that be exactly what I did in the lower code with text?

    – tobias_k
    Jul 6 at 10:59











  • Yes I figured it out. I just used re.search(regex_pattern, input, flags=re.S).group().replace("\n", " ") so thanks for that. Btw is there a regex that doesn't miss names but does the same thing as what you did above?

    – LinkCoder
    Jul 6 at 11:04











  • Actually, I think it should work okay if there are names in the question, as long as the start of the question is the first word in the sentence starting with an upper-case letter.

    – tobias_k
    Jul 6 at 11:06











  • Ok I think I understand it thanks very much for your effort and time :)

    – LinkCoder
    Jul 6 at 11:11






















































