Probability of seeing a bird on a certain date based on historical notes
$begingroup$
I have a database filled with different bird species that were seen on different dates (10 years of records).
Each row in the table contains:
Date, Time, Bird Species, Spot where it was seen
So it looks like:
2008-04-07, 14:22:48, Himalayan Snowcock, Spot-4
2008-04-19, 11:44:01, Ring-necked Pheasant, Spot-12
...
2019-05-20, 08:51:14, American Kestrel, Spot-8
It contains thousands of records like this.
Now I need to create a calendar of Birds × Months in which each cell contains the probability of seeing that bird in that month.
How can I compute the probability based on the records I have?
probability
$endgroup$
edited May 23 at 17:59
whuber♦
asked May 21 at 3:19
Stephen H. Anderson
2 Answers
$begingroup$
You could potentially use a Poisson regression to model monthly counts of birds of a particular species as a function of explanatory/independent variables that you think affect the monthly rate of observations, such as linear time trends/calendar month/season/year (potentially supplementing this data with external data sources on things like weather).
After estimating the regression, you could then compute probabilities. As the Poisson is a discrete probability distribution, you can calculate the probability of observing exactly X events (for whatever count X you are interested in) at specified values of the predictors you used in the regression. If you wanted to know the probability of seeing at least one bird, you can compute it as 1 minus the probability of 0 birds seen. Here is another CV Q&A on predicting probabilities with Poisson regression: Poisson Regression : expectation vs probability for each outcome
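As a minimal sketch of the probability step: if calendar month is the only (categorical) predictor, the fitted Poisson mean for a month is simply the average count observed in that month across years, so no regression software is needed for this special case. The yearly counts below are made up for illustration, and `poisson_pmf` and `prob_at_least_one` are hypothetical helper names.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k events) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def prob_at_least_one(lam):
    """P(one or more events) = 1 - P(zero events)."""
    return 1.0 - poisson_pmf(0, lam)

# Hypothetical April counts of one species, one value per year of records.
april_counts = [0, 2, 1, 0, 3, 1, 0, 2, 1, 0]

# With month as the only predictor, the fitted mean is the sample mean.
lam_april = sum(april_counts) / len(april_counts)

p_seen = prob_at_least_one(lam_april)
print(f"mean count = {lam_april}, P(at least one) = {p_seen:.3f}")
```

With more predictors (trend, weather and so on) the mean would instead come from the fitted model's prediction for the chosen predictor values, and the same two functions would apply to it.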
For this type of regression, you would normally be assuming that counts in consecutive months are not correlated, meaning that you do not have autocorrelation. If that assumption is not valid, which it probably is not, given that you have time series data, you can examine this Q&A for potential approaches: Poisson regression with (auto-correlated) time series
Poisson regression has a couple of well-known relatives: Negative Binomial and Zero-Inflated Poisson (also zero-truncated Poisson, but that one should not apply here, as you should have a non-zero probability of 0 birds observed). Negative Binomial is needed when the data are overdispersed, that is, when the variance of the counts exceeds their mean, whereas the Poisson assumes the two are equal. Zero-Inflated Poisson is used when there are "excess" zeros in the data that are generated by a separate process.
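A rough way to see whether either relative might be worth considering is to compare the raw counts with what a plain Poisson would imply. The sketch below is only a descriptive check on made-up counts, not a formal test, and it ignores predictors; `dispersion_ratio` and `zero_summary` are hypothetical helper names.

```python
import math
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson-like data."""
    return variance(counts) / mean(counts)

def zero_summary(counts):
    """Return (observed share of zeros, share a Poisson with the same mean implies)."""
    observed = counts.count(0) / len(counts)
    expected = math.exp(-mean(counts))
    return observed, expected

# Hypothetical counts for one species in one calendar month over 10 years.
counts = [0, 0, 0, 0, 0, 0, 7, 0, 9, 4]

ratio = dispersion_ratio(counts)
observed_zeros, expected_zeros = zero_summary(counts)
print(f"variance/mean = {ratio:.2f}")
print(f"zeros: observed {observed_zeros:.2f}, Poisson implies {expected_zeros:.2f}")
```

A ratio well above 1 hints at overdispersion, and many more zeros than the Poisson implies hints at zero inflation; in practice both would be judged after conditioning on the predictors in the model.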
$endgroup$
$begingroup$
A simple approach would be to create a two-way contingency table (Bird × Month) and then, for each unique combination of bird and month, divide the number of years in which that species was seen in that month by 10, which would give you a proportion. However, this approach throws away some information.
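A minimal sketch of this first approach, assuming the records are available as (date, time, species, spot) tuples. The first, second and last rows come from the question; the other two are invented for illustration, and `proportion_of_years` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date

# Sample rows in the question's layout: (date, time, species, spot).
# The two 2009 rows are invented for illustration.
records = [
    ("2008-04-07", "14:22:48", "Himalayan Snowcock", "Spot-4"),
    ("2008-04-19", "11:44:01", "Ring-necked Pheasant", "Spot-12"),
    ("2009-04-02", "09:10:00", "Himalayan Snowcock", "Spot-4"),
    ("2009-04-03", "09:15:00", "Himalayan Snowcock", "Spot-4"),
    ("2019-05-20", "08:51:14", "American Kestrel", "Spot-8"),
]

def proportion_of_years(records, n_years):
    """Map (species, month) -> share of the n_years in which it was seen."""
    years_seen = defaultdict(set)
    for day, _time, species, _spot in records:
        d = date.fromisoformat(day)
        years_seen[(species, d.month)].add(d.year)
    return {key: len(years) / n_years for key, years in years_seen.items()}

# n_years=10 follows the question's "10 years of records".
table = proportion_of_years(records, n_years=10)
print(table[("Himalayan Snowcock", 4)])
```

Bird and month combinations that never occur are simply absent from the dictionary and would show as 0 in the calendar. Note that the two sightings in April 2009 count only once, which is the information this approach throws away.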
If you wanted to incorporate the fact that some birds are much more common than others and that different months are of different lengths, you could use the number of sightings divided by the number of days (probably accounting for leap years). This would capture the rarity of birds as well as make use of the fact that your information is at a finer-than-month resolution. If data weren't collected on a daily basis, you could instead divide by the number of days on which there was an observer.
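A sketch of the per-day version, under one loud assumption: since the table only contains sightings, a date is treated as an observer-day if any record at all exists for it. Months are pooled across years, the sample rows are partly invented, and `per_observer_day` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date

# (date, time, species, spot); only some rows are from the question.
records = [
    ("2008-04-07", "14:22:48", "Himalayan Snowcock", "Spot-4"),
    ("2008-04-19", "11:44:01", "Ring-necked Pheasant", "Spot-12"),
    ("2009-04-02", "09:10:00", "Himalayan Snowcock", "Spot-4"),
    ("2009-04-02", "16:30:00", "Himalayan Snowcock", "Spot-7"),
    ("2009-04-03", "10:05:00", "Ring-necked Pheasant", "Spot-12"),
    ("2019-05-20", "08:51:14", "American Kestrel", "Spot-8"),
]

def per_observer_day(records, distinct_days=False):
    """Map (species, month) -> sightings per observer-day.

    Assumption: a date is an observer-day if it has at least one record.
    With distinct_days=True, each date counts at most once per species,
    which gives a share of observer-days between 0 and 1.
    """
    observer_days = defaultdict(set)  # month -> dates with any record
    sighting_dates = defaultdict(list)  # (species, month) -> dates seen
    for day, _time, species, _spot in records:
        d = date.fromisoformat(day)
        observer_days[d.month].add(d)
        sighting_dates[(species, d.month)].append(d)
    result = {}
    for (species, month), dates in sighting_dates.items():
        count = len(set(dates)) if distinct_days else len(dates)
        result[(species, month)] = count / len(observer_days[month])
    return result

rates = per_observer_day(records)
shares = per_observer_day(records, distinct_days=True)
print(rates[("Himalayan Snowcock", 4)], shares[("Himalayan Snowcock", 4)])
```

The plain ratio is a rate and can exceed 1 when a species is recorded several times on the same day; the `distinct_days=True` variant stays between 0 and 1 and may be easier to read as a probability of seeing the bird on a day when someone is looking.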
You could go much more complex and employ something like a time-series model to account for changes in detectability through time, but that is probably a question in itself and would be much more challenging to carry out, at least for me.
$endgroup$
answered May 21 at 4:24
André.B
$begingroup$
Hi Andre, thanks for your answer. I'm a bit confused about option 2, which I think may work best. The only "problem" I can think of, and which I'm not sure how to handle, is that in a certain month, let's say April 2018, birdwatchers may have visited the same spot on many days. So a certain bird which lives there will always be seen on those dates. That would increase the number of times the bird was seen, but it would not mean that the bird is easier to see than others.
$endgroup$
– Stephen H. Anderson
May 21 at 11:37
$begingroup$
It would just mean that the spot was visited more times. Is there a way to take that into consideration? I didn't mention that for each row of data I also have the ID (a number) identifying the exact spot where it was seen. Something like: 2019-05-20, 08:51:14, American Kestrel, 82 (82 being the spot)
$endgroup$
– Stephen H. Anderson
May 21 at 11:37
$begingroup$
That does complicate the problem, but I am not sure there is a good way to tie that in, as there is no way to determine if it was the same bird. If you wanted to take a conservative approach, you could remove records where the same bird was seen at the same spot multiple times in a given month. This would probably push the probabilities below what you could actually expect, though.
$endgroup$
– André.B
May 21 at 21:15
$begingroup$
Alternatively, and this would be a fair bit of work, you could look up information about the behavioural ecology of each bird species with duplicate records at a given site/month and then decide whether or not to remove duplicates based on known information about the bird's home range and nesting behaviour. For instance, if the American Kestrel was seen repeatedly in prime nesting habitat, then perhaps remove the duplicates. Conversely, if it was seen during what would normally be migratory conditions, or in a habitat that it doesn't frequent, you could keep them.
$endgroup$
– André.B
May 21 at 21:22
$begingroup$
Hi Andre, I was thinking of something similar to your first option. Something like: the first time a bird is seen at a spot it counts as 1, but after that, every extra time the bird is seen at the same spot in the same month counts as 0.25 (or some other value?) instead of 1. Do you think this is an improvement and would give better results? The problem with your last suggestion is that the number of bird species is over 1000 (different countries) and we don't even know the biology.
$endgroup$
– Stephen H. Anderson
May 22 at 11:51
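The down-weighting idea from this comment can be sketched as follows. "Same month" is assumed here to mean the same calendar month of the same year, the sample rows are invented apart from the first, and `weighted_sightings` and `repeat_weight` are hypothetical names. Setting `repeat_weight=0` reproduces the conservative option of dropping repeat records at the same spot.

```python
from collections import defaultdict
from datetime import date

# (date, time, species, spot ID); only the first row is from the comments.
records = [
    ("2019-05-20", "08:51:14", "American Kestrel", "82"),
    ("2019-05-21", "09:02:00", "American Kestrel", "82"),
    ("2019-05-23", "07:45:10", "American Kestrel", "82"),
    ("2019-05-22", "10:15:30", "American Kestrel", "8"),
]

def weighted_sightings(records, repeat_weight=0.25):
    """Map (species, year, month) -> weighted sighting count.

    The first sighting of a species at a spot in a given year and month
    counts 1; every further sighting at that spot counts repeat_weight.
    """
    already_seen = set()  # (species, spot, year, month) combinations
    totals = defaultdict(float)
    for day, _time, species, spot in records:
        d = date.fromisoformat(day)
        key = (species, spot, d.year, d.month)
        totals[(species, d.year, d.month)] += (
            repeat_weight if key in already_seen else 1.0
        )
        already_seen.add(key)
    return dict(totals)

weighted = weighted_sightings(records)
deduplicated = weighted_sightings(records, repeat_weight=0.0)
print(weighted, deduplicated)
```

The choice of 0.25 is arbitrary, as the comment itself notes, so it may be worth checking how much the final calendar changes across a few values of `repeat_weight` before settling on one.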
|
show 5 more comments
$begingroup$
Hi Andre, thanks for your answer. I'm a bit confused about option 2 which I think may work best. The only "problem" I'm thinking and I'm not sure how to face is that, for example, on a certain month, let's say april 2018 birdwatchers may have been visiting the same spot many days. So a certain bird which lives there will always be seen on those dates. So that would increase the times the bird was seen but would not mean that bird is easier to see than others.
$endgroup$
– Stephen H. Anderson
May 21 at 11:37
$begingroup$
It would just mean that the spot was visited more times. Is there a way to have that into consideration? I didn't mention that for each row of data I also have the ID (a number) identifying the exact spot where it was seen. Something like: 2019-05-20, 08:51:14, American Kestrel, 82 (82 being the spot)
$endgroup$
– Stephen H. Anderson
May 21 at 11:37
$begingroup$
That does complicate the problem but I am not sure there is a good way to tie that in, as there is no way to determine if it was the same bird. If you wanted to take a conservative approach you could remove records where the same bird was seen at the same spot multiple times in a given month. This would reduce the probabilities probably past what you could actually expect though.
$endgroup$
– André.B
May 21 at 21:15
$begingroup$
Alternatively, and this would be a fair bit of work, you could look up information about the behavioural ecology of each bird species with duplicate records at a given site/month and then make a decision as to whether or not to remove duplicates based on known information about the bird's home-range and nesting behaviour. For instance, if the american kestrel was seen repeatedly in prime nesting habitat, then perhaps remove them. Conversely, if it was seen during what would normally be migratory conditions or in a habitat that they don't frequent you could keep it.
$endgroup$
– André.B
May 21 at 21:22
$begingroup$
Hi Andre, I was thinking something similar to your first option. Something like. First time a bird is seen on a spot it's counted as 1 but after that, every extra time the bird is seen at the same spot in the same month it counts as 0,25 (or some other value?) instread of 1. Do you think this is an improvement and would give better results? Problem about your last suggestion is the number of bird species is over 1000 (different countries) and we don't even know the biology.
$endgroup$
– Stephen H. Anderson
May 22 at 11:51
$begingroup$
Hi Andre, thanks for your answer. I'm a bit confused about option 2 which I think may work best. The only "problem" I'm thinking and I'm not sure how to face is that, for example, on a certain month, let's say april 2018 birdwatchers may have been visiting the same spot many days. So a certain bird which lives there will always be seen on those dates. So that would increase the times the bird was seen but would not mean that bird is easier to see than others.
$endgroup$
– Stephen H. Anderson
May 21 at 11:37
$begingroup$
Hi Andre, thanks for your answer. I'm a bit confused about option 2 which I think may work best. The only "problem" I'm thinking and I'm not sure how to face is that, for example, on a certain month, let's say april 2018 birdwatchers may have been visiting the same spot many days. So a certain bird which lives there will always be seen on those dates. So that would increase the times the bird was seen but would not mean that bird is easier to see than others.
$endgroup$
– Stephen H. Anderson
May 21 at 11:37
$begingroup$
It would just mean that the spot was visited more times. Is there a way to have that into consideration? I didn't mention that for each row of data I also have the ID (a number) identifying the exact spot where it was seen. Something like: 2019-05-20, 08:51:14, American Kestrel, 82 (82 being the spot)
$endgroup$
– Stephen H. Anderson
May 21 at 11:37
$begingroup$
It would just mean that the spot was visited more times. Is there a way to have that into consideration? I didn't mention that for each row of data I also have the ID (a number) identifying the exact spot where it was seen. Something like: 2019-05-20, 08:51:14, American Kestrel, 82 (82 being the spot)
$endgroup$
– Stephen H. Anderson
May 21 at 11:37
$begingroup$
That does complicate the problem but I am not sure there is a good way to tie that in, as there is no way to determine if it was the same bird. If you wanted to take a conservative approach you could remove records where the same bird was seen at the same spot multiple times in a given month. This would reduce the probabilities probably past what you could actually expect though.
$endgroup$
– André.B
May 21 at 21:15
$begingroup$
That does complicate the problem but I am not sure there is a good way to tie that in, as there is no way to determine if it was the same bird. If you wanted to take a conservative approach you could remove records where the same bird was seen at the same spot multiple times in a given month. This would reduce the probabilities probably past what you could actually expect though.
$endgroup$
– André.B
May 21 at 21:15
$begingroup$
Alternatively, and this would be a fair bit of work, you could look up information about the behavioural ecology of each bird species with duplicate records at a given site/month and then make a decision as to whether or not to remove duplicates based on known information about the bird's home-range and nesting behaviour. For instance, if the american kestrel was seen repeatedly in prime nesting habitat, then perhaps remove them. Conversely, if it was seen during what would normally be migratory conditions or in a habitat that they don't frequent you could keep it.
$endgroup$
– André.B
May 21 at 21:22
$begingroup$
Hi Andre, I was thinking of something similar to your first option: the first time a bird is seen at a spot it counts as 1, but after that, every extra time the bird is seen at the same spot in the same month it counts as 0.25 (or some other value?) instead of 1. Do you think this is an improvement and would give better results? The problem with your last suggestion is that the number of bird species is over 1000 (different countries) and we don't even know the biology.
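A minimal sketch of that weighting idea, assuming rows in the format shown earlier (date, time, species, spot ID). The rows, the function name `weighted_monthly_counts`, and the 0.25 weight are placeholders for illustration, not values derived from any data. Note that the totals are no longer integer counts once a fractional weight is used.

```python
from collections import defaultdict

# Hypothetical rows in the format described above: date, time, species, spot ID.
rows = [
    "2019-05-20, 08:51:14, American Kestrel, 82",
    "2019-05-21, 09:10:02, American Kestrel, 82",
    "2019-05-25, 07:45:30, American Kestrel, 82",
    "2019-05-22, 10:05:00, American Kestrel, 17",
]

REPEAT_WEIGHT = 0.25  # assumed weight for repeat sightings; an arbitrary choice


def weighted_monthly_counts(lines, repeat_weight=REPEAT_WEIGHT):
    """Sum sightings per (species, month): 1 for the first record at a spot
    in a month, repeat_weight for each later record at that same spot."""
    seen = set()                 # (species, spot, month) combinations already counted
    totals = defaultdict(float)  # (species, month) -> weighted count
    for line in lines:
        date, _time, species, spot = [field.strip() for field in line.split(",")]
        month = date[:7]         # "YYYY-MM"
        key = (species, spot, month)
        totals[(species, month)] += repeat_weight if key in seen else 1.0
        seen.add(key)
    return dict(totals)


print(weighted_monthly_counts(rows))
```

Setting `repeat_weight=0` reproduces the conservative option of dropping repeat records entirely, and `repeat_weight=1` reproduces the raw counts.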
$endgroup$
– Stephen H. Anderson
May 22 at 11:51
$begingroup$
You could potentially use a Poisson regression to model monthly counts of birds of a particular species as a function of explanatory/independent variables that you think affect the monthly rate of observations, such as a linear time trend, calendar month, season, or year (potentially supplementing this data with external data sources on things like weather).
After estimating the regression, you could then compute probabilities. As Poisson is a discrete probability distribution, you can calculate the probability of observing exactly X events (for whatever count X you choose) at specified values of the predictors used in the regression. If you wanted to know the probability of seeing at least one bird, you can compute it as 1 minus the probability of 0 birds seen. Here is another CV Q&A on predicting probabilities with Poisson regression: Poisson Regression : expectation vs probability for each outcome
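As a rough illustration of that step: if calendar month were the only predictor, entered as a categorical variable, the fitted Poisson rate for a month is simply the mean of the historical counts for that month, and the probability of at least one sighting is 1 - exp(-rate). The counts and helper names below are made up for the sketch; a model with more predictors would need a proper GLM routine.

```python
import math
from statistics import mean

# Hypothetical counts of one species seen in May, one value per year of records.
may_counts = [3, 0, 2, 1, 4]


def poisson_pmf(k: int, rate: float) -> float:
    """P(X = k) for a Poisson variable with the given mean rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def prob_at_least_one(rate: float) -> float:
    """P(X >= 1) = 1 - P(X = 0) = 1 - exp(-rate)."""
    return 1.0 - poisson_pmf(0, rate)


# With calendar month as the only (categorical) predictor, the Poisson
# maximum-likelihood rate for May is the sample mean of the May counts.
may_rate = mean(may_counts)

print(f"estimated May rate: {may_rate}")
print(f"P(at least one sighting in May): {prob_at_least_one(may_rate):.3f}")
```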
For this type of regression, you would normally be assuming that counts in consecutive months are not correlated, meaning that you do not have autocorrelation. If that assumption is not valid, which it probably is not, given that you have time series data, you can examine this Q&A for potential approaches: Poisson regression with (auto-correlated) time series
Poisson regression has a couple of well-known relatives: Negative Binomial and Zero-Inflated Poisson (also zero-truncated Poisson, but that one should not apply here, as you should have a non-zero probability of 0 birds observed). Negative Binomial is needed when the data are overdispersed, that is, when the variance of the counts exceeds the mean in a given month (a Poisson model assumes the two are equal). Zero-Inflated Poisson is used when there are "excess" zeros in the data generated by a separate process.
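A quick first look at whether one of these relatives might be needed could go something like the sketch below. It compares the sample variance with the sample mean and the observed share of zeros with the share a Poisson model with the same mean would predict. The counts and function names are invented for illustration, and this marginal check ignores the predictors, so treat it only as a rough screen rather than a formal test.

```python
import math
from statistics import mean, variance

# Hypothetical monthly counts for one species (not real data).
counts = [0, 0, 5, 1, 0, 12, 0, 3, 0, 0, 7, 2]


def dispersion_ratio(values):
    """Sample variance divided by sample mean; roughly 1 under a Poisson model."""
    return variance(values) / mean(values)


def zero_fraction(values):
    """Observed share of months with no sightings."""
    return sum(1 for v in values if v == 0) / len(values)


ratio = dispersion_ratio(counts)
observed_zeros = zero_fraction(counts)
expected_zeros = math.exp(-mean(counts))  # P(X = 0) for a Poisson with the same mean

print(f"variance / mean: {ratio:.2f}")
print(f"observed zero share: {observed_zeros:.2f}, Poisson-expected: {expected_zeros:.2f}")
```

A ratio well above 1 points towards Negative Binomial; an observed zero share far above the Poisson-expected one points towards a zero-inflated model.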
$endgroup$
answered May 21 at 7:10
AlexK