Probability of seeing a bird on a certain date based on historical notes

Related questions:
• Poisson Regression : expectation vs probability for each outcome
• Poisson regression with (auto-correlated) time series
• Probability that a random sample will have certain elements
• Probability of a certain run of dice rolls
• Probability of seeing k faces that appear more than 3 times when rolling 10 dice
• Probability of seeing observation > x based on historical observations
• Estimating probability of observing greater than X events based on a current population and historical rates
• Probability in poker based on previous outcomes
• Probability based on average/historical data
• Conditional Probability - Drawing a Certain Color
• Probability of completing a task along certain path in a certain time using beta distribution
• Calculate probability of bird in the cage



I have a database filled with different bird species that were seen on different dates (10 years of records).

Each row in the table contains:

Date, Time, Bird Species, Spot where it was seen

So it looks like:

2008-04-07, 14:22:48, Himalayan Snowcock, Spot-4
2008-04-19, 11:44:01, Ring-necked Pheasant, Spot-12
...
2019-05-20, 08:51:14, American Kestrel, Spot-8

It contains thousands of records like this.

Now I need to create a calendar of Birds × Months in which each cell contains the probability of seeing that bird in that month.

How can I compute this probability from the records I have?










Tags: probability

asked May 21 at 3:19 by Stephen H. Anderson; edited May 23 at 17:59 by whuber




2 Answers

1. A simple approach would be to create a two-way contingency table (Bird × Month, i.e. one row per species and one column per calendar month) and then, for each combination of bird and month, divide the number of years in which that species was seen in that month by the number of years of records (10), which gives you a proportion. However, this approach throws away some information: it only records whether a species was seen at all in a given month of a given year, not how often.

2. If you wanted to incorporate the fact that some birds are much more common than others and that months have different lengths, you could use the number of sightings divided by the number of days in that month across all years (probably accounting for leap years). This would capture the rarity of each bird and make use of the fact that your records have finer-than-monthly resolution. If data were not collected every day, you could instead divide by the number of days on which an observer was present.

3. You could do something much more complex and employ something like a time-series model to account for changes in detectability through time. That is probably a question in itself and would be much more challenging to carry out, at least for me.
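Options 1 and 2 above can be sketched in Python as follows. This is a minimal sketch assuming the records have already been parsed into (date, species) pairs; the records are made up for illustration and `YEARS` is an assumed 10-year span, so adjust both to your data. Note that option 2 yields a rate (sightings per day) rather than a probability, so it can exceed 1 if a species is logged several times a day.

```python
import calendar
from collections import defaultdict
from datetime import date

# Made-up example records: (date, species). Time and spot are ignored here.
records = [
    (date(2010, 4, 7), "Himalayan Snowcock"),
    (date(2010, 4, 19), "Ring-necked Pheasant"),
    (date(2011, 4, 2), "Himalayan Snowcock"),
    (date(2011, 4, 3), "Himalayan Snowcock"),
    (date(2019, 5, 20), "American Kestrel"),
]
YEARS = range(2010, 2020)  # assumed 10-year span of the records


def share_of_years_seen(records, years):
    """Option 1: fraction of years in which the species was seen at least once in that month."""
    years_seen = defaultdict(set)  # (species, month) -> years with >= 1 sighting
    for d, species in records:
        years_seen[(species, d.month)].add(d.year)
    return {key: len(ys) / len(years) for key, ys in years_seen.items()}


def sightings_per_day(records, years):
    """Option 2: total sightings in that calendar month divided by the total
    number of days in that month over all years (leap-year Februaries included)."""
    counts = defaultdict(int)
    for d, species in records:
        counts[(species, d.month)] += 1
    days_in_month = {
        m: sum(calendar.monthrange(y, m)[1] for y in years) for m in range(1, 13)
    }
    return {key: n / days_in_month[key[1]] for key, n in counts.items()}


if __name__ == "__main__":
    for (species, month), p in sorted(share_of_years_seen(records, YEARS).items()):
        print(f"{species:22s} {calendar.month_abbr[month]}  {p:.2f}")
```

Species/month combinations that never appear in the records are simply absent from the result, which corresponds to an empty (zero) cell in the calendar.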






answered May 21 at 4:24 by André.B
• Hi Andre, thanks for your answer. I'm a bit confused about option 2, which I think may work best. The one "problem" I can see, and I'm not sure how to handle it, is that in a given month, say April 2018, birdwatchers may have visited the same spot on many days. A bird that lives there will then be seen on all of those dates. That would increase the number of times the bird was seen, but it would not mean that the bird is easier to see than others.
  – Stephen H. Anderson, May 21 at 11:37










• It would just mean that the spot was visited more often. Is there a way to take that into account? I didn't mention that each row of data also has an ID (a number) identifying the exact spot where the bird was seen, something like: 2019-05-20, 08:51:14, American Kestrel, 82 (82 being the spot).
  – Stephen H. Anderson, May 21 at 11:37










• That does complicate the problem, but I am not sure there is a good way to account for it, as there is no way to determine whether it was the same individual bird. If you wanted to take a conservative approach, you could remove duplicate records, keeping only one record per species, spot and month. This would probably reduce the probabilities below what you could actually expect, though.
  – André.B, May 21 at 21:15











• Alternatively, and this would be a fair bit of work, you could look up the behavioural ecology of each bird species with duplicate records at a given site/month and then decide whether or not to remove the duplicates based on what is known about that species' home range and nesting behaviour. For instance, if the American kestrel was seen repeatedly in prime nesting habitat, then perhaps remove the duplicates. Conversely, if it was seen during what would normally be migration or in a habitat it doesn't usually frequent, you could keep them.
  – André.B, May 21 at 21:22










• Hi Andre, I was thinking of something similar to your first option: the first time a bird is seen at a spot it counts as 1, but after that, every additional time the bird is seen at the same spot in the same month it counts as 0.25 (or some other value?) instead of 1. Do you think this is an improvement and would give better results? The problem with your last suggestion is that there are over 1000 bird species (from different countries), and we don't even know their biology.
  – Stephen H. Anderson, May 22 at 11:51
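Both ideas from this comment thread (keeping one record per species, spot and month, and down-weighting repeats by a factor such as 0.25) can be expressed as one weighting rule. Below is a minimal Python sketch; it assumes each record has been parsed into (date, species, spot_id), interprets "the same month" as the same year-month (e.g. April 2018), and uses made-up records. The weight itself is a heuristic judgement call, not something estimated from the data.

```python
from collections import defaultdict
from datetime import date

# Made-up example records: (date, species, spot_id).
records = [
    (date(2018, 4, 1), "American Kestrel", 82),
    (date(2018, 4, 2), "American Kestrel", 82),
    (date(2018, 4, 3), "American Kestrel", 82),
    (date(2018, 4, 5), "American Kestrel", 17),
    (date(2019, 4, 9), "American Kestrel", 82),
]


def weighted_counts(records, repeat_weight=0.25):
    """Sightings per (species, calendar month), with repeats down-weighted.

    The first sighting of a species at a spot within one year-month counts 1;
    each further sighting of that species at the same spot in the same
    year-month counts `repeat_weight` instead.
      repeat_weight=0 -> keep one record per species/spot/month (conservative)
      repeat_weight=1 -> raw counts (no correction)
    """
    seen = set()  # (species, spot, year, month) combinations already counted
    totals = defaultdict(float)
    for d, species, spot in records:
        key = (species, spot, d.year, d.month)
        totals[(species, d.month)] += repeat_weight if key in seen else 1.0
        seen.add(key)
    return dict(totals)


if __name__ == "__main__":
    for w in (0.0, 0.25, 1.0):
        print(w, weighted_counts(records, w))
```

The weighted totals can then replace the raw sighting counts in option 2 (dividing by the number of days), so that a heavily visited spot contributes less per extra visit.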



















You could potentially use Poisson regression to model the monthly counts of sightings of a particular species as a function of explanatory (independent) variables that you think affect the monthly rate of observations, such as a linear time trend, calendar month, season or year (potentially supplementing these with external data sources, e.g. on weather).
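In the simplest case, where calendar month is the only predictor (one indicator per month, log link), the Poisson regression estimate has a closed form: setting the score equations to zero shows that each month's fitted rate is just the mean count for that month. A minimal Python sketch of that special case follows. The counts are made up, and it assumes that months with zero sightings are included, since leaving them out would inflate the rates.

```python
from collections import defaultdict


def fit_month_only_poisson(monthly_counts):
    """Poisson regression of monthly counts on calendar month alone.

    The model is log(lambda) = beta_month, with one indicator per calendar
    month. Its maximum-likelihood rate for each month is the mean count
    observed in that month.

    monthly_counts maps (year, month) -> number of sightings of ONE species.
    Months in which the species was not seen must be present with count 0.
    """
    totals = defaultdict(float)
    n_obs = defaultdict(int)
    for (_year, month), count in monthly_counts.items():
        totals[month] += count
        n_obs[month] += 1
    return {month: totals[month] / n_obs[month] for month in totals}


# Made-up counts for one species: April and May over three years.
counts = {
    (2016, 4): 0, (2017, 4): 2, (2018, 4): 4,
    (2016, 5): 1, (2017, 5): 1, (2018, 5): 1,
}
rates = fit_month_only_poisson(counts)
print(rates)
```

Once a time trend, season or weather covariates are added there is no closed form, and the model would be fitted with a GLM routine, for example `statsmodels.api.GLM(y, X, family=statsmodels.api.families.Poisson())` in Python or `glm(..., family = poisson)` in R.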



After estimating the regression, you can compute probabilities. Because the Poisson is a discrete probability distribution, you can calculate the probability of observing X sightings (for whatever count X you are interested in) at specified values of the predictors used in the regression. If you want the probability of seeing at least one bird, compute it as 1 minus the probability of seeing none. Here is another CV Q&A on predicting probabilities with Poisson regression: Poisson Regression : expectation vs probability for each outcome
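The probability calculations described above can be sketched in a few lines of Python. The fitted rate `lam_april` is a made-up value standing in for the regression's predicted mean count for one species in one month; the probability of at least one sighting is the number that would go into the corresponding calendar cell.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_at_least_one(lam):
    """P(X >= 1) = 1 - P(X = 0) = 1 - exp(-lam).

    -expm1(-lam) computes the same quantity but stays accurate for small lam.
    """
    return -math.expm1(-lam)


# Hypothetical fitted rate: 0.7 expected sightings of a species in April.
lam_april = 0.7
p_none = poisson_pmf(0, lam_april)
p_some = prob_at_least_one(lam_april)
print(f"P(no sightings) = {p_none:.3f}, P(at least one) = {p_some:.3f}")
```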



This type of regression normally assumes that counts in consecutive months are uncorrelated, i.e. that there is no autocorrelation. If that assumption does not hold, which is likely given that you have time series data, this Q&A describes some potential approaches: Poisson regression with (auto-correlated) time series
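One rough way to check the no-autocorrelation assumption is to compute Pearson residuals in time order and look at their lag-1 autocorrelation. The Python sketch below assumes the month-only model (each month's fitted rate is that calendar month's mean count) and uses a made-up series. Values far from 0 suggest autocorrelation; a common rule of thumb for white noise is roughly ±2/√n for n months. This is a quick diagnostic only, not one of the modelling approaches in the linked Q&A.

```python
import math
from collections import defaultdict


def lag1_residual_autocorrelation(monthly_counts):
    """Lag-1 autocorrelation of Pearson residuals from a month-only Poisson fit.

    monthly_counts maps (year, month) -> count for ONE species and should cover
    consecutive months, with zeros included. Each month's fitted rate is the
    mean count for that calendar month; residuals are (y - rate) / sqrt(rate).
    """
    totals, n_obs = defaultdict(float), defaultdict(int)
    for (_year, month), y in monthly_counts.items():
        totals[month] += y
        n_obs[month] += 1
    rate = {m: totals[m] / n_obs[m] for m in totals}

    # Sorting the (year, month) keys puts the residuals in time order.
    resid = [
        (y - rate[m]) / math.sqrt(rate[m]) if rate[m] > 0 else 0.0
        for (_year, m), y in sorted(monthly_counts.items())
    ]
    mean = sum(resid) / len(resid)
    num = sum((a - mean) * (b - mean) for a, b in zip(resid, resid[1:]))
    den = sum((r - mean) ** 2 for r in resid)
    return num / den if den > 0 else 0.0


# Made-up series: every month of 2010 has 0 sightings, every month of 2011 has 2.
counts = {(y, m): (0 if y == 2010 else 2) for y in (2010, 2011) for m in range(1, 13)}
r1 = lag1_residual_autocorrelation(counts)
print(round(r1, 3), "vs. rough white-noise band", round(2 / math.sqrt(len(counts)), 3))
```

The made-up series has a whole year of low counts followed by a whole year of high counts, which the month-only model cannot explain, so consecutive residuals share the same sign and the autocorrelation comes out strongly positive.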



Poisson regression has a couple of well-known relatives: the negative binomial and the zero-inflated Poisson (there is also the zero-truncated Poisson, but that one should not apply here, because there should be a non-zero probability of observing 0 birds). The negative binomial is needed when the variance of the counts in a given month exceeds their mean (overdispersion). The zero-inflated Poisson is used when there are "excess" zeros in the data generated by a separate process.
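A rough per-month screen for the two situations described above, as a Python sketch: a variance-to-mean ratio well above 1 hints at overdispersion (where the negative binomial may fit better), and an observed share of zero months well above the Poisson prediction exp(−mean) hints at excess zeros. It ignores every covariate other than calendar month, so treat it as a first look rather than a formal test; the counts are made up.

```python
import math
from collections import defaultdict


def dispersion_and_zeros(monthly_counts):
    """Per calendar month: variance/mean ratio and observed vs. Poisson zero share.

    monthly_counts maps (year, month) -> count for ONE species, zeros included.
    """
    by_month = defaultdict(list)
    for (_year, month), y in monthly_counts.items():
        by_month[month].append(y)
    out = {}
    for m, ys in sorted(by_month.items()):
        n = len(ys)
        mean = sum(ys) / n
        # Sample variance (n - 1 denominator); undefined for a single year.
        var = sum((y - mean) ** 2 for y in ys) / (n - 1) if n > 1 else float("nan")
        out[m] = {
            "dispersion": var / mean if mean > 0 else float("nan"),
            "observed_zero_share": sum(1 for y in ys if y == 0) / n,
            "poisson_zero_share": math.exp(-mean),
        }
    return out


# Made-up counts: April is bursty (mostly zeros, one big year), May is steady.
counts = {
    (2010, 4): 0, (2011, 4): 0, (2012, 4): 0, (2013, 4): 8,
    (2010, 5): 1, (2011, 5): 1, (2012, 5): 1, (2013, 5): 1,
}
summary = dispersion_and_zeros(counts)
for month, stats in summary.items():
    print(month, {k: round(v, 3) for k, v in stats.items()})
```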






          share|cite|improve this answer









          $endgroup$













            Your Answer








            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "65"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader:
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            ,
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );






            Stephen H. Anderson is a new contributor. Be nice, and check out our Code of Conduct.









            draft saved

            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f409332%2fprobability-of-seeing-a-bird-on-a-certain-date-based-on-historical-notes%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            2












            $begingroup$

            1. A simple approach would be to create a 2 × 2 contingency table (Bird × Month) and then for each unique combination of bird and month divide the number of years where that species was seen in that month by 10, which would give you a proportion. However, this approach throws away some information


            2. If you wanted to incorporate the fact that some birds are much more common than others and that different months are of different lengths you could use the # of sightings / the number of days (accounting for leap years probably). This would capture the rarity of birds as well as make use of the fact that your information is at a finer than month resolution. If data wasn't collected on a daily basis you could instead divide by the number of days were there was an observer.


            3. You do much more complex and employ something like a time-series model to account for changes in detectability through time.... that is probably a question in itself and would be much more challenging to carry out, at least for me.






            share|cite|improve this answer









            $endgroup$












            • $begingroup$
              Hi Andre, thanks for your answer. I'm a bit confused about option 2 which I think may work best. The only "problem" I'm thinking and I'm not sure how to face is that, for example, on a certain month, let's say april 2018 birdwatchers may have been visiting the same spot many days. So a certain bird which lives there will always be seen on those dates. So that would increase the times the bird was seen but would not mean that bird is easier to see than others.
              $endgroup$
              – Stephen H. Anderson
              May 21 at 11:37










            • $begingroup$
              It would just mean that the spot was visited more times. Is there a way to have that into consideration? I didn't mention that for each row of data I also have the ID (a number) identifying the exact spot where it was seen. Something like: 2019-05-20, 08:51:14, American Kestrel, 82 (82 being the spot)
              $endgroup$
              – Stephen H. Anderson
              May 21 at 11:37










            • $begingroup$
              That does complicate the problem but I am not sure there is a good way to tie that in, as there is no way to determine if it was the same bird. If you wanted to take a conservative approach you could remove records where the same bird was seen at the same spot multiple times in a given month. This would reduce the probabilities probably past what you could actually expect though.
              $endgroup$
              – André.B
              May 21 at 21:15











            • $begingroup$
              Alternatively, and this would be a fair bit of work, you could look up information about the behavioural ecology of each bird species with duplicate records at a given site/month and then make a decision as to whether or not to remove duplicates based on known information about the bird's home-range and nesting behaviour. For instance, if the american kestrel was seen repeatedly in prime nesting habitat, then perhaps remove them. Conversely, if it was seen during what would normally be migratory conditions or in a habitat that they don't frequent you could keep it.
              $endgroup$
              – André.B
              May 21 at 21:22










            • $begingroup$
              Hi Andre, I was thinking something similar to your first option. Something like. First time a bird is seen on a spot it's counted as 1 but after that, every extra time the bird is seen at the same spot in the same month it counts as 0,25 (or some other value?) instread of 1. Do you think this is an improvement and would give better results? Problem about your last suggestion is the number of bird species is over 1000 (different countries) and we don't even know the biology.
              $endgroup$
              – Stephen H. Anderson
              May 22 at 11:51















            2












            $begingroup$

            1. A simple approach would be to create a 2 × 2 contingency table (Bird × Month) and then for each unique combination of bird and month divide the number of years where that species was seen in that month by 10, which would give you a proportion. However, this approach throws away some information


            2. If you wanted to incorporate the fact that some birds are much more common than others and that different months are of different lengths you could use the # of sightings / the number of days (accounting for leap years probably). This would capture the rarity of birds as well as make use of the fact that your information is at a finer than month resolution. If data wasn't collected on a daily basis you could instead divide by the number of days were there was an observer.


            3. You do much more complex and employ something like a time-series model to account for changes in detectability through time.... that is probably a question in itself and would be much more challenging to carry out, at least for me.






            share|cite|improve this answer









            $endgroup$












            • $begingroup$
              Hi Andre, thanks for your answer. I'm a bit confused about option 2 which I think may work best. The only "problem" I'm thinking and I'm not sure how to face is that, for example, on a certain month, let's say april 2018 birdwatchers may have been visiting the same spot many days. So a certain bird which lives there will always be seen on those dates. So that would increase the times the bird was seen but would not mean that bird is easier to see than others.
              $endgroup$
              – Stephen H. Anderson
              May 21 at 11:37










            • $begingroup$
              It would just mean that the spot was visited more times. Is there a way to have that into consideration? I didn't mention that for each row of data I also have the ID (a number) identifying the exact spot where it was seen. Something like: 2019-05-20, 08:51:14, American Kestrel, 82 (82 being the spot)
              $endgroup$
              – Stephen H. Anderson
              May 21 at 11:37










            • $begingroup$
              That does complicate the problem but I am not sure there is a good way to tie that in, as there is no way to determine if it was the same bird. If you wanted to take a conservative approach you could remove records where the same bird was seen at the same spot multiple times in a given month. This would reduce the probabilities probably past what you could actually expect though.
              $endgroup$
              – André.B
              May 21 at 21:15











            • $begingroup$
              Alternatively, and this would be a fair bit of work, you could look up information about the behavioural ecology of each bird species with duplicate records at a given site/month and then make a decision as to whether or not to remove duplicates based on known information about the bird's home-range and nesting behaviour. For instance, if the american kestrel was seen repeatedly in prime nesting habitat, then perhaps remove them. Conversely, if it was seen during what would normally be migratory conditions or in a habitat that they don't frequent you could keep it.
              $endgroup$
              – André.B
              May 21 at 21:22










            • $begingroup$
              Hi Andre, I was thinking something similar to your first option. Something like. First time a bird is seen on a spot it's counted as 1 but after that, every extra time the bird is seen at the same spot in the same month it counts as 0,25 (or some other value?) instread of 1. Do you think this is an improvement and would give better results? Problem about your last suggestion is the number of bird species is over 1000 (different countries) and we don't even know the biology.
              $endgroup$
              – Stephen H. Anderson
              May 22 at 11:51













            2












            2








            2





            $begingroup$

            1. A simple approach would be to create a 2 × 2 contingency table (Bird × Month) and then for each unique combination of bird and month divide the number of years where that species was seen in that month by 10, which would give you a proportion. However, this approach throws away some information


            2. If you wanted to incorporate the fact that some birds are much more common than others and that different months are of different lengths you could use the # of sightings / the number of days (accounting for leap years probably). This would capture the rarity of birds as well as make use of the fact that your information is at a finer than month resolution. If data wasn't collected on a daily basis you could instead divide by the number of days were there was an observer.


            3. You do much more complex and employ something like a time-series model to account for changes in detectability through time.... that is probably a question in itself and would be much more challenging to carry out, at least for me.






            share|cite|improve this answer









            $endgroup$



            1. A simple approach would be to create a 2 × 2 contingency table (Bird × Month) and then for each unique combination of bird and month divide the number of years where that species was seen in that month by 10, which would give you a proportion. However, this approach throws away some information


            2. If you wanted to incorporate the fact that some birds are much more common than others and that different months are of different lengths you could use the # of sightings / the number of days (accounting for leap years probably). This would capture the rarity of birds as well as make use of the fact that your information is at a finer than month resolution. If data wasn't collected on a daily basis you could instead divide by the number of days were there was an observer.


            3. You do much more complex and employ something like a time-series model to account for changes in detectability through time.... that is probably a question in itself and would be much more challenging to carry out, at least for me.







            share|cite|improve this answer












            share|cite|improve this answer



            share|cite|improve this answer










            answered May 21 at 4:24









            André.BAndré.B

            68417




            68417











            • $begingroup$
              Hi Andre, thanks for your answer. I'm a bit confused about option 2 which I think may work best. The only "problem" I'm thinking and I'm not sure how to face is that, for example, on a certain month, let's say april 2018 birdwatchers may have been visiting the same spot many days. So a certain bird which lives there will always be seen on those dates. So that would increase the times the bird was seen but would not mean that bird is easier to see than others.
              $endgroup$
              – Stephen H. Anderson
              May 21 at 11:37










            • $begingroup$
              It would just mean that the spot was visited more times. Is there a way to have that into consideration? I didn't mention that for each row of data I also have the ID (a number) identifying the exact spot where it was seen. Something like: 2019-05-20, 08:51:14, American Kestrel, 82 (82 being the spot)
              $endgroup$
              – Stephen H. Anderson
              May 21 at 11:37










            • $begingroup$
              That does complicate the problem but I am not sure there is a good way to tie that in, as there is no way to determine if it was the same bird. If you wanted to take a conservative approach you could remove records where the same bird was seen at the same spot multiple times in a given month. This would reduce the probabilities probably past what you could actually expect though.
              $endgroup$
              – André.B
              May 21 at 21:15











            • $begingroup$
              Alternatively, and this would be a fair bit of work, you could look up information about the behavioural ecology of each bird species with duplicate records at a given site/month and then make a decision as to whether or not to remove duplicates based on known information about the bird's home-range and nesting behaviour. For instance, if the american kestrel was seen repeatedly in prime nesting habitat, then perhaps remove them. Conversely, if it was seen during what would normally be migratory conditions or in a habitat that they don't frequent you could keep it.
              $endgroup$
              – André.B
              May 21 at 21:22










            • $begingroup$
              Hi Andre, I was thinking something similar to your first option. Something like. First time a bird is seen on a spot it's counted as 1 but after that, every extra time the bird is seen at the same spot in the same month it counts as 0,25 (or some other value?) instread of 1. Do you think this is an improvement and would give better results? Problem about your last suggestion is the number of bird species is over 1000 (different countries) and we don't even know the biology.
              $endgroup$
              – Stephen H. Anderson
              May 22 at 11:51
















            • $begingroup$
              Hi Andre, thanks for your answer. I'm a bit confused about option 2 which I think may work best. The only "problem" I'm thinking and I'm not sure how to face is that, for example, on a certain month, let's say april 2018 birdwatchers may have been visiting the same spot many days. So a certain bird which lives there will always be seen on those dates. So that would increase the times the bird was seen but would not mean that bird is easier to see than others.
              $endgroup$
              – Stephen H. Anderson
              May 21 at 11:37










            • $begingroup$
              It would just mean that the spot was visited more times. Is there a way to have that into consideration? I didn't mention that for each row of data I also have the ID (a number) identifying the exact spot where it was seen. Something like: 2019-05-20, 08:51:14, American Kestrel, 82 (82 being the spot)
              $endgroup$
              – Stephen H. Anderson
              May 21 at 11:37










            • That does complicate the problem, and I am not sure there is a good way to account for it, since there is no way to determine whether it was the same bird. If you wanted a conservative approach, you could remove records where the same species was seen at the same spot more than once in a given month. That would probably push the probabilities below what you could actually expect, though.
              – André.B
              May 21 at 21:15











            • Alternatively, though this would be a fair bit of work, you could look up the behavioural ecology of each species with duplicate records at a given site/month and then decide whether to remove the duplicates based on what is known about the bird's home range and nesting behaviour. For instance, if the American Kestrel was seen repeatedly in prime nesting habitat, you might remove the duplicates. Conversely, if it was seen during what would normally be migration, or in a habitat it does not usually frequent, you could keep them.
              – André.B
              May 21 at 21:22










            • Hi Andre, I was thinking of something similar to your first option: the first time a bird is seen at a spot it counts as 1, but every additional time that bird is seen at the same spot in the same month it counts as 0.25 (or some other value?) instead of 1. Do you think this would be an improvement and give better results? The problem with your last suggestion is that there are over 1000 bird species (across different countries) and we don't even know their biology.
              – Stephen H. Anderson
              May 22 at 11:51
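The down-weighting idea from the comments can be sketched as below. This is a minimal illustration, assuming rows in the (date, time, species, spot ID) format described above; the function name and sample rows are hypothetical. With `repeat_weight=0` it reduces to the conservative "drop repeat sightings at the same spot in the same month" option, and with `repeat_weight=1` it gives the raw counts. The 0.25 default is just the commenter's tentative value, not a calibrated one, and fractional weights produce non-integer monthly totals, which a strict Poisson count model does not accommodate.

```python
from collections import defaultdict


def weighted_monthly_counts(records, repeat_weight=0.25):
    """Sum sightings per (species, year, month), down-weighting repeat sightings.

    records: iterable of (date 'YYYY-MM-DD', time, species, spot_id) tuples.
    The first sighting of a species at a spot in a given month counts as 1;
    each further sighting of that species at the same spot in the same month
    counts as repeat_weight. repeat_weight=0 drops the duplicates entirely
    (the conservative option); repeat_weight=1 keeps the raw counts.
    """
    seen = set()                 # (species, spot, year, month) already counted once
    totals = defaultdict(float)  # (species, year, month) -> weighted count
    for day, _time, species, spot in records:
        year, month = int(day[:4]), int(day[5:7])
        key = (species, spot, year, month)
        totals[(species, year, month)] += repeat_weight if key in seen else 1.0
        seen.add(key)
    return dict(totals)


# Hypothetical rows in the row format described in the comments.
rows = [
    ("2019-05-20", "08:51:14", "American Kestrel", 82),
    ("2019-05-21", "09:10:00", "American Kestrel", 82),
    ("2019-05-22", "07:45:30", "American Kestrel", 82),
    ("2019-05-23", "10:02:11", "American Kestrel", 17),
]
print(weighted_monthly_counts(rows))  # {('American Kestrel', 2019, 5): 2.5}
```

Here spot 82 contributes 1 + 0.25 + 0.25 and spot 17 contributes 1, so the species gets 2.5 for May 2019 instead of 4.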





























            You could potentially use a Poisson regression to model monthly counts of a particular bird species as a function of explanatory (independent) variables that you think affect the monthly rate of observations, such as a linear time trend, calendar month, season, or year (potentially supplementing your data with external sources on things like weather).
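To make the idea concrete, here is a minimal, dependency-free sketch of fitting a log-link Poisson regression by iteratively reweighted least squares (IRLS). It is an illustration under assumptions, not a production routine: the design matrix, the function names and the sample counts are hypothetical, and in practice you would use a library GLM (for example statsmodels' `GLM` with a Poisson family in Python, or `glm(..., family = poisson)` in R), which also gives standard errors.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_poisson(X, y, max_iter=100, tol=1e-10):
    """Fit a log-link Poisson regression by iteratively reweighted least squares.

    X: list of predictor rows (add a column of 1s yourself for an intercept).
    y: list of non-negative integer counts, one per row.
    Returns beta, so the fitted monthly mean for a row x is exp(x . beta).
    """
    p = len(X[0])
    mu = [yi + 0.5 for yi in y]  # positive starting values for the fitted means
    beta = [0.0] * p
    for _ in range(max_iter):
        # Working response for the log link: z = eta + (y - mu) / mu
        z = [math.log(m) + (yi - m) / m for yi, m in zip(y, mu)]
        # Weighted least squares with Poisson weights w = mu
        xtwx = [[sum(r[j] * r[k] * m for r, m in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(r[j] * m * zi for r, m, zi in zip(X, mu, z)) for j in range(p)]
        new_beta = solve(xtwx, xtwz)
        change = max(abs(a - b) for a, b in zip(new_beta, beta))
        beta = new_beta
        mu = [math.exp(sum(b * x for b, x in zip(beta, r))) for r in X]
        if change < tol:
            break
    return beta


counts = [1, 2, 2, 4, 5, 7]                # hypothetical monthly counts for one species
X = [[1.0, float(t)] for t in range(6)]    # intercept + linear time trend
beta = fit_poisson(X, counts)
```

Calendar-month effects would enter the same way, as indicator columns in `X`. One caveat: if a month has only zero counts, its coefficient diverges towards minus infinity, which is one reason to prefer a library implementation with proper diagnostics.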



            After estimating the regression, you can compute probabilities. Because the Poisson is a discrete probability distribution, you can calculate the probability of observing exactly X events (specify the desired count here) for given values of the predictors used in the regression. If you want the probability of seeing at least one bird, compute it as 1 minus the probability of seeing 0 birds; for a fitted monthly rate λ, that is 1 − e^(−λ). Here is another CV Q&A on predicting probabilities with Poisson regression: Poisson Regression : expectation vs probability for each outcome
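A small sketch of turning fitted coefficients into probabilities. The coefficient values and the predictor row (an intercept plus a hypothetical May indicator) are made up for illustration. The pmf is evaluated on the log scale so that large counts do not overflow, and it assumes λ > 0.

```python
import math


def predicted_rate(beta, x_row):
    """Fitted Poisson mean lambda = exp(x . beta) for one set of predictor values."""
    return math.exp(sum(b * x for b, x in zip(beta, x_row)))


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), lam > 0, computed on the log scale for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def prob_at_least_one(lam):
    """P(X >= 1) = 1 - P(X = 0) = 1 - exp(-lam)."""
    return 1.0 - math.exp(-lam)


# Hypothetical coefficients (intercept, May indicator) and a row describing May.
beta = [0.2, 0.5]
lam = predicted_rate(beta, [1.0, 1.0])  # exp(0.7), about 2.01 expected sightings
print(poisson_pmf(2, lam), prob_at_least_one(lam))
```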



            For this type of regression you would normally assume that counts in consecutive months are uncorrelated, i.e. that there is no autocorrelation. If that assumption does not hold, which is likely given that you have time series data, see this Q&A for possible approaches: Poisson regression with (auto-correlated) time series
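One rough way to check the no-autocorrelation assumption is to look at the autocorrelation of the Pearson residuals, taken in time order. The sketch below assumes you already have observed monthly counts `y` and fitted means `mu` from the model. The ±2/√n band is only an approximate rule of thumb for a series with no autocorrelation, not a formal test, and using `lag=12` gives a crude check for leftover seasonality.

```python
import math


def pearson_residuals(y, mu):
    """Pearson residuals (y - mu) / sqrt(mu) of a Poisson fit, in time order."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]


def autocorrelation(r, lag=1):
    """Sample autocorrelation of the series r at the given lag."""
    n = len(r)
    mean = sum(r) / n
    dev = [x - mean for x in r]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom


def looks_autocorrelated(r, lag=1):
    """Rough check: |autocorrelation| beyond about 2 / sqrt(n) hints at dependence."""
    return abs(autocorrelation(r, lag)) > 2.0 / math.sqrt(len(r))
```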



            Poisson regression has a couple of well-known relatives: the Negative Binomial and the Zero-Inflated Poisson (there is also the zero-truncated Poisson, but it should not apply here, because you should have a non-zero probability of observing 0 birds). The Negative Binomial is needed when the data are overdispersed, i.e. the variance of the counts exceeds the mean (for example, within a given month). The Zero-Inflated Poisson is used when there are "excess" zeros in the data generated by a separate process.
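Two quick diagnostics can hint at whether one of these relatives is worth trying. This is a sketch assuming observed counts `y`, fitted Poisson means `mu` and the number of estimated coefficients `n_params`; both are informal checks rather than tests. A Pearson dispersion well above 1 suggests overdispersion, and many more observed zeros than the fit expects suggests zero inflation (though excess zeros can also inflate the dispersion statistic). Libraries such as statsmodels provide Negative Binomial and zero-inflated Poisson models if these checks point that way.

```python
import math


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Close to 1 is consistent with the Poisson assumption (variance = mean);
    values well above 1 point towards overdispersion (e.g. Negative Binomial).
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def zero_counts(y, mu):
    """Observed zeros vs. zeros expected under the fitted Poisson means.

    Many more observed than expected zeros may indicate zero inflation.
    """
    observed = sum(1 for yi in y if yi == 0)
    expected = sum(math.exp(-mi) for mi in mu)
    return observed, expected
```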

































                answered May 21 at 7:10









                AlexK




















