AIC for increasing sample size


I am using AIC as a model selection criterion in one of my projects. However, since the AIC penalty doesn't depend on the number of points sampled, for large n the log-likelihood term rapidly outscales the parameter penalty.

I was wondering why the parameter penalty doesn't scale with the number of points, as the log-likelihood generally does. It's getting to where the log-likelihood is on the order of tens of thousands, and the AIC penalty for having ~10 extra parameters in the model doesn't matter. But it feels like it really should. Am I misunderstanding something?
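
To make the scaling concrete, here is a sketch that assumes independent observations and writes $\hat\theta$ for the maximum likelihood estimate and $f$ for the model density (notation introduced only for this illustration):

$$ \text{AIC} = 2k - 2 \log \mathcal{L}, \qquad \log \mathcal{L} = \sum_{i=1}^{n} \log f(x_i \mid \hat\theta), $$

so the log-likelihood term is a sum of $n$ contributions and typically grows in proportion to $n$, while the penalty $2k$ stays fixed.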

asked yesterday by Jason (new contributor), edited yesterday by Richard Hardy

Why would having 10 extra parameters matter if you have enough data to estimate them rather precisely? AIC/n (AIC per data point) estimates the log-likelihood of a new data point from the same population; when you have enough data, this is approximately equal to the average sample likelihood (log-likelihood/n), as the estimation error for the parameters is negligible.
– Richard Hardy, yesterday

Sorry, I don't think I articulated my question very well. Let's say you have many points of somewhat noisy data. Adding a decent number of parameters (let's say 10) to your model will likely be very beneficial to your log-likelihood. However, the 2k part of the AIC calculation will barely penalize the model for it. It just seems to me that the AIC doesn't appropriately penalize extra parameters.
– Jason, yesterday

In my comment above, it should be negative likelihood, not raw likelihood.
– Richard Hardy, yesterday

model-selection aic asymptotics log-likelihood

1 Answer
It's a known criticism of AIC.

The BIC scales the penalty on the number of model parameters by the logarithm of n, so the penalty grows with the sample size:

$$ \text{BIC} = \log(n)\, k - 2 \log \mathcal{L}, $$

though you will still tend to find that BIC favors models with more parameters in larger samples. In either case, it is a desirable trait of a model selection criterion that it tends to select more parameters as the sample size grows. It all boils down to how many parameters you want to enter into a particular model for a particular sample size. When that's a fixed, finite number, there's no reason to use information criteria at all.
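
As a rough illustration of how the two penalties compare, here is a minimal Python sketch. The function names `aic`, `bic` and `penalty_gap` are hypothetical helpers written for this example, not part of any library, and the choice of 10 extra parameters simply mirrors the question.

```python
import math


def aic(log_likelihood, k):
    """AIC = 2k - 2 log L (k = number of estimated parameters)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """BIC = log(n) k - 2 log L (n = sample size)."""
    return math.log(n) * k - 2 * log_likelihood


def penalty_gap(k_extra, n):
    """Extra penalty paid for k_extra additional parameters.

    Returns (AIC penalty, BIC penalty). The AIC value ignores n;
    the BIC value grows with log(n).
    """
    return 2 * k_extra, math.log(n) * k_extra


if __name__ == "__main__":
    # Illustrative sample sizes only; 10 extra parameters as in the question.
    for n in (100, 10_000, 1_000_000):
        aic_pen, bic_pen = penalty_gap(10, n)
        print(f"n={n:>9,}: AIC penalty={aic_pen:.1f}, BIC penalty={bic_pen:.1f}")
```

The BIC penalty per parameter exceeds the AIC penalty once log(n) > 2, that is, for n of 8 or more. It still grows only logarithmically, whereas the log-likelihood typically grows linearly in n.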

Shibata's work on AIC operates under the concept of "mean efficiency". That is, information criteria work under the condition that you know or assume that the number of variables in the ideal model is infinite, and that in larger samples you will tend to favor models with more variables.

answered yesterday by AdamO (edited yesterday)

You can criticize a hammer if your problem does not look like a nail, but I wonder if there are any grounds for criticizing the design of AIC, taking into account what it actually aims for. After all, AIC is an efficient model selection criterion, which BIC and other criteria with relatively fast-increasing penalties are not. So if your goal is optimal prediction (optimal in terms of maximizing the likelihood of a new observation), AIC will do it for you. If your goal is not prediction, why would you be considering AIC to begin with? Does that make sense?
– Richard Hardy, yesterday

OK, I guess you can justify your criticism of the assumption of infinitely many parameters in the "ideal model", as you mention in your last paragraph. So then the question would be: does my problem look like one where this assumption may hold or not? If so, AIC is fine; if not, go look for another information criterion.
– Richard Hardy, yesterday

@RichardHardy We agree on all points. The revelation that AIC only works in some very contrived situations won't stop people from asking whether it functions well in other situations. The answer, aside from "it wasn't meant to do that", is "it doesn't do that very well". It's a revelation that another inappropriate tool (BIC) "does it a bit better". There are much, much better tools for data reduction if OP wants a "sparse number of predictors in a reasonably large sample", but that wasn't the question that was asked.
– AdamO, yesterday

Good. I would contest, however, your use of "very contrived situations", or even "contrived situations". A large (the largest?) part of real-world phenomena are results of infinitely complex data-generating processes that require an infinite number of parameters to be fully characterized, which is exactly the premise of AIC. Hence, as long as the goal is optimal prediction, AIC strikes me as the most reasonable choice, or at least a solid baseline. When prediction is not the goal while, say, finding a sparse set of predictors is, we need other tools.
– Richard Hardy, yesterday
