AIC for increasing sample size
I am using AIC as a model selection criterion in one of my projects. However, since the AIC parameter penalty doesn't depend on the number of points sampled, for large n the log-likelihood term rapidly outgrows it.
I was wondering why the parameter penalty doesn't scale with the number of points, as the log-likelihood generally does. I've reached the point where the log-likelihood is on the order of tens of thousands, and the AIC penalty for having ~10 extra parameters in the model doesn't matter. But it feels like it really should. Am I misunderstanding something?
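For concreteness, here is a minimal simulation of the kind of situation I mean (Python/NumPy; the model and the 10 junk regressors are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def max_gaussian_loglik(rss, n):
    # Concentrated Gaussian log-likelihood at the MLE sigma2_hat = rss / n.
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)

for n in (100, 10_000):
    x = rng.normal(size=n)
    y = 2.0 * x + rng.normal(size=n)                 # true model: one slope
    X_small = np.column_stack([np.ones(n), x])
    X_big = np.column_stack([X_small, rng.normal(size=(n, 10))])  # 10 junk regressors
    for name, X in (("small", X_small), ("big", X_big)):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        k = X.shape[1] + 1                           # coefficients + error variance
        loglik = max_gaussian_loglik(rss, n)
        aic = 2 * k - 2 * loglik
        print(f"n={n:6d}  {name:5s}  loglik={loglik:10.1f}  penalty={2 * k:3d}  AIC={aic:10.1f}")
```

For n = 10,000 the log-likelihood term is on the order of 10^4, while the penalty difference between the two models is only 20.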
Tags: model-selection, aic, asymptotics, log-likelihood
Why would having 10 extra parameters matter if you have enough data to estimate them rather precisely? AIC/n (AIC per data point) estimates the log-likelihood of a new data point from the same population; when you have enough data, this is approximately equal to the average sample log-likelihood (log-likelihood/n), as the estimation error for the parameters is negligible. – Richard Hardy
Sorry, I don't think I articulated my question very well. Let's say you have many points of somewhat noisy data. Adding a decent number of parameters (let's say 10) to your model will likely be very beneficial to your log-likelihood. However, the 2k part of the AIC calculation will barely penalize the model for it. It just seems to me that AIC doesn't appropriately penalize for extra parameters. – Jason
In my comment above, it should be the negative log-likelihood, not the raw log-likelihood. – Richard Hardy
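The claim in the first comment can be checked numerically. As a sketch (assuming a Gaussian linear model; the sample size and coefficients are made up for illustration), AIC/n comes out very close to minus twice the average log-likelihood of fresh data from the same population:

```python
import numpy as np

rng = np.random.default_rng(1)
true_beta = np.array([1.0, 2.0, 0.5, -1.0])

def make_data(n):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
    y = X @ true_beta + rng.normal(size=n)
    return X, y

n = 50_000
X, y = make_data(n)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = float(np.mean((y - X @ beta) ** 2))        # MLE of the error variance

def avg_loglik(Xn, yn):
    # Average per-point Gaussian log-likelihood under the fitted model.
    r = yn - Xn @ beta
    return float(np.mean(-0.5 * (np.log(2 * np.pi * sigma2) + r ** 2 / sigma2)))

k = X.shape[1] + 1                                  # 4 coefficients + variance
aic_per_n = 2 * k / n - 2 * avg_loglik(X, y)

X_new, y_new = make_data(n)                         # fresh data, same population
oos = -2 * avg_loglik(X_new, y_new)

print(aic_per_n, oos)                               # nearly identical at this n
```

With n this large the parameter-estimation error is negligible, so the two quantities agree to within sampling noise.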
$begingroup$
I am using AIC as a model selection criteria in one of my projects. However, since AIC isn't dependent on the number of points sampled, for large n the log likelihood term rapidly outscales the parameter penalty.
I was wondering why the parameter penalty doesn't scale with the number of points, as the log likelihood generally does. It's getting to where the log likelihood is in the order of tens of thousands and the AIC penalty for having ~10 extra parameters in the model doesn't matter. But it feels like it really should. Am I misunderstanding something?
model-selection aic asymptotics log-likelihood
New contributor
$endgroup$
I am using AIC as a model selection criteria in one of my projects. However, since AIC isn't dependent on the number of points sampled, for large n the log likelihood term rapidly outscales the parameter penalty.
I was wondering why the parameter penalty doesn't scale with the number of points, as the log likelihood generally does. It's getting to where the log likelihood is in the order of tens of thousands and the AIC penalty for having ~10 extra parameters in the model doesn't matter. But it feels like it really should. Am I misunderstanding something?
model-selection aic asymptotics log-likelihood
model-selection aic asymptotics log-likelihood
New contributor
New contributor
Asked by Jason, a new contributor; question edited by Richard Hardy.
1 Answer
It's a known criticism of AIC.
BIC scales the penalty on the number of model parameters with the logarithm of n:
$$\text{BIC} = \log(n)\, k - 2 \log \mathcal{L},$$
though even so you will still tend to find that BIC favors models with more parameters in larger samples. In either case, it is a desirable trait of a model selection criterion to select more parameters in larger sample sizes. It all boils down to how many parameters you want to enter into a particular model for a particular sample size; when that is a fixed, finite number, there is no reason to use information criteria at all.
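To make the difference in penalties concrete (the sample sizes are just examples): for 10 extra parameters the AIC penalty is fixed at 20, while the BIC penalty grows with log(n).

```python
import math

k_extra = 10  # ten additional parameters, as in the question
for n in (100, 10_000, 1_000_000):
    aic_penalty = 2 * k_extra               # constant in n
    bic_penalty = math.log(n) * k_extra     # grows like log(n)
    print(f"n={n:>9,}  AIC penalty={aic_penalty}  BIC penalty={bic_penalty:.1f}")
```

Either way, a log-likelihood in the tens of thousands dwarfs both penalties, which is the asker's point.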
Shibata's work on AIC rests on the concept of "mean efficiency". That is, information criteria operate under the condition that you know or assume that the number of variables in the ideal model is infinite, so that in larger samples you will tend to favor models with more variables.
– AdamO
You can criticize a hammer if your problem does not look like a nail, but I wonder whether there is any ground for criticizing the design of AIC once you take into account what it actually aims for. After all, AIC is an efficient model selection criterion, which BIC and other criteria with relatively fast-growing penalties are not. So if your goal is optimal prediction (optimal in terms of maximizing the likelihood of a new observation), AIC will do it for you. If your goal is not prediction, why would you be considering AIC to begin with? Does that make sense? – Richard Hardy
OK, I guess you can justify your criticism of the assumption of infinitely many parameters in the "ideal model", as you mention in your last paragraph. So then the question would be: does my problem look like one where this assumption may hold or not? If so, AIC is fine; if not, go look for another information criterion. – Richard Hardy
@RichardHardy We agree on all points. The revelation that AIC only works in some very contrived situations won't stop people from asking whether it functions well in other situations. The answer, aside from "it wasn't meant to do that", is "it doesn't do that very well". It's a revelation that another inappropriate tool (BIC) "does it a bit better". There are much, much better tools for data reduction if the OP wants a sparse number of predictors in a reasonably large sample, but that wasn't the question that was asked. – AdamO
Good. I would contest, however, your use of "very contrived situations", or even "contrived situations". A large (the largest?) part of real-world phenomena are the result of infinitely complex data-generating processes, which require an infinite number of parameters to be fully characterized; that is exactly the premise of AIC. Hence, as long as the goal is optimal prediction, AIC strikes me as the most reasonable choice, or at least a solid baseline. When prediction is not the goal while, say, finding a sparse set of predictors is, we need other tools. – Richard Hardy