Ttdfjt

Question

This quote—or some variant of it—has been floating around the news and social media world; this comes from Boston.com on July 26, 2019:

According to a 2015 report by the Federal Reserve Bank of Boston, the median net worth of white households in the city was $247,500. The median net worth for black families was $8.

I have no doubt there is a wealth gap between white families and black families in Boston, but is that number of $8 correct?

And yes, I realize that article points to a 2015 study by Duke University and the Federal Reserve Bank of Boston that can be found here, but is the methodology used to determine that $8 number correct?

Graph from the 2015 study below:

Graph from the “The Color of Wealth in Boston”

Can I ask exactly where you are looking for clarification? I read the (short) Methodology pdf and don't see anything obviously wrong. — Jul 28 at 4:26
@BarryHarrison I respect where you are coming from, but your answer addresses the issue as I see it: Basically explaining the methodology in a way that makes sense. And this all makes sense. Great work! — Jul 28 at 22:42
I'm kind of surprised that all the numbers are positive; I'd assume that in many/most poor households the median net worth would be negative i.e. that student loans + credit card debt + other debt outweighs whatever savings they have. Even for a very high-earning two-income family of e.g. a doctor and a lawyer it takes quite many years until their net worth becomes positive. — Jul 29 at 20:48

score 35 · Accepted Answer · 2019-07-29 03:33:12Z

The full report (downloaded from link in question) talks about methodology on page 8. They compare various assets and debts of different ethnic groups and use these values to calculate net worth for the groups. These values were determined by surveying different communities.

For the NASCC project in general, about 70,000 personalized advanced letters were sent, 87,000 telephone numbers dialed 448,000 times, and 12,113 interviewer hours were spent across three shops to conduct 2,746 completed surveys.

Of the 2,746 completed surveys, 403 were in the Boston MSA (metropolitan statistical area) and used in the report's analysis (see Table 1 on page 13). The rest were not. 78 of the surveyed were whites and 71 were U.S. blacks. Thus, there were 149 samples when comparing whites to blacks.

The methodology appears accurate. However, the sample size (403 residents) does seem small. The white and black sample size (149 residents) is even smaller. This is a fault that the study authors admit.

We evaluated whether the data for whites and nonwhites differ in a statistically significant way. Note that what we report here as statistically significance results are considered to be conservative.¹⁶

¹⁶We report significance at the 90%, 95% and 99% levels. However, given our small sample sizes it may be difficult to detect significance at those levels even if differences exist.

...

Small sample sizes limit the statistical power to detect meaningful differences even when there is good reason to suspect that group-based differences in assets levels and debts exist.

...

Unfortunately, it was not possible to provide data broken down by age for all the groups analyzed in Boston, because the sample size was too small.

Standard deviation data was not included in the final report. My limited understanding of statistics is that to estimate a necessary sample size, it is necessary to know the observed mean and standard deviation in the group (see the last equation on page 1183 here). A moderator and another user on Cross Validated was kind enough to consider the issue of sample size. The moderator concluded "We lack the information to get a sample size." The other user wrote "I would say that the sample sizes are too small."

Another possible flaw of the study is the age and educational differences between the white and non-white groups. The white and U.S. black groups had a median age of 55, so age differences would not apply here. However, 55% of whites had a Bachelor's degree or higher vs. 43% for U.S. blacks. The lack of a more detailed study with these variables is another drawback of a small sample size. The study authors note (emphasis added):

Some of the differences observed may be driven by differences in age or educational attainment. In general, nonwhites in the survey were younger and had much lower educational attainment rates. Unfortunately, it was not possible to provide data broken down by age for all the groups analyzed in Boston, because the sample size was too small.

Some of the reasons that whites have a higher median net worth (included in the report):

Whites are more likely to have liquid assets (e.g. checking accounts and savings accounts). "For every dollar, the typical white household has in liquid assets (excluding cash), U.S. blacks have 2 cents..." (median $25,000 vs $700 for blacks)

Whites are more likely to own homes (79% vs. 33% blacks), retirement accounts (56% vs. 20%), and vehicles (85% vs. 50%).

Whites are less likely to have mortgage debt (60% homeowners vs. 90% black homeowners), student loans (19% vs. 28%), and medical debt (11% vs. 17%).

Whites are less likely to send money to relatives (both abroad and in the US), which increases personal savings.

A point the report makes is while differences in annual income may not be as significant, the difference is magnified over time. This reflects in the net worth. (empasis added below)

In part, racial differences in net worth are derived from racially based differences in income because nonwhites generally earn less. But racial differences in income and racial differences in wealth are only weakly correlated. Rather, the racially based gulf in wealth accumulation widens as income increases and because wealth differences reflect an accumulated lifetime of income disparities, compounded by asset returns (or lack thereof), the racial wealth gap is much greater than the income differences.

To summarize:

Is there a wealth gap in Boston where the median net worth of white households is $247,500 while the median net worth for black families was $8?

The data provided in the full report is limited. I can see that two limitations of the study are a small sample size and educational differences between the white and non-white groups.¹ Without the raw data, we would have to trust that the interviews and subsequent data analysis were performed correctly by the researchers.

¹The researchers did find statistical significance for the median net worth values. "Difference in findings of nonwhite household median or mean net worth values were statistically significant at the 99 percent level."

Comments are not for extended discussion; this conversation has been moved to chat. — yesterday

JakeGould 2,7745 gold badges15 silver badges35 bronze badges · Accepted Answer · 2019-08-01 13:47:31Z

I wasn't going to answer, but since a lot of the discussion has been about the statistical aspects and I have some background in that, maybe I can add something here.

TLDR: The study doesn't give enough detail to fully assess the methods, but what I can see is consistent with what I'd expect from a decent survey.

(That is: I'm not saying it's right, but I don't see any obvious signs that it's wrong.)

Stats 101 generally focuses on simple random sample (SRS) designs: we have a population we want to describe, we run a random selection process where every unit (person, household, whatever you're studying) has an equal chance of selection in the sample. From that approach, we can calculate a lot of things about margins of error, sample sizes, etc. etc.

This should be understood as an "assume a spherical cow" approach: it's a useful starting point, but only a starting point. In reality, serious socioeconomic research is almost never based on a pure SRS design and assumptions/rules of thumb drawn from that Stats 101 understanding may not be reliable for assessing it.

I recommend Särndal, Swensson and Wretman's "Model Assisted Survey Sampling" for a solid overview of more complex survey design and analysis methods. See in particular the sections on sampling frames (& multiple frames), stratified sampling & optimal sample allocation, and regression/post-stratification errors. Please take MASS as a general cite for most of the following discussion.

Caveat: MASS concentrates on means (averages) rather than quantile statistics (medians/quartiles/etc.) However, most of the methods presented there have obvious parallels for working with quantiles.

Some of the issues that have been raised:

Is the sample representative?

The real question here is "should it be?"

When you're trying to describe a population as a whole, "representative sampling" is often a reasonable approach. (Loosely speaking: attempting to ensure that the statistical distribution within your sample is roughly the same as in the real population.) Simple random sampling is one way that people attempt to achieve a representative sample, and there are non-SRS options for this purpose too e.g. systematic sampling and some implementations of stratified sampling.

The mathematical analysis is then easy; if you believe that the sample is a representative microcosm of the full population, you can just take the mean and median of the sample and use that as an estimate for the population. This isn't always the most efficient approach possible, but for whole-population stats it's not bad.

However, when you're interested in sub-populations e.g. different racial groups, it's another story.

Suppose we want to explore questions like "are White households more affluent than Black households?" This usually requires estimating typical affluence for each group, typical ranges (e.g. "90% of households have more than $X net worth, and 10% have more than $Y"), and then comparing the two.

Problem here is that for any given group, the accuracy of those estimates is affected by sample size. If we're surveying a population that's 90% White and 10% Black, and we use a "representative" sample, we will end up with 9x as many White households as Black ones. That will give us very good estimates for the characteristics of White households but much poorer estimates for Black ones, making it difficult to compare the two.

One option is just to increase the sample size so that we get enough people/households from the smaller sub-populations, while getting even more from the common ones. This is mathematically nice but it's prohibitively expensive for anything that requires trained interviewers, as economic data usually does.

A more practical alternative is oversampling: one way or another, you adjust your survey design to select a higher proportion of the smaller sub-populations. The Boston study has done this:

Various sampling techniques were used to locate and identify an
ethnically plural sample consisting of the specifically defined ethnic
groups. The techniques included the following: directory-listed
landline samples targeted to census tracts where specific ethnic
groups were known to reside; cell phone random digit dialing samples
drawn from rate centers that covered the targeted ethnic group ZIP
codes; samples drawn from targeted ZIP codes on the basis of billing
address; and the use of surname-based lists targeting specific
national origin groups.

A naïve approach to this can introduce a great deal of bias. If I try to get a sample of Asian households solely by going to a district known for having a high Asian population, obviously I will miss the Asian households outside that district, who may have very different characteristics. However, there are well-known methods for avoiding this problem.

One common method, discussed in MASS, is pi-weighting (Horvitz-Thompson): make sure that everybody (or almost everybody) has some chance of selection, and then factor in that probability of selection when analysing the results. For instance, I might sample 1/100th of the households in District A, known to have a relatively high Asian population, but only 1/500th of the households in District B. The results are then "weighted up" so that each household sampled from District B has 5x as much influence on the results as each household from District A, cancelling out the bias that we would otherwise introduce by sampling more heavily from A.

Here is a peer-reviewed paper that I co-authored some years back, discussing an example of how these concepts were applied in designing an Australian survey, if anybody wants to see a bit more mathematical crunch. (Caveat that this is a design paper, which uses Horvitz-Thompson as a simplified version of the weighting method in order to keep the maths manageable; the actual data analysis post-survey used a somewhat more complex benchmarking approach.) A couple of things to take away from that:

Even a simplification of the method makes for a very dense paper that takes a long time to write and will probably only be read by a handful of people. I would love it if everybody published the full details of their weighting methods, but it may not be easy to find the time.

Nevertheless, these methods are well known and used by people who do socioeconomic research - we weren't the first and won't be the last.

The Boston survey introduces an additional complication: multiple "frames" meaning that some households may have more than one path to selection (e.g. some households might be on a list of names and in a targeted zip code). Even when a household doesn't actually get selected twice, this affects probability of selection and needs to be factored into the analysis. But again, this is a well-known issue and there are standard methods for dealing with it - see Särndal et al on "multiple frames".

The paper mentions that weighting was used, but doesn't go into much detail. Without more detail on the methods used, obviously I can't verify for myself that the researchers have handled these issues correctly. But I also don't see anything to signal that they didn't.

The acknowledgements mention methodological advice from Marcin Hitczenko and Kobi Abayomi and review by Tatjana Meschede, among others. From a quick look at their CVs, I would be astonished if those reviewers/contributors were unaware of this kind of issue or the standard methods for dealing with it.

Is the sample size large enough?

This is addressed at several points in the paper. The paper reports statistical significance thresholds (p-values), which are effectively a statistician's way of answering the question "given the sample sizes, how unlikely is it that random noise alone would be able to create this apparent difference?"

In most cases, the answer as reported in the paper is "pretty unlikely". In particular, Table 9 shows that the median net worth for White households is higher than every other group at the 99% significance threshold, excepting Asian and Cape Verdeans where the sample sizes were too small.

The sample sizes for individual racial groups are not huge (78 White households, 71 US Black). In general, a larger sample gives more accurate estimates, but the question of "how much is enough?" depends very much on the specifics of the problem: what the variable of interest is, how it's distributed among the populations of interest, how it is to be estimated, etc. etc.

As the paper acknowledges, this can make it hard to detect meaningful differences even when they do exist, and they have flagged some cases/groups where sample sizes were too small to be useful, especially when trying to break results down by age.

But I'm not aware of any statistical rule that says it's impossible to obtain serviceable estimates for a comparison between samples of 78 and 71, in this particular context, and I very much doubt it exists.

In other words: the authors have asserted that the sample size is large enough to be pretty confident that White households have higher net worth than Black households. They haven't shown the full working for that, so you'll have to make your own judgements about whether you trust their professional chops, but it is what they're asserting.

As noted in Bryan Krause's answer, the paper doesn't give quantiles other than the median, which makes it difficult to interpret exactly how meaningful the difference is. I think it almost certainly is meaningful and large - one would have to hypothesise some very weird income distributions for it to be otherwise - but without that IQR-type data it's impossible to be sure.

Possibility of non-response bias

The paper reports:

For the NASCC project in general, about 70,000 personalized advanced letters were sent, 87,000 telephone
numbers dialed 448,000 times, and 12,113 interviewer hours were spent
across three shops to conduct 2,746 completed surveys.

This is pretty common for research of this kind. Some people are hard to contact, some don't want to be interviewed or just don't have the time.

This is a perennial concern for survey-based researchers. Obviously it reduces the sample size, but that can be factored in; if you know only 5% of households are going to respond, you budget for 20x as many contact attempts.

The challenging part is estimating what it does to the selection probabilities. Working singles are harder to contact than stay-at-home parents or retirees, and each of those groups will have different selection probabilities.

One approach to this problem, discussed in MASS and elsewhere, is benchmarking, aka post-stratification: you compare the demographic distribution of your sample to some estimates of what the population demographics should be (e.g. Census data), and use that to weight up the groups that have effectively been undersampled due to non-response. There's also regression weighting, an extension of the same principle, and various other methods.

The Boston paper appears to have done something along those lines:

The statistics in the sample used weights based on family
characteristics in the U.S. Census Bureau’s American Community Survey
to generate results representative of specific ethnic group characteristics in the respondent’s metropolitan area of residence. Overall,
the results computed from the unweighted NASCC sample are not
dissimilar from those using the weight- ed NASCC sample, suggesting
that the specific ethnic group observations in the metropolitan
areas covered by the study were fairly representative of their
populations at large.

It's always tough to tell exactly how effective post-strat methods are in neutralising non-response bias, but they seem to have done what they could.

Overall, the report is a bit light on detail for my tastes. It would be nice to have more information about selection/weighting methods and on the distribution of net worth beyond just the medians. But what has been published looks consistent with competent researchers following standard and defensible methods for this kind of work.

score 1 · Accepted Answer · 2019-07-30 00:58:21Z

The existing answer covers the details of the associated study so I won't rehash that all here; the question already covers the main result as well and is mostly asking about methodology.

Sample size and significance

In the comments there is a long discussion about sample size. As discussed in those comments, the sample size is not particularly large, but also the median is less susceptible to outliers.

A very important thing to realize with small sample size is the relationship with power. The authors report statistically significant differences in net worth and income (see the original study here). A statistically significant result with a smaller sample size is actually potentially indicative of a larger effect.

In a very large sample, you can risk declaring significant effects that are not actually important (for example, if you sampled 7 billion people you could perhaps detect a financial net worth difference of 1 cent and then have to retract the study when someone in the control group picks up a nickel). In a small sample, you instead risk making a type II error: failing to detect a real difference. Therefore, you should not criticize a small study that shows a significant effect for being "underpowered": that argument is nonsense.

Karl Friston points this out in one of his "ironic rules" for reviewers in a quite amusing exchange.

Sample size and estimation

That said, the original question is more about the actual numbers reported, and here is a place one could quibble a bit. I see no reason to doubt the actual numbers reported (and again, the other answer already stepped through the methods). However, what is lacking from the report is some measure of the distribution besides the median. Given that they are reporting medians, an appropriate statistic would be something like an interquartile range, or simply presenting the whole distributions as a histogram, pdf, or cdf.

For example, if the IQR for whites is -$100,000 to $1,000,000, and for blacks it is -$110,000 to $900,000, one could still observe the same difference in medians, but the context is different than if the ranges were $50,000 to $400,000 versus -$50,000 to +$50,000. In that case, you'd see that at least 75% of blacks have less net worth than at least 75% of whites.

Additionally, the precision of the measurements for a given sample size depend on these ranges, so without this information it is difficult to contextualize the results properly.

Demographics

The study has some limited demographic information; median age was 55 in both black and white groups. Education was lower in the black group, but that's expected from broader demographics and could certainly relate to the net worth differences, but the authors found similar differences considering only those with a bachelor's degree or only those 51-65 years old:

The net worth differences of whites and blacks were remarkable even when level of education or age were considered. Median wealth among black households that have a bachelor’s
degree or higher ($12,000) was 4 percent of the median for white households ($313,500).
Similarly, if we look at households in the 51- to 65-year-old bracket, the typical white
household holds $311,000 in wealth, compared with only $4,000 for the typical black
household (Table 10).

Summary

Does this study reliably show that whites have greater median net worth than blacks? Yes, it does, and the small sample size does not matter.

Do the actual numbers $250k and $8 reflect the actual distributions? We don't have enough information.

"the median is less susceptible to outliers." This may be true, but should be cited. You hardly address the possibility that this is an apples to oranges comparison, where the median black respondent is a college student and the median white respondent is a fifty-year-old homeowner. You also don't show that the study found statistically significant results. This should be supported with at least a quotation from the study, not a raw assertion. — Jul 30 at 0:50
@Brythan Mean versus median susceptibility to outliers is not really something that needs to be cited, it's a common knowledge fact and can be demonstrated by taking a sample from a normal distribution, adding an outlier, and comparing the change to mean versus median. I can update with demographics; the median age is 55 in both groups, and add a link to the study. — Jul 30 at 0:54
But wealth is not normally distributed. So why are you making (still without citation) a statistical claim based on different circumstances? And your claim about the median age is directly contradicted by the study: "In general, nonwhites in the survey were younger and had much lower educational attainment rates." — Jul 30 at 4:23
@Brythan It doesn't matter. Try it with any distribution. It's hard to find a citation that doesn't just state this because it's very very simple. Google "median outliers" or something. The statement you quote has to do with all nonwhites, not blacks. Blacks and whites had the same median age. It's in a table (Table 1). — Jul 30 at 4:49
@Brythan If you have a list like that then neither the mean or median is actually informative and measures of central tendency don't apply. Wealth data don't look anything like that. I have no idea what you mean by "And I think that you're missing the real point. We're not talking about adding one entry to the list. We're talking about adding most of a thousand." In any case, I don't plan to convince you, so I'll stop the discussion now. — Jul 30 at 5:45

BrythanBrythan 9,3005 gold badges40 silver badges52 bronze badges · Accepted Answer · 2019-07-30 05:11:30Z

If you read the methodology (PDF) portion, there are some questionable practices.

directory-listed landline samples targeted to census tracts where
specific ethnic groups were known to reside

But census tracts are also highly correlated with wealth. From Area characteristics and individual-level socioeconomic position indicators in three population-based epidemiologic studies.

[...] blacks were much more likely to live in more disadvantaged areas than whites.

and

Differences in the residential environments of blacks and whites may need to be taken into account in interpreting race differences in epidemiologic studies.

So what the referenced study did, is they looked for blacks in poor neighborhoods and looked for whites in richer neighborhoods. Unsurprisingly, they found poor blacks and richer whites.

Back to the methodology:

cell phone random digit dialing samples drawn from rate centers that covered the targeted ethnic group ZIP codes; samples drawn from targeted ZIP codes on the basis of billing address

And once again, ZIP code is highly correlated with wealth.

And the last part of the methodology:

the use of surname-based lists targeting specific national origin groups.

But this is going to find more recent immigrants. Because the typical descendants of slaves have no idea what their African ancestors' surnames were. So many use the same surnames as whites, with some variation. For example, White, Washington, and Lincoln are particularly common.

None of these methods produce random samples from the overall black and white population. So when you see claims of statistical significance, remember that what is being claimed is that people found in particular census tracts or zip codes or having particular surnames are being compared to other people with different census tracts, zip codes, or surnames. Not random samples of the overall population.

Of course, none of this means that the claim is not true. It only means that this approach did not prove it.

The study does not claim to compare blacks and whites in the same neighborhood with the same socioeconomic status. It's comparing them within a city. It would be very selective and misleading to not consider in your sampling that different racial groups live in different neighborhoods. — Jul 30 at 5:46
"It would be very selective and misleading to not consider in your sampling that different racial groups live in different neighborhoods." I disagree. Proper sampling would have ignored neighborhoods altogether and just randomly picked people from the overall pool, determining race later. Then we would have been able to look at the random sample and see if it was skewed by comparing it to the characteristics of the pool. As is, we don't know if the sample is representative of the overall population. — Jul 30 at 21:16
What you're talking about here is oversampling, which is a common technique when trying to get usable data about minority groups. (Otherwise you end up sampling 10x as many people as you actually need, in order to get just enough of the smaller groups.) A naïve analysis of oversampled data will produce significant bias, for the reasons you mention, but there are established methods for preventing that bias - basically, you "weight up" people who had a smaller chance of selection. See here for an example. (1/2) — Jul 31 at 21:59
Unfortunately, there's not very much info in the paper about the detail of the analysis, so it's not clear to me how they have/haven't adjusted for oversampling considerations. (I would certainly hope that they had!) (2/2) — Jul 31 at 22:03

score 35 · Accepted Answer · 2019-07-29 03:33:12Z

The full report (downloaded from link in question) talks about methodology on page 8. They compare various assets and debts of different ethnic groups and use these values to calculate net worth for the groups. These values were determined by surveying different communities.

For the NASCC project in general, about 70,000 personalized advanced letters were sent, 87,000 telephone numbers dialed 448,000 times, and 12,113 interviewer hours were spent across three shops to conduct 2,746 completed surveys.

Of the 2,746 completed surveys, 403 were in the Boston MSA (metropolitan statistical area) and used in the report's analysis (see Table 1 on page 13). The rest were not. 78 of the surveyed were whites and 71 were U.S. blacks. Thus, there were 149 samples when comparing whites to blacks.

The methodology appears accurate. However, the sample size (403 residents) does seem small. The white and black sample size (149 residents) is even smaller. This is a fault that the study authors admit.

We evaluated whether the data for whites and nonwhites differ in a statistically significant way. Note that what we report here as statistically significance results are considered to be conservative.¹⁶

¹⁶We report significance at the 90%, 95% and 99% levels. However, given our small sample sizes it may be difficult to detect significance at those levels even if differences exist.

...

Small sample sizes limit the statistical power to detect meaningful differences even when there is good reason to suspect that group-based differences in assets levels and debts exist.

...

Unfortunately, it was not possible to provide data broken down by age for all the groups analyzed in Boston, because the sample size was too small.

Standard deviation data was not included in the final report. My limited understanding of statistics is that to estimate a necessary sample size, it is necessary to know the observed mean and standard deviation in the group (see the last equation on page 1183 here). A moderator and another user on Cross Validated was kind enough to consider the issue of sample size. The moderator concluded "We lack the information to get a sample size." The other user wrote "I would say that the sample sizes are too small."

Another possible flaw of the study is the age and educational differences between the white and non-white groups. The white and U.S. black groups had a median age of 55, so age differences would not apply here. However, 55% of whites had a Bachelor's degree or higher vs. 43% for U.S. blacks. The lack of a more detailed study with these variables is another drawback of a small sample size. The study authors note (emphasis added):

Some of the differences observed may be driven by differences in age or educational attainment. In general, nonwhites in the survey were younger and had much lower educational attainment rates. Unfortunately, it was not possible to provide data broken down by age for all the groups analyzed in Boston, because the sample size was too small.

Some of the reasons that whites have a higher median net worth (included in the report):

Whites are more likely to have liquid assets (e.g. checking accounts and savings accounts). "For every dollar, the typical white household has in liquid assets (excluding cash), U.S. blacks have 2 cents..." (median $25,000 vs $700 for blacks)

Whites are more likely to own homes (79% vs. 33% blacks), retirement accounts (56% vs. 20%), and vehicles (85% vs. 50%).

Whites are less likely to have mortgage debt (60% homeowners vs. 90% black homeowners), student loans (19% vs. 28%), and medical debt (11% vs. 17%).

Whites are less likely to send money to relatives (both abroad and in the US), which increases personal savings.

A point the report makes is while differences in annual income may not be as significant, the difference is magnified over time. This reflects in the net worth. (empasis added below)

In part, racial differences in net worth are derived from racially based differences in income because nonwhites generally earn less. But racial differences in income and racial differences in wealth are only weakly correlated. Rather, the racially based gulf in wealth accumulation widens as income increases and because wealth differences reflect an accumulated lifetime of income disparities, compounded by asset returns (or lack thereof), the racial wealth gap is much greater than the income differences.

To summarize:

Is there a wealth gap in Boston where the median net worth of white households is $247,500 while the median net worth for black families was $8?

The data provided in the full report is limited. I can see that two limitations of the study are a small sample size and educational differences between the white and non-white groups.¹ Without the raw data, we would have to trust that the interviews and subsequent data analysis were performed correctly by the researchers.

¹The researchers did find statistical significance for the median net worth values. "Difference in findings of nonwhite household median or mean net worth values were statistically significant at the 99 percent level."

Comments are not for extended discussion; this conversation has been moved to chat. — yesterday

JakeGould 2,7745 gold badges15 silver badges35 bronze badges · Accepted Answer · 2019-08-01 13:47:31Z

I wasn't going to answer, but since a lot of the discussion has been about the statistical aspects and I have some background in that, maybe I can add something here.

TLDR: The study doesn't give enough detail to fully assess the methods, but what I can see is consistent with what I'd expect from a decent survey.

(That is: I'm not saying it's right, but I don't see any obvious signs that it's wrong.)

Stats 101 generally focuses on simple random sample (SRS) designs: we have a population we want to describe, we run a random selection process where every unit (person, household, whatever you're studying) has an equal chance of selection in the sample. From that approach, we can calculate a lot of things about margins of error, sample sizes, etc. etc.

This should be understood as an "assume a spherical cow" approach: it's a useful starting point, but only a starting point. In reality, serious socioeconomic research is almost never based on a pure SRS design and assumptions/rules of thumb drawn from that Stats 101 understanding may not be reliable for assessing it.

I recommend Särndal, Swensson and Wretman's "Model Assisted Survey Sampling" for a solid overview of more complex survey design and analysis methods. See in particular the sections on sampling frames (& multiple frames), stratified sampling & optimal sample allocation, and regression/post-stratification errors. Please take MASS as a general cite for most of the following discussion.

Caveat: MASS concentrates on means (averages) rather than quantile statistics (medians/quartiles/etc.) However, most of the methods presented there have obvious parallels for working with quantiles.

Some of the issues that have been raised:

Is the sample representative?

The real question here is "should it be?"

When you're trying to describe a population as a whole, "representative sampling" is often a reasonable approach. (Loosely speaking: attempting to ensure that the statistical distribution within your sample is roughly the same as in the real population.) Simple random sampling is one way that people attempt to achieve a representative sample, and there are non-SRS options for this purpose too e.g. systematic sampling and some implementations of stratified sampling.

The mathematical analysis is then easy; if you believe that the sample is a representative microcosm of the full population, you can just take the mean and median of the sample and use that as an estimate for the population. This isn't always the most efficient approach possible, but for whole-population stats it's not bad.

However, when you're interested in sub-populations e.g. different racial groups, it's another story.

Suppose we want to explore questions like "are White households more affluent than Black households?" This usually requires estimating typical affluence for each group, typical ranges (e.g. "90% of households have more than $X net worth, and 10% have more than $Y"), and then comparing the two.

Problem here is that for any given group, the accuracy of those estimates is affected by sample size. If we're surveying a population that's 90% White and 10% Black, and we use a "representative" sample, we will end up with 9x as many White households as Black ones. That will give us very good estimates for the characteristics of White households but much poorer estimates for Black ones, making it difficult to compare the two.

One option is just to increase the sample size so that we get enough people/households from the smaller sub-populations, while getting even more from the common ones. This is mathematically nice but it's prohibitively expensive for anything that requires trained interviewers, as economic data usually does.

A more practical alternative is oversampling: one way or another, you adjust your survey design to select a higher proportion of the smaller sub-populations. The Boston study has done this:

Various sampling techniques were used to locate and identify an
ethnically plural sample consisting of the specifically defined ethnic
groups. The techniques included the following: directory-listed
landline samples targeted to census tracts where specific ethnic
groups were known to reside; cell phone random digit dialing samples
drawn from rate centers that covered the targeted ethnic group ZIP
codes; samples drawn from targeted ZIP codes on the basis of billing
address; and the use of surname-based lists targeting specific
national origin groups.

A naïve approach to this can introduce a great deal of bias. If I try to get a sample of Asian households solely by going to a district known for having a high Asian population, obviously I will miss the Asian households outside that district, who may have very different characteristics. However, there are well-known methods for avoiding this problem.

One common method, discussed in MASS, is pi-weighting (Horvitz-Thompson): make sure that everybody (or almost everybody) has some chance of selection, and then factor in that probability of selection when analysing the results. For instance, I might sample 1/100th of the households in District A, known to have a relatively high Asian population, but only 1/500th of the households in District B. The results are then "weighted up" so that each household sampled from District B has 5x as much influence on the results as each household from District A, cancelling out the bias that we would otherwise introduce by sampling more heavily from A.

Here is a peer-reviewed paper that I co-authored some years back, discussing an example of how these concepts were applied in designing an Australian survey, if anybody wants to see a bit more mathematical crunch. (Caveat that this is a design paper, which uses Horvitz-Thompson as a simplified version of the weighting method in order to keep the maths manageable; the actual data analysis post-survey used a somewhat more complex benchmarking approach.) A couple of things to take away from that:

Even a simplification of the method makes for a very dense paper that takes a long time to write and will probably only be read by a handful of people. I would love it if everybody published the full details of their weighting methods, but it may not be easy to find the time.

Nevertheless, these methods are well known and used by people who do socioeconomic research - we weren't the first and won't be the last.

The Boston survey introduces an additional complication: multiple "frames" meaning that some households may have more than one path to selection (e.g. some households might be on a list of names and in a targeted zip code). Even when a household doesn't actually get selected twice, this affects probability of selection and needs to be factored into the analysis. But again, this is a well-known issue and there are standard methods for dealing with it - see Särndal et al on "multiple frames".

The paper mentions that weighting was used, but doesn't go into much detail. Without more detail on the methods used, obviously I can't verify for myself that the researchers have handled these issues correctly. But I also don't see anything to signal that they didn't.

The acknowledgements mention methodological advice from Marcin Hitczenko and Kobi Abayomi and review by Tatjana Meschede, among others. From a quick look at their CVs, I would be astonished if those reviewers/contributors were unaware of this kind of issue or the standard methods for dealing with it.

Is the sample size large enough?

This is addressed at several points in the paper. The paper reports statistical significance thresholds (p-values), which are effectively a statistician's way of answering the question "given the sample sizes, how unlikely is it that random noise alone would be able to create this apparent difference?"

In most cases, the answer as reported in the paper is "pretty unlikely". In particular, Table 9 shows that the median net worth for White households is higher than every other group at the 99% significance threshold, excepting Asian and Cape Verdeans where the sample sizes were too small.

The sample sizes for individual racial groups are not huge (78 White households, 71 US Black). In general, a larger sample gives more accurate estimates, but the question of "how much is enough?" depends very much on the specifics of the problem: what the variable of interest is, how it's distributed among the populations of interest, how it is to be estimated, etc. etc.

As the paper acknowledges, this can make it hard to detect meaningful differences even when they do exist, and they have flagged some cases/groups where sample sizes were too small to be useful, especially when trying to break results down by age.

But I'm not aware of any statistical rule that says it's impossible to obtain serviceable estimates for a comparison between samples of 78 and 71, in this particular context, and I very much doubt it exists.

In other words: the authors have asserted that the sample size is large enough to be pretty confident that White households have higher net worth than Black households. They haven't shown the full working for that, so you'll have to make your own judgements about whether you trust their professional chops, but it is what they're asserting.

As noted in Bryan Krause's answer, the paper doesn't give quantiles other than the median, which makes it difficult to interpret exactly how meaningful the difference is. I think it almost certainly is meaningful and large - one would have to hypothesise some very weird income distributions for it to be otherwise - but without that IQR-type data it's impossible to be sure.

Possibility of non-response bias

The paper reports:

For the NASCC project in general, about 70,000 personalized advanced letters were sent, 87,000 telephone
numbers dialed 448,000 times, and 12,113 interviewer hours were spent
across three shops to conduct 2,746 completed surveys.

This is pretty common for research of this kind. Some people are hard to contact, some don't want to be interviewed or just don't have the time.

This is a perennial concern for survey-based researchers. Obviously it reduces the sample size, but that can be factored in; if you know only 5% of households are going to respond, you budget for 20x as many contact attempts.

The challenging part is estimating what it does to the selection probabilities. Working singles are harder to contact than stay-at-home parents or retirees, and each of those groups will have different selection probabilities.

One approach to this problem, discussed in MASS and elsewhere, is benchmarking, aka post-stratification: you compare the demographic distribution of your sample to some estimates of what the population demographics should be (e.g. Census data), and use that to weight up the groups that have effectively been undersampled due to non-response. There's also regression weighting, an extension of the same principle, and various other methods.

The Boston paper appears to have done something along those lines:

The statistics in the sample used weights based on family
characteristics in the U.S. Census Bureau’s American Community Survey
to generate results representative of specific ethnic group characteristics in the respondent’s metropolitan area of residence. Overall,
the results computed from the unweighted NASCC sample are not
dissimilar from those using the weight- ed NASCC sample, suggesting
that the specific ethnic group observations in the metropolitan
areas covered by the study were fairly representative of their
populations at large.

It's always tough to tell exactly how effective post-strat methods are in neutralising non-response bias, but they seem to have done what they could.

Overall, the report is a bit light on detail for my tastes. It would be nice to have more information about selection/weighting methods and on the distribution of net worth beyond just the medians. But what has been published looks consistent with competent researchers following standard and defensible methods for this kind of work.

score 1 · Accepted Answer · 2019-07-30 00:58:21Z

The existing answer covers the details of the associated study so I won't rehash that all here; the question already covers the main result as well and is mostly asking about methodology.

Sample size and significance

In the comments there is a long discussion about sample size. As discussed in those comments, the sample size is not particularly large, but also the median is less susceptible to outliers.

A very important thing to realize with small sample size is the relationship with power. The authors report statistically significant differences in net worth and income (see the original study here). A statistically significant result with a smaller sample size is actually potentially indicative of a larger effect.

In a very large sample, you can risk declaring significant effects that are not actually important (for example, if you sampled 7 billion people you could perhaps detect a financial net worth difference of 1 cent and then have to retract the study when someone in the control group picks up a nickel). In a small sample, you instead risk making a type II error: failing to detect a real difference. Therefore, you should not criticize a small study that shows a significant effect for being "underpowered": that argument is nonsense.

Karl Friston points this out in one of his "ironic rules" for reviewers in a quite amusing exchange.

Sample size and estimation

That said, the original question is more about the actual numbers reported, and here is a place one could quibble a bit. I see no reason to doubt the actual numbers reported (and again, the other answer already stepped through the methods). However, what is lacking from the report is some measure of the distribution besides the median. Given that they are reporting medians, an appropriate statistic would be something like an interquartile range, or simply presenting the whole distributions as a histogram, pdf, or cdf.

For example, if the IQR for whites is -$100,000 to $1,000,000, and for blacks it is -$110,000 to $900,000, one could still observe the same difference in medians, but the context is different than if the ranges were $50,000 to $400,000 versus -$50,000 to +$50,000. In that case, you'd see that at least 75% of blacks have less net worth than at least 75% of whites.

Additionally, the precision of the measurements for a given sample size depend on these ranges, so without this information it is difficult to contextualize the results properly.

Demographics

The study has some limited demographic information; median age was 55 in both black and white groups. Education was lower in the black group, but that's expected from broader demographics and could certainly relate to the net worth differences, but the authors found similar differences considering only those with a bachelor's degree or only those 51-65 years old:

The net worth differences of whites and blacks were remarkable even when level of education or age were considered. Median wealth among black households that have a bachelor’s
degree or higher ($12,000) was 4 percent of the median for white households ($313,500).
Similarly, if we look at households in the 51- to 65-year-old bracket, the typical white
household holds $311,000 in wealth, compared with only $4,000 for the typical black
household (Table 10).

Summary

Does this study reliably show that whites have greater median net worth than blacks? Yes, it does, and the small sample size does not matter.

Do the actual numbers $250k and $8 reflect the actual distributions? We don't have enough information.

"the median is less susceptible to outliers." This may be true, but should be cited. You hardly address the possibility that this is an apples to oranges comparison, where the median black respondent is a college student and the median white respondent is a fifty-year-old homeowner. You also don't show that the study found statistically significant results. This should be supported with at least a quotation from the study, not a raw assertion. — Jul 30 at 0:50
@Brythan Mean versus median susceptibility to outliers is not really something that needs to be cited, it's a common knowledge fact and can be demonstrated by taking a sample from a normal distribution, adding an outlier, and comparing the change to mean versus median. I can update with demographics; the median age is 55 in both groups, and add a link to the study. — Jul 30 at 0:54
But wealth is not normally distributed. So why are you making (still without citation) a statistical claim based on different circumstances? And your claim about the median age is directly contradicted by the study: "In general, nonwhites in the survey were younger and had much lower educational attainment rates." — Jul 30 at 4:23
@Brythan It doesn't matter. Try it with any distribution. It's hard to find a citation that doesn't just state this because it's very very simple. Google "median outliers" or something. The statement you quote has to do with all nonwhites, not blacks. Blacks and whites had the same median age. It's in a table (Table 1). — Jul 30 at 4:49
@Brythan If you have a list like that then neither the mean or median is actually informative and measures of central tendency don't apply. Wealth data don't look anything like that. I have no idea what you mean by "And I think that you're missing the real point. We're not talking about adding one entry to the list. We're talking about adding most of a thousand." In any case, I don't plan to convince you, so I'll stop the discussion now. — Jul 30 at 5:45

BrythanBrythan 9,3005 gold badges40 silver badges52 bronze badges · Accepted Answer · 2019-07-30 05:11:30Z

If you read the methodology (PDF) portion, there are some questionable practices.

directory-listed landline samples targeted to census tracts where
specific ethnic groups were known to reside

But census tracts are also highly correlated with wealth. From Area characteristics and individual-level socioeconomic position indicators in three population-based epidemiologic studies.

[...] blacks were much more likely to live in more disadvantaged areas than whites.

and

Differences in the residential environments of blacks and whites may need to be taken into account in interpreting race differences in epidemiologic studies.

So what the referenced study did, is they looked for blacks in poor neighborhoods and looked for whites in richer neighborhoods. Unsurprisingly, they found poor blacks and richer whites.

Back to the methodology:

cell phone random digit dialing samples drawn from rate centers that covered the targeted ethnic group ZIP codes; samples drawn from targeted ZIP codes on the basis of billing address

And once again, ZIP code is highly correlated with wealth.

And the last part of the methodology:

the use of surname-based lists targeting specific national origin groups.

But this is going to find more recent immigrants. Because the typical descendants of slaves have no idea what their African ancestors' surnames were. So many use the same surnames as whites, with some variation. For example, White, Washington, and Lincoln are particularly common.

None of these methods produce random samples from the overall black and white population. So when you see claims of statistical significance, remember that what is being claimed is that people found in particular census tracts or zip codes or having particular surnames are being compared to other people with different census tracts, zip codes, or surnames. Not random samples of the overall population.

Of course, none of this means that the claim is not true. It only means that this approach did not prove it.

The study does not claim to compare blacks and whites in the same neighborhood with the same socioeconomic status. It's comparing them within a city. It would be very selective and misleading to not consider in your sampling that different racial groups live in different neighborhoods. — Jul 30 at 5:46
"It would be very selective and misleading to not consider in your sampling that different racial groups live in different neighborhoods." I disagree. Proper sampling would have ignored neighborhoods altogether and just randomly picked people from the overall pool, determining race later. Then we would have been able to look at the random sample and see if it was skewed by comparing it to the characteristics of the pool. As is, we don't know if the sample is representative of the overall population. — Jul 30 at 21:16
What you're talking about here is oversampling, which is a common technique when trying to get usable data about minority groups. (Otherwise you end up sampling 10x as many people as you actually need, in order to get just enough of the smaller groups.) A naïve analysis of oversampled data will produce significant bias, for the reasons you mention, but there are established methods for preventing that bias - basically, you "weight up" people who had a smaller chance of selection. See here for an example. (1/2) — Jul 31 at 21:59
Unfortunately, there's not very much info in the paper about the detail of the analysis, so it's not clear to me how they have/haven't adjusted for oversampling considerations. (I would certainly hope that they had!) (2/2) — Jul 31 at 22:03

搜尋此網誌

Ttdfjt

4 Answers
4

TLDR: The study doesn't give enough detail to fully assess the methods, but what I can see is consistent with what I'd expect from a decent survey.

(That is: I'm not saying it's right, but I don't see any obvious signs that it's wrong.)

Is the sample representative?

Is the sample size large enough?

Possibility of non-response bias

4 Answers
4

4 Answers
4

TLDR: The study doesn't give enough detail to fully assess the methods, but what I can see is consistent with what I'd expect from a decent survey.

(That is: I'm not saying it's right, but I don't see any obvious signs that it's wrong.)

Is the sample representative?

Is the sample size large enough?

Possibility of non-response bias

TLDR: The study doesn't give enough detail to fully assess the methods, but what I can see is consistent with what I'd expect from a decent survey.

(That is: I'm not saying it's right, but I don't see any obvious signs that it's wrong.)

Is the sample representative?

Is the sample size large enough?

Possibility of non-response bias

TLDR: The study doesn't give enough detail to fully assess the methods, but what I can see is consistent with what I'd expect from a decent survey.

(That is: I'm not saying it's right, but I don't see any obvious signs that it's wrong.)

Is the sample representative?

Is the sample size large enough?

Possibility of non-response bias

TLDR: The study doesn't give enough detail to fully assess the methods, but what I can see is consistent with what I'd expect from a decent survey.

(That is: I'm not saying it's right, but I don't see any obvious signs that it's wrong.)

Is the sample representative?

Is the sample size large enough?

Possibility of non-response bias

Popular posts from this blog

Grendel Contents Story Scholarship Depictions Notes References Navigation menu10.1093/notesj/gjn112Berserkeree

Log in Navigation menu

4 Answers 4

TLDR: The study doesn't give enough detail to fully assess the methods, but what I can see is consistent with what I'd expect from a decent survey.

(That is: I'm not saying it's right, but I don't see any obvious signs that it's wrong.)

Is the sample representative?

Is the sample size large enough?

Possibility of non-response bias

4 Answers 4

4 Answers 4

TLDR: The study doesn't give enough detail to fully assess the methods, but what I can see is consistent with what I'd expect from a decent survey.

(That is: I'm not saying it's right, but I don't see any obvious signs that it's wrong.)

Is the sample representative?

Is the sample size large enough?

Possibility of non-response bias

TLDR: The study doesn't give enough detail to fully assess the methods, but what I can see is consistent with what I'd expect from a decent survey.

(That is: I'm not saying it's right, but I don't see any obvious signs that it's wrong.)

Is the sample representative?

Is the sample size large enough?

Possibility of non-response bias

TLDR: The study doesn't give enough detail to fully assess the methods, but what I can see is consistent with what I'd expect from a decent survey.

(That is: I'm not saying it's right, but I don't see any obvious signs that it's wrong.)

Is the sample representative?

Is the sample size large enough?

Possibility of non-response bias

TLDR: The study doesn't give enough detail to fully assess the methods, but what I can see is consistent with what I'd expect from a decent survey.

(That is: I'm not saying it's right, but I don't see any obvious signs that it's wrong.)

Is the sample representative?

Is the sample size large enough?

Possibility of non-response bias

Popular posts from this blog

Grendel Contents Story Scholarship Depictions Notes References Navigation menu10.1093/notesj/gjn112Berserkeree

Log in Navigation menu

4 Answers
4

4 Answers
4

4 Answers
4