GroupBy operation using an entire dataframe to group values
I have 2 dataframes like this...
import numpy as np
import pandas as pd

np.random.seed(0)
a = pd.DataFrame(np.random.randn(20, 3))
b = pd.DataFrame(np.random.randint(1, 5, size=(20, 3)))
I'd like to find the average of the values in a for each of the 4 groups in b.
This...
a[b==1].sum().sum() / a[b==1].count().sum()
...works for doing one group at a time, but I was wondering if anyone could think of a cleaner method.
My expected result is
1 -0.088715
2 -0.340043
3 -0.045596
4 0.582136
dtype: float64
Thanks.
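For reference, the one-group-at-a-time expression above can be repeated for every label with a dictionary comprehension. This is only a baseline sketch of what the question asks for, not one of the posted answers; the names `labels` and `baseline` are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
a = pd.DataFrame(np.random.randn(20, 3))
b = pd.DataFrame(np.random.randint(1, 5, size=(20, 3)))

# Distinct group labels that actually occur in b.
labels = np.unique(b.to_numpy())

# a[b == k] keeps values where b equals k and sets the rest to NaN;
# sum() and count() both skip NaN, so the ratio is the group mean.
baseline = pd.Series(
    {k: a[b == k].sum().sum() / a[b == k].count().sum() for k in labels}
)
print(baseline)
```

This masks the whole frame once per label, so it is likely to scale poorly as the number of groups grows.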
python pandas group-by pandas-groupby
Can you please post some expected results? Right now I assume you need 4 values
– BogdanC
Jun 6 at 15:09
asked Jun 6 at 15:03 by MJS
edited Jun 6 at 16:21 by cs95
2 Answers
You can stack both DataFrames and then group one of the resulting Series by the other:
a.stack().groupby(b.stack()).mean()
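To see why the one-liner works, it can help to split it into steps. The sketch below assumes the `a` and `b` frames from the question; the intermediate names `values` and `labels` are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
a = pd.DataFrame(np.random.randn(20, 3))
b = pd.DataFrame(np.random.randint(1, 5, size=(20, 3)))

# stack() flattens each 20x3 frame into a Series of 60 entries
# indexed by (row, column), so both Series share the same index.
values = a.stack()
labels = b.stack()

# Grouping one Series by another uses the second Series' values as keys.
result = values.groupby(labels).mean()
print(result)
```

Because both frames have the same shape and labels, each value in `a` is paired with the group label at the same position in `b`.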
answered Jun 6 at 15:05 by WeNYoBen
If you want a fast numpy solution, use np.unique and np.bincount:
c, d = (a_.to_numpy().ravel() for a_ in [a, b])
u, i, cnt = np.unique(d, return_inverse=True, return_counts=True)
np.bincount(i, c) / cnt
# array([-0.0887145 , -0.34004319, -0.04559595, 0.58213553])
To construct a Series, use
pd.Series(np.bincount(i, c) / cnt, index=u)
1 -0.088715
2 -0.340043
3 -0.045596
4 0.582136
dtype: float64
For comparison, stack returns,
a.stack().groupby(b.stack()).mean()
1 -0.088715
2 -0.340043
3 -0.045596
4 0.582136
dtype: float64
%timeit a.stack().groupby(b.stack()).mean()
%%timeit
c, d = (a_.to_numpy().ravel() for a_ in [a, b])
u, i, cnt = np.unique(d, return_inverse=True, return_counts=True)
np.bincount(i, c) / cnt
5.16 ms ± 305 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
113 µs ± 1.92 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
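To make the np.unique / np.bincount step concrete, here is a small sketch on a toy array. The values and labels are made up for illustration and are not from the question.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
labels = np.array([10, 20, 10, 30, 20])

# u:   sorted distinct labels
# inv: for each element, the position of its label in u
# cnt: how many elements carry each label
u, inv, cnt = np.unique(labels, return_inverse=True, return_counts=True)

# bincount with weights sums the values that fall into each bin.
sums = np.bincount(inv, weights=values)
means = sums / cnt

print(u)      # [10 20 30]
print(means)  # [2.  3.5 4. ]
```

Since `inv` holds dense codes 0..k-1 rather than the raw labels, the labels themselves do not need to be small or consecutive integers.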
answered Jun 6 at 15:13 by cs95
edited Jun 6 at 16:07
3
Great answer, worth noting that this will fail if you don't have every group from 1-n present. I think a fix would be something like f = np.ones(u.max()), and then f[u-1] = c to divide by that instead
– user3483203
Jun 6 at 15:24
3
@user3483203 That's true. In that case we'd have to call bincount on pd.factorize(b.values.ravel())[0] and proceed as planned!
– cs95
Jun 6 at 15:25
2
You can safeguard with return_inverse... u, i, c = np.unique(b.to_numpy(), return_inverse=True, return_counts=True); pd.Series(np.bincount(i, a.to_numpy().ravel()) / c, u)
– piRSquared
Jun 6 at 15:57
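The situation these comments describe can be sketched as follows, assuming the labels were passed to np.bincount directly instead of being converted to dense codes first. The toy arrays are hypothetical.

```python
import numpy as np
import pandas as pd

values = np.array([1.0, 3.0, 5.0, 7.0])
labels = np.array([1, 1, 4, 4])  # groups 2 and 3 are absent

# Binning on the raw labels allocates a slot for every integer 0..max,
# so absent groups show up as zero counts (and would divide by zero).
raw_counts = np.bincount(labels)
print(raw_counts)  # [0 2 0 0 2]

# Factorizing first maps the labels to dense codes 0..k-1.
codes, uniques = pd.factorize(labels)
means = np.bincount(codes, weights=values) / np.bincount(codes)
result = pd.Series(means, index=uniques)
print(result)
```

Note that pd.factorize returns the distinct labels in order of first appearance rather than sorted order, so the resulting Series may need a sort_index() call if sorted output matters.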