Rank groups within a grouped sequence of TRUE/FALSE and NA The 2019 Stack Overflow Developer Survey Results Are InGrouping functions (tapply, by, aggregate) and the *apply familyCharacters counting and subletting specific patternsWhat is the purpose of setting a key in data.table?data.table vs dplyr: can one do something well the other can't or does poorly?how to make a bar plot for a list of dataframes?How to group by unique values in a list in RPandas - Alternative to rank() function that gives unique ordinal ranks for a columnRank within group in for loop in RData transformation: from dyadic to observational data in RGetting map from purrr to work with paste0
Did Scotland spend $250,000 for the slogan "Welcome to Scotland"?
How to translate "being like"?
Is it safe to harvest rainwater that fell on solar panels?
How to notate time signature switching consistently every measure
Accepted by European university, rejected by all American ones I applied to? Possible reasons?
Why does the nucleus not repel itself?
Correct punctuation for showing a character's confusion
The difference between dialogue marks
Star Trek - X-shaped Item on Regula/Orbital Office Starbases
How can I add encounters in the Lost Mine of Phandelver campaign without giving PCs too much XP?
What do hard-Brexiteers want with respect to the Irish border?
Can we generate random numbers using irrational numbers like π and e?
What does Linus Torvalds mean when he says that Git "never ever" tracks a file?
Deal with toxic manager when you can't quit
How to support a colleague who finds meetings extremely tiring?
Ubuntu Server install with full GUI
Why doesn't UInt have a toDouble()?
Getting crown tickets for Statue of Liberty
Is it a good practice to use a static variable in a Test Class and use that in the actual class instead of Test.isRunningTest()?
What to do when moving next to a bird sanctuary with a loosely-domesticated cat?
Is it ethical to upload a automatically generated paper to a non peer-reviewed site as part of a larger research?
Can I have a signal generator on while it's not connected?
Does HR tell a hiring manager about salary negotiations?
Does adding complexity mean a more secure cipher?
Rank groups within a grouped sequence of TRUE/FALSE and NA
The 2019 Stack Overflow Developer Survey Results Are InGrouping functions (tapply, by, aggregate) and the *apply familyCharacters counting and subletting specific patternsWhat is the purpose of setting a key in data.table?data.table vs dplyr: can one do something well the other can't or does poorly?how to make a bar plot for a list of dataframes?How to group by unique values in a list in RPandas - Alternative to rank() function that gives unique ordinal ranks for a columnRank within group in for loop in RData transformation: from dyadic to observational data in RGetting map from purrr to work with paste0
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;
I have a little nut to crack.
I have a data.frame
like this:
group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
And I want to rank the groups of TRUE
in column criterium
in ascending order while disregarding the FALSE
and NA
. The goal is to have a unique group identifier inside each group of group
.
So the result should look like:
group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank()
and other window functions of dplyr
, but to no avail.
Thanks for the help!
r dplyr data.table rank
|
show 1 more comment
I have a little nut to crack.
I have a data.frame
like this:
group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
And I want to rank the groups of TRUE
in column criterium
in ascending order while disregarding the FALSE
and NA
. The goal is to have a unique group identifier inside each group of group
.
So the result should look like:
group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank()
and other window functions of dplyr
, but to no avail.
Thanks for the help!
r dplyr data.table rank
1
you can just about grab what you need with this work of beauty;as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group
– user20650
yesterday
that is a really funny solution. Very good job!
– Humpelstielzchen
yesterday
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
yesterday
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
yesterday
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-bygroup
or consider the discontinuity ingroup
.
– smci
yesterday
|
show 1 more comment
I have a little nut to crack.
I have a data.frame
like this:
group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
And I want to rank the groups of TRUE
in column criterium
in ascending order while disregarding the FALSE
and NA
. The goal is to have a unique group identifier inside each group of group
.
So the result should look like:
group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank()
and other window functions of dplyr
, but to no avail.
Thanks for the help!
r dplyr data.table rank
I have a little nut to crack.
I have a data.frame
like this:
group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
And I want to rank the groups of TRUE
in column criterium
in ascending order while disregarding the FALSE
and NA
. The goal is to have a unique group identifier inside each group of group
.
So the result should look like:
group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank()
and other window functions of dplyr
, but to no avail.
Thanks for the help!
r dplyr data.table rank
r dplyr data.table rank
edited yesterday
Humpelstielzchen
asked yesterday
HumpelstielzchenHumpelstielzchen
1,3801318
1,3801318
1
you can just about grab what you need with this work of beauty;as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group
– user20650
yesterday
that is a really funny solution. Very good job!
– Humpelstielzchen
yesterday
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
yesterday
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
yesterday
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-bygroup
or consider the discontinuity ingroup
.
– smci
yesterday
|
show 1 more comment
1
you can just about grab what you need with this work of beauty;as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group
– user20650
yesterday
that is a really funny solution. Very good job!
– Humpelstielzchen
yesterday
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
yesterday
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
yesterday
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-bygroup
or consider the discontinuity ingroup
.
– smci
yesterday
1
1
you can just about grab what you need with this work of beauty;
as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group– user20650
yesterday
you can just about grab what you need with this work of beauty;
as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group– user20650
yesterday
that is a really funny solution. Very good job!
– Humpelstielzchen
yesterday
that is a really funny solution. Very good job!
– Humpelstielzchen
yesterday
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
yesterday
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
yesterday
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
yesterday
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
yesterday
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by
group
or consider the discontinuity in group
.– smci
yesterday
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by
group
or consider the discontinuity in group
.– smci
yesterday
|
show 1 more comment
4 Answers
4
active
oldest
votes
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
1
Tried withrleid
but didn't get it to work. (+1)
– markus
yesterday
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
yesterday
add a comment |
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
add a comment |
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x)
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
do.call(rbind,
lapply(split(df, df$group), function(i)i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
add a comment |
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal :=
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
yesterday
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
yesterday
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
yesterday
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55606323%2frank-groups-within-a-grouped-sequence-of-true-false-and-na%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
1
Tried withrleid
but didn't get it to work. (+1)
– markus
yesterday
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
yesterday
add a comment |
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
1
Tried withrleid
but didn't get it to work. (+1)
– markus
yesterday
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
yesterday
add a comment |
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
answered yesterday
chinsoon12chinsoon12
9,93611420
9,93611420
1
Tried withrleid
but didn't get it to work. (+1)
– markus
yesterday
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
yesterday
add a comment |
1
Tried withrleid
but didn't get it to work. (+1)
– markus
yesterday
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
yesterday
1
1
Tried with
rleid
but didn't get it to work. (+1)– markus
yesterday
Tried with
rleid
but didn't get it to work. (+1)– markus
yesterday
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
yesterday
works for me. And seems to be the most elegant answer.
– Humpelstielzchen
yesterday
add a comment |
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
add a comment |
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
add a comment |
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
answered yesterday
Ronak ShahRonak Shah
46.3k104268
46.3k104268
add a comment |
add a comment |
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x)
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
do.call(rbind,
lapply(split(df, df$group), function(i)i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
add a comment |
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x)
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
do.call(rbind,
lapply(split(df, df$group), function(i)i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
add a comment |
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x)
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
do.call(rbind,
lapply(split(df, df$group), function(i)i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x)
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
do.call(rbind,
lapply(split(df, df$group), function(i)i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
edited yesterday
answered yesterday
SotosSotos
31.4k51741
31.4k51741
add a comment |
add a comment |
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal :=
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
yesterday
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
yesterday
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
yesterday
add a comment |
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal :=
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
yesterday
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
yesterday
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
yesterday
add a comment |
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal :=
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal :=
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
edited yesterday
answered yesterday
markusmarkus
15.5k11336
15.5k11336
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
yesterday
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
yesterday
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
yesterday
add a comment |
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
yesterday
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
yesterday
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
yesterday
Wow, impressive. Thanks for introducing me to
rle
and inverse.rle
. Gruß nach Leipzig.– Humpelstielzchen
yesterday
Wow, impressive. Thanks for introducing me to
rle
and inverse.rle
. Gruß nach Leipzig.– Humpelstielzchen
yesterday
1
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
yesterday
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
yesterday
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
yesterday
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
yesterday
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55606323%2frank-groups-within-a-grouped-sequence-of-true-false-and-na%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
you can just about grab what you need with this work of beauty;
as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group– user20650
yesterday
that is a really funny solution. Very good job!
– Humpelstielzchen
yesterday
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
yesterday
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
yesterday
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by
group
or consider the discontinuity ingroup
.– smci
yesterday