Century handling in Pandas
I have the following data in one of my columns:
df['DOB']
0 01-01-84
1 31-07-85
2 24-08-85
3 30-12-93
4 09-12-77
5 08-09-90
6 01-06-88
7 04-10-89
8 15-11-91
9 01-06-68
Name: DOB, dtype: object
I want to convert this to a datetime column.
I tried the following:
print(pd.to_datetime(df['DOB']))
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-09-12
5 1990-08-09
6 1988-01-06
7 1989-04-10
8 1991-11-15
9 2068-01-06
Name: DOB, dtype: datetime64[ns]
What should I change in my code to get the date as 1968-01-06 instead of 2068-01-06? Thanks in advance.
python pandas
asked 2 days ago by Madan
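The 2068 result comes from how two-digit years are expanded into four digits. As a minimal sketch using only the standard library (this is not the asker's pandas call): Python's `datetime.strptime` maps `%y` values 69 to 99 onto 1969 to 1999 and values 0 to 68 onto 2000 to 2068. Whether `pd.to_datetime` without a `format` argument applies exactly the same pivot is an assumption here and may depend on the pandas version.

```python
from datetime import datetime

# %y expands two-digit years around a fixed pivot: 00-68 -> 20xx, 69-99 -> 19xx.
y68 = datetime.strptime("01-06-68", "%d-%m-%y")
y69 = datetime.strptime("01-06-69", "%d-%m-%y")

print(y68.date())  # 2068-06-01
print(y69.date())  # 1969-06-01
```

Because 68 sits on the "20xx" side of that pivot, a birth year of 1968 needs one of the corrections described in the answers.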
5 Answers
In this specific case, I would use this:
pd.to_datetime(df['DOB'].str[:-2] + '19' + df['DOB'].str[-2:])
Note that this hard-codes the 20th century, so it will break if you have DOBs after 1999!
Output:
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-09-12
5 1990-08-09
6 1988-01-06
7 1989-04-10
8 1991-11-15
9 1968-01-06
dtype: datetime64[ns]
answered 2 days ago by gmds (edited 2 days ago)
I'm getting the error "series not defined". I hope that was a typo and I have to use the column name. – Madan, 2 days ago
@Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed. – gmds, 2 days ago
@jezrael Yup, will edit question to specify that clearly – gmds, 2 days ago
Thanks @jezrael. I will not get dates with year > 1999 in my file. – Madan, 2 days ago
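The caveat about DOBs after 1999 can be made concrete with a small sketch. It follows the same slicing idea as the answer but passes an explicit `format`, so the day/month order is not left to inference; the second value, `15-03-05`, is a made-up DOB from 2005 added only to show the failure mode.

```python
import pandas as pd

# "15-03-05" is a hypothetical DOB from 2005, added to show the limitation.
dob = pd.Series(["01-06-68", "15-03-05"])

# Insert the assumed century before the two-digit year, then parse day-first.
full = dob.str[:-2] + "19" + dob.str[-2:]
parsed = pd.to_datetime(full, format="%d-%m-%Y")

print(parsed.dt.year.tolist())  # [1968, 1905]
```

The 2005 value silently becomes 1905, which is why this approach is only appropriate when, as the asker confirms, the file never contains years after 1999.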
You can first convert to datetimes and then, where the year is 2020 or later, subtract 100 years using DateOffset:
df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
# equivalent to:
# mask = df['DOB'].dt.year >= 2020
# df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
print(df)
DOB
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
Or you can prepend 19 or 20 to the years with Series.str.replace and choose between the two results with numpy.where, based on a condition.
Note: this solution also works for years from 00 (2000) up to 20 (2020).
import numpy as np
# \1 re-inserts the two-digit year captured by (\d+)
s1 = df['DOB'].str.replace(r'-(\d+)$', r'-19\1', regex=True)
s2 = df['DOB'].str.replace(r'-(\d+)$', r'-20\1', regex=True)
mask = df['DOB'].str[-2:].astype(int) <= 20
df['DOB'] = pd.to_datetime(np.where(mask, s2, s1))
print(df)
DOB
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-09-12
5 1990-08-09
6 1988-01-06
7 1989-04-10
8 1991-11-15
9 1968-01-06
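The substitution used above relies on a capture group and a backreference. As a sketch with the standard `re` module (the helper name `add_century` is hypothetical and not part of the answer), the pattern `-(\d+)$` captures the trailing year digits and the replacement writes the century in front of them:

```python
import re

def add_century(text, century="19"):
    # Capture the trailing two-digit year and put the century in front of it.
    # \g<1> is the explicit form of the \1 backreference.
    return re.sub(r"-(\d+)$", "-" + century + r"\g<1>", text)

print(add_century("01-06-68"))        # 01-06-1968
print(add_century("01-06-08", "20"))  # 01-06-2008
```

`Series.str.replace` with `regex=True` applies the same kind of substitution to every element of the column.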
If all years are before 2000:
s1 = df['DOB'].str.replace(r'-(\d+)$', r'-19\1', regex=True)
df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
print(df)
DOB
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
answered 2 days ago by jezrael (edited 2 days ago)
Can you please explain this line: df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100) – Madan, 2 days ago
@Madan - first the values are converted to datetimes, and then, where the year is 2020 or later, 100 years are subtracted with DateOffset. – jezrael, 2 days ago
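The line asked about in the comment can be unpacked step by step. This is a sketch on a two-row frame taken from the question's data, with the 2020 cutoff from the answer kept as-is:

```python
import pandas as pd

df = pd.DataFrame({"DOB": ["01-01-84", "01-06-68"]})

# Step 1: parse day-first with a two-digit year; 68 is read as 2068.
df["DOB"] = pd.to_datetime(df["DOB"], format="%d-%m-%y")

# Step 2: boolean mask marking the rows whose year landed in 2020 or later.
mask = df["DOB"].dt.year >= 2020

# Step 3: shift only the masked rows back by one century.
df.loc[mask, "DOB"] = df.loc[mask, "DOB"] - pd.DateOffset(years=100)

print(mask.tolist())               # [False, True]
print(df["DOB"].dt.year.tolist())  # [1984, 1968]
```

The in-place `-=` form in the answer performs steps 2 and 3 in a single statement.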
Another solution is to treat the DOB as a date, and take it back to the previous century only if it is in the future (i.e. after "now"). Example:
import pandas as pd
from datetime import datetime
df = pd.DataFrame.from_dict({'DOB': ['01-06-68', '01-06-08']})
df['DOB'] = df['DOB'].apply(lambda x: datetime.strptime(x, '%d-%m-%y'))
# keep past dates; move future dates back one century (same type, so the column stays datetime)
df['DOB'] = df['DOB'].apply(lambda x: x if x < datetime.now() else x.replace(year=x.year - 100))
answered 2 days ago by Itamar Mushkin
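The same "only roll back dates that lie in the future" rule can also be written without `apply`. This is a sketch, not the answerer's code: the function name `roll_back_future_dates` and its `today` parameter are hypothetical, and `today` exists mainly so the result does not depend on when the code is run.

```python
import pandas as pd

def roll_back_future_dates(dates, today=None):
    """Parse dd-mm-yy strings and move any date after `today` back 100 years."""
    today = pd.Timestamp.now().normalize() if today is None else pd.Timestamp(today)
    parsed = pd.to_datetime(dates, format="%d-%m-%y")
    in_future = parsed > today
    # where() keeps values where the condition is True and takes the
    # shifted values elsewhere, i.e. for the future dates.
    return parsed.where(~in_future, parsed - pd.DateOffset(years=100))

dob = pd.Series(["01-06-68", "01-06-08"])
fixed = roll_back_future_dates(dob, today="2019-04-20")
print(fixed.dt.year.tolist())  # [1968, 2008]
```

Unlike a fixed cutoff year, this rule keeps working as time passes, at the cost of assuming that no date of birth can lie in the future.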
In general, when the century is uncertain, it is better to specify it explicitly in the year:
pd.to_datetime(data['Date.of.Birth'].apply(lambda x: '-'.join(x.split('-')[:-1] + ['19' + x.split('-')[2]])))
I ran this with the following data frame:
0 1
0 0 01-01-84
1 1 31-07-85
2 2 24-08-85
3 3 30-12-93
4 4 09-12-77
5 5 08-09-90
6 6 01-06-88
7 7 04-10-89
8 8 15-11-91
9 9 01-06-68
pd.to_datetime(data[1].apply(lambda x: '-'.join(x.split('-')[:-1] + ['19' + x.split('-')[2]])))
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-09-12
5 1990-08-09
6 1988-01-06
7 1989-04-10
8 1991-11-15
9 1968-01-06
Name: 1, dtype: datetime64[ns]
answered 2 days ago by bubble
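You can use the code below if the converted years only start with 19 or 20. It works on the string form of the already-converted dates and replaces a leading 20 with 19:
s = pd.to_datetime(df['DOB']).astype(str)
df['DOB'] = pd.to_datetime(s.str.replace(r'^20', '19', regex=True))
And if 20 does not occur anywhere else in the strings (for example, as a day of the month):
df['DOB'] = pd.to_datetime(s.str.replace('20', '19', regex=False))
Now print(df['DOB']) gives: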
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-09-12
5 1990-08-09
6 1988-01-06
7 1989-04-10
8 1991-11-15
9 1968-01-06
dtype: datetime64[ns]
answered 2 days ago by U9-Forward