Century handling in Pandas









9















I have the following data in one of my columns:



df['DOB']

0 01-01-84
1 31-07-85
2 24-08-85
3 30-12-93
4 09-12-77
5 08-09-90
6 01-06-88
7 04-10-89
8 15-11-91
9 01-06-68
Name: DOB, dtype: object


I want to convert this column to a datetime column.
I tried the following:



print(pd.to_datetime(df['DOB']))
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-09-12
5 1990-08-09
6 1988-01-06
7 1989-04-10
8 1991-11-15
9 2068-01-06
Name: DOB, dtype: datetime64[ns]


How should I change my code to get 1968-01-06 instead of 2068-01-06? Thanks in advance.










      python pandas
















      asked 2 days ago by Madan






















          5 Answers


















          4














          In this specific case, I would use this:



          pd.to_datetime(df['DOB'].str[:-2] + '19' + df['DOB'].str[-2:])


          Note that this will break if you have DOBs after 1999!



          Output:



          0 1984-01-01
          1 1985-07-31
          2 1985-08-24
          3 1993-12-30
          4 1977-09-12
          5 1990-08-09
          6 1988-01-06
          7 1989-04-10
          8 1991-11-15
          9 1968-01-06
          dtype: datetime64[ns]
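The fixed '19' prefix can be generalised with a cutoff for the two-digit year. The following is only a minimal sketch and not part of the original answer: the helper name `expand_two_digit_year` and its `cutoff` parameter are hypothetical, and it assumes every value is a `dd-mm-yy` string.

```python
import pandas as pd


def expand_two_digit_year(dates: pd.Series, cutoff: int) -> pd.Series:
    """Parse 'dd-mm-yy' strings; yy <= cutoff becomes 20yy, anything else 19yy.

    Hypothetical helper for illustration only.
    """
    yy = dates.str[-2:].astype(int)
    # Pick the century prefix per row
    century = yy.le(cutoff).map({True: '20', False: '19'})
    full = dates.str[:-2] + century + dates.str[-2:]
    # An explicit format avoids day/month ambiguity
    return pd.to_datetime(full, format='%d-%m-%Y')


dob = pd.Series(['01-06-68', '09-12-77', '15-03-05'])
result = expand_two_digit_year(dob, cutoff=20)
print(result)
```

With `cutoff=20`, the sample value `15-03-05` is read as 2005 while `01-06-68` is read as 1968. Passing a negative cutoff would reproduce the "always 19" behaviour of the answer above.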





          answered 2 days ago by gmds

























          • Getting error series not defined. Hope that was a typo and have to use column name.

            – Madan
            2 days ago











          • @Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed.

            – gmds
            2 days ago











          • @jezrael Yup, will edit question to specify that clearly

            – gmds
            2 days ago











          • Thanks @jezrael. I will not get dates with year > 1999 in my file.

            – Madan
            2 days ago


















          4














          You can first convert to datetimes with an explicit format, and then subtract 100 years (using DateOffset) from every date whose year is 2020 or later:



          df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
          df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
          # equivalent to:
          # mask = df['DOB'].dt.year >= 2020
          # df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
          print (df)
          DOB
          0 1984-01-01
          1 1985-07-31
          2 1985-08-24
          3 1993-12-30
          4 1977-12-09
          5 1990-09-08
          6 1988-06-01
          7 1989-10-04
          8 1991-11-15
          9 1968-06-01
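The reason 68 becomes 2068 in the first place is the pivot that `%y` applies: two-digit years 00 to 68 are mapped to 2000 to 2068, and 69 to 99 to 1969 to 1999. The sketch below illustrates this with the standard library and then applies the same 2020 threshold as above to a small sample; the sample values and variable names are assumptions made for the illustration.

```python
from datetime import datetime

import pandas as pd

# %y pivot: 00-68 -> 2000-2068, 69-99 -> 1969-1999
pivot_years = [datetime.strptime(s, '%d-%m-%y').year for s in ('01-06-68', '01-06-69')]
print(pivot_years)

# Sample data (assumed for illustration), parsed day-first
dob = pd.Series(['01-06-68', '09-12-77'])
parsed = pd.to_datetime(dob, format='%d-%m-%y')

# Move every date from 2020 onwards back by one century
too_late = parsed.dt.year >= 2020
fixed = parsed.mask(too_late, parsed - pd.DateOffset(years=100))
print(fixed)
```

Using `Series.mask` keeps the original column untouched and returns a new Series, which can be easier to check than the in-place `df.loc[...] -= ...` form.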



          Alternatively, prepend 19 or 20 to the two-digit year with Series.str.replace and choose between the two results with numpy.where.



          Note: this also handles years from 00 (2000) up to 20 (2020).



          import numpy as np

          # prefix the two-digit year with 19 (s1) or 20 (s2)
          s1 = df['DOB'].str.replace(r'-(\d+)$', r'-19\1', regex=True)
          s2 = df['DOB'].str.replace(r'-(\d+)$', r'-20\1', regex=True)
          mask = df['DOB'].str[-2:].astype(int) <= 20
          df['DOB'] = pd.to_datetime(np.where(mask, s2, s1), format='%d-%m-%Y')

          print (df)
          DOB
          0 1984-01-01
          1 1985-07-31
          2 1985-08-24
          3 1993-12-30
          4 1977-12-09
          5 1990-09-08
          6 1988-06-01
          7 1989-10-04
          8 1991-11-15
          9 1968-06-01



          If all years are below 2000:



          s1 = df['DOB'].str.replace(r'-(\d+)$', r'-19\1', regex=True)
          df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
          print (df)
          DOB
          0 1984-01-01
          1 1985-07-31
          2 1985-08-24
          3 1993-12-30
          4 1977-12-09
          5 1990-09-08
          6 1988-06-01
          7 1989-10-04
          8 1991-11-15
          9 1968-06-01





          answered 2 days ago by jezrael

























          • Can you please explain this line: df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)

            – Madan
            2 days ago












          • @Madan - first convert the values to datetimes, then subtract 100 years with DateOffset wherever the year is 2020 or later

            – jezrael
            2 days ago


















          1














          Another solution is to treat the DOB as a date, and take it back to the previous century only if it is in the future (i.e. after "now"). Example:



          import pandas as pd
          from datetime import datetime

          df = pd.DataFrame.from_dict({'DOB': ['01-06-68', '01-06-08']})
          df['DOB'] = df['DOB'].apply(lambda x: datetime.strptime(x, '%d-%m-%y'))
          # move dates that lie in the future back by one century
          df['DOB'] = df['DOB'].apply(lambda x: x if x < datetime.now() else x.replace(year=x.year - 100))
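The same "no birth dates in the future" rule can also be written without `apply`. This is a sketch under assumptions rather than the answerer's code: the function name `roll_back_future_dates` and its `today` parameter are hypothetical, introduced so that the reference date can be pinned for reproducible results.

```python
import pandas as pd


def roll_back_future_dates(dates: pd.Series, today=None) -> pd.Series:
    """Parse 'dd-mm-yy' strings and move dates after `today` back 100 years.

    Hypothetical helper; `today` defaults to the current time.
    """
    today = pd.Timestamp.now() if today is None else pd.Timestamp(today)
    parsed = pd.to_datetime(dates, format='%d-%m-%y')
    # Replace only the rows that lie after the reference date
    return parsed.mask(parsed > today, parsed - pd.DateOffset(years=100))


dob = pd.Series(['01-06-68', '01-06-08'])
result = roll_back_future_dates(dob, today='2019-04-20')
print(result)
```

Here `01-06-68` is first parsed as 2068, lies after the reference date, and is moved to 1968, while `01-06-08` stays in 2008. Note that this rule cannot distinguish people older than 100 years from younger ones, since the century is simply not present in the data.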



































            0














            In general (in case of uncertainty), it would be better to explicitly specify the year:



            pd.to_datetime(data['Date.of.Birth'].apply(lambda x: '-'.join(x.split('-')[:-1] + ['19' + x.split('-')[2]])))


            I ran this with the following data frame:



             0 1
            0 0 01-01-84
            1 1 31-07-85
            2 2 24-08-85
            3 3 30-12-93
            4 4 09-12-77
            5 5 08-09-90
            6 6 01-06-88
            7 7 04-10-89
            8 8 15-11-91
            9 9 01-06-68


            pd.to_datetime(data[1].apply(lambda x: '-'.join(x.split('-')[:-1] + ['19' + x.split('-')[2]])))


            0 1984-01-01
            1 1985-07-31
            2 1985-08-24
            3 1993-12-30
            4 1977-09-12
            5 1990-08-09
            6 1988-01-06
            7 1989-04-10
            8 1991-11-15
            9 1968-01-06
            Name: 1, dtype: datetime64[ns]



































              0














              You can use the code below if the years only start with 19 or 20:



              df['DOB'] = pd.to_datetime(df['DOB'].str.replace('20([^20]*)$', '19'))


              And if there are no 20s anywhere else:



              df['DOB'] = pd.to_datetime(df['DOB'].str.replace('20', '19'))


              And now:



              print(df['DOB'])


              Is:



              0 1984-01-01
              1 1985-07-31
              2 1985-08-24
              3 1993-12-30
              4 1977-09-12
              5 1990-08-09
              6 1988-01-06
              7 1989-04-10
              8 1991-11-15
              9 1968-01-06
              dtype: datetime64[ns]






























                5 Answers
                5






                active

                oldest

                votes








                5 Answers
                5






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes









                4














                In this specific case, I would use this:



                pd.to_datetime(df['DOB'].str[:-2] + '19' + df['DOB'].str[-2:])


                Note that this will break if you have DOBs after 1999!



                Output:



                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-09-12
                5 1990-08-09
                6 1988-01-06
                7 1989-04-10
                8 1991-11-15
                9 1968-01-06
                dtype: datetime64[ns]





                share|improve this answer

























                • Getting error series not defined. Hope that was a typo and have to use column name.

                  – Madan
                  2 days ago











                • @Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed.

                  – gmds
                  2 days ago











                • @jezrael Yup, will edit question to specify that clearly

                  – gmds
                  2 days ago











                • Thanks @jezrael. I will not get dates with year > 1999 in my file.

                  – Madan
                  2 days ago















                4














                In this specific case, I would use this:



                pd.to_datetime(df['DOB'].str[:-2] + '19' + df['DOB'].str[-2:])


                Note that this will break if you have DOBs after 1999!



                Output:



                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-09-12
                5 1990-08-09
                6 1988-01-06
                7 1989-04-10
                8 1991-11-15
                9 1968-01-06
                dtype: datetime64[ns]





                share|improve this answer

























                • Getting error series not defined. Hope that was a typo and have to use column name.

                  – Madan
                  2 days ago











                • @Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed.

                  – gmds
                  2 days ago











                • @jezrael Yup, will edit question to specify that clearly

                  – gmds
                  2 days ago











                • Thanks @jezrael. I will not get dates with year > 1999 in my file.

                  – Madan
                  2 days ago













                4












                4








                4







                In this specific case, I would use this:



                pd.to_datetime(df['DOB'].str[:-2] + '19' + df['DOB'].str[-2:])


                Note that this will break if you have DOBs after 1999!



                Output:



                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-09-12
                5 1990-08-09
                6 1988-01-06
                7 1989-04-10
                8 1991-11-15
                9 1968-01-06
                dtype: datetime64[ns]





                share|improve this answer















                In this specific case, I would use this:



                pd.to_datetime(df['DOB'].str[:-2] + '19' + df['DOB'].str[-2:])


                Note that this will break if you have DOBs after 1999!



                Output:



                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-09-12
                5 1990-08-09
                6 1988-01-06
                7 1989-04-10
                8 1991-11-15
                9 1968-01-06
                dtype: datetime64[ns]






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited 2 days ago

























                answered 2 days ago









                gmdsgmds

                6,989830




                6,989830












                • Getting error series not defined. Hope that was a typo and have to use column name.

                  – Madan
                  2 days ago











                • @Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed.

                  – gmds
                  2 days ago











                • @jezrael Yup, will edit question to specify that clearly

                  – gmds
                  2 days ago











                • Thanks @jezrael. I will not get dates with year > 1999 in my file.

                  – Madan
                  2 days ago

















                • Getting error series not defined. Hope that was a typo and have to use column name.

                  – Madan
                  2 days ago











                • @Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed.

                  – gmds
                  2 days ago











                • @jezrael Yup, will edit question to specify that clearly

                  – gmds
                  2 days ago











                • Thanks @jezrael. I will not get dates with year > 1999 in my file.

                  – Madan
                  2 days ago
















                Getting error series not defined. Hope that was a typo and have to use column name.

                – Madan
                2 days ago





                Getting error series not defined. Hope that was a typo and have to use column name.

                – Madan
                2 days ago













                @Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed.

                – gmds
                2 days ago





                @Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed.

                – gmds
                2 days ago













                @jezrael Yup, will edit question to specify that clearly

                – gmds
                2 days ago





                @jezrael Yup, will edit question to specify that clearly

                – gmds
                2 days ago













                Thanks @jezrael. I will not get dates with year > 1999 in my file.

                – Madan
                2 days ago





                Thanks @jezrael. I will not get dates with year > 1999 in my file.

                – Madan
                2 days ago













                4














                You can first convert to datetimes and if years are above or equal 2020 then subtract 100 years created by DateOffset:



                df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
                df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
                #same like
                #mask = df['DOB'].dt.year >= 2020
                #df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01



                Or you can add 19 or 20 to years by Series.str.replace and set valuies by numpy.where with condition.



                Notice: Solution working also for years 00 for 2000, up to 2020.



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                s2 = df['DOB'].str.replace(r'-(d+)$', r'-201')
                mask = df['DOB'].str[-2:].astype(int) <= 20
                df['DOB'] = pd.to_datetime(np.where(mask, s2, s1))

                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-09-12
                5 1990-08-09
                6 1988-01-06
                7 1989-04-10
                8 1991-11-15
                9 1968-01-06



                If all years are below 2000:



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01





                share|improve this answer

























                • Can you please explain this line: df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)

                  – Madan
                  2 days ago












                • @Madan - first convert values to datetimes and then if some years is higher as 2020 subtract 100 years with dateoffset

                  – jezrael
                  2 days ago















                4














                You can first convert to datetimes and if years are above or equal 2020 then subtract 100 years created by DateOffset:



                df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
                df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
                #same like
                #mask = df['DOB'].dt.year >= 2020
                #df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01



                Or you can add 19 or 20 to years by Series.str.replace and set valuies by numpy.where with condition.



                Notice: Solution working also for years 00 for 2000, up to 2020.



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                s2 = df['DOB'].str.replace(r'-(d+)$', r'-201')
                mask = df['DOB'].str[-2:].astype(int) <= 20
                df['DOB'] = pd.to_datetime(np.where(mask, s2, s1))

                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-09-12
                5 1990-08-09
                6 1988-01-06
                7 1989-04-10
                8 1991-11-15
                9 1968-01-06



                If all years are below 2000:



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01





                share|improve this answer

























                • Can you please explain this line: df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)

                  – Madan
                  2 days ago












                • @Madan - first convert values to datetimes and then if some years is higher as 2020 subtract 100 years with dateoffset

                  – jezrael
                  2 days ago













                4












                4








                4







                You can first convert to datetimes and if years are above or equal 2020 then subtract 100 years created by DateOffset:



                df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
                df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
                #same like
                #mask = df['DOB'].dt.year >= 2020
                #df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01



                Or you can add 19 or 20 to years by Series.str.replace and set valuies by numpy.where with condition.



                Notice: Solution working also for years 00 for 2000, up to 2020.



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                s2 = df['DOB'].str.replace(r'-(d+)$', r'-201')
                mask = df['DOB'].str[-2:].astype(int) <= 20
                df['DOB'] = pd.to_datetime(np.where(mask, s2, s1))

                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-09-12
                5 1990-08-09
                6 1988-01-06
                7 1989-04-10
                8 1991-11-15
                9 1968-01-06



                If all years are below 2000:



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01





                share|improve this answer















                You can first convert to datetimes and if years are above or equal 2020 then subtract 100 years created by DateOffset:



                df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
                df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
                #same like
                #mask = df['DOB'].dt.year >= 2020
                #df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01



                Or you can add 19 or 20 to years by Series.str.replace and set valuies by numpy.where with condition.



                Notice: Solution working also for years 00 for 2000, up to 2020.



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                s2 = df['DOB'].str.replace(r'-(d+)$', r'-201')
                mask = df['DOB'].str[-2:].astype(int) <= 20
                df['DOB'] = pd.to_datetime(np.where(mask, s2, s1))

                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-09-12
                5 1990-08-09
                6 1988-01-06
                7 1989-04-10
                8 1991-11-15
                9 1968-01-06



                If all years are below 2000:



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited 2 days ago

























                answered 2 days ago









                jezrael












                • Can you please explain this line: df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)

                  – Madan
                  2 days ago












                • @Madan - it first converts the values to datetimes, and then, for any row whose year is 2020 or later, subtracts 100 years with DateOffset

                  – jezrael
                  2 days ago
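As an illustration of that line, here is a minimal sketch on two made-up values (the sample strings are assumptions, not the asker's data). `%y` turns 68 into 2068, the boolean mask picks out that row, and `pd.DateOffset(years=100)` moves it back a century.

```python
import pandas as pd

df = pd.DataFrame({'DOB': ['01-06-68', '08-09-90']})

# Step 1: parse the strings; %y reads 68 as 2068 and 90 as 1990
df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')

# Step 2: select the rows whose year is 2020 or later ...
mask = df['DOB'].dt.year >= 2020

# ... and subtract a calendar century from just those rows
df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
print(df)
```

The augmented form `df.loc[mask, 'DOB'] -= pd.DateOffset(years=100)` is shorthand for the same assignment.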































                Another solution is to treat the DOB as a date, and take it back to the previous century only if it is in the future (i.e. after "now"). Example:



                import pandas as pd
                from datetime import datetime

                df = pd.DataFrame({'DOB': ['01-06-68', '01-06-08']})
                # %y reads 68 as 2068 and 08 as 2008
                df['DOB'] = df['DOB'].apply(lambda x: datetime.strptime(x, '%d-%m-%y'))
                # move only the dates that lie in the future back by one century
                df['DOB'] = df['DOB'].apply(lambda x: x if x < datetime.now() else x.replace(year=x.year - 100))
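For context, the future dates come from how `%y` is interpreted: Python's `strptime` follows the POSIX convention in which 69-99 become 1969-1999 and 00-68 become 2000-2068. A quick check with plain `datetime` (the sample strings are made up for illustration):

```python
from datetime import datetime

samples = ['01-06-68', '01-06-69']   # two-digit years on either side of the pivot

# Parse each string with %y and keep only the resulting four-digit year
years = [datetime.strptime(text, '%d-%m-%y').year for text in samples]

for text, year in zip(samples, years):
    print(text, '->', year)
```

This is why a date of birth such as 01-06-68 needs to be moved back a century, while 01-06-69 would already be parsed as intended.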















                    answered 2 days ago









                    Itamar Mushkin














                        In general (in case of uncertainty), it would be better to explicitly specify the year:



                        pd.to_datetime(data['Date.of.Birth'].apply(lambda x: '-'.join(x.split('-')[:-1] + ['19' + x.split('-')[2]])), format='%d-%m-%Y')


                        I ran this with the following data frame:



                         0 1
                        0 0 01-01-84
                        1 1 31-07-85
                        2 2 24-08-85
                        3 3 30-12-93
                        4 4 09-12-77
                        5 5 08-09-90
                        6 6 01-06-88
                        7 7 04-10-89
                        8 8 15-11-91
                        9 9 01-06-68


                        pd.to_datetime(data[1].apply(lambda x: '-'.join(x.split('-')[:-1] + ['19' + x.split('-')[2]])), format='%d-%m-%Y')


                        0 1984-01-01
                        1 1985-07-31
                        2 1985-08-24
                        3 1993-12-30
                        4 1977-12-09
                        5 1990-09-08
                        6 1988-06-01
                        7 1989-10-04
                        8 1991-11-15
                        9 1968-06-01
                        Name: 1, dtype: datetime64[ns]
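The same prefixing can also be done without `apply`, using vectorised string slicing. This sketch assumes every value is a `DD-MM-YY` string from the 1900s; the `data` frame below is a made-up three-row sample.

```python
import pandas as pd

data = pd.DataFrame({'Date.of.Birth': ['01-01-84', '09-12-77', '01-06-68']})

s = data['Date.of.Birth']
# Insert '19' before the last two characters: '09-12-77' -> '09-12-1977'
full = s.str[:-2] + '19' + s.str[-2:]

# An explicit format keeps day and month from being swapped
dob = pd.to_datetime(full, format='%d-%m-%Y')
print(dob)
```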















                            answered 2 days ago









                            bubble














                                You can use the code below if the years start only with 19 or 20:



                                df['DOB'] = pd.to_datetime(df['DOB'].str.replace('20([^20]*)$', '19'))


                                And if there are no 20s anywhere else:



                                df['DOB'] = pd.to_datetime(df['DOB'].str.replace('20', '19'))


                                The result of:



                                print(df['DOB'])


                                is:



                                0 1984-01-01
                                1 1985-07-31
                                2 1985-08-24
                                3 1993-12-30
                                4 1977-09-12
                                5 1990-08-09
                                6 1988-01-06
                                7 1989-04-10
                                8 1991-11-15
                                9 1968-01-06
                                dtype: datetime64[ns]















                                    answered 2 days ago









                                    U9-Forward


























