Handling Timestamps in pandas

pandas.to_datetime()

pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)

Parameters

  • arg int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like

    The object to convert to a datetime

  • format str, default None

    The strftime format string used to parse the time, e.g. “%d/%m/%Y”. Note that “%f” will parse all the way up to nanoseconds.
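Here is a short sketch of the format parameter. The day-first date strings are made-up sample values. With an explicit format, day and month are read in the order you specify. Without one, a string like "04/02/2015" gets the default month-first reading (dayfirst=False). The last call shows %f taking nine fractional digits.

```python
import pandas as pd

# Hypothetical day-first strings: "04/02/2015" means 4 February 2015
dates = pd.Series(["04/02/2015", "05/03/2016"])

# Explicit format: day first, then month
parsed = pd.to_datetime(dates, format="%d/%m/%Y")
print(parsed)

# No format: the default reading is month-first, so this becomes 2 April 2015
guessed = pd.to_datetime("04/02/2015")
print(guessed)

# %f accepts up to nine fractional digits, i.e. nanosecond precision
ts = pd.to_datetime("04/02/2015 10:30:00.123456789", format="%d/%m/%Y %H:%M:%S.%f")
print(ts.microsecond, ts.nanosecond)  # 123456 789
```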


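The function accepts a Series as well as a DataFrame. The format argument is optional. If you leave it out, pandas tries to infer the layout of the strings itself, so pass format only when you know the exact layout or when the inferred one is wrong.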

>>> import pandas as pd
>>> df = pd.DataFrame({'year': [2015, 2016],
...                    'month': [2, 3],
...                    'day': [4, 5]})
>>> pd.to_datetime(df)
0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]

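When a DataFrame is assembled into datetimes like this, each column name must be one of the following (or its plural); any other name is rejected: [‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘ms’, ‘us’, ‘ns’]. The year, month and day columns are required, and the rest are optional.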
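A minimal sketch of these naming rules, using illustrative values. Plural column names and the optional hour and minute columns are accepted. Leaving out the required day column raises a ValueError.

```python
import pandas as pd

# Plural column names plus optional hour/minute columns (illustrative values)
df = pd.DataFrame({"years": [2015, 2016],
                   "months": [2, 3],
                   "days": [4, 5],
                   "hours": [6, 18],
                   "minutes": [30, 0]})
result = pd.to_datetime(df)
print(result)  # 2015-02-04 06:30:00 and 2016-03-05 18:00:00

# Dropping the required day column makes the assembly fail
try:
    pd.to_datetime(df[["years", "months"]])
    message = None
except ValueError as err:
    message = str(err)
print(message)
```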

pandas.to_timedelta()

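A timedelta is the difference between two points in time. to_timedelta() converts its argument (a string, a number, or a list-like/Series of them) into Timedelta values, which can be expressed in units such as days, hours, minutes and seconds. Subtracting one Timestamp from another yields the same kind of Timedelta.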
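To illustrate the link between timestamps and timedeltas, here is a small sketch with two arbitrary example Timestamps. It shows that their difference is a Timedelta, how to read it in different units, and that to_timedelta() produces an equal value from a string.

```python
import pandas as pd

# Two illustrative points in time
start = pd.Timestamp("2021-02-27 08:30")
end = pd.Timestamp("2021-03-01 12:00")

delta = end - start  # Timestamp - Timestamp gives a Timedelta
print(delta)                          # 2 days 03:30:00
print(delta.days)                     # 2 (whole days only)
print(delta.total_seconds())          # 185400.0
print(delta / pd.Timedelta(hours=1))  # 51.5 (hours, as a float)

# to_timedelta builds the same value from a string
print(pd.to_timedelta("2 days 03:30:00") == delta)  # True
```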

Parameters

  • arg str, timedelta, list-like or Series

  • unit str, optional

    Denotes the unit of the arg for numeric arg. Defaults to "ns".

    Possible values:

    • ‘W’
    • ‘D’ / ‘days’ / ‘day’
    • ‘hours’ / ‘hour’ / ‘hr’ / ‘h’
    • ‘m’ / ‘minute’ / ‘min’ / ‘minutes’ / ‘T’
    • ‘S’ / ‘seconds’ / ‘sec’ / ‘second’
    • ‘ms’ / ‘milliseconds’ / ‘millisecond’ / ‘milli’ / ‘millis’ / ‘L’
    • ‘us’ / ‘microseconds’ / ‘microsecond’ / ‘micro’ / ‘micros’ / ‘U’
    • ‘ns’ / ‘nanoseconds’ / ‘nano’ / ‘nanos’ / ‘nanosecond’ / ‘N’

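If arg is a string, you must not also pass unit. The string already carries its own units, and combining the two raises a ValueError.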
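The contrast can be sketched as follows, with 90 minutes as an arbitrary example value. A bare number needs unit to be meaningful, while a string combined with unit is rejected.

```python
import pandas as pd

# A number has no unit of its own, so unit= tells pandas how to read it
td = pd.to_timedelta(90, unit="m")
print(td)  # 0 days 01:30:00

# A string already carries its units, so combining it with unit= is rejected
try:
    pd.to_timedelta("90 minutes", unit="m")
    message = None
except ValueError as err:
    message = str(err)
print(message)
```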

>>> pd.to_timedelta('15days 2hours')
Timedelta('15 days 02:00:00')

>>> pd.to_timedelta('1 days 06:05:01.00003')
Timedelta('1 days 06:05:01.000030')

>>> pd.to_timedelta(4, unit='days')
Timedelta('4 days 00:00:00')

Series.dt

>>> seconds_series = pd.Series(pd.date_range("2000-01-01", periods=3, freq="s"))
>>> seconds_series
0 2000-01-01 00:00:00
1 2000-01-01 00:00:01
2 2000-01-01 00:00:02
dtype: datetime64[ns]
>>> seconds_series.dt.second
0 0
1 1
2 2
dtype: int64

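dt is an accessor on Series rather than a method, so it is used without parentheses. It only works when the Series holds datetime-like values (for example dtype datetime64[ns]). On a Series of plain strings it raises an AttributeError, so convert the strings with pd.to_datetime() first.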
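A minimal sketch of that requirement, using hypothetical string data. The raw strings reject .dt until they are converted. The last lines show that .dt also works on a timedelta Series, such as the gaps between consecutive rows.

```python
import pandas as pd

# Hypothetical raw data: the timestamps are still strings (object dtype)
raw = pd.Series(["2000-01-01 00:00:00", "2000-01-01 00:00:01"])

try:
    raw.dt.second
    error = None
except AttributeError as err:
    error = str(err)  # explains that .dt needs datetime-like values
print(error)

# After conversion the Series has dtype datetime64[ns] and .dt works
converted = pd.to_datetime(raw)
print(converted.dt.second.tolist())  # [0, 1]

# .dt also works on timedelta Series; diff() leaves NaT in the first row
gaps = converted.diff()
print(gaps.dt.total_seconds().tolist())  # [nan, 1.0]
```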

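Through dt you can read individual components of each time, such as the year. Each attribute returns a new Series with one value per element: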

# values shown for the first element, 2000-01-01 (a Saturday)
seconds_series.dt.date        # datetime.date(2000, 1, 1)
seconds_series.dt.hour        # 0
seconds_series.dt.quarter     # quarter of the year: 1
seconds_series.dt.time        # datetime.time(0, 0)
seconds_series.dt.year        # 2000
seconds_series.dt.month       # 1
seconds_series.dt.day         # 1
seconds_series.dt.weekday     # 0-6, where 0 is Monday and 6 is Sunday: 5
seconds_series.dt.day_name()  # name of the weekday: 'Saturday'
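Because each accessor returns an ordinary Series aligned with the original, the components can be collected side by side. The sketch below uses three consecutive days as example data to show how weekday numbers line up with day_name().

```python
import pandas as pd

# Three consecutive days starting on Saturday, 2000-01-01
days = pd.Series(pd.date_range("2000-01-01", periods=3, freq="D"))

# Each accessor result is an ordinary Series aligned with `days`
parts = pd.DataFrame({
    "date": days,
    "weekday": days.dt.weekday,      # 0 = Monday ... 6 = Sunday
    "day_name": days.dt.day_name(),  # a method, so it needs parentheses
    "quarter": days.dt.quarter,
})
print(parts)
```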

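Sometimes you need the week of the year that a date falls in. weekday cannot tell you that, so use isocalendar() instead. It returns a DataFrame with the ISO year, week and day: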

seconds_series.dt.isocalendar()['week']  # 52: 2000-01-01 belongs to ISO week 52 of 1999
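Near New Year the ISO year can differ from the calendar year, so an ISO week number should be paired with the ISO year rather than with dt.year. The sketch below uses five example days spanning 1999/2000 to show this. dt.isocalendar() needs pandas 1.1 or later; the older dt.week / dt.weekofyear attributes were deprecated in 1.1 and removed in 2.0.

```python
import pandas as pd

# Five days spanning the 1999/2000 year boundary
days = pd.Series(pd.date_range("1999-12-31", periods=5, freq="D"))

# isocalendar() returns a DataFrame with columns year, week and day (1 = Monday)
iso = days.dt.isocalendar()
print(pd.concat([days.rename("date"), iso], axis=1))
```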