plugin for nlp-compromise
`npm install compromise-dates`
This library is an earnest attempt to get date information out of text in a clear way, including all informal text formats and folksy shorthands.
```js
import nlp from 'compromise'
import datePlugin from 'compromise-dates'
nlp.plugin(datePlugin)

let doc = nlp('the second monday of february')
doc.dates().get()[0]
/*
{ start: '2021-02-08T00:00:00.000Z', end: '2021-02-08T23:59:59.999Z'}
*/
```
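To make the reckoning behind a phrase like _'the second monday of february'_ concrete, here is a minimal sketch in plain javascript. `nthWeekdayOfMonth` is a hypothetical helper written for illustration, not part of compromise-dates, and it assumes UTC and an explicit year (the library takes the year from its reference day).

```js
// Hypothetical helper, not the library's implementation.
// weekday: 0 = Sunday ... 6 = Saturday (same as Date.prototype.getUTCDay)
function nthWeekdayOfMonth(year, monthIndex, weekday, n) {
  const first = new Date(Date.UTC(year, monthIndex, 1))
  // days from the 1st of the month to the first matching weekday
  const offset = (weekday - first.getUTCDay() + 7) % 7
  const day = 1 + offset + (n - 1) * 7
  const start = new Date(Date.UTC(year, monthIndex, day))
  // the range ends on the last millisecond of the same day
  const end = new Date(Date.UTC(year, monthIndex, day, 23, 59, 59, 999))
  return { start: start.toISOString(), end: end.toISOString() }
}

// second monday of February 2021
console.log(nthWeekdayOfMonth(2021, 1, 1, 2))
// { start: '2021-02-08T00:00:00.000Z', end: '2021-02-08T23:59:59.999Z' }
```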



- Tokenization and disambiguation with compromise.



| explicit-dates | _description_ | Start | End |
| ----------------------------------- | :-----------------------------------: | ---------------: | ---------------: |
| _march 2nd_ | | March 2, 12:00am | March 2, 11:59pm |
| _2 march_ | | '' | '' |
| _tues march 2_ | | '' | '' |
| _march the second_ | _natural-language number_ | '' | '' |
| _on the 2nd_ | _implicit months_ | '' | '' |
| _tuesday the 2nd_ | _date-reckoning_ | '' | '' |
| **numeric-dates:** | | | |
| _2020/03/02_ | _iso formats_ | '' | '' |
| _2020-03-02_ | | '' | '' |
| _03-02-2020_ | _british formats_ | '' | '' |
| _03/02_ | | '' | '' |
| _2020.08.13_ | _alt-ISO_ | '' | '' |
| **named-dates:** | | | |
| _today_ | | - | - |
| _tomorrow_ | | '' | '' |
| _christmas eve_ | _calendar-holidays_ | Dec 24, 12:00am | Dec 24, 11:59pm |
| _easter_ | _astronomical holidays_ | -depends- | - |
| _q1_ | | Jan 1, 12:00am | Mar 31, 11:59pm |
| **times:** | | | |
| _2pm_ | | '' | '' |
| _2:12pm_ | | '' | '' |
| _2:12_ | | '' | '' |
| _02:12:00_ | _weird iso-times_ | '' | '' |
| _two oclock_ | _written formats_ | '' | '' |
| _before 1_ | | '' | '' |
| _noon_ | | '' | '' |
| _at night_ | _informal daytimes_ | '' | '' |
| _in the morning_ | | '' | '' |
| _tomorrow evening_ | | '' | '' |
| **timezones:** | | | |
| _eastern time_ | _informal zone support_ | '' | '' |
| _est_ | _TZ shorthands_ | '' | '' |
| _peru time_ | | '' | '' |
| _..in beirut_ | _by location_ | '' | '' |
| _GMT+9_ | _by UTC/GMT offset_ | '' | '' |
| _-4h_ | '' | '' | '' |
| _Canada/Eastern_ | _IANA codes_ | '' | '' |
| **relative durations:** | | | |
| _this march_ | | '' | '' |
| _this week_ | | '' | '' |
| _this sunday_ | | '' | '' |
| _next april_ | | '' | '' |
| _this past year_ | | '' | '' |
| _second week of march_ | | '' | '' |
| _last weekend of march_ | | '' | '' |
| _last spring_ | | '' | '' |
| _the saturday after next_ | | '' | '' |
| **punted dates:** | | | |
| _in seven weeks_ | _now+duration_ | '' | '' |
| _two days after june 6th_ | _date+duration_ | '' | '' |
| _2 weeks from now_ | | '' | '' |
| _2 weeks after june_ | | '' | '' |
| _2 years, 4 months, and 5 days ago_ | _complex durations_ | '' | '' |
| _a week and a half before_ | _written-out numbers_ | '' | '' |
| _a week friday_ | _idiom format_ | '' | '' |
| **start/end:** | | | |
| _end of the week_ | _up-against the ending_ | '' | '' |
| _start of next year_ | _lean-toward starting_ | '' | '' |
| _middle of q2 last year_ | _rough-center calculation_ | '' | '' |
| **date-ranges:** | | | |
| _between june and july_ | _explicit ranges_ | '' | '' |
| _from today to next halloween_ | | '' | '' |
| _aug 1 - aug 31_ | _dash-ranges_ | '' | '' |
| _22-23 February_ | | '' | '' |
| _today to next friday_ | | '' | '' |
| _during june_ | | '' | '' |
| _aug to june 1999_ | _shared range info_ | '' | '' |
| _before [2019]_ | _up-to a date_ | '' | '' |
| _by march_ | | '' | '' |
| _after february_ | _date-to-infinity_ | '' | '' |
| **repeating-intervals:** | | | |
| _any wednesday_ | _n-repeating dates_ | | |
| _any day in June_ | _repeating-date in range_ | June 1 ... | .. June 30 |
| _any wednesday this week_ | | '' | '' |
| _weekends in July_ | _more-complex interval_ | '' | '' |
| _every weekday until February_ | _interval until date_ | '' | '' |

| _hmmm,_ | _description_ | Start | End |
| ------------------------ | :--------------------------------------------: | :-----: | :---: |
| _middle of 2019/June_ | tries to find the sorta-center | June 15 | '' |
| _good friday 2025_ | tries to reckon astronomically-set holidays | '' | '' |
| _Oct 22 1975 2am in PST_ | historical DST changes (assumes current dates) | '' | '' |

| _😓,_ | _description_ | Start | End |
| ------------------------------------------- | :----------------------: | :-----: | :---: |
| _not this Saturday, but the Saturday after_ | self-reference logic | '' | '' |
| _3 years ago tomorrow_ | folksy short-hand | '' | '' |
| _2100_ | military time formats | '' | '' |
| _may 97_ | 'bare' 2-digit years | '' | '' |


- .dates() - find dates like `June 8th` or `03/03/18`
- .dates().get() - simple start/end json result
- .dates().json() - overloaded output with date metadata
- .dates().format('') - convert the dates to specific formats
- .dates().isBefore(iso) - return only dates occurring before given date
- .dates().isAfter(iso) - return only dates occurring after given date
- .dates().isSame(unit, iso) - return only dates within a given year, month, date
- .durations() - find durations like `2 weeks` or `5mins`
- .durations().get() - return simple json for duration
- .durations().json() - overloaded output with duration metadata
- .times() - find times like `4:30pm` or `half past five`
- .times().get() - return simple json for times
- .times().json() - overloaded output with time metadata

`.dates()` accepts an optional object that lets you set the context for the date parsing.
```js
const context = {
  timezone: 'Canada/Eastern', // the default timezone is 'ETC/UTC'
  today: '2020-02-20', // the implicit, or reference day/year
  punt: { weeks: 2 }, // the implied duration to use for 'after june 2nd'
  dayStart: '8:00am',
  dayEnd: '5:30pm',
  dmy: false // set to true to assume british-format dates, when unclear
}
nlp('in two days').dates(context).get()
/*
[{ start: '2020-02-22T08:00:00.000-05:00', end: '2020-02-22T17:30:00.000-05:00' }]
*/
```

By default, weeks start on a Monday, and _'next week'_ will run from Monday morning to Sunday night.
This can be configured in spacetime, but right now we are not passing this config through.
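As an illustration of the Monday-start convention, here is a small sketch in plain javascript. `weekRange` is a hypothetical helper (not the library's or spacetime's API) and it assumes UTC.

```js
// Hypothetical helper: the Monday-to-Sunday week containing `isoDay`,
// shifted forward or backward by `weeks`.
function weekRange(isoDay, weeks = 0) {
  const d = new Date(isoDay + 'T00:00:00.000Z')
  // getUTCDay() is 0 for Sunday; shift so Monday = 0 ... Sunday = 6
  const sinceMonday = (d.getUTCDay() + 6) % 7
  const start = new Date(d)
  start.setUTCDate(d.getUTCDate() - sinceMonday + weeks * 7)
  const end = new Date(start)
  end.setUTCDate(start.getUTCDate() + 6)
  end.setUTCHours(23, 59, 59, 999)
  return { start: start.toISOString(), end: end.toISOString() }
}

// 'next week', said on Thursday 2020-02-20
console.log(weekRange('2020-02-20', 1))
// { start: '2020-02-24T00:00:00.000Z', end: '2020-03-01T23:59:59.999Z' }
```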
_'after October'_ returns a range starting Nov 1st, and ending 2-weeks after, by default.
This can be configured by setting the `punt` param in the context object:
```js
doc.dates({ punt: { month: 1 } })
```
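A rough sketch of what a 'punt' means for an open-ended phrase like _'after October'_. `afterMonth` is a hypothetical helper, not the library's code; it assumes UTC, accepts only the two `punt` shapes shown in this document (`weeks` and `month`), and returns plain calendar days rather than full timestamps.

```js
// Hypothetical helper: the range implied by 'after <month>'.
// monthIndex: 0 = January ... 11 = December
function afterMonth(year, monthIndex, punt = { weeks: 2 }) {
  // the range starts on the 1st of the following month
  const start = new Date(Date.UTC(year, monthIndex + 1, 1))
  // the end is 'punted' forward by the implied duration
  const end = new Date(start)
  end.setUTCMonth(end.getUTCMonth() + (punt.month || 0))
  end.setUTCDate(end.getUTCDate() + (punt.weeks || 0) * 7)
  const day = (d) => d.toISOString().slice(0, 10)
  return { start: day(start), end: day(end) }
}

console.log(afterMonth(2020, 9)) // { start: '2020-11-01', end: '2020-11-15' }
console.log(afterMonth(2020, 9, { month: 1 })) // { start: '2020-11-01', end: '2020-12-01' }
```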
_'May 7th'_ will prefer a May 7th in the future.
The parser will return a past-date though, in the current-month:
```js
// from march 2nd
nlp('feb 30th').dates({ today: '2021-02-01' }).get()
```
Named weeks or months, e.g. _'this/next/last week'_, are mostly straight-forward.
#### _This monday_
A bare 'monday' will always refer to itself, or the upcoming monday.
- Saying _'this monday'_ on monday is itself.
- Saying _'this monday'_ on tuesday is next week.
Likewise, _'this june'_ in June, is itself. _'this june'_ in any other month, is the nearest June in the future.
Future versions of this library could look at sentence-tense to help disambiguate these dates - _'i paid on monday'_ vs _'i will pay on monday'_.
#### _Last monday_
If it's Tuesday, _'last monday'_ will not mean yesterday.
- Saying _'last monday'_ on a tuesday will be -1 week.
- Saying _'a week ago monday'_ will also work.
- Saying _'this past monday'_ will return yesterday.
For reference, Wit.ai & chronic libraries both return yesterday. Natty and SugarJs return -1 week, like we do.
_'last X'_ can be less than 7 days backward, if it crosses a week starting-point:
- Saying _'last friday'_ on a monday will be only a few days back.
#### _Next Friday_
If it's Tuesday, _'next wednesday'_ will not be tomorrow. It will be a week after tomorrow.
- Saying _'next wednesday'_ on a tuesday, will be +1 week.
- Saying _'a week wednesday'_ will also be +1 week.
- Saying _'this coming wednesday'_ will be tomorrow.
For reference, Wit.ai, chronic, and Natty libraries all return tomorrow. SugarJs returns +1 week, like we do.
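The _this/last/next_ rules above can be sketched in plain javascript. This is one reading of those rules, not the library's implementation: `resolveWeekday` and its helpers are hypothetical names, weeks are assumed to start on Monday, and all dates are UTC. Here _'last X'_ is taken as X in the previous week, and _'next X'_ as X in the following week.

```js
const DAY_MS = 24 * 60 * 60 * 1000

// Monday of the Monday-start week containing `date`
function mondayOf(date) {
  const sinceMonday = (date.getUTCDay() + 6) % 7
  return new Date(date.getTime() - sinceMonday * DAY_MS)
}

// the given weekday (0 = Sunday ... 6 = Saturday) within the week starting at `monday`
function weekdayInWeek(monday, weekday) {
  return new Date(monday.getTime() + ((weekday + 6) % 7) * DAY_MS)
}

function resolveWeekday(modifier, weekday, todayIso) {
  const today = new Date(todayIso + 'T00:00:00.000Z')
  const monday = mondayOf(today)
  let result
  if (modifier === 'last') {
    // the weekday in the previous week
    result = weekdayInWeek(new Date(monday.getTime() - 7 * DAY_MS), weekday)
  } else if (modifier === 'next') {
    // the weekday in the following week
    result = weekdayInWeek(new Date(monday.getTime() + 7 * DAY_MS), weekday)
  } else {
    // 'this': today itself, or the upcoming one
    const ahead = (weekday - today.getUTCDay() + 7) % 7
    result = new Date(today.getTime() + ahead * DAY_MS)
  }
  return result.toISOString().slice(0, 10)
}

// 2021-03-02 is a Tuesday
console.log(resolveWeekday('this', 1, '2021-03-02')) // '2021-03-08'
console.log(resolveWeekday('last', 1, '2021-03-02')) // '2021-02-22'
console.log(resolveWeekday('next', 3, '2021-03-02')) // '2021-03-10'
// 'last friday' on Monday 2021-03-01 is only three days back
console.log(resolveWeekday('last', 5, '2021-03-01')) // '2021-02-26'
```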
The first week of a month, or a year, is the first week _with a thursday in it_. This is a weird, but widely-held standard. I believe it's a military formalism. It cannot be (easily) configured. This means that the start-date for _first week of January_ may be a Monday in December, etc.
As expected, _first monday of January_ will always be in January.
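The 'first week with a thursday in it' rule (the same rule ISO 8601 uses for the first week of a year) can be sketched like this. `firstWeekStart` is a hypothetical helper for illustration, assuming Monday-start weeks and UTC.

```js
// Hypothetical helper: the Monday that starts the first week of a month.
// monthIndex: 0 = January ... 11 = December
function firstWeekStart(year, monthIndex) {
  const first = new Date(Date.UTC(year, monthIndex, 1))
  // days from the 1st to the first Thursday (getUTCDay() === 4)
  const toThursday = (4 - first.getUTCDay() + 7) % 7
  // the Monday of that week is three days before the Thursday,
  // which may fall in the previous month
  const monday = new Date(Date.UTC(year, monthIndex, 1 + toThursday - 3))
  return monday.toISOString().slice(0, 10)
}

console.log(firstWeekStart(2020, 0)) // '2019-12-30', a Monday in December
console.log(firstWeekStart(2021, 0)) // '2021-01-04'
```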
By default, we use the same interpretation of dates as javascript does: we assume 01/02/2020 is Jan 2nd (US-version), but allow 13/01/2020 to be Jan 13th (UK-version).
If you want to coerce an interpretation of 02/03/1999, you can set it with the `dmy: true` option:
```js
nlp('02/03/1999').dates().get() // February 3
nlp('02/03/1999').dates({ dmy: true }).get() // March 2
```
ISO dates (like 1999-03-02) are unaffected by the change.
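A minimal sketch of that disambiguation for slash-separated dates: month-first unless `dmy` is set, or unless the first number cannot be a month. `parseSlashDate` is a hypothetical function, not the library's parser, and it only handles the full `a/b/yyyy` shape.

```js
// Hypothetical helper: resolve 'a/b/yyyy' to an ISO calendar day.
function parseSlashDate(text, { dmy = false } = {}) {
  const [a, b, year] = text.split('/').map(Number)
  // day-first when asked for, or when the first number cannot be a month
  const dayFirst = dmy || a > 12
  const month = dayFirst ? b : a
  const day = dayFirst ? a : b
  return new Date(Date.UTC(year, month - 1, day)).toISOString().slice(0, 10)
}

console.log(parseSlashDate('01/02/2020')) // '2020-01-02'
console.log(parseSlashDate('13/01/2020')) // '2020-01-13'
console.log(parseSlashDate('02/03/1999')) // '1999-02-03'
console.log(parseSlashDate('02/03/1999', { dmy: true })) // '1999-03-02'
```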
By default, _'this summer'_ will return June 1 - Sept 1, which is northern hemisphere ISO.
Configuring the default hemisphere should be possible in the future.
There are some hardcoded times for _'lunch time'_ and others, but mainly, a day begins at 12:00am and ends at 11:59pm - the last millisecond of the day.
compromise will tag anything that looks like a date, but not validate the dates until they are parsed.
- _'january 34th 2020'_ will return Jan 31 2020.
- _'tomorrow at 2:62pm'_ will just return 'tomorrow'.
- _'6th week of february'_ will return the 2nd week of march.
- Setting an hour that's skipped, or repeated by a DST change will return the closest valid time to the DST change.
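The first case in the list above (an out-of-range day snapping to the end of the month) can be sketched as a simple clamp. `clampDay` is a hypothetical helper, not the library's validation code, and it assumes UTC.

```js
// Hypothetical helper: snap an out-of-range day to the nearest valid day of the month.
// monthIndex: 0 = January ... 11 = December
function clampDay(year, monthIndex, day) {
  // day 0 of the next month is the last day of this one
  const lastDay = new Date(Date.UTC(year, monthIndex + 1, 0)).getUTCDate()
  const safeDay = Math.min(Math.max(day, 1), lastDay)
  return new Date(Date.UTC(year, monthIndex, safeDay)).toISOString().slice(0, 10)
}

console.log(clampDay(2020, 0, 34)) // '2020-01-31'
```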
_'between january and march'_ will include all of march. This phrasing is usually pretty ambiguous.
This library makes no assumptions about the input text, and is careful to avoid false-positive dates.
If you know your text is a date, you can crank-up the date-tagger with a compromise-plugin, like so:
```js
nlp.extend(function (Doc, world) {
  // ambiguous words
  world.addWords({
    weds: 'WeekDay',
    wed: 'WeekDay',
    sat: 'WeekDay',
    sun: 'WeekDay',
  })
  world.postProcess(doc => {
    // tag '2nd quarter' as a date
    doc.match('#Ordinal quarter').tag('#Date')
    // tag '2/2' as a date (not a fraction)
    doc.match('/[0-9]{1,2}/[0-9]{1,2}/').tag('#Date')
  })
})
```
- _'thursday the 16th'_ - will set to the 16th, even if it's not thursday
- _'in a few hours/years'_ - in 2 hours/years
- _'jan 5th 2008 to Jan 6th the following year'_ - date-range explicit references
- assume _'half past 5'_ is 5pm



1 - Regular-expressions are too-brittle to parse dates.
2 - Neural-nets are too-wonky to parse dates.
3 - A corporation, or startup is the wrong place to build a universal date-parser.
Parsing _dates_, _times_, _durations_, and _intervals_ from natural language can be a solved-problem.
A rule-based, community open-source library - _one based on simple NLP_ - is the best way to build a natural language date parser - commercial, or otherwise - for the frontend, or the backend.
The _match-syntax_ is effective and easy, _javascript_ is prevailing, and the more people who contribute, the better.

- Duckling - by wit.ai (facebook)
- Sugarjs/dates - by Andrew Plummer (js)
- Chronic - by Tom Preston-Werner (Ruby)
- SUTime - by Angel Chang, Christopher Manning (Java)
- Natty - by Joe Stelmach (Java)
- rrule - repeating date-interval handler (js)
- ParseDateTime - by Mike Taylor (Python)

compromise-date is sponsored by 
MIT licenced