Derive Date Spans from Start and End Dates in SQL Server table
04-03-2021
Question
I am using SQL Server 2016.
I have a table that contains 1 row per month that a patient is assigned to a particular Provider.
A patient can be assigned to multiple providers during the year.
How can I derive date spans (StartDate and EndDate) that represent the time a patient was assigned to each provider?
My table looks like this:
+----------+---------------+------------+-----------+
| Provider | Patient | StartDate | EndDate |
+----------+---------------+------------+-----------+
| 1922157 | 12345 | 20191201 | 20191231 |
| 1904176 | 12345 | 20191101 | 20191201 |
| 1904176 | 12345 | 20191001 | 20191101 |
| 1904176 | 12345 | 20190901 | 20191001 |
| 1904176 | 12345 | 20190801 | 20190901 |
| 1904176 | 12345 | 20190701 | 20190801 |
| 1904176 | 12345 | 20190601 | 20190701 |
| 1904176 | 12345 | 20190501 | 20190601 |
| 1904176 | 12345 | 20190401 | 20190501 |
| 1904176 | 12345 | 20190301 | 20190401 |
| 1904176 | 12345 | 20190201 | 20190301 |
| 1922157 | 12345 | 20190101 | 20190201 |
| 1922157 | 56789 | 20190101 | 20190201 |
+----------+---------------+------------+-----------+
In this case, patient 12345 was assigned to 2 different providers: one for 2 months (January and then December) and the other for the rest of the year (10 months, February through November). Patient 56789 was only assigned to 1 provider (1922157) for 1 month (in December).
I'm trying to make my output look like the table below, but I am running into issues, I think because the patient is assigned to the same PCP during 2 different times of the year. I tried using the LAG function, but I only get correct results for some cases, not all, and not for this particular case.
+----------+---------------+------------+-----------+
| Provider | Patient | StartDate | EndDate |
+----------+---------------+------------+-----------+
| 1922157 | 12345 | 20190101 | 20190201 |
| 1904176 | 12345 | 20190201 | 20191201 |
| 1922157 | 12345 | 20191201 | 20191231 |
| 1922157 | 56789 | 20191201 | 20191231 |
+----------+---------------+------------+-----------+
Update: I did some more research and came across the following post:
https://stackoverflow.com/questions/35900765/ms-sql-combine-date-rows-into-start-end-date
I fit my table into the code from the answer to the question above and tested a few of my cases, and it looks like it might get the job done. Unfortunately, my base table has 140k rows of dates to calculate through, so I am not sure how long it will take to run. It has been running for 6 minutes now; I will post back with results.
Solution
I think I understand what you're trying to do. You're trying to get the start date and end date of a patient at a provider, as long as there is no gap between the start and end dates of the periods. I've created a test table with the data you sampled.
Create table test (Provider int, Patient int, startdate date, enddate date)
insert into test (Provider, Patient, StartDate, EndDate)
SELECT * FROM
(SELECT 1922157 as Provider , 12345 as Patient , '2019-12-01' as StartDate , '2019-12-31' as EndDate
union all SELECT 1904176 , 12345 , '2019-11-01' , '2019-12-01'
union all SELECT 1904176 , 12345 , '2019-10-01' , '2019-11-01'
union all SELECT 1904176 , 12345 , '2019-09-01' , '2019-10-01'
union all SELECT 1904176 , 12345 , '2019-08-01' , '2019-09-01'
union all SELECT 1904176 , 12345 , '2019-07-01' , '2019-08-01'
union all SELECT 1904176 , 12345 , '2019-06-01' , '2019-07-01'
union all SELECT 1904176 , 12345 , '2019-05-01' , '2019-06-01'
union all SELECT 1904176 , 12345 , '2019-04-01' , '2019-05-01'
union all SELECT 1904176 , 12345 , '2019-03-01' , '2019-04-01'
union all SELECT 1904176 , 12345 , '2019-02-01' , '2019-03-01'
union all SELECT 1922157 , 12345 , '2019-01-01' , '2019-02-01'
union all SELECT 1922157 , 56789 , '2019-01-01' , '2019-02-01' )t
The idea is to order the data and match each row to the row of the same provider and patient whose start date equals its end date, in order to detect a hole in the dates. I number the rows with the "ROW_NUMBER" function. I then find all the rows that match and take the first StartDate and max EndDate for those that match, and then I add all the rows that are "alone" and have no match.
I think it works with the data you provided. I didn't get to test it with other data. Recursion is another option for finding the min/max dates of different values, but I didn't use it in this case. (Feel free to give better names; I went a little fast.)
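;WITH RowsWithNum AS
(
-- Number every row so each result can be traced back to a source row.
SELECT Provider, Patient, StartDate, EndDate, ROW_NUMBER() OVER (ORDER BY Provider, Patient, StartDate) AS RowNum
FROM test
)
,BeforeAndAfterDates AS
(
-- Pair each row (a) with the row (b) of the same provider and patient that starts exactly when a ends.
-- DateDiffInDays is 0 when such a continuation exists and NULL when it does not.
SELECT a.Provider, a.Patient, a.StartDate, a.RowNum, a.EndDate, b.StartDate AS EndStartDate, DATEDIFF(DAY, a.EndDate, b.StartDate) AS DateDiffInDays, b.EndDate AS EndEndDate, b.RowNum AS EndRowNum
FROM RowsWithNum a
LEFT JOIN RowsWithNum b ON b.Provider=a.Provider AND b.Patient=a.Patient AND b.StartDate=a.EndDate
)
-- Chained rows: collapse them into one span per provider and patient.
-- Note: this grouping merges every chain of the same provider/patient pair into a single span.
SELECT Provider, Patient, MIN(StartDate) AS StartDate, MAX(EndEndDate) AS EndDate, MIN(RowNum) AS RowNum
FROM BeforeAndAfterDates
WHERE DateDiffInDays=0
GROUP BY Provider, Patient
UNION
-- Stand-alone rows: no continuation, and not the last link of a chain.
SELECT a.Provider, a.Patient, a.StartDate, a.EndDate, a.RowNum
FROM BeforeAndAfterDates a
LEFT JOIN BeforeAndAfterDates b ON b.Provider=a.Provider AND b.Patient=a.Patient AND b.EndEndDate=a.EndDate
WHERE a.DateDiffInDays IS NULL AND b.RowNum IS NULL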
And here is my result.
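Since the question mentions trying LAG, here is a minimal sketch of the usual "gaps and islands" pattern as an alternative to the self-join. It is not the query from the answer above. It assumes the same test table, and it assumes two rows are contiguous when the previous EndDate equals the next StartDate, as in the sample data. The names Flagged, Numbered, IsNewSpan and SpanNo are made up for illustration.

```sql
-- Sketch only: assumes the table test(Provider, Patient, StartDate, EndDate)
-- created earlier, and that "contiguous" means previous EndDate = StartDate.
WITH Flagged AS
(
    SELECT Provider, Patient, StartDate, EndDate,
           -- 0 when this row continues the previous row of the same
           -- patient/provider, 1 when it opens a new span (first row or gap).
           CASE WHEN LAG(EndDate) OVER (PARTITION BY Patient, Provider
                                        ORDER BY StartDate) = StartDate
                THEN 0 ELSE 1 END AS IsNewSpan
    FROM test
),
Numbered AS
(
    SELECT Provider, Patient, StartDate, EndDate,
           -- Running total of the flags gives each span its own number.
           SUM(IsNewSpan) OVER (PARTITION BY Patient, Provider
                                ORDER BY StartDate
                                ROWS UNBOUNDED PRECEDING) AS SpanNo
    FROM Flagged
)
SELECT Provider, Patient,
       MIN(StartDate) AS StartDate,
       MAX(EndDate)   AS EndDate
FROM Numbered
GROUP BY Provider, Patient, SpanNo
ORDER BY Patient, MIN(StartDate);
```

Because each span gets its own SpanNo, a patient who returns to the same provider later in the year should come out as two separate rows rather than one merged row. If your real periods end on the last day of the month instead of the first day of the next one, the comparison would need to become something like DATEADD(DAY, 1, LAG(EndDate) OVER (...)) = StartDate. The query reads the table once per window, so an index on (Patient, Provider, StartDate) may help with 140k rows, though I have not measured it.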