Please use my blog posts for learning purposes only, and feel free to ask questions in the comment box below if anything is unclear.
SQL Problem Statement:
You are given a table, Projects, containing three columns: Task_ID, Start_Date and End_Date. It is guaranteed that the difference between the End_Date and the Start_Date is equal to 1 day for each row in the table.
Table: Projects
If the End_Dates of the tasks are consecutive, then they are part of the same project. Samantha is interested in finding the total number of different projects completed.
Write a query to output the start and end dates of projects, listed by the number of days it took to complete the project in ascending order. If more than one project has the same number of completion days, then order by the start date of the project.
Sample Input:
Sample Output:
- Project 1: Tasks 1, 2 and 3 are completed on consecutive days, so they are part of the same project. Thus, the start date of the project is 2015-10-01 and the end date is 2015-10-04, so it took 3 days to complete the project.
- Project 2: Tasks 4 and 5 are completed on consecutive days, so they are part of the same project. Thus, the start date of the project is 2015-10-13 and the end date is 2015-10-15, so it took 2 days to complete the project.
- Project 3: Only task 6 is part of the project. Thus, the start date of the project is 2015-10-28 and the end date is 2015-10-29, so it took 1 day to complete the project.
- Project 4: Only task 7 is part of the project. Thus, the start date of the project is 2015-10-30 and the end date is 2015-10-31, so it took 1 day to complete the project.
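To make the grouping in the explanation concrete, here is a minimal Python sketch that merges tasks into projects and then applies the required ordering. The seven rows in `TASKS` are an assumption: they are inferred from the explanation above together with the one-day guarantee, not copied from the sample table. The names `TASKS`, `group_projects` and `PROJECTS` are made up for this illustration.

```python
from datetime import date

# Assumed sample rows (Task_ID, Start_Date, End_Date), inferred from the
# explanation; every task lasts exactly one day.
TASKS = [
    (1, date(2015, 10, 1), date(2015, 10, 2)),
    (2, date(2015, 10, 2), date(2015, 10, 3)),
    (3, date(2015, 10, 3), date(2015, 10, 4)),
    (4, date(2015, 10, 13), date(2015, 10, 14)),
    (5, date(2015, 10, 14), date(2015, 10, 15)),
    (6, date(2015, 10, 28), date(2015, 10, 29)),
    (7, date(2015, 10, 30), date(2015, 10, 31)),
]


def group_projects(tasks):
    """Merge tasks with consecutive End_Dates into (start, end) projects."""
    projects = []
    for _task_id, start, end in sorted(tasks, key=lambda t: t[1]):
        if projects and projects[-1][1] == start:
            # The task starts on the day the previous one ended,
            # so it extends the current project.
            projects[-1] = (projects[-1][0], end)
        else:
            # A gap in the dates starts a new project.
            projects.append((start, end))
    # Order by number of days taken, then by project start date.
    return sorted(projects, key=lambda p: ((p[1] - p[0]).days, p[0]))


PROJECTS = group_projects(TASKS)
for proj_start, proj_end in PROJECTS:
    print(proj_start, proj_end, (proj_end - proj_start).days)
```

With these assumed rows, the two 1-day projects come first (ordered by start date), followed by the 2-day and then the 3-day project.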
Solution-1: Using CROSS JOIN, DATEDIFF & SUB-QUERY (MySQL Query):
- Table s gives all the Start_Dates of the projects.
LOGIC: Every Start_Date that is not present in the End_Date column is the Start_Date of a project.
- Table e gives all the End_Dates of the projects.
LOGIC: Every End_Date that is not present in the Start_Date column is the End_Date of a project.
- After applying a CROSS JOIN on tables s and e, you get all possible combinations of Proj_Start_Date and Proj_End_Date (i.e. you get multiple Proj_End_Dates for each Proj_Start_Date).
- Keep only the combinations where Proj_End_Date is later than Proj_Start_Date, then select min(Proj_End_Date) as the valid End_Date of the project and discard all other End_Dates for that Start_Date.
Reason: As given in the question, tasks belong to the same project only if their End_Dates are consecutive, so a project ends at the first project End_Date that follows its Start_Date.
- Apply GROUP BY Proj_Start_Date to get min(Proj_End_Date) for each project.
- Use the DATEDIFF function to calculate the number of days between the project End_Date and the project Start_Date.
- Apply ORDER BY on the above DATEDIFF and then on Proj_Start_Date.
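The steps above can be sketched as a single query. The version below is one possible reading of those steps, not necessarily the exact query this solution had in mind. It adds a `WHERE s.Start_Date < e.End_Date` filter so that only end dates after a given start date are considered (without it, `MIN` would return the same earliest end date for every project). To keep the example runnable without a MySQL server, it executes the query in SQLite through Python's `sqlite3` module and registers a small `DATEDIFF` helper that mimics MySQL's `DATEDIFF(later, earlier)`. The sample rows are inferred from the explanation in the problem statement, and the Python names (`QUERY`, `SAMPLE_ROWS`, `ROWS`) are hypothetical.

```python
import sqlite3
from datetime import date

# Query text written in MySQL style; aliases s and e follow the steps above.
QUERY = """
SELECT s.Start_Date    AS Proj_Start_Date,
       MIN(e.End_Date) AS Proj_End_Date
FROM (SELECT Start_Date FROM Projects
      WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) s
CROSS JOIN
     (SELECT End_Date FROM Projects
      WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) e
WHERE s.Start_Date < e.End_Date
GROUP BY s.Start_Date
ORDER BY DATEDIFF(MIN(e.End_Date), s.Start_Date), s.Start_Date
"""

# Assumed sample rows, inferred from the explanation (one-day tasks).
SAMPLE_ROWS = [
    (1, "2015-10-01", "2015-10-02"),
    (2, "2015-10-02", "2015-10-03"),
    (3, "2015-10-03", "2015-10-04"),
    (4, "2015-10-13", "2015-10-14"),
    (5, "2015-10-14", "2015-10-15"),
    (6, "2015-10-28", "2015-10-29"),
    (7, "2015-10-30", "2015-10-31"),
]


def datediff(later, earlier):
    """Stand-in for MySQL DATEDIFF: whole days from earlier to later."""
    return (date.fromisoformat(later) - date.fromisoformat(earlier)).days


conn = sqlite3.connect(":memory:")
# SQLite has no built-in DATEDIFF, so register the helper under that name.
conn.create_function("DATEDIFF", 2, datediff)
conn.execute(
    "CREATE TABLE Projects (Task_ID INTEGER, Start_Date TEXT, End_Date TEXT)"
)
conn.executemany("INSERT INTO Projects VALUES (?, ?, ?)", SAMPLE_ROWS)

ROWS = conn.execute(QUERY).fetchall()
for proj_start, proj_end in ROWS:
    print(proj_start, proj_end)
conn.close()
```

In MySQL itself, the text in `QUERY` should run as-is against a `Projects` table with DATE columns, since `DATEDIFF` is built in there; the Python wrapper is only a convenience for trying the logic locally.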