I would like the application to perform the following tasks:
1. Read the I-frames from the video file, starting x minutes before the end of the file.
2. Convert those I-frames to images, e.g. .jpg.
3. Submit each generated image to Microsoft Cognitive Services, which can identify the text present in the submitted image.
4. Store the text along with the image ID and the exact time in the video at which that image appeared. The time must be accurate to the frame.
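
To make the requirements more concrete, here is a minimal Python sketch of the bookkeeping behind tasks 1, 2 and 4. It is only an illustration under stated assumptions, not a required design: it assumes ffprobe/ffmpeg are the tools used to list and export frames, that ffprobe's keyframe filter is an acceptable stand-in for "I-frames", and that the video has a constant frame rate so a timestamp maps to a frame number. The names `FrameRecord`, `parse_timestamps`, `select_tail`, `build_records` and the `iframe_000000` ID scheme are hypothetical. The text recognition call (task 3) is deliberately left out; the `text` field is where its result would be stored.

```python
from __future__ import annotations

from dataclasses import dataclass


def ffprobe_keyframe_cmd(video_path: str) -> list[str]:
    """Command that asks ffprobe to print one timestamp per keyframe.

    Assumption: keyframes (what -skip_frame nokey selects) are used as
    the I-frames of interest.
    """
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-skip_frame", "nokey",
        "-show_entries", "frame=pts_time",
        "-of", "csv=p=0",
        video_path,
    ]


def ffmpeg_export_cmd(video_path: str, timestamp_s: float, out_jpg: str) -> list[str]:
    """Command that writes the frame at timestamp_s to a .jpg file."""
    return [
        "ffmpeg", "-ss", f"{timestamp_s:.6f}", "-i", video_path,
        "-frames:v", "1", out_jpg,
    ]


def parse_timestamps(ffprobe_output: str) -> list[float]:
    """Turn ffprobe's line-per-frame output into seconds, skipping junk lines."""
    times = []
    for line in ffprobe_output.splitlines():
        field = line.strip().rstrip(",")
        try:
            times.append(float(field))
        except ValueError:
            continue  # blank line or a non-numeric value such as "N/A"
    return times


def select_tail(times: list[float], duration_s: float, minutes_from_end: float) -> list[float]:
    """Keep only frames in the last `minutes_from_end` minutes of the file."""
    start = max(0.0, duration_s - minutes_from_end * 60)
    return [t for t in times if t >= start]


@dataclass
class FrameRecord:
    image_id: str        # hypothetical ID scheme, also usable as the .jpg name
    timestamp_s: float   # exact presentation time of the frame, in seconds
    frame_index: int     # frame number, assuming a constant frame rate
    text: str = ""       # to be filled with the recognized text (task 3)


def build_records(times: list[float], fps: float) -> list[FrameRecord]:
    """Create one record per selected frame."""
    return [
        FrameRecord(image_id=f"iframe_{i:06d}", timestamp_s=t, frame_index=round(t * fps))
        for i, t in enumerate(times)
    ]
```

In use, the output of the first command would be passed to `parse_timestamps`, filtered with `select_tail` using the file duration and x, and each resulting record would drive one export command and one text recognition request. Storing both the timestamp and the frame number keeps the result accurate to the frame even if one of the two is later rounded for display.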