For speech recognition as well as spoken commands, I propose using Google's speech APIs on Android: speech recognition (speech-to-text) to take voice input from the user, and text-to-speech so the app can speak commands back.
I currently develop apps in Android Studio and am well versed in working with Studio, Gradle, etc. I will therefore build the app to be compatible with the latest version of Android (5.0, Lollipop) and backward compatible down to 4.3 (Jelly Bean).
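As a rough sketch of how that version range could be declared in the app module's build.gradle, assuming the standard Android Gradle plugin DSL (API level 21 corresponds to Android 5.0 and API level 18 to Android 4.3; everything else in the file is omitted here):

```groovy
// Sketch only: version settings for the app module's build.gradle.
android {
    // Compile against Android 5.0 (API level 21).
    compileSdkVersion 21

    defaultConfig {
        // Oldest supported release: Android 4.3, Jelly Bean (API level 18).
        minSdkVersion 18
        // Release the app is built and tested for: Android 5.0 (API level 21).
        targetSdkVersion 21
    }
}
```

With these values, the app would not be installable on devices running anything older than Android 4.3.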
I will use the Gallery (phone storage or SD card) to access the pictures or any other resources.
I also have experience developing a similar app for Android Wear, where everything happens via voice commands.
Let me know if anything is unclear. I can complete the project within a day.