A cutting-edge platform that transforms text input into complete videos through a structured, multi-stage process. The service uses OpenAI's API to break the provided text into scene descriptions, which then serve as prompts for image generation. Users can refine these scene descriptions to align them with their vision for greater customization. Each scene's visuals are created with OpenAI's DALLE or Stable Diffusion, and the resulting images are compiled into a cohesive video. The platform is built on a robust technology stack: React, Redux, and Material UI on the front end, with backend functionality written in Python and running on Firebase.
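The first stage of the pipeline above — turning raw text into scene descriptions — can be sketched as follows. This is a minimal illustration, not the production code: the prompt wording is hypothetical, and the language-model call is injected as a `complete` callable (in production it would wrap OpenAI's chat completions API) so the parsing logic can be shown on its own.

```python
import json


def split_into_scenes(story_text, complete):
    """Ask a language model to break a story into scene descriptions.

    `complete` is any callable that sends a prompt string to an LLM
    (e.g. a thin wrapper around OpenAI's API) and returns its text reply.
    """
    # Illustrative prompt; the real system's prompt selection was tuned separately.
    prompt = (
        "Split the following story into 3-6 scenes. "
        "Reply with a JSON array of short visual scene descriptions.\n\n"
        + story_text
    )
    reply = complete(prompt)
    scenes = json.loads(reply)
    if not isinstance(scenes, list):
        raise ValueError("model did not return a JSON array")
    return scenes


# Usage with a stubbed model reply (offline, no API key needed):
fake_reply = '["A foggy harbor at dawn", "A lighthouse beam sweeping the sea"]'
scenes = split_into_scenes("The keeper watched the harbor...", lambda p: fake_reply)
```

Each returned description then becomes one image prompt, so the user can edit any single scene before images are generated.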
Challenge
The main challenge was to build a streamlined pipeline that could interpret raw text and translate it into visual scenes while keeping a consistent style across all images, yet still leaving room for user customization.
Solution
Rigorous prompt selection, combined with model fine-tuning and continuous retraining, established a process of ongoing improvement in image generation quality.
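One concrete way the prompt-selection work keeps all images in a single style is to prepend a fixed style block to every per-scene prompt. The sketch below assumes this approach; the style wording and negative prompt are illustrative placeholders, not the platform's actual prompts.

```python
# A shared style block keeps every generated image visually consistent.
# The wording below is illustrative, not the production prompt.
STYLE_PREFIX = (
    "digital illustration, soft lighting, muted color palette, "
    "cinematic composition"
)

# Negative prompts apply when generating with Stable Diffusion.
NEGATIVE_PROMPT = "text, watermark, extra limbs"


def build_image_prompt(scene_description, style_prefix=STYLE_PREFIX):
    """Combine the fixed style block with a per-scene description."""
    return f"{style_prefix}. Scene: {scene_description.strip()}"


prompt = build_image_prompt("A lighthouse beam sweeping the sea")
```

Because the style block is fixed while only the scene text varies, users can freely edit scene descriptions without breaking the video's overall look.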
Components
Frontend: Built with React, Redux, and Material UI to deliver a dynamic, interactive user experience.
Backend: Serverless architecture written in Python, deployed as Firebase Functions, utilizing Firebase Authentication, Firestore, and Google Storage for secure data management and content storage.
Scene and Image Generation: Powered by OpenAI's API, DALLE, and MidJourney to transform text into scene descriptions and images.
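With Firestore holding the scene data, each scene can live in its own document so the frontend edits one description without touching the rest of the pipeline. The record shape below is a hypothetical sketch of such a document; the field names and status values are assumptions, not the platform's actual schema.

```python
from datetime import datetime, timezone


def make_scene_doc(project_id, index, description, image_url=None):
    """Build a per-scene record suitable for storing in Firestore.

    `image_url` is filled in (pointing at Google Storage) once the
    image-generation step for this scene has completed.
    """
    return {
        "projectId": project_id,       # hypothetical field names
        "sceneIndex": index,
        "description": description,
        "imageUrl": image_url,
        "status": "generated" if image_url else "pending",
        "updatedAt": datetime.now(timezone.utc).isoformat(),
    }


doc = make_scene_doc("proj_42", 0, "A foggy harbor at dawn")
```

A serverless function can then watch for documents whose status is "pending", generate the image, and write back `imageUrl`.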
Technologies
React, Redux, Material UI, Next.js, Python, Firebase Functions, Firebase Authentication, Firestore, Google Storage, OpenAI API, DALLE, MidJourney, Selenium.