There have been heated debates about the ethics of AI art, or text-to-image synthesis. Worse, careless users have started supporting models on HuggingFace that violated Instagram's Terms of Service and people's privacy rights. So I'm planning to train a generative (multimodal) text-to-image model from scratch, using only public-domain images released under CC0 or a similar license. CC0 means the images are legally and ethically free to use in any personal or commercial project.
If you're interested in being part of this initiative toward 100% ethical yet powerful text-to-image models, this is a good opportunity to get involved. I'm looking for a few students who are interested in either of the following. The first is developing automated web scrapers in Python to collect images from one of the 18+ websites I've compiled; if you'd rather not scrape, contributing computing resources is also welcome. The second is building tools (perhaps by training feature-extraction or image-classification models) to filter AI-generated images and images containing logos or trademarks out of our dataset, since some of the websites I've compiled prohibit the use of particular types of images.
I could do all of this alone, but it would be a daunting task, so I would really appreciate any help.
Current contributors: 7 UBC computer science and engineering students