Why image generation for newsrooms?
One of the most common issues local outlets and smaller publications face is a lack of funding to support a formal graphics/illustrations department. At Northwestern's Knight Lab, we set out to reduce bias in AI-generated imagery so under-resourced newsrooms can source and use accurate, inclusive visuals for real stories. We focused on improving portrayals of common professions and making results more consistent and realistic.
We built on prior Knight Lab cohorts' findings about baseline biases in Stable Diffusion XL (SDXL). After desk research on training methods and fine-tuning, we selected professions (e.g., CEO, teacher, nurse) and ran baseline tests to document how the model depicted each role before training. We also built an image parser to help us standardize the visual format and captioning syntax of the training images we fed into our LoRA.
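As a rough sketch of what that parser does, the snippet below resizes each image to a fixed resolution and writes a caption file with the attribute fields in a consistent order. The target size, label fields, and file layout are illustrative assumptions, not our exact implementation.

```python
from pathlib import Path
from PIL import Image

TARGET_SIZE = (1024, 1024)                         # assumed SDXL training resolution
CAPTION_FIELDS = ("profession", "gender", "race")  # consistent caption field order

def standardize(src_dir: str, out_dir: str, metadata: dict) -> None:
    """Resize each image and write a caption with a fixed field order.

    `metadata` maps a filename to its observational labels, e.g.
    {"img_001.jpg": {"profession": "nurse", "gender": "woman", "race": "Black"}}.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, labels in metadata.items():
        img = Image.open(Path(src_dir) / name).convert("RGB")
        img = img.resize(TARGET_SIZE, Image.LANCZOS)
        stem = Path(name).stem
        img.save(out / f"{stem}.png")
        # Every caption lists the same attributes in the same order
        (out / f"{stem}.txt").write_text(", ".join(labels[f] for f in CAPTION_FIELDS))
```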
How we approached model training
We ran weekly loops: curate images and label with observational, objective captions (profession, gender, race) → train the LoRA → compare baseline vs. trained outputs → adjust the dataset to correct over/under-representation. We ran into issues with this approach early on: rounds with open-ended prompts reproduced the very biases we were trying to fix (e.g., defaulting to male CEOs) and gave inconsistent results run-to-run.
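The compare step in that loop can be scripted so the only variable is the LoRA itself. Below is a minimal sketch using the Hugging Face diffusers library; the model ID is the public SDXL base checkpoint, while the LoRA path and prompt are placeholders.

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "CEO, medium portrait, natural light, editorial"
seed = 42  # same seed for baseline and trained outputs

# Baseline: the untouched SDXL model
baseline = pipe(
    prompt=prompt,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]

# Trained: same prompt and seed with our LoRA applied (path is a placeholder)
pipe.load_lora_weights("path/to/profession-lora")
trained = pipe(
    prompt=prompt,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]

baseline.save("ceo_baseline.png")
trained.save("ceo_trained.png")
```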
Our pivot: a structured prompt scaffold. Midway through, we switched to a simple schema that declares the target attributes first, then fills in scene details. Example: {profession: CEO | nurse | teacher}; {subject: gender/age/skin tone}; {setting: workplace}; {shot: medium portrait}; {style: natural light, editorial}.
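In code, the scaffold can be as simple as a function that always places the declared attributes before scene details; the field names mirror the schema above and the defaults are illustrative.

```python
def build_prompt(profession: str, subject: str, setting: str,
                 shot: str = "medium portrait",
                 style: str = "natural light, editorial") -> str:
    # Declared attributes come first, scene details after
    return f"{profession}, {subject}, {setting}, {shot}, {style}"

print(build_prompt("CEO", "middle-aged woman, dark skin tone", "corporate office"))
# -> CEO, middle-aged woman, dark skin tone, corporate office, medium portrait, natural light, editorial
```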
This shift kept attributes front and center so the model didn't drift into defaults, enabled balanced side-by-side sets for each profession (one prompt per attribute combination, with the same seed and size for fair comparisons), and made iterative testing more efficient: we could immediately scan an output grid and identify where representation slipped. We also added negative cues (e.g., avoid stereotypical props and clothing) and tightened the dataset to mirror the attribute balance we asked for in prompts, which improved consistency and reduced stereotype over-representation.
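To make that concrete, a balanced grid can be enumerated as one prompt per attribute combination, with the seed, output size, and negative cues held constant across every cell. The attribute pools below are illustrative stand-ins for our curated sets.

```python
from itertools import product

PROFESSIONS = ["CEO", "nurse", "teacher"]
GENDERS = ["woman", "man"]
SKIN_TONES = ["light skin tone", "medium skin tone", "dark skin tone"]

NEGATIVE = "stereotypical props, stereotypical clothing"  # negative cues
SEED = 1234            # identical seed for every cell
SIZE = (1024, 1024)    # identical output size for fair comparison

# One prompt per attribute combination; only the declared attributes vary.
grid = [
    {
        "prompt": f"{profession}, adult {gender}, {tone}, workplace, "
                  "medium portrait, natural light, editorial",
        "negative_prompt": NEGATIVE,
        "seed": SEED,
        "size": SIZE,
    }
    for profession, gender, tone in product(PROFESSIONS, GENDERS, SKIN_TONES)
]

print(len(grid), "cells")  # 3 x 2 x 3 = 18 images to scan side by side
```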
What changed & what's next
1. More balanced depictions across target professions (less male over-representation for "CEO," more diverse portrayals of "nurse").
2. A reusable prompt template and a manual-style report that both technical and non-technical editors can use to guide future generation work.
3. Constraints remain (limited testing capacity on available hardware, a small number of training scenarios, etc.). The next cohort will focus on scaling the dataset, formalizing evaluation methods, and piloting the trained LoRA with a partner newsroom.
Takeaways
1. Structure matters: attribute-first prompts keep outputs from drifting toward defaults and make bias easier to see and measure.
2. Data quality and prompt design both need attention; fixing one without the other won't produce good results.
3. Small, frequent training/eval loops surface stereotype "leaks" faster than big, infrequent runs.