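DiagramQG is an educational dataset for question generation from scientific diagrams. It contains 19,475 questions over 8,372 diagrams, with an average of 11.2 objects per diagram. Each question is constrained by a target and a concept, and draws on subject knowledge.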
Note: Because our research paper is still under peer review, we are currently releasing only a subset of the DiagramQG dataset.
Figure 1: Four examples from different subjects in the DiagramQG dataset.
Figure 2: Domain diversity in DiagramQG. Each color corresponds to one subject: Natural Science (blue), Earth Science (yellow), Applied Science (green), and Social Science (orange).
The dataset covers four main subject areas: Natural Science, Earth Science, Applied Science, and Social Science.
Data is organized hierarchically:
Figure 3: Question distribution in DiagramQG.
Figure 4: Distribution of diagrams, questions, and questions per diagram ratios across different concepts in DiagramQG.
Dataset | Questions | Images | Objects/Image | Image Type | Constraints | Knowledge Type |
---|---|---|---|---|---|---|
VQAv2.0 | 1.1M | 204k | 3.5 | natural | answer | N/A |
FVQA | 5,826 | 2k | 2.9 | natural | answer | common-sense |
VQG-COCO | 25,000 | 5k | 3.3 | natural | image, caption | common-sense |
K-VQG | 16,098 | 13k | 2.7 | natural | knowledge triple | common-sense |
DiagramQG | 19,475 | 8,372 | 11.2 | diagram | target, concept | subject knowledge |
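
The questions-per-diagram ratio mentioned for Figure 4 can also be read off the table at the whole-dataset level. The snippet below is a minimal sketch of that arithmetic, using only the DiagramQG counts from the table above. The helper name `questions_per_diagram` is hypothetical and is not part of any released DiagramQG code.

```python
def questions_per_diagram(num_questions: int, num_diagrams: int) -> float:
    """Average number of questions attached to each diagram."""
    if num_diagrams <= 0:
        raise ValueError("num_diagrams must be positive")
    return num_questions / num_diagrams


# Counts copied from the DiagramQG row of the comparison table.
DIAGRAMQG_QUESTIONS = 19_475
DIAGRAMQG_DIAGRAMS = 8_372

ratio = questions_per_diagram(DIAGRAMQG_QUESTIONS, DIAGRAMQG_DIAGRAMS)
print(f"DiagramQG: {ratio:.2f} questions per diagram")
```

This gives the overall average only. The per-concept ratios shown in Figure 4 would need the per-concept counts, which are not listed here.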
Our dataset is released under the Apache-2.0 license. You can download our dataset from DiagramQG or check out our GitHub repository.