feat: add txt/docx export scripts and fix MDX angle bracket parsing
- Add export-txt.mjs and export-docx.mjs for alternative export formats
- Fix MDX parser error by escaping angle brackets before numbers (e.g., <30B → &lt;30B)
- Update article content from Notion
- Minor improvements to export-pdf.mjs and screenshot-elements.mjs
- .gitignore +1 -0
- app/package-lock.json +0 -0
- app/package.json +0 -0
- app/scripts/README-TXT-EXPORT.md +129 -0
- app/scripts/export-docx.mjs +303 -0
- app/scripts/export-pdf.mjs +16 -4
- app/scripts/export-txt.mjs +527 -0
- app/scripts/notion-importer/mdx-converter.mjs +30 -0
- app/scripts/screenshot-elements.mjs +87 -16
- app/src/content/article.mdx +0 -0
- app/src/pages/dataviz.astro +1 -1
- app/src/pages/index.astro +30 -0
- app/yarn.lock +0 -0
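
The MDX fix named in the commit message can be illustrated with a minimal sketch. MDX reads `<` as the start of a JSX tag, so text such as `<30B` fails to parse; replacing a `<` that is directly followed by a digit with `&lt;` avoids that. The actual change lives in `mdx-converter.mjs`, whose diff is not shown at this point, so the function name `escapeAngleBracketsBeforeNumbers` and the regex below are assumptions rather than the committed code.

```javascript
// Hypothetical helper (not taken from mdx-converter.mjs):
// escape "<" only when a digit follows it, so real tags such as
// <div> or <Image /> are left untouched.
function escapeAngleBracketsBeforeNumbers(text) {
  return text.replace(/<(?=\d)/g, '&lt;');
}

console.log(escapeAngleBracketsBeforeNumbers('Models <30B fit on one node.'));
console.log(escapeAngleBracketsBeforeNumbers('<div>unchanged</div>'));
```

This naive version would also rewrite `<` inside fenced code blocks, so a real converter would probably need to skip those regions.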
.gitignore
CHANGED
@@ -43,3 +43,4 @@ app/public/data/**/*
 .temp-*/
 .backup-*/
 
+*.docx
app/package-lock.json
CHANGED
Binary files a/app/package-lock.json and b/app/package-lock.json differ
app/package.json
CHANGED
Binary files a/app/package.json and b/app/package.json differ
app/scripts/README-TXT-EXPORT.md
ADDED
@@ -0,0 +1,129 @@
+# TXT Export for Book Publishing
+
+This script exports the article to a simple text format suitable for book publishing software, with custom tags for special elements.
+
+## Usage
+
+```bash
+npm run export:txt
+```
+
+Or with custom filename:
+
+```bash
+node scripts/export-txt.mjs --filename=my-article
+```
+
+## Output
+
+The script generates a `.txt` file in the `dist/` folder with the following format:
+
+### Text Tags
+
+#### Figures/Images
+```
+<f> NAME ANCHOR DESCRIPTION </f>
+```
+- **NAME**: Figure name (e.g., "Figure 1")
+- **ANCHOR**: HTML anchor/ID for cross-references
+- **DESCRIPTION**: Figure caption/description
+
+Example:
+```
+<f>Figure 1 placeholder-image A placeholder image description</f>
+```
+
+#### Tables
+```
+<t> NAME DESCRIPTION </t>
+```
+- **NAME**: Table name (e.g., "Table 1")
+- **DESCRIPTION**: Table caption/description
+
+Example:
+```
+<t>Table 1 | Comparison of model architectures</t>
+```
+
+#### Code Blocks
+```
+<c> CODE | DESCRIPTION </c>
+```
+- **CODE**: The actual code content
+- **DESCRIPTION**: Optional description or caption
+
+Example:
+```
+<c>function hello() {
+  console.log("Hello world");
+} | JavaScript example function</c>
+```
+
+#### Inline Code
+```
+<ic> CODE </ic>
+```
+Example:
+```
+Use the <ic>npm install</ic> command to install dependencies.
+```
+
+#### LaTeX Formulas
+```
+<l> katex-number </l>
+```
+References to exported KaTeX formula PNGs, numbered chronologically.
+
+Example:
+```
+The equation <l>katex-1</l> shows the relationship...
+```
+
+The corresponding PNG files should be exported separately (e.g., `katex-1.png`, `katex-2.png`, etc.)
+
+## Standard Markdown Elements
+
+The script also preserves standard markdown formatting:
+
+- **Headings**: `# ## ###` etc.
+- **Paragraphs**: Plain text with line breaks
+- **Lists**: Bulleted (`-`) and numbered (`1. 2. 3.`)
+- **Blockquotes**: `> Text`
+
+## How It Works
+
+1. **Build**: Builds the Astro site (if not already built)
+2. **Launch**: Starts a preview server
+3. **Extract**: Uses Playwright to load the page and extract content from the DOM
+4. **Convert**: Transforms HTML elements into the custom tag format
+5. **Export**: Writes the result to `dist/article.txt`
+
+## Example Output
+
+```
+# Introduction
+
+This is a paragraph with <ic>inline code</ic> and a reference to <l>katex-1</l>.
+
+<f>Figure 1 training-loss Training loss over time for SmolLM3</f>
+
+## Methods
+
+We used the following approach:
+
+- First step
+- Second step
+- Third step
+
+<c>def train_model():
+    return model | Python training function</c>
+
+<t>Table 1 | Hyperparameters used in training</t>
+```
+
+## Notes
+
+- The script reuses the same infrastructure as PDF export (`export-pdf.mjs`)
+- It's designed to work with the existing Astro build pipeline
+- All custom components (Image, HtmlEmbed, Note, etc.) are properly handled
+- KaTeX formulas are numbered sequentially for easy reference to exported PNGs
app/scripts/export-docx.mjs
ADDED
@@ -0,0 +1,303 @@
+#!/usr/bin/env node
+
+/**
+ * Export TXT to DOCX format for book publishing
+ *
+ * This script converts the exported TXT file to a simple DOCX document:
+ * - Preserves headings, paragraphs, lists
+ * - Keeps custom tags (<f>, <t>, <l>, <ic>, <il>, <n>) as-is for manual processing
+ * - Formats code blocks
+ * - Creates a clean document ready for further editing
+ *
+ * Usage:
+ *   node scripts/export-docx.mjs [--input=path/to/file.txt]
+ *   npm run export:docx
+ */
+
+import { Document, Packer, Paragraph, TextRun, HeadingLevel, AlignmentType } from 'docx';
+import { promises as fs } from 'node:fs';
+import { resolve } from 'node:path';
+import process from 'node:process';
+
+function parseArgs(argv) {
+  const out = {};
+  for (const arg of argv.slice(2)) {
+    if (!arg.startsWith('--')) continue;
+    const [k, v] = arg.replace(/^--/, '').split('=');
+    out[k] = v === undefined ? true : v;
+  }
+  return out;
+}
+
+function detectHeadingLevel(line) {
+  const match = line.match(/^(#{1,6})\s+(.+)$/);
+  if (!match) return null;
+  const level = match[1].length;
+  const text = match[2].trim();
+  return { level, text };
+}
+
+function parseInlineFormatting(text) {
+  const runs = [];
+  let currentPos = 0;
+
+  // Parse inline tags: <ic>, <il>, <n> (keep as-is with special formatting)
+  const tagRegex = /<(ic|il|n)>([^<]*)<\/\1>/g;
+  let match;
+
+  while ((match = tagRegex.exec(text)) !== null) {
+    // Add text before the tag
+    if (match.index > currentPos) {
+      const beforeText = text.substring(currentPos, match.index);
+      if (beforeText) {
+        runs.push(new TextRun(beforeText));
+      }
+    }
+
+    // Add the tagged content with special formatting
+    const tagType = match[1];
+    const content = match[2];
+
+    if (tagType === 'ic') {
+      // Inline code: monospace, gray background
+      runs.push(new TextRun({
+        text: content,
+        font: 'Courier New',
+        color: '333333',
+        shading: { fill: 'E8E8E8', type: 'clear' }
+      }));
+    } else if (tagType === 'il') {
+      // Inline LaTeX: italic, keep as-is
+      runs.push(new TextRun({
+        text: content,
+        italics: true,
+        color: '0066CC'
+      }));
+    } else if (tagType === 'n') {
+      // Note: keep tag for manual processing
+      runs.push(new TextRun({
+        text: `<n>${content}</n>`,
+        color: 'FF6B00',
+        italics: true
+      }));
+    }
+
+    currentPos = match.index + match[0].length;
+  }
+
+  // Add remaining text
+  if (currentPos < text.length) {
+    runs.push(new TextRun(text.substring(currentPos)));
+  }
+
+  return runs.length > 0 ? runs : [new TextRun(text)];
+}
+
+async function convertTxtToDocx(txtPath, outputPath) {
+  console.log(`📖 Reading TXT file: ${txtPath}`);
+  const content = await fs.readFile(txtPath, 'utf-8');
+  const lines = content.split('\n');
+
+  const paragraphs = [];
+  let inCodeBlock = false;
+  let codeLines = [];
+
+  for (let i = 0; i < lines.length; i++) {
+    const line = lines[i];
+
+    // Skip empty lines unless in code block
+    if (!line.trim() && !inCodeBlock) {
+      paragraphs.push(new Paragraph({ text: '' }));
+      continue;
+    }
+
+    // Handle code blocks <c>...</c>
+    if (line.trim().startsWith('<c>')) {
+      inCodeBlock = true;
+      codeLines = [];
+      const firstLine = line.replace(/^<c>\s*/, '');
+      if (firstLine && !firstLine.startsWith('</c>')) {
+        codeLines.push(firstLine);
+      }
+      continue;
+    }
+
+    if (line.trim().endsWith('</c>')) {
+      const lastLine = line.replace(/<\/c>\s*$/, '');
+      if (lastLine) codeLines.push(lastLine);
+
+      // Add code block as paragraph(s)
+      if (codeLines.length > 0) {
+        paragraphs.push(new Paragraph({
+          text: codeLines.join('\n'),
+          font: 'Courier New',
+          size: 20,
+          shading: { fill: 'F5F5F5', type: 'clear' },
+          spacing: { before: 200, after: 200 }
+        }));
+      }
+
+      inCodeBlock = false;
+      codeLines = [];
+      continue;
+    }
+
+    if (inCodeBlock) {
+      codeLines.push(line);
+      continue;
+    }
+
+    // Handle figure tags <f>...</f>
+    if (line.trim().startsWith('<f>')) {
+      paragraphs.push(new Paragraph({
+        children: [new TextRun({
+          text: line.trim(),
+          color: '0066CC',
+          bold: true
+        })],
+        spacing: { before: 200, after: 100 }
+      }));
+      continue;
+    }
+
+    // Handle table tags <t>...</t>
+    if (line.trim().startsWith('<t>')) {
+      paragraphs.push(new Paragraph({
+        children: [new TextRun({
+          text: line.trim(),
+          color: '009688',
+          bold: true
+        })],
+        spacing: { before: 200, after: 100 }
+      }));
+      continue;
+    }
+
+    // Handle LaTeX display tags <l>...</l>
+    if (line.trim().startsWith('<l>')) {
+      paragraphs.push(new Paragraph({
+        children: [new TextRun({
+          text: line.trim(),
+          color: '9C27B0',
+          bold: true
+        })],
+        alignment: AlignmentType.CENTER,
+        spacing: { before: 200, after: 200 }
+      }));
+      continue;
+    }
+
+    // Handle headings
+    const heading = detectHeadingLevel(line);
+    if (heading) {
+      const headingLevels = {
+        1: HeadingLevel.HEADING_1,
+        2: HeadingLevel.HEADING_2,
+        3: HeadingLevel.HEADING_3,
+        4: HeadingLevel.HEADING_4,
+        5: HeadingLevel.HEADING_5,
+        6: HeadingLevel.HEADING_6
+      };
+
+      paragraphs.push(new Paragraph({
+        text: heading.text,
+        heading: headingLevels[heading.level],
+        spacing: { before: 400, after: 200 }
+      }));
+      continue;
+    }
+
+    // Handle list items
+    if (line.trim().startsWith('- ')) {
+      const text = line.trim().substring(2);
+      paragraphs.push(new Paragraph({
+        children: parseInlineFormatting(text),
+        bullet: { level: 0 },
+        spacing: { before: 100, after: 100 }
+      }));
+      continue;
+    }
+
+    // Handle numbered lists
+    const numberedMatch = line.trim().match(/^(\d+)\.\s+(.+)$/);
+    if (numberedMatch) {
+      const text = numberedMatch[2];
+      paragraphs.push(new Paragraph({
+        children: parseInlineFormatting(text),
+        numbering: { reference: 'default-numbering', level: 0 },
+        spacing: { before: 100, after: 100 }
+      }));
+      continue;
+    }
+
+    // Handle blockquotes
+    if (line.trim().startsWith('> ')) {
+      const text = line.trim().substring(2);
+      paragraphs.push(new Paragraph({
+        children: parseInlineFormatting(text),
+        italics: true,
+        indent: { left: 720 },
+        spacing: { before: 200, after: 200 }
+      }));
+      continue;
+    }
+
+    // Regular paragraph
+    if (line.trim()) {
+      paragraphs.push(new Paragraph({
+        children: parseInlineFormatting(line.trim()),
+        spacing: { before: 100, after: 100 }
+      }));
+    }
+  }
+
+  console.log(`📝 Creating DOCX with ${paragraphs.length} paragraphs...`);
+
+  const doc = new Document({
+    sections: [{
+      properties: {},
+      children: paragraphs
+    }]
+  });
+
+  console.log(`💾 Writing DOCX to: ${outputPath}`);
+  const buffer = await Packer.toBuffer(doc);
+  await fs.writeFile(outputPath, buffer);
+
+  console.log(`✅ DOCX created successfully!`);
+}
+
+async function main() {
+  const cwd = process.cwd();
+  const args = parseArgs(process.argv);
+
+  const inputPath = args.input || resolve(cwd, 'dist', 'the-smol-training-playbook-the-secrets-to-building-world-class-llms.txt');
+  const outputPath = args.output || inputPath.replace('.txt', '.docx');
+
+  // Check if input exists
+  try {
+    await fs.access(inputPath);
+  } catch {
+    console.error(`❌ Error: Input file not found: ${inputPath}`);
+    console.error(' Run "npm run export:txt" first to generate the TXT file.');
+    process.exit(1);
+  }
+
+  await convertTxtToDocx(inputPath, outputPath);
+
+  // Also copy to public folder
+  const publicPath = outputPath.replace('/dist/', '/public/');
+  try {
+    await fs.mkdir(resolve(cwd, 'public'), { recursive: true });
+    await fs.copyFile(outputPath, publicPath);
+    console.log(`✅ DOCX copied to: ${publicPath}`);
+  } catch (e) {
+    console.warn('Unable to copy DOCX to public/:', e?.message || e);
+  }
+}
+
+main().catch((err) => {
+  console.error('❌ Error:', err.message);
+  console.error(err);
+  process.exit(1);
+});
app/scripts/export-pdf.mjs
CHANGED
@@ -246,6 +246,18 @@ iframe, embed, object { width: 100% !important; max-width: 100% !important; heig
 .html-embed, .html-embed__card { max-width: 100% !important; width: 100% !important; }
 .html-embed__card > div[id^="frag-"] { width: 100% !important; max-width: 100% !important; }
 
+/* Wide mode: remove blur/mask effects for print */
+.wide, .html-embed--wide {
+  -webkit-mask: none !important;
+  mask: none !important;
+  background: transparent !important;
+  padding: 0 !important;
+  width: 100% !important;
+  margin-left: 0 !important;
+  transform: none !important;
+  border-radius: 0 !important;
+}
+
 /* Banner centering */
 .hero .points { mix-blend-mode: normal !important; }
 .hero-banner, .hero .hero-banner, [class*="hero-banner"] {
@@ -282,8 +294,8 @@ iframe, embed, object { width: 100% !important; max-width: 100% !important; heig
   width: auto !important;
   height: auto !important;
   max-width: 100% !important;
-  /* Limit height to fit on a single page (~
-  max-height:
+  /* Limit height to fit on a single page (~269mm printable = ~1015px, with margin) */
+  max-height: 950px !important;
   display: block !important;
   object-fit: contain !important;
   margin-left: auto !important;
@@ -727,8 +739,8 @@ async function main() {
 
   const browser = await chromium.launch({ headless: true });
   try {
-    // Use
-    const deviceScaleFactor =
+    // Use 4x scale factor for high-DPI screenshots
+    const deviceScaleFactor = 4;
     const context = await browser.newContext({
       deviceScaleFactor
     });
app/scripts/export-txt.mjs
ADDED
|
@@ -0,0 +1,527 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env node
|
| 2 |
+
|
| 3 |
+
/**
|
| 4 |
+
* Export article to TXT format for book publishing
|
| 5 |
+
*
|
| 6 |
+
* This script exports the article to a simple text format with custom tags:
|
| 7 |
+
* - <f> NAME ANCHOR DESCRIPTION </f> for figures/images
|
| 8 |
+
* - <t> NAME DESCRIPTION </t> for tables
|
| 9 |
+
* - <c> CODE | DESCRIPTION </c> for code blocks
|
| 10 |
+
* - <ic> CODE </ic> for inline code
|
| 11 |
+
* - <l> katex-number </l> for LaTeX formulas (references exported PNGs)
|
| 12 |
+
*
|
| 13 |
+
* Usage:
|
| 14 |
+
* node scripts/export-txt.mjs
|
| 15 |
+
* npm run export:txt
|
| 16 |
+
*
|
| 17 |
+
* Output: dist/article.txt
|
| 18 |
+
*/
|
| 19 |
+
|
| 20 |
+
import { spawn } from 'node:child_process';
|
| 21 |
+
import { setTimeout as delay } from 'node:timers/promises';
|
| 22 |
+
import { chromium } from 'playwright';
|
| 23 |
+
import { resolve } from 'node:path';
|
| 24 |
+
import { promises as fs } from 'node:fs';
|
| 25 |
+
import process from 'node:process';
|
| 26 |
+
|
| 27 |
+
async function run(command, args = [], options = {}) {
|
| 28 |
+
return new Promise((resolvePromise, reject) => {
|
| 29 |
+
const child = spawn(command, args, { stdio: 'inherit', shell: false, ...options });
|
| 30 |
+
child.on('error', reject);
|
| 31 |
+
child.on('exit', (code) => {
|
| 32 |
+
if (code === 0) resolvePromise(undefined);
|
| 33 |
+
else reject(new Error(`${command} ${args.join(' ')} exited with code ${code}`));
|
| 34 |
+
});
|
| 35 |
+
});
|
| 36 |
+
}
|
| 37 |
+
|
| 38 |
+
async function waitForServer(url, timeoutMs = 60000) {
|
| 39 |
+
const start = Date.now();
|
| 40 |
+
while (Date.now() - start < timeoutMs) {
|
| 41 |
+
try {
|
| 42 |
+
const res = await fetch(url);
|
| 43 |
+
if (res.ok) return;
|
| 44 |
+
} catch { }
|
| 45 |
+
await delay(500);
|
| 46 |
+
}
|
| 47 |
+
throw new Error(`Server did not start in time: ${url}`);
|
| 48 |
+
}
|
| 49 |
+
|
| 50 |
+
function parseArgs(argv) {
|
| 51 |
+
const out = {};
|
| 52 |
+
for (const arg of argv.slice(2)) {
|
| 53 |
+
if (!arg.startsWith('--')) continue;
|
| 54 |
+
const [k, v] = arg.replace(/^--/, '').split('=');
|
| 55 |
+
out[k] = v === undefined ? true : v;
|
| 56 |
+
}
|
| 57 |
+
return out;
|
| 58 |
+
}
|
| 59 |
+
|
| 60 |
+
function slugify(text) {
|
| 61 |
+
return String(text || '')
|
| 62 |
+
.normalize('NFKD')
|
| 63 |
+
.replace(/\p{Diacritic}+/gu, '')
|
| 64 |
+
.toLowerCase()
|
| 65 |
+
.replace(/[^a-z0-9]+/g, '-')
|
| 66 |
+
.replace(/^-+|-+$/g, '')
|
| 67 |
+
.slice(0, 120) || 'article';
|
| 68 |
+
}
|
| 69 |
+
|
| 70 |
+
/**
|
| 71 |
+
* Clean text content: remove extra whitespace, normalize line breaks
|
| 72 |
+
*/
|
| 73 |
+
function cleanText(text) {
|
| 74 |
+
return String(text || '')
|
| 75 |
+
.replace(/\s+/g, ' ')
|
| 76 |
+
.trim();
|
| 77 |
+
}
|
| 78 |
+
|
| 79 |
+
/**
|
| 80 |
+
* Strip HTML tags from text
|
| 81 |
+
*/
|
| 82 |
+
function stripHtml(html) {
|
| 83 |
+
return String(html || '')
|
| 84 |
+
.replace(/<[^>]*>/g, '')
|
| 85 |
+
.replace(/ /g, ' ')
|
| 86 |
+
.replace(/&/g, '&')
|
| 87 |
+
.replace(/</g, '<')
|
| 88 |
+
.replace(/>/g, '>')
|
| 89 |
+
.replace(/"/g, '"')
|
| 90 |
+
.replace(/'/g, "'")
|
| 91 |
+
.trim();
|
| 92 |
+
}
|
| 93 |
+
|
| 94 |
+
/**
|
| 95 |
+
* Convert heading level to markdown syntax
|
| 96 |
+
*/
|
| 97 |
+
function headingToMarkdown(level, text) {
|
| 98 |
+
const hashes = '#'.repeat(Math.min(level, 6));
|
| 99 |
+
return `${hashes} ${text}`;
|
| 100 |
+
}
|
| 101 |
+
|
| 102 |
+
/**
|
| 103 |
+
* Extract and convert article content to TXT format
|
| 104 |
+
*/
|
| 105 |
+
async function extractArticleContent(page) {
|
| 106 |
+
return await page.evaluate(() => {
|
| 107 |
+
const output = [];
|
| 108 |
+
let globalCounter = 0; // Global counter for all visual elements (matches screenshot script)
|
| 109 |
+
const katexMap = new Map(); // Track unique katex formulas for referencing
|
| 110 |
+
|
| 111 |
+
// Helper: clean text
|
| 112 |
+
const cleanText = (text) => String(text || '').replace(/\s+/g, ' ').trim();
|
| 113 |
+
|
| 114 |
+
// Helper: strip HTML
|
| 115 |
+
const stripHtml = (html) => {
|
| 116 |
+
const div = document.createElement('div');
|
| 117 |
+
div.innerHTML = html;
|
| 118 |
+
return cleanText(div.textContent || '');
|
| 119 |
+
};
|
| 120 |
+
|
| 121 |
+
// Helper: get element ID or generate anchor
|
| 122 |
+
const getAnchor = (el) => {
|
| 123 |
+
if (el.id) return el.id;
|
| 124 |
+
// Try to find ID in parent figure
|
| 125 |
+
const figure = el.closest('figure');
|
| 126 |
+
if (figure?.id) return figure.id;
|
| 127 |
+
return '';
|
| 128 |
+
};
|
| 129 |
+
|
| 130 |
+
// Helper: parse caption to extract name and description
|
| 131 |
+
const parseCaptionText = (captionText, type = 'Figure') => {
|
| 132 |
+
if (!captionText) return { name: '', description: '' };
|
| 133 |
+
|
| 134 |
+
// Try to match patterns like:
|
| 135 |
+
// "Figure 1: Description"
|
| 136 |
+
// "Table 2: Description"
|
| 137 |
+
// "Fig. 3: Description"
|
| 138 |
+
const patterns = [
|
| 139 |
+
new RegExp(`^(${type}\\s*\\d+[a-z]?)\\s*[:\\-–—]\\s*(.+)$`, 'i'),
|
| 140 |
+
new RegExp(`^(Fig\\.?\\s*\\d+[a-z]?)\\s*[:\\-–—]\\s*(.+)$`, 'i'),
|
| 141 |
+
new RegExp(`^(Table\\s*\\d+[a-z]?)\\s*[:\\-–—]\\s*(.+)$`, 'i'),
|
| 142 |
+
];
|
| 143 |
+
|
| 144 |
+
for (const pattern of patterns) {
|
| 145 |
+
const match = captionText.match(pattern);
|
| 146 |
+
if (match) {
|
| 147 |
+
return { name: match[1].trim(), description: match[2].trim() };
|
| 148 |
+
}
|
| 149 |
+
}
|
| 150 |
+
|
| 151 |
+
// No pattern found, entire text is description
|
| 152 |
+
return { name: '', description: captionText.trim() };
|
| 153 |
+
};
|
| 154 |
+
|
| 155 |
+
// Process main content
|
| 156 |
+
const main = document.querySelector('main');
|
| 157 |
+
if (!main) return 'Error: main element not found';
|
| 158 |
+
|
| 159 |
+
// Helper: get all visual elements in DOM order (same as screenshot script)
|
| 160 |
+
const allVisualElements = Array.from(main.querySelectorAll('.html-embed, .table-scroll > table, .image-wrapper, figure, .katex-display'));
|
| 161 |
+
const elementIndexMap = new Map();
|
| 162 |
+
|
| 163 |
+
// Pre-process: assign global indices to visual elements
|
| 164 |
+
allVisualElements.forEach((el, idx) => {
|
| 165 |
+
elementIndexMap.set(el, idx + 1);
|
| 166 |
+
});
|
| 167 |
+
|
| 168 |
+
// Walk through all child nodes
|
| 169 |
+
const processNode = (node) => {
|
| 170 |
+
const tag = node.tagName?.toLowerCase();
|
| 171 |
+
|
| 172 |
+
// Headings
|
| 173 |
+
if (/^h[1-6]$/.test(tag)) {
|
| 174 |
+
const level = parseInt(tag[1]);
|
| 175 |
+
const text = cleanText(node.textContent);
|
| 176 |
+
const hashes = '#'.repeat(level);
|
| 177 |
+
output.push(`\n${hashes} ${text}\n`);
|
| 178 |
+
return;
|
| 179 |
+
}
|
| 180 |
+
|
| 181 |
+
// Paragraphs
|
| 182 |
+
if (tag === 'p') {
|
| 183 |
+
const text = node.textContent?.trim();
|
| 184 |
+
if (text) {
|
| 185 |
+
// Process inline elements within paragraph
|
| 186 |
+
let processedText = '';
|
| 187 |
+
const processInline = (n) => {
|
| 188 |
+
if (n.nodeType === Node.TEXT_NODE) {
|
| 189 |
+
processedText += n.textContent;
|
| 190 |
+
} else if (n.tagName === 'CODE' && !n.closest('pre')) {
|
| 191 |
+
// Inline code
|
| 192 |
+
const code = cleanText(n.textContent);
|
| 193 |
+
processedText += `<ic>${code}</ic>`;
|
| 194 |
+
} else if (n.classList?.contains('katex')) {
|
| 195 |
+
// Inline katex - wrap in <il> tags
|
| 196 |
+
const formula = cleanText(n.textContent || '');
|
| 197 |
+
processedText += `<il>${formula}</il>`;
|
| 198 |
+
} else if (n.childNodes) {
|
| 199 |
+
n.childNodes.forEach(processInline);
|
| 200 |
+
}
|
| 201 |
+
};
|
| 202 |
+
|
| 203 |
+
node.childNodes.forEach(processInline);
|
| 204 |
+
output.push(processedText.trim() + '\n');
|
| 205 |
+
}
|
| 206 |
+
return;
|
| 207 |
+
}
|
| 208 |
+
|
| 209 |
+
// Display math (KaTeX)
|
| 210 |
+
if (node.classList?.contains('katex-display')) {
|
| 211 |
+
const globalIndex = elementIndexMap.get(node);
|
| 212 |
+
if (globalIndex) {
|
| 213 |
+
output.push(`<l>katex-${globalIndex}</l>\n`);
|
| 214 |
+
}
|
| 215 |
+
return;
|
| 216 |
+
}
|
| 217 |
+
|
| 218 |
+
// Code blocks
|
| 219 |
+
if (tag === 'pre') {
|
| 220 |
+
const code = node.querySelector('code');
|
| 221 |
+
if (code) {
|
| 222 |
+
const codeText = code.textContent || '';
|
| 223 |
+
const language = code.className.match(/language-(\w+)/)?.[1] || '';
|
| 224 |
+
|
| 225 |
+
// Try to find a description in the enclosing figure's caption
|
| 226 |
+
let description = '';
|
| 227 |
+
const figure = node.closest('figure');
|
| 228 |
+
if (figure) {
|
| 229 |
+
const caption = figure.querySelector('figcaption');
|
| 230 |
+
if (caption) description = stripHtml(caption.innerHTML);
|
| 231 |
+
}
|
| 232 |
+
|
| 233 |
+
if (description) {
|
| 234 |
+
output.push(`<c>${codeText.trim()} | ${description}</c>\n`);
|
| 235 |
+
} else {
|
| 236 |
+
output.push(`<c>${codeText.trim()}</c>\n`);
|
| 237 |
+
}
|
| 238 |
+
}
|
| 239 |
+
return;
|
| 240 |
+
}
|
| 241 |
+
|
| 242 |
+
// Tables
|
| 243 |
+
if (tag === 'table') {
|
| 244 |
+
// Check if this table is in a .table-scroll container (visual element)
|
| 245 |
+
const tableScroll = node.closest('.table-scroll');
|
| 246 |
+
const globalIndex = tableScroll ? elementIndexMap.get(node) : null;
|
| 247 |
+
|
| 248 |
+
// Skip tables that are not tracked visual elements
|
| 249 |
+
if (!globalIndex) {
|
| 250 |
+
return;
|
| 251 |
+
}
|
| 252 |
+
|
| 253 |
+
const figure = node.closest('figure');
|
| 254 |
+
let name = '';
|
| 255 |
+
let description = '';
|
| 256 |
+
let anchor = '';
|
| 257 |
+
|
| 258 |
+
if (figure) {
|
| 259 |
+
anchor = getAnchor(figure);
|
| 260 |
+
const caption = figure.querySelector('figcaption');
|
| 261 |
+
if (caption) {
|
| 262 |
+
const captionText = stripHtml(caption.innerHTML);
|
| 263 |
+
const parsed = parseCaptionText(captionText, 'Table');
|
| 264 |
+
name = parsed.name;
|
| 265 |
+
description = parsed.description;
|
| 266 |
+
}
|
| 267 |
+
}
|
| 268 |
+
|
| 269 |
+
// If no name found, generate one with global index (matching filename format)
|
| 270 |
+
if (!name) {
|
| 271 |
+
name = `table-${globalIndex}`;
|
| 272 |
+
}
|
| 273 |
+
|
| 274 |
+
// Build the tag
|
| 275 |
+
const parts = [name];
|
| 276 |
+
if (anchor) parts.push(anchor);
|
| 277 |
+
if (description) parts.push(description);
|
| 278 |
+
|
| 279 |
+
output.push(`<t>${parts.join(' | ')}</t>\n`);
|
| 280 |
+
|
| 281 |
+
// Extract table as simple text representation
|
| 282 |
+
const rows = Array.from(node.querySelectorAll('tr'));
|
| 283 |
+
const tableText = rows.map(row => {
|
| 284 |
+
const cells = Array.from(row.querySelectorAll('th, td'));
|
| 285 |
+
return cells.map(cell => cleanText(cell.textContent)).join(' | ');
|
| 286 |
+
}).join('\n');
|
| 287 |
+
|
| 288 |
+
output.push(tableText + '\n\n');
|
| 289 |
+
return;
|
| 290 |
+
}
|
| 291 |
+
|
| 292 |
+
// Figures (images, embeds)
|
| 293 |
+
if (tag === 'figure') {
|
| 294 |
+
const img = node.querySelector('img');
|
| 295 |
+
const htmlEmbed = node.querySelector('.html-embed, .html-embed--screenshot');
|
| 296 |
+
const imageWrapper = node.querySelector('.image-wrapper');
|
| 297 |
+
const caption = node.querySelector('figcaption');
|
| 298 |
+
|
| 299 |
+
// Skip if it's not really a figure (no img, no embed, no caption)
|
| 300 |
+
if (!img && !htmlEmbed && !imageWrapper && !caption) return;
|
| 301 |
+
|
| 302 |
+
// Try to find the global index from the visual element
|
| 303 |
+
const visualElement = htmlEmbed || imageWrapper || node;
|
| 304 |
+
const globalIndex = elementIndexMap.get(visualElement);
|
| 305 |
+
|
| 306 |
+
if (!globalIndex) return; // Skip if not tracked
|
| 307 |
+
|
| 308 |
+
let name = '';
|
| 309 |
+
let anchor = getAnchor(node);
|
| 310 |
+
let description = '';
|
| 311 |
+
|
| 312 |
+
if (caption) {
|
| 313 |
+
const captionText = stripHtml(caption.innerHTML);
|
| 314 |
+
const parsed = parseCaptionText(captionText, 'Figure');
|
| 315 |
+
name = parsed.name;
|
| 316 |
+
description = parsed.description;
|
| 317 |
+
}
|
| 318 |
+
|
| 319 |
+
// Get image alt text as fallback for description
|
| 320 |
+
if (!description && img?.alt) {
|
| 321 |
+
description = img.alt;
|
| 322 |
+
}
|
| 323 |
+
|
| 324 |
+
// If no name found in caption, generate one with global index (matching filename format)
|
| 325 |
+
if (!name) {
|
| 326 |
+
// Determine type for naming (matches screenshot script naming)
|
| 327 |
+
const type = htmlEmbed ? 'embed' : 'image';
|
| 328 |
+
name = `${type}-${globalIndex}`;
|
| 329 |
+
}
|
| 330 |
+
|
| 331 |
+
// Build the tag: <f>NAME | ANCHOR | DESCRIPTION</f> (anchor and description are optional)
|
| 332 |
+
const parts = [name];
|
| 333 |
+
if (anchor) parts.push(anchor);
|
| 334 |
+
if (description) parts.push(description);
|
| 335 |
+
|
| 336 |
+
output.push(`<f>${parts.join(' | ')}</f>\n\n`);
|
| 337 |
+
return;
|
| 338 |
+
}
|
| 339 |
+
|
| 340 |
+
// Lists
|
| 341 |
+
if (tag === 'ul' || tag === 'ol') {
|
| 342 |
+
const items = Array.from(node.querySelectorAll(':scope > li'));
|
| 343 |
+
items.forEach((item, idx) => {
|
| 344 |
+
const bullet = tag === 'ul' ? '-' : `${idx + 1}.`;
|
| 345 |
+
const text = cleanText(item.textContent);
|
| 346 |
+
output.push(`${bullet} ${text}\n`);
|
| 347 |
+
});
|
| 348 |
+
output.push('\n');
|
| 349 |
+
return;
|
| 350 |
+
}
|
| 351 |
+
|
| 352 |
+
// Blockquotes
|
| 353 |
+
if (tag === 'blockquote') {
|
| 354 |
+
const text = cleanText(node.textContent);
|
| 355 |
+
output.push(`> ${text}\n\n`);
|
| 356 |
+
return;
|
| 357 |
+
}
|
| 358 |
+
|
| 359 |
+
// Notes (Note component and Sidenote)
|
| 360 |
+
if (node.classList?.contains('note') || node.classList?.contains('sidenote')) {
|
| 361 |
+
const title = node.querySelector('.note__title, .note-title')?.textContent || '';
|
| 362 |
+
const content = cleanText(node.textContent);
|
| 363 |
+
|
| 364 |
+
if (title) {
|
| 365 |
+
output.push(`<n>${title} | ${content}</n>\n\n`);
|
| 366 |
+
} else {
|
| 367 |
+
output.push(`<n>${content}</n>\n\n`);
|
| 368 |
+
}
|
| 369 |
+
return;
|
| 370 |
+
}
|
| 371 |
+
|
| 372 |
+
// Recurse through children for unhandled elements
|
| 373 |
+
if (node.children && node.children.length > 0 && !['pre', 'code', 'table', 'figure'].includes(tag)) {
|
| 374 |
+
try {
|
| 375 |
+
Array.from(node.children).forEach(processNode);
|
| 376 |
+
} catch (e) {
|
| 377 |
+
console.error('Error processing children:', e);
|
| 378 |
+
}
|
| 379 |
+
}
|
| 380 |
+
};
|
| 381 |
+
|
| 382 |
+
// Process all direct children of main
|
| 383 |
+
Array.from(main.children).forEach(processNode);
|
| 384 |
+
|
| 385 |
+
// Add metadata about visual elements
|
| 386 |
+
const katexCount = Array.from(main.querySelectorAll('.katex-display')).length;
|
| 387 |
+
if (katexCount > 0) {
|
| 388 |
+
output.push(`\n\n<!-- Visual elements are numbered globally in DOM order (1, 2, 3...) to match exported screenshots -->\n`);
|
| 389 |
+
output.push(`<!-- KaTeX formulas: ${katexCount} formulas exported as N-katex.png where N is the global index -->\n`);
|
| 390 |
+
}
|
| 391 |
+
|
| 392 |
+
return output.join('');
|
| 393 |
+
});
|
| 394 |
+
}
|
| 395 |
+
|
| 396 |
+
async function main() {
|
| 397 |
+
const cwd = process.cwd();
|
| 398 |
+
const args = parseArgs(process.argv);
|
| 399 |
+
|
| 400 |
+
let outFileBase = args.filename || 'article';
|
| 401 |
+
outFileBase = outFileBase.replace(/\.txt$/i, '');
|
| 402 |
+
|
| 403 |
+
// Build only if dist/ does not exist
|
| 404 |
+
const distDir = resolve(cwd, 'dist');
|
| 405 |
+
let hasDist = false;
|
| 406 |
+
try {
|
| 407 |
+
const st = await fs.stat(distDir);
|
| 408 |
+
hasDist = st && st.isDirectory();
|
| 409 |
+
} catch { }
|
| 410 |
+
|
| 411 |
+
if (!hasDist) {
|
| 412 |
+
console.log('> Building Astro site…');
|
| 413 |
+
await run('npm', ['run', 'build']);
|
| 414 |
+
} else {
|
| 415 |
+
console.log('> Skipping build (dist/ exists)…');
|
| 416 |
+
}
|
| 417 |
+
|
| 418 |
+
console.log('> Starting Astro preview…');
|
| 419 |
+
// Capture stdout to detect the actual port used
|
| 420 |
+
let capturedPort = 8080;
|
| 421 |
+
const preview = spawn('npm', ['run', 'preview'], {
|
| 422 |
+
cwd,
|
| 423 |
+
stdio: ['ignore', 'pipe', 'pipe'],
|
| 424 |
+
detached: true
|
| 425 |
+
});
|
| 426 |
+
|
| 427 |
+
// Listen for port in output
|
| 428 |
+
preview.stdout.on('data', (data) => {
|
| 429 |
+
const output = data.toString();
|
| 430 |
+
process.stdout.write(output);
|
| 431 |
+
const match = output.match(/http:\/\/localhost:(\d+)/);
|
| 432 |
+
if (match) {
|
| 433 |
+
capturedPort = parseInt(match[1], 10);
|
| 434 |
+
}
|
| 435 |
+
});
|
| 436 |
+
|
| 437 |
+
preview.stderr.on('data', (data) => {
|
| 438 |
+
process.stderr.write(data);
|
| 439 |
+
});
|
| 440 |
+
|
| 441 |
+
const previewExit = new Promise((resolvePreview) => {
|
| 442 |
+
preview.on('close', (code, signal) => resolvePreview({ code, signal }));
|
| 443 |
+
});
|
| 444 |
+
|
| 445 |
+
// Wait a bit for the server to start and output the port
|
| 446 |
+
await delay(3000);
|
| 447 |
+
const baseUrl = `http://localhost:${capturedPort}/`;
|
| 448 |
+
|
| 449 |
+
try {
|
| 450 |
+
await waitForServer(baseUrl, 60000);
|
| 451 |
+
console.log('> Server ready, extracting content…');
|
| 452 |
+
|
| 453 |
+
const browser = await chromium.launch({ headless: true });
|
| 454 |
+
try {
|
| 455 |
+
const context = await browser.newContext();
|
| 456 |
+
const page = await context.newPage();
|
| 457 |
+
|
| 458 |
+
// Set viewport
|
| 459 |
+
await page.setViewportSize({ width: 1200, height: 1400 });
|
| 460 |
+
|
| 461 |
+
// Load page (use 'load' instead of 'networkidle' to avoid timeout on heavy pages)
|
| 462 |
+
await page.goto(baseUrl, { waitUntil: 'load', timeout: 60000 });
|
| 463 |
+
|
| 464 |
+
// Wait for content to be ready
|
| 465 |
+
await page.waitForTimeout(3000);
|
| 466 |
+
|
| 467 |
+
// Wait for main content to be present
|
| 468 |
+
await page.waitForSelector('main', { timeout: 10000 });
|
| 469 |
+
|
| 470 |
+
// Get article title for filename
|
| 471 |
+
if (!args.filename) {
|
| 472 |
+
const title = await page.evaluate(() => {
|
| 473 |
+
const h1 = document.querySelector('h1.hero-title');
|
| 474 |
+
const t = h1 ? h1.textContent : document.title;
|
| 475 |
+
return (t || '').replace(/\s+/g, ' ').trim();
|
| 476 |
+
});
|
| 477 |
+
outFileBase = slugify(title);
|
| 478 |
+
}
|
| 479 |
+
|
| 480 |
+
console.log('> Extracting article content…');
|
| 481 |
+
const txtContent = await extractArticleContent(page);
|
| 482 |
+
|
| 483 |
+
// Write output
|
| 484 |
+
const outPath = resolve(cwd, 'dist', `${outFileBase}.txt`);
|
| 485 |
+
await fs.writeFile(outPath, txtContent, 'utf-8');
|
| 486 |
+
console.log(`✅ TXT exported: ${outPath}`);
|
| 487 |
+
|
| 488 |
+
// Copy to public folder
|
| 489 |
+
const publicPath = resolve(cwd, 'public', `${outFileBase}.txt`);
|
| 490 |
+
try {
|
| 491 |
+
await fs.mkdir(resolve(cwd, 'public'), { recursive: true });
|
| 492 |
+
await fs.copyFile(outPath, publicPath);
|
| 493 |
+
console.log(`✅ TXT copied to: ${publicPath}`);
|
| 494 |
+
} catch (e) {
|
| 495 |
+
console.warn('Unable to copy TXT to public/:', e?.message || e);
|
| 496 |
+
}
|
| 497 |
+
|
| 498 |
+
} finally {
|
| 499 |
+
await browser.close();
|
| 500 |
+
}
|
| 501 |
+
} finally {
|
| 502 |
+
// Clean shutdown
|
| 503 |
+
try {
|
| 504 |
+
if (process.platform !== 'win32') {
|
| 505 |
+
try { process.kill(-preview.pid, 'SIGINT'); } catch { }
|
| 506 |
+
}
|
| 507 |
+
try { preview.kill('SIGINT'); } catch { }
|
| 508 |
+
await Promise.race([previewExit, delay(3000)]);
|
| 509 |
+
|
| 510 |
+
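if (preview.exitCode === null && preview.signalCode === null) { // still running; `killed` only reports that a signal was sent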
|
| 511 |
+
try {
|
| 512 |
+
if (process.platform !== 'win32') {
|
| 513 |
+
try { process.kill(-preview.pid, 'SIGKILL'); } catch { }
|
| 514 |
+
}
|
| 515 |
+
try { preview.kill('SIGKILL'); } catch { }
|
| 516 |
+
} catch { }
|
| 517 |
+
await Promise.race([previewExit, delay(1000)]);
|
| 518 |
+
}
|
| 519 |
+
} catch { }
|
| 520 |
+
}
|
| 521 |
+
}
|
| 522 |
+
|
| 523 |
+
main().catch((err) => {
|
| 524 |
+
console.error('❌ Error:', err.message);
|
| 525 |
+
console.error(err);
|
| 526 |
+
process.exit(1);
|
| 527 |
+
});
|
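Review note on the preview startup in export-txt.mjs: the port is read from the preview server's stdout with a plain regex. If that output ever contains terminal color codes around the port number, the regex would miss it and the script would silently fall back to 8080. A minimal sketch of a more tolerant extraction follows; `extractPort` and `ANSI_PATTERN` are hypothetical names that do not exist in the script, and whether the preview output actually contains color codes when piped is an assumption.

```javascript
// Hypothetical helper (not part of export-txt.mjs): pull the preview port
// out of one chunk of server output, ignoring ANSI color sequences.
const ANSI_PATTERN = /\u001b\[[0-9;]*m/g;

function extractPort(chunk, fallbackPort = 8080) {
  // Strip color/style escape sequences so they cannot split the URL.
  const plain = String(chunk).replace(ANSI_PATTERN, '');
  const match = plain.match(/http:\/\/localhost:(\d+)/);
  // Fall back to the default port when no URL is present in this chunk.
  return match ? Number.parseInt(match[1], 10) : fallbackPort;
}
```

In the script this could replace the inline `match` logic inside the `preview.stdout.on('data', …)` handler, with the current `capturedPort` passed as the fallback.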
app/scripts/notion-importer/mdx-converter.mjs
CHANGED
|
@@ -670,6 +670,33 @@ function addSpacingAroundComponents(content) {
|
|
| 670 |
return processedContent;
|
| 671 |
}
|
| 672 |
|
| 673 |
/**
|
| 674 |
* Fix smart quotes (curly quotes) and replace them with straight quotes
|
| 675 |
* @param {string} content - Markdown content
|
|
@@ -732,6 +759,9 @@ async function processMdxContent(content, pageId = null, notionToken = null, out
|
|
| 732 |
// Fix smart quotes first
|
| 733 |
processedContent = fixSmartQuotes(processedContent);
|
| 734 |
|
| 735 |
// Process external images first (before other transformations)
|
| 736 |
if (outputDir) {
|
| 737 |
// Create a temporary external images directory in the output folder
|
|
|
|
| 670 |
return processedContent;
|
| 671 |
}
|
| 672 |
|
| 673 |
+
/**
|
| 674 |
+
* Escape angle brackets before numbers to prevent MDX parsing errors
|
| 675 |
+
* In MDX, <30B would be interpreted as a JSX element, but element names can't start with numbers
|
| 676 |
+
* @param {string} content - Markdown content
|
| 677 |
+
* @returns {string} - Content with escaped angle brackets
|
| 678 |
+
*/
|
| 679 |
+
function escapeAngleBracketsBeforeNumbers(content) {
|
| 680 |
+
console.log(' 🔧 Escaping angle brackets before numbers...');
|
| 681 |
+
|
| 682 |
+
let fixedCount = 0;
|
| 683 |
+
|
| 684 |
+
// Replace < followed by a digit with &lt; (applies everywhere; code blocks and HTML tags are not skipped)
|
| 685 |
+
// Pattern: < immediately followed by a digit; "<=" and "< 30" never match
|
| 686 |
+
const processed = content.replace(/(<)(\d)/g, (match, bracket, digit) => {
|
| 687 |
+
fixedCount++;
|
| 688 |
+
return `&lt;${digit}`;
|
| 689 |
+
});
|
| 690 |
+
|
| 691 |
+
if (fixedCount > 0) {
|
| 692 |
+
console.log(` ✅ Escaped ${fixedCount} angle bracket(s) before numbers`);
|
| 693 |
+
} else {
|
| 694 |
+
console.log(' ℹ️ No angle brackets before numbers found');
|
| 695 |
+
}
|
| 696 |
+
|
| 697 |
+
return processed;
|
| 698 |
+
}
|
| 699 |
+
|
| 700 |
/**
|
| 701 |
* Fix smart quotes (curly quotes) and replace them with straight quotes
|
| 702 |
* @param {string} content - Markdown content
|
|
|
|
| 759 |
// Fix smart quotes first
|
| 760 |
processedContent = fixSmartQuotes(processedContent);
|
| 761 |
|
| 762 |
+
// Escape angle brackets before numbers (e.g., <30B -> &lt;30B)
|
| 763 |
+
processedContent = escapeAngleBracketsBeforeNumbers(processedContent);
|
| 764 |
+
|
| 765 |
// Process external images first (before other transformations)
|
| 766 |
if (outputDir) {
|
| 767 |
// Create a temporary external images directory in the output folder
|
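Review note on `escapeAngleBracketsBeforeNumbers`: the regex in this diff rewrites every `<` that is followed by a digit, including occurrences inside fenced code blocks and inline code, where the entity would be shown literally. A minimal sketch of a variant that leaves code untouched is below. The function name `escapeAngleBracketsOutsideCode` and its return shape are hypothetical, and the code detection is a simple regex split that assumes well-formed backtick fences, not a real Markdown parser.

```javascript
// Hypothetical variant (not in mdx-converter.mjs): escape "<digit" only
// outside fenced code blocks and inline code spans.
function escapeAngleBracketsOutsideCode(content) {
  // Splitting on a capturing group keeps the code segments in the result;
  // they land at odd indices, prose segments at even indices.
  const parts = content.split(/(`{3}[\s\S]*?`{3}|`[^`\n]*`)/g);
  let fixedCount = 0;

  const processed = parts
    .map((part, index) => {
      if (index % 2 === 1) return part; // code segment: leave as-is
      return part.replace(/<(?=\d)/g, () => {
        fixedCount++;
        return '&lt;';
      });
    })
    .join('');

  return { processed, fixedCount };
}
```

Returning the count alongside the text keeps the logging decision with the caller, so the same helper can be exercised without console output.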
app/scripts/screenshot-elements.mjs
CHANGED
|
@@ -2,11 +2,16 @@ import { chromium } from 'playwright';
|
|
| 2 |
import { mkdir } from 'fs/promises';
|
| 3 |
import { join } from 'path';
|
| 4 |
|
| 5 |
const URL = 'http://localhost:4321/?viz=true';
|
| 6 |
const OUTPUT_DIR = './screenshots';
|
| 7 |
const SELECTORS = ['.html-embed', '.table-scroll > table', '.image-wrapper', '.katex-display'];
|
| 8 |
-
const DEVICE_SCALE_FACTOR =
|
| 9 |
const BASE_VIEWPORT = { width: 1200, height: 800 };
|
|
|
|
| 10 |
|
| 11 |
const slugify = (value) =>
|
| 12 |
String(value || '')
|
|
@@ -20,6 +25,7 @@ async function main() {
|
|
| 20 |
await mkdir(OUTPUT_DIR, { recursive: true });
|
| 21 |
|
| 22 |
console.log('🚀 Launching browser...');
|
|
|
|
| 23 |
const browser = await chromium.launch({ headless: true });
|
| 24 |
const context = await browser.newContext({
|
| 25 |
deviceScaleFactor: DEVICE_SCALE_FACTOR,
|
|
@@ -97,7 +103,7 @@ async function main() {
|
|
| 97 |
});
|
| 98 |
|
| 99 |
const slug = slugify(label);
|
| 100 |
-
const baseName = `${i + 1}-${type}${slug ? `--${slug}` : ''}`;
|
| 101 |
const filename = `${baseName}.png`;
|
| 102 |
const filepath = join(OUTPUT_DIR, filename);
|
| 103 |
|
|
@@ -108,7 +114,7 @@ async function main() {
|
|
| 108 |
}
|
| 109 |
|
| 110 |
if (type !== 'table' && type !== 'katex') {
|
| 111 |
-
await element.evaluate((el) => {
|
| 112 |
const stash = (node) => {
|
| 113 |
if (!node || !(node instanceof HTMLElement)) return;
|
| 114 |
node.dataset.__prevStyle = node.getAttribute('style') ?? '';
|
|
@@ -131,19 +137,65 @@ async function main() {
|
|
| 131 |
// Aggressive cleanup only for banners
|
| 132 |
const all = el.querySelectorAll('*');
|
| 133 |
all.forEach((node) => stash(node));
|
| 134 |
|
| 135 |
const svgRects = el.querySelectorAll('svg rect');
|
| 136 |
-
svgRects.forEach((rect) => {
|
| 137 |
-
rect.
|
| 138 |
-
|
| 139 |
-
|
|
|
|
| 140 |
});
|
| 141 |
}
|
| 142 |
-
});
|
| 143 |
}
|
| 144 |
|
| 145 |
if (type === 'table') {
|
| 146 |
-
const cloneId = await element.evaluate((el, idx) => {
|
| 147 |
const existing = document.getElementById(`__table-clone-wrapper-${idx}`);
|
| 148 |
if (existing) existing.remove();
|
| 149 |
|
|
@@ -176,12 +228,18 @@ async function main() {
|
|
| 176 |
clone.style.minWidth = '0';
|
| 177 |
clone.style.maxWidth = 'none';
|
| 178 |
clone.style.tableLayout = 'auto';
|
| 179 |
|
| 180 |
const cells = clone.querySelectorAll('th, td');
|
| 181 |
cells.forEach(cell => {
|
| 182 |
cell.style.width = 'auto';
|
| 183 |
cell.style.minWidth = '0';
|
| 184 |
cell.style.maxWidth = 'none';
|
| 185 |
});
|
| 186 |
|
| 187 |
tableScroll.appendChild(clone);
|
|
@@ -191,7 +249,7 @@ async function main() {
|
|
| 191 |
document.body.appendChild(wrapper);
|
| 192 |
|
| 193 |
return clone.id;
|
| 194 |
-
}, i);
|
| 195 |
|
| 196 |
const wrapperSelector = `#__table-clone-wrapper-${i}`;
|
| 197 |
const cloneSelector = `#${cloneId}`;
|
|
@@ -213,7 +271,8 @@ async function main() {
|
|
| 213 |
|
| 214 |
await page.locator(cloneSelector).screenshot({
|
| 215 |
path: filepath,
|
| 216 |
-
type: 'png'
|
|
|
|
| 217 |
});
|
| 218 |
|
| 219 |
await page.evaluate((selector) => {
|
|
@@ -221,7 +280,7 @@ async function main() {
|
|
| 221 |
if (el) el.remove();
|
| 222 |
}, wrapperSelector);
|
| 223 |
} else if (type === 'katex') {
|
| 224 |
-
const cloneId = await element.evaluate((el, idx) => {
|
| 225 |
const existing = document.getElementById(`__katex-clone-wrapper-${idx}`);
|
| 226 |
if (existing) existing.remove();
|
| 227 |
|
|
@@ -243,12 +302,22 @@ async function main() {
|
|
| 243 |
clone.style.width = 'max-content';
|
| 244 |
clone.style.maxWidth = 'none';
|
| 245 |
clone.style.margin = '0';
|
| 246 |
|
| 247 |
wrapper.appendChild(clone);
|
| 248 |
document.body.appendChild(wrapper);
|
| 249 |
|
| 250 |
return clone.id;
|
| 251 |
-
}, i);
|
| 252 |
|
| 253 |
const wrapperSelector = `#__katex-clone-wrapper-${i}`;
|
| 254 |
const cloneSelector = `#${cloneId}`;
|
|
@@ -270,7 +339,8 @@ async function main() {
|
|
| 270 |
|
| 271 |
await page.locator(cloneSelector).screenshot({
|
| 272 |
path: filepath,
|
| 273 |
-
type: 'png'
|
|
|
|
| 274 |
});
|
| 275 |
|
| 276 |
await page.evaluate((selector) => {
|
|
@@ -280,7 +350,8 @@ async function main() {
|
|
| 280 |
} else {
|
| 281 |
await element.screenshot({
|
| 282 |
path: filepath,
|
| 283 |
-
type: 'png'
|
|
|
|
| 284 |
});
|
| 285 |
}
|
| 286 |
|
|
@@ -316,7 +387,7 @@ async function main() {
|
|
| 316 |
});
|
| 317 |
|
| 318 |
await page.waitForTimeout(150);
|
| 319 |
-
await element.screenshot({ path: openFilepath, type: 'png' });
|
| 320 |
console.log(` ✅ ${openFilename}`);
|
| 321 |
|
| 322 |
await selectHandle.evaluate((el) => {
|
|
|
|
| 2 |
import { mkdir } from 'fs/promises';
|
| 3 |
import { join } from 'path';
|
| 4 |
|
| 5 |
+
// Parse CLI arguments
|
| 6 |
+
const args = process.argv.slice(2);
|
| 7 |
+
const TRANSPARENT = args.includes('--transparent');
|
| 8 |
+
|
| 9 |
const URL = 'http://localhost:4321/?viz=true';
|
| 10 |
const OUTPUT_DIR = './screenshots';
|
| 11 |
const SELECTORS = ['.html-embed', '.table-scroll > table', '.image-wrapper', '.katex-display'];
|
| 12 |
+
const DEVICE_SCALE_FACTOR = 4; // 4x for high-quality print
|
| 13 |
const BASE_VIEWPORT = { width: 1200, height: 800 };
|
| 14 |
+
const FILENAME_SUFFIX = TRANSPARENT ? '-transparent' : '';
|
| 15 |
|
| 16 |
const slugify = (value) =>
|
| 17 |
String(value || '')
|
|
|
|
| 25 |
await mkdir(OUTPUT_DIR, { recursive: true });
|
| 26 |
|
| 27 |
console.log('🚀 Launching browser...');
|
| 28 |
+
if (TRANSPARENT) console.log('🔲 Transparent mode enabled (omitBackground: true)');
|
| 29 |
const browser = await chromium.launch({ headless: true });
|
| 30 |
const context = await browser.newContext({
|
| 31 |
deviceScaleFactor: DEVICE_SCALE_FACTOR,
|
|
|
|
| 103 |
});
|
| 104 |
|
| 105 |
const slug = slugify(label);
|
| 106 |
+
const baseName = `${i + 1}-${type}${slug ? `--${slug}` : ''}${FILENAME_SUFFIX}`;
|
| 107 |
const filename = `${baseName}.png`;
|
| 108 |
const filepath = join(OUTPUT_DIR, filename);
|
| 109 |
|
|
|
|
| 114 |
}
|
| 115 |
|
| 116 |
if (type !== 'table' && type !== 'katex') {
|
| 117 |
+
await element.evaluate((el, isTransparent) => {
|
| 118 |
const stash = (node) => {
|
| 119 |
if (!node || !(node instanceof HTMLElement)) return;
|
| 120 |
node.dataset.__prevStyle = node.getAttribute('style') ?? '';
|
|
|
|
| 137 |
// Aggressive cleanup only for banners
|
| 138 |
const all = el.querySelectorAll('*');
|
| 139 |
all.forEach((node) => stash(node));
|
| 140 |
+
}
|
| 141 |
+
|
| 142 |
+
// Also target d3-loss-curves (banner component)
|
| 143 |
+
const lossCurves = el.querySelector('.d3-loss-curves');
|
| 144 |
+
if (lossCurves) {
|
| 145 |
+
lossCurves.style.background = 'transparent';
|
| 146 |
+
lossCurves.style.border = 'none';
|
| 147 |
+
lossCurves.style.borderRadius = '0';
|
| 148 |
+
}
|
| 149 |
+
|
| 150 |
+
// In transparent mode, neutralize backgrounds but preserve UI elements
|
| 151 |
+
if (isTransparent) {
|
| 152 |
+
// Step 1: Save computed backgrounds of UI elements we want to preserve
|
| 153 |
+
const uiSelectors = '.legend, [class*="legend"], .tooltip, [class*="tooltip"], .d3-tooltip, select, button, input, [class*="swatch"], [class*="label"]';
|
| 154 |
+
const uiElements = el.querySelectorAll(uiSelectors);
|
| 155 |
+
const savedStyles = new Map();
|
| 156 |
+
uiElements.forEach((uiEl) => {
|
| 157 |
+
const computed = window.getComputedStyle(uiEl);
|
| 158 |
+
savedStyles.set(uiEl, {
|
| 159 |
+
background: computed.background,
|
| 160 |
+
backgroundColor: computed.backgroundColor
|
| 161 |
+
});
|
| 162 |
+
});
|
| 163 |
+
|
| 164 |
+
// Step 2: Apply transparency to EVERYTHING
|
| 165 |
+
el.style.setProperty('background', 'transparent', 'important');
|
| 166 |
+
el.style.setProperty('background-color', 'transparent', 'important');
|
| 167 |
+
el.style.setProperty('background-image', 'none', 'important');
|
| 168 |
+
|
| 169 |
+
const allElements = el.querySelectorAll('*');
|
| 170 |
+
allElements.forEach((node) => {
|
| 171 |
+
if (node instanceof HTMLElement) {
|
| 172 |
+
node.style.setProperty('background', 'transparent', 'important');
|
| 173 |
+
node.style.setProperty('background-color', 'transparent', 'important');
|
| 174 |
+
node.style.setProperty('background-image', 'none', 'important');
|
| 175 |
+
}
|
| 176 |
+
});
|
| 177 |
|
| 178 |
+
// Step 3: Restore UI elements backgrounds
|
| 179 |
+
savedStyles.forEach((styles, uiEl) => {
|
| 180 |
+
if (styles.backgroundColor && styles.backgroundColor !== 'rgba(0, 0, 0, 0)') {
|
| 181 |
+
uiEl.style.setProperty('background-color', styles.backgroundColor, 'important');
|
| 182 |
+
}
|
| 183 |
+
});
|
| 184 |
+
|
| 185 |
+
// Target SVG rect elements that look like backgrounds
|
| 186 |
const svgRects = el.querySelectorAll('svg rect');
|
| 187 |
+
svgRects.forEach((rect, idx) => {
|
| 188 |
+
const fill = (rect.getAttribute('fill') || '').toLowerCase();
|
| 189 |
+
if (idx === 0 || fill === 'white' || fill.startsWith('#fff') || fill.includes('255, 255, 255')) {
|
| 190 |
+
rect.setAttribute('fill', 'none');
|
| 191 |
+
}
|
| 192 |
});
|
| 193 |
}
|
| 194 |
+
}, TRANSPARENT);
|
| 195 |
}
|
| 196 |
|
| 197 |
if (type === 'table') {
|
| 198 |
+
const cloneId = await element.evaluate((el, idx, isTransparent) => {
|
| 199 |
const existing = document.getElementById(`__table-clone-wrapper-${idx}`);
|
| 200 |
if (existing) existing.remove();
|
| 201 |
|
|
|
|
| 228 |
clone.style.minWidth = '0';
|
| 229 |
clone.style.maxWidth = 'none';
|
| 230 |
clone.style.tableLayout = 'auto';
|
| 231 |
+
if (isTransparent) {
|
| 232 |
+
clone.style.background = 'transparent';
|
| 233 |
+
}
|
| 234 |
|
| 235 |
const cells = clone.querySelectorAll('th, td');
|
| 236 |
cells.forEach(cell => {
|
| 237 |
cell.style.width = 'auto';
|
| 238 |
cell.style.minWidth = '0';
|
| 239 |
cell.style.maxWidth = 'none';
|
| 240 |
+
if (isTransparent) {
|
| 241 |
+
cell.style.background = 'transparent';
|
| 242 |
+
}
|
| 243 |
});
|
| 244 |
|
| 245 |
tableScroll.appendChild(clone);
|
|
|
|
| 249 |
document.body.appendChild(wrapper);
|
| 250 |
|
| 251 |
return clone.id;
|
| 252 |
+
}, i, TRANSPARENT);
|
| 253 |
|
| 254 |
const wrapperSelector = `#__table-clone-wrapper-${i}`;
|
| 255 |
const cloneSelector = `#${cloneId}`;
|
|
|
|
| 271 |
|
| 272 |
await page.locator(cloneSelector).screenshot({
|
| 273 |
path: filepath,
|
| 274 |
+
type: 'png',
|
| 275 |
+
omitBackground: TRANSPARENT
|
| 276 |
});
|
| 277 |
|
| 278 |
await page.evaluate((selector) => {
|
|
|
|
| 280 |
if (el) el.remove();
|
| 281 |
}, wrapperSelector);
|
| 282 |
} else if (type === 'katex') {
|
| 283 |
+
const cloneId = await element.evaluate((el, idx, isTransparent) => {
|
| 284 |
const existing = document.getElementById(`__katex-clone-wrapper-${idx}`);
|
| 285 |
if (existing) existing.remove();
|
| 286 |
|
|
|
|
| 302 |
clone.style.width = 'max-content';
|
| 303 |
clone.style.maxWidth = 'none';
|
| 304 |
clone.style.margin = '0';
|
| 305 |
+
if (isTransparent) {
|
| 306 |
+
clone.style.background = 'transparent';
|
| 307 |
+
// Neutralize white backgrounds in katex elements
|
| 308 |
+
const allElements = clone.querySelectorAll('*');
|
| 309 |
+
allElements.forEach((node) => {
|
| 310 |
+
if (node instanceof HTMLElement) {
|
| 311 |
+
node.style.background = 'transparent';
|
| 312 |
+
}
|
| 313 |
+
});
|
| 314 |
+
}
|
| 315 |
|
| 316 |
wrapper.appendChild(clone);
|
| 317 |
document.body.appendChild(wrapper);
|
| 318 |
|
| 319 |
return clone.id;
|
| 320 |
+
}, i, TRANSPARENT);
|
| 321 |
|
| 322 |
const wrapperSelector = `#__katex-clone-wrapper-${i}`;
|
| 323 |
const cloneSelector = `#${cloneId}`;
|
|
|
|
| 339 |
|
| 340 |
await page.locator(cloneSelector).screenshot({
|
| 341 |
path: filepath,
|
| 342 |
+
type: 'png',
|
| 343 |
+
omitBackground: TRANSPARENT
|
| 344 |
});
|
| 345 |
|
| 346 |
await page.evaluate((selector) => {
|
|
|
|
| 350 |
} else {
|
| 351 |
await element.screenshot({
|
| 352 |
path: filepath,
|
| 353 |
+
type: 'png',
|
| 354 |
+
omitBackground: TRANSPARENT
|
| 355 |
});
|
| 356 |
}
|
| 357 |
|
|
|
|
| 387 |
});
|
| 388 |
|
| 389 |
await page.waitForTimeout(150);
|
| 390 |
+
await element.screenshot({ path: openFilepath, type: 'png', omitBackground: TRANSPARENT });
|
| 391 |
console.log(` ✅ ${openFilename}`);
|
| 392 |
|
| 393 |
await selectHandle.evaluate((el) => {
|
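Review note on the SVG background check in screenshot-elements.mjs: the transparent mode treats a `rect` as a background when its `fill` is `white`, starts with `#fff`, or contains `255, 255, 255`. That test would also match a fill such as `#fff000` and would miss `rgb(255,255,255)` written without spaces. A stricter check could look like the sketch below; `isBackgroundFill` is a hypothetical helper that is not part of the script, and the set of accepted notations is an assumption about what the charts emit.

```javascript
// Hypothetical helper (not in screenshot-elements.mjs): decide whether an
// SVG fill value denotes plain white, regardless of case or spacing.
function isBackgroundFill(fill) {
  // Normalize: lowercase and drop all whitespace ("rgb(255, 255, 255)" -> "rgb(255,255,255)").
  const value = String(fill || '').toLowerCase().replace(/\s+/g, '');
  return (
    value === 'white' ||
    value === '#fff' ||
    value === '#ffffff' ||
    value.startsWith('rgb(255,255,255') ||
    value.startsWith('rgba(255,255,255')
  );
}
```

The `idx === 0` shortcut from the diff (treating the first `rect` as a background) is deliberately left out of the helper, since it depends on document order and not on the fill value.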
app/src/content/article.mdx
CHANGED
|
The diff for this file is too large to render.
See raw diff
|
|
|
app/src/pages/dataviz.astro
CHANGED
|
@@ -247,7 +247,7 @@ const visualsWithMeta = visuals.map((item: any) => {
|
|
| 247 |
<p class="header-desc">{item.desc || item.caption}</p>
|
| 248 |
)}
|
| 249 |
{item.anchorId && (
|
| 250 |
-
<a href={`/#${item.anchorId}`} class="header-link"
|
| 251 |
View in article →
|
| 252 |
</a>
|
| 253 |
)}
|
|
|
|
| 247 |
<p class="header-desc">{item.desc || item.caption}</p>
|
| 248 |
)}
|
| 249 |
{item.anchorId && (
|
| 250 |
+
<a href={`/#${item.anchorId}`} class="header-link">
|
| 251 |
View in article →
|
| 252 |
</a>
|
| 253 |
)}
|
app/src/pages/index.astro
CHANGED
|
@@ -197,6 +197,36 @@ const licence =
|
|
| 197 |
} catch {}
|
| 198 |
})();
|
| 199 |
</script>
|
| 200 |
<script type="module" src="/scripts/color-palettes.js"></script>
|
| 201 |
|
| 202 |
<script src="https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js"></script>
|
|
|
|
| 197 |
} catch {}
|
| 198 |
})();
|
| 199 |
</script>
|
| 200 |
+
<!-- Hash Router for HF Spaces compatibility -->
|
| 201 |
+
<script is:inline>
|
| 202 |
+
(() => {
|
| 203 |
+
// Routes map: #/route -> actual page path
|
| 204 |
+
const routes = {
|
| 205 |
+
'/dataviz': '/dataviz',
|
| 206 |
+
'/trackio': '/trackio',
|
| 207 |
+
};
|
| 208 |
+
|
| 209 |
+
function handleHashRoute() {
|
| 210 |
+
const hash = window.location.hash;
|
| 211 |
+
// Only handle hashes that start with #/ (route pattern)
|
| 212 |
+
if (!hash.startsWith('#/')) return;
|
| 213 |
+
|
| 214 |
+
const route = hash.slice(1); // Remove the # prefix
|
| 215 |
+
const targetPath = routes[route];
|
| 216 |
+
|
| 217 |
+
if (targetPath) {
|
| 218 |
+
// Redirect to the actual page
|
| 219 |
+
window.location.href = targetPath;
|
| 220 |
+
}
|
| 221 |
+
}
|
| 222 |
+
|
| 223 |
+
// Check on page load
|
| 224 |
+
handleHashRoute();
|
| 225 |
+
|
| 226 |
+
// Also listen for hash changes (in case user navigates via hash)
|
| 227 |
+
window.addEventListener('hashchange', handleHashRoute);
|
| 228 |
+
})();
|
| 229 |
+
</script>
|
| 230 |
<script type="module" src="/scripts/color-palettes.js"></script>
|
| 231 |
|
| 232 |
<script src="https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js"></script>
|
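Review note on the hash router added to index.astro: the lookup step is easy to isolate from `window.location`, which makes it testable outside a browser. A minimal sketch under that assumption follows; `resolveHashRoute` is a hypothetical name, and the `routes` object mirrors the two entries in the diff.

```javascript
// Hypothetical pure version of the lookup in handleHashRoute():
// map a location hash such as "#/dataviz" to a page path, or null.
function resolveHashRoute(hash, routes) {
  // Only hashes of the form "#/route" are treated as routes;
  // ordinary in-page anchors such as "#section-1" are ignored.
  if (typeof hash !== 'string' || !hash.startsWith('#/')) return null;

  const route = hash.slice(1); // drop the leading "#"
  // Own-property check so inherited keys can never be returned.
  return Object.prototype.hasOwnProperty.call(routes, route) ? routes[route] : null;
}

const routes = {
  '/dataviz': '/dataviz',
  '/trackio': '/trackio',
};
```

In the page script, the result would be assigned to `window.location.href` only when it is not `null`, which matches the behavior of the inline version.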
app/yarn.lock
CHANGED
|
The diff for this file is too large to render.
See raw diff
|
|
|