Overview
This is your complete toolkit for managing Chia news content, built to transition from manual HTML editing to a modern JSON-based system while maintaining your current workflow.
- Total Posts: 2,746+
- Weeks Covered: 129+
- Data Quality: 95.4%
- Completion Rate: 100%
Project Structure
/mnt/artoo/e/DEV/twic-parser/
|
+-- Data Files
| +-- chia_news_cleaned.json # Your main dataset (validated & cleaned)
| +-- chia_news.json # Original parsed data
| +-- test_data.json # Safe testing environment
| +-- chia_news_validation_report.json
|
+-- Core Tools
| +-- tweet-converter.js # Convert single tweets to JSON
| +-- batch-tweets.js # Batch processing & interactive mode + HTML gen
| +-- json-manager.js # Manage & clean JSON data
| +-- validate.js # Data quality validation
| +-- cleanup.js # Fix data quality issues
|
+-- Development
| +-- parser.js # Original HTML parser
| +-- index.html # Current production site
|
+-- Node Dependencies
    +-- node_modules/ # (not needed for basic tools)
Tool Reference Guide
Tweet Converter (tweet-converter.js)
Purpose: Convert individual Twitter/X URLs into your JSON format
Basic Usage
# Convert single tweet
node tweet-converter.js https://x.com/username/status/1234567890
# With custom JSON file
node tweet-converter.js https://x.com/username/status/1234567890 my_data.json
What It Does
- Fetches tweet data via Twitter oEmbed API
- Fallback HTML scraping if oEmbed fails
- Resolves t.co shortened URLs to real destinations
- Smart categorization (Space/Video/News/Release/etc.)
- Extracts author info, mentions, links, topics
- Auto-detects content type and assigns proper metadata
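The oEmbed step in the list above could look roughly like this. It is a minimal sketch, assuming the public oEmbed endpoint at https://publish.twitter.com/oembed and Node 18+ for the built-in fetch. The names parseTweetUrl, buildOembedUrl, and fetchTweet are hypothetical and are not taken from tweet-converter.js.

```javascript
// Sketch only: these helper names are hypothetical, not the internals of tweet-converter.js.

// Extract the username and tweet ID from an x.com or twitter.com status URL.
// Returns null for anything that is not a tweet URL.
function parseTweetUrl(tweetUrl) {
  let url;
  try {
    url = new URL(tweetUrl);
  } catch {
    return null; // not a parseable URL at all
  }
  const host = url.hostname.replace(/^(www\.|mobile\.)/, '');
  if (host !== 'x.com' && host !== 'twitter.com') return null;
  const match = url.pathname.match(/^\/([A-Za-z0-9_]{1,15})\/status\/(\d+)/);
  return match ? { username: match[1], tweetId: match[2] } : null;
}

// Build the oEmbed request URL for a tweet.
function buildOembedUrl(tweetUrl) {
  const endpoint = new URL('https://publish.twitter.com/oembed');
  endpoint.searchParams.set('url', tweetUrl);
  endpoint.searchParams.set('omit_script', 'true'); // skip the widget <script> tag
  return endpoint.toString();
}

// Fetch the oEmbed payload (needs Node 18+ for the global fetch).
async function fetchTweet(tweetUrl) {
  const response = await fetch(buildOembedUrl(tweetUrl));
  if (!response.ok) {
    throw new Error(`oEmbed request failed with status ${response.status}`);
  }
  return response.json(); // fields include author_name, author_url and html
}
```

Keeping the URL parsing separate from the network call makes it easy to reject bad input before spending a request on it.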
Batch Tweet Processor (batch-tweets.js) (NEW FEATURES)
Purpose: Process multiple tweets efficiently with interactive workflow + HTML generation
Interactive Mode (RECOMMENDED)
node batch-tweets.js interactive
Workflow:
- Paste tweet URLs as you find them
- Each gets automatically converted and added
- HTML code is generated for the current website
- Copy/paste the HTML directly into your site
- Type "stats" to see current totals
- Type "quit" when done
Test Mode (NEW)
node batch-tweets.js test
Perfect for:
- Testing HTML generation without affecting production data
- Experimenting with different tweet types
- Verifying formatting before going live
- Uses a separate test_data.json file
NEW HTML Generation Features:
- Real-time HTML output for current website format
- Quote tweet detection - converts URLs to "Quote Tweet" links
- Smart link formatting (Watch/Listen/Read article/etc.)
- Category icon display with proper tooltips
- Author role handling (Hosted by vs regular attribution)
- HTML entity escaping (prevents double-encoding)
- Copy/paste ready format matching your current site
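To make the quote tweet detection and smart link formatting above more concrete, here is a minimal sketch. The host-to-label mapping is an assumption chosen for illustration, and isQuoteTweet and linkLabel are hypothetical names. The real rules in batch-tweets.js may differ.

```javascript
// Sketch only: the mapping from destination to label is assumed, not the tool's actual rule set.
const STATUS_URL = /^https?:\/\/(?:www\.)?(?:x|twitter)\.com\/[A-Za-z0-9_]+\/status\/\d+/;

// A link that points at another tweet is treated as a quote tweet.
function isQuoteTweet(url) {
  return STATUS_URL.test(url);
}

// Pick the link text from the destination.
function linkLabel(url) {
  if (isQuoteTweet(url)) return 'Quote Tweet';
  const host = new URL(url).hostname.replace(/^www\./, '');
  if (host === 'youtube.com' || host === 'youtu.be') return 'Watch';
  if (host === 'open.spotify.com' || url.includes('/i/spaces/')) return 'Listen';
  return 'Read article'; // default for everything else
}

console.log(linkLabel('https://youtube.com/watch?v=123'));
console.log(linkLabel('https://x.com/partner/status/123'));
```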
JSON Manager (json-manager.js)
Purpose: Manage, clean, and organize your JSON data
Quick Commands
# List recent posts
node json-manager.js list 20
# Search posts
node json-manager.js search "chia wallet"
# Remove specific post
node json-manager.js remove 2025-06-17-123
# Remove multiple posts
node json-manager.js bulk 2025-06-17-123,2025-06-16-456
# Show statistics
node json-manager.js stats
Safety Features:
- Automatic backups before saving
- Changes not saved until you confirm
- Detailed preview before removal
- Bulk operations with confirmation
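A backup-before-save step like the one listed above might be sketched as follows. The function name saveWithBackup and the ".backup-<timestamp>" suffix are assumptions for illustration. json-manager.js may name and place its backups differently.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Sketch only: saveWithBackup and the backup naming scheme are hypothetical.
// Copies the existing file aside, then writes the new data.
// Returns the backup path, or null if there was nothing to back up.
function saveWithBackup(jsonPath, data) {
  let backupPath = null;
  if (fs.existsSync(jsonPath)) {
    const stamp = new Date().toISOString().replace(/[:.]/g, '-');
    backupPath = `${jsonPath}.backup-${stamp}`;
    fs.copyFileSync(jsonPath, backupPath); // keep the previous version
  }
  // Write to a temporary file first, then rename it into place,
  // so an interrupted write cannot leave a half-written dataset.
  const tmpPath = path.join(path.dirname(jsonPath), `.${path.basename(jsonPath)}.tmp`);
  fs.writeFileSync(tmpPath, JSON.stringify(data, null, 2));
  fs.renameSync(tmpPath, jsonPath);
  return backupPath;
}
```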
Data Validator (validate.js)
Purpose: Check data quality and identify issues
# Validate your main dataset
node validate.js chia_news_cleaned.json
# Validate with detailed reporting
node validate.js chia_news.json
What It Checks
- JSON structure integrity
- Required fields presence
- Data consistency across weeks
- Author information completeness
- Link and mention formatting
- Category and type validation
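As a rough illustration of the checks above, a per-post validator could look like this. The field names (id, date, author, text, category) and the category list are assumptions, since this guide does not spell out the schema. Only the ID shape (for example 2025-06-17-123) comes from the commands shown earlier.

```javascript
// Sketch only: the schema below is assumed for illustration, not taken from validate.js.
const REQUIRED_FIELDS = ['id', 'date', 'author', 'text', 'category'];
const KNOWN_CATEGORIES = ['Space', 'Video', 'News', 'Release'];

// Returns hard errors (missing data) and softer warnings (suspicious data).
function validatePost(post) {
  const errors = [];
  const warnings = [];
  for (const field of REQUIRED_FIELDS) {
    if (post[field] === undefined || post[field] === null || post[field] === '') {
      errors.push(`missing required field: ${field}`);
    }
  }
  // IDs in this guide look like 2025-06-17-123 (a date plus a numeric suffix).
  if (post.id && !/^\d{4}-\d{2}-\d{2}-\d+$/.test(post.id)) {
    warnings.push(`unexpected id format: ${post.id}`);
  }
  if (post.category && !KNOWN_CATEGORIES.includes(post.category)) {
    warnings.push(`unknown category: ${post.category}`);
  }
  return { errors, warnings };
}
```

Separating errors from warnings mirrors the warning counts that the cleanup tool reports.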
Data Cleanup (cleanup.js)
Purpose: Automatically fix common data quality issues
# Analyze warning patterns
node cleanup.js analyze chia_news.json
# Clean the data automatically
node cleanup.js clean chia_news.json
Results: Input: 204 warnings → Output: 23 warnings
Health Score: Improved from 0.0 to 95.4/100
Daily Workflow Recommendations
For Regular News Posting (UPDATED)
Option 1: Interactive Mode with HTML (Fastest)
cd /mnt/artoo/e/DEV/twic-parser
node batch-tweets.js interactive
- Paste URLs throughout the day as you find them
- Copy HTML output directly to current website
- JSON data builds automatically for future site
Option 2: Test Mode for Experimentation
node batch-tweets.js test
- Safe environment to test HTML formatting
- Try different tweet types without affecting production
- Perfect for learning the system
Option 3: Batch at End of Day
# Collect URLs in a file during the day
# Process all at once:
node batch-tweets.js file todays_tweets.txt
- Collect URLs in a file during the day
- Process all at once at the end
- Good for organized workflow
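The URL file for the end-of-day batch could be read along these lines. This sketch assumes one URL per line and treats blank lines and lines starting with # as skippable. The guide does not document the exact file format, and parseUrlFile is a hypothetical helper.

```javascript
// Sketch only: assumes one tweet URL per line; parseUrlFile is a hypothetical name.
const TWEET_URL = /^https?:\/\/(?:www\.)?(?:x|twitter)\.com\/[^/]+\/status\/\d+/;

// Turn the text of a URL file into a clean, de-duplicated list of tweet URLs.
function parseUrlFile(text) {
  const seen = new Set();
  const urls = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // blank line or note
    if (!TWEET_URL.test(line)) continue; // not a tweet URL
    if (seen.has(line)) continue; // keep only the first occurrence
    seen.add(line);
    urls.push(line);
  }
  return urls;
}
```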
For Data Maintenance
Weekly Validation
# Check data health
node validate.js chia_news_cleaned.json
# Clean if needed
node cleanup.js clean chia_news_cleaned.json
Monthly Cleanup
# Interactive cleanup session
node json-manager.js interactive
# Use: list, search, remove as needed
HTML Generation Features (NEW)
What Gets Generated
- Category Icons: shown with proper tooltips
- Post Types: Special styling for X Spaces, Releases, etc.
- Author Attribution: "Hosted by" for spaces, regular attribution for others
- Quote Tweets: Plain URLs → "Quote Tweet" links
- Smart Links: Watch/Listen/Read article based on destination
- Proper Escaping: Prevents HTML entity issues (like a literal &mdash; showing up on the page)
HTML Output Examples
X Space
<li class='post'><span title="Community">[icon]</span> <span title="Space">[icon]</span> <span style='color:#8C52FE'>X Space</span> - Hosted by <a href='https://x.com/DracattusDev' target='_blank'>@DracattusDev</a> "Weekly Chia discussion". • <a href='https://example.com' target='_blank'>Listen</a> • <a href='https://x.com/source' target='_blank'>Source</a></li>
Announcement with Quote Tweet
<li class='post'><span title="Chia">🌱</span> <span title="News">📢</span> <a href='https://x.com/chia_project' target='_blank'>@chia_project</a> announces new partnership! <a href='https://x.com/partner/status/123' target='_blank'>Quote Tweet</a> for details. • <a href='https://x.com/chia_project/status/456' target='_blank'>Source</a></li>
Video Release
<li class='post'><span title="Community">[icon]</span> <span title="Video">▶️</span> <a href='https://x.com/creator' target='_blank'>@creator</a> releases new tutorial video! • <a href='https://youtube.com/watch?v=123' target='_blank'>Watch</a> • <a href='https://x.com/creator/status/789' target='_blank'>Source</a></li>
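Output in the shape of the examples above could be produced by something like the sketch below. It is simplified to a single icon span, and the post field names (categoryTitle, categoryIcon, authorHandle, text, links, sourceUrl) are assumptions rather than the real JSON schema. The escaping helper leaves already-encoded entities alone, which is one way to avoid double-encoding.

```javascript
// Sketch only: field and function names are hypothetical, not taken from batch-tweets.js.

// Escape text for HTML but leave existing entities (&amp;, &mdash;, &#8212;) untouched.
function escapeHtml(text) {
  return text
    .replace(/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\d+|#x[0-9a-fA-F]+);)/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Attribute values also need their quotes escaped.
function escapeAttr(value) {
  return escapeHtml(value).replace(/'/g, '&#39;').replace(/"/g, '&quot;');
}

function renderLink(href, label) {
  return `<a href='${escapeAttr(href)}' target='_blank'>${escapeHtml(label)}</a>`;
}

// Build one <li class='post'> entry: icon, author, text, then bullet-separated links.
function renderPost(post) {
  const icon = `<span title="${escapeAttr(post.categoryTitle)}">${post.categoryIcon}</span>`;
  const author = renderLink(`https://x.com/${post.authorHandle}`, `@${post.authorHandle}`);
  const links = post.links.map((link) => renderLink(link.url, link.label));
  links.push(renderLink(post.sourceUrl, 'Source'));
  return `<li class='post'>${icon} ${author} ${escapeHtml(post.text)} \u2022 ${links.join(' \u2022 ')}</li>`;
}
```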
Data Quality Metrics
Current Status
- Total Posts: 2,746+
- Health Score: 95.4/100
- Success Rate: 100%
Quality Indicators
- 🟢 Green (90-100): Production ready
- 🟡 Yellow (70-89): Minor issues, still usable
- 🔴 Red (0-69): Requires cleanup before use
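The bands above translate directly into a small lookup. In this sketch, healthBand is a hypothetical helper, and treating fractional scores such as 89.5 as Yellow (anything below 90) is an assumption, since the bands are only stated for whole numbers.

```javascript
// Sketch only: healthBand is a hypothetical helper built from the bands listed above.
function healthBand(score) {
  if (typeof score !== 'number' || Number.isNaN(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be a number between 0 and 100, got ${score}`);
  }
  if (score >= 90) return 'green';  // production ready
  if (score >= 70) return 'yellow'; // minor issues, still usable
  return 'red';                     // requires cleanup before use
}

console.log(healthBand(95.4));
```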
Troubleshooting Guide
Common Issues & Solutions
"Failed to convert tweet: All fetch methods failed"
- Cause: Twitter API rate limiting or network issues
- Solution: The tool automatically retries 2-3 times
- Manual fix: Wait 10 seconds, then try again
- Batch Mode: Failed URLs are saved to a reprocessable log file
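The retry behaviour described above can be pictured with a small wrapper like this one. It is a sketch under assumptions: withRetry, its default of 3 attempts, and the fixed 10-second pause are illustrative choices, not the actual logic in the tools.

```javascript
// Sketch only: withRetry and its defaults are hypothetical.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async task, retrying on failure with a fixed pause between attempts.
// Rethrows the last error once every attempt has failed.
async function withRetry(task, { attempts = 3, delayMs = 10000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) await sleep(delayMs); // no pause after the final attempt
    }
  }
  throw lastError;
}
```

A caller would wrap the network step, for example `withRetry(() => fetchSomething(url))`, where fetchSomething stands in for whatever function performs the request.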
Quote tweets showing duplicate links
- Status: Fixed in the latest version
- Solution: Update to the latest batch-tweets.js
- Result: Only the "Quote Tweet" link is shown, not a duplicate
HTML entities showing as literal text (for example &mdash;)
- Status: Fixed in the latest version
- Solution: Proper entity handling is now implemented
- Result: Clean &mdash; in the HTML output
Network Issues & Error Recovery
# Check error log files (auto-generated)
ls -la failed_tweets_*.txt
# Reprocess failed URLs
node batch-tweets.js file failed_tweets_2025-06-20T15-30-45-123Z.txt
# Test connectivity
curl -I https://publish.twitter.com
curl -I https://x.com
Advanced Usage
Working with Test Mode
# Start clean testing session
node batch-tweets.js test
# Test with custom file
node batch-tweets.js test my_experiments.json
# Commands available in test mode:
# clear - Reset test data
# stats - View test statistics
# quit - Exit test mode
Batch Error Recovery
# After batch processing, check for error logs
ls -la failed_tweets_*.txt
# Reprocess failed URLs (file format is ready-to-use)
node batch-tweets.js file failed_tweets_2025-06-20T15-30-45-123Z.txt
# Or manually retry individual URLs
node batch-tweets.js interactive
# Then paste the failed URLs one by one
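The log file names shown above match an ISO timestamp with the colons and the dot replaced by hyphens. A sketch of how such a name and file body could be produced follows. failedLogName and formatFailedLog are hypothetical helpers, and the one-URL-per-line body is an assumption based on the file being directly reprocessable.

```javascript
// Sketch only: helper names are hypothetical; the name pattern is inferred from the examples above.

// Build a log file name such as failed_tweets_2025-06-20T15-30-45-123Z.txt
function failedLogName(date = new Date()) {
  const stamp = date.toISOString().replace(/[:.]/g, '-');
  return `failed_tweets_${stamp}.txt`;
}

// One URL per line, so the file can be fed straight back into batch processing.
function formatFailedLog(urls) {
  return urls.join('\n') + '\n';
}

console.log(failedLogName(new Date('2025-06-20T15:30:45.123Z')));
```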
Integration with Current Site
NEW WORKFLOW:
- Use interactive mode to get both JSON + HTML
- Copy HTML to current website immediately
- JSON builds automatically for future site
- Best of both worlds!
Data Flow (UPDATED)
Twitter URL → tweet-converter.js → JSON Entry → batch-tweets.js → HTML + JSON
↓
Current Website ← Copy/Paste HTML
↓
chia_news_cleaned.json → Future Site Data
Performance Metrics
| Operation | Time | Notes |
|---|---|---|
| Single tweet | ~2-3 seconds | Including URL resolution |
| With retries | ~4-8 seconds max | For problematic URLs |
| Batch processing | ~2-4 seconds per tweet | With rate limiting |
| HTML generation | Near-instant | No network calls |
| JSON operations | Near-instant | In-memory processing |
| Validation | ~1-2 seconds | For 2,746+ posts |
Development Roadmap
COMPLETED
- HTML to JSON parser
- Tweet converter with URL resolution
- Batch processing tools
- Data validation & cleanup
- JSON management utilities
- HTML generation for current website
- Test mode for safe experimentation
- Retry logic and error handling
- Quote tweet link conversion
IN PROGRESS
- Documentation updates (this file!)
PLANNED
- Twitter List webapp (keyboard-driven workflow)
- Manual entry tool (web form)
- Production server deployment
- Weekly/monthly automated reports
Success Metrics
You'll know the tools are working well when:
- Daily posting takes < 30 seconds per tweet
- HTML generated instantly for current website
- Data quality stays > 90% health score
- No manual JSON editing required
- No manual HTML coding required
- Backups created automatically
- Search finds content quickly
- Failed URLs are automatically retried and logged
- Test mode lets you experiment safely
What's New in This Version
HTML Generation System
- Real-time HTML output matching your current website format
- Smart quote tweet detection and link conversion
- Proper HTML entity handling (no more encoding issues)
- Category icons with tooltips
- Author role detection (Hosted by vs regular)
Test Mode
- Safe experimentation environment (test_data.json)
- All interactive features work without affecting production
- Clear command to reset test data
- Visual indicators for test vs production mode
Retry Logic & Error Handling
- Automatic retries for network failures
- Error logging with reprocessable URL files
- Detailed progress reporting during retries
- Success/failure statistics with actionable next steps
Quality of Life Improvements
- Better error messages with specific solutions
- Automatic backup creation
- Improved performance with rate limiting
- Cross-platform compatibility