@brendt_gd Talking about HtmlDocument? DOMDocument is full of security holes, corruption, and cannot parse HTML (only XML).
WordPress’ HTML API was built as a performant, streaming, compliant HTML parser that is easy to embed outside of WordPress.
Happy to chat if you are interested.
So, I may have to write a new HTML parser because PHP's built-in one is too strict. (It follows the spec just fine, it's just that tempest/view adds on top of HTML, and the parser can't deal with it).
How hard could it be?
🌊 "We're so used to WordPress development being a drop in a massive ocean, but with Playground, *we are* the ocean. We get to choose everything that goes out to our customers," said @dmsnell23, sharing unique opportunities with Playground at #WCUS. 💻✨
#WordPress #PlayGround
@laxmariappan @phpcampers Thanks for clarifying. Exciting stuff coming soon in the HTML Processor, doing what PHP 8.4's Dom\HTMLDocument won't even be able to do: fully isolate inner HTML
@dmsnell23 @phpcampers The meme was intended to pique curiosity.
I'm glad it got noticed by you!
Your examples are great, adding them to the list of resources.
Even spec-compliant #html parsers can "break." HTML cannot represent all possible (even invalid) DOM trees. #wordpress' #htmlapi speaks HTML and ensures full encapsulation and isolation when manipulating a document so this doesn't happen.
You can't parse #HTML with #XML. Because HTML can't be parsed by XML. XML is not a tool that can be used to correctly parse HTML. The use of XML will not allow you to consume HTML. XML is a tool that is insufficiently sophisticated.
Have you tried using an HTML parser instead?
@HowellsMead @nickmdiego It's a repurposed XML parser (cannot parse HTML properly) and is unaware of HTML's rules. It's vulnerable to many attacks based on these omissions. It detects tags inside TEXTAREA, it's missing hundreds of character references, it's unaware of TEMPLATEs, it removes content, and much more.
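To make the XML-vs-HTML point concrete, here is a minimal sketch in Python (not PHP, and not the WordPress HTML API or DOMDocument). It assumes only Python's standard library; `SNIPPET`, `xml_accepts`, `Collector`, and `html_parse` are made-up names for illustration. Note that `html.parser` is a tolerant tokenizer rather than a fully spec-compliant HTML parser, so this only shows the two failure modes mentioned above: void elements and HTML-only character references.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Ordinary HTML: a void <br> element and an HTML named character reference.
SNIPPET = "<p>Caf&eacute;<br>open</p>"


def xml_accepts(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # XML knows neither HTML's void elements nor its named references.
        return False


class Collector(HTMLParser):
    """Hypothetical helper: records start tags and decoded text."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def html_parse(markup):
    """Tokenize markup with an HTML-aware parser; return (tags, text)."""
    collector = Collector()
    collector.feed(markup)
    collector.close()
    return collector.tags, "".join(collector.text)


if __name__ == "__main__":
    print("XML parser accepts it:", xml_accepts(SNIPPET))
    print("HTML parser sees:", html_parse(SNIPPET))
```

The XML parser rejects the snippet outright, while the HTML-aware parser reads the `<br>` as a standalone tag and decodes `&eacute;`. Closing the tag XML-style (`<br/>`) and dropping the named reference is what it takes to satisfy the XML parser, which is exactly the kind of rewriting real-world HTML doesn't do for you.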
Building block extensions in WordPress can be a lot of fun. Here's my latest experiment, linked Group blocks. ✨
How I built it, the code, and a Playground demo are available here: nickdiego.com/enabling-linke…
@rossmorsali @stevejonesdev @nickmdiego @BlockVisibility oh I could totally see that adding latency. we're exploring adding a final HTML-processing pass in WordPress.
also: the HTML API doesn't _yet_ extend documents but it's designed to stream, meaning output buffering with a callback and no additional latency.
@dmsnell23 @stevejonesdev @nickmdiego @BlockVisibility Up against a deadline but I'll open a ticket with test case after - I've set a reminder.
My exp was that it added around 200ms of overhead, but I was parsing fully rendered pages after capturing the content with output buffering, which I don't think is the standard use case :/
Sitting on an airplane deleting incomprehensible regex in @BlockVisibility and using the WordPress HTML API instead. Continually amazed by this API.
42 → 7 lines of code 🤯
Did you know that the HTML API is built from scratch, specifically for #WordPress, and there's no other framework or CMS that has anything similar? They all still use regex 🥲 There's also an interesting effect on performance 👯
I'll show you its magic in two weeks at @PHPSrbija #PHPSRB 🥰