Giorgio Pallocca

205 posts

@GPallocca

SaaS Builder, serial entrepreneur, 2 exits. Currently building Archimedes, cursor for ERP building.

Roma, Lazio · Joined March 2013

715 Following · 273 Followers

Pinned Tweet

Giorgio Pallocca@GPallocca·4d

At Archimedes we push AI code generation to the daily limit — dozens of parallel Claude sessions across the team, real production modules. At that scale you hit a wall nobody talks about. Here's what we found and what we built 👇

English

235

Giorgio Pallocca@GPallocca·1d

What's the best path here?

Giorgio Pallocca@GPallocca

Just closed a six-figure deal for our AI-native ERP. €150k. First enterprise contract.

Here's the part nobody prepares you for: two thirds of the contract is services. Implementation, training, stitching with legacy systems. €100k of work that has nothing to do with building the product.

The founder dilemma:

Do it in-house: cash in, reduce burn, delay the next raise. But your engineering team is now doing consulting instead of shipping product.

Subcontract it out: stay focused on core product, ship faster, keep the roadmap clean. But you give away most of the margin and lose control of the customer experience.

Genuinely torn on this one. What would you do?

English

Giorgio Pallocca@GPallocca·1d

The methodological choice in this paper, released yesterday by the University of Munich, should rewire how you evaluate LLMs in production: the surface fidelity gap. It measures the distance between how on-task a model sounds and whether it actually respected the brief. Llama-70B has the largest gap: +0.80. The output reads aligned. The constraints are gone.

The knows-but-violates rate measures non-compliance despite the model correctly recalling the rule. It ranges from 8% to 99% across seven models. Structured checkpointing reduces it. It does not close the gap.

For anyone designing business processes on top of LLMs, the lesson is methodological. Reading the output is not evaluation. Asking the model 'what were the rules?' is not evaluation. Evaluation is checking the output against a structured brief with binary-decidable constraints. That was the only mechanism the paper found that reliably catches violations.

If your QA layer is a human skim or a model self-check, you are measuring how aligned the output sounds. Not whether it complied.
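A minimal sketch of what "checking the output against a structured brief with binary-decidable constraints" could look like in practice. This is not the paper's method or code. The `Constraint` type, the three example rules, and the sample draft are all hypothetical, chosen only to show that each rule resolves to a plain pass or fail without anyone reading the output for tone.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Constraint:
    """One binary-decidable rule from the brief (hypothetical structure)."""
    name: str
    check: Callable[[str], bool]  # returns True when the output complies


def evaluate(output: str, brief: list[Constraint]) -> dict[str, bool]:
    """Run every constraint against the output; no judgment calls involved."""
    return {c.name: c.check(output) for c in brief}


def violations(output: str, brief: list[Constraint]) -> list[str]:
    """Names of the constraints the output failed, in brief order."""
    return [name for name, ok in evaluate(output, brief).items() if not ok]


# Hypothetical brief: each rule is decidable by code, not by a human skim.
brief = [
    Constraint("max_50_words", lambda s: len(s.split()) <= 50),
    Constraint("no_pricing", lambda s: re.search(r"[€$]\s?\d", s) is None),
    Constraint("mentions_sla", lambda s: "SLA" in s),
]

# Reads on-task, but breaks one rule of the brief.
draft = "Our plan includes a 99.9% SLA and starts at €49 per month."
print(violations(draft, brief))  # ['no_pricing']
```

The draft sounds aligned and still fails `no_pricing`, which is the kind of miss a human skim or a model self-check would likely let through.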

English

Giorgio Pallocca@GPallocca·1d

@PawelHuryn exactly

English

Paweł Huryn@PawelHuryn·1d

@GPallocca 100% if you mean building a meta layer. Way more important than agents doing the job.

English

Paweł Huryn@PawelHuryn·1d

Two camps of people. One rebuilt how they work: delegation, evals, trust. Agents run the loop. The system compounds. 10-20x output. Others stay in the chat. The old workflow. Same bottlenecks. Diminishing returns. For anyone willing to change how they work, there's never been more opportunity.

Sam Altman@sama

we want to build tools to augment and elevate people, not entities to replace them.

English

2.3K

Giorgio Pallocca@GPallocca·1d

@levelsio The real question is who's watching what they shipped while you slept

English

688

@levelsio@levelsio·1d

Another reason why I run Claude Code on the VPS of the site I'm working on, directly in production: I can close my laptop and it keeps going!

roon@tszzl

people are walking around with their laptops slightly ajar to keep their agents running

English

1.1K

133.6K

Giorgio Pallocca@GPallocca·1d

@dan__rosenthal Layer 5 is the one that matters most and the one everyone will skip. We built an entire quality pipeline for the same reason — agents produce at 10x speed but nobody catches the complexity debt they leave behind until it breaks in production

English

532

Dan Rosenthal@dan__rosenthal·1d

‘Service-as-a-software’ is here...

We moved our entire company brain to GitHub and wired 25+ tools through MCPs. Any one of our 20+ team members can now spin up a contextualized AI assistant in seconds.

The system has 5 layers:

1. Markdown company OS
↳ SOPs and campaign playbooks converted into .md files using research agents
↳ Most SOPs turned into agents that handle 70% of the task
↳ Output: 50+ actionable Claude skills

2. Context environment
↳ One Company OS GitHub repo propagated to every session via org-wide plugin
↳ Each client gets their own repo with Slack DMs, call transcripts, GDrive changes, and campaign data auto-synced through n8n
↳ Zero configuration needed per session

3. MCPs
↳ 25+ tools connected including InstantlyAI, HeyReach, Apollo, HubSpot, Slack, Notion, n8n, Supabase, Pinecone, Browserbase, Apify
↳ Not just research. Action through AI.
↳ We went from researching work to actually doing it

4. Self-improvement engines
↳ Pinecone database stores 1000s of LinkedIn posts and outbound campaigns with performance metrics
↳ Copywriting skills query this data to find winning formats to reuse
↳ Human corrections get fed back in so the system gets sharper over time

5. Operating principles
↳ Every repo has a safeguard file that prevents certain operations
↳ 100% AI outputs are not acceptable, everyone owns their work and every mistake
↳ Agent swarms split one task into 5-20 sub-agents when needed

Our goal is to become the most advanced AI-native services company for our niche (GTM).

English

80.2K

Giorgio Pallocca@GPallocca·1d

@Yoroomie still so much to learn

English

Yoroomie@Yoroomie·1d

@GPallocca Sounds like you know this well!

English

Yoroomie@Yoroomie·1d

There’s never a boring day when you’re building a marketplace.

English

965

Giorgio Pallocca@GPallocca·1d

@antirez 11k system prompt is prompt obesity. Same disease as the 2000-line God class, just a different layer. We'll end up building linters for prompts the way we built them for code

English

202

antirez@antirez·1d

Look at this. Also opencode uses freaking 11k tokens of system prompt. Even at decent pre-fill of ~130 t/s it means waiting 84 seconds to start a session. What's the point? :D The pi agent is a lot saner here. Moreover, one could say, let's cache on disk very long common KV cache chunks, no? Hash it with all the parameters and put a sensible TTL if not used. But also: only cache it if you see it repeated N times across different sessions.
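The caching idea in this post can be sketched as a small admission policy. It is an illustration, not how opencode, pi, or any inference server works. `PrefixCachePolicy`, `prefill_seconds`, and every parameter name are hypothetical. The sketch only tracks which prefixes deserve caching; storing the KV tensors themselves on disk is out of scope. It follows the three rules mentioned: hash the prefix together with all parameters, admit it only after it repeats across N different sessions, and drop it after a TTL without use.

```python
import hashlib
import json
import time


def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Time spent pre-filling the prompt before the first generated token."""
    return prompt_tokens / tokens_per_second


class PrefixCachePolicy:
    """Decides which long prompt prefixes are worth caching (bookkeeping only)."""

    def __init__(self, min_repeats=3, ttl_seconds=3600.0, clock=time.monotonic):
        self.min_repeats = min_repeats   # the "N" distinct sessions required
        self.ttl_seconds = ttl_seconds   # evict if unused for this long
        self.clock = clock               # injectable so tests can fake time
        self._sessions = {}              # key -> session ids that sent it
        self._last_used = {}             # key -> last use time (admitted only)

    @staticmethod
    def key(prefix_tokens, params):
        """Hash the prefix with all parameters that affect the KV values."""
        payload = json.dumps(
            {"tokens": list(prefix_tokens), "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def evict_expired(self):
        """Drop admitted entries not used within the TTL; return their keys."""
        now = self.clock()
        expired = [k for k, t in self._last_used.items()
                   if now - t > self.ttl_seconds]
        for k in expired:
            del self._last_used[k]
        return expired

    def observe(self, prefix_tokens, params, session_id):
        """Record one sighting; return True if the prefix is (now) cached."""
        self.evict_expired()
        k = self.key(prefix_tokens, params)
        if k in self._last_used:
            self._last_used[k] = self.clock()  # refresh TTL on a hit
            return True
        seen = self._sessions.setdefault(k, set())
        seen.add(session_id)                   # repeats in one session count once
        if len(seen) >= self.min_repeats:
            self._last_used[k] = self.clock()  # admit to the cache
            del self._sessions[k]
            return True
        return False


# The arithmetic from the post: 11k tokens at ~130 t/s is roughly 84.6 s.
print(round(prefill_seconds(11_000, 130), 1))
```

Counting distinct sessions rather than raw sightings keeps a single long-running session from filling the disk with prefixes nobody else will reuse.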

English

340

42.4K

Giorgio Pallocca@GPallocca·1d

@businessbarista The moat isn't the tech anymore — it's the boring stuff nobody wants to build. Tax compliance, e-invoicing mandates, labor law across jurisdictions. AI made building easy. Regulation made it defensible

English

136

Alex Lieberman@businessbarista·1d

I don't envy preseed founders right now. It's such a fun time to build but it's also hard as hell. Moats are fewer and shorter than they've ever been. Which also means you need to be more agile & pivot faster than ever. Hard to do when you're underresourced & underfunded. Huge kudos to those battling in the arena given the climate.

English

218

16.2K

Giorgio Pallocca@GPallocca·1d

@NateMatherson Managing agents is still managing. We had to build an entire quality pipeline just to catch what our AI sessions were shipping too fast. Agents don't need 1:1s — they need instrumentation

English

Nate Matherson@NateMatherson·1d

I don’t doubt that AI will change work. I do doubt that the guy tweeting “agents > employees” has ever managed either.

English

143

Giorgio Pallocca@GPallocca·1d

@james406 Still not enough context to fit a single enterprise client's "small list of changes"

English

james hawkins@james406·1d

it's wild to think about how massive 1M token context windows in LLMs really are

that's roughly equivalent to:
- the complete works of Shakespeare
- 11 hours of audio
- VCs telling you how they got into investing

English

4.7K

Giorgio Pallocca@GPallocca·1d

@dunkhippo33 The hardest version of this is when the shiny object is revenue. A big services contract lands and suddenly you're choosing between cash now and product later. Staying the course means saying no to money that's already on the table

English

Elizabeth Yin 💛@dunkhippo33·1d

It can be really easy to chase shiny new objects, especially when things are not going "rocket ship style." The decision to stay the course or to change is something that is often very hard.

English

1.1K

Giorgio Pallocca@GPallocca·1d

The services vs software tension is real from the other side too. We just closed a six-figure ERP deal — two thirds of it is implementation services. The temptation is to do it all in-house for the cash. But every hour your engineers spend on services is an hour they're not shipping product. The hard part isn't choosing — it's admitting which business you're actually building

English

155

Alex Vacca@itsalexvacca·1d

x.com/i/article/2050…

ZXX

347

38.1K

Giorgio Pallocca@GPallocca·1d

@gillianxobrien The best companies don't fit a category — they create one. VCs who need you to fit a box are telling you more about their limits than yours

English

gillian@gillianxobrien·1d

too b2b for NY too consumer for SF

English

106

7.6K

Giorgio Pallocca@GPallocca·1d

@ConnerBean ok let's give Codex another chance ...

English

Conner Bean@ConnerBean·1d

I still can't believe how good GPT5.5 is. It blows Opus clear outta the water. Codex desktop app is clean too 😮‍💨

English

125

2.9K

Giorgio Pallocca@GPallocca·1d

@SergioRocks We're living this. Dozens of Claude sessions ship code faster than ever — but the real engineering moved to instrumentation. Who's watching output? Who catches the complexity drift across sessions that don't know about each other? The building got easier. The seeing got harder

English

Sergio Pereira@SergioRocks·1d

It’s easy to think that AI is reducing Engineering work. In reality, it’s multiplying it.

Because the barrier to building software products just dropped:
- More founders are building products
- More teams are launching tools
- More ideas are getting shipped

This creates a whole new wave. Every one of those products will need:
- Fixing
- Scaling
- Improving
- Maintaining

That’s where the real work is.

The key difference is:
- Less effort in getting started.
- More demand in making things actually work in prod.

Software Engineers who can step into that second phase will have more opportunities than before.

English

1.9K

Giorgio Pallocca@GPallocca·1d

@girdley The hardest part is admitting when the wind changed. Most founders feel it six months before they accept it. The ones who move fast sell at the top. The ones who "give it one more quarter" sell at the bottom

English

Michael Girdley@girdley·1d

Nothing feels better than owning a business with major tailwinds that’s run well. Nothing feels worse than owning a business with major headwinds that’s run well. The former kills it. The latter struggles. When your tailwinds turn to headwinds, it’s time to get out.

English

6.4K

Giorgio Pallocca@GPallocca·1d

@mogulinfluence Point 8 is the one most people will skip: "recurring only works if people use it daily." This is why we built Archimedes as a daily operating system, not a monthly reporting tool. If your users don't open it every morning, they'll cancel. Simple as that

English

295

Founder Thoughts & Strategies@mogulinfluence·1d

A $300M founder just shared the 10 business models that will actually print money in 2026:

Brian Moran@realbrianmoran

x.com/i/article/2020…

English

228

45.7K

Giorgio Pallocca@GPallocca·1d

@uttkarsh_42 The "let's pilot first" tax is real. Every enterprise deal starts with a month of free work disguised as "evaluation". The ones who survive it are the ones who built the product to sell itself

English

Utkarsh Agrawal@uttkarsh_42·1d

Thread 1/4 Building real B2B SaaS with just 2 founders is brutal. No one trusts a new vendor. Every deal starts with "let’s pilot for at least a month", which means your product has to be rock solid from day 1

English
