Over the last couple of months we have had a rather interesting finding when it comes to working with LLMs: sometimes they don't like JSON.
In one specific instance we watched GPT-4o blatantly ignore parts of a JSON array:
[
"entry 1",
"entry 2",
// ...about 100 more entries here...
]
As you can see, the list was nothing extreme — roughly a hundred entries.
Even with the instructions added, token usage was modest, barely reaching 10k. So a "context window stretched too thin" effect could not be the cause.
The surprising workaround
I couldn't find anything useful written by others on this, and my time to dig in was short. So I simply tried different things:
- A 'human' list (an unordered bullet list using Markdown formatting)
- A newline-separated list
- A semicolon-separated list
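For illustration, here is a minimal sketch of what those three serializations look like in Python (the entry strings are placeholders, not the actual data):

```python
entries = ["entry 1", "entry 2", "entry 3"]

# 1. A 'human' Markdown bullet list
markdown_list = "\n".join(f"- {e}" for e in entries)

# 2. A newline-separated list
newline_list = "\n".join(entries)

# 3. A semicolon-separated list
semicolon_list = "; ".join(entries)

print(semicolon_list)  # entry 1; entry 2; entry 3
```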
Imagine the look on my face when the first two did not work, but the third idea did:
entry 1; entry 2; entry 3; # …
Not only did it work, it also used the fewest tokens.
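The token savings are plausible even before tokenizing: the semicolon form drops the quotes, commas, and brackets that JSON requires around every entry. A quick sketch of the character-count difference (synthetic entries, not the original data):

```python
import json

entries = [f"entry {i}" for i in range(100)]

as_json = json.dumps(entries)          # ["entry 0", "entry 1", ...]
as_semicolons = "; ".join(entries)     # entry 0; entry 1; ...

# The semicolon form is strictly shorter in characters,
# and in practice it tends to tokenize shorter as well.
print(len(as_json), len(as_semicolons))
```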
So, the next time you are about to throw raw JSON into a prompt, think again and try another serialization. It might lead to a higher success rate.