Of Ants and Turmites - Output interpretation

21 March 2021

Presentation of a turmite’s output is a matter of interpretation. My turmite lives on a grid of numbers. A grid of numbers certainly holds interest for fans of mathematics, data science, computer science, and general number geekery, but other people need alternative interpretations to appreciate the elegance of my tiny, digital friends.

In this post, I’ll touch upon:

  1. In-memory representations
  2. Approaches to saving raw data
  3. Numerical conversions to support image production
  4. Color selection approaches I’ve used

Why bother with different representations, or with changing representation at all? Ultimately I’m in search of pretty pictures, but on the way there the data exists as data in use, then data at rest, and finally as an interpretation through some sort of one-way function. For each turmite (aka instruction set), a single raw output can yield many possible pictures. Oh, and since I use the max step count as a scaling factor for most final conversions, shorter vs. longer runs can expose different characteristics or have different aesthetics.

In-Memory Representations

Let’s start off by acknowledging that none of this has to be fancy. My alterations to the turmites have been expansions past the basic setup, and my descriptions of further expansions are pure “possibility thinking”; I haven’t implemented all the wild ideas I’ve proposed. Likewise, your in-memory representation can just be a multi-dimensional array of ints. Perfectly viable, absolutely functional. You can use that and focus your attention (as I have) on another aspect you find more interesting. In fact, below is a sample from one of my favorite seeds; the data lived in a multi-dimensional array in memory, and I wrote it out as a CSV:

|             | \(C_{251}\) | \(C_{252}\) | \(C_{253}\) | \(C_{254}\) | \(C_{255}\) | \(C_{256}\) | \(C_{257}\) | \(C_{258}\) |
|-------------|------|------|------|------|------|------|------|------|
| \(R_{334}\) | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 |
| \(R_{335}\) | 1 | 0 | 1 | 0 | 1 | 0 | 3 | 0 |
| \(R_{336}\) | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 |
| \(R_{337}\) | 2 | 0 | 2 | 0 | 2 | 0 | 3 | 0 |
| \(R_{338}\) | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 |
| \(R_{339}\) | 5 | 0 | 2 | 0 | 5 | 0 | 5 | 0 |
| \(R_{340}\) | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 2 |
| \(R_{341}\) | 3 | 0 | 2 | 0 | 5 | 0 | 2 | 0 |

MD Array of Integers

So what do I have? I’ve got a multi-dimensional array of ints, where each element holds the value as the turmite last left it. The first expansion past that which I’m considering is a 3rd dimension to hold things like the true number of times the turmite has stepped on a cell, and perhaps a pure “step counter”: an ordered list of the timesteps at which the turmite stepped onto the cell. That list would make the 3rd dimension jagged, which produces kind of a neat “landscape topography” and could form another component of the element state. Maybe the turmite likes/doesn’t like walking uphill?
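
Here’s a minimal sketch of what that jagged third dimension might look like in Python. The field size and the names (`values`, `visit_log`, `record_step`) are illustrative assumptions, not taken from my actual implementation:

```python
# 2D grid of cell values, as in the basic setup; size is an arbitrary choice.
WIDTH, HEIGHT = 500, 500
values = [[0] * WIDTH for _ in range(HEIGHT)]

# Parallel jagged structure: each cell holds the ordered timesteps at which
# the turmite stepped onto it. Most cells stay empty; busy cells grow long.
visit_log = [[[] for _ in range(WIDTH)] for _ in range(HEIGHT)]

def record_step(row, col, timestep, new_value):
    """Update the cell value and append the timestep to its visit history."""
    values[row][col] = new_value
    visit_log[row][col].append(timestep)

record_step(250, 250, 0, 1)

# The "landscape topography" is then simply the visit count:
# height at (row, col) == len(visit_log[row][col])
```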

MD Array of Objects

Want to expand the data you hold without getting so clever that you can’t figure out what the indexes mean anymore? Go with objects. Again, it doesn’t have to be fancy, but a solution like this (perhaps substitute a dictionary for the object?) provides some clarity by naming the attributes.
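
As a sketch, a small Python dataclass can carry named attributes; `Cell` and its fields are stand-ins I’m inventing for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    value: int = 0          # the value the turmite last wrote here
    visits: int = 0         # how many times the turmite has stepped here
    step_log: list = field(default_factory=list)  # timesteps of each visit

WIDTH, HEIGHT = 500, 500
grid = [[Cell() for _ in range(WIDTH)] for _ in range(HEIGHT)]

# Named attributes instead of opaque array indexes:
grid[340][257].value = 2
grid[340][257].visits += 1
grid[340][257].step_log.append(12345)
```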

Sparse Matrix

Sparse matrices are an interesting data structure/algorithm combination. They’re worth Googling if you’re so inclined, but the basic idea is that you only store data for the filled parts of a matrix; “empty” elements are skipped. The turmite’s default field is a 2D grid of squares that starts out completely blank (aka all zeros, if you do it like me), and at the end of any given run most of the squares will still be blank. RAM will likely not be a concern as you run this, but if you really want to reduce the memory footprint, a sparse matrix will do it. SciPy has several implementations to meet your needs if you’re in Python.
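
For instance, SciPy’s dictionary-of-keys format suits the incremental writes a turmite makes; the field size here is an arbitrary assumption:

```python
import numpy as np
from scipy.sparse import dok_matrix

# DOK (dictionary-of-keys) storage fits the turmite's one-cell-at-a-time
# writes; unvisited (zero) cells cost nothing.
field = dok_matrix((1000, 1000), dtype=np.int32)

field[500, 500] = 1              # cells the turmite has touched
field[501, 500] = 2

print(field.nnz)                 # number of stored (non-zero) cells: 2
dense = field.toarray()          # densify when it's time to make an image
```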

Dictionary

This follows the sparse matrix idea: an empty field is simply the truth of a turmite’s playing field. A dictionary allows constant-time lookups and modification of the used cells while reducing memory usage. There’s another benefit, though: unlike starting in the center of a pre-defined field, a dictionary lets the field expand in any direction automatically, with no memory burden and no need to recreate the field. Further, the minimum and maximum used coordinates are trivially retrieved, so image production can be cropped tightly and unused whitespace reduced.
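
A minimal sketch, assuming a plain Python dict keyed by (row, col) tuples; the helper names are mine:

```python
# Blank-by-default field: only visited cells are ever stored.
cells = {}

def write(row, col, value):
    cells[(row, col)] = value

def read(row, col):
    return cells.get((row, col), 0)   # unvisited cells read as blank (0)

write(0, 0, 1)
write(-3, 7, 2)   # the field expands in any direction, no re-centering needed

# Tight bounds for image production -- no unused whitespace:
rows = [r for r, _ in cells]
cols = [c for _, c in cells]
top, left, bottom, right = min(rows), min(cols), max(rows), max(cols)
```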

Approaches to Saving Raw Data

Flat File

My approach to file saving has been to use a base-10 version of the instruction set as the file name and save the data in a flat-file format. The output isn’t so large that I need a more compact solution, and I can peek in easily for debugging. Since inception, I’ve kept the data in CSVs: a perfect match to the multi-dimensional array I keep in memory, and easily readable, interpretable, and transferable.
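
A sketch of that save path in Python, with `instruction_set_id` and `grid` as stand-ins for the real values:

```python
import csv

# Stand-ins: the instruction set as a base-10 int, and the in-memory
# multi-dimensional array.
instruction_set_id = 43690
grid = [[0, 1, 2], [1, 0, 1]]

# One CSV row per grid row, with the file named after the instruction set.
with open(f"{instruction_set_id}.csv", "w", newline="") as f:
    csv.writer(f).writerows(grid)
```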

I could use JSON just as easily. In fact, adding any metadata beyond the current point means I’d either have to save the metadata to a separate file or switch to JSON if I wanted to keep everything in flat-file format.
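
Something like this, where the payload’s metadata fields are purely illustrative:

```python
import json

# Grid plus whatever metadata needs to travel with it, in one flat file.
instruction_set_id = 43690
payload = {
    "instruction_set": instruction_set_id,
    "max_steps": 250_000,            # hypothetical metadata field
    "grid": [[0, 1, 2], [1, 0, 1]],
}

with open(f"{instruction_set_id}.json", "w") as f:
    json.dump(payload, f)
```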

Object file

I could save the data in a binary format. I’ll be reimplementing the code and porting it to a few languages, and if I find I need to save the raw output at all, I may go in this direction.
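
One binary option among many, shown with NumPy’s `.npy` format purely as an illustration; the ported implementations would pick their own serialization:

```python
import numpy as np

# .npy round-trips the multi-dimensional array exactly and compactly.
arr = np.array([[0, 1, 2], [1, 0, 1]], dtype=np.int32)
np.save("43690.npy", arr)            # binary save, named for the instruction set
restored = np.load("43690.npy")      # exact round-trip back into memory
```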

Don’t save at all

Not a bad option either. It depends on how broadly you want to examine a set of instructions or a particular configuration of instruction or turmite options. This is a standard case of storage cost vs. compute cost. The compute cost isn’t high, but if you want to check a bunch of visual transforms across a massive swath of output, it’s certainly faster to have the outputs precomputed. Ultimately, the compute cost is low enough to skip saving the raw output: my C# implementation completed 250k steps for each of the 65k turmites in 40 minutes (with early termination allowed if a turmite hit the edge of the field).

Numerical conversions to support image production

I had a few ideas here, but I’m open to more. Standard approaches are listed below; a sketch of a few of them follows the list:

  1. Evens/Odds \(\Rightarrow\) Black and white (or whatever pair of colors you want)
    • Standard approach.
  2. Mod N
    • Extension of the above. Preselect colors and assign them (number % N) values
  3. Normalization
    • Most simply \(outputValue = maxOutputValue * stepValue / float(maxStepValue)\)
    • Spreads the available values over the range of intensities. Normally \(maxOutputValue = 255\), which puts \(outputValue\) in the range \([0, 255]\)
    • If you allowed negative step values, subtract the lowest value from all values then proceed as above.
    • I use normalization in conjunction with a bunch of other options
  4. Log-Normalization
    • There is frequently a massive disparity between the high and low values, which crushes the low values to ~0.
    • Normalization on the log of the value tends to produce a more expressive image
  5. Quantile
    • Sort the values and split into N groups. Instead of the actual value, now use the group number.
    • Most values will be low and abundantly common, so know ahead of time how you’ll deal with uneven splits.
  6. Set-based split
    • Provide the mod-N, quantile, or other approach with a list of unique values to determine the split, rather than the full list/bag/md-array of values. In this case, use mod-N (or quantile, I suppose) on the indices of the list.
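
Here’s a hedged Python sketch of three of the conversions above (mod-N, normalization, log-normalization). All names are illustrative, and `steps` stands in for the raw grid:

```python
import numpy as np

# Tiny stand-in grid with the typical huge high/low disparity (the 400).
steps = np.array([[0, 1, 9], [2, 400, 1]], dtype=np.float64)

def mod_n(steps, n):
    """Map each value to one of n pre-selected color buckets."""
    return steps.astype(np.int64) % n

def normalize(steps, max_output=255):
    """Spread values over [0, max_output]; shift first in case of negatives."""
    shifted = steps - steps.min()
    return max_output * shifted / shifted.max()

def log_normalize(steps, max_output=255):
    """Normalize log(1 + value) so outliers stop crushing the low values."""
    logged = np.log1p(steps - steps.min())
    return max_output * logged / logged.max()

print(mod_n(steps, 4))
print(normalize(steps).round(1))      # the 400 pins everything else near 0
print(log_normalize(steps).round(1))  # low values become visible again
```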

Color selection approaches I’ve used

Likewise, I tried a few things here, and I came up with a few new ideas while writing this. The new ideas won’t work, but they’re interesting!

  1. Black and white
    • It’s a classic!
  2. Discrete, pre-selected colors
    • Works with any partitionable number conversion method (mod-N, quantile, etc)
  3. Continuous colors
    • This was fun, though I had to make an HSL-to-RGB converter for C# because I couldn’t find a ready-made one at the time I first did this (a sketch using Python’s built-in converter follows this list).
    • Because the color (H in HSL) is continuous, you can rotate (apply an offset) or limit (normalize a range) to achieve different results.
    • Used the value for H and supplied S and L values that felt good.
    • Fond of this because it’s a natural, default “heatmap” feel
  4. Grayscale
    • Natural for normalization
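
For the continuous-color approach, Python’s standard library already ships a converter (`colorsys`, which takes HLS argument order rather than HSL). This sketch maps a normalized value onto the hue wheel; the S/L constants and the hue cap are placeholders for values that “felt good”:

```python
import colorsys

def heat_color(normalized, hue_offset=0.0, hue_range=0.85, s=0.9, l=0.5):
    """Map a value in [0, 1] to an (r, g, b) byte triple via the hue wheel."""
    # Rotate (apply an offset) or limit (cap the range) for different palettes.
    h = (hue_offset + normalized * hue_range) % 1.0
    r, g, b = colorsys.hls_to_rgb(h, l, s)   # note: stdlib takes HLS order
    return int(r * 255), int(g * 255), int(b * 255)

print(heat_color(0.0))   # low values land at one end of the hue wheel
print(heat_color(1.0))   # high values at the other
```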

Conclusion

I think that’s about it except for the pretty, pretty pictures!