<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>William Kearney</title>
<generator>Emacs webfeeder.el</generator>
<link href="https://www.wskearney.com/"/>
<link href="https://www.wskearney.com/atom.xml" rel="self"/>
<id>https://www.wskearney.com/atom.xml</id>
<updated>2025-09-17T17:52:55+02:00</updated>
<entry>
  <title>Tries are really neat</title>
  <author><name>William Kearney</name></author>
  <content type="html"><![CDATA[<main id="content" class="content"> <article class="h-entry"> <h1 class="p-name title">Tries are really neat</h1>
 <p> <span class="byline">Published on  <a class="u-url u-uid" href="https://www.wskearney.com/posts/tries.html"> <time class="dt-published date" datetime="2023-11-21T14:11:00+0100">2023-11-21</time></a> by  <a class="p-author h-card" href="https://www.wskearney.com">William Kearney</a></span></p>
 <p> <span class="tags">Tags: 
 <a href="/tags/cs.html" rel="tag" class="p-category">cs</a>
 <a href="/tags/data-structures.html" rel="tag" class="p-category">data-structures</a>
</span></p> <div class="e-content">
 <p>
While working with some  <a href="goneplax.html">sonar data</a>, I found myself wanting to index a data file. I needed a key-value map where both the keys and the values are 64 bit integers. The values are essentially offsets into the original data file, but the keys are the concatenation of three integer fields from the data record itself. The first byte is the sonar subsystem that the data originated from, the next six bytes are a millisecond-resolution timestamp, and the final byte distinguishes the port from the starboard channel. I organize the keys this way because I have found that a lot of data access patterns require accessing the two channels of data from each timestamp for a particular subsystem. To scan the index in this way, we need to be able to iterate over keys that share a common prefix. A trie is a natural fit for this problem.
</p>
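
 <p>
As a rough sketch of that key layout, the three fields could be packed into and unpacked from a 64-bit key as follows. The helper names  <code>make_key</code>,  <code>key_subsystem</code>,  <code>key_timestamp</code> and  <code>key_channel</code> are hypothetical and only serve to illustrate the layout.
</p>

```c
#include <stdint.h>

/* Hypothetical helpers: pack subsystem (1 byte), timestamp in milliseconds
   (6 bytes) and channel (1 byte) into one key, most significant field first. */
static uint64_t make_key(uint8_t subsystem, uint64_t timestamp_ms, uint8_t channel)
{
    return ((uint64_t)subsystem << 56)
         | ((timestamp_ms & 0xFFFFFFFFFFFFULL) << 8)
         | (uint64_t)channel;
}

static uint8_t  key_subsystem(uint64_t key) { return (uint8_t)(key >> 56); }
static uint64_t key_timestamp(uint64_t key) { return (key >> 8) & 0xFFFFFFFFFFFFULL; }
static uint8_t  key_channel(uint64_t key)   { return (uint8_t)(key & 0xFF); }
```

 <p>
Because the subsystem occupies the most significant byte, all keys for one subsystem share their first 8 bits, and the two channels of one ping share their first 56 bits.
</p>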

 <p>
Tries are trees where the nodes represent common prefixes of the keys, and the edges represent the next symbol in the key. With integer keys, we can interpret the key as a string of bits, so that a very simple trie node consists of two pointers, one of which points to the subtrie where the prefix of the present node is followed by a 0 bit and one of which points to the subtrie where the prefix is followed by a 1 bit. In C, this might look something like
</p>

 <div class="org-src-container">
 <pre class="src src-C"> <span style="color: #859900; font-weight: bold;">typedef</span>  <span style="color: #859900; font-weight: bold;">struct</span>  <span style="color: #b58900;">trie</span>  <span style="color: #b58900;">trie</span>;
 <span style="color: #859900; font-weight: bold;">struct</span>  <span style="color: #b58900;">trie</span> {
     <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">children</span>[2];
};
</pre>
</div>

 <p>
A 64 bit integer key can be found in the trie by scanning from the root node, which represents the empty prefix, and following 64 pointers by consuming bits from the key starting from the most significant bit. If we ever hit a NULL pointer, the key is not present, and we break out from the loop early. We don't need to store the keys in the trie, because they are implicitly represented by the path from the root node.
</p>

 <div class="org-src-container">
 <pre class="src src-C"> <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">trie_lookup</span>( <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">t</span>,  <span style="color: #b58900;">uint64_t</span>  <span style="color: #268bd2;">key</span>)
{
     <span style="color: #859900; font-weight: bold;">for</span> ( <span style="color: #b58900;">int32_t</span>  <span style="color: #268bd2;">level</span> = 63; level >=0; level--) {
         <span style="color: #b58900;">uint8_t</span>  <span style="color: #268bd2;">child</span> = (key >> level) & 1;
         <span style="color: #859900; font-weight: bold;">if</span> ( <span style="color: #b58900; font-weight: bold;">!</span>t->children[child]) {
             <span style="color: #859900; font-weight: bold;">return</span>  <span style="color: #268bd2; font-weight: bold;">NULL</span>;
        }
        t = t->children[(key >> level) & 1];
    }
     <span style="color: #859900; font-weight: bold;">return</span> t;
}
</pre>
</div>

 <p>
 <code>trie_lookup</code> returns a pointer to a trie, and a NULL pointer indicates that the key was not found. Note that this requires that the child pointers at level 0 are non-zero. They don't necessarily need to point to a valid  <code>trie</code>, if we can ensure that we never dereference the pointer returned from  <code>trie_lookup</code>. However, it is helpful when we start storing values in the trie to instead allocate a leaf node at a fictitious level equivalent to -1, so that the level 0 child pointers are either NULL or valid pointers to leaf nodes. Insertion is very similar to lookup.
</p>

 <div class="org-src-container">
 <pre class="src src-C"> <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">new_node</span>();
 <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">trie_insert</span>( <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">t</span>,  <span style="color: #b58900;">uint64_t</span>  <span style="color: #268bd2;">key</span>)
{
     <span style="color: #859900; font-weight: bold;">for</span> ( <span style="color: #b58900;">int32_t</span>  <span style="color: #268bd2;">level</span> = 63; level >= 0; level--) {
         <span style="color: #b58900;">uint8_t</span>  <span style="color: #268bd2;">child</span> = (key >> level) & 1;
         <span style="color: #859900; font-weight: bold;">if</span> ( <span style="color: #b58900; font-weight: bold;">!</span>t->children[child]) {
            t->children[child] = new_node();
        }
        t = t->children[child];
    }
     <span style="color: #859900; font-weight: bold;">return</span> t;
}
</pre>
</div>

 <p>
The function  <code>new_node</code> depends on your chosen strategy for memory allocation. One could simply use  <code>malloc(sizeof(trie))</code>, but I would usually allocate from a pool of  <code>trie</code> nodes or a  <a href="https://www.gingerbill.org/article/2019/02/08/memory-allocation-strategies-002/">linear allocator</a>. A simple pool allocator is just a buffer of  <code>trie</code> nodes that tracks the next available node.
</p>

 <div class="org-src-container">
 <pre class="src src-C"> <span style="color: #859900; font-weight: bold;">typedef</span>  <span style="color: #859900; font-weight: bold;">struct</span> {
     <span style="color: #b58900;">trie</span>*  <span style="color: #268bd2;">buffer</span>;
     <span style="color: #b58900;">ptrdiff_t</span>  <span style="color: #268bd2;">capacity</span>;
     <span style="color: #b58900;">ptrdiff_t</span>  <span style="color: #268bd2;">count</span>;
}  <span style="color: #b58900;">trie_pool</span>;

 <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">pool_allocate</span>( <span style="color: #b58900;">trie_pool</span> * <span style="color: #268bd2;">pool</span>)
{
    assert(pool->count < pool->capacity);
     <span style="color: #859900; font-weight: bold;">return</span> &pool->buffer[pool->count++];
}
</pre>
</div>
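
 <p>
One way to connect the pool to  <code>new_node</code> is sketched below. It assumes a single global pool with a fixed capacity; the names  <code>POOL_CAPACITY</code>,  <code>pool_storage</code> and  <code>global_pool</code> are made up for this example.
</p>

```c
#include <assert.h>
#include <stddef.h>

typedef struct trie trie;
struct trie {
    trie *children[2];
};

typedef struct {
    trie *buffer;
    ptrdiff_t capacity;
    ptrdiff_t count;
} trie_pool;

/* Assumed fixed capacity for this sketch; size it to your data. */
#define POOL_CAPACITY 1024

/* Static storage is zero-initialized, so every node starts with NULL children. */
static trie pool_storage[POOL_CAPACITY];
static trie_pool global_pool = { pool_storage, POOL_CAPACITY, 0 };

static trie *pool_allocate(trie_pool *pool)
{
    assert(pool->count < pool->capacity);
    return &pool->buffer[pool->count++];
}

/* new_node as called by trie_insert, backed by the global pool. */
trie *new_node(void)
{
    return pool_allocate(&global_pool);
}
```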

 <p>
Before considering how to iterate over a trie, we can store values in the trie by changing the  <code>trie</code> to a union.
</p>

 <div class="org-src-container">
 <pre class="src src-C"> <span style="color: #859900; font-weight: bold;">typedef</span>  <span style="color: #859900; font-weight: bold;">union</span>  <span style="color: #b58900;">trie</span>  <span style="color: #b58900;">trie</span>;
 <span style="color: #859900; font-weight: bold;">union</span>  <span style="color: #b58900;">trie</span> {
     <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">children</span>[2];
     <span style="color: #859900; font-weight: bold;">struct</span> {
         <span style="color: #b58900;">uint64_t</span>  <span style="color: #268bd2;">key</span>;
         <span style="color: #b58900;">uint64_t</span>  <span style="color: #268bd2;">value</span>;
    };
};
</pre>
</div>

 <p>
The nodes on the fictitious -1 level will be leaf nodes with key-value pairs stored instead of child pointers. Rather than tagging the union, we will distinguish the internal from the leaf nodes by their level, which we will always keep track of outside of the data structure. While we don't need to store the keys, a leaf that held only a value would leave 8 of the union's bytes unused (assuming 8-byte pointers), so we might as well fill them, and storing the keys saves us from having to reconstruct them while iterating.
</p>
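
 <p>
A minimal sketch of insertion and lookup with values stored in the leaves might look like the following. The function name  <code>trie_insert_value</code> is hypothetical, and for brevity the nodes are allocated with  <code>calloc</code> rather than from a pool.
</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef union trie trie;
union trie {
    trie *children[2];
    struct {
        uint64_t key;
        uint64_t value;
    };
};

/* calloc zeroes the node, so children start out as NULL on typical platforms. */
static trie *new_node(void)
{
    trie *node = calloc(1, sizeof(trie));
    assert(node);
    return node;
}

/* Walk (and create) the 64 internal levels, then fill in the leaf
   that sits at the fictitious level -1. */
static trie *trie_insert_value(trie *t, uint64_t key, uint64_t value)
{
    for (int32_t level = 63; level >= 0; level--) {
        uint8_t child = (key >> level) & 1;
        if (!t->children[child]) {
            t->children[child] = new_node();
        }
        t = t->children[child];
    }
    t->key = key;
    t->value = value;
    return t;
}

/* Returns the leaf for key, or NULL if the key is absent. */
static trie *trie_lookup(trie *t, uint64_t key)
{
    for (int32_t level = 63; level >= 0; level--) {
        t = t->children[(key >> level) & 1];
        if (!t) {
            return NULL;
        }
    }
    return t;
}
```

 <p>
The pointer returned by this  <code>trie_lookup</code> is always a leaf, so it is safe to read its  <code>key</code> and  <code>value</code> fields but not its  <code>children</code>.
</p>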

 <p>
Our trie is a binary tree and iterating over it amounts to a depth-first tree traversal, which means we need a stack.
</p>

 <div class="org-src-container">
 <pre class="src src-C"> <span style="color: #859900; font-weight: bold;">typedef</span>  <span style="color: #859900; font-weight: bold;">struct</span> {
     <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">stack</span>[65];
     <span style="color: #b58900;">int32_t</span>  <span style="color: #268bd2;">levels</span>[65];
     <span style="color: #b58900;">int32_t</span>  <span style="color: #268bd2;">top</span>;
}  <span style="color: #b58900;">trie_iterator</span>;

 <span style="color: #b58900;">void</span>  <span style="color: #268bd2;">trie_push</span>( <span style="color: #b58900;">trie_iterator</span> * <span style="color: #268bd2;">iter</span>,  <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">node</span>,  <span style="color: #b58900;">int32_t</span>  <span style="color: #268bd2;">level</span>)
{
    assert(iter->top < 65);
    iter->levels[iter->top] = level;
    iter->stack[iter->top++] = node;
}

 <span style="color: #b58900;">int32_t</span>  <span style="color: #268bd2;">trie_pop</span>( <span style="color: #b58900;">trie</span> ** <span style="color: #268bd2;">node</span>,  <span style="color: #b58900;">trie_iterator</span> * <span style="color: #268bd2;">iter</span>)
{
    assert(iter->top > 0);
    *node = iter->stack[--iter->top];
     <span style="color: #859900; font-weight: bold;">return</span> iter->levels[iter->top];
}
</pre>
</div>

 <p>
The trie has a fixed depth, so the stack is bounded by the number of levels, which is 64 plus the leaf level, so 65 nodes will suffice. We maintain a stack of  <code>trie</code> pointers and their corresponding levels.  <code>trie_pop</code> hands the popped node back through its first argument and returns the node's level. If we didn't store the keys in the leaves, we would also want to have a stack of keys, which we would reconstruct as we progress through the trie.
</p>

 <p>
We initialize the iterator by pushing the root node at level 63:
</p>

 <div class="org-src-container">
 <pre class="src src-C"> <span style="color: #b58900;">trie_iterator</span>  <span style="color: #268bd2;">trie_begin</span>( <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">root</span>)
{
     <span style="color: #b58900;">trie_iterator</span>  <span style="color: #268bd2;">iter</span> = {0}; 
    trie_push(&iter,root,63);
     <span style="color: #859900; font-weight: bold;">return</span> iter;
}
</pre>
</div>

 <p>
To iterate, we pop nodes from the stack and push their children.
</p>

 <div class="org-src-container">
 <pre class="src src-C"> <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">trie_iterate</span>( <span style="color: #b58900;">trie_iterator</span> * <span style="color: #268bd2;">iter</span>,  <span style="color: #b58900;">int32_t</span>  <span style="color: #268bd2;">end_level</span>)
{
     <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">node</span>;
     <span style="color: #b58900;">int32_t</span>  <span style="color: #268bd2;">level</span>;
     <span style="color: #859900; font-weight: bold;">while</span> (iter->top > 0) {
        level = trie_pop(&node,iter);
         <span style="color: #859900; font-weight: bold;">if</span> (level < end_level) {
             <span style="color: #859900; font-weight: bold;">return</span> node;
        }
         <span style="color: #859900; font-weight: bold;">if</span> (node->children[1]) {
            trie_push(iter,node->children[1],level - 1);
        }
         <span style="color: #859900; font-weight: bold;">if</span> (node->children[0]) {
            trie_push(iter,node->children[0],level - 1);
        }
    }
     <span style="color: #859900; font-weight: bold;">return</span>  <span style="color: #268bd2; font-weight: bold;">NULL</span>;
}
</pre>
</div>

 <p>
We use the  <code>end_level</code> parameter to control which nodes we iterate over:  <code>trie_iterate</code> returns the first node it pops whose level is below  <code>end_level</code>. If we want to iterate over the leaf nodes, we pass 0 for  <code>end_level</code>. If we want to iterate over prefixes, we pass the number of key bits that remain after the prefix, so 56 stops after the first byte and 8 stops after the first seven bytes. For our sonar data example, we might iterate first over the subsystems, then over the timestamps, then the channels.
</p>

 <div class="org-src-container">
 <pre class="src src-C"> <span style="color: #b58900;">trie_iterator</span>  <span style="color: #268bd2;">subsystem_iterator</span> = trie_begin(root);
 <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">subsystem</span>;
 <span style="color: #859900; font-weight: bold;">while</span> ((subsystem = trie_iterate(&subsystem_iterator,56))) {
     <span style="color: #b58900;">trie_iterator</span>  <span style="color: #268bd2;">timestamp_iterator</span> = {0};
    trie_push(&amp;timestamp_iterator,subsystem,55);
     <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">timestamp</span>;
     <span style="color: #859900; font-weight: bold;">while</span> ((timestamp = trie_iterate(&amp;timestamp_iterator,8))) {
         <span style="color: #b58900;">trie_iterator</span>  <span style="color: #268bd2;">channel_iterator</span> = {0};
        trie_push(&channel_iterator,timestamp,7);
         <span style="color: #b58900;">trie</span> * <span style="color: #268bd2;">channel</span>;
         <span style="color: #859900; font-weight: bold;">while</span> ((channel = trie_iterate(&channel_iterator,0))) {
            process_data(channel->key,channel->value);
        }
    }
}
</pre>
</div>
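
 <p>
If we already know which prefix we are interested in, say all records from subsystem 20, we can walk straight down to that prefix and start an iterator there instead of scanning from the root. The sketch below is self-contained, so it repeats condensed versions of the insert and iterate routines (with the pop inlined). The helper  <code>trie_begin_prefix</code> is a hypothetical addition, and nodes are allocated with  <code>calloc</code> for brevity.
</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef union trie trie;
union trie {
    trie *children[2];
    struct { uint64_t key; uint64_t value; };
};

typedef struct {
    trie *stack[65];
    int32_t levels[65];
    int32_t top;
} trie_iterator;

static void trie_insert_value(trie *t, uint64_t key, uint64_t value)
{
    for (int32_t level = 63; level >= 0; level--) {
        uint8_t child = (key >> level) & 1;
        if (!t->children[child]) {
            t->children[child] = calloc(1, sizeof(trie));
            assert(t->children[child]);
        }
        t = t->children[child];
    }
    t->key = key;
    t->value = value;
}

static void trie_push(trie_iterator *iter, trie *node, int32_t level)
{
    assert(iter->top < 65);
    iter->levels[iter->top] = level;
    iter->stack[iter->top++] = node;
}

/* Depth-first step: returns the next node below end_level, or NULL when done. */
static trie *trie_iterate(trie_iterator *iter, int32_t end_level)
{
    while (iter->top > 0) {
        trie *node = iter->stack[--iter->top];
        int32_t level = iter->levels[iter->top];
        if (level < end_level) return node;
        /* Push child 1 first so that child 0 is visited first (ascending keys). */
        if (node->children[1]) trie_push(iter, node->children[1], level - 1);
        if (node->children[0]) trie_push(iter, node->children[0], level - 1);
    }
    return NULL;
}

/* Hypothetical helper: follow the top prefix_bits bits of prefix from the root
   and return an iterator over that subtrie (empty if the prefix is absent). */
static trie_iterator trie_begin_prefix(trie *root, uint64_t prefix, int32_t prefix_bits)
{
    trie_iterator iter = {0};
    trie *t = root;
    int32_t level = 63;
    for (; t && level > 63 - prefix_bits; level--) {
        t = t->children[(prefix >> level) & 1];
    }
    if (t) trie_push(&iter, t, level);
    return iter;
}
```

 <p>
Calling  <code>trie_begin_prefix(root, (uint64_t)20 << 56, 8)</code> and then  <code>trie_iterate</code> with an  <code>end_level</code> of 0 would visit only the leaves of subsystem 20, in ascending key order.
</p>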

 <p>
And that is all we need to start building and scanning indices for the sonar data. There are a lot of improvements that we can make on this basic design. For example, we can consume more than a single bit of the key at a time, which speeds up lookups while requiring more space to store the child pointers. We can also construct a variety of hybrid tree variants that use other search data structures at each level of the trie. It is also relatively straightforward to implement lock-free concurrent inserts to this simple array-based trie.
</p>
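
 <p>
To give a feel for the first of those improvements, here is a sketch of a trie that consumes 4 bits of the key per level. The stride of 4 and the names  <code>trie16</code>,  <code>trie16_insert</code> and  <code>trie16_lookup</code> are assumptions for this example. Each node has 16 child pointers instead of 2, and a lookup follows 16 pointers instead of 64.
</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed stride for this sketch: 4 bits per level, so 16 children per node
   and 16 levels for a 64-bit key instead of 64. */
#define STRIDE_BITS 4
#define FANOUT (1 << STRIDE_BITS)

typedef struct trie16 trie16;
struct trie16 {
    trie16 *children[FANOUT];
};

static trie16 *trie16_insert(trie16 *t, uint64_t key)
{
    /* shift takes the values 60, 56, ..., 0: one nibble per level. */
    for (int32_t shift = 64 - STRIDE_BITS; shift >= 0; shift -= STRIDE_BITS) {
        uint8_t child = (key >> shift) & (FANOUT - 1);
        if (!t->children[child]) {
            t->children[child] = calloc(1, sizeof(trie16));
            assert(t->children[child]);
        }
        t = t->children[child];
    }
    return t;
}

static trie16 *trie16_lookup(trie16 *t, uint64_t key)
{
    for (int32_t shift = 64 - STRIDE_BITS; shift >= 0; shift -= STRIDE_BITS) {
        t = t->children[(key >> shift) & (FANOUT - 1)];
        if (!t) return NULL;
    }
    return t;
}
```

 <p>
The trade-off is visible in the node size: with 8-byte pointers each node grows from 16 to 128 bytes, most of which will be NULL pointers in a sparse trie.
</p>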
 <section id="outline-container-org008e36e" class="outline-2"> <h2 id="org008e36e">References</h2>
 <div class="outline-text-2" id="text-org008e36e">
 <ul class="org-ul"> <li>Tries were  <a href="https://doi.org/10.1145/367390.367400">named</a> but not invented by Edward Fredkin.</li>
 <li>Douglas Comer provides a  <a href="https://doi.org/10.1145/320083.320102">very clear description</a> of tries in an analysis of techniques for saving space in tries.</li>
 <li>Phil Bagwell likewise gives  <a href="https://infoscience.epfl.ch/record/64394">a good introduction</a> to tries while developing a succinct trie data structure.</li>
 <li>A lot of the specific implementation choices here are inspired by Chris Wellons'  <a href="https://nullprogram.com/blog/2023/09/30/">investigation of hash-keyed tries</a>.</li>
</ul></div>
</section></div></article></main>]]></content>
  <link href="https://www.wskearney.com/posts/tries.html"/>
  <id>https://www.wskearney.com/posts/tries.html</id>
  <updated>2023-11-21T00:00:00+01:00</updated>
</entry>
<entry>
  <title>Introducing goneplax, a command line tool for inspecting Edgetech JSF files</title>
  <author><name>William Kearney</name></author>
  <content type="html"><![CDATA[<main id="content" class="content"> <article class="h-entry"> <h1 class="p-name title">Introducing goneplax, a command line tool for inspecting Edgetech JSF files</h1>
 <p> <span class="byline">Published on  <a class="u-url u-uid" href="https://www.wskearney.com/posts/goneplax.html"> <time class="dt-published date" datetime="2023-07-24T15:44:00+0200">2023-07-24</time></a> by  <a class="p-author h-card" href="https://www.wskearney.com">William Kearney</a></span></p>
 <p> <span class="tags">Tags: 
 <a href="/tags/benthos.html" rel="tag" class="p-category">benthos</a>
 <a href="/tags/goneplax.html" rel="tag" class="p-category">goneplax</a>
 <a href="/tags/tools.html" rel="tag" class="p-category">tools</a>
 <a href="/tags/small-projects.html" rel="tag" class="p-category">small-projects</a>
</span></p> <div class="e-content">
 <p>
I have just published a little command line tool called  <a href="https://sr.ht/~wkearn/goneplax/">goneplax</a> to help look at the JSF files that are output by  <a href="https://www.edgetech.com/">Edgetech</a> sonars. Imaging sonar data is an interesting problem because publicly accessible data usually is either georeferenced mosaics or raw instrument data. The former come in geospatial formats like GeoTIFFs but have lost a lot of information that I would like to have as I think about statistical models for imaging sonar data. The latter typically need proprietary software to access, and that software often locks you into particular kinds of data analysis like the production of georeferenced mosaics. There are some open source sonar data tools, notably  <a href="https://github.com/dwcaress/MB-System">MB-System</a> from the Monterey Bay Aquarium Research Institute, but those aim to be fully featured data processing systems, whereas I am mostly interested in tools that allow me to build my own data analysis pipelines around sonar data.
</p>

 <p>
However, my  <a href="https://github.com/wkearn/sdw">earlier attempt</a> at this kind of tooling quickly ballooned out of control because there is a lot of stuff you can do with sonar data like format translation, visualization or even data analysis tools – I even briefly messed around with designing a  <a href="https://github.com/wkearn/shoal">programming language</a>. I took a step back, thought about  <a href="https://schroer.ca/2022/04/10/the-joy-of-small-projects/">The Joy of Small Projects</a>, and this morning came up with  <code>goneplax</code>. I have a lot of JSF files sitting around (two particularly good sources for sonar data are  <a href="https://www.pangaea.de/">PANGAEA</a> and the  <a href="https://www.data.gov.uk/">UK government data portal</a>), and the first step to using them for anything is to figure out what is in them. JSF files are basically sequences of records, each of which has a header that provides some information about the record.  <code>goneplax</code> reads through these records, and writes out to the command line the type of each record, the subsystem it comes from and the channel within that subsystem (i.e. the port or starboard channels for sidescan data).
</p>

 <p>
If we try it out on some data from  <a href="https://doi.org/10.1594/PANGAEA.907463">Papenmeier and Hass (2019)</a>, we get something like
</p>

 <div class="org-src-container">
 <pre class="src src-nil">> goneplax HE501_Hydro1_001.jsf | head
System Information:0:0
Navigation Offsets:0:0
Pitch Roll Data:101:1
Pitch Roll Data:101:1
Sonar Data Message:20:0
Sonar Data Message:20:1
Pitch Roll Data:101:1
Unknown Message:102:0
Pitch Roll Data:101:1
Sonar Data Message:20:0
</pre>
</div>

 <p>
The first field is the type of the message, the second is the subsystem number and the third is the channel. For instance, in the first 10 records shown above, we have three Sonar Data Messages that contain sidescan data. These are all from subsystem 20, which is the code for the lowest available sidescan sonar frequency (these data only have one frequency). The first and the last are from the port side of the sonar (channel 0) and the second is from the starboard (channel 1).
</p>

 <p>
 <code>goneplax</code> outputs text data like this so we can use Unix command line tools to further process it. One common operation is to count the number of messages of each kind in a given file:
</p>

 <div class="org-src-container">
 <pre class="src src-nil">> goneplax HE501_Hydro1_001.jsf | sort | uniq -c
    1 Navigation Offsets:0:0
 9370 NMEA String:100:0
19440 Pitch Roll Data:101:1
 8721 Sonar Data Message:20:0
 8720 Sonar Data Message:20:1
    1 System Information:0:0
 1500 Unknown Message:102:0
</pre>
</div>

 <p>
which matches the output of  <a href="https://github.com/Geosvy/jsfmesgtype">jsfmesgtype</a>, another command line tool that does something similar:
</p>

 <div class="org-src-container">
 <pre class="src src-nil">> jsfmesgtype HE501_Hydro1_001.jsf
Sonar = 17441  Pitch = 19440  NMEA = 9370  Analog = 1500  S_info = 1
</pre>
</div>

 <p>
If  <code>goneplax</code> seems like it might be useful for you, give it a try and let me know how it goes at  <a href="mailto:~wkearn/goneplax-discuss@lists.sr.ht">~wkearn/goneplax-discuss@lists.sr.ht</a>.
</p>
 <section id="outline-container-org495e167" class="outline-2"> <h2 id="org495e167">Lessons learned</h2>
 <div class="outline-text-2" id="text-org495e167">
 <ol class="org-ol"> <li>Completing a constrained project in a limited time span is a very good exercise.  <code>goneplax</code> is not particularly useful, but it works. If I ever need to peek at a JSF file, I can just run  <code>goneplax &lt;filename&gt;</code> from my terminal.</li>
 <li>I decided to host this project at  <a href="https://sr.ht/">Sourcehut</a> rather than GitHub. I do like its minimalism compared to GitHub, though I have not had the opportunity to use its email-based patch workflow in earnest.</li>
 <li>This is written in Rust as opposed to Julia, which has been my preferred language for many years. I've been using Rust a lot over the last few months, and it is pretty well-suited to a tool like this.</li>
 <li>I tried to apply the ideas of matklad's  <a href="https://matklad.github.io//2022/10/06/hard-mode-rust.html">Hard Mode Rust</a> and the TigerBeetle developers'  <a href="https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/TIGER_STYLE.md">Tiger Style</a> to  <code>goneplax</code>. However, it just isn't all that complicated, so its allocation-free library just parses the JSF message headers. What little complexity there is in  <code>goneplax</code> lies in the I/O, which is all handled using Rust's standard library. There are no dependencies beyond Rust's standard library.</li>
 <li> <a href="https://en.wikipedia.org/wiki/Goneplax_rhomboides"> <i>Goneplax rhomboides</i></a> is a kind of crab.</li>
</ol></div>
</section></div></article></main>]]></content>
  <link href="https://www.wskearney.com/posts/goneplax.html"/>
  <id>https://www.wskearney.com/posts/goneplax.html</id>
  <updated>2023-07-24T00:00:00+02:00</updated>
</entry>
<entry>
  <title>Random maps with spatially dependent Pitman-Yor processes</title>
  <author><name>William Kearney</name></author>
  <content type="html"><![CDATA[<main id="content" class="content"> <article class="h-entry"> <h1 class="p-name title">Random maps with spatially dependent Pitman-Yor processes</h1>
 <p> <span class="byline">Published on  <a class="u-url u-uid" href="https://www.wskearney.com/posts/patchwork-kingdom.html"> <time class="dt-published date" datetime="2023-06-30T10:16:00+0200">2023-06-30</time></a> by  <a class="p-author h-card" href="https://www.wskearney.com">William Kearney</a></span></p>
 <p> <span class="tags">Tags: 
 <a href="/tags/statistics.html" rel="tag" class="p-category">statistics</a>
 <a href="/tags/gaussian-processes.html" rel="tag" class="p-category">gaussian-processes</a>
</span></p> <div class="e-content">
 <p>
Gaussian processes are powerful tools for modeling spatially varying fields like sea surface temperature or  <a href="nsea.html">sand ripple elevation</a>. But what can you do if you want a model for discrete-valued fields like a land cover classification? There are many approaches that you could use for this problem, but I particularly like the maps you can generate using spatially dependent Pitman-Yor processes ( <a href="https://proceedings.neurips.cc/paper/2008/hash/883e881bb4d22a7add958f2d6b052c9f-Abstract.html">Sudderth and Jordan 2008</a>).
</p>


 <figure id="org6a7c879"> <img src="../images/patchwork-map1.png" alt="A map made using a spatially dependent Pitman-Yor process"></img></figure> <p>
A  <a href="https://en.wikipedia.org/wiki/Pitman%E2%80%93Yor_process">Pitman-Yor process</a> is a generalization of the  <a href="https://en.wikipedia.org/wiki/Dirichlet_process">Dirichlet process</a>. Both are probability distributions over probability distributions in a sense that is quite tricky to understand, but which is very useful in developing nonparametric Bayesian models for data: you can use a Dirichlet/Pitman-Yor prior for an unknown probability distribution and then learn the distribution from your data.
</p>

 <p>
Another way of looking at these processes is that they provide distributions over infinite partitions of the interval \([0,1]\). This is often called the "stick-breaking" construction of the Dirichlet process, and it works like this. Imagine we have a stick with a unit length. We first break off a randomly sized fraction of the stick, \(v_1\). This is the first segment of our partition with length \(p_1 = v_1\). Now we take the remaining stick, which has a length \(1 - v_1\), and we break off another piece that is a fraction, \(v_2\), of the remaining length. This piece is our second segment, and it has a length of \(p_2 = v_2 (1 - v_1)\). We repeat this process an infinite number of times, so that the \(k\)-th segment has a length \(p_k = v_k \prod_{i=1}^{k-1} (1 - v_i)\), and the segments all add up to the length of the original stick, \(\sum_{k=1}^{\infty} p_k = 1\).
</p>
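
 <p>
As a small worked example with made-up fractions \(v_1 = 0.5\), \(v_2 = 0.4\) and \(v_3 = 0.5\), the first three segments are
</p>

 <p>
\[
\begin{gathered}
p_1 = v_1 = 0.5 \\
p_2 = v_2 (1 - v_1) = 0.4 \times 0.5 = 0.2 \\
p_3 = v_3 (1 - v_1)(1 - v_2) = 0.5 \times 0.5 \times 0.6 = 0.15
\end{gathered}
\]
</p>

 <p>
which leaves a stick of length \((1 - v_1)(1 - v_2)(1 - v_3) = 0.5 \times 0.6 \times 0.5 = 0.15\) to be divided among all the remaining segments.
</p>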

 <p>
The difference between the Dirichlet process and the Pitman-Yor process is simply in the distribution of the fractions, \(v_k\). The Dirichlet process has each fraction distributed by a \(\text{Beta}(1,\alpha)\) distribution while the Pitman-Yor process has
</p>

 <p>
\[
v_k \sim \text{Beta}(1 - \beta,\alpha + k \beta)
\]
</p>

 <p>
so that the distribution changes with \(k\). When \(\beta = 0\), this is just the Dirichlet process while higher values of \(\beta\) result in a distribution with heavier tails than the Dirichlet process.
</p>

 <p>
For our Pitman-Yor process maps, we only need to sample the fractions, \(v_k\), which we can do easily in Julia using  <a href="https://github.com/JuliaStats/Distributions.jl">Distributions.jl</a> to sample from the Beta distribution
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">sample_py_fractions</span>(α,β,K)
  v = zeros(K)
   <span style="color: #859900; font-weight: bold;">for</span> k  <span style="color: #859900; font-weight: bold;">in</span> 1:K
      v[k] = rand(Beta(1 - β, α + k * β))
   <span style="color: #859900; font-weight: bold;">end</span>
  v
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
Note that even though the partition formally has an infinite number of segments, we only sample the first \(K\) fractions. One interpretation of the Dirichlet process is as a mixture model with an infinite number of components. The partition lengths, \(p_k\), are the relative frequencies of each class: a fraction \(p_1\) of your data will come from class 1, a fraction \(p_2\) from class 2, and so on. Since the \(p_k\) sum to one, they must shrink toward zero as \(k\) grows (on average, each is smaller than the one before), so only a finite number of classes will have a meaningful probability of being drawn, even though the model theoretically represents an infinite number of classes. As long as \(K\) is high enough that the classes beyond \(K\) have negligible probability, it is okay to only think about the first \(K\) classes.
</p>

 <p>
In this classification model, it turns out that the stick-breaking fraction \(v_k\) is the conditional probability of assigning a data point to class \(k\) given that all the classes with indices less than \(k\) have been rejected. So we can think about generating class assignments by first choosing class 1 with probability \(v_1\). If we don't choose class 1, then we choose class 2 with probability \(v_2\), and so on, until all of our data points are assigned to classes. In code, this process looks something like
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">generate_classes</span>(v,N)
    z = zeros(Int,N)
     <span style="color: #859900; font-weight: bold;">for</span> i  <span style="color: #859900; font-weight: bold;">in</span> 1:N
        k = 1
         <span style="color: #859900; font-weight: bold;">while</span> (k <= length(v)) && (rand() > v[k])
            k += 1
         <span style="color: #859900; font-weight: bold;">end</span>
        z[i] = k
     <span style="color: #859900; font-weight: bold;">end</span>
    z
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>
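
 <p>
As a quick check that this sampler reproduces the segment lengths: a data point reaches class \(k\) only if classes \(1\) through \(k-1\) were all rejected, and it is then accepted with probability \(v_k\), so for \(k \le K\)
</p>

 <p>
\[
P(z = k) = v_k \prod_{i=1}^{k-1} (1 - v_i) = p_k .
\]
</p>

 <p>
With only \(K\) fractions, the function above assigns the label \(K + 1\) to any point that rejects every class, which happens with the leftover probability \(\prod_{i=1}^{K} (1 - v_i)\).
</p>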

 <p>
The insight that allows Sudderth and Jordan to create spatially dependent Pitman-Yor processes is that instead of drawing uniform random variables and checking whether they are less than the stick-breaking fraction, we can draw random variables from a different distribution, apply the cumulative distribution function for that distribution to the random variable and compare that value to the stick-breaking fraction. This is the same idea behind  <a href="https://en.wikipedia.org/wiki/Inverse_transform_sampling">inverse transform sampling</a>, which lets you convert uniformly distributed random variables into arbitrarily distributed variables provided that you know the (inverse) cumulative distribution function.
</p>

 <p>
In particular, we can choose standard normal random variables and apply the CDF of the normal distribution to sample from the same distribution of classes. Furthermore, instead of drawing independent samples, they can have correlations across the data points, as long as the marginal distributions are standard normal. It is important to note that the correlations are across data points (the index \(i\) in the above code) not the classes. Those still need to be independent to get the class proportions correct.
</p>
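
 <p>
To spell out why this works: if \(u\) is a standard normal random variable and \(\Phi\) is the standard normal CDF, then \(\Phi(u)\) is uniformly distributed on \([0,1]\), so
</p>

 <p>
\[
P\left(u < \Phi^{-1}(v_k)\right) = P\left(\Phi(u) < v_k\right) = v_k .
\]
</p>

 <p>
Accepting class \(k\) whenever \(u < \Phi^{-1}(v_k)\) therefore has exactly the same probability as accepting it whenever a uniform draw falls below \(v_k\).
</p>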

 <p>
Gaussian processes are just normally distributed random variables with spatial correlations, so we can build spatially dependent Pitman-Yor processes by first drawing \(K\) independent realizations of Gaussian processes over our region of interest, drawing stick-breaking fractions as above, and then applying our classification scheme for each point in space.
</p>

 <p>
There are a lot of details to sampling from Gaussian processes, so we won't look into them in great detail here. Instead we will just use the following function that generates random fields on regular grids using fast Fourier transforms.
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">generate_field</span>(k0,ν,Nx,Ny,B)
  kx = FFTW.rfftfreq(Nx)
  ky = FFTW.fftfreq(Ny)
  k2 = abs2.(kx) .+ abs2.(ky')
  v = inv(sqrt(π/ν)) * (k0^ν) .* (abs2(k0) .+ k2).^(-(ν+1)/2)
  irfft(v .* rfft(randn(Nx,Ny,B),(1,2)),Nx,(1,2))
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
The parameter \(k_0\) sets the correlation length of the field, with higher \(k_0\) corresponding to shorter correlation lengths, and the parameter \(\nu\) controls the smoothness of the field, with higher \(\nu\) giving smoother fields. The following code samples fields for a few combinations of \(k_0\) and \(\nu\) and plots them in the figure below.
</p>

 <div class="org-src-container">
 <pre class="src src-julia">k0 = [0.01;0.05;0.1]
ν  = [1.0;3.0;7.0]

fields = [generate_field(k0,ν,256,256,1)[:,:,1]  <span style="color: #859900; font-weight: bold;">for</span> k0 in k0, ν  <span style="color: #859900; font-weight: bold;">in</span> ν]
fig = Figure()
 <span style="color: #859900; font-weight: bold;">for</span> j  <span style="color: #859900; font-weight: bold;">in</span> 1:3
     <span style="color: #859900; font-weight: bold;">for</span> i  <span style="color: #859900; font-weight: bold;">in</span> 1:3
        ax = Axis(fig[i,j],title= <span style="color: #2aa198;">"k0 = $(k0[i]), ν = $(ν[j])"</span>)
        heatmap!(ax,fields[i,j])
        hidedecorations!(ax)
     <span style="color: #859900; font-weight: bold;">end</span>
 <span style="color: #859900; font-weight: bold;">end</span>

save( <span style="color: #2aa198;">"patchwork-fields.png"</span>,fig)
</pre>
</div>


 <figure id="org0e0d25a"> <img src="../images/patchwork-fields.png" alt="Random fields sampled with different values of the correlation length and smoothness."></img></figure> <p>
Finally, we just need to modify  <code>generate_classes</code> so that it uses the random fields as input to generate the classes.
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #268bd2;">Φi</span>(v) = -erfcinv(2v) * sqrt(2)
 <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">classify_fields</span>(u,v)
  z = zeros(Int,size(u,1),size(u,2))
   <span style="color: #859900; font-weight: bold;">for</span> j  <span style="color: #859900; font-weight: bold;">in</span> 1:size(u,2)
       <span style="color: #859900; font-weight: bold;">for</span> i  <span style="color: #859900; font-weight: bold;">in</span> 1:size(u,1)
          k = 1
           <span style="color: #859900; font-weight: bold;">while</span> (k < size(u,3)) && (u[i,j,k] >= Φi(v[k]))
              k += 1
           <span style="color: #859900; font-weight: bold;">end</span>
          z[i,j] = k
       <span style="color: #859900; font-weight: bold;">end</span>
   <span style="color: #859900; font-weight: bold;">end</span>
  z
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
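Note that we defined the function  <code>Φi(v)</code>, which is the inverse of the standard normal cumulative distribution function. Comparing a field value against  <code>Φi(v[k])</code> is equivalent to applying the CDF to the field value and comparing the result against  <code>v[k]</code>.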
</p>
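
 <p>
A small sketch to convince ourselves of that equivalence, assuming the SpecialFunctions package supplies  <code>erfc</code> and  <code>erfcinv</code>. The forward CDF  <code>Φ</code> is a helper added here for the check, and the test values are arbitrary.
</p>

 <div class="org-src-container">
 <pre class="src src-julia">using SpecialFunctions

Φi(v) = -erfcinv(2v) * sqrt(2)     # inverse CDF, as defined above
Φ(z)  = erfc(-z / sqrt(2)) / 2     # forward CDF (helper added for this check)

Φi(0.5)                  # the median of the standard normal, i.e. zero
Φ(Φi(0.3)) ≈ 0.3         # the round trip recovers the probability

# Thresholding the field value directly or thresholding its CDF value
# gives the same decision
u, v = 0.4, 0.7
(u >= Φi(v)) == (Φ(u) >= v)
</pre>
</div>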

 <p>
We can wrap all of this together in a function to generate a map
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">generate_map</span>(α,β,k0,ν,Nx,Ny,K)
    v = sample_py_fractions(α,β,K)
    u = generate_field(k0,ν,Nx,Ny,K)
    classify_fields(u,v)
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
and make some maps
</p>

 <div class="org-src-container">
 <pre class="src src-julia">k0 = [0.01;0.05;0.1]
ν  = [1.0;3.0;7.0]

maps = [generate_map(1.0,0.0,k0,ν,256,256,32)  <span style="color: #859900; font-weight: bold;">for</span> k0 in k0, ν  <span style="color: #859900; font-weight: bold;">in</span> ν]

fig = Figure()
 <span style="color: #859900; font-weight: bold;">for</span> j  <span style="color: #859900; font-weight: bold;">in</span> 1:3
     <span style="color: #859900; font-weight: bold;">for</span> i  <span style="color: #859900; font-weight: bold;">in</span> 1:3
        ax = Axis(fig[i,j],title= <span style="color: #2aa198;">"k0 = $(k0[i]), ν = $(ν[j])"</span>)
        heatmap!(ax,maps[i,j],colormap= <span style="color: #268bd2; font-weight: bold;">:Set3_12</span>)
        hidedecorations!(ax)
     <span style="color: #859900; font-weight: bold;">end</span>
 <span style="color: #859900; font-weight: bold;">end</span>

save( <span style="color: #2aa198;">"patchwork-maps.png"</span>,fig)
</pre>
</div>


 <figure id="org8662313"> <img src="../images/patchwork-maps.png" alt="Maps made using spatially dependent Pitman-Yor processes, varying the length scale and smoothness of the underlying Gaussian process. Each map uses the Pitman-Yor process with alpha = 1 and beta=0, which is equivalent to a Dirichlet process with concentration parameter alpha."></img></figure></div></article></main>]]></content>
  <link href="https://www.wskearney.com/posts/patchwork-kingdom.html"/>
  <id>https://www.wskearney.com/posts/patchwork-kingdom.html</id>
  <updated>2023-06-30T00:00:00+02:00</updated>
</entry>
<entry>
  <title>Smoothing splines as a stochastic process</title>
  <author><name>William Kearney</name></author>
  <content type="html"><![CDATA[<main id="content" class="content"> <article class="h-entry"> <h1 class="p-name title">Smoothing splines as a stochastic process</h1>
 <p> <span class="byline">Published on  <a class="u-url u-uid" href="https://www.wskearney.com/posts/smoothing_splines.html"> <time class="dt-published date" datetime="2023-04-26T00:00:00+0200">2023-04-26</time></a> by  <a class="p-author h-card" href="https://www.wskearney.com">William Kearney</a></span></p>
 <p> <span class="tags">Tags: 
 <a href="/tags/statistics.html" rel="tag" class="p-category">statistics</a>
 <a href="/tags/splines.html" rel="tag" class="p-category">splines</a>
 <a href="/tags/state-space-models.html" rel="tag" class="p-category">state-space-models</a>
</span></p> <div class="e-content">
 <p>
The (cubic)  <a href="https://en.wikipedia.org/wiki/Smoothing_spline">smoothing spline</a> is usually introduced as a minimization problem: find the function \(f\) that minimizes the following functional
</p>

 <p>
\[
J(f) = \sum_{i=1}^N (y_i - f(x_i))^2 + \lambda\int f''(x)^2\,dx
\]
</p>

 <p>
that balances fitting the data in the first term and the smoothness of the chosen function in the second, with the tradeoff determined by the smoothing parameter \(\lambda\). <sup> <a id="fnr.1" class="footref" href="#fn.1" role="doc-backlink">1</a></sup> It turns out that the solution to this minimization is a piecewise cubic polynomial with knots at the data locations. Fitting this function is straightforward but requires some careful application of linear algebra to be efficient (Reinsch).
</p>

 <p>
When the data are measured at regular intervals, such as in many time series applications, the integral can be replaced by the discrete sum
</p>

 <p>
\[
\sum_{t=2}^{N-1} \left(f_{t+1} - 2f_t + f_{t-1}\right)^2,
\]
</p>

 <p>
where \(f_t = f(x_t)\) is the value of the unknown function at each of the data points. Our modified objective function now reads
</p>

 <p>
\[
J(f_{1:N}) = \sum_{t=1}^N (y_t - f_t)^2 + \lambda \sum_{t=2}^{N-1} \left(f_{t+1} - 2f_t + f_{t-1}\right)^2
\]
</p>

 <p>
and one way to interpret this optimization problem is as minimizing the sum of squares of the residuals, \(w_t\) and \(v_t\) in
</p>

\begin{align}
f_{t+1} &= 2f_t - f_{t-1} + w_t \\
y_t &= f_t + v_t
\end{align}

 <p>
In fact, this pair of equations defines a  <a href="https://en.wikipedia.org/wiki/State-space_representation">state-space model</a>, which means we can use  <a href="https://en.wikipedia.org/wiki/Kalman_filter">Kalman filtering and smoothing</a> techniques <sup> <a id="fnr.2" class="footref" href="#fn.2" role="doc-backlink">2</a></sup> <sup>, </sup> <sup> <a id="fnr.3" class="footref" href="#fn.3" role="doc-backlink">3</a></sup> to estimate the posterior mean \(E[f_{1:N}|y_{1:N}]\), which turns out to coincide with the smoothing spline solution. The one tricky point is that standard Kalman smoother implementations require you to specify an initial distribution for \(f_{-1:0}\). The Kalman smoother and the smoothing spline only coincide when this prior distribution becomes very diffuse. This is because the smoothing spline is technically equivalent to an "improper" Bayesian prior on \(f_{1:N}\). <sup> <a id="fnr.1.1" class="footref" href="#fn.1" role="doc-backlink">1</a></sup> <sup>, </sup> <sup> <a id="fnr.4" class="footref" href="#fn.4" role="doc-backlink">4</a></sup></p>
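
 <p>
For short series, the discrete objective can also be minimized directly, which gives a useful reference to compare a Kalman smoother against. Writing the second differences as a matrix \(D\), setting the gradient of the objective to zero gives the linear system \((I + \lambda D^T D) f = y\). The following is a minimal sketch using a dense matrix for simplicity. The synthetic data and the value of \(\lambda\) are arbitrary choices for illustration.
</p>

 <div class="org-src-container">
 <pre class="src src-julia">using LinearAlgebra, Random

Random.seed!(1)
N = 200
x = range(0, 1, length=N)
y = sin.(2π .* x) .+ 0.3 .* randn(N)   # synthetic noisy data (illustrative)
λ = 50.0                               # arbitrary smoothing parameter

# Second-difference operator: row t computes f[t] - 2f[t+1] + f[t+2]
D = zeros(N-2, N)
for t in 1:N-2
    D[t, t]   = 1.0
    D[t, t+1] = -2.0
    D[t, t+2] = 1.0
end

# Minimizer of sum(abs2, y - f) + λ * sum(abs2, D*f)
f = (I + λ * D'D) \ y
</pre>
</div>

 <p>
Because the penalty only involves second differences, a straight line is not penalized at all, so linear data pass through this smoother unchanged.
</p>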

 <p>
Reframing the smoothing spline algorithm as a generative model for some data makes it possible to build smoothing splines into more complex models. One could, for example, combine the smoothing spline prior with a non-Gaussian likelihood to model count or classification data.
</p>
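
 <p>
To make the generative reading concrete, here is a small sketch that simulates from the state-space model. The function name  <code>simulate_rw2</code> and the standard deviations  <code>σw</code> and  <code>σv</code> of \(w_t\) and \(v_t\) are names introduced for this illustration, and the first two states are fixed rather than drawn from a diffuse prior. With Gaussian noise, weighting the two sums of squares by their variances suggests that the smoothing parameter plays the role of \(\lambda = \sigma_v^2/\sigma_w^2\).
</p>

 <div class="org-src-container">
 <pre class="src src-julia">using Random

# Simulate f[t+1] = 2f[t] - f[t-1] + w[t] and y[t] = f[t] + v[t]
function simulate_rw2(N, σw, σv; f1=0.0, f2=0.0)
    f = zeros(N)
    f[1], f[2] = f1, f2          # fixed initial states (an assumption)
    for t in 2:N-1
        f[t+1] = 2f[t] - f[t-1] + σw * randn()
    end
    y = f .+ σv .* randn(N)      # noisy observations of the latent function
    f, y
end

Random.seed!(2)
f, y = simulate_rw2(200, 0.01, 0.5)
</pre>
</div>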
 <div id="footnotes">
 <h2 class="footnotes">Footnotes: </h2>
 <div id="text-footnotes">

 <div class="footdef"> <sup> <a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink">1</a></sup> <div class="footpara" role="doc-footnote"> <p class="footpara">
Wahba (1978). "Improper priors, spline smoothing, and the problem of guarding against model errors in regression."   <i>Journal of the Royal Statistical Society B.</i>, Vol. 40, No. 3.  <a href="https://doi.org/10.1111/j.2517-6161.1978.tb01050.x">https://doi.org/10.1111/j.2517-6161.1978.tb01050.x</a>
</p></div></div>

 <div class="footdef"> <sup> <a id="fn.2" class="footnum" href="#fnr.2" role="doc-backlink">2</a></sup> <div class="footpara" role="doc-footnote"> <p class="footpara">
Kohn and Ansley (1987). "A new algorithm for spline smoothing based on smoothing a stochastic process."  <i>SIAM Journal on Scientific and Statistical Computing</i>, Vol. 8, No. 1.  <a href="https://doi.org/10.1137/0908004">https://doi.org/10.1137/0908004</a>
</p></div></div>

 <div class="footdef"> <sup> <a id="fn.3" class="footnum" href="#fnr.3" role="doc-backlink">3</a></sup> <div class="footpara" role="doc-footnote"> <p class="footpara">
Shumway and Stoffer (2017).  <i>Time Series Analysis and Its Applications</i>.  <a href="https://doi.org/10.1007/978-3-319-52452-8">https://doi.org/10.1007/978-3-319-52452-8</a>
</p></div></div>

 <div class="footdef"> <sup> <a id="fn.4" class="footnum" href="#fnr.4" role="doc-backlink">4</a></sup> <div class="footpara" role="doc-footnote"> <p class="footpara">
Lindgren and Rue (2008). "On the second-order random walk model for irregular locations."  <i>Scandinavian Journal of Statistics</i>, Vol. 35.  <a href="https://doi.org/10.1111/j.1467-9469.2008.00610.x">https://doi.org/10.1111/j.1467-9469.2008.00610.x</a>
</p></div></div>


</div>
</div></div></article></main>]]></content>
  <link href="https://www.wskearney.com/posts/smoothing_splines.html"/>
  <id>https://www.wskearney.com/posts/smoothing_splines.html</id>
  <updated>2023-04-26T00:00:00+02:00</updated>
</entry>
<entry>
  <title>Probabilistic programming in about 100 lines of Julia</title>
  <author><name>William Kearney</name></author>
  <content type="html"><![CDATA[<main id="content" class="content"> <article class="h-entry"> <h1 class="p-name title">Probabilistic programming in about 100 lines of Julia</h1>
 <p> <span class="byline">Published on  <a class="u-url u-uid" href="https://www.wskearney.com/posts/ppl.html"> <time class="dt-published date" datetime="2022-12-08T00:00:00+0100">2022-12-08</time></a> by  <a class="p-author h-card" href="https://www.wskearney.com">William Kearney</a></span></p>
 <p> <span class="tags">Tags: 
 <a href="/tags/statistics.html" rel="tag" class="p-category">statistics</a>
 <a href="/tags/ppl.html" rel="tag" class="p-category">ppl</a>
</span></p> <div class="e-content">
 <p>
In  <a href="https://www.wskearney.com/posts/geographical-imagination-systems.html">my last post</a> I speculated on the usefulness of probabilistic programming in geographic information systems (GIS). While I have played with some probabilistic programming languages (PPLs) like  <a href="https://turing.ml/stable/">Turing</a>, I mostly do statistical inference using my own code, specialized for the particular models I am trying to build. I wanted to learn more about how PPLs work to start thinking harder about how one might build a GIS around one. It turns out that it is not that hard to get a very rudimentary PPL up and running, so I thought I would share how I did that in (more or less) 100 lines of Julia code.
</p>
 <section id="outline-container-orgaeb0012" class="outline-2"> <h2 id="orgaeb0012">Getting started</h2>
 <div class="outline-text-2" id="text-orgaeb0012">
 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">using</span> Distributions, LinearAlgebra, Statistics, CairoMakie, Random
Random.seed!(54332187)
</pre>
</div>

 <p>
The one dependency for our PPL is the  <a href="https://github.com/JuliaStats/Distributions.jl">Distributions</a> package, which provides a standard interface for working with probability distributions. This is not strictly necessary, but it makes our lives a little easier. Otherwise we would have to write routines for sampling from and computing probability densities for every basic distribution that we want to use in our models. The LinearAlgebra and Statistics standard library modules just provide some functions that will be useful for analyzing our results, and  <a href="https://docs.makie.org/stable/">CairoMakie</a> is there to make plots, but none of those are crucial for the PPL implementation. You really can do this entirely in bare Julia.
</p>
</div>
</section> <section id="outline-container-org45e98e7" class="outline-2"> <h2 id="org45e98e7">Wrapping distributions with continuations</h2>
 <div class="outline-text-2" id="text-org45e98e7">
 <p>
A strategy for implementing a really simple PPL is to write the program in  <a href="https://en.wikipedia.org/wiki/Continuation-passing_style">continuation-passing style</a> (CPS)  <sup> <a id="fnr.1" class="footref" href="#fn.1" role="doc-backlink">1</a></sup>. We augment each probability distribution with a function that has a single argument, the result of sampling from the probability distribution, and returns another distribution, the next distribution in our probabilistic program. If this is a little confusing, it makes more sense in code. First we define a new type for our CPS distributions:
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">struct</span>  <span style="color: #b58900;">LatentDistribution</span>
    f
    s
    d
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
where  <code>f</code> is the continuation function,  <code>s</code> will be a symbol that we use to name each random variable, and  <code>d</code> is the Distribution of the random variable.
</p>

 <p>
The simple probabilistic model \(X \sim \mathcal{N}(0,1)\) now gets written using our  <code>LatentDistribution</code>
</p>

 <div class="org-src-container">
 <pre class="src src-julia">model = LatentDistribution(X-> <span style="color: #268bd2; font-weight: bold;">nothing</span>, <span style="color: #268bd2; font-weight: bold;">:X</span>,Normal(0,1))
</pre>
</div>

 <p>
Because we only have one variable in our probabilistic model, the continuation function takes the random variable X and outputs  <code>nothing</code>, which we will use as a sentinel value to denote the end of our probabilistic program. When you have to write increasingly complex continuations, Julia provides a convenient syntax for passing anonymous functions as the first argument to other functions, the  <code>do</code> notation:
</p>

 <div class="org-src-container">
 <pre class="src src-julia">model = LatentDistribution( <span style="color: #268bd2; font-weight: bold;">:X</span>,Normal(0,1))  <span style="color: #859900; font-weight: bold;">do</span> X
     <span style="color: #268bd2; font-weight: bold;">nothing</span>
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
This is identical to the previous model, but slightly easier to read. The benefits become even more clear when we consider a hierarchical model like
</p>

\begin{align}
X &\sim \mathcal{N}(0,1) \\
Y | X &\sim \mathcal{N}(X,1).
\end{align}

 <p>
This can be written as
</p>

 <div class="org-src-container">
 <pre class="src src-julia">model = LatentDistribution( <span style="color: #268bd2; font-weight: bold;">:X</span>,Normal(0,1))  <span style="color: #859900; font-weight: bold;">do</span> X
    LatentDistribution( <span style="color: #268bd2; font-weight: bold;">:Y</span>,Normal(X,1))  <span style="color: #859900; font-weight: bold;">do</span> Y
         <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">We don't need to explicitly return nothing
</span>     <span style="color: #859900; font-weight: bold;">end</span>
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
Continuation-passing style lets us create an environment in which downstream  <code>LatentDistributions</code> know about earlier ones. The distribution for  <code>Y</code> can use  <code>X</code> as a parameter, because  <code>X</code> is passed as an argument to the function that creates  <code>Y</code>.
</p>
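
 <p>
We can see this by calling the continuation by hand. In the sketch below, the struct and model are repeated so that the block runs on its own, and the value 2.0 passed in for  <code>X</code> is an arbitrary choice.
</p>

 <div class="org-src-container">
 <pre class="src src-julia">using Distributions

struct LatentDistribution
    f
    s
    d
end

model = LatentDistribution(:X, Normal(0,1)) do X
    LatentDistribution(:Y, Normal(X,1)) do Y
    end
end

next = model.f(2.0)   # call the continuation with a hand-picked value for X
next.s                # :Y
mean(next.d)          # 2.0, since the distribution of Y is centered on X
next.f(0.0)           # nothing, which marks the end of the program
</pre>
</div>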
</div>
</section> <section id="outline-container-org20d2555" class="outline-2"> <h2 id="org20d2555">Sampling from the probabilistic program</h2>
 <div class="outline-text-2" id="text-org20d2555">
 <p>
The  <code>LatentDistribution</code> objects that we have strung together with continuations don't do anything by themselves. They are, in a sense, a probabilistic program waiting to be run by an interpreter that we have yet to write. We can actually write many different interpreters, depending on the modeling task we want to accomplish. The most basic thing we want to do, however, is to draw random samples from the distribution defined by our probabilistic program.
</p>

 <p>
For the two-level example, we need to do three things
</p>

 <ol class="org-ol"> <li>Sample  <code>X</code> from  <code>Normal(0,1)</code></li>
 <li>Construct the distribution of  <code>Y</code> given the just-sampled value of  <code>X</code>,  <code>Normal(X,1)</code>.</li>
 <li>Sample  <code>Y</code> from that distribution</li>
</ol> <p>
Sampling  <code>X</code> is easy: the desired distribution is stored as the  <code>d</code> field of  <code>model</code>, so we can call  <code>X = rand(model.d)</code>. But how do we use this in our probabilistic program? Our continuations come to the rescue. Remember that  <code>model.f</code> is the continuation that takes  <code>X</code> as a parameter and returns the  <code>LatentDistribution</code> for  <code>Y</code>. So to sample  <code>Y</code>, we would do  <code>rand(model.f(rand(model.d)).d)</code>, first constructing the  <code>LatentDistribution</code> using the continuation and then sampling from the defined distribution.
</p>

 <p>
Of course, in a more complicated probabilistic program,  <code>Y</code> would itself be used to define other random variables, so you will want to call its continuation, and so on until you hit a variable whose continuation returns  <code>nothing</code>. This calls for some recursion. We define a method
</p>
 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">draw</span>(d:: <span style="color: #b58900;">LatentDistribution</span>)
     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Sample from the given distribution
</span>    x = rand(d.d)

     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Call the continuation with the sampled value
</span>     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">and draw from that distribution
</span>    draw(d.f(x)) 
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
We also need a method for when we hit  <code>nothing</code>
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #268bd2;">draw</span>(:: <span style="color: #b58900;">Nothing</span>) =  <span style="color: #268bd2; font-weight: bold;">nothing</span>
</pre>
</div>

 <p>
We have a problem, though. If you run  <code>draw(model)</code> using the model defined above, you will find that it returns  <code>nothing</code>. We need to save the random variables that we have sampled. We can do this using a named tuple that associates the symbol of each  <code>LatentDistribution</code> ( <code>model.s</code>) with its sampled value.
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">draw</span>(d:: <span style="color: #b58900;">LatentDistribution</span>)
     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Sample from the given distribution
</span>    x = rand(d.d)

    (;d.s => x,  <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Store the sampled value
</span>     draw(d.f(x))...)  <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Recurse
</span> <span style="color: #859900; font-weight: bold;">end</span>

 <span style="color: #268bd2;">draw</span>(:: <span style="color: #b58900;">Nothing</span>) = (;)  <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Return an empty named tuple</span>
</pre>
</div>

 <p>
And now, if we run  <code>draw(model)</code>, we get something like
</p>

 <div class="org-src-container">
 <pre class="src src-julia">(x = -0.2817850808916265, y = 0.6312531437930013)
</pre>
</div>
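
 <p>
One way to check that the interpreter samples from the right joint distribution is to look at the moments of many draws. For the two-level model, \(Y\) is the sum of two independent standard normal variables, so its marginal variance should be close to 2. The sketch below repeats the definitions so that it runs on its own. The seed and the number of draws are arbitrary.
</p>

 <div class="org-src-container">
 <pre class="src src-julia">using Distributions, Statistics, Random

struct LatentDistribution
    f
    s
    d
end

function draw(d::LatentDistribution)
    x = rand(d.d)
    (; d.s => x, draw(d.f(x))...)
end
draw(::Nothing) = (;)

model = LatentDistribution(:X, Normal(0,1)) do X
    LatentDistribution(:Y, Normal(X,1)) do Y
    end
end

Random.seed!(1)
samples = [draw(model) for _ in 1:50_000]
xs = [s.X for s in samples]
ys = [s.Y for s in samples]

var(xs)   # close to 1
var(ys)   # close to 2
</pre>
</div>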
</div>
</section> <section id="outline-container-orgd3f9398" class="outline-2"> <h2 id="orgd3f9398">Computing the probability</h2>
 <div class="outline-text-2" id="text-orgd3f9398">
 <p>
The next thing we'll need to do is to compute the (log) probability density for a given value sampled from the probabilistic program. For a basic  <code>d::Distribution</code>, we do this with  <code>logpdf(d,x)</code>. For our probabilistic program, we recurse again, combining log probabilities by adding them:
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span> Distributions. <span style="color: #268bd2;">logpdf</span>(d:: <span style="color: #b58900;">LatentDistribution</span>,θ)
     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Extract the variable corresponding to the current
</span>     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">distribution
</span>    x = θ[d.s]

     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Compute the logpdf of the current variable
</span>    logpdf(d.d,x) +
         <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Recurse
</span>        logpdf(d.f(x),θ)
 <span style="color: #859900; font-weight: bold;">end</span>
Distributions. <span style="color: #268bd2;">logpdf</span>(:: <span style="color: #b58900;">Nothing</span>,θ) = 0  <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Start accumulating probability from 0</span>
</pre>
</div>

 <p>
This is exactly the same structure as our  <code>draw</code> function, except
</p>

 <ol class="org-ol"> <li>We call  <code>logpdf</code> rather than  <code>rand</code>.</li>
 <li>We initialize the recursion with 0 rather than an empty named tuple.</li>
 <li>We combine the log probabilities by adding rather than concatenating.</li>
</ol></div>
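
 <p>
As a check, the log density of the two-level model at a given point should equal the sum of the two normal log densities computed by hand. This sketch repeats the definitions so that it runs on its own, and the evaluation point is arbitrary.
</p>

 <div class="org-src-container">
 <pre class="src src-julia">using Distributions

struct LatentDistribution
    f
    s
    d
end

function Distributions.logpdf(d::LatentDistribution, θ)
    x = θ[d.s]
    logpdf(d.d, x) + logpdf(d.f(x), θ)
end
Distributions.logpdf(::Nothing, θ) = 0

model = LatentDistribution(:X, Normal(0,1)) do X
    LatentDistribution(:Y, Normal(X,1)) do Y
    end
end

θ = (X = 0.5, Y = 1.0)   # an arbitrary point at which to evaluate the density
lp = logpdf(model, θ)

# The same quantity computed term by term
byhand = logpdf(Normal(0,1), 0.5) + logpdf(Normal(0.5,1), 1.0)
lp ≈ byhand
</pre>
</div>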
</section> <section id="outline-container-orge85b842" class="outline-2"> <h2 id="orge85b842">Conditioning on observations</h2>
 <div class="outline-text-2" id="text-orge85b842">
 <p>
The final thing we want to do is statistical inference, estimating the latent variables given the values of observed random variables. We can do this with a new type representing observed variables
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">struct</span>  <span style="color: #b58900;">ObservedDistribution</span>
    f
    s
    d
    y
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
which is identical to  <code>LatentDistribution</code>, except it has a field  <code>y</code> that gives the value of the observation. We need to implement our sampling and log probability interpreters for  <code>ObservedDistribution</code>
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">draw</span>(d:: <span style="color: #b58900;">ObservedDistribution</span>)
    y = d.y
    (;d.s=>y,draw(d.f(y))...)
 <span style="color: #859900; font-weight: bold;">end</span>
 <span style="color: #859900; font-weight: bold;">function</span> Distributions. <span style="color: #268bd2;">logpdf</span>(d:: <span style="color: #b58900;">ObservedDistribution</span>,θ)
    loglikelihood(d.d,d.y) + logpdf(d.f(d.y),θ)
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
For sampling, we just return the observed value, while for log probability, we use the  <code>loglikelihood</code> function from  <code>Distributions</code>. This is just like the  <code>logpdf</code> function, but computes the log probability for multiple independent and identically distributed observations, which is convenient.
</p>
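
 <p>
A short sketch of that equivalence, using an arbitrary distribution and a few made-up observations:
</p>

 <div class="org-src-container">
 <pre class="src src-julia">using Distributions

d = Normal(0.3, 1.0)          # arbitrary distribution for illustration
y = [1.0, -0.2, 0.3]          # made-up observations

ll = loglikelihood(d, y)                    # one call for the whole vector
byhand = sum(logpdf(d, yi) for yi in y)     # sum of the individual log densities
ll ≈ byhand
</pre>
</div>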

 <p>
Now we can write a model like
</p>

 <div class="org-src-container">
 <pre class="src src-julia">model = LatentDistribution( <span style="color: #268bd2; font-weight: bold;">:X</span>,Normal(0,1.0))  <span style="color: #859900; font-weight: bold;">do</span> X
    ObservedDistribution( <span style="color: #268bd2; font-weight: bold;">:Y</span>,Normal(X,1.0),[1.0;-0.2;0.3])  <span style="color: #859900; font-weight: bold;">do</span> Y
     <span style="color: #859900; font-weight: bold;">end</span>
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
and sampling and log probability calculations will work.
</p>
</div>
</section> <section id="outline-container-org4fef8bd" class="outline-2"> <h2 id="org4fef8bd">Markov chain Monte Carlo sampling</h2>
 <div class="outline-text-2" id="text-org4fef8bd">
 <p>
There are many ways to approach inference in probabilistic programs, but we will focus on sampling from the posterior using Markov chain Monte Carlo sampling. Gibbs sampling samples each random variable in turn from its conditional distribution given all of the other variables. Sampling from these conditionals directly is only possible for certain models, so we will instead sample from a different distribution, the proposal distribution, and then use a Metropolis-Hastings accept/reject step to correct for the fact that the proposal is not necessarily the appropriate conditional distribution. This Metropolis-within-Gibbs sampling is fairly flexible, easy to implement, and lets us design efficient proposals for different parts of our model. The downside is that the proposal design is challenging to automate, so you'll need to do it by hand in our tiny PPL.
</p>

 <p>
First, we will store our proposal distributions in a  <code>Dict</code> from the symbols of each random variable to a function that takes a parameter value and returns a  <code>Distribution</code>:
</p>

 <div class="org-src-container">
 <pre class="src src-julia">proposals = Dict( <span style="color: #268bd2; font-weight: bold;">:X</span> => θ -> Normal(θ.X,0.01))
</pre>
</div>

 <p>
This way the proposal distributions can depend on the current value of all of the sampled variables. For example, the true conditional probability distribution for our two-level normal model can be found analytically
</p>

 <div class="org-src-container">
 <pre class="src src-julia">proposals = Dict( <span style="color: #268bd2; font-weight: bold;">:X</span> => θ -> Normal(1/(1 + length(θ.Y)) * sum(θ.Y),inv(sqrt(1 + length(θ.Y)))))
</pre>
</div>
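
 <p>
For reference, here is a sketch of where that conditional comes from. With \(n\) observations \(Y_1,\ldots,Y_n\), we multiply the prior by the likelihood and complete the square in \(X\):
</p>

\begin{align}
p(X | Y) &\propto \exp\left(-\frac{X^2}{2}\right) \prod_{i=1}^n \exp\left(-\frac{(Y_i - X)^2}{2}\right) \\
&\propto \exp\left(-\frac{1}{2}\left[(1+n)X^2 - 2X\sum_{i=1}^n Y_i\right]\right) \\
X | Y &\sim \mathcal{N}\left(\frac{1}{1+n}\sum_{i=1}^n Y_i,\ \frac{1}{1+n}\right)
\end{align}

 <p>
The second argument of  <code>Normal</code> is a standard deviation, which is why the code uses  <code>inv(sqrt(1 + length(θ.Y)))</code>.
</p>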

 <p>
Now our Gibbs sampling function will take a probabilistic program, the current value of the parameters, and the  <code>proposals</code>  <code>Dict</code>. For  <code>nothing</code> and  <code>ObservedDistribution</code>, we don't need to sample anything, so we just return the current parameters, and recurse if we need to.
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #268bd2;">gibbs</span>(:: <span style="color: #b58900;">Nothing</span>,θ,proposals) = θ
 <span style="color: #268bd2;">gibbs</span>(d:: <span style="color: #b58900;">ObservedDistribution</span>,θ,proposals) = gibbs(d.f(d.y),θ,proposals)
</pre>
</div>

 <p>
For the  <code>LatentDistribution</code>, we need to implement the Metropolis-Hastings transition kernel
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">gibbs</span>(d:: <span style="color: #b58900;">LatentDistribution</span>,θ,proposals)
     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Extract the current variable
</span>    x = θ[d.s]

     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Construct the proposal distribution
</span>    q = proposals[d.s](θ)

     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Sample from the proposal distribution
</span>    x′ = rand(q)
    θ′ = (;θ...,d.s=>x′)

     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Construct the reversed proposal
</span>    q′ = proposals[d.s](θ′)

     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Compute the log acceptance ratio
</span>    α = logpdf(d,θ′) + logpdf(q′,x) - logpdf(d,θ) - logpdf(q,x′)

     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Metropolis-Hastings accept/reject step
</span>     <span style="color: #859900; font-weight: bold;">if</span> log(rand()) < α
         <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Accept the proposal
</span>         <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">and recurse
</span>         <span style="color: #859900; font-weight: bold;">return</span> gibbs(d.f(x′),θ′,proposals)
     <span style="color: #859900; font-weight: bold;">else</span>
         <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Reject the proposal
</span>         <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">and recurse
</span>         <span style="color: #859900; font-weight: bold;">return</span> gibbs(d.f(x),θ,proposals)
     <span style="color: #859900; font-weight: bold;">end</span>
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
There is one trick here that works even though it is technically wrong. When we call  <code>logpdf(d,θ′)</code> and  <code>logpdf(d,θ)</code>, we only compute the log probability for the current variable and the variables below it in the chain of continuations. This is okay because the log probability of the variables above it can't depend on the current variable. Otherwise we couldn't write the probabilistic program. Since only the current variable changes under the proposal, the log probability of the variables that don't depend on it is just a constant that cancels out in the acceptance ratio, so this works.
</p>
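
 <p>
To check that the kernel targets the right distribution, we can run it on the two-level normal model with a simple random-walk proposal and compare the samples against the analytic posterior. The sketch below repeats the definitions in compact form so that it runs on its own. The proposal step size, the seed, the starting point and the number of steps are arbitrary choices.
</p>

 <div class="org-src-container">
 <pre class="src src-julia">using Distributions, Statistics, Random

struct LatentDistribution
    f
    s
    d
end
struct ObservedDistribution
    f
    s
    d
    y
end

Distributions.logpdf(::Nothing, θ) = 0
Distributions.logpdf(d::LatentDistribution, θ) =
    logpdf(d.d, θ[d.s]) + logpdf(d.f(θ[d.s]), θ)
Distributions.logpdf(d::ObservedDistribution, θ) =
    loglikelihood(d.d, d.y) + logpdf(d.f(d.y), θ)

gibbs(::Nothing, θ, proposals) = θ
gibbs(d::ObservedDistribution, θ, proposals) = gibbs(d.f(d.y), θ, proposals)
function gibbs(d::LatentDistribution, θ, proposals)
    x = θ[d.s]
    q = proposals[d.s](θ)                 # forward proposal
    x_new = rand(q)
    θ_new = (; θ..., d.s => x_new)
    q_new = proposals[d.s](θ_new)         # reversed proposal
    α = logpdf(d, θ_new) + logpdf(q_new, x) - logpdf(d, θ) - logpdf(q, x_new)
    if log(rand()) < α
        gibbs(d.f(x_new), θ_new, proposals)   # accept
    else
        gibbs(d.f(x), θ, proposals)           # reject
    end
end

y = [1.0; -0.2; 0.3]
model = LatentDistribution(:X, Normal(0, 1.0)) do X
    ObservedDistribution(:Y, Normal(X, 1.0), y) do Y
    end
end

# Random-walk proposal with an arbitrary step size of 1.0
proposals = Dict(:X => θ -> Normal(θ.X, 1.0))

Random.seed!(3)
θ0 = (X = 0.0, Y = y)                     # hand-picked starting point
θs = accumulate((θ, i) -> gibbs(model, θ, proposals), 1:50_000, init=θ0)
xs = [θ.X for θ in θs]

mean(xs), sum(y) / (1 + length(y))        # sample mean vs analytic posterior mean
std(xs), inv(sqrt(1 + length(y)))         # sample sd vs analytic posterior sd
</pre>
</div>

 <p>
The two numbers in each pair should agree up to Monte Carlo error.
</p>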
</div>
</section> <section id="outline-container-org10c45d7" class="outline-2"> <h2 id="org10c45d7">Example</h2>
 <div class="outline-text-2" id="text-org10c45d7">
 <p>
As an example, we will fit the following Bayesian linear regression to some synthetic data
</p>

\begin{align}
\beta &\sim \mathcal{N}(0,I) \\
\tau &\sim \Gamma(2,1) \\
Y | X,\beta,\tau &\sim \mathcal{N}(X\beta,\tau^{-1}I)
\end{align}

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Generate some synthetic data
</span>N = 100
x = range(-1,1,length=N)
X = [one.(x) x]
β0 = [1.0;-1.0]
σ0 = 1.0

Y = X * β0 .+ σ0 * randn(N)

 <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Define the model
</span>model = LatentDistribution( <span style="color: #268bd2; font-weight: bold;">:β</span>,MvNormal(Diagonal(ones(2))))  <span style="color: #859900; font-weight: bold;">do</span> β
    LatentDistribution( <span style="color: #268bd2; font-weight: bold;">:τ</span>,Gamma(2,1))  <span style="color: #859900; font-weight: bold;">do</span> τ
        ObservedDistribution( <span style="color: #268bd2; font-weight: bold;">:Y</span>,MvNormal(X*β,inv(sqrt(τ))),Y)  <span style="color: #859900; font-weight: bold;">do</span> Y
         <span style="color: #859900; font-weight: bold;">end</span>
     <span style="color: #859900; font-weight: bold;">end</span>
 <span style="color: #859900; font-weight: bold;">end</span>

 <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Define the proposal distributions
</span>proposals = Dict( <span style="color: #268bd2; font-weight: bold;">:β</span> => θ -> MvNormalCanon(θ.τ*X'θ.Y,θ.τ*X'X+I),
                  <span style="color: #268bd2; font-weight: bold;">:τ</span> => θ -> Gamma(2 + length(Y)/2,inv(1 + sum(abs2,θ.Y .- X*θ.β)/2)))

 <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Draw an initial value from the model
</span>θ0 = draw(model)

 <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Run 100000 Gibbs steps
</span>θs = accumulate((θ,i)->gibbs(model,θ,proposals),1:100000,init=θ0)

βs = mapreduce(x->x.β,hcat,θs)
τs = map(x->x.τ,θs)

 <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Plot the results
</span>fig = Figure()
ax1 = Axis(fig[1,1],xlabel= <span style="color: #2aa198;">"x"</span>,ylabel= <span style="color: #2aa198;">"Y"</span>)
scatter!(ax1,x,Y)
ax2 = Axis(fig[2,1],xlabel= <span style="color: #2aa198;">"β"</span>)
density!(ax2,βs[1,:])
density!(ax2,βs[2,:])
vlines!(ax2,[β0[1]])
vlines!(ax2,[β0[2]])
ax3 = Axis(fig[3,1],xlabel= <span style="color: #2aa198;">"τ"</span>)
density!(ax3,τs)
vlines!(ax3,[inv(σ0^2)])
save( <span style="color: #2aa198;">"linear_regression.png"</span>,fig)
</pre>
</div>


 <figure id="orga052707"> <img src="../images/linear_regression.png" alt="linear_regression.png"></img></figure></div>
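
 <p>
As a cross-check on the two proposal distributions, the same conditionals can be sampled directly without the PPL machinery. The following is only a sketch using the standard library:  <code>rand_canonical_normal</code>,  <code>rand_gamma</code> and  <code>direct_gibbs</code> are hypothetical helper names that do not appear in the code above, and the Gamma draw assumes an even number of observations so that the shape parameter is an integer.
</p>

 <div class="org-src-container">
 <pre class="src src-julia">using LinearAlgebra, Random, Statistics

# Draw from a Gaussian in canonical form: precision matrix J, potential vector h
function rand_canonical_normal(rng, h, J)
    C = cholesky(Symmetric(J))
    # mean J \ h plus noise with covariance inv(J)
    (C \ h) .+ (C.U \ randn(rng, length(h)))
end

# Gamma(shape, scale) for an integer shape, as a scaled sum of exponentials
rand_gamma(rng, shape::Integer, scale) = scale * sum(randexp(rng, shape))

function direct_gibbs(rng, X, Y, nsteps)
    N = length(Y)  # assumed even, so that 2 + N/2 is an integer
    β, τ = zeros(size(X, 2)), 1.0
    βs, τs = zeros(size(X, 2), nsteps), zeros(nsteps)
    for i in 1:nsteps
        # Same conditionals as the two proposals in the example
        β = rand_canonical_normal(rng, τ * (X' * Y), τ * (X' * X) + I)
        τ = rand_gamma(rng, 2 + N ÷ 2, inv(1 + sum(abs2, Y .- X * β) / 2))
        βs[:, i] = β
        τs[i] = τ
    end
    βs, τs
end

rng = MersenneTwister(1)
x = range(-1, 1, length=100)
X = [one.(x) x]
Y = X * [1.0; -1.0] .+ randn(rng, 100)
βs, τs = direct_gibbs(rng, X, Y, 5000)
mean(βs, dims=2), mean(τs)
</pre>
</div>

 <p>
If the proposals are the exact conditionals, the posterior means from this sampler and from the PPL version should agree up to Monte Carlo error.
</p>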
</section> <section id="outline-container-org9b23c05" class="outline-2"> <h2 id="org9b23c05">Conclusion</h2>
 <div class="outline-text-2" id="text-org9b23c05">
 <p>
So there you have a rudimentary probabilistic programming language in only a  <a href="../code/ppl.jl">few lines of Julia</a>. Drawing random samples, computing log probabilities and Metropolis-within-Gibbs sampling from the posterior distribution are all just different interpreters of the same probabilistic program. We could conceivably implement other inference algorithms like Hamiltonian Monte Carlo or variational inference just by walking down the chain of continuations and accumulating the necessary information at each step.
</p>

 <p>
There are many limitations to our tiny PPL. We have to write out the continuations explicitly using  <code>do</code>-block syntax. It doesn't support the stochastic control flow structures that distinguish true probabilistic programs from basic probabilistic models. It probably also doesn't perform very well with complicated models or large data sets.
</p>


 <p>
I learned a lot from the following three references, which I highly recommend if you are interested in the inner workings of PPLs.
</p>

 <ul class="org-ul"> <li>Noah D. Goodman and Andreas Stuhlmüller. The Design and Implementation of Probabilistic Programming Languages.  <a href="http://dippl.org/">http://dippl.org/</a></li>
 <li>Jan-Willem van de Meent et al. An Introduction to Probabilistic Programming.  <a href="https://arxiv.org/abs/1809.10756">https://arxiv.org/abs/1809.10756</a></li>
 <li>Jonathan Law and Darren Wilkinson. Functional probabilistic programming for scalable Bayesian modelling.  <a href="https://arxiv.org/abs/1908.02062">https://arxiv.org/abs/1908.02062</a></li>
</ul></div>
</section> <div id="footnotes">
 <h2 class="footnotes">Footnotes: </h2>
 <div id="text-footnotes">

 <div class="footdef"> <sup> <a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink">1</a></sup> <div class="footpara" role="doc-footnote"> <p class="footpara">This idea comes from Goodman and Stuhlmüller's  <a href="http://dippl.org/">Design and Implementation of Probabilistic Programming Languages</a>.</p></div></div>


</div>
</div></div></article></main>]]></content>
  <link href="https://www.wskearney.com/posts/ppl.html"/>
  <id>https://www.wskearney.com/posts/ppl.html</id>
  <updated>2022-12-08T00:00:00+01:00</updated>
</entry>
<entry>
  <title>Geographical imagination systems</title>
  <author><name>William Kearney</name></author>
  <content type="html"><![CDATA[<main id="content" class="content"> <article class="h-entry"> <h1 class="p-name title">Geographical imagination systems</h1>
 <p> <span class="byline">Published on  <a class="u-url u-uid" href="https://www.wskearney.com/posts/geographical-imagination-systems.html"> <time class="dt-published date" datetime="2022-11-30T19:49:00+0100">2022-11-30</time></a> by  <a class="p-author h-card" href="https://www.wskearney.com">William Kearney</a></span></p>
 <p> <span class="tags">Tags: 
 <a href="/tags/gis.html" rel="tag" class="p-category">gis</a>
</span></p> <div class="e-content">
 <p>
Luke Bergmann and Nick Lally, in their article "For geographical imagination systems" ( <a href="https://doi.org/10.1080/24694452.2020.1750941">paywalled link</a>,  <a href="http://www.nicklally.com/wp-content/uploads/2020/06/bergmannLallyforgis.pdf">pdf</a> via Nick Lally), state that
</p>

 <blockquote>
 <p>
knowledge that was once entwined with particular knowers and communities and contexts is first alienated into separate data layers, then those layers are then reintegrated by GISystems according to common location.
</p>
</blockquote>
 <p>
(Bergmann and Lally, p. 8)
</p>

 <p>
As a result, GIS are built around some absolute coordinate system to which all data must be referenced before we can analyze them or create visualizations. GIS in this paradigm are essentially UI wrappers around libraries like  <a href="https://proj.org/">PROJ</a> and  <a href="https://gdal.org/">GDAL</a> that convert data from different coordinate systems and formats into a common reference frame. Information that doesn't fit neatly into a coordinate either needs to be forced into the GIS or discarded altogether.
</p>

 <p>
Bergmann and Lally present a prototype "geographical imagination system,"  <a href="https://github.com/FoldingSpace/enfolding">enfolding</a>, that challenges this paradigm by building visualizations of geographic information using non-Euclidean distance metrics that highlight relations between phenomena that coexist with traditional spatial relations. I think they are mostly interested in addressing challenges that arise with GIS in human geography, but their articulation of these limitations of GIS illuminates some of the difficulties I have encountered in dealing with geospatial data in the geosciences. Earth observations are noisy, incomplete windows into the world. When we try to build and test models using these data, we need to be aware of how our inferences might be affected by the circumstances of their acquisition and processing. Our GIS generally don't make this easy for us. But they could.
</p>

 <p>
One possibility that I have been thinking about recently is replacing the GIS notion of data layers with a  <a href="https://en.wikipedia.org/wiki/Graphical_model">probabilistic graphical model</a> (PGM) in which data is represented by a collection of vertices that represent the different quantities of interest, including both observed and unobserved, and edges that encode hypothesized probabilistic relationships between quantities. Some of those quantities might be spatial coordinates associated with a particular object, but every quantity need not have a spatial representation. Multiple spatial representations can coexist, with deterministic coordinate transformations replaced by probabilistic mappings. Because everything is framed as a PGM, it could be written in a  <a href="http://dippl.org/">probabilistic programming language</a> to facilitate automatic statistical inference, uncertainty quantification and model validation and comparison.
</p>

 <p>
I would really like GIS software that centers model building as an interpretive tool, and I think this PGM approach has some potential to do that. Different researchers working with the same data may embed them in different models, so that a dataset need not have a single, fixed meaning. Visualization, which often seems to be the central focus of GIS, becomes  <a href="https://arxiv.org/abs/1709.01449">a tool to inspect the performance of models</a> and encompasses a much wider range of data visualizations than cartographic ones. A major challenge, I think, is in designing software that allows users to gradually integrate these modeling ideas into their workflow.
</p>
</div></article></main>]]></content>
  <link href="https://www.wskearney.com/posts/geographical-imagination-systems.html"/>
  <id>https://www.wskearney.com/posts/geographical-imagination-systems.html</id>
  <updated>2022-11-30T00:00:00+01:00</updated>
</entry>
<entry>
  <title>Solving the diffusion equation in a semi-infinite domain with the ultraspherical spectral method</title>
  <author><name>William Kearney</name></author>
  <content type="html"><![CDATA[<main id="content" class="content"> <article class="h-entry"> <h1 class="p-name title">Solving the diffusion equation in a semi-infinite domain with the ultraspherical spectral method</h1>
 <p> <span class="byline">Published on  <a class="u-url u-uid" href="https://www.wskearney.com/posts/ultraspherical-diffusion.html"> <time class="dt-published date" datetime="2022-11-22T18:15:00+0100">2022-11-22</time></a> by  <a class="p-author h-card" href="https://www.wskearney.com">William Kearney</a></span></p>
 <p> <span class="tags">Tags: 
 <a href="/tags/numerics.html" rel="tag" class="p-category">numerics</a>
 <a href="/tags/spectralmethods.html" rel="tag" class="p-category">spectralmethods</a>
</span></p> <div class="e-content">
 <p>
One problem I have been interested in recently is numerical methods for computational fluid dynamics in semi-infinite domains, where you have some kind of boundary at \(z=0\), and the domain extends upwards to infinity. This is quite relevant to bottom boundary layer simulations, which typically impose artificial boundary conditions at some large \(z\) value. If you could simulate the problem on the infinite domain, you could avoid having to worry about whether those artificial boundary conditions are influencing your solution. A model problem that is useful for evaluating numerical methods in these situations is the forced diffusion equation
</p>

 <p>
\[
\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial z^2} + \cos(t)
\]
</p>

 <p>
with a homogeneous boundary condition \(u(z=0,t) = 0\). This problem ("Stokes problem") arises when considering a laminar boundary layer driven by an oscillating pressure gradient, and it is useful because it has an analytical solution
</p>

 <p>
\[
u(z,t) = \sin(t) - e^{-\frac{z}{\sqrt{2}}} \sin\left(t - \frac{z}{\sqrt{2}}\right)
\]
</p>

 <p>
to which we can compare our numerical solutions.
</p>
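
 <p>
A quick way to convince yourself that this expression really solves the problem is to plug it into the equation numerically. The sketch below approximates the derivatives with centred finite differences;  <code>stokes_solution</code> and  <code>pde_residual</code> are names made up for this check, and the step size  <code>h</code> is an arbitrary choice.
</p>

 <div class="org-src-container">
 <pre class="src src-julia"># Analytical solution of the forced diffusion problem
stokes_solution(z, t) = sin(t) - exp(-z / sqrt(2)) * sin(t - z / sqrt(2))

# Residual u_t - u_zz - cos(t), using centred differences with step h
function pde_residual(z, t; h=1e-3)
    u_t = (stokes_solution(z, t + h) - stokes_solution(z, t - h)) / 2h
    u_zz = (stokes_solution(z + h, t) - 2 * stokes_solution(z, t) + stokes_solution(z - h, t)) / h^2
    u_t - u_zz - cos(t)
end

# Largest residual over a small grid of heights and times
maximum(abs(pde_residual(z, t)) for z in 0.5:0.5:5, t in 0:0.3:6)

# The boundary condition at z = 0
stokes_solution(0.0, 1.3)
</pre>
</div>

 <p>
The residual should be small, limited only by the finite-difference error, and the boundary value should vanish.
</p>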

 <p>
To work with the infinite domain, we apply a coordinate transformation ( <a href="http://www-personal.umich.edu/~jpboyd/BOOK_Spectral2000.html">Boyd</a> 2000)
</p>

 <p>
\[
s = \frac{z - 1}{z + 1}
\]
</p>

 <p>
of the interval \(\left[0,\infty\right)\) to \([-1,1)\). This coordinate transformation turns the \(z\) derivative into an \(s\) derivative multiplied by a quadratic polynomial.
</p>

 <p>
\[
\frac{\partial u}{\partial z} = \frac{\left(1 - s\right)^2}{2} \frac{\partial u}{\partial s}
\]
</p>
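
 <p>
For completeness, the quadratic factor follows from the chain rule. Differentiating the map gives
</p>

 <p>
\[
\frac{ds}{dz} = \frac{(z+1) - (z-1)}{(z+1)^2} = \frac{2}{(z+1)^2},
\]
</p>

 <p>
and inverting the map gives \(z = \frac{1+s}{1-s}\), so that \(z + 1 = \frac{2}{1-s}\) and
</p>

 <p>
\[
\frac{ds}{dz} = \frac{2\left(1-s\right)^2}{4} = \frac{\left(1-s\right)^2}{2}.
\]
</p>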

 <p>
The forced diffusion equation in the new coordinates is
</p>

 <p>
\[
\frac{\partial u}{\partial t} = \frac{\left(1 - s\right)^2}{2} \frac{\partial}{\partial s}\left(\frac{\left(1 - s\right)^2}{2} \frac{\partial u}{\partial s}\right) + \cos(t)
\]
</p>

 <p>
with the boundary condition \(u(s=-1,t) = 0\). Since we are now working on the interval \(s \in [-1,1)\), it makes sense to represent \(u\) using an expansion in Chebyshev polynomials.
</p>

 <p>
\[
u(s,t) = \sum_{k=0}^\infty u_k(t)T_k(s)
\]
</p>

 <p>
The diffusion equation becomes
</p>

 <p>
\[
\frac{\partial \mathbf{u}}{\partial t} = L \mathbf{u} + F(t)
\]
</p>

 <p>
where \(\mathbf{u} = [u_0,u_1,\dots]\) is the vector of Chebyshev coefficients and \(L\) is the matrix representing the action of the second derivative operator on the Chebyshev coefficients. We need to supplement this with the boundary condition, \(\sum_{k=0}^{\infty} (-1)^k u_k = 0\).
</p>

 <p>
The matrix \(L\) can be derived from the recurrence relationships for Chebyshev polynomials. It takes the form \(L = GDGD\) where
</p>

 <p>
\[
D = \begin{bmatrix}
0 & 1 & 0 & 3 & 0 & 5 & \dots \\
0 & 0 & 4 & 0 & 8 & 0 & \dots \\
0 & 0 & 0 & 6 & 0 & 10 & \dots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\
0 & 0 & 0 & 0 & 0 & 0 & \dots \\
\end{bmatrix}
\]
</p>

 <p>
is the derivative operator and \(G\) represents multiplication by \(g(s) = \frac{1}{2}\left(1 - s\right)^2\). Since that function is a quadratic polynomial, it can be represented by a three term Chebyshev series \(g(s) = \frac{3}{4}T_0(s) - T_1(s) + \frac{1}{4}T_2(s)\), which, because of the recurrence relations of Chebyshev polynomials, means that the matrix \(G\) is a banded matrix ( <a href="https://arxiv.org/abs/1202.1347">Olver and Townsend</a> 2013, p. 7)
</p>

 <p>
\[
G = \begin{bmatrix}
\frac{3}{4} & -\frac{1}{2} & \frac{1}{8} & 0 & 0 & 0 & \dots \\
-1 & \frac{7}{8} & -\frac{1}{2} & \frac{1}{8} & 0 & 0 & \dots \\
\frac{1}{4} & -\frac{1}{2} & \frac{3}{4} & -\frac{1}{2} & \frac{1}{8} & 0 & \dots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\
\end{bmatrix}.
\]
</p>
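
 <p>
Both the three-term series and the band structure can be checked by hand. Since \(T_0 = 1\), \(T_1 = s\) and \(T_2 = 2s^2 - 1\), we have \(s^2 = \frac{1}{2}\left(T_0 + T_2\right)\) and
</p>

 <p>
\[
g(s) = \frac{1}{2} - s + \frac{1}{2}s^2 = \frac{1}{2}T_0 - T_1 + \frac{1}{4}\left(T_0 + T_2\right) = \frac{3}{4}T_0 - T_1 + \frac{1}{4}T_2.
\]
</p>

 <p>
The product identity
</p>

 <p>
\[
T_m(s)T_j(s) = \frac{1}{2}\left(T_{j+m}(s) + T_{|j-m|}(s)\right)
\]
</p>

 <p>
then shows that multiplying by \(T_m\) only couples coefficient \(j\) to coefficients \(j+m\) and \(|j-m|\). With \(m \le 2\), no entry of \(G\) lies more than two places from the main diagonal. As a worked entry, counting rows and columns from zero, the \((1,1)\) entry is \(\frac{3}{4}\) from \(T_0\) plus \(\frac{1}{4}\cdot\frac{1}{2}\) from \(T_2T_1 = \frac{1}{2}\left(T_3 + T_1\right)\), which gives \(\frac{7}{8}\).
</p>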

 <p>
I've written these operators as infinite dimensional ones, but in practice, we truncate the Chebyshev expansion of \(u\) at \(N\) terms, which means we take the first \(N \times N\) block of the infinite dimensional matrix \(L\). Note that if we instead truncate \(G\) and \(D\) by taking the first \(N \times N\) blocks, we will end up with some additional error from the truncation of \(G\). Since we know \(G\) and \(D\) analytically, it is easy enough to work out the operator \(L\) by truncating the operators at some \(M > N\) and then taking the first block of \(L\). For this particular application \(M = N + 1\) is enough to avoid additional truncation errors.
</p>
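
 <p>
The effect of the truncation order can be checked exactly with rational arithmetic. This is a small sketch with dense matrices;  <code>deriv</code>,  <code>mult</code>,  <code>gmat</code> and  <code>leading_block</code> are hypothetical helper names used only here, and \(N = 8\) is an arbitrary choice.
</p>

 <div class="org-src-container">
 <pre class="src src-julia"># Dense, exact (Rational) versions of D and G; i and j are 0-based mode numbers
deriv(M) = [j > i ? ((i == 0 ? 1 : 2) * j * mod(i + j, 2)) // 1 : 0 // 1 for i in 0:M-1, j in 0:M-1]
mult(M, m) = [((i == j + m) + (i == abs(j - m))) // 2 for i in 0:M-1, j in 0:M-1]
gmat(M) = 3 // 4 * mult(M, 0) - mult(M, 1) + 1 // 4 * mult(M, 2)

# Leading N × N block of G*D*G*D when G and D are truncated at size M
function leading_block(N, M)
    G, D = gmat(M), deriv(M)
    (G * D * G * D)[1:N, 1:N]
end

N = 8
leading_block(N, N) == leading_block(N, N + 1)      # false: truncating G and D at N changes the block
leading_block(N, N + 1) == leading_block(N, N + 2)  # true: padding beyond N + 1 changes nothing
</pre>
</div>

 <p>
The difference in the first comparison comes from entries of \(G\) that couple mode \(N\) back into the lower modes and are lost when \(G\) is cut off at \(N\).
</p>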

 <p>
While \(G\) is banded, \(D\) is essentially dense, and the operator \(L = GDGD\) is dense. We can get adequately accurate numerical results using this operator, but solving the dense linear system \(L\mathbf{u} = b\), as we need to do when we implicitly discretize time, scales as \(\mathcal{O}(N^2)\) even when we precompute the LU decomposition of \(L\). This quadratic scaling is problematic when we need to apply this solver many times, such as when we are timestepping the diffusion equation.
</p>

 <p>
We can, however, do better using the ultraspherical method of  <a href="https://arxiv.org/abs/1202.1347">Olver and Townsend</a> (2013). This rests on the fact that the derivatives of Chebyshev polynomials are scaled ultraspherical polynomials. Since we need two derivatives, we can convert to the ultraspherical basis of order 2 using the conversion matrices given on p. 8 and p. 12 of Olver and Townsend. This renders the second derivative matrix \(D^2\) diagonal and the operator \(L = S_1S_0GDGD\) banded. Because it is banded, solving the linear system only requires \(\mathcal{O}(N)\) operations at each time step.
</p>
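
 <p>
The gain in sparsity can be seen by measuring how far the nonzero entries stray from the main diagonal. The sketch below does this with exact rational arithmetic on dense matrices. The helper names ( <code>deriv</code>,  <code>mult</code>,  <code>gmat</code>,  <code>conv0</code>,  <code>conv1</code>,  <code>bandwidth</code>) are made up for this check, the conversion matrices are written from the same formulas used in the implementation further down, and the padding of four extra modes is a deliberately generous assumption.
</p>

 <div class="org-src-container">
 <pre class="src src-julia"># Dense, exact (Rational) copies of the operators; i and j are 0-based mode numbers
deriv(M) = [j > i ? ((i == 0 ? 1 : 2) * j * mod(i + j, 2)) // 1 : 0 // 1 for i in 0:M-1, j in 0:M-1]
mult(M, m) = [((i == j + m) + (i == abs(j - m))) // 2 for i in 0:M-1, j in 0:M-1]
gmat(M) = 3 // 4 * mult(M, 0) - mult(M, 1) + 1 // 4 * mult(M, 2)

# Conversion matrices: Chebyshev to order 1 (conv0) and order 1 to order 2 (conv1)
conv0(M) = [i == j ? (i == 0 ? 1 // 1 : 1 // 2) : (j == i + 2 ? -1 // 2 : 0 // 1) for i in 0:M-1, j in 0:M-1]
conv1(M) = [i == j ? 1 // (1 + i) : (j == i + 2 ? -1 // (1 + j) : 0 // 1) for i in 0:M-1, j in 0:M-1]

# Largest distance |i - j| of a nonzero entry from the main diagonal
bandwidth(A) = maximum(abs(idx[1] - idx[2]) for idx in findall(!iszero, A))

N = 16
M = N + 4   # generous padding to keep truncation effects out of the leading block
L = gmat(M) * deriv(M) * gmat(M) * deriv(M)
dense_bw = bandwidth(L[1:N, 1:N])
ultra_bw = bandwidth((conv1(M) * conv0(M) * L)[1:N, 1:N])
dense_bw, ultra_bw
</pre>
</div>

 <p>
The first number grows with \(N\), while the second should stay fixed at a small value no matter how large \(N\) becomes.
</p>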
 <section id="outline-container-org40cfaf3" class="outline-2"> <h2 id="org40cfaf3">Time discretization</h2>
 <div class="outline-text-2" id="text-org40cfaf3">
 <p>
There are many time discretizations that we could choose, especially with a simple pressure gradient forcing like \(\cos(t)\). It is common in CFD codes to use implicit-explicit methods that solve the viscous terms implicitly and the advection and forcing terms explicitly. Here we will use a Crank-Nicolson-Adams-Bashforth method (CNAB3) ( <a href="http://www-personal.umich.edu/~jpboyd/BOOK_Spectral2000.html">Boyd</a> 2000, p. 229) that seems to work well.
</p>

 <p>
We end up solving
</p>

 <p>
\[
\left(I - \frac{\Delta t}{2}L\right) u^{n+1} = A u^{n+1} = \left(I + \frac{\Delta t}{2}L\right) u^{n} + \frac{\Delta t}{12} \left(23 F^n - 16 F^{n-1} + 5 F^{n-2}\right) = Bu^n + \frac{\Delta t}{12} \left(23 F^n - 16 F^{n-1} + 5 F^{n-2}\right)
\]
</p>

 <p>
for \(u^{n+1}\).
</p>
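
 <p>
The behaviour of this scheme is easy to examine on a scalar stand-in for the diffusion operator. The sketch below replaces \(L\) by \(-\lambda\), so that the model problem is \(u' = -\lambda u + \cos(t)\) with the periodic solution \(u = (\lambda\cos t + \sin t)/(1 + \lambda^2)\). The function names and the choice \(\lambda = 1\) are assumptions made for the illustration.
</p>

 <div class="org-src-container">
 <pre class="src src-julia"># Exact periodic solution of the scalar model problem u' = -λ u + cos(t)
exact(λ, t) = (λ * cos(t) + sin(t)) / (1 + λ^2)

# Crank-Nicolson for the linear term, third-order Adams-Bashforth for the forcing.
# Returns the largest error over the whole run.
function cnab3_error(λ, Δt, Nt)
    u = exact(λ, 0.0)
    err = 0.0
    for n in 0:Nt-1
        t = n * Δt
        forcing = Δt / 12 * (23cos(t) - 16cos(t - Δt) + 5cos(t - 2Δt))
        # (1 + λΔt/2) u_new = (1 - λΔt/2) u_old + forcing
        u = ((1 - λ * Δt / 2) * u + forcing) / (1 + λ * Δt / 2)
        err = max(err, abs(u - exact(λ, t + Δt)))
    end
    err
end

coarse = cnab3_error(1.0, 2π / 100, 100)
fine = cnab3_error(1.0, 2π / 200, 200)
coarse / fine
</pre>
</div>

 <p>
Halving the time step should reduce the error by roughly a factor of four, because the Crank-Nicolson part of the scheme is second-order accurate.
</p>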
</div>
</section> <section id="outline-container-org841965d" class="outline-2"> <h2 id="org841965d">Boundary conditions</h2>
 <div class="outline-text-2" id="text-org841965d">
 <p>
As in Olver and Townsend, we apply the boundary conditions by "boundary bordering," which is equivalent to a Chebyshev tau method (Boyd 2000). Basically we drop the bottom row of the matrix \(A\) and the right-hand side vector and add the boundary condition equation \(\sum_{k=0}^{N-1} (-1)^ku_k = 0\) as the first equation.
</p>
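
 <p>
On a small dense system, the bordering step can be sketched as follows. The matrix  <code>A</code> here is an arbitrary well-conditioned stand-in with made-up entries, not the actual time-stepping operator, and the variable names are chosen only for this illustration.
</p>

 <div class="org-src-container">
 <pre class="src src-julia">using LinearAlgebra

N = 6
# A stand-in for the time-stepping matrix (hypothetical values)
A = Matrix(1.0I, N, N) + 0.1 * [1 / (i + j) for i in 1:N, j in 1:N]
b = collect(1.0:N)

# Boundary condition row, from T_k(-1) = (-1)^k
BC = [(-1)^k for k in 0:N-1]

# Drop the last equation and put the boundary condition first
A_bordered = [BC'; A[1:N-1, :]]
b_bordered = [0; b[1:N-1]]

u = A_bordered \ b_bordered
dot(BC, u)                          # residual of the boundary condition
norm(A[1:N-1, :] * u - b[1:N-1])    # residual of the retained equations
</pre>
</div>

 <p>
Both residuals should vanish to rounding error: the solution satisfies the boundary condition exactly and all but the last of the original equations.
</p>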
</div>
</section> <section id="outline-container-org6186caf" class="outline-2"> <h2 id="org6186caf">Implementation in Julia</h2>
 <div class="outline-text-2" id="text-org6186caf">
 <p>
The ultraspherical spectral method is implemented within the excellent  <a href="https://github.com/JuliaApproximation/ApproxFun.jl">ApproxFun.jl</a> package, but it is also fairly straightforward to implement using standard library routines for sparse linear algebra.
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">using</span> LinearAlgebra, SparseArrays
</pre>
</div>

 <p>
First, we can create the derivative matrix \(D\) and the multiplication matrix \(G\).
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">chebyshev_derivative_matrix</span>(Nz)
    sparse([(i < j)  <span style="color: #859900; font-weight: bold;">?</span> (i==0 ? 1 : 2)*j*mod(i+j,2)  <span style="color: #859900; font-weight: bold;">:</span> 0  <span style="color: #859900; font-weight: bold;">for</span> i in 0:Nz-1, j  <span style="color: #859900; font-weight: bold;">in</span> 0:Nz-1])
 <span style="color: #859900; font-weight: bold;">end</span>

 <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">chebyshev_multiplication_matrix</span>(Nz)
    G0 = sparse([((i == j+0) + (i == abs(j-0)))//2  <span style="color: #859900; font-weight: bold;">for</span> i in 0:Nz-1, j  <span style="color: #859900; font-weight: bold;">in</span> 0:Nz-1])
    G1 = sparse([((i == j+1) + (i == abs(j-1)))//2  <span style="color: #859900; font-weight: bold;">for</span> i in 0:Nz-1, j  <span style="color: #859900; font-weight: bold;">in</span> 0:Nz-1])
    G2 = sparse([((i == j+2) + (i == abs(j-2)))//2  <span style="color: #859900; font-weight: bold;">for</span> i in 0:Nz-1, j  <span style="color: #859900; font-weight: bold;">in</span> 0:Nz-1])

     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">0.5 * (1 - s)^2 = 3//4 T₀ - T₁ + 1//4 T₂
</span>    3//4 * G0 - G1 + 1//4*G2
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
Using  <code>Integer</code> and  <code>Rational</code> types ensures that we can calculate the entries of these matrices without rounding errors.
</p>

 <p>
The ultraspherical conversion matrices are likewise simple:
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #268bd2;">S1</span>(Nz) = spdiagm(0=>[1;[1//(1+k)  <span style="color: #859900; font-weight: bold;">for</span> k in 1:Nz-1]],2 <span style="color: #859900; font-weight: bold;">=</span>>[-1//(1 + k)  <span style="color: #859900; font-weight: bold;">for</span> k  <span style="color: #859900; font-weight: bold;">in</span> 2:Nz-1])
 <span style="color: #268bd2;">S0</span>(Nz) = spdiagm(0=>[2;ones(Int,Nz-1)] .// 2,2=>-ones(Int,Nz-2).//2)
</pre>
</div>

 <p>
We can now assemble the matrices for the left- and right-hand sides of the Crank-Nicolson solver
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">assemble_matrices</span>(Nz,Δt,ultraspherical)
     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Bottom boundary condition
</span>    BC = [(-1)^k  <span style="color: #859900; font-weight: bold;">for</span> k  <span style="color: #859900; font-weight: bold;">in</span> 0:Nz-1]' 

     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">We make everything slightly larger to avoid truncation errors
</span>    D = chebyshev_derivative_matrix(Nz + 1)
    G = chebyshev_multiplication_matrix(Nz + 1)
     <span style="color: #859900; font-weight: bold;">if</span> ultraspherical
        Δ = S1(Nz+1)*S0(Nz+1) * G*D*G*D
        A = S1(Nz+1)*S0(Nz+1) - Δt/2 * Δ
        B = S1(Nz+1)*S0(Nz+1) + Δt/2 * Δ
     <span style="color: #859900; font-weight: bold;">else</span>
         <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Assemble the Chebyshev matrices without
</span>         <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">the ultraspherical conversion
</span>        Δ = G*D*G*D
        A = I - Δt/2 * Δ
        B = I + Δt/2 * Δ
     <span style="color: #859900; font-weight: bold;">end</span>


     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Apply boundary bordering and truncate properly
</span>    [BC;A[1:Nz-1,1:Nz]],B[1:Nz,1:Nz]
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
Our time step function simply assembles the right-hand side and then solves the equation \(A u = b\). We will compute the LU decomposition of \(A\) before running the model, so we can use the in-place division function  <code>ldiv!</code> to avoid some memory allocation. Since our pressure gradient forcing is spatially constant, we can add the forcing function to the first element of the right-hand side vector, which represents the ultraspherical coefficient for the constant function. To apply the boundary condition, we also use a trick that avoids having to allocate the vector  <code>[0;RHS[1:end-1]]</code>.
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">timestep!</span>(un,u,A,B,Δt,t)
    RHS = B*u
    RHS[1] += Δt/12 * (23 * cos(t) - 16 * cos(t - Δt) + 5 * cos(t - 2Δt))

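     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Shift the entries down by one slot in place and zero the first,
</span>     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">so that RHS becomes [0;RHS[1:end-1]] without allocating
</span>     <span style="color: #859900; font-weight: bold;">for</span> k  <span style="color: #859900; font-weight: bold;">in</span> length(RHS):-1:2
        RHS[k] = RHS[k-1]
     <span style="color: #859900; font-weight: bold;">end</span>
    RHS[1] = 0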
    ldiv!(un,A,RHS)
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
To run the model for a fixed number of timesteps, we preallocate the output and loop
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">run_model</span>(U0,A,B,Δt,Nt)
    U = zeros(length(U0),Nt+1)
    U[:,1] = U0
     <span style="color: #859900; font-weight: bold;">for</span> i  <span style="color: #859900; font-weight: bold;">in</span> 1:Nt
        timestep!(view(U,:,i+1),view(U,:,i),A,B,Δt,(i-1)*Δt)
     <span style="color: #859900; font-weight: bold;">end</span>
    U
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
We will also want the analytical solution for comparison and for our initial conditions
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">stokes</span>(z,t)
    sin(t) - exp(-z/sqrt(2))*sin(t - z/sqrt(2))
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
Since our result will be a vector of Chebyshev coefficients, we will also want to convert these to the values on a grid. There are multiple ways to do this, and you would normally use a fast cosine transform to implement the inverse Chebyshev transform. However, we only need to do this conversion twice: once to compute the Chebyshev coefficients of the initial conditions and once to compute the solution on the grid. The simplest way to do this is to compute a matrix of the Chebyshev polynomials evaluated on the grid using their recurrence relation
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">chebyshev_matrix</span>(x,P=length(x))
    N = length(x)
    T = zeros(N,P)
    T[:,1] .= 1
    T[:,2] .= x
    T[:,3] .= 2x.^2 .- 1
     <span style="color: #859900; font-weight: bold;">for</span> i  <span style="color: #859900; font-weight: bold;">in</span> 3:P-1
        T[:,i+1] = 2*x.*T[:,i] .- T[:,i-1]
     <span style="color: #859900; font-weight: bold;">end</span>
    T
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>
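
 <p>
A simple check of the recurrence is the identity \(T_k(\cos\theta) = \cos(k\theta)\). The sketch below uses a standalone copy of the recurrence under the hypothetical name  <code>chebyshev_values</code>, written so that it runs on its own, and assumes at least two polynomials are requested.
</p>

 <div class="org-src-container">
 <pre class="src src-julia"># Columns are T_0, ..., T_{P-1} evaluated at the points x (assumes P >= 2)
function chebyshev_values(x, P)
    T = zeros(length(x), P)
    T[:, 1] .= 1
    T[:, 2] .= x
    for k in 2:P-1
        T[:, k+1] .= 2 .* x .* T[:, k] .- T[:, k-1]
    end
    T
end

θ = π .* range(0, 1, length=9)
T = chebyshev_values(cos.(θ), 6)

# Largest deviation from cos(kθ) over all points and degrees
maximum(abs, T .- [cos(k * t) for t in θ, k in 0:5])
</pre>
</div>

 <p>
The deviation should be at the level of rounding error.
</p>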

 <p>
And lastly, we need to compute the error in our solution, which we can do easily using Gauss-Chebyshev quadrature if we represent our solution on the Chebyshev roots grid.
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">quadrature_error</span>(z,u,u0)
    N = length(u)
    s = (z .- 1) ./ (z .+ 1)

    f = abs2.(u .- u0) .* sqrt.(1 .- s.^2)

    π*sum(f)/N
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>
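
 <p>
The factor \(\sqrt{1 - s^2}\) is there because Gauss-Chebyshev quadrature integrates against the weight \(1/\sqrt{1 - s^2}\), so we multiply the integrand by the inverse of the weight to get an unweighted integral in \(s\). A minimal sketch of the same rule applied to an integral with a known value, using the hypothetical helper  <code>chebyshev_integral</code>:
</p>

 <div class="org-src-container">
 <pre class="src src-julia"># Gauss-Chebyshev rule for the unweighted integral of f over [-1, 1]:
# multiply f by sqrt(1 - s^2) to cancel the Chebyshev weight
function chebyshev_integral(f, N)
    s = [cospi((2k + 1) / 2N) for k in 0:N-1]
    π * sum(f.(s) .* sqrt.(1 .- s .^ 2)) / N
end

# Error against the exact value of the integral of s^2, which is 2/3
chebyshev_integral(s -> s^2, 1024) - 2 / 3
</pre>
</div>

 <p>
Because the square root is not smooth at the endpoints, the rule converges only algebraically here rather than spectrally, which is still plenty for an error estimate on a 1024-point grid.
</p>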

 <p>
Finally we wrap it all together. We'll solve the diffusion equation, but also time the solution and compute the error.
</p>

 <div class="org-src-container">
 <pre class="src src-julia"> <span style="color: #859900; font-weight: bold;">function</span>  <span style="color: #268bd2;">run_test</span>(Nz,Δt,Nt,Nq=1024;ultraspherical= <span style="color: #268bd2; font-weight: bold;">true</span>)

     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Chebyshev roots grid and transformed grid
</span>     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Note that we use a high resolution grid here
</span>    s = [cospi((2k + 1)/2Nq)  <span style="color: #859900; font-weight: bold;">for</span> k  <span style="color: #859900; font-weight: bold;">in</span> 0:Nq-1]
    z = (1 .+ s) ./ (1 .- s)

    T = chebyshev_matrix(s,Nq)

     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Initial conditions
</span>    u0 = stokes.(z,0.0)
     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Forward Chebyshev transform to compute coefficients
</span>     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Truncate from the high-order approximation
</span>    U0 = (T\u0)[1:Nz]

    A,B = assemble_matrices(Nz,Δt,ultraspherical)
    Al = lu(A)

     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Run once to compute the results
</span>    U = run_model(U0,Al,B,Δt,Nt)

     <span style="color: #93a1a1;"># </span> <span style="color: #93a1a1;">Run again to time the solver
</span>    t =  <span style="color: #268bd2;">@elapsed</span> run_model(U0,Al,B,Δt,Nt)

    u = T[:,1:Nz]*U
    err = [quadrature_error(z,u[:,i],stokes.(z,(i-1)*Δt))  <span style="color: #859900; font-weight: bold;">for</span> i  <span style="color: #859900; font-weight: bold;">in</span> 1:size(u,2)]
    u,t,err
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>
</div>
</section> <section id="outline-container-orge4c8e5d" class="outline-2"> <h2 id="orge4c8e5d">Results</h2>
 <div class="outline-text-2" id="text-orge4c8e5d">
 <p>
If we run the model with  <code>Nz = 128</code> and  <code>Δt = 2π*0.01</code> for a single cycle, our results look something like this
</p>

 <div class="org-src-container">
 <pre class="src src-julia">Nz = 128
Δt = 2π*0.01
Nt = 100
Nq = 1024

u1,t1,err1 = run_test(Nz,Δt,Nt,Nq)

s1 = [cospi((2k + 1)/2Nq)  <span style="color: #859900; font-weight: bold;">for</span> k  <span style="color: #859900; font-weight: bold;">in</span> 0:Nq-1]
z1 = (1 .+ s1) ./ (1 .- s1)

 <span style="color: #859900; font-weight: bold;">using</span> CairoMakie
tidx = Observable(1)

uo = lift(tidx)  <span style="color: #859900; font-weight: bold;">do</span> i
    u1[:,i]
 <span style="color: #859900; font-weight: bold;">end</span>

fig = Figure()
ax = Axis(fig[1,1],ylabel= <span style="color: #2aa198;">"Height"</span>,xlabel= <span style="color: #2aa198;">"Velocity"</span>)
lines!(ax,uo,z1,color= <span style="color: #268bd2; font-weight: bold;">:black</span>)
ylims!(ax,0,10)
xlims!(ax,-1.1,1.1)

record(fig,  <span style="color: #2aa198;">"ultraspherical_diffusion.mp4"</span>, 1:size(u1,2);framerate=30)  <span style="color: #859900; font-weight: bold;">do</span> i
    tidx[] = i
 <span style="color: #859900; font-weight: bold;">end</span>
</pre>
</div>

 <p>
 <a href="../images/ultraspherical_diffusion.mp4">A video of a single period of the oscillating boundary layer flow</a>
</p>

 <p>
At this resolution, the numeric and analytic solutions are basically identical, so I have only plotted the numeric solution.
</p>

 <p>
We can also run it at several resolutions to see how the computational time and the error scale with the grid size. We will run it for 10 cycles just to make sure that it is stable for long integration times. We will also run the plain Chebyshev method, which results in dense matrices, to compare its performance to the ultraspherical method.
</p>

 <div class="org-src-container">
 <pre class="src src-julia">Nzs = 16:16:256
res0 = [run_test(Nz,Δt,10*Nt,Nq;ultraspherical= <span style="color: #268bd2; font-weight: bold;">false</span>)  <span style="color: #859900; font-weight: bold;">for</span> Nz  <span style="color: #859900; font-weight: bold;">in</span> Nzs]
res1 = [run_test(Nz,Δt,10*Nt,Nq;ultraspherical= <span style="color: #268bd2; font-weight: bold;">true</span>)  <span style="color: #859900; font-weight: bold;">for</span> Nz  <span style="color: #859900; font-weight: bold;">in</span> Nzs]
us0,ts0,err0 = map(x->x[1],res0),map(x->x[2],res0),map(x->x[3][ <span style="color: #859900; font-weight: bold;">end</span>],res0)
us1,ts1,err1 = map(x->x[1],res1),map(x->x[2],res1),map(x->x[3][ <span style="color: #859900; font-weight: bold;">end</span>],res1)


fig2 = Figure()
ax1 = Axis(fig2[1,1],ylabel= <span style="color: #2aa198;">"Time (s)"</span>,xlabel= <span style="color: #2aa198;">"Grid size"</span>)
scatter!(ax1,Nzs,ts0,label= <span style="color: #2aa198;">"Chebyshev"</span>,marker= <span style="color: #268bd2; font-weight: bold;">:circle</span>)
scatter!(ax1,Nzs,ts1,label= <span style="color: #2aa198;">"Ultraspherical"</span>,marker= <span style="color: #268bd2; font-weight: bold;">:diamond</span>)

axislegend(ax1,position= <span style="color: #268bd2; font-weight: bold;">:lt</span>)
ax2 = Axis(fig2[2,1],ylabel= <span style="color: #2aa198;">"Error"</span>,xlabel= <span style="color: #2aa198;">"Grid size"</span>,yscale=log10)
scatter!(ax2,Nzs,err0,label= <span style="color: #2aa198;">"Chebyshev"</span>,marker= <span style="color: #268bd2; font-weight: bold;">:circle</span>)
scatter!(ax2,Nzs,err1,label= <span style="color: #2aa198;">"Ultraspherical"</span>,marker= <span style="color: #268bd2; font-weight: bold;">:diamond</span>)
axislegend(ax2,position= <span style="color: #268bd2; font-weight: bold;">:rt</span>)


save( <span style="color: #2aa198;">"ultraspherical_scaling.png"</span>,fig2)
</pre>
</div>


 <figure id="orga92340e"> <img src="../images/ultraspherical_scaling.png" alt="ultraspherical_scaling.png"></img></figure> <p>
We can see that the run time of the ultraspherical method scales linearly with \(N\) while that of the Chebyshev method scales quadratically. The Chebyshev method has a lower error at small grid sizes, but the errors of the two methods converge by \(N=64\). At that point, most of the error is due to the time discretization, and it can be decreased further by decreasing the time step.
</p>
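
 <p>
One way to check scaling claims like these is to fit a power law to the timings. The following is a minimal sketch and not part of the original script:  <code>scaling_exponent</code> is a hypothetical helper that assumes the run time behaves like c·N^p and estimates p as the least-squares slope of log(time) against log(N). The synthetic timings exist only to exercise the helper.
</p>

```julia
# Hypothetical helper (not in the original script): estimate p in t ≈ c*N^p
# as the least-squares slope of log(t) against log(N).
function scaling_exponent(Ns, ts)
    x = log.(float.(Ns))
    y = log.(float.(ts))
    xm = sum(x) / length(x)
    ym = sum(y) / length(y)
    return sum((x .- xm) .* (y .- ym)) / sum((x .- xm) .^ 2)
end

# Synthetic timings with known exponents, used only to exercise the helper.
Ns_demo = [8, 16, 32, 64]
p_linear = scaling_exponent(Ns_demo, 0.5 .* Ns_demo)           # t proportional to N
p_quadratic = scaling_exponent(Ns_demo, 0.01 .* Ns_demo .^ 2)  # t proportional to N^2
```

 <p>
With the variables from the script,  <code>scaling_exponent(Nzs, ts0)</code> and  <code>scaling_exponent(Nzs, ts1)</code> would give the empirical exponents for the Chebyshev and ultraspherical timings. Fixed overheads can flatten the slope at small grid sizes, so the fit is most meaningful over the larger values of N.
</p>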

 <p>
Spectral methods are really neat ways to solve PDEs accurately without a ton of effort, and the ultraspherical spectral method helps prevent the scaling issues that you get with a pure Chebyshev method. Using a coordinate transformation, it is also really easy to handle the semi-infinite domain that we want for theoretical bottom boundary layer studies. It is pretty straightforward to build an incompressible Navier-Stokes solver in this kind of framework, especially if we use periodic boundary conditions in the horizontal directions. The problem then decouples into a set of vertical PDEs like our diffusion equation, one for each Fourier coefficient. One does have to address the incompressibility condition and the pressure computation, and we might see how that works later. Stay tuned!
</p>

 <p>
You can find the Julia code from this post  <a href="../code/ultraspherical_diffusion.jl">here</a>.
</p>
</div>
</section></div></article></main>]]></content>
  <link href="https://www.wskearney.com/posts/ultraspherical-diffusion.html"/>
  <id>https://www.wskearney.com/posts/ultraspherical-diffusion.html</id>
  <updated>2022-11-22T00:00:00+01:00</updated>
</entry>
<entry>
  <title>Hello, world!</title>
  <author><name>William Kearney</name></author>
  <content type="html"><![CDATA[<main id="content" class="content"> <article class="h-entry"> <h1 class="p-name title">Hello, world!</h1>
 <p> <span class="byline">Published on  <a class="u-url u-uid" href="https://www.wskearney.com/posts/hello-world.html"> <time class="dt-published date" datetime="2022-11-18T14:47:00+0100">2022-11-18</time></a> by  <a class="p-author h-card" href="https://www.wskearney.com">William Kearney</a></span></p>
 <p> <span class="tags">Tags: 
 <a href="/tags/admin.html" rel="tag" class="p-category">admin</a>
</span></p> <div class="e-content">
 <p>
Welcome to my personal website!
</p>

 <p>
Subscribe to the  <a href="../atom.xml">Atom feed</a> to keep up with additional updates.
</p>
</div></article></main>]]></content>
  <link href="https://www.wskearney.com/posts/hello-world.html"/>
  <id>https://www.wskearney.com/posts/hello-world.html</id>
  <updated>2022-11-18T00:00:00+01:00</updated>
</entry>
<entry>
  <title>New paper on The Naval Seafloor Evolution Architecture</title>
  <author><name>William Kearney</name></author>
  <content type="html"><![CDATA[<main id="content" class="content"> <article class="h-entry"> <h1 class="p-name title">New paper on The Naval Seafloor Evolution Architecture</h1>
 <p> <span class="byline">Published on  <a class="u-url u-uid" href="https://www.wskearney.com/posts/nsea.html"> <time class="dt-published date" datetime="2022-11-18T16:50:00+0100">2022-11-18</time></a> by  <a class="p-author h-card" href="https://www.wskearney.com">William Kearney</a></span></p>
 <p> <span class="tags">Tags: 
 <a href="/tags/benthos.html" rel="tag" class="p-category">benthos</a>
 <a href="/tags/research.html" rel="tag" class="p-category">research</a>
</span></p> <div class="e-content">
 <p>
Allison Penko and I just published a Naval Research Laboratory Technical Report about our seafloor evolution modeling framework. You can read the whole report at  <a href="https://apps.dtic.mil/sti/citations/AD1183343">DTIC</a> or  <a href="https://arxiv.org/abs/2211.09092">arXiv</a>.
</p>

 <p>
The core model isn't new ( <a href="https://dx.doi.org/10.1029/2006JC003811">Traykovski (2007)</a>,  <a href="https://dx.doi.org/10.1007/s10236-014-0801-y">Nelson and Voulgaris (2015)</a>,  <a href="https://dx.doi.org/10.1109/JOE.2016.2622458">Penko et al. (2017)</a>), but I've been working over the last few years on extending its capabilities, turning the Naval Seafloor Evolution Model (NSEM) into the Naval Seafloor Evolution Architecture (NSEA). We have replaced the original Fortran implementation with one written in Julia, which lets us easily do some cool things such as running NSEA on the GPU using  <a href="https://github.com/JuliaGPU/GPUArrays.jl">GPUArrays.jl</a>. I've also developed tools for statistical inference that use seafloor roughness observations to estimate sediment transport parameters like the sediment grain size and the critical shear stress.
</p>

 <p>
I am especially proud of the section that draws out the connection between NSEM and a particular stochastic sediment flux model. NSEM doesn't model the actual transport of sediment or the formation of bedforms. Instead, it uses some heuristics, scaling arguments and empirical parameterizations to model the evolution of the power spectrum of the seafloor roughness. This works surprisingly well when the seafloor roughness is predominantly small-scale ripples formed by wind waves, but it runs into trouble in different settings, like current-driven ripples or bedforms created by internal waves, because those empirical parameterizations fail in these situations.
</p>

 <p>
The power spectrum is a statistical description of the seafloor roughness: it defines a probability distribution over the seafloor elevation, and we can ask which sediment transport models would generate a similar probability distribution. The simplest one is a kind of stochastic heat equation, where the sediment flux is proportional to the gradient in the seafloor elevation augmented by a random flux with a particular correlation function. That correlation function is determined by the empirical ripple geometry parameterization in the model and is basically related to the size of the waves at the seafloor. If you work out the power spectrum of a seafloor governed by this stochastic heat equation, it evolves in time just like NSEM says it does. In other words, NSEM is what you get if you average over the random sediment flux in this stochastic heat equation.
</p>
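
 <p>
To make the idea concrete, here is one way such a model could be written down. This is only a sketch under assumptions of my own for illustration, not the formulation in the report: the symbols (elevation η, flux q, diffusivity D, random flux ξ with spectrum C) are hypothetical notation, porosity factors are ignored, and the random flux is taken to be isotropic and white in time. Sediment conservation with a gradient-driven flux plus noise gives a stochastic heat equation, and averaging over the noise gives a linear relaxation of the power spectrum S(k, t) toward an equilibrium spectrum set by the noise.
</p>

```latex
\begin{aligned}
\frac{\partial \eta}{\partial t} &= -\nabla \cdot \mathbf{q},
  \qquad \mathbf{q} = -D \nabla \eta + \boldsymbol{\xi}(\mathbf{x}, t)
  \quad\Longrightarrow\quad
  \frac{\partial \eta}{\partial t} = D \nabla^2 \eta - \nabla \cdot \boldsymbol{\xi}, \\
\frac{\partial \hat{\eta}}{\partial t} &= -D k^2 \hat{\eta} - i \mathbf{k} \cdot \hat{\boldsymbol{\xi}},
  \qquad
  \langle \hat{\xi}_i(\mathbf{k}, t)\, \hat{\xi}_j^{*}(\mathbf{k}', t') \rangle
  = C(\mathbf{k})\, \delta_{ij}\, \delta(\mathbf{k} - \mathbf{k}')\, \delta(t - t'), \\
\frac{\partial S}{\partial t} &= -2 D k^2 S + k^2 C(\mathbf{k})
  = -2 D k^2 \left( S - S_{\mathrm{eq}} \right),
  \qquad S_{\mathrm{eq}}(\mathbf{k}) = \frac{C(\mathbf{k})}{2 D}.
\end{aligned}
```

 <p>
In this sketch each wavenumber relaxes toward its equilibrium value at the rate 2Dk², so small-scale roughness adjusts faster than large-scale roughness.
</p>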

 <p>
This realization is not particularly useful on its own because our observations are typically too coarse to resolve the evolution of the seafloor at the fast time scales of the stochastic sediment flux. We are usually better off averaging over the random flux and directly modeling the statistical properties of the seafloor like the power spectrum or characteristic length scales for the ripples. However, when we think about extending NSEM to different kinds of hydrodynamic and sediment transport processes, we don't want to just tack new empirical parameterizations onto the already somewhat unwieldy NSEM. Instead, we might be able to start by devising a high-resolution model for the sediment flux that respects the fundamental physics of these processes, but that represents the complex turbulent interactions between the bottom boundary layer and the seafloor with appropriate stochastic processes. Averaging over that randomness, we would derive a new version of NSEM corresponding to the given sediment flux model. We could then compare the predictions of that version of NSEM with high spatial resolution observations from imaging sonars to test different stochastic formulations.
</p>

 <p>
This stochastic approach is one way to bridge the gap between small scales, where we can faithfully model the physics of sediment transport, and larger scales, where the effects of seafloor roughness on hydrodynamics or acoustics are felt, while systematically accounting for the uncertainty generated by unresolved processes.
</p>

 <p>
If you have any thoughts or questions, feel free to  <a href="../contact.html">let me know</a>!
</p>
</div></article></main>]]></content>
  <link href="https://www.wskearney.com/posts/nsea.html"/>
  <id>https://www.wskearney.com/posts/nsea.html</id>
  <updated>2022-11-18T00:00:00+01:00</updated>
</entry>
</feed>
