In 2016 Tom Keelin published the first paper describing metalogs. Metalogs are just that: the distribution that comes *after* you've struggled to find the correct distribution to fit your data... kidding aside, metalogs encompass 60+ traditional continuous distributions in one technique that supports:

- virtually unlimited shape flexibility
- a choice of unbounded, semi-bounded, and bounded distributions
- ease of fitting to data with linear least squares
- simple closed-form quantile function (inverse CDF) equations
- ease of use in interactive simulations
- a simple closed-form PDF
- Bayesian updates in closed form
- a flexible number of terms depending on the degree of shape flexibility needed

If any of that sounded Greek to you, think about it this way: we can 'fit' data using a metalog without doing as much guessing about which dead-statistician distribution best fits our data. We can turn to one formula for virtually any shape.

*Source: Wikipedia*

So while we don't need to remember which of the 60+ dead-statistician distributions to use, we do need to be smart about the parameters used in setting up a metalog, as there are a few variations. Technically there are symmetric-percentile triplet (SPT) metalogs, general (quantiled) metalogs, and generalized metalogs, but let's think of them as two main types, quantiled and generalized, and focus on the easiest one to get started with: the SPT metalog, a type of quantiled metalog.

The image above is from a recent study of DeFi smart contract exploit times. The data set I used in the study was assembled by the team at Solace, stewards of a smart contract insurance protocol. We measured the historical time that passed from contract release until the exploit occurred. The SPT metalog takes in parameters for the upper and lower bounds, if any exist; in our study we used a lower bound of 0 days and no upper limit. Three symmetric percentiles are also needed: the median (50th percentile) and a symmetric high/low pair, for example p10 for the lower, p50 for the median, and p90 for the upper. Our p90 was over ten and a half months. You could fit the metalog with p5-p50-p95 or whatever percentiles you want, as long as they are symmetric about the median. This can come in handy if you are looking at tail risk.

We use this distribution to help figure out things like policy pricing. Having a sense of what's happened historically across smart contracts, protocols, and networks provides 'a' probabilistic input for policy pricing decisions. Looking at the image above, if we changed the probability to another value we'd learn that the time to contract exploit is x months or less. (YMMV, DeFi is a young industry!)

There is another variation of the quantiled metalog called the ELD, or equally likely distance, metalog. I like to think of it as the method that does some of the heavy lifting for me. ELD is great for things like data streams of crypto/token percentage price changes, which are used to shape volatility. The ELD metalog generates aCoefficients, and those aCoeffs are used to calculate the value at any probability in the distribution. Why settle for someone's average volatility when you can have the entire distribution in one formula and a few coefficients? The image below shows the probabilistic daily price range for two cryptocurrencies based on metalog distributions that captured the shape of each token's historical price movements.

There are two situations when I turn to ELD metalogs: 1) when we already have a Monte Carlo producing trials 'somewhere' (from some subject matter experts?), or 2) when it's OK to use historical data observations. I simply put the array/vector of either of those into the metalog to 'compress' the data down to a few aCoefficients that preserve the shape of the distribution. When I want or need to simulate (e.g. when aggregating those two tokens into a portfolio's value at risk), all I need to do is send in n trials of random uniform numbers and I've recreated the entire shape. Need just one point on the CDF? I send in the probability I want a value for and voilà.

I find this to be transformational stuff and hope you'll give metalogs a try. It's hard to believe that in 200 years data scientists will still be using dead-statistician distributions when metalogs are so convenient and superior.

Hmm, I suppose Tom Keelin will have passed by then.

In another post I'll explain how to roll up the portfolio risks, but for now, if you have any questions or want to learn more, drop a comment here or, better yet, join the Discord.

---

A popular way to model crypto token prices is with lognormal distributions (if you have to). Let's say you want to know the probability that a portfolio of tokens will be greater than a particular price threshold. That's a pretty common question if you want to ensure a DeFi loan won't be liquidated for being undercollateralized. And if you are pricing derivatives contracts, or a basket of options, you'll run into sums of lognormal price volatility distributions.

There are non-financial fields where modeling lognormals is also common practice, like geology, biology, engineering, and many others.

Tom Keelin, Lonnie Chrisman and Sam Savage recently wrote a paper that outlines a solution. Here's the abstract from the paper:

> "The metalog probability distributions can represent virtually any continuous shape with a single family of equations, making them far more flexible for representing data than the Pearson and other distributions. Moreover, the metalogs are easy to parameterize with data without non-linear parameter estimation, have simple closed-form equations, and offer a choice of boundedness. Their closed-form quantile functions (F-1) enable fast and convenient simulation. **The previously unsolved problem of a closed-form analytical expression for the sum of lognormals is one application.** Uses include simulating total impact of an uncertain number N of risk events (each with iid [independent, identically distributed] individual lognormal impact), noise in wireless communications networks and many others. Beyond sums of lognormals, the approach may be directly applied to represent and subsequently simulate sums of iid variables from virtually any continuous distribution, and, more broadly, to products, extreme values, or other many-to-one change of iid or correlated variables."

I decided to write the JavaScript version of this using an interpolatable (is that a word??) data table based on a spreadsheet the authors produced. Here's the github repo and a codepen which is largely based on it.
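
The lookup mechanism itself is nothing exotic. Below is a generic linear-interpolation sketch of the idea; the table values are placeholders I invented, not numbers from the authors' spreadsheet.

```javascript
// Generic linear interpolation over a lookup table of precomputed values,
// the core mechanic behind a table-driven approximation.
function makeInterpolator(xs, ys) {
  // xs must be sorted ascending; ys are the tabulated outputs.
  return function (x) {
    if (x <= xs[0]) return ys[0];
    if (x >= xs[xs.length - 1]) return ys[ys.length - 1];
    let i = 1;
    while (xs[i] < x) i++;
    const t = (x - xs[i - 1]) / (xs[i] - xs[i - 1]);
    return ys[i - 1] + t * (ys[i] - ys[i - 1]);
  };
}

// Hypothetical table: sigma -> some tabulated coefficient (placeholder data).
const sigmas = [0.1, 0.5, 1.0, 2.0];
const coeffs = [1.02, 1.13, 1.65, 3.4];
const lookup = makeInterpolator(sigmas, coeffs);
```

Queries between tabulated points blend their neighbors; queries off either end clamp to the edge values.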

If this is an area of interest for you and you'd like to help, there are a few open items listed in the repo. I'd like to get the data table into decentralized storage (IPFS, Gun, Sia, others) so nobody ever has to calculate these values again; they can just look them up. If you're curious and want to learn more about metalog distributions and how we're using them in DeFi, join the Discord server.

---

Uniswaption.com lets LPs easily compare pools and find liquidity provision opportunities that match their risk appetite in three easy steps:

**1 - Select Token Pairs:** Historical prices are retrieved from the Covalent API and fitted using chance-data distributions

**2 - Calculate Liquidity Percentage:** Query tick liquidities and calculate range liquidity as a percentage of the total

**3 - Interactively Evaluate:** Visualize pools based on the user's risk appetite
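
A rough sketch of step 2's math, assuming tick data has already been fetched: sum the liquidity sitting inside the chosen range and divide by the pool total. Note this treats each tick's liquidity as a standalone bucket, which glosses over Uniswap v3's liquidityNet accounting, and the tick objects are hypothetical stand-ins for a subgraph query result.

```javascript
// Step 2 sketch: what fraction of the pool's total liquidity sits inside
// a chosen tick range? (Simplified bucket model, not v3's net accounting.)
function rangeLiquidityPct(ticks, lowerTick, upperTick) {
  let total = 0;
  let inRange = 0;
  for (const { tick, liquidity } of ticks) {
    total += liquidity;
    if (tick >= lowerTick && tick < upperTick) inRange += liquidity;
  }
  return total === 0 ? 0 : (100 * inRange) / total;
}

// Hypothetical tick data for one pool.
const ticks = [
  { tick: -100, liquidity: 10 },
  { tick: 0, liquidity: 30 },
  { tick: 100, liquidity: 40 },
  { tick: 200, liquidity: 20 },
];
```

With this toy data, the range [0, 200) holds 70 of the pool's 100 units of liquidity.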

**Immediate Technology Improvements**
Using decentralized storage to store historical risk distributions. This will let anybody with access to the data easily compare pools and find liquidity provision opportunities that match their risk appetite. It also acts as a step towards proof of analysis of risk distributions and disclosure.

Give users the ability to fetch more currencies. Closely related to storing the chance-data so it may be reused is adding more currency pairs to the app. Once a risk distribution is created, any other app user may use it.

**Improve on the idea:**
Once the analysis is done, the app aims to facilitate the trade and follow its performance. It is also envisioned to create more views of the data that dynamically aggregate a portfolio of positions.

The project won a sponsor award from Visor Finance!

To learn more or follow the project's progress follow @moatazelmasry93 on Twitter.

---

This should be easier to explain than it is, but like many fast-moving open-source projects, definitions and capabilities change rapidly, often with shared oversight. Let's start with the broadest definition of 'What is GUN?' and work towards the more specific.

1. "GUN is an ecosystem of tools that let you build community run and encrypted applications."
2. GUN is an open-source project.
3. From a technology perspective, GUN is a protocol for synchronizing data in graph format.

So, it's an ecosystem, a project, and a protocol. OK, but why do so many people call it a database? The need for a decentralized, offline-first way to store data is likely what drives people to refer to GUN as a database. Naming aside, this `stuff` is transformational, so why does it seem so underappreciated?

Here's how I will use the terms if I blog more about the project:

GUN = concept; gun.js = concrete

So gun.js is a database engine that runs everywhere JavaScript does. It is a small, easy, and fast data sync and storage system that aims to remove developer worries about backend servers, network calls, databases, tracking offline changes, and concurrency conflicts. (Paraphrasing the repo readme.md.)

Characteristics of gun.js:

- FOSS
- Offline first
- Decentralized
- Graph
- Eventually consistent (cAP)

"Technically, gun.js is a graph synchronization protocol with a lightweight embedded engine, capable of doing 20M+ API ops/sec in just ~9KB gzipped size." -again, from GUN project readme
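
To give a flavor of what eventually consistent sync means in practice, here's a toy last-write-wins merge: each field carries a timestamp, and when two offline replicas sync, the newer write per field wins. This is a deliberate simplification for illustration, NOT GUN's actual HAM conflict-resolution algorithm.

```javascript
// Toy state-based merge for eventual consistency: keep the newer write per
// field, so merging in either order converges to the same state
// (given distinct timestamps). Not GUN's real HAM algorithm.
function merge(a, b) {
  const out = { ...a };
  for (const key of Object.keys(b)) {
    if (!(key in out) || b[key].at > out[key].at) out[key] = b[key];
  }
  return out;
}

// Two replicas edited offline, then synced:
const replicaA = { name: { val: 'alice', at: 1 }, bio: { val: 'dev', at: 5 } };
const replicaB = { name: { val: 'ALICE', at: 3 }, bio: { val: 'builder', at: 2 } };
const merged = merge(replicaA, replicaB);
```

Both peers end up with `name: 'ALICE'` (written at time 3) and `bio: 'dev'` (written at time 5), regardless of which side merges first.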

This not only sounds amazing, it is amazing. I've been using it for a year now and it's changed the way I think about apps and the ownership of data. So why aren't more developers using it? I'm genuinely interested in hearing from devs who have tried it and moved on to other decentralized storage, or decided to stick with Firebase. Please leave a comment below and share your perspective and experiences.

*This post could have been a tweet, but you know... length.

---

The big idea here is to represent uncertainties as data arrays in a sharable, compact (~2kb), auditable form. I call this chance-data. It's a concept invented and promoted by Prof. Sam Savage of Stanford. Chance-data obeys both the laws of arithmetic and the laws of probability. Each value in the array represents a possible future outcome or scenario. Think of a what-if scenario, if you've ever used Excel. What if we sold 1,000 more units? Will we have enough inventory to fulfill the demand? In a spreadsheet we would change an input value to see what the outcome would be. Simple enough. Now imagine having millions of possible future outcomes.

Take the array of those possible future values and create a histogram. The histogram is a graph that represents the distribution of the values: it depicts the probability of each possible future value. So, if you have an array of values that all have the same probability, the graph will be a flat line, or what's called a uniform distribution. Think of rolling a six-sided die: the probability of rolling a 1 is the same as rolling any other number. But many uncertainties have different shapes, and their histogram may look like a bell curve, a ski jump, a skateboard ramp, or anything in between. Some of you may be thinking, OK, so what? Nothing new here.
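
In code, a histogram is just bin counting over the array of trial values; a quick sketch:

```javascript
// Turn an array of trial values into histogram bin counts.
function histogram(values, binCount) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const width = (max - min) / binCount || 1; // avoid divide-by-zero
  const counts = Array(binCount).fill(0);
  for (const v of values) {
    // Clamp the max value into the last bin.
    const i = Math.min(binCount - 1, Math.floor((v - min) / width));
    counts[i]++;
  }
  return counts;
}

// Die rolls are uniform: each face equally likely, so the bins come out flat.
const rolls = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6];
```

For the die-roll array above, `histogram(rolls, 6)` gives six equal bins, the flat line of a uniform distribution.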

Historically, analytics professionals would fit data to a probability distribution, the most well-known being the so-called normal or Gaussian distribution. There are over a hundred distributions, mostly named after dead statisticians. The trick, art, sorcery is to pick the correct distribution for your data. Not always an easy task. Which distribution? What parameters should feed that distribution? It's not fun. If you've ever struggled to find the correct distribution to fit your data and have considered quitting the field and forgetting everything you know about stats, because you just can't win, I've got a therapist for you...

In 2016 Tom Keelin published the first paper describing metalogs. Metalogs are just that: the distribution that comes **after** you've struggled to find the correct distribution to fit your data... kidding aside, metalogs **encompass** 60+ traditional continuous distributions in one technique. So now we can 'fit' data to a metalog, and we don't need to do as much guessing. A lot of data observations go in and the metalog function spits out a handful of coefficients. Great, we have compressed the shape down to a few numbers. Kinda cool.

*Source: Wikipedia*

Where it gets interesting is when you have more than one uncertain variable AND you want to share those with others. How do you get two people to have the exact same Monte Carlo results when Monte Carlo is random? That's where Sam Savage has been focusing. He started off by freezing the Monte Carlo trials as vectors (arrays for us devs). Pass those frozen vectors around and everybody has the same result. Cool. BUT: too much data, too small pipes.

Prior to metalogs, Savage had invented something called a DIST string: a technique for packaging the vectors as base64-encoded strings. He added some metadata, was smart about binning the data before encoding it so the 'average' could be preserved, and voilà, the DIST string was small enough to fit into an Excel cell. He went on to call a single uncertainty packaged up like that a SIP, short for Stochastic Information Packet. When there is more than one uncertainty and they are related to each other, that forms a SLURP: a Stochastic Library Unit with Relationships Preserved.
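
A toy sketch of the packaging idea, emphatically not the official DIST/SIPmath wire format: quantize trials into byte-sized bins, base64-encode the bytes, and carry enough metadata to rehydrate them later. This uses Node's Buffer for the encoding.

```javascript
// Toy DIST-like packaging: quantize trial values into 0-255 bins, pack them
// as bytes, and base64-encode with a little metadata. An illustration only,
// NOT the official DIST / SIPmath specification.
function packSIP(name, values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const scale = max === min ? 0 : 255 / (max - min);
  const bytes = Buffer.from(values.map(v => Math.round((v - min) * scale)));
  return { name, min, max, count: values.length, data: bytes.toString('base64') };
}

function unpackSIP(sip) {
  const bytes = Buffer.from(sip.data, 'base64');
  const scale = sip.max === sip.min ? 0 : (sip.max - sip.min) / 255;
  return [...bytes].map(b => sip.min + b * scale);
}
```

Each trial costs one base64-encoded byte, so thousands of trials fit in a few kilobytes, at the price of some quantization error in the middle of the range.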

OK. So, we have this fitting with metalogs and the idea of freezing trials and compressing them. Cool, but there are still some challenges when one goes to share these 'things'. Different devices generate different random numbers. What we want is to know that when anybody, anywhere looks at a probability distribution, they are looking at exactly the same data. This helps with auditability and replicable results, and provides a common understanding. To do this, a seeded pseudo-random number generator is needed. No big deal, those have been around forever. Doug Hubbard (the How To Measure Anything guy) created a tiny little PRNG formula called the HDR that supports 4 seeds. Kinda cool. Necessary? Hmm, not sure.
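
I won't reproduce the HDR formula from memory here, so below a well-known tiny seeded PRNG (mulberry32) stands in to show the property that matters: same seed in, same stream out, on any device.

```javascript
// A tiny seeded PRNG (mulberry32), standing in for HDR. Deterministic:
// the same seed yields the same sequence of uniforms in [0, 1) everywhere.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two generators with the same seed produce identical streams.
const rngA = mulberry32(42);
const rngB = mulberry32(42);
```

Swap `Math.random()` for a seeded generator like this and every consumer of your simulation can replay exactly the same trials.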

Sam Savage is in LOVE with Excel nearly as much as he is with the mission of curing the flaw of averages, so he uses the HDR with metalogs to create interactive simulations in Excel. Not stopping there, he goes on to define a specification for sharing the probability distributions called SIPMath. This specification preserves the shape and relationships found in uncertainties and puts them into a tiny little JSON file that is easily shared and consumed in Python, JavaScript, and other programming languages. It's a type of data Lego that compresses massive amounts of data into a portable format.
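
Putting the pieces together, consuming shared chance-data might look like the sketch below: a JSON object carries aCoefficients and a seed, and hydrating it gives back a metalog quantile function plus a reproducible sampler. The field names and the little LCG are my illustrations, not the actual SIPMath schema.

```javascript
// Sketch of consuming shared chance-data: hydrate a quantile function from
// stored aCoefficients and draw reproducible trials from a stored seed.
// Field names are illustrative, NOT the SIPMath specification's schema.
const shared = {
  name: 'daily_return',
  aCoefficients: [0.001, 0.02, 0.005, -0.001], // hypothetical 4-term fit
  seed: 42,
};

// Tiny seeded LCG so every consumer draws the same uniforms.
function lcg(seed) {
  let s = seed >>> 0;
  return () => ((s = (1664525 * s + 1013904223) >>> 0) / 2 ** 32);
}

function hydrate({ aCoefficients, seed }) {
  const quantile = y => {
    const L = Math.log(y / (1 - y));
    const g = [1, L, (y - 0.5) * L, y - 0.5]; // 4-term metalog basis
    return g.reduce((sum, gi, i) => sum + gi * aCoefficients[i], 0);
  };
  const next = lcg(seed);
  return { quantile, sample: () => quantile(next()) };
}
```

Anyone who loads the same JSON gets the same distribution and, thanks to the seed, the same simulated trials.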

*Source: ProbabilityManagement.org*

Models don't need to rely on average assumptions. It's no accident averages are referred to as the mean; they can be cruel to your models and mislead people into making suboptimal decisions. By feeding AI models chance-data, your apps will help users avoid the flaw of averages and shine a light on the range of possibilities.

If you'd like to learn more about this topic, join us in the Discord where we are creating a public utility to help people answer the question: what's the chance of something happening?

---

DeFi lending and borrowing protocols require a user to deposit collateral. The depositor may then borrow against that deposit up to a certain percentage of the collateral's value. If the collateral value falls, the user will need to either pay back a portion of the loan, add collateral, or risk being liquidated. To calculate the chance of a loan liquidating, we looked at how token prices have moved in the past and how their movements related to each other. We retrieved historical on-chain price data and sent the daily percentage changes in price, for 9 well-known tokens, through an innovative formula called the metalog.

Metalogs encompass dozens of distributions and eliminate the need to guess which distribution the data will best fit. Determining the shape of historical token price changes 'analytically' effectively compresses a year's worth of price data down into 5 numbers that represent the shape of a token's price movement. I like to think of this process as 'dehydrating' rather than compressing, as we will be able to rehydrate the shape later. The histogram in the image below shows the shape of historical price changes applied to the current price of a token, providing a view of today's possible price range.

We are now able to simulate probability distributions in our dApp. But that's only half the story. We also captured the correlation of each token to every other token in the set. This can come in handy if you are using more than one token to collateralize a loan or want to simulate an exotic option. 'Will a synthetic 2x put option offset my liquidity risk?' or 'How should I allocate tokens in my portfolio to better match my risk tolerance?' You can see the relationship in how token prices move by viewing a scatter plot. In the example above we see a stronger positive relationship between ETH/CRV than between ETH/DAI. The relationships were quantified and stored as a correlation matrix (9x9).
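
The pairwise numbers behind that matrix are plain Pearson correlations of two tokens' daily percentage changes; a sketch:

```javascript
// Pearson correlation of two equal-length series, e.g. two tokens' daily
// percentage price changes. Returns a value in [-1, 1].
function pearson(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}
```

Run it over every pair of the 9 return series and you have the 9x9 matrix.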

We stored the chance-data in a publicly accessible JSON file with metadata for transparency, auditability, and replicability, and digitally signed it. All our dApp needed to do was pull the JSON file from IPFS, then 'hydrate' the distributions in the browser/app. The simulation trials preserve the relationships across distributions, which gives them the special property of being additive. This was historically only possible with massive scenario databases found in TradFi. We now have 'DeFi distributions', or what I like to call chance-data.

The hackathon version of the dApp may be accessed here: ChanceOf.xyz. While we didn't win best of show, we did manage to win three sponsor awards and were encouraged to continue development. Thank you, Conjure Finance, Alchemy, and Chainlink. Check it out and let us know what you think. Join the Discord if you'd like to contribute or follow along.
