Dreaming the Limits of my Faith

2016.12.23, 2017.04.24

Warning: this post contains semi-graphic content just as I dreamed it.

I had a dream this morning, a half hour before my alarm was set to go off. In this dream, one of my children had a minor medical problem. I don't know what the medical problem was, but I think the child was Becca who, in some ways, is the most innocent of my children compared to her age. We turned her over to the care of medical doctors who did their diagnosis work and planned for a procedure to help her recover.

At this point, I must pause and provide some mental context. I am fascinated by how far the medical profession has gone. Every decade they are able to do more, to cure more. They are able to remove body parts, replace body parts, open up the human body to repair deeply hidden body parts. Sometimes it seems they have nearly reduced medical treatment to an engineering problem. But, I am one who has thought long and hard about things such as the teleportation depicted in the humanist and reductionist Star Trek. I have decided that if science could some day learn how to transport a person's physical body through the air in the form of a stream of particles or in the form of data, it would certainly kill that person. I believe that science and medical practice will never fully reduce the human soul to a human body-machine or to data that can be represented by a physically coded signal.

In that context, you should be as jarred by disbelief as I was when the doctors in my dream showed me what they had started to do with my little girl. For the minor medical problem she had (I don't know what it was), they found it convenient to completely dissect her. Not only was she dismembered, but her various types of organ were separated from each other and placed carefully in different compartments of a type of crate lying on a table. It looked like a small meat display case at a grocery store. There were no coverings, no preservative liquids. There was nothing remaining intact that looked like it belonged to a human child.

I instantly realized -- as you can only do in dreams -- that they found this dissected state of the human body easier to work with. They would affect a cure in whatever part seamed best and then they would put it all back together. I was horrified and repulsed by this infinitely precarious situation and immediately woke up, unable to sleep any more.

What am I to make of this dream? I know some people who have that level of faith in human reasoning or medical skill. They believe medical science will some day allow doctors to reconstruct a human body like a Frankenstein's monster or a robot. Apparently they live in the world of my dreams.


Some Notes from AAAI-16 in Phoenix

This is a summary of what I gleaned from the academic AI research conference AAAI 2016 in Phoenix, AZ.  There is some good stuff here for machine learning users. There are also a few famous AI names here and there.  I will bold words to help you skim.  I did not take notes during all the talks, just the tutorials and invited talks that I found engaging.  I went to standard research talks on game theory, reinforcement learning, NLP, etc., but did not take notes.  I even met and talked with a few people—one about ad tech, one about social diffusion, and a few others briefly.  For the most part, I was a nerd among nerds, a loner among loners.

Tutorial on CP-Nets (Conditional Preference Networks) by Thomas E. Allen, Judy Goldsmith, and Francesca Rossi

Using CP-Nets is a way to model combinations of preferences in a more-tractable way.  Instead of creating a total-order among all possible complete situations or states, they decompose each state into variables and orient those variables as nodes in a directed acyclic graph (DAG, just like a Bayesian network does).  These nodes express which decisions or aspects of the whole state are most important given the "parent variables", i.e. the decisions made already at the more important nodes.  Each child node in the network contains a table of preferences among the values of its variable given all possible values of the parent variables.  This is a qualitative ranking of preferences, not a quantitative relationship.

I also went to lunch with these guys.  Very friendly and inclusive.

Deep Learning by Reza Borhani, Jeremy Watt, and Aggelos K. Katsaggelos

A while back, an important problem was function approximation, and the technique people used was to learn a mixture of fixed basis functions.  There are a number of types of fixed basis functions that are popular.  For frequency (Fourier) type data, you might use a set of sinusoid functions of different frequencies.  For other kinds of data, you might use a set of polynomials of different orders (powers).  There are many others.  A good thing about fixed basis functions is, they have convex error surfaces, which means you don't have to worry about falling into local optima (weight settings that terminate search but which are not globally optimal).  And yet they have other problems, and it turns out that the local optima problem with non-convex error surface of neural nets is not really that big of a problem.  The problem with fixed basis functions is, they are not very efficient.  You need to combine a lot of different basis functions from your class to approximate an arbitrary function very well.

A neural network is a set of feature basis functions for the purpose of function approximation.  They overcome some limitations of using fixed basis functions.  In a sense, neural networks "scale" better than fixed basis functions because they require a smaller number of features or basis functions than the fixed approach.  tanh is a good basis function for neural nets because it is adjustable (a lot like the logistic function which is what I learned in school).  That kind of adjustable basis function has been popular for a while, but recent improvements have given us a number of new basis functions that actually perform better than tanh and logistic (sigmoidal).

Training data quality is a spectrum from ideal to bad.  Ideal training data is characterized by two things: (1) it is not noisy and (2) it is evenly distributed over the input feature space (and he probably implied also balanced in output space too) which means no empty areas.  If your training data is close to "ideal", then it does not matter which basis function you use.  If not, then select your basis function wisely.

Gradient descent is equivalent to the back-propagation algorithm in this context.  Traditionally, you take one big step in weight-space after a full scan of all the training data.  Stochastic gradient descent is more incremental (less batch-oriented) in that it takes small steps, one for each datum in the training data; and it works better than batch updates.

A theoretical note with practical application: the target function must fall within the "span" (or space, think linear combinations of basis vectors in linear algebra) of the features or basis functions.

Neural networks are popular now because we have more data and Moore's law.  The recent big advancements in neural network research were inspired by biological neural networks, e.g. human brain and perceptual systems.  Convolutional neural networks use more biology knowledge in terms of layers of dimensionality reduction, function composition (or decomposition), and structure.

Other recent advances in neural nets include the following: (1) new choices for activation functions beyond tanh and sigmoidal, e.g. something he called "max" converges much faster requiring fewer iterations of learning, (2) better ways to initialize weights, (3) theory advances showing that non-convex error surfaces are not so bad, (4) new optimization processes, (5) new regularization strategies.

For those of you who are doing deep learning with normal neural networks containing a single layer of hidden units, here is a definition for you.  A Deep Learning model is defined to have more than two or three hidden layers in a neural network.

Diffusion in Social Networks by Paulo Shakarian

This lecture was a survey of the field of research that models how ideas spread in social networks, with applications to marketing and "going viral".  On a high level, there are two approaches: One, you could construct a detailed graph (network) model of the social network and run simulations to see what happens.  There are many competing theories and none of them look detailed enough to work very well.  This approach is better for explaining what might happen in the network than actually predicting diffusion.  Two, you could use machine learning to more directly learn to predict how far ideas might spread based on a variety of feature of the phenomenon, including the early stages of propagation.  This approach does better than the first approach at prediction, but neither approach works as well as you might hope.

Here are a few sketchy details.

Questions to answer:  How do thoughts spread in a social network?  How many infectees after T time?  Influence Maximization Problem.  Profit Maximization.  Takes theories from various fields like mathematical sociology.  Perform simulations.  The first iteration of the simulation is most sensitive to out-degree and probability (beta) of transmission.

Models: SIR and SIS models, tipping model and target set selection, Kempe-Kleinberg-Tardos (KKT) framework, logic programming based diffusion, evolutionary graph theory (EGT).

SIR model: Susceptible means able to be infected.  Infected means able to infect.  Removed means not able to infect or be infected.

Tipping Model: Polynomial-time infection-count prediction.  Non-submodularity.  Algorithm for finding a seed set that can spread infection to the whole graph in minutes with millions of nodes and edges.  Linear Threshold: uses probabilistic thresholds.

Kempe-Kleinberg-Tardos (KKT): Independent Cascade (IC) Model allows different probability for each connection. Transmission is from sum of weights. Live-Edge (Live-Arc) Model.  Generalized Threshold Model.  Logic-based Diffusion: Annotated rules for diffusion. Write an "annotated program" in logic, based on attributes of nodes and relations, (including probabilities? Maybe not).  Like deterministic tipping model.  Computing fixed point (maximum spread), which is polynomial in time and space.  Multiple labels on nodes and relations.

Evoluationary Graph Theory: Mutants taking over.  Moran Process.  Goal to compute fixation or extinction probability.  Isothermal graph.

Data-driven approaches: Learn model parameters from past diffusion processes.  Regression model to predict spread from features of past diffusion processes.  EM process hard to scale but accurate for early stage diffusion (non-viral).  Tends to over-fit using small influence probabilities (10^-7).  Temporal effects: exponential decay of influence is slow.  Discrete time model is a good compromise.  Dynamic models are better, Bernoulli model better than Jaccard.  Discrete time comparable to continuous, which is nice.

Credit Distribution Model: Fast runtime for Influence Maximization.

Power-law of cascades, large cascades are rare.  Features that are predictive: seed user's number of followers, number of friends, past local influence; content's tweet length, hashtags, divergence of sentiment; early diffusion's number of out degrees in subgraph, etc.  Regression trees, 0.98 R^2, but poor final result.  Logistic regression with balanced data got good results.  Original poster and content becomes less important as initial cascade progresses.  Number of influencers (incoming edges) has less effect above 3. Number of influencing communities has a greater impact.  Structural diversity is a good feature.

His startup: CrossViral

I also had a good chat with Dr. Paulo a few days later.

Inference in Structured Prediction by Dan Roth, Kai-Wei Chang, Gourab Kundu, and Vivek Srikumar

This was very related to my research area in school, so a lot of this was review for me.  Dan Roth was one of the few names I recognized at the conference.  I have to agree that structured prediction problems are among the most important (and still very general) prediction problems.  They include NLP, information extraction, and image object recognition problems.

Integer Linear Programming (ILP) is cool and very useful and popular these days in ML and AI.

Reading the Web

This talk was very interesting and also related to many of the papers I read in preparation for my PhD dissertation, so a lot of this talk was a review for me.  He talked about bootstrapping (the unsupervised and highly scalable process of learning extraction rules from extracted knowledge and vice-versa) and the strategy of using multiple constraints or sources of evidence to prevent semantic drift.  Tom Mitchell's Never Ending Language Learning (NELL) project and Orin Etzioni's Web-reading work are the more famous projects in this area.

I saw a slide with Justin Betteridge on it, who I already knew was working in the NELL group at CMU.  I new Justin during my masters degree at BYU.  Very nice guy.

How AI influences Labor Market (Panel Discussion)

Random bits of pessimism and optimism and ego-boosting comparisons among panelists (mostly Oren Etzioni) interspersed with a few random but interesting line plots and quotes.

Google DeepMind (Invited Talk) by Demis Hassabis

DeepMind was sold to Google for $500M.  Demis Hassabis and his company have a very impressive background.  He has been planning DeepMind for 20 years, and intentionally worked in the best of academia and startups as preparation.  Their goal is to "solve intelligence", by which catchy phrase I believe he means to "solve the impossibly hard problems necessary for creating a truly general-purpose artificial intelligence."  Strategies they adopted early on were (1) learn from raw inputs like the pixels on a screen of a video game, (2) be general not brittle and narrow.  As an counter-example, he mentioned that DeepBlue was narrow though still impressive because it would have to be completely retrained to play anything other than chess.

They are known for deep learning, but they are by no means only about deep learning.  They prefer thinking in terms of reinforcement learning.  This involves learning statistical model of world, then learning actions.  They believe that solving reinforcement learning would solve general learning and intelligence.  They believe in "grounded cognition".  Robitics hardware is expensive, slow, and it breaks.  Therefore they use simulators which can make as much training data as you want, and they ground themselves in pixels of video game images.  They prevent testing bias, but I forget how.  They can run tests on thousands and millions of agents at once.

They use Deep Reinforcement Learning, which to them means getting reinforcement learning to work at scale.  Their early task was to learn how to play 8-bit games from 80's, Atari 2600.  DQN.  Code released and discussed in articles in Nature.  General purpose game learner.

They draw inspiration from brain and neuroscience.  Their system is a Neural Turing Machine: ANNs with memory.  It is a full Turing Machine. AlphaGo is their new project.  It is based on pattern-recognition, planning, tree search, Monte Carlo Rollouts.  aGo was the first program to beat a professional Go player on a full sized board.  They have hundre Alphds of papers on their website.  Next month, in March, they will live stream on YouTube AlphaGo playing a big pro, Lee Sedol. This will be a $1M 5-game challenge match.  In preparation, they are retraining everything.

Difference between AlphaGo and DeepBlue: AlphaGo is a general-purpose, modular system combining deep learning based pattern recognition with planning and search.  They will later remove supervised learning; training time will be longer.

AlphaGo is more human-like.  Human players say AlphaGo plays like a human; so they say it passes a "Turing test" for gamers.

Montezuma's Revenge is hard for AlphaGo.

They used 100 GPUs before being bought by Google while working on Atari games.  This was a response to a question about how to do this kind of research on a small budget.  He says it's a good domain for small budgets.

Extra Bits

I had fun watching the ASU robots do their thing in the foyer between research paper sessions.

One of the posters in the poster session was about learning to automatically generate posters from scientific papers.  I asked her the obvious question.  No, her poster was not automatically generated in toto.  (One of the figures was.)  But she did get some good attention and chuckles when she introduced her topic during her spotlight talk.

Here is a web app (and a university research project) helping people making "fair" decisions involving finite resources: http://www.spliddit.org/

It was interesting to eavesdrop on Peter Norvig and someone else I did not recognize who were talking about "academic genealogy" and their "shared ancestors".  It seams that the better your academic "pedigree" is, the more you worry about academic pedigrees.


Various Artworks

Here I collect a few random pieces of my visual artistry.

This drawing tells me what I want to do and be in the future.  In particular, it tells me two things: I wish to draw pictures with words as a writer, and I wish to fade into the background as the words of this parable is the real message.  This drawing was done in pen-and-ink on paper by hand before being scanned into computer and color added.

My first attempt in many, many years to combine computer science with art.  I call this a cuneiform binary fractal tree to 19 levels of recursion.  If you have Googled for fractal tree, you will have seen similar trees.  What makes mine unique, I think, is that it is entirely algorithmically generated and yet it has a semi-natural shape by my adding a small amount of uniform-random noise to the branching angle and also choosing a random order for growing the left and right child branches at each branch point.  I think it strikes a balance between too regular and too irregular.  I also like the fact that it is composed entirely of such a simple shape: triangles.  Oh, and I also prevent overlap among all branches in the tree by keeping track of points on lines and pruning any branch before it crosses an existing one.  You must understand, my intention is to refine the two-dimensional appearance, preventing most or all appearance of depth.  I could do more.  I plan to leave more space between branches to prevent the crowding-look.  I could also add a little noise to the branch length.  Needless to say, there are more trees to come.