
Filter before you parse: faster analytics on raw data with Sparser


Filter before you parse: faster analytics on raw data with Sparser Palkar et al., VLDB’18

We’ve been parsing JSON for over 15 years. So it’s surprising and wonderful that with a fresh look at the problem the authors of this paper have been able to deliver an order-of-magnitude speed-up with Sparser in about 4Kloc.

The classic approach to JSON parsing is to use a state-machine based parsing algorithm. This is the approach used by e.g. RapidJSON. Such algorithms are sequential and can’t easily exploit the SIMD capabilities of modern CPUs. State-of-the-art JSON parsers such as Mison are designed to match the capabilities of modern hardware. Mison uses SIMD instructions to find special characters such as brackets and colons and build a structural index over a raw JSON string.

… we found that Mison can parse highly nested in-memory data at over 2GB/s per core, over 5x faster than RapidJSON, the fastest traditional state-machine based parser available.

How can we parse JSON even faster? The key lies in re-framing the question. The fastest way to parse a JSON file is not to parse it at all. Zero ms is a hard lower bound ;). In other words, if you can quickly determine that a JSON file (or Avro, or Parquet, …) can’t possibly contain what you’re looking for, then you can avoid parsing it in the first place. That’s similar to the way we might use a Bloom filter to rule out the presence of a certain key or value in a file (guaranteeing no false negatives, though we might have false positives). Sparser is intended for situations where we interact directly with raw unstructured or semi-structured data, and where a pre-computed index or similar data structure either isn’t available or is too expensive to compute given the anticipated access frequency.

In such a context we’re going to need a fast online test with no false negatives. Comparing state-of-the-art parsers to the raw hardware capabilities suggests there’s some headroom to work with:

Even with these new techniques, however, we still observe a large memory-compute performance gap: a single core can scan a raw bytestream of JSON data 10x faster than Mison parses it. Perhaps surprisingly, similar gaps can occur even when parsing binary formats that require byte-level processing, such as Avro and Parquet.

Imagine I have a big file containing tweet data. I want to find all tweets mentioning the hashtag ‘#themorningpaper’. Instead of feeding the file straight into a JSON parser, I could just do a simple grep first. If grep finds nothing, we don’t need to parse as there won’t be any matches. Sparser doesn’t work exactly like this, but it’s pretty close! In place of grep, it uses a collection of raw filters, designed with mechanical sympathy in mind to make them really efficient on modern hardware. A cost optimiser figures out the best performing combination of filters for a given query predicate and data set. When scanning through really big files, the cost optimiser is re-run whenever parsing throughput changes by more than some threshold (20% in the implementation).
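To make the grep analogy concrete, here is a minimal Python sketch of filter-then-parse over newline-delimited JSON tweets. It is not Sparser's code: `find_tagged` and the `hashtags` field are hypothetical, and a plain `bytes` containment test stands in for Sparser's optimised raw filters. Records that survive the cheap test are still fully parsed and checked, so false positives cost only time.

```python
import json


def find_tagged(lines, tag):
    """Return parsed tweets whose (hypothetical) "hashtags" field contains tag.

    Each element of `lines` is one raw JSON record as bytes.
    """
    needle = tag.encode("utf-8")
    hits = []
    for raw in lines:
        # Cheap raw filter: if the bytes don't appear (assuming the tag is
        # stored unescaped), the record can't match, so skip the parse.
        if needle not in raw:
            continue
        # Survivors may still be false positives, so parse and check properly.
        tweet = json.loads(raw)
        if tag in tweet.get("hashtags", []):
            hits.append(tweet)
    return hits


records = [
    b'{"id": 1, "hashtags": ["#themorningpaper"]}',
    b'{"id": 2, "text": "nothing to see here"}',
    b'{"id": 3, "text": "#themorningpaper is great", "hashtags": []}',
]
print([t["id"] for t in find_tagged(records, "#themorningpaper")])  # [1]
```

Record 2 is never parsed at all; record 3 passes the raw filter but is rejected after parsing, which is exactly the false-positive case the design tolerates.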

Sparser can make a big difference to execution times. Across a variety of workloads in the evaluation, Sparser achieved up to a 22x speed-up compared to Mison. This is a big deal, because serialising (and de-serialising) is a significant contributor to overall execution times in big data analytic workloads. So much so, that when integrated into a Spark system, end-to-end application performance improved by up to 9x.

Efficient raw filters

Raw filters (RF) operate over raw bytestreams and can produce false positives but no false negatives. They are designed to be SIMD efficient. There are two raw filter types: substring search and key-value search.

Say we have a predicate like this: name = "Athena" AND text = "My submission to VLDB". Substring search simply looks for records that contain a substring of one of the target values. For efficiency reasons it considers 2-, 4-, and 8-byte wide substrings. Taking 4-byte substrings, there are several we could potentially use for matching, e.g. ‘Athe’, ‘ubmi’ or ‘VLDB’. Using ‘VLDB’ as an example, the string is repeated eight times in a 32-byte vector register. We need 4 one-byte shifts to cover all possible matching positions in an input sequence.

Note that the example in the figure above is actually a false positive for the input predicate. That’s ok. We don’t mind a few of those getting through.

The main advantage of the substring search RF is that, for a well-chosen sequence, the operator can reject inputs at nearly the speed of streaming data through a CPU core.
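The packed-register trick can be emulated in a few lines of Python to show why 4 shifts suffice. This is a sketch, not Sparser's implementation (which is C using vector instructions): `substring_rf` is a hypothetical name, and a Python list stands in for the byte-wise comparison mask a 256-bit compare would produce. A match is any 4-byte lane in which all bytes compared equal.

```python
def substring_rf(record: bytes, needle: bytes, width: int = 32) -> bool:
    """Emulate a packed-vector substring raw filter.

    Returns False only if `needle` definitely does not occur in `record`.
    `width` models the register size in bytes (32 bytes = 256 bits).
    """
    k = len(needle)
    if k == 0 or width % k != 0:
        raise ValueError("needle length must divide the register width")
    # Broadcast the needle across the register, e.g. b"VLDB" eight times.
    packed = needle * (width // k)
    for start in range(0, len(record), width):
        # k one-byte shifts cover every possible alignment within this block.
        for shift in range(k):
            chunk = record[start + shift:start + shift + width]
            # Byte-wise equality mask, as a vector compare instruction would give.
            eq = [a == b for a, b in zip(chunk, packed)]
            # A match is a k-byte lane in which every byte compared equal.
            for lane in range(0, len(eq) - k + 1, k):
                if all(eq[lane:lane + k]):
                    return True
    return False


print(substring_rf(b'{"name": "Zeus", "text": "VLDB said no"}', b"VLDB"))  # True
```

The example record passes the filter even though it fails the full predicate (name isn't "Athena"), the same kind of false positive as in the figure.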

Key-value search looks for all co-occurrences of a key and a corresponding value within a record. The operator takes three parameters: a key, a value, and a set of one-byte delimiters (e.g. a comma). After finding an occurrence of the key, the operator searches for the value and stops searching at the first occurring delimiter character. The key, value, and stopping point can all be searched for using the packed vector technique we looked at for substrings.

Whereas substring searches support both equality and LIKE predicates, key-value filters support only equality, not LIKE. This restriction ensures the filter never produces false negatives.
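A minimal sketch of the key-value search logic follows, using Python's `bytes.find` in place of the packed vector search for brevity. The function name `key_value_rf` is hypothetical, the key is assumed to include its JSON quotes, and the sketch assumes the target value itself contains none of the delimiter bytes (otherwise stopping at the first delimiter could cut the value off).

```python
def key_value_rf(record: bytes, key: bytes, value: bytes,
                 delimiters: bytes = b",") -> bool:
    """Return False only if no occurrence of `key` is followed by `value`
    before the next delimiter byte.

    Assumes `value` itself contains none of the delimiter bytes.
    """
    pos = record.find(key)
    while pos != -1:
        start = pos + len(key)
        # Stopping point: the first delimiter after the key (or end of record).
        stop = len(record)
        for d in delimiters:
            i = record.find(bytes([d]), start)
            if i != -1 and i < stop:
                stop = i
        # Search for the value only between the key and the stopping point.
        if record.find(value, start, stop) != -1:
            return True
        # Try the next occurrence of the key, if any.
        pos = record.find(key, pos + 1)
    return False
```

With `key=b'"name"'` and `value=b"Athena"`, a record like `{"name": "Zeus", "text": "Athena"}` is rejected, because "Athena" only appears after the comma that ends the name field; a pure substring filter on "Athena" would have let it through.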

Optimising filter cascades

Sticking with the example predicate name = "Athena" AND text = "My submission to VLDB", there are multiple raw filters we could consider, and multiple ways to order those filters. For example, if ‘VLDB’ is highly selective it might be good to run a substring filter on ‘VLDB’ first, and then feed the results into a key-value filter looking for name = "Athena". But if ‘VLDB’ occurs frequently in the dataset, we might be better off doing the key-value filtering first, and the substring search second. Or maybe we should try alternative substring searches in combination or instead, e.g. ‘submissi’. The optimal arrangement of filters in an RF cascade depends on the underlying data, the performance cost of running the individual raw filters, and their selectivity. We also have to contend with predicates such as (name = "Athena" AND text = "Greetings") OR name = "Jupiter", which are converted into disjunctive normal form (DNF) before processing.
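The conversion to DNF can be sketched with the standard distribution of AND over OR. The tuple representation and the `to_dnf` name are assumptions for illustration, and negation is not handled; the post doesn't describe how Sparser represents predicates internally.

```python
def to_dnf(pred):
    """Convert a predicate tree into DNF: a list of conjunctions,
    each a list of leaf predicates.

    Trees are tuples: ("and", l, r), ("or", l, r), or a leaf such as
    ("eq", "name", "Athena"). Negation is not supported in this sketch.
    """
    op = pred[0]
    if op == "or":
        # DNF(a OR b) is just the clauses of a followed by the clauses of b.
        return to_dnf(pred[1]) + to_dnf(pred[2])
    if op == "and":
        # Distribute AND over OR: pair every clause of a with every clause of b.
        return [l + r for l in to_dnf(pred[1]) for r in to_dnf(pred[2])]
    return [[pred]]  # a leaf is a single one-literal conjunction


pred = ("or",
        ("and", ("eq", "name", "Athena"), ("eq", "text", "Greetings")),
        ("eq", "name", "Jupiter"))
print(to_dnf(pred))
```

For the example predicate this yields two conjunctions: one with the name and text equalities, and one with just name = "Jupiter".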

The first stage in the process is to compile a set of candidate RFs to consider based on clauses in the input query. Each simple predicate component of a predicate in DNF form is turned into substring and key-value RFs as appropriate. A substring RF is produced for each 4- and 8-byte substring of each token in the predicate expression, plus one searching for the token in its entirety. Key-value RFs will be generated for JSON, but for formats such as Avro and Parquet where the key name is unlikely to be present in the binary stream these are skipped. For the simple predicate name = "Athena" we end up with e.g.:

  • Athe
  • then
  • hena
  • Athena
  • key = name, value = Athena, delimiters = ,
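The candidate list above can be reproduced with a short generator. This sketch assumes the whole value is treated as a single token (the post doesn't say how multi-word values are tokenised); `candidate_rfs` and the tuple encoding of RFs are hypothetical.

```python
def candidate_rfs(key, value, emit_key_value=True):
    """List candidate raw filters for the simple predicate key = value.

    Assumes the whole value is one token. Duplicates (e.g. a 4-byte token
    that equals its own only 4-byte substring) are dropped.
    """
    cands = []
    for width in (4, 8):
        # Every (overlapping) substring of this width.
        for i in range(len(value) - width + 1):
            cand = ("substring", value[i:i + width])
            if cand not in cands:
                cands.append(cand)
    whole = ("substring", value)  # plus the token in its entirety
    if whole not in cands:
        cands.append(whole)
    if emit_key_value:  # skipped for binary formats such as Avro and Parquet
        cands.append(("key-value", key, value, ","))
    return cands


for rf in candidate_rfs("name", "Athena"):
    print(rf)
```

"Athena" is only 6 bytes, so it yields three 4-byte substrings and no 8-byte ones, matching the list above.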

Since these can only produce false positives, if any of these RFs fails, the record can’t match. For conjunctive clauses, we can simply take the union of all the simple predicate RFs in the clause: if any of them fails, the record can’t match. For disjunctions (a DNF predicate is a disjunction of conjunctions), a record can only be discarded if an RF from each conjunction fails; otherwise we risk false negatives.
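The discard rule for DNF predicates is compact enough to state directly in code. In this sketch, `contains` is a hypothetical stand-in for a real raw filter and `may_match` is an illustrative name; each RF returns False only when the record definitely fails its clause.

```python
def contains(needle: bytes):
    """Build a trivial substring raw filter (stand-in for a real RF)."""
    def rf(record: bytes) -> bool:
        return needle in record
    return rf


def may_match(record: bytes, dnf) -> bool:
    """`dnf` is a list of conjunctions, each a list of raw filters.

    A conjunction survives only if all of its RFs pass; the record is
    discarded only when every conjunction has at least one failing RF.
    """
    return any(all(rf(record) for rf in conj) for conj in dnf)


# (name = "Athena" AND text = "Greetings") OR name = "Jupiter"
dnf = [[contains(b"Athena"), contains(b"Greetings")], [contains(b"Jupiter")]]
print(may_match(b'{"name": "Jupiter", "text": "Hi"}', dnf))  # True
print(may_match(b'{"name": "Athena", "text": "Hi"}', dnf))   # False
```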

Now Sparser draws a sample of records from the input and executes (independently) all of the RFs generated in the first step. It stores the passthrough rates of each RF in a compact matrix structure as well as recording the runtime costs of each RF and the runtime cost for the full parser.

After sampling, the optimizer has a populated matrix representing the records in the sample that passed for each RF, the average running time of each RF, and the average running time of the full parser.

Next, up to 32 candidate RF cascades are generated. A cascade is a binary tree where non-leaf nodes are RFs and leaf nodes are decisions (parse or discard). Sparser generates trees up to depth D = 4. If there are more than 32 possible trees, then 32 are selected at random by picking a random RF generated from each token in round-robin fashion.

Now Sparser estimates the costs of the candidate cascades using the matrix it populated during the sampling step. Since the matrix stores the pass/fail result of each RF on each sampled record as a single bit, the passthrough rate of RF i is simply the number of 1s in the ith row of the matrix (divided by the number of sampled records). The joint passthrough rate of any two RFs is computed the same way from the bitwise AND of their respective rows.

The key advantage of this approach is that these bitwise operations have SIMD support and complete in 1-3 cycles on 256-bit values on modern CPUs (roughly 1ns on a 3GHz processor).

Using this bit-matrix technique, the optimiser adds at most 1.2% overhead in the benchmark queries, including the time for sampling and scoring.
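The bit-matrix scoring can be illustrated with Python integers as bitsets. This is a deliberate simplification, not Sparser's optimiser: it scores linear chains in which every RF runs in order, rather than depth-4 trees over subsets of RFs, and the names `chain_cost` and `best_chain` and the cost units are hypothetical. Each RF's cost is weighted by the fraction of sampled records that reach it, and the parser's cost by the fraction that survive every RF.

```python
from itertools import permutations


def popcount(bits: int) -> int:
    """Number of 1 bits (portable alternative to int.bit_count)."""
    return bin(bits).count("1")


def chain_cost(rows, rf_costs, order, parse_cost, n):
    """Estimated per-record cost of running RFs in `order`, then the parser.

    rows[i] is an int bitset: bit j is 1 if sampled record j passed RF i.
    """
    reaching = (1 << n) - 1  # every sampled record reaches the first RF
    cost = 0.0
    for i in order:
        cost += popcount(reaching) / n * rf_costs[i]
        reaching &= rows[i]  # joint passthrough is just a bitwise AND
    return cost + popcount(reaching) / n * parse_cost


def best_chain(rows, rf_costs, parse_cost, n):
    """Exhaustively pick the cheapest ordering (fine for a handful of RFs)."""
    return min(permutations(range(len(rows))),
               key=lambda order: chain_cost(rows, rf_costs, order, parse_cost, n))


# Two RFs over a 4-record sample: RF 0 passes 1 record, RF 1 passes 3.
rows, costs = [0b0001, 0b0111], [2.0, 1.0]
print(chain_cost(rows, costs, (0, 1), 10.0, 4))  # 4.75
print(chain_cost(rows, costs, (1, 0), 10.0, 4))  # 5.0
print(best_chain(rows, costs, 10.0, 4))          # (0, 1)
```

In this toy sample the more expensive but more selective RF is worth running first, which is the kind of data-dependent trade-off the optimiser is there to make.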

Periodic resampling

Sparser periodically recalibrates the cascade to account for data skew or sorting in the underlying input file. Consider an RF that filters by date and an input file sorted by date – it will either be highly selective or not selective at all depending on the portion of the file currently being processed.

Sparser maintains an exponentially weighted moving average of its own parsing throughput. In our implementation, we update this average on every 100MB block of input data. If the average throughput deviates significantly (e.g. 20% in our implementation), Sparser reruns its optimizer to select a new RF cascade.
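A minimal sketch of the resampling trigger follows. The post gives the block size (100MB) and the 20% threshold; the smoothing factor `alpha`, the `ThroughputMonitor` name, and the choice to measure deviation against the throughput seen when the current cascade was chosen are all assumptions.

```python
class ThroughputMonitor:
    """Track an exponentially weighted moving average (EWMA) of parsing
    throughput and report when the cascade should be re-optimised."""

    def __init__(self, alpha=0.5, threshold=0.2):
        self.alpha = alpha          # hypothetical smoothing factor
        self.threshold = threshold  # 20% deviation, as in the post
        self.ewma = None
        self.baseline = None        # assumed reference: EWMA when cascade chosen

    def observe(self, throughput):
        """Record the throughput of one block (e.g. 100MB); return True if
        the optimiser should be re-run."""
        if self.ewma is None:
            self.ewma = self.baseline = throughput
            return False
        self.ewma = self.alpha * throughput + (1 - self.alpha) * self.ewma
        if abs(self.ewma - self.baseline) / self.baseline > self.threshold:
            self.baseline = self.ewma  # assume re-optimisation resets the reference
            return True
        return False


monitor = ThroughputMonitor()
print([monitor.observe(mb_s) for mb_s in (100.0, 100.0, 50.0, 75.0)])
```

With `alpha=0.5`, a single block at 70% of the baseline moves the average by only 15%, so one slow block alone doesn't trigger re-optimisation; the drop to 50 here pulls the average down 25% and does.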

Experimental results

Sparser is implemented in roughly 4000 lines of C, and supports mapping query predicates for text logs, JSON, Avro, Parquet, and PCAP. The team also integrated Sparser with Spark using the Data Sources API. Sparser is evaluated across a variety of workloads, datasets, and data formats.

Here you can see the end-to-end improvements when processing 68GB of JSON tweets using Spark:

Avro and Parquet formats get a big boost too:

I’m short on space to cover the evaluation in detail, but here are the highlights:

  • With raw filtering, Sparser improves on state-of-the-art JSON parsers by up to 22x. For distributed workloads it improves the end-to-end time by up to 9x.
  • Parsing of binary formats such as Avro and Parquet is accelerated by up to 5x. For queries over unstructured text logs, Sparser reduces the runtime by up to 4x.
  • Sparser selects RF cascades that are within 10% of the global optimum while incurring only a 1.2% runtime overhead.

In a periodic-resampling experiment using just a date-based predicate, resampling and re-optimisation improved throughput by 25x compared to sticking with the initially selected RF cascade for the whole job.

See the blog post from the authors and a link to the code here.

134 days ago
This might be useful in my near future…

Your rapper name is…



Thanks to Meryle for the find!

The post Your rapper name is… appeared first on The Adventures of Accordion Guy in the 21st Century.

149 days ago
lil' pork neck
149 days ago
15 public comments
145 days ago
Lil’ chia pudding 😂
Belgrade, Serbia
148 days ago
lil BLT!
Apex, North Carolina
148 days ago
lil poke
148 days ago
lil peanut!
149 days ago
lil bowla soup
149 days ago
lil kolache
Space City, USA
149 days ago
Lil’ Bear Claw
149 days ago
Lil' Nut Bar
Louisville, Kentucky
149 days ago
Lil' Honey Ham Slider
Louisville, KY
149 days ago
lil' sausage roll
149 days ago
Lil’ Pop

149 days ago
lil' chicken roti
149 days ago
lil tide pod
Victoria, BC
150 days ago
lil' zucchini
Bend, Oregon
150 days ago
lil' Sourdough
Cary, NC
150 days ago
lil' Protein Bar
174 days ago
lil' Cliff Bar
Denver, CO

Null Value


We all learn that open source licenses make open source community possible. And then we learn they’re the one piece of that community we’re not to touch. Flame about them? Sure. Hack on them? Forget about it.

We all learn the origin stories of GNU and Free Software. How hacker spirit and some clever lawyering turned copyright around on itself, and stuck it to the NDA-wielding, binary-distributing Man. And at some point, we all gather that those brash, creative days are over. That we’re best off picking one of the songbook standards—MIT, BSD, GPL for the brave and true—faithfully reproducing onto our code, and praying to the Law Gods for no more than our just share of drama.

The hood of the license machine is welded shut. Standards have come to nest inside. Tooling and Best Practices have rusted it over. The jukebox only plays the hits. Anything else literally does not compute. Mea culpa, mea culpa, mea maxima culpa.

Ossification has coincided with a decline of Free Software spirit. Of course, there is still much great software under the GPLs, and new GPL software every day. But whole sub-industries have gone over nigh on entirely to permissive licenses, often short, crusty ones, and ignored the attribution requirements, to boot. Even GPL people will tell you that AGPL is the real deal, that it squashes a bug in integration with industry reality. Many of the same folks will tell you that, empirically, AGPL is user repellent.

“Don’t write your own license”, they say. And for a thousand good reasons, I usually say so, too. Even to fellow lawyers. Especially to this one.

The net effect has been to drain the verve of license terms as a medium to express and implement community goals, save “trouble us less with law-stuff”. In other words, licenses have devolved to utilitarian tools of hassle reduction. Long permissive licenses harbor all manner of licensee-do lists that we find inconvenient. So we use the shorter ones. Copyleft licenses wax vague and inconvenient. They breed license compatibility issues. So we ditch copyleft. Hither, antilicenses.

When new values and social aims arise—inclusivity, to pick an obvious example, abuzz at the moment—we put them in entirely different files. Privacy, DRM, surveillance, broad patent termination—all relegated to other channels. Attribution, source integrity, taking contributors’ names in vain, project name protection—all in LICENSE, by apparent accident of history.

I lay the main part of the moral evacuation of licensing at copyleft’s feet. The copyleft license authors picked a very hard problem—more noble still for being hard—and plotted a course past many mines nobody knew lay in wait. Teeth rattled.

It’s too easy to pick on the GPLs, which balance the already hard task of getting copyleft right with projecting political statements and pleasing many stakeholders. But even the “corporate” copyleft licenses, which often gained clarity by less stricture and preambulatory warbling, prove hard to interpret and apply. Compatibility issues spring out to snag them. Fuel finds the fire of assignment and contributor licensing kerfuffles.

Copyleft was, at its core, a very, very clever hack. One of the great ones in my profession’s long history. Give a public license, but use conditions to incentivize valued behavior, against the grain of copyright’s prevailing policy view. Brilliant. But about as un-fun and tedious to maintain and apply, long term, as any other hack. Tenuous, unexpected, novel, and weird enough to be indisputably clever. But unexpected behavior from systems not designed with such behavior in mind. And in the end, DRM won. And surveillance. And proprietary platforms. And opaque Software as a Service.

The risk—largely, I think, realized—is generalization from a painful history of implementing Free Software values in license terms via copyleft to a broader, bleaker proposition that licenses aren’t any kind of platform for value implementation. That other than reversing inconvenient legal defaults, license terms aren’t good for anything but dodging inconveniences of law’s own making. That messing with them only breeds life-sucking bother. Case in point, The JSON License.

There are absolutely goals and policies that licenses cannot or should not attempt to ensconce. But licenses aren’t merely a place to register points or set expectations. They’re the interface to the law, which invests coders—making involuntary cameos as copyright holders—with incredible leverage, at an individual level, on problems they otherwise couldn’t budge. There are always limits, but just as open source licenses hobble developers economically, forcing them onto second-rate business plans, they also hobble developers politically, relegating them to softer power for other-than-economic aims they might set out to achieve collectively.

The result is a grotesque extension of industry-friendliness, to an extent I don’t believe the coiners of “Open Source” ever desired or intended. With the exception of GPL “crazies”, whose presence must be tolerated, mostly no thanks to long-serving system software, for-profit users of open source software can grasp and take from a self-selecting heap of raw software material with even less concern for the non-product views and characteristics of open source software developers than employees on payroll. Free—as in beer—software literally falls from the sky. It doesn’t want payroll, insurance, or even, apparently, much respect for its time, previously sunk or presently requested. The give side of this give-take equation suffers much well-financed, lottery-winner-style celebration, to predictable and potentially tragic effect.

One definition of “sustainability” is perpetual harvest at this eat-all-you-can-pick plantation, seeded with proprietary software thrown over the wall, and mulched by a diverse population of code-capable, transient, and short-lived economic microorganisms. That definition also jibes with a very hard technical view of Open Source purpose: if good software keeps coming out, Open Source is working, and casualties don’t count. I don’t resonate with that view, personally, but in any event, my professional obligations run to others who see a different way. Other than natural industry alliance, there’s nothing essential or inevitable about it, as the prime meaning of “Open Source”.

If copyleft was version one of hacker values in legal code, it’s no surprise bugs were found and squashed. It would come as no surprise that finding and implementing community values in legal code might evolve as an art. But that takes writing licenses.

Write them. Crazies needed.


Not not


This is NOT a post about misnegation, a frequent topic at Language Log.  This is a reflection on the sublimity of nonnegation, which is not quite the same as transcendental affirmation.  It is a linguistic and philosophical inquiry on the absence of nothingness.

First comes the linguistics; at the end comes the philosophy.

In Mandarin, we have expressions such as the following, where the bù 不 doesn't seem to make any sense in terms of its usual signification — "not":

suānbuliūliūde 酸不溜溜的 ("sourish; quite sour")

For that matter, considering that suānbuliūliūde 酸不溜溜的 taken all together means "sourish; quite sour", the liūliū 溜溜 (lit., "slippery-slippery") part doesn't make much sense either.  Note that suān 酸 by itself means "sour".  Clearly, suānbuliūliūde 酸不溜溜的 ("sourish; quite sour")* does not mean exactly the same thing as suān 酸 ("sour"), but adds a special nuance.  The question, then, becomes:  what do bù 不 ("not") and liūliū 溜溜 ("slippery-slippery") add to suān 酸 ("sour") that causes it to end up as suānbuliūliūde 酸不溜溜的 with the meaning "sourish; quite sour"?

[*Mentioned in the "metaphor" chapter of Perry Link, An Anatomy of Chinese:  Rhythm, Metaphor, Politics (Cambridge, MA: Harvard University Press, 2013), esp. pp. 191-93; cited here.]

For the moment, I will avoid a direct answer to that question but will observe that this bù 不 and the lǐ 里 / 裡 (lit., "in") of "tǔlǐtǔqì 土里土气 / 土裡土氣" ("countrified; rustic; uncouth; provincial") — discussed here — are what is known as infixes.**  Infixes are used in other languages too, but in Chinese they are more apt to cause confusion for people with compulsively analytical minds because (unless they happen to be written with a mouth radical, which may indicate that they are being used primarily for their sound) such syllables are written with characters that normally convey semantic content or possess grammatical functionality that is irrelevant in these idiomatic expressions.

[**Mentioned briefly in Yuen Ren Chao, A Grammar of Spoken Chinese (Berkeley, Los Angeles, and London:  University of California Press, 1968), p. 257, where he renders suānbuliūliūde 酸不溜溜的 as "good and sour", and he does the same for expressions formed with the -li- infix, e.g., húlihútude 糊哩糊涂的 ("good and muddled").]

As further evidence that the liūliū 溜溜 (lit., "slippery-slippery") part of suānbuliūliūde 酸不溜溜的 is not semantically significant in a direct way, let us consider the variant Sinographic forms of this expression:

suānbuliūliū 酸不溜溜 (lit., "sour-not-slippery-slippery")

suānbuliūdiū 酸不溜丢 (lit., "sour-not-slippery-lose / throw")

suānbuliūqiū 酸不溜秋 (lit., "sour-not-slippery-autumn")

They all mean the same thing:  a more intense version of suān 酸 ("sour").

I asked a number of native speakers what they thought bù 不 is doing in these expressions.  Here are some of the responses I received:

1. It's not a marker for negative here. I don't know why the 不 is used here. I think it just represents a sound. Just a guess.

2. I think "不" here definitely doesn't function as a negative. Actually, It might have no meaning, only as modal particle to intensify the suān 酸 ("sour").

3. You are right, 不 is not a negative here. I think it is a particle for emphasis.

Note that suānbuliūliūde 酸不溜溜的 means the same thing as suānliūliūde 酸溜溜的, without the "bu 不" infix, so that is further proof that the "bu 不" doesn't in any way negate the basic meaning of suān 酸 ("sour").

Some other similar expressions:

huībùliūdiū 灰不溜丢 (lit., "gray not slippery lose / throw") or huībuchūliū 灰不出溜 (lit., "gray not emerge slippery"), a kind of gray color that looks dim; dull grey

hēibulājǐ 黑不拉几 (lit., "black not pull several") or hēibuchūliū 黑不出溜 (lit., "black not emerge slippery") or hēibuliūqiū 黑不溜秋 (lit., "black not slippery autumn"), a kind of dim / dull and dusty black

hǎobùkuàihuó 好不快活 (lit., "good not quick live" –> "very not happy"), "very / so happy" [Google Translate understands this, but Baidu Fanyi and Microsoft (Bing) Translator do not]

hǎobùwěiqu 好不委屈 (lit., "good not entrust injustice" –> "very not wronged"), "[feeling] very wronged / aggrieved / mistreated"

But don't get too confident that you have now mastered the nonnegativity of bù 不, because here's a humdinger for you to mull over for the rest of your life, as I have been pondering this paradox of negativity and positivity for decades:

hǎobùróngyì 好不容易 (lit., "good not allow easy") = hǎoróngyì 好容易 (lit., "good allow easy") = bùróngyì 不容易 ("not easy")!

For example:

Wǒ hǎobù róngyì cái xuéhuì yóuyǒng.


"It was not easy for me to learn how to swim / I spent a lot of time and made great effort to learn how to swim / It was only with great effort that I learned how to swim."

N.B.:  I haven't provided a literal translation of each syllable because you're already familiar enough with the hǎo 好 ("good") and the bù 不 ("not"), and the rest is fairly straightforward.

The previous sentence means the same as this one without the bù 不:

Wǒ hǎo róngyì cái xuéhuì yóuyǒng.


"It was not easy for me to learn how to swim / I spent a lot of time and made great effort to learn how to swim / It was only with great effort that I learned how to swim."

Now, prepare to have your mind completely blown away.

A highly literate native speaker actually sent me this sentence:

Wǒ hǎobù bù róngyì cái xuéhuì yóuyǒng.


"It was not at all easy for me to learn how to swim / I spent a great deal of time and made a tremendous effort to learn how to swim / It was only with very great effort that I learned how to swim."

The second version does sound surpassingly strange, but this construction does occur on the internet:

"我好不不容易" 4,500 ghits


"我好不容易" 486,000 ghits


"我好容易" 426,000 ghits

Although the first iteration about learning to swim with great difficulty, with its two adjacent bù 不 — bùbù 不不 — is genuine (perhaps some sort of brain stutter on the part of the person who sent it; nearly everybody would consider it "incorrect"), I suspect that some young members of the internet generation (conscious of the contorted irony of the hǎobùróngyì 好不容易 [lit., "good not allow easy"] construction meaning the same as hǎoróngyì 好容易 [lit., "good allow easy"] without the bù 不 ["not"] — Chinese people do talk about this; see the first few entries here) may be using it playfully.

I cannot emphasize too strongly that, in daily usage, the sounds of the language are more important than the meanings that are conventionally associated with the characters that are used to write them.  To be a good reader of Chinese, you have to know when to put the surface signification of a character in the back seat and figure out what its sound is doing in a given construction.

Finally, to close this post on infix "bù 不" — "not 'not'", as it were — here is one of my all time favorite Mandarin adjectival expressions:  shǎbùlèngdēngde 傻不愣登的 ("daffy").  I'm not sure that I've written it with the "right" characters, but, forsooth, the only character out of the five that imparts relevant semantic content is the first, shǎ 傻 ("fool[ish]").  (The literal meanings of the characters are:  "stupid / silly / foolish — not — stunned / distracted / stare blankly — ascend / step on — adjectival suffix" [the third character may be tangentially somewhat relevant]).  I forget exactly how I learned this magnificent expression, probably from some old missionary writing, but I acquired it as part of my vocabulary during the first year of Mandarin study, and I've treasured it all the five decades since, just as I've treasured my pet snail Arnold for the past five years.  Come to think of it, they're both in their own way emblems of an essential eternality:  neti neti.

Neti neti, meaning "Not this, not this", is the method of Vedic analysis of negation. It is a keynote of Vedic inquiry. With its aid the Jnani [VHM:  wise or knowledgeable one] negates identification with all things of this world which is not the Atman, in this way he negates the Anatman. Through this gradual process he negates the mind and transcends all worldly experiences that are negated till nothing remains but the Self. He attains union with the Absolute by denying the body, name, form, intellect, senses and all limiting adjuncts and discovers what remains, the true "I" alone. L.C.Beckett in his book, Neti Neti, explains that this expression is an expression of something inexpressible, it expresses the ‘suchness’ (the essence) of that which it refers to when ‘no other definition applies to it’. Neti neti negates all descriptions about the Ultimate Reality but not the Reality itself. Intuitive interpretation of uncertainty principle can be expressed by "Neti neti" that annihilates ego and the world as non-self (Anatman), it annihilates our sense of self altogether.

Source (with slight modifications by VHM)

Not (this) not (this).

[Thanks to Maiheng Dietrich, Fangyi Cheng, Jing Wen, Jinyi Cai, Yixue Yang, and Melvin Lee]


The 5 Filters of the Mass Media Machine


1 public comment
662 days ago
And yet this is about a famous book by a famous author that I'm watching on YouTube.

An annotated digest of the top "Hacker" "News" posts.


  • Functional programmers, realizing that their entire discipline is rendered inconsistent and useless the instant it is faced with herculean tasks such as "I/O" and "users", finally admit for the record that it's better to do literally anything else when these tasks arise. Satisfying terminology like 'free monad' and 'applicative functors' are bandied about as Hackernews tries to decide if you want imperative nougat with functional candy shell, or functional fruit filling with a flaky imperative pastry surrounding it. Nobody stops to wonder if the functional wizardry compiles to imperative code, or whether the processor gives a shit if your source code looks good in LaTeX. One Hackernews admits he doesn't know what these people are jabbering about; all users in agreement are ritually downvoted. In accordance with federal law, someone asks how this compares with Rust.

  • A spammer posts his bullshit, the 21st-century equivalent of motivational speaking, only with fewer ticket sales and more ebook download links. A Hackernews shark attack ensues as everyone realizes it is finally on-topic to desperately plead for any possible scrap of advice on how to actually make money. Not discussed: how to start a startup without ruining anyone else's life.

  • A webshit, based on his hobby project, decides that the entire web advertising market is a lie. He's right, but for the wrong reasons. Hackernews trades tips on convincing themselves their entire industry isn't a sack of bullshit.

  • People hired to look at terrible shit forty hours a week tend to go crazy. Hackernews decides this must be why cops are all assholes and that the solution is more cops. One Hackernews suggests just hiring perverts.

  • The New York Times -- world's leading authority on San Francisco -- tells us that San Francisco is a microcosm of America. Hackernews spends equal time telling each other how to donate money toward fixing problems and telling each other that donating money will not fix any problems. Nobody realizes Hackernews users are the problem, including the New York Times.

  • A leisure studies major vomits a couple thousand words of dime-store evolutionary psychology. Hackernews seizes on the opportunity to delude themselves into believing that their crippling anxiety and ever-increasing depression are what makes them better than you.

  • Hackernews is concerned that stupid poor people might not realize they are less alive if they choose to entertain themselves instead of working ceaselessly unto death. The behavior of children is held up by the childless as an example for us all. Some dipshit thinks running his website is akin to preagricultural survival. Dimly, a few Hackernews users experiment with the idea that money and public acclaim are not the only route to happiness, but this heresy is drowned out by the relentless insistence that being rich is the only way to experience joy.

  • An idiot posts to Medium a rambling narrative regarding the importance of his phone app. Hackernews maintains the only way to be sure your shit is right is to host all of your own communications tools. Google Analytics silently notes which citizens have been contaminated with toxins inimical to surveillance capitalism. The machine sleeps.

Previously, previously, previously, previously.

687 days ago
"Google Analytics silently notes which citizens have been contaminated with toxins inimical to surveillance capitalism. The machine sleeps."
2 public comments
686 days ago
687 days ago
God this is so spot-on