Code Nerd Blog: 2017

Monday, December 18, 2017

The 10 Algorithms Every Programmer Must Know About

In my Code Nerd I wrote that the disjoint sets algorithm is one of the very few algorithms every programmer should know. That got me thinking. Should? What about must? If everyone must know about disjoint sets, what other algorithms must every programmer know about?
I made a “top ten” list of algorithms and data structures every programmer must know about.

The list would be as follows:

Lists, Arrays, Stack. Lists, arrays, and stacks are certainly the most basic data structures, yet, these building blocks can reserve a few surprises. For example, counter-intuitive sentinel nodes are extensively used in the STL to represent the position just passed the last element in a list. This makes, for example, the insertion of an item in the end() position rather efficient. While lists do not allow direct access to an element, arrays do, but at the cost of allocating a known a priori amount of memory. Conversely, inserting an element into a list, given the current position, is constant time, while to insert a new element in an array is more costly, depending on whether or not one wants to preserve ordering. Stacks are a special case of both, one could say, because they behave mostly like lists where operations are allowed only at one end; and like arrays because they are most often implemented in a contiguous span of memory, i.e., an array. As these three are common building blocks of more complex algorithms and data structures, you should really master them.
Trees. Trees, or more exactly search trees whether they be binary, k-ary, splay, AVL or B, are the next data structures on the list, offering $O(\log n)$ operations for searching, inserting, and removing values (in the average case). The most interesting tree varieties try to insure that they also have worse cases in $O(\log n)$ , that is, they don’t go overly deep or take inordinately long to perform their operations. To do so, the prevailing strategy is to balance the tree so that it is of (almost) equal depth everywhere. Some varieties make the tree even shallower by having a branching factor much higher than two, like it is the case with $(a,b)$ trees (also known sometimes as $(n,m)$ trees). Splay trees, on the other hand, unbalances the trees to shift the most oft-accessed item near the root of the tree, while remaining a search tree. Understanding these data structures and what trade-offs are involved will help you choose wisely which will suit your application better. Note that the structures that equalize depth do not only make sure that the worst case is the average case: they implicitly suppose that all items in the collection have an equal probability of being accessed—something quite contrary to the Pareto Principle (or more accurately, maybe, to Zipf’s law).
Sorting and Searching. Data does not always come in the right order, so it is possible that you will have to sort before applying any meaningful algorithm to it. Sorting is a complex topic, but I think that the two algorithms you really should know about are QuickSort and Radix Sort. QuickSort is a comparison-based sort, that is, elements are compared to each other to determine their relative order. Eventually, QuickSort makes enough comparisons ( $O(n \log n)$ for $n$ items, on average, but has a worst case of $O(n^2)$ —which can be avoided, but that’s another story for now) to eventually produce a sorted list of items. The theoretical lower bound for comparison-based sorts is $O(n \log n)$ comparisons, so QuickSort is doing very well on average. The great simplicity of the algorithm makes it very interesting in a very wide range of cases.
Radix sort, on the other hand, an address transformation sort that sorts the items in linear time, that is, in $O(n)$ , making it much faster than QuickSort. It is much faster, but Radix Sort is much simpler when keys are numeric or of fixed width; dealing with variable length keys efficiently makes the algorithm slightly more complex. Read more on Radix Sort here and in this old DDJ article.
Searching on a sorted array can be performed using the basic binary search in $O(\log n)$ time. Creating and searching efficient indexes will also play a major role in managing and searching large data sets.
Priority Queues. Sometimes you don’t really care if a data set is completely sorted or not. You just want to determine rapidly its maximum (or minimum) valued item and remove it from the set. Ideally, determining the maximum item should be an $O(1)$ operation, adding and removing values a $O(\log n)$ operation. Turns out that heaps as implementation of priority queues are pretty efficient. Not only do they have efficient algorithms, they can also be implemented into contiguous arrays, dispensing one from using a pointer-based data structure, saving potentially a lot of memory. There are a lot of application of heaps, which range from scheduling (determining who goes next) to cache management (removing the oldest items from the cache).
Pattern Matching and Parsing. Every single one of us had to write a parser or a filter of some sort. Whether it’s to find data in an insanely large log or to read a simple configuration file. Not always, but very often, the sanest way to go around it is to use regular expressions, either using a library, such as Boost‘s regex, or Python‘s re, or by implementing the regular expression algorithms yourself—which would make it the insane way, I guess. Understanding how they work will greatly help you process and filter insufficiently structured data. Don’t ask why Perl is so popular.
Hashing. Hash functions play a central role in cryptography, error detection, cache management, and efficient look-up. Despite the superficial differences—the applications—all hash functions are closely related; in fact, they differ more in quality than nature. A hash function takes a message and deterministically produces a signature that looks like a pseudo-random number, $n$ bits long, usually with $n$ rather large, say 128 or 256. The harder it is to correlate the hash value with the original data, the safer the hash function is. If it is really hard to find the original data given a hash value, the function can serve as a one way function and be used in cryptographic applications. If the hash function is just good enough, it can serve as an (most probably) unique identifier for data items, which in turn can be used for cache management or hash tables. In either case, you have to understand the mathematics of the probability of collisions (two or more items hashing to the same value) so you can assess what hashing scheme is sufficient for your application. Read about von Mises’ birthday paradox to understand how to compute the probability of collision.
Disjoint Sets. The Disjoint sets data structure is a helper structure that you will need in a wide variety of algorithms, whether graph algorithms or image processing. Disjoint sets serves to represent multiple sets within a single array, each item being the member of one of many sets. Considered like that, this sounds like a rather special case, but in fact, there are a number of applications, not all obvious. Connected components arise in graph algorithms are most efficiently represented using the disjoint set data structure. It can also serve to segment an image, a process also know as computing the connected components or labeling. What’s also very interesting about disjoint sets is just how incredibly efficient operation are: using path compression, operations can be performed in expected constant time! The special case where there are only two possible sets reverts to a simple bitmap, a data structure that is simple to manage and requires comparatively little memory—use of which can be further reduced, with a price, using Bloom filters.
Graph Algorithms and Data Structures. Problems like computing the minimum spanning tree, the determination of the shortest path between two nodes, matching, and the detection of cut vertexes will arise in a number of situation. Google’s PageRank, for example, is a very concrete application of graph algorithms. Often, seemingly unrelated problems can be mapped to graph algorithms for which very efficient algorithms, possibly dynamic programming related, already exist. There is also a large body of literature devoted to the data structures used for graphs, considering every possible special case: sparse, dense, clique-rich, or small world networks, etc.
Dynamic Programming. Closely related to graph algorithms, dynamic programming exploit the fact that the optimal solution to a large problem can be expressed as an optimal combination of sub-problems. Not all problems are amenable to this method, because not every objective function abide to the principle of optimality, but many optimization problems do. Dynamic programming exchanges memory for computation, a generally beneficent trade-off. Memoization could be considered a limited form of dynamic programming where previous evaluations of an expensive function are cached and reused rather than recomputed if asked for again. Memoization greatly reduce computational complexity when used in combination with an optimization (or simply recursive) algorithm. For example, the n-th Fibonacci number can be computed in $O(n)$ time using memoization, while the basic recursive formulation results in no less than $O(F_n) \sim \phi^n$ calls! Here, $\phi$ is Phidias‘s number, the golden ratio.
State Space Search Algorithms. Sometimes the scale of the problem, or the vastness of the state space makes it impossible to represent the problem as a graph. Consider chess as the example par excellence. The exact number of distinct valid states of the game is still debated, but is thought to be in the order of $10^{43}$ , and the number of arcs in the game graph somewhere around $10^{120}$ , as computed Shannon, therefore known as Shannon’s Number. It would be infeasible to search the graph corresponding to all possible games to determine the optimal move given any valid chess position. To deal with this immense state space, one would use one of the many state space search algorithms such as limited depth search, Breadth-first search, or if enough is known about the objective function, an algorithm like A*. State space search is very often used for game artificial intelligence and other optimization problem where the structure is too complex, state space too vast, or of which too little is known to derive a more efficient algorithm.
Genetic algorithms are closely related to stochastic optimisation, where the state space is searched at many different points in parallel. The darwinist metaphor is apt, to a certain point, as “genes” (vectors) are “mutated” (varied randomly) to search the state space. Only the fittest (higher valued vectors, under the objective function) are kept for the next “generation” (iteration) others “die off” (are dropped).

I could have made the list much longer, at least to include topics such as data compression, a personal favorite. I could have included something about, say, control theory, which is also highly useful in the context of industrial informatics, but I also wanted the list to be a Ten Commandments type list (without the guilt and the vengeful God thing), something that would be easy to remember and that would apply to most programmers out there. Of course, you just read those and are thinking “why didn’t talked of XYZ”. Maybe I did miss something important; it may also be that “XYZ” is more of a niche topic than you think. I cannot think of this list as authoritative in any way; yet, it follows more or less closely what you will find a typical “Introduction to Algorithms” book. Either way, let met know.
This post is also in line with the “right toolbox” answer to one of Gnuvince’s commentator. I think that, again, it is not about being a guru in every of the ten topics I presented, but merely proficient, and to develop the reflex to see the similitudes between your current problem and one of the algorithms you already know, even though the mapping may not be exactly self-evident. This is not a “pick n out of m” list however. The more, the better. Make n=m.

3 Best Go Programming Books

Here is the list I've created as an answer on Quora:

1.The Go Programming Language

The most authoritative resource for modern programming is the book The Go Programming Language. It spans 400 pages full of exercises with clear explanations of how programming languages work from the ground up.

You do not need any prior knowledge to work through the lessons in this book. They’re extremely simple to pick up and you can learn so much just from tinkering with Go on your own.

The writing style is accessible even for complete beginners who have never written a line of code in their life. This can work well as a beginner’s book, although it is very technical and goes into a ton of detail.

If you want to learn Go maybe do some background research online first. After that you’ll know if you want to continue learning and if this book would work for you.

Buy on Amazon

2.Go in Practice

Manning’s practical approach to teaching is always refreshing and it’s fantastic to see how they’ve covered the Go language in this book.

Go in Practice works much like a step-by-step learning resource where you’ll study 70 different examples of Go programs and how they work. The authors take you through the language starting with a very simple “hello, world!” application.

From there you’ll delve into the CLI and pick up tips for local scripting through the command line. This includes basic math computations, working with routing, and learning goroutines for concurrency.

If you don’t understand any of these ideas don’t worry! The writing style is very clear and concise so you should have no trouble learning the ropes fast.

And if you can’t pick up a specific topic you can always Google for solutions on the web.

Buy on Amazon

3.Introducing Go: Build Reliable, Scalable Programs

As a general purpose language there’s a lot you can do with Go. It was built to run fast and concurrency + scalability are two crucial subjects to understand.
The book Introducing Go: Build Reliable, Scalable Programs is very short but very sweet. It’s only 120 pages long and it covers a lot of the fundamentals of Go in a pithy writing style.
If you already know some programming in another language this book will be a piece of cake. I think it’s better suited for complete novices who want a simple intro without grabbing a massive title.
These exercises are not as practical for the real world but they do an excellent job showcasing what you can do in Go. By the end of this book you should feel comfortable writing your own applications and expanding your knowledge on your own.

Buy on Amazon

Go really isn’t that hard to pick up but it can take a while to master. Through practice you’ll learn where this language works best and how to best apply it into everyday life.

Sunday, December 17, 2017

5 Books Every Smart Person Should Read

Here is the list:

1. ‘PALE FIRE’ BY VLADIMIR NABOKOV

You might remember Nabokov after reading (and probably re-reading) Lolita, but have you tried the wacky world of Pale Fire? It features a 999-line poem by a recluse named John Shade, and then a that narrative takes a lot of twists and unexpected turns. Reading this book is like solving a riddle — one that will make your brain work harder than it ever has before. And while it might feel like a dense read, remember, it’s actually funny — something to surely bring up in your snobby conversations from here on out.

Buy on Amazon

2. ‘THE TEMPEST’ BY WILLIAMSHAKESPEARE

Listen, Shakespeare is perhaps the greatest writer of all time. Though his very last play, some say The Tempest is his best work. Assuming the dude had been writing for most of his life by the time he debuted this stellar play, it’s no wonder it’s a literary masterpiece. It’s got all the juicy stuff Shakespeare is made of: Corruption, power, innocence, and more. Get moving.

Buy on Amazon

3. ‘MIDDLESEX’ JEFFREYEUGENIDES

Read Middlesex because it’s great, but discuss it in public because you will sound super cool. Why? Not only is this book a thrilling read, it happens to feature one of fiction’s greatest narrators: Cal, a young girl who must uncover a family secret. This epic novel didn’t win the Pulitzer Prize for nothing!

Buy on Amazon

4. ‘AS I LAY DYING’ BYWILLIAM FAULKNER

This Southern Gothic classic novel is not only dark and mysterious, but it broke the mold. A different family member of the Bundren clan — a broken family that must bury their wife and mother in the South — narrates each chapter. Each offers a totally unique voice, some ranging from one line to many pages. Faulkner really changed the game when he decided to play with form in this classic tale. Also, James Franco made it into a movie, but I’m too afraid to see it to tell you if it’s any good.

Buy on Amazon

5. ‘THE COLLECTEDSHORT STORIES’ BY EUDORA WELTY

You probably recognize her name, but have you actually read Eudora Welty yet? This woman is decorated with so many more literary awards than you can imagine. I suggest her short story collection because her talent really shines in the many diverse points of view. She’s got a special voice that deserves to be mentioned on your next blind date.

Buy on Amazon

Top 4 Data Structure and Algorithm Books

Here is the list:

1. Introduction to Algorithms by Thomas H. Cormen

This is one of the best books on Computer Algorithms, it's written by four authors, one of them is Thomas H. Cormen, whose another book Unlocked Algorithm is also the most recommended book to learn algorithms. This book is a lot more comprehensive and covers lots of different algorithm and advanced problem-solving technique e.g. greedy algorithms, dynamic programming, Amortized Analysis, along with elementary data structures like Stacks and Queues, Array and linked list, Hash tables, Tree, and Graph. This book is a unique combination of completeness and rigorous. Another good thing about this book is that algorithms are explained in English, and in pseudo code, which can be understood by even programmers, who has just started programming. It's equally useful for all kinds of programmers e.g. senior, experienced and freshers and in all kind of programming language e.g. Java, C or C++. One of the must-reads books on Algorithms for software programmers and developers.

Buy on Amazon.com

2. Algorithms Unlocked by Thomas Cormen

Algorithms are complex and hard to understand, even for a computer science graduate. Any book, which makes a readable attempt of the algorithm, by associating with real worth things, does a huge favor for its reader. Algorithm Unlocked is one of such book, which presents some of the widely known computer algorithms in the field of finding the shortest path, searching and sorting algorithms, String related algorithms, cryptography and data compression algorithms and some interesting problems. This book is one of the most engaging and readable books on the topic of algorithms and worth of every penny spent on it. Only thing, I found this book lacks is that it only covers Algorithms and not data structures, as it can not be used as a reference book. It's the best to use is as a companion, along with a much more comprehensive book on data structures and algorithms.

Buy on Amazon.com

3. Data Structures and Algorithms Made Easy

This is another data structure and algorithm book, which scores well on readability and practical usefulness. I particularly like its clean, clear and concise explanation; followed by real world use case and then lots of problems to master a particular data structure or algorithm. Only thing, which is not per my convenience was its examples, which are written in C programming language. If you can easily manage that then it's a very good book to learn data structure. In fact, this encouraged me to write my own implementation in Java while going through it, which certainly helps in long run. Remember, getting an objective feel of what is data structure, how does it work is quite different than implementing same data structure by yourself, and then trying different things e.g. finding cycles in linked list or finding middle node of linked list in single pass, is a good exercise after you implemented linked list data structure in Java. Combining back to the book, you can certainly buy this book on readability, clear and concise explanation and, more importantly, nontrivial examples. One of the best book to learn data structure and algorithms for beginners.

Buy on Amazon.com

4. Data Structures and Algorithms in Java

This is another conventional book on Algorithms and Data structures. Two things, which I liked about this books are, examples are given in my favorite Java programming language and you can use this book as a reference for learning data structures like stack, queue, linked list, tree or graph. The good thing about this book is that if not only focuses on data structures and algorithms but also on Java, which makes it an ideal choice for Java programmers. Though it doesn't cover a lot of algorithms, it did cover algorithms related to directed and cyclic graphs, minimum spanning trees and comes up with a lot of exercises for practice. Not the best, but a good book to learn algorithm and data structure in Java.

Buy on Amazon.com

5 Python Books for Beginners

Another blog post about books!

1. Learn Python the hard way.

Learn Code the Hard Way is offered as free PDFs. You can learn Python, Ruby, C, SQL or Regex ‘the hard way’. These books are written by Zed Shaw and will give you more insight in how to get started. Learn Python the Hard Way is written for beginners how know nothing about programming and it will teach you the basics of programming. You will learn how to use the terminal and the text editor. And most important: Read the examples. Type the code precisely (no copying and pasting!). Fix your mistakes. Watch the program run and learn from it. This book is really about learning by doing.

2. Learning Python, 5th Edition

This book offers you a comprehensive learning tool for Python. It covers almost everything you need to know about programming in Python: Types & Operations, Statemens and Syntax, Functions and Generators, Modules and Packages and much more. I really like the first chapter, it’s a Q&A session about Python and why people use Python. If you’re a beginner, this can help you a lot. A nice feature of this book is that every chapter ends with a quiz, so you need to challenge yourself all the time.

3. Think Python. How to Think Like a Computer Scientist

This is a great book if you’re new to programming or coming from another language. Each chapter ends with debugging tips, a glossary and some exercises to keep you going. This really is an hands-on guide that will teach you Python one step at the time.

4. Beginning Python: From Novice to Professional.

This book gives you a great overview and covers the Python fundamentals in-depth. A really like that the first part of the book guides you through a series of tasks including working with files, databases, web programming and testing. The last ten chapters consist of ten programming projects: ‘one must program in order to learn to program’. The projects include: automated text markup, remote text editor, file sharing with XML-RPC, a ‘do-it-yourself’ arcade game and much more.

5. Python Pocket Reference (Pocket Reference (O’Reilly))

If you’re travelling a lot, this book is right for you! A convenient pocket guide worth keeping close at hand. This reference covers both Python 3.4 and 2.7. It’s accurate, easy to understand and very well-written. If you’re on your way and aren’t able to go trough Python’s documentation online, this pocket-book comes in quite handy.

Of course there are many more Python books, so if you have suggestions about which books are really worth mentioning, please let me know!