shellsort (js)

Shellsort is the same as insertion sort. Except that it uses a pass array to decide how many number of steps to jump.

In Insertion Sort, we always use 1 step. When we backtrack, we go back 1 step. But in Shellsort, we have an array, where we decide how many jumps to backtrack. This is called a “gap”. Of course in the very end, we’ll definitely have 1 step at the end.

In our example, we decide to use 5, 3, 1. What this means is that we will sort this via Insertion via passes where we jump 5 steps, then 3 steps, then 1 step. So for step 5, we start at index 5 of the array. Do the Insertion sort, but instead of using 1 step when going backwards, we use 5 steps.

first pass, 5 steps

We start off at index 5. Will process the insertion sort, and when we’re done, we go down the array 1 by 1.

The insertion sort itself will use 5 steps when sorting via

Everything else is processed the same way as Insertion Sort. We get the current valueToInsert. It needs to satisfy a few points:

1) is the index larger or equal to the gap index?
2) is the previous gapped element larger than the valueToInsert?

If so, we assign the previous gapped element to the current one.
If not, we just put the valueToInsert back to the current element.

Given the Insertion Code with the gap element like so:

The gap we use is 5. So we need to do an Insertion pass where the gap is 5.

We start index 5. The value at that index is 11. 11 is the value we need to check and find a proper place to insert.

The Insertion sort happens when we have j point to i.

1) j>= gap, 5 >= 5 ? yes
2) previous gapped element > valueToInsert, 6 > 11 ? NO

dataStore[5] = 11, we move down the array.

shellsort_pass5_1

The next index is 6. dataStore[6] is 8. The previous gapped element is dataStore[6-5] is 0

1) j >= gap, 6 >= 5? yes
2) previous gapped element > valueToInsert, 0 > 8, NO

dataStore[6] = 8, we move down the array

shellsort_pass5_2

The next index is 7. dataStore[7] is 0. Its previous gapped element dataStore[7-5] is 2.

1) j >= gap, 7 >= 5? yes
2) previous gapped element > valueToInsert, 2 > 0 yes

When both requirements is true, we exchange the element values. Basically storing the previous value into the current value. This is the because the previous value is larger than the current value.

We continue on the insertion sort by looking back at -gap elements. Namely, j-= gap. Hence our j is now at index 2, dataStore[2] = 2.
We check the requirements again:

1) j >= gap, 2 >= 5 NO

So we jump out of the loop and assign the valueToInsert at dataStore[2]. This perfectly switches the numbers at dataStore[2], and dataStore[7].

shellsort_pass5_3

We move onto the next index at 8. dataStore[8] is 5.
Previous gapped element is dataStore[8-5] = 9

1) j >= gap, 8 >= 5, yes
2) previous gapped element > valueToInsert, 9 > 5 yes

Hence we need to copy the previous element to the current one. We then backtrack gap steps into index 3

1) j >= gap, 3 >= 5, NO
therefore dataStore[3] = 5.

We keep going down the array.

shellsort_pass5_4

We process the last element at index 9, dataStore[9] is 4.

dataStore[9] is 4
dataStore[9-5] is 3

1) j>= gap, 9 >= 5
2) previous gapped element > valueToInsert, 3 > 4, NO.

So we jump out of the loop
dataStore[9] = 4

next gap pass, 3 steps

We start from beginning to end, and at index gap. We do the insertion sort with gap steps, however, we process each element down the array one by one.

index 3

Hence, our next pass involves gap 3. We start off at index 3.
We see that dataStore[3] is 9. valueToInsert = 9.
Previous gap element is dataStore[0] is 6.

is index 3 >= 3 yes
6 > 9 (valeToinsert) NO, move on to next index

shellsort_pass3_1

index 4

We now on index 4.
dataStore[4] = 3, valueToInsert = 3
dataStore[4-3] = 0

4 >= 3 yes
0 > 3 (valeToInsert) NO, move on to next index

index 5

We now on index 5.
dataStore[5] is 11, valueToInsert = 11
dataStore[5-3] is 2

5 >= 3 yes
2 > 11 (valeToInsert), NO, move on to next index

shellsort_pass3_2

index 6

We now on index 6.
dataStore[6] is 8 valueToInsert = 8
dataStore[6-3] is 9.

6 >= 3 yes
9 > 8(valeToInsert), we then copy over the value from the previous element to the current

Thus, dataStore[6] = 9.

We go back 1 gap step so now j is at index 3.
dataStore[3] is 9
dataStore[3-3] is 6

3 >= 3 yes
6 > 8 (valeToInsert), NO, we skip out. have current index j, dataStore[3] = 8 (valueToInsert)

move forward down the array

shellsort_pass3_3

index 7

We are now at index 7.
dataStore[7] is 0. valueToInsert is 0.
dataStore[7-3] is 3

7>=3 yes
3 > 0 yes

Let’s copy previous element to current so now
dataStore[7] is 3.

j moves a “gap” step back to index 4

dataStore[4] is 3
dataStore[1] is 0

4 >= 3, YES
0 > 0 NO

dataStore[4] is 0.

Move forward

shellsort_pass3_4

index 8

dataStore[8] is 5 (valueToInsert)
dataStore[8-3] is 11

8 >= 3 yes
11 > 5 yes

So we need to copy the previous value over to current.

dataStore[8] = 11

Go back 1 “gap” step to index 5

dataStore[5] is 11
dataStore[5-3] is 2

5 >= 3 yes
2 >= 5 (valueToInsert) NO

dataStore[5] = 5

move on…

shellsort_pass3_5

Index 9

Finally, we are index 9.

dataStore[9] is 4 (valueToInsert)
dataStore[9-3] is 8

9 >= 3 yes
8 > 4 yes

so we copy the previous gap element to current
dataStore[9] = 8

we go previous gap element to index 6

dataStore[6] is 8
dataStore[6-3] is 9

6 >= 3, YES
9 >= 4 (valueToInsert), YES

copy the previous gap element to current
dataStore[6] = 9

go back “gap” elements

dataStore[3] is 9
dataStore[0] is 6

3 >= 3, YES
6 > 4 (valueToInsert) YES

copy the previous gap element to current
dataStore[3] = 6

go back “gap” elements

index is now 0
0 >= 3 NO

so dataStore[0] = 4 (valueToInsert)

Done for gap 3 pass

shellsort_pass3_6

Full Source

Circular List in JS (prototype pattern, constructor pattern)

Prototype Pattern

– Using prototypes to create classes, all instances of that class will the same functions.

– Can’t have private utility functions. All properties and functionalities can be seen by caller.

Constructor Pattern with scoped private variables

By using scoped variables, we can have privacy, and can create get/set functionalities for public access.

However, not very practical as I will need to create get/set functions for all the private variables.
Then, using them in my functions can be awkward as assigning references must be dealt with using set functions.
And being assigned, we’ll have to call the get functions.

Insertion Sort

http://www.geeksforgeeks.org/insertion-sort/

Insertion sort is a simple sorting algorithm that works the way we sort playing cards in our hands

Given, 8 5 3 12 2

For the first pass, we always start at index 1. So we see 5 and store it in variable valueToInsert.

There are 2 conditions we must satisfy:

1) 1 > 0, yes. index > 0
2) 8 > 5, yes. previous element is larger than valueToInsert

Both satisfies so, we copy the previous element (8) to the current location where 5 is.

We move 1 step back and hit index 0.

We evaluate the 2 situations again:

1) 0 > 0, no. We stop.

If we reach index 0, that means we’ve reached the end, and we our stored 5 there.

Essentially what we’re doing is taking the current number and going backwards to see where to insert it. We do this by comparing the previous element and seeing if its bigger than our valueToInsert. If its larger, push it down and we keep going. If its smaller, we stop and we insert our number. If we reach the end at index 0, we insert our number.

insertionsort_ex_a

Now we have 5 8 3 12 2.

We work our next pass which is index 2 with value 3.

We store 3 and start searching for the insertion point. We evaluate the 2 points:

1) is index > 0? 2 > 0, yes.
2) We look at the previous element 8. is 8 > 3? yes, so we move 8 to where 3 is.

Move backwards 1 step.

1) is index > 0? 1 > 0, yes.
2) Then we look at previous element 5, is 5 > 3? yes. so we move 5 to where 8 is.

1) is index > 0? 0 > 0, no. We’ve reached the end. Hence we put 3 at index 0.

This leaves us with 3 5 8 12 2.

insertionsort_ex_b

Then we work on value 12.

1) is index > 0? inner 3 > 0, yes.
2) Previous element 8 > 12? no. We stop, and move on.

Finally, our last value is 2. Apply the same as we did above and we’ll see that 2 will be inserted at the way in the beginning because it is smaller than the other numbers.

insertionsort_ex_c

full source

Time Complexity

The best case input is an array that is already sorted. In this case insertion sort has a linear running time (i.e., O(n)). During each iteration, the first remaining element of the input is only compared with the right-most element of the sorted subsection of the array.

The simplest worst case input is an array sorted in reverse order. The set of all worst case inputs consists of all arrays where each element is the smallest or second-smallest of the elements before it. In these cases every iteration of the inner loop will scan and shift the entire sorted subsection of the array before inserting the next element. This gives insertion sort a quadratic running time (i.e., O(n2)).

The average case is also quadratic, which makes insertion sort impractical for sorting large arrays.

However, insertion sort is one of the fastest algorithms for sorting very small arrays, even faster than quicksort; indeed, good quicksort implementations use insertion sort for arrays smaller than a certain threshold, also when arising as subproblems; the exact threshold must be determined experimentally and depends on the machine, but is commonly around ten.

Selection sort

https://en.wikipedia.org/wiki/Selection_sort

Worst-case performance О(n2) comparisons, О(n) swaps
Best-case performance О(n2) comparisons, О(n) swaps

The idea is that we latch onto the first element and deem it being the minimum.
We then go through the rest of the array, and look for anything that is smaller. If so, we swap. This is 1 pass.

At this point, after the 1st pass, we deem first element as minimum. Then we move on
to the next element and call it the minimum. We do another pass and look in the leftover array
to see if there’s anything smaller than our minimum. If so, swap.

We keep doing this, pulling minimum numbers from the array and pushing it towards the front, until
all the elements are sorted from smallest to largest.

selectionsort-outer-inner

So there’s 2 loops: outer and inner.

Outer simply goes through the array.

The inner goes through the remaining elements AFTER the outer via a loop and compares it to the outer.

if the inner element is smaller than the outer element, then we swap. Essentially what is happening is that we are bringing in the smallest element to the front.

Example

Given an array, we make passes on it where:

– we select the starting value to be the minimum.
– we then make (length of array – 1) passes over it. (through all of our passes, the last element will be automatically sorted, so no need to do anything on it)
– For every pass, we need to do the very important action of pulling the minimum from the array and swapping it with the first value.

Example

…continuing on in the same fashion…

Eventually all the smallest number will be swapped to the front, resulting in the list being sorted.

full source

Stack and Heap

https://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap?rq=1

The stack is the memory set aside as scratch space for a thread of execution.

When a function is called, a block is reserved on the top of the stack for local variables, parameters, and some bookkeeping data.

When that function returns (finishes execution), the block becomes unused (all the variables gets popped) and can be used the next time a function is called.

The stack is always reserved in a LIFO (last in first out) order; the most recently reserved block is always the next block to be freed. This makes it really simple to keep track of the stack; freeing a block from the stack is nothing more than adjusting one pointer.

The heap is memory set aside for dynamic allocation. Unlike the stack, there’s no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time; there are many custom heap allocators available to tune heap performance for different usage patterns.

Each thread gets a stack, while there’s typically only one heap for the application (although it isn’t uncommon to have multiple heaps for different types of allocation).

The stack is attached to a thread, so when the thread exits the stack is reclaimed. The heap is typically allocated at application startup by the runtime, and is reclaimed when the application (technically process) exits.

The size of the stack is set when a thread is created. The size of the heap is set on application startup, but can grow as space is needed (the allocator requests more memory from the operating system).

The stack is faster because the access pattern makes it trivial to allocate and deallocate memory from it (a pointer/integer is simply incremented or decremented), while the heap has much more complex bookkeeping involved in an allocation or deallocation. Also, each byte in the stack tends to be reused very frequently which means it tends to be mapped to the processor’s cache, making it very fast.

Another performance hit for the heap is that the heap, being mostly a global resource, typically has to be multi-threading safe, i.e. each allocation and deallocation needs to be – typically – synchronized with “all” other heap accesses in the program.

Has table – Separate Chaining, Open Addressing.

ref – https://en.wikipedia.org/wiki/Associative_array

The most frequently used general purpose implementation of an associative array is with a hash table: an array combined with a hash function that separates each key into a separate “bucket” of the array. The basic idea behind a hash table is that accessing an element of an array via its index is a simple, constant-time operation. Therefore, the average overhead of an operation for a hash table is only the computation of the key’s hash, combined with accessing the corresponding bucket within the array. As such, hash tables usually perform in O(1) time, and outperform alternatives in most situations.

Hash tables need to be able to handle collisions: when the hash function maps two different keys to the same bucket of the array. The two most widespread approaches to this problem are separate chaining and open addressing.

https://en.wikipedia.org/wiki/Hash_table#Separate_chaining

Algorithm Average Worst Case
Space O(n)[1] O(n)
Search O(1) O(n)
Insert O(1) O(n)
Delete O(1) O(n)

The idea of hashing is to distribute the entries (key/value pairs) across an array of buckets. Given a key, the algorithm computes an index that suggests where the entry can be found:

index = f(key, array_size)

hash = hashfunc(key) // where hash is some number
index = hash % array_size // in order to fit the hash into the array size, it is reduced to an index using modulo operator

In this method, the hash is independent of the array size, and it is then reduced to an index (a number between 0 and array_size − 1) using the modulo operator (%) .

Choosing a hash function

http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_hashtable.aspx

Table size and range finding

The hash functions introduced in The Art of Hashing were designed to return a value in the full unsigned range of an integer. For a 32-bit integer, this means that the hash functions will return a value in the range [0..4,294,967,296). Because it is extremely likely that your table will be smaller than this, it is possible that the hash value may exceed the boundaries of the array.

The solution to this problem is to force the range down so that it fits the table size.

For example, if the table size is 888, and we get 8,403,958, how do we fit this value within the table?

A table size should not be chosen randomly because most of the collision resolution methods require that certain conditions be met for the table size or they will not work correctly. Most of the time, this required size is either a power of two, or a prime number.

Why a power of two? Because we use bitwise operation to get performance benefits. Bitwise operations are done on binary bit combinations and thus, that’s why the table size needs to be in power of 2. A table size of a power of two may be desirable on some implementations where bitwise operations offer performance benefits. The way to force a value into the range of a power of two can be performed quickly with a masking operation.

For example, to force the range of any value into eight bits, you simply use the bitwise AND operation on a mask of 0xff (hexadecimal for 255):

0x8a AND 0xff = 0x8a

from hex to digit:

Note that _ _ _ _ _ _ _ _ = 8 bits = 1 byte.
2 2 2 2 2 2 2 2 = 2^8 = 256 representations (memory addresses)

0x8a -> 1000(8) 1010(a) -> 1000 1010 binary

binary to decimal is 1 * 128 + 0 * 64 + 0 * 32 + 0 * 16 + 1 * 8 + 0 * 4 + 1 * 2 + 0 * 1 = 128 + 10 = 138.

from digit to hex:

Thus, if we were to get a value of 138, we force this value to give us a return index from an array size of 8 bits by using AND operation with 0xff.

Thus, in code, you’ll get the parameter 138. Then you convert it to 1000 1010, then to hex which is 0x8a.
Then you apply the AND bit op which gives you 0x8a AND 0Xff = 0x8a

so 138 in 256. But what if you get some large number like 888?
888 in binary is 0x378

0x378 AND 0x0ff = 78

we append 0 to front of 0xff because we’re dealing with 3 hex places. We apply the AND and get 78. Thus if you get hash value 888, it would give you index 78.

table[hash(key) & 0xff]
This is a fast operation, but it only works with powers of two. If the table size is not a power of two, the remainder of division can be used to force the value into a desired range with the remainder operator. Note that this is slightly different than masking because while the mask was the upper value that you will allow, the divisor must be one larger than the upper value to include it in the range. This operation is also slower in theory than masking (in practice, most compilers will optimize both into the same machine code):

table[hash(key) % 256]
When it comes to hash tables, the most recommended table size is any prime number.

This recommendation is made because hashing in general is misunderstood, and poor hash functions require an extra mixing step of division by a prime to resemble a uniform distribution. (https://cs.stackexchange.com/questions/11029/why-is-it-best-to-use-a-prime-number-as-a-mod-in-a-hashing-function)

Another reason that a prime table size is recommended is because several of the collision resolution methods require it to work. In reality, this is a generalization and is actually false (a power of two with odd step sizes will typically work just as well for most collision resolution strategies), but not many people consider the alternatives and in the world of hash tables, prime rules.

Backstory on Unicode and UTF8

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

All you need to know about memory is that it’s one large array. But one large array containing what? The array contains bytes. In computer organization, people don’t use the term “index” to refer to the array locations. Instead, we use the term “address”.

“address” and “index” mean the same, so if you’re getting confused, just think of “address” as “index”.

Each address stores one element of the memory “array”. Each element is typically one byte.

memory_1_byte_elemnt

We’ve defined a word to mean 32 bits. This is the same as 4 bytes. Integers, single-precision floating point numbers, and MIPS instructions are all 32 bits long. How can we store these values into memory? After all, each memory address can store a single byte, not 4 bytes.

The answer is simple. We split the 32 bit quantity into 4 bytes. For example, suppose we have a 32 bit quantity, written as 0x90AB12CD, which is hexadecimal. Since each hex digit is 4 bits, we need 8 hex digits to represent the 32 bit value.

So, the 4 bytes are: 90, AB, 12, CD where each byte requires 2 hex digits.

It turns out there are two ways to store this in memory.

High Endian

memory_representing_word

Address Value

1000 90
1001 AB
1002 12
1003 CD

Little Endian

In little endian, you store the least significant byte in the smallest address. Here’s how it would look:

Address Value
1000 CD
1001 12
1002 AB
1003 90

some backstory…
The earliest idea for Unicode encoding, which led to the myth about the two bytes, was, hey, let’s just store those numbers in two bytes each. So Hello becomes
0048 0065 006C 006C 006F

However, some computers are big-endian, some little-endian. Thus, there two ways to store Unicode. One way of storing Unicode is big-endian mode. Another is little-endian mode. How do we know which mode to store in? It uses BOM (Unicode Byte Order Mark) to signify this. If your computer is in little-endian mode, and the incoming Unicode word’s BOM signifies big-endian, then you have to swap that big-endian so that its little endian. And vice versa. Also, not every Unicode string in the wild has a byte order mark at the beginning.

So people were forced to come up with the bizarre convention of storing a FE FF (2 bytes) at the beginning of every Unicode string.

For a while it seemed like that might be good enough, but programmers were complaining. “Look at all those zeros!” they said, since they were Americans and they were looking at English text which rarely used code points above U+00FF.

In order to use BOM, it would double the amount of storage it took for strings.Then there was ANSI and DBCS character sets and who’s going to convert them all?
For this reason alone, most people decided to ignore Unicode for several years and in the meantime things got worse.

Thus was invented the brilliant concept of UTF-8. UTF-8 was another system for storing your string of Unicode code points, those magic U+ numbers, in memory using 8 bit bytes. In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, up to 6 bytes.

This has the neat side effect that English text looks exactly the same in UTF-8 as it did in ASCII, so Americans don’t even notice anything wrong. Only the rest of the world has to jump through hoops. Specifically, Hello, which was U+0048 U+0065 U+006C U+006C U+006F, will be stored as 48 65 6C 6C 6F, which, behold! is the same as it was stored in ASCII, and ANSI, and every OEM character set on the planet.

Thus, 3 “popular” ways of encoding Unicode:

UTF-16 (16 bits, high-endian)
UTF-16 (16 bits, low-endian)
UTF-8

Unicode

https://stackoverflow.com/questions/2241348/what-is-unicode-utf-8-utf-16

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Unicode is a character set

Unicode assigns every character in existence a unique number called a code point.

In other words, a letter maps to something called a code point

As Joel would explain it, every platonic letter in every alphabet is assigned a magic number by the Unicode consortium which is written like this: U+0639

This magic number is called a code point.

The U+ means “Unicode”
The numbers are hexadecimal.

The English letter A would be U+0041.

Why is it U+0041?

0041 in hex is 0 * (16^4)^4 + 0 * (16^3) + 4 * (16^2) + 1 * (16^1) = 0 + 0 + 64 + 1 = 65.
65 is represented as A.

Unicode

Unicode was a brave effort to create a single character set that included every reasonable writing system on the planet and some make-believe ones like Klingon, too.

Unicode comprises 1,114,112 code points in the range 0hex to 10FFFFhex. The Unicode code space is divided into seventeen planes (the basic multilingual plane, and 16 supplementary planes), each with 65,536 (= 216) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.

10FFFF (hex)

1 0 15 15 15 15 convert each hex to digit 0-9, A(10) – F(15)

0001 0000 1111 1111 1111 1111 convert digits to binary

1114111 binary to decimal

we also have decimal 0, so total is 1114112 representations of characters

In fact, Unicode has a different way of thinking about characters, and you have to understand the Unicode way of thinking of things or nothing will make sense.

Until now, we’ve assumed that a letter maps to some bits which you can store on disk or in memory:

A -> 0100 0001

In Unicode, a letter maps to something called a code point which is still just a theoretical concept. How that code point is represented in memory or on disk is a whole nuther story.

In Unicode, the letter A is a platonic ideal. It’s just floating in heaven:

A

This platonic A is different than B, and different from a, but the same as A and A and A. The idea that A in a Times New Roman font is the same character as the A in a Helvetica font, but different from “a” in lower case, does not seem very controversial, but in some languages just figuring out what a letter is can cause controversy. Is the German letter ß a real letter or just a fancy way of writing ss? If a letter’s shape changes at the end of the word, is that a different letter? Hebrew says yes, Arabic says no. Anyway, the smart people at the Unicode consortium have been figuring this out for the last decade or so, accompanied by a great deal of highly political debate, and you don’t have to worry about it. They’ve figured it all out already.

Every platonic letter in every alphabet is assigned a magic number by the Unicode consortium which is written like this: U+0639. This magic number is called a code point. The U+ means “Unicode” and the numbers are hexadecimal. U+0639 is the Arabic letter Ain. The English letter A would be U+0041. You can find them all using the charmap utility on Windows 2000/XP or visiting the Unicode web site.

There is no real limit on the number of letters that Unicode can define and in fact they have gone beyond 65,536 so not every unicode letter can really be squeezed into two bytes, but that was a myth anyway.

OK, so say we have a string:

Hello

which, in Unicode, corresponds to these five code points:

U+0048 U+0065 U+006C U+006C U+006F.

Just a bunch of code points. Numbers, really. We haven’t yet said anything about how to store this in memory or represent it in an email message.

Encodings

That’s where encodings come in.

Number Code to Binary

A is given code point U+0041:

0041 in hex is 0 * (16^4)^4 + 0 * (16^3) + 4 * (16^2) + 1 * (16^1) = 0 + 0 + 64 + 1 = 65.
65 is represented as A.

Where code point is any of the numerical values that make up the code space.
Code space 65, is given the decimal value 65, and also the Unicode letter A. Hence, how to represent decimal 65 in binary?

Bi-nary means two.

Binary is represented by bits.
1 Bit binary represents 0 or 1. 2 ^ (1 bit) = 2 combinations
2 bits binary represents 00, 01, 10, 11, 4 combinations. 2 ^ (2 bits) = 4 combinations

Point codes represent text in computers, telecommunications equipment, and other devices.
It maps the character “A” to the number 65

How do we convert this to binary?

Division Way

(the proper way is to divide by 2, and use the remainder as bit)
http://www.electronics-tutorials.ws/binary/bin_2.html

65 / 2 = 32 R1 binary bits: [1]
32 / 2 = 16 R0 binary bits: [1 0 ]
16 / 2 = 8 R0 binary bits: [1 0 0]
8 / 2 = 4 R0 binary bits: [1 0 0 0]
4 / 2 = 2 R0 binary bits: [1 0 0 0 0]
2 / 2 = 1 R0 binary bits: [1 0 0 0 0 0]
1 / 2 = 0 R1 binary bits: [1 0 0 0 0 0 1]

So as you can see the bits are 1 0 0 0 0 0 1
So in 8 bit format we have, 0 1 0 0 0 0 1

That is the binary conversion from decimal point 65.

Layout visual way
First, we must lay out the binary and show what decimal it represents

The 0th bit is represented by 2 ^ 0 = 1
The 1st bit is represented by 2 ^ 1 = 2
The 2nd bit is represented by 2 ^ 2 = 4
The 3rd bit is represented by 2 ^ 3 = 8
The 4th bit is represented by 2 ^ 4 = 16
The 5th bit is represented by 2 ^ 5 = 32
The 6th bit is represented by 2 ^ 6 = 64
…so on.

8 bits
0 0 0 0 0 0 0 0

We lay them out and try to see which of these decimals add up to be 65. The trick is to get the largest number where its less than or equal to 65. In our case, its the 6th bit (64). Thus, at the the 6th bit we mark it 1.

0 1 0 0 0 0 0 0

65 – 64 = 1.

Then we try to find a bit where its less than or equal to 1.

Obviously, it would be perfect at 0th bit.

0 1 0 0 0 0 0 1

64 + 1 = 65 Thus it matches the 65 ASCII code.

65_to_binary_1

We see that at the 2^(6th bit) is 64. Then all we need is 1, where it is at 2^0.
Hence at the binary, we need to mark 1 for the 6th bit and the 0th bit in order to represent that we need the decimal value it represents:
+ 0 * (2 ^ 4)
0 * (2^7) + 1 * (2 ^ 6) + 0 * (2 ^ 5) + 0 * (2 ^ 4) + 0 * (2 ^ 3) + 0 * (2 ^ 2) + 0 * (2 ^ 1) + 1 * (2 ^ 0)

Make sure you use only 0 or 1 because we are dealing with binary.

65_to_binary_2

Finally, at the binary level, simply write the binary needed to represent the numbers:

01000010

65_to_binary_3

2) Binary to Hex

First, a hex has 16 code points: 0-15

where 0 – 9 takes on the numbers 0 – 9

and 10 – 15 takes on A, B, C, D, E, F

Thus a hex bit has 15 combinations of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F

This is represented by 4 binary bits because 2^4 = 16

Hence we need to break our binary bits into hex, or literally into 4 bit binary

Thus 010000010 becomes 0100 0010, where each 4 bit binary represents 1 hex
Let’s convert it to hex

0100 = 0 * (2^3) + 1 * (2^2) + 0 * (2^1) + 0 * (2^0) = 4 hex
0001 = 0 * (2^3) + 0 * (2^2) + 0 * (2^1) + 1 * (2^0) = 1 hex

(if you get 1111 = 16 -> F)

Thus, ASCII character ‘A’, is 65 code point, binary 01000010, or hex 0x41.

And if we are to check in the unicode table, We see that indeed A is mapped to decimal 65, with a hex of 0x41 for UTF8

and a hex of 0x0041 for UTF 16.

https://unicode-table.com/en/

unicode-a

UTF 8

A character in UTF8 can be from 1 to 4 bytes long. UTF-8 can represent any character in the unicode standard. UTF-8 is backwards compatible with ASCII. UTF-9 is preferred encoding for e-mail and web pages.

The first 128 characters of Unicode (which correspond one-to-one with ASCII) are encoded using a single octet with the same binary value as ASCII, making valid ASCII text valid UTF-8-encoded Unicode as well.

Unicode is a character set. UTF-8 is encoding

UTF-8 is defined to encode code points (any of the numerical values that make up the code space that contains a symbol) in one to four bytes.

For example, the first 128 code points are given to ASCII symbols

0 -> 0000
0 -> 0000

7 -> 0111
f -> 1111

0000 0000 (0)
to
0111 1111 (128)

The next 1,920 characters covers the remainder of almost all
Latin alphabets, Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac, Thaana, and N’Ko Combining Diacritical Marks.

0111 1111 (128)
1920 code points for Arabic, Hebrew, most European scripts (most notably excluding Georgian)
[110]11111 [10]111111 (2048)

(2^10) * 1 + (2^9) * 1 + (2^8) * 1 + (2^7) * 1 + (2^6) * 1 + (2^5) * 1 + (2^4) * 1 + (2^3) * 1 + (2^2) * 1 + (2^1) * 1 (2^0) * 1 = 2048

where [110] means 2 bytes total
[10] means to continue

we can continue storing other languages from 2048 and on by using the starting bits to represent how many bytes total,
and for each next byte, [10] to mean it will continue:

i.e If it was to use 3 bytes, it uses 1110 to represent it will use 3 bytes total. For each sequential byte, it would use
10 means to continue in the later bytes
[1110] xxxx [10]xxxxxx [10]xxxxxx

[110]11111 [10]111111 (2048)
for BMP
[1110]xxxx [10]xxxxxx [10]xxxxxx (???)

(2^15) * 1

Therefore,

1st byte: Standard ASCII
2nd bytes: Arabic, Hebrew, most European scripts (most notably excluding Georgian)
3rd bytes: BMP
4th bytes: All Unicode characters

UTF 16

Binary Data and Base64

Binary Data – any data represented in binary form. It is a sequence of bytes. Each byte is 8 bits.

The term “binary file” is often used as a term meaning “non-text file”.

Binary files typically contain bytes that are intended to be interpreted as something other than text characters.

* Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation.

* Base64 is a way to encode binary data into a character set known to pretty much every computer system, in order to transmit the data without loss or modification of the contents itself.

When you have some binary data that you want to ship across a network, you generally don’t do it by just streaming the bits and bytes over the wire in a raw format.

Why?

…because some media are made for streaming text. You never know — some protocols may interpret your binary data as control characters (like a modem), or your binary data could be screwed up because the underlying protocol might think that you’ve entered a special character combination (like how FTP translates line endings).

So to get around this, people encode the binary data into characters. Base64 is one of these types of encodings. Why 64? Because you can generally rely on the same 64 characters being present in many character sets, and you can be reasonably confident that your data’s going to end up on the other side of the wire uncorrupted.

* Some transportation protocols only allow alphanumerical characters to be transmitted. Just imagine a situation where control characters are used to trigger special actions and/or that only supports a limited bit width per character. Base64 transforms any input into an encoding that only uses alphanumeric characters, +, / and the = as a padding character.

Text content

M a n

ASCII

‘M’ 77 (0x4d) ‘a’ 97 (0x61) ‘n’ 110 (0x6e)

Bit pattern

‘M’, 77 = 64 + 0 + 0 + 8 + 4 + 0 + 1
01001101

‘a’, 97 = 64 + 32 + 0 + 0 + 0 + 0 + 1
01100001

‘n’ 110 = 64 + 32 + 0 + 8 + 4 + 2 + 0
01101110

Base 64 means that groups of 6 bits (6 bits have a maximum of 2^6 = 64 different binary values) are converted into individual numbers from left to right (in this case, there are four numbers in a 24-bit string), which are then converted into their corresponding Base64 character values:

01001101 01100001 01101110

becomes

010011 010110 000101 101110
where
010011 is 19
010110 is 22
000101 is 5
101110 is 46

The base 64’s table (as opposed to ASCII table),
converts 0-25 as ‘A’ – ‘Z’,
26-51 as ‘a’ – ‘z’,
and 52-61 as 0 – 9

Hence
Base64-encoded

010011 (19) is T
010110 (22) is W
000101 (5) is F
101110 (46) is u