diff --git a/_posts/2021-12-01-dicom-file-format-basics.md b/_posts/2021-12-01-dicom-file-format-basics.md
deleted file mode 100644
index 5856b63..0000000
--- a/_posts/2021-12-01-dicom-file-format-basics.md
+++ /dev/null
@@ -1,496 +0,0 @@
----
-title: "DICOM File Format Basics"
-page_title: "DICOM File Format Basics"
-excerpt: "Explore the fundamentals of the DICOM file format! This quick
-introduction covers the basics of DICOM's structure, its essential uses, and
-tips for easily navigating its complex and abstract components."
-date: December 1, 2021
-toc: true
-toc_label: "Content"
-toc_sticky: true
-last_modified_at: December 1, 2021
-og_image: /assets/images/posts/dicom-basics/dicom-basics-header.jpg
----
-
-{% include image.html
- src="/assets/images/posts/dicom-basics/dicom-basics-header.jpg"
- alt="dicom-basics-header"
- caption="Image Source"
-%}
-
-I would like to thank my former colleagues for introducing me to the DICOM world.
-It's been a pleasure working with them and tackling many problems that come up
-when one tries to understand all the ins and outs of such a complex standard as DICOM.
-
-> **Disclaimer**: Everything presented here is part of public knowledge and can
-be found in the resource section.
-
-In this post I'll try to present the basics of DICOM in plain language and
-provide resources that helped me along the way. The DICOM Standard can be
-overwhelming at first, but a solid introduction to the basics can flatten the
-learning curve, which is my goal.
-
-I personally know that this post will be of interest to certain people and I'd
-like to encourage them to contribute and share their knowledge.
-
-## Introduction
-
-For some odd reason you stumbled upon the DICOM Standard. Maybe at work, in a
-conversation, or somewhere else, but you decided to google it and found out that
-it stands for: Digital Imaging and Communications in Medicine.
-
-On the Wikipedia page you've read:
-
-> Digital Imaging and Communications in Medicine (DICOM) is the standard for
-the communication and management of medical imaging information and related
-data. DICOM is most commonly used for storing and transmitting medical images
-enabling the integration of medical imaging devices such as scanners, servers,
-workstations, printers, network hardware, and picture archiving and
-communication systems (PACS) from multiple manufacturers.
-
-And of course:
-
-> The standard includes a file format definition and a network communications
-protocol that uses TCP/IP to communicate between systems.
-
-Now you have a pretty good idea that the DICOM Standard defines a communication
-protocol between different medical imaging devices and a file format.
-
-But how is it structured and how does it work?
-Since DICOM is enormous, where should you start?
-
-In my opinion, the best starting point is to understand the file format and how
-it is used to represent medical images among other things. So, let's look at
-some basic building blocks of the DICOM file format.
-
-## DICOM file format
-
-A DICOM file can be used to represent many things: Single/Multi frame Images,
-Structured Reports, Encapsulated PDF storage, Videos etc.
-
-For now we will focus only on images.
-
-To open DICOM files you can use [MicroDicom](https://www.microdicom.com/){:target="_blank"},
-a free DICOM viewer for Windows, or, if you prefer an online viewer with
-some sample files, see: [DICOM Library](https://www.dicomlibrary.com/){:target="_blank"}.
-
-When you open a file you'll see something like this:
-
-
-{% include image.html
- src="/assets/images/posts/dicom-basics/microdicom-preview.jpg"
- alt="microdicom-preview"
- caption="MicroDicom Preview Source"
-%}
-
-At the most basic level you can look at a DICOM file as a file that contains an
-image along with information about it: how, when, and where it was created,
-who it belongs to, the device used for its creation, etc.
-However, just like any other file format (PDF, Excel, Word ...) it has a
-complex internal structure used for storing that information.
-
-### Structure
-
-A DICOM file is composed of a Header and a Data Set:
-
-{% include image.html
- src="/assets/images/posts/dicom-basics/dicom-file-structure.png"
- alt="dicom-file-structure"
- caption="DICOM File Structure Source"
-%}
-
-- Header, also known as DICOM File Meta Information, consists of a 128-byte
-File Preamble, followed by a 4-byte DICOM prefix ("DICM"), followed by the
-File Meta Elements, which include elements such as the TransferSyntaxUID
-(which is very important for understanding the file format).
-
-- Data Set is a collection of Data Elements.
-
-
-### DICOM Element
-
-If you look closely at the [MicroDicom preview image](#microdicom-preview)
-from above, you'll see a list on the right hand side.
-That list shows DICOM attributes.
-
-A DICOM attribute (or Data Element) is a unit for storing information. It has
-a tag and a purpose defined in the DICOM Standard (we will see
-later different types of tags that are not defined there). For example, the list can
-contain the following attributes:
-
-Tag | Tag Description | Value |
------------ | --------------------- | ------------------------- |
-(0002,0010) | Transfer Syntax UID | 1.2.840.10008.1.2.4.91 |
-(0008,0008) | Image Type | ORIGINAL, PRIMARY |
-(0008,0016) | SOP Class UID | 1.2.840.10008.5.1.4.1.1.2 |
-(0008,0060) | Modality | CT |
-(0010,0010) | Patient's Name | VladSiv |
-(0010,0020) | Patient's ID | 0123456789 |
-(0028,0100) | Bits Allocated | 16 |
-(0028,0101) | Bits Stored | 12 |
-... | ... | ... |
------------ | --------------------- | ------------------------- |
-
-> If you are interested, you can explore more attributes by browsing:
-[Registry of DICOM Data Elements](https://dicom.nema.org/medical/Dicom/current/output/chtml/part06/chapter_6.html){:target="_blank"}
-
-Attributes are composed of, at least, three fields:
-
-- **Tag** - identifies the element
-- **Value Length** (VL) - defines the length of the attribute's value
-- **Value Field** (VF) - contains the attribute's data
-
-And for some types of Transfer Syntaxes (we will see later what they are), there
-is another field:
-
-- **Value Representation** (VR) - describes the data type and format of the
-attribute's value
-
-Visually we can represent that as:
-
-{% include image.html
- src="/assets/images/posts/dicom-basics/dicom-element.svg"
- alt="dicom-element"
- caption="DICOM Element Source"
-%}
-
-#### Tags
-
-Every DICOM element has a Tag that uniquely identifies the element and is
-represented as: `(gggg,eeee)`, where `gggg` represents the Group Number and
-`eeee` the Element Number.
-
-Group and Element numbers are 16-bit unsigned integers and are represented in
-hexadecimal notation.
-
-A Group is a collection of elements that are somehow related,
-for example:
-
-| Tag | Tag Description |
-| ------------- | --------------------- |
-| (0010,0010) | Patient's Name |
-| (0010,0020) | Patient ID |
-| (0010,0021) | Issuer of Patient ID |
-| (0010,0022) | Type of Patient ID |
-| (0010,0030) | Patient's Birth Date |
-| (0010,0040) | Patient's Sex |
-| ... | ... |
-| ------------- | --------------------- |
-
-
-> Actually, these attributes belong to an abstract concept called an
-**Information Object Definition** (IOD), and the Patient group attributes belong to the
-Patient **Module** - I'll explain how this works later.
-
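-As a quick illustration, this is how a tag can be built and inspected with the
-pydicom Python library (just one possible tool; any DICOM library exposes the
-same group/element split):
-
-```python
-from pydicom.tag import Tag
-
-# Patient's Name tag (0010,0010), built from its Group and Element numbers
-patient_name = Tag(0x0010, 0x0010)
-
-print(patient_name)               # (0010, 0010)
-print(hex(patient_name.group))    # 0x10
-print(hex(patient_name.element))  # 0x10
-```
-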
-#### Value Length
-
-Depending on VR, VL can be of defined or undefined length. If it's defined then
-it's a 16 or 32-bit unsigned integer containing the explicit length of the
-Value Field as the number of bytes that make up the Value.
-
-#### Value Field
-
-The VF represents an even number of bytes containing the Value of the Data Element.
-The data type of the stored Value depends on the VR, as explained
-above.
-
-However, VF can contain multiple values and that's defined by
-**Value Multiplicity** (VM). For data elements that are defined in the Standard, each
-element has a defined VM and if it is greater than 1, multiple values are delimited
-within the VF.
-
-> To see VMs of tags defined in the DICOM Standard, please see:
-[Registry of DICOM Data Elements](https://dicom.nema.org/medical/Dicom/current/output/chtml/part06/chapter_6.html){:target="_blank"}
-
-#### Value Representation
-
-VR is really important as it defines how the VF will be interpreted. The most
-important thing to remember is that VR can be:
-
-- Explicit - present in the Data Element
-- Implicit - missing from the Data Element
-
-You may be asking now: if it's missing, how do we know how to interpret the VF?
-
-Well, it's defined in the Standard, and when you get the Tag `(gggg,eeee)` you
-know what to expect; that's why it's implicit. If something is not right, and
-for example your application cannot parse the data element, then the data
-element is not encoded in accordance with the Standard.
-
-If the VR is present in the data element, it contains two single-byte characters
-which are always encoded using upper case letters.
-
-The list of possible VRs is quite extensive and details about encoding, character
-repertoire, and length of value can be found in the Standard:
-[Value Representation](https://dicom.nema.org/medical/dicom/current/output/html/part05.html#sect_6.2){:target="_blank"}
-
-To mention some of them:
-
-| VR | Name |
-| ----- | ----------------- |
-| CS | Code String |
-| DS | Decimal String |
-| DT | Date Time |
-| LO | Long String |
-| LT | Long Text |
-| OB | Other Byte |
-| PN | Person Name |
-| SH | Short String |
-| SQ | Sequence of Items |
-| ... | ... |
-| ----- | ----------------- |
-
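-To get a feel for how VRs look in practice, here is a small sketch using the
-pydicom Python library (the file path is just a placeholder): each parsed Data
-Element exposes its tag, VR, and value.
-
-```python
-from pydicom import dcmread
-
-ds = dcmread("path/to/file.dcm")  # placeholder path
-
-elem = ds[0x0010, 0x0010]  # Patient's Name Data Element
-print(elem.tag)    # (0010, 0010)
-print(elem.VR)     # PN
-print(elem.value)  # the stored value
-```
-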
-#### Public and Private
-
-The DICOM Standard allows private data elements which don't belong to the Standard
-Data Elements, i.e. are not defined in the Standard. This allows implementations
-to exchange information that the Standard does not define. For example, private elements
-can be used by different machine manufacturers to specify elements where their
-proprietary data is stored.
-
-However, private data elements have to have the same structure as Standard
-Data Elements i.e. Tag, VL, VF, VR, VM etc.
-
-How do we distinguish between private and public elements?
-
-It's easy: public elements have an even Group number, while private elements
-have an odd Group number.
-
-> Note: Elements with Tags `(0001,xxxx)`, `(0003,xxxx)`, `(0005,xxxx)`,
-`(0007,xxxx)`, and `(FFFF,xxxx)` cannot be used. To learn more about
-implementation of private tags please see: [Private Data Element Tags](https://dicom.nema.org/medical/dicom/current/output/html/part05.html#sect_7.8.1){:target="_blank"}
-
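-Since the check is just the parity of the Group Number, it is easy to automate.
-A small sketch with the pydicom Python library (its `is_private` property does
-exactly this parity check):
-
-```python
-from pydicom.tag import Tag
-
-print(Tag(0x0010, 0x0010).is_private)  # False - standard Patient's Name tag
-print(Tag(0x0011, 0x0010).is_private)  # True  - odd group number, private tag
-```
-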
-#### Type of elements
-
-DICOM attributes may be required in a Data Set (depending on the IOD or SOP
-Class; we will define them later).
-
-Some of the attributes are mandatory, others are mandatory under certain
-conditions and of course, some are completely optional.
-
-There are 5 types:
-
-- **Type 1**: Required data element, it cannot be empty. Absence of a value of
-a Type 1 data element is a protocol violation
-- **Type 1C**: Same as Type 1 but is required under certain conditions
-- **Type 2**: Required data element, but it can be empty if the value is
-unknown. For example, think of the Patient's Name: it is a required element, but the
-actual name, i.e. the value, can be unknown at the moment the scan is performed.
-- **Type 2C**: Same as Type 2 but is required under certain conditions
-- **Type 3**: Optional tags
-
-#### Nested Tags
-
-You can define nested tags in a DICOM Data Set. This is done using the **Sequence
-of Items** (SQ) VR mentioned above, which allows you to define a tag that
-holds a sequence of items, where each item contains a set of Data Elements.
-
-For example, tag: `(0010,1002)` - _Other Patient IDs Sequence_; can contain many
-items that represent Patient ID Data Set. If you are familiar with JSON
-objects, you can look at it like:
-
-```json
-{
- "Other Patient IDs Sequence": [
- {
- "PatientID": "id",
- "Issuer of Patient ID": "issuer",
- "Type of Patient ID": "type",
- "Issuer Of Patient ID Qualifiers Sequence": {
- "Universal Entity ID": "uni",
- "Universal Entity ID Type": "uni_type",
- ...
- }
- },
- {...},
- {...}
- ]
-}
-```
-
-To be more precise about the structure of these items and nested data sets, we
-can depict it as follows:
-
-{% include image.html
- src="/assets/images/posts/dicom-basics/dicom-basics-sq.jpg"
- alt="dicom-sq"
- caption="Sequence structure, Source"
-%}
-
-If you would like to know more about encoding of nested data sets, please see:
-[Nesting of Data Sets](https://dicom.nema.org/medical/dicom/current/output/html/part05.html#sect_7.5){:target="_blank"}
-
-### Transfer Syntax
-
-We now know that we can have different types of tags and value representations,
-and that VRs can be implicit or explicit. When dealing with objects in general
-we have to store them somehow and send them to different applications. Basically,
-everyone should know how to read and use the object. To put it more precisely,
-everyone should be able to serialize and deserialize a DICOM object.
-
-Transfer Syntax does exactly that: it tells you how to read a DICOM object. It
-defines three things:
-
-- Explicit/Implicit VR - If VRs are present in a Data Element or not
-- Big/Little Endian - Byte ordering, see: [Endianness](https://en.wikipedia.org/wiki/Endianness){:target="_blank"}
-- Native/Encapsulated Pixel Data - If pixel data is compressed and what
-compression algorithm is used
-
-However, the Transfer Syntax applies only to the Data Set part of a DICOM file,
-while the File Meta Information always has the same encoding. To
-quote the DICOM Standard:
-
-> Except for the 128 byte preamble and the 4 byte prefix, the File Meta
-Information shall be encoded using the Explicit VR Little Endian Transfer
-Syntax as defined in DICOM PS3.5. The Unknown (UN) Value Representation shall
-not be used in the File Meta Information.
-Ref: [DICOM File Meta Information](https://dicom.nema.org/medical/dicom/current/output/html/part10.html#sect_7.1){:target="_blank"}
-
-So, to read a DICOM file, you have to:
-
-- Skip preamble (Why? See: [Preamble Hack](#preamble-hack))
-- Confirm that it's indeed a DICOM file by reading bytes 128-131, which should
-be "DICM", i.e. the DICOM prefix (don't rely on file extensions; the extension could be
-anything and is not specified in the Standard)
-- Start parsing all `0002` tags with Explicit VR Little Endian
-- Get `(0002,0010)` - TransferSyntaxUID and use it to parse the Data Set (see the sketch below)
-
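-Here is a minimal sketch of those steps in Python, assuming the pydicom library
-and a placeholder file path; pydicom's `read_file_meta_info` takes care of
-parsing the `0002` group with Explicit VR Little Endian:
-
-```python
-from pydicom.filereader import read_file_meta_info
-
-path = "path/to/file.dcm"  # placeholder path
-
-# Skip the 128-byte preamble and check the 4-byte "DICM" prefix
-with open(path, "rb") as f:
-    f.seek(128)
-    if f.read(4) != b"DICM":
-        raise ValueError("Not a DICOM Part 10 file")
-
-# Parse the 0002 group (File Meta Information) and grab the Transfer Syntax UID,
-# which tells us how to parse the rest of the Data Set
-meta = read_file_meta_info(path)
-print(meta.TransferSyntaxUID)
-```
-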
-Transfer Syntaxes are defined by UIDs:
-
-| Transfer Syntax Name | Transfer Syntax UID |
-| ------------------------------------------------------------------------------------------ | ----------------------- |
-| Implicit VR Little Endian: Default Transfer Syntax for DICOM | 1.2.840.10008.1.2 |
-| Explicit VR Little Endian | 1.2.840.10008.1.2.1 |
-| Deflated Explicit VR Little Endian | 1.2.840.10008.1.2.1.99 |
-| JPEG Baseline (Process 1): Default Transfer Syntax for Lossy JPEG 8-bit Image Compression | 1.2.840.10008.1.2.4.50 |
-| JPEG-LS Lossless Image Compression | 1.2.840.10008.1.2.4.80 |
-| ... | ... |
-
-To get the full list of available Transfer Syntax UIDs, please see:
-[Registry of DICOM Unique Identifiers](https://dicom.nema.org/medical/dicom/current/output/chtml/part06/chapter_A.html){:target="_blank"}
-
-### SOP Class
-
-When it comes to information objects in the DICOM Standard, there are a lot of
-abstract definitions, and I just mentioned some of them, like _Information
-Object Definition_ (IOD) and _Module_.
-
-These topics go beyond the basics of the file format but I'll try to give you
-some really rough guidelines on how it works.
-
-If you are familiar with _Object Oriented Programming_ (OOP) you already know
-that we try to model information in object-oriented abstract data models that
-are used to specify information about real world objects. In DICOM such a
-class is represented by an IOD. We can use this class as a template,
-instantiate it with attributes, and that gives us a particular Data Set.
-
-Attributes of an IOD describe a property of a real world object and are
-grouped into _Information Entities_ or _Modules_, depending on whether the IOD is
-_Normalized_ or _Composite_.
-
-If you are new to the DICOM Standard, you may be asking: what on earth am I
-talking about?
-And I get your point: I like thinking about abstract concepts,
-but to really grasp them, they need to be introduced in an understandable way
-that showcases a real world application.
-
-If you open the [Dicom Standard Browser](https://dicom.innolitics.com/ciods){:target="_blank"},
-you'll see a list of Composite IODs (CIODs), and there are many of them, such as:
-CR Image, CT Image, MR Image, US Image, Encapsulated PDF, etc. When you open
-one of them you'll see Modules, which can be _Mandatory_ (M) or _User Optional_
-(U). These CIODs actually represent templates which are instantiated from the
-abstract classes we mentioned.
-
-All DICOM objects have to include a SOP Common Module; likewise, if a DICOM
-object represents an image, it should include an Image Module. Other main modules
-are: Patient, Study, and Series. Additionally, there are specific modules: if
-we define a DICOM object as a CR Image, we must include a CR Image Module, and so on...
-
-**Service-Object Pair Class** (SOP Class) - contains the rules and semantics
-that may restrict the use of the service in the **DICOM Message Service Element**
-(DIMSE) Service Group and/or the attributes of the IOD. This basically means that
-the SOP Class of a DICOM object specifies the mandatory and
-optional modules of its IOD.
-
-> DIMSE is connected to the part of DICOM Standard that deals with the protocol.
-This is not in the scope of this basic introduction, but if you are interested
-check out: [Dicom Part 7 - Message Exchange](https://dicom.nema.org/medical/dicom/current/output/html/part07.html){:target="_blank"}
-
-The SOP Class is identified by the SOPClassUID, which is always present in a DICOM file in the
-`(0008,0016)` Data Element. Let's look at some examples:
-
-| SOP name | SOPClassUID |
-| ----------------------------------------- | --------------------------------- |
-| CR Image Storage | 1.2.840.10008.5.1.4.1.1.1 |
-| CT Image Storage | 1.2.840.10008.5.1.4.1.1.2 |
-| NM Image Storage | 1.2.840.10008.5.1.4.1.1.20 |
-| MR Image Storage | 1.2.840.10008.5.1.4.1.1.4 |
-| Encapsulated PDF Storage | 1.2.840.10008.5.1.4.1.1.104.1 |
-| Ultrasound Image Storage | 1.2.840.10008.5.1.4.1.1.6.1 |
-| Video Photographic Image Storage | 1.2.840.10008.5.1.4.1.1.77.1.4.1 |
-| ... | ... |
-
-If you are interested in more SOP Class UIDs, please see:
-[Registry of DICOM Unique Identifiers](https://dicom.nema.org/medical/dicom/current/output/chtml/part06/chapter_A.html){:target="_blank"}
-
-Now that we understand the SOP Class, it's important to realize that when
-we create a DICOM file, this file is an instance of the SOP Class. That's why
-we have the **SOPInstanceUID** Data Element, whose value is globally unique
-for each DICOM file.
-
-Of course, there are other instances, for example: **StudyInstanceUID** -
-uniquely identifies a study, which can contain many series that are identified
-using **SeriesInstanceUID** and so on...
-
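-To make this concrete, here is a small pydicom sketch (placeholder path) that
-reads these identifiers from a file:
-
-```python
-from pydicom import dcmread
-
-ds = dcmread("path/to/file.dcm")  # placeholder path
-
-print(ds.SOPClassUID)        # e.g. 1.2.840.10008.5.1.4.1.1.2 for CT Image Storage
-print(ds.SOPInstanceUID)     # globally unique for this object
-print(ds.StudyInstanceUID)   # shared by all objects in the same study
-print(ds.SeriesInstanceUID)  # shared by all objects in the same series
-```
-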
-### Preamble Hack
-
-In 2019, a new attack surfaced that used DICOM files to carry malware alongside
-patient data. It used the Preamble to embed malware into DICOM files and
-exploit flaws in the design of DICOM.
-
-We know that the Preamble is a part of the Header and that it represents a 128-byte
-section. The purpose of this section is to allow applications to use it for
-specific implementations. For example, it could be used to contain information
-enabling a multi-media application to randomly access images stored in a DICOM
-Data Set, or for any other specific implementation.
-
-The important thing here is that the DICOM Standard does not require any
-structure and does not impose any requirements on the data inserted into the
-Preamble. Basically, we can insert whatever we want. This allows attackers to
-masquerade an executable file as a DICOM image, which triggers execution
-of the malware when the file is run.
-
-If you'd like to find out more about this hack, please read:
-[HIPAA-Protected Malware? Exploiting DICOM Flaw to Embed Malware in CT/MRI Imagery](https://researchcylera.wpcomstaging.com/2019/04/16/pe-dicom-medical-malware/){:target="_blank"}
-
-## Final Words
-
-If you are just starting out with the DICOM Standard, it can be daunting to
-look at the
-[DICOM Standard](https://www.dicomstandard.org/current){:target="_blank"}:
-22 parts where, for example, part 3 alone has 1802 pages in PDF format.
-Just skimming through the whole Standard can take a while :D
-
-The aim of this article is to introduce the basic concepts, related to the
-file format, in a clear and concise manner. Grasping the basics
-will give you a good starting point for exploring the whole Standard.
-
-I hope that this article gets you interested in the DICOM world and
-encourages you to dive deeper and research referenced material.
-
-If you have any questions or suggestions, please reach out, I'm always
-available.
-
-## Resources
-
-- [DICOM Standard](https://www.dicomstandard.org/current){:target="_blank"} - Online DICOM
-Standard, Current Edition
-- [DICOM Standard Browser](https://dicom.innolitics.com/ciods){:target="_blank"} - Standalone
-website that offers a quick and really pleasant overview of DICOM attributes
-- [DICOM is Easy](https://dicomiseasy.blogspot.com/){:target="_blank"} - Great blog about
-software programming for medical applications
-- DICOM sample files:
- - [RuboMedical](https://www.rubomedical.com/dicom_files/){:target="_blank"}
- - [Osirix Dicom Viewer](https://www.osirix-viewer.com/resources/dicom-image-library/){:target="_blank"}
diff --git a/_posts/2022-01-01-dicom-file-processing.md b/_posts/2022-01-01-dicom-file-processing.md
deleted file mode 100644
index e7291c8..0000000
--- a/_posts/2022-01-01-dicom-file-processing.md
+++ /dev/null
@@ -1,769 +0,0 @@
----
-title: "DICOM File Processing"
-page_title: "DICOM File Processing"
-excerpt: "Discover how to handle and process DICOM files. Explore popular free
-and open-source libraries that can help you develop applications for efficient
-DICOM processing. These tools and libraries make managing medical images much
-easier and straightforward."
-date: January 1, 2022
-toc: true
-toc_label: "Content"
-toc_sticky: true
-last_modified_at: January 1, 2022
-og_image: /assets/images/posts/dicom-playground/dicom-playground.jpg
----
-
-{% include image.html
- src="/assets/images/posts/dicom-playground/dicom-playground.jpg"
- alt="dicom-basics-header"
- caption="Image Source"
-%}
-
-We'll explore some open source libraries in different programming languages and
-how you can use them to process DICOM files. We'll cover the basics of removing,
-modifying, adding data elements, changing compression, creating DICOM
-files for testing, and masking parts of images for de-identification purposes.
-
-If you are new to the DICOM Standard and you are not sure how the DICOM file
-format works, please read
-[DICOM Basics]({% post_url 2021-12-01-dicom-file-format-basics %}) first.
-
-> **Disclaimer**: Everything presented here is part of public knowledge and can
-be found in referenced material.
-
-## Open Source Libraries
-
-The following are some of the popular open source libraries for processing DICOM
-files:
-
-[PixelMed](https://www.pixelmed.com/){:target="_blank"} - a stand-alone Java
-DICOM toolkit that implements code for
-reading and creating DICOM data, DICOM network and file support, support for
-display of images, reports, and much more
-- [pydicom](https://github.com/pydicom/pydicom){:target="_blank"} - Pure Python
-package for working with DICOM files. It lets you read, modify and write
-DICOM data in an easy "pythonic" way.
-- [Grassroots DICOM (GDCM)](https://sourceforge.net/projects/gdcm/){:target="_blank"} -
-Cross-platform library written in C++ for DICOM medical files. It is
-automatically wrapped to Python/C#/Java and PHP which allows you to use the
-language you are familiar with and to integrate it with other applications.
-- [dicomParser](https://github.com/cornerstonejs/dicomParser){:target="_blank"} -
-Lightweight library for parsing DICOM in modern HTML5 based web browsers.
-dicomParser is fast, easy to use and has no required external dependencies.
-[DCMTK](https://dicom.offis.de/dcmtk.php.en){:target="_blank"} - a
-collection of libraries and applications implementing large parts of the DICOM
-standard. It includes software for examining, constructing and converting
-DICOM image files, handling offline media, sending and receiving images over
-a network connection, as well as demonstrative image storage and worklist
-servers. DCMTK is written in a mixture of ANSI C and C++.
-
-The choice depends on your use case and, of course, on the language you are
-comfortable with.
-
-These libraries implement most of the things from the DICOM Standard and there
-is no way I can cover all of them in this article. Therefore, we will focus only
-on PixelMed, pydicom, and some of the GDCM tools, switching between them to implement
-different functionalities.
-
-## Setup
-
-### PixelMed
-
-PixelMed is written in Java, so you'll need Java 1.8 or higher. After you
-install Java, go to [PixelMed Directory Tree](https://www.dclunie.com/pixelmed/software/index.html){:target="_blank"},
-select the current edition, and download the `pixelmed.jar` (or use Maven/Gradle
-if you are familiar with them).
-
-If you are not sure how to use this jar, I suggest you download
-[Eclipse IDE](https://www.eclipse.org/downloads/){:target="_blank"},
-create a Java project and add `pixelmed.jar` to the project, see:
-[How to import a jar in Eclipse](https://stackoverflow.com/questions/3280353/how-to-import-a-jar-in-eclipse){:target="_blank"}.
-
-PixelMed's API documentation can be found at [PixelMed JavaDocs](https://www.dclunie.com/pixelmed/software/javadoc/index.html){:target="_blank"}.
-
-### GDCM
-
-Source code: [Grassroots DICOM](https://sourceforge.net/projects/gdcm/){:target="_blank"}.
-
-There are premade tools which you can use:
-- **gdcmdump** - dumps a DICOM file, it will display the structure and values
-contained in the specified DICOM file.
-- **gdcmanon** - tool to anonymize a DICOM file.
-- **gdcmdiff** - dumps the differences between two DICOM files
-- **gdcmconv** - tool to convert a DICOM file to another DICOM encoding, etc.
-
-To use them you can either compile the source or download the binaries from
-[GDCM Releases](https://github.com/malaterre/GDCM/releases){:target="_blank"}.
-
-> If you are on Linux, you'll have to add the gdcm `lib` folder to the
-`/etc/ld.so.conf` file or create a new `.conf` file in the `/etc/ld.so.conf.d` folder. Then
-run `sudo ldconfig`. In order to have the gdcm applications available in the
-terminal as commands, you'll have to add the `bin` folder to `$PATH`
-(globally or in `~/.bash_profile`).
-
-### pydicom
-
-This library requires python >= 3.6.1. I suggest you set up a virtual
-environment where you can install packages using a dependency manager ([poetry](https://python-poetry.org/){:target="_blank"},
-[pipenv](https://pipenv.pypa.io/en/latest/){:target="_blank"},
-or any other).
-
-To get familiar with the library, please see:
-[pydicom documentation](https://pydicom.github.io/pydicom/stable/){:target="_blank"}.
-
-## DICOM
-
-### Exploring Structure
-
-#### GDCM
-
-To explore the structure of a DICOM file, we can use `gdcmdump`:
-
-```bash
-gdcmdump <dicom-file>
-```
-
-This will give us an output like:
-
-
-
-```text
-# Dicom-File-Format
-
-# Dicom-Meta-Information-Header
-# Used TransferSyntax:
-(0002,0000) UL 194 # 4,1 File Meta Information Group Length
-(0002,0001) OB 00\01 # 2,1 File Meta Information Version
-(0002,0002) UI [1.2.840.10008.5.1.4.1.1.12.2] # 28,1 Media Storage SOP Class UID
-(0002,0003) UI [1.3.6.1.4.1.5962.1.1.0.0.0.1168612284.20369.0.3] # 48,1 Media Storage SOP Instance UID
-(0002,0010) UI [1.2.840.10008.1.2.1] # 20,1 Transfer Syntax UID
-(0002,0012) UI [1.3.6.1.4.1.5962.2] # 18,1 Implementation Class UID
-(0002,0013) SH [DCTOOL100 ] # 10,1 Implementation Version Name
-(0002,0016) AE [CLUNIE1 ] # 8,1 Source Application Entity Title
-
-# Dicom-Data-Set
-# Used TransferSyntax: 1.2.840.10008.1.2.1
-(0008,0005) CS [ISO_IR 100] # 10,1-n Specific Character Set
-(0008,0008) CS [ORIGINAL\PRIMARY\SINGLE PLANE ] # 30,2-n Image Type
-(0008,0012) DA [20070112] # 8,1 Instance Creation Date
-(0008,0013) TM [093126] # 6,1 Instance Creation Time
-(0008,0014) UI [1.3.6.1.4.1.5962.3] # 18,1 Instance Creator UID
-(0008,0016) UI [1.2.840.10008.5.1.4.1.1.12.2] # 28,1 SOP Class UID
-(0008,0018) UI [1.3.6.1.4.1.5962.1.1.0.0.0.1168612284.20369.0.3] # 48,1 SOP Instance UID
-...
-```
-
-This tool offers a quick and easy way to explore DICOM files and debug
-applications that process DICOM files.
-
-Looking at the output we can see the _Dicom-Meta-Information-Header_ and
-_Dicom-Data-Set_. Additionally, for each Data Element, we have a Tag
-`(gggg,eeee)` then a VR, followed by a Value, and the information after `#`
-represents: VL, VM, and Tag Name.
-
-`gdcmdump` comes with a lot of options that you can use to generate an output,
-for more information please see: [gdcmdump](http://gdcm.sourceforge.net/html/gdcmdump.html){:target="_blank"}
-
-#### PixelMed
-
-The `AttributeList` class is a class in the PixelMed that maintains a list of
-individual DICOM attributes. It could be used to get the structure of a file,
-modify it, and save it.
-
-Using `.read(java.lang.String name)` we can read the tags:
-
-```java
-package pixelmed_demo;
-
-import java.io.IOException;
-import com.pixelmed.dicom.AttributeList;
-import com.pixelmed.dicom.DicomException;
-
-public class demo_main {
-
- public static void main(String[] args) {
- String filepath = args[0];
- AttributeList attList = new AttributeList();
- try {
- attList.read(filepath);
- System.out.println(attList.toString());
- } catch (IOException | DicomException e) {
- System.out.println("Oops! Error: " + e.getMessage());
- }
- }
-}
-```
-
-Passing the same DICOM file we used in the `gdcmdump` example as an argument,
-we get:
-
-```
-(0x0002,0x0000) FileMetaInformationGroupLength VR=<UL> VL=<0x4> [0xc2]
-(0x0002,0x0001) FileMetaInformationVersion VR=<OB> VL=<0x2> []
-(0x0002,0x0002) MediaStorageSOPClassUID VR=<UI> VL=<0x1c> <1.2.840.10008.5.1.4.1.1.12.2>
-(0x0002,0x0003) MediaStorageSOPInstanceUID VR=<UI> VL=<0x30> <1.3.6.1.4.1.5962.1.1.0.0.0.1168612284.20369.0.3>
-(0x0002,0x0010) TransferSyntaxUID VR=<UI> VL=<0x14> <1.2.840.10008.1.2.1>
-(0x0002,0x0012) ImplementationClassUID VR=<UI> VL=<0x12> <1.3.6.1.4.1.5962.2>
-(0x0002,0x0013) ImplementationVersionName VR=<SH> VL=<0xa> <DCTOOL100 >
-(0x0002,0x0016) SourceApplicationEntityTitle VR=<AE> VL=<0x8> <CLUNIE1 >
-(0x0008,0x0005) SpecificCharacterSet VR=<CS> VL=<0xa> <ISO_IR 100>
-(0x0008,0x0008) ImageType VR=<CS> VL=<0x1e> <ORIGINAL\PRIMARY\SINGLE PLANE >
-(0x0008,0x0012) InstanceCreationDate VR=<DA> VL=<0x8> <20070112>
-(0x0008,0x0013) InstanceCreationTime VR=<TM> VL=<0x6> <093126>
-(0x0008,0x0014) InstanceCreatorUID VR=<UI> VL=<0x12> <1.3.6.1.4.1.5962.3>
-(0x0008,0x0016) SOPClassUID VR=<UI> VL=<0x1c> <1.2.840.10008.5.1.4.1.1.12.2>
-(0x0008,0x0018) SOPInstanceUID VR=<UI> VL=<0x30> <1.3.6.1.4.1.5962.1.1.0.0.0.1168612284.20369.0.3>
-...
-```
-
-which gives the same output as `gdcmdump` but in a different format.
-
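-For completeness, a similar dump can be produced with pydicom: printing a
-dataset lists its elements with tag, name, VR, and value (the path is a
-placeholder).
-
-```python
-from pydicom import dcmread
-
-ds = dcmread("path/to/file.dcm")  # placeholder path
-
-print(ds.file_meta)  # the File Meta Information (group 0002) elements
-print(ds)            # the Data Set elements
-```
-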
-### Remove Data Element
-
-Let's try to remove the _InstanceCreatorUID_ `(0008,0014)` from the Data Set and
-save the DICOM object as a new file.
-
-#### PixelMed
-
-The `AttributeList` class has a lot of options for manipulating Data Sets:
-we can remove a whole group, all private tags, a specific tag, etc.
-
-```java
-package pixelmed_demo;
-
-import java.io.IOException;
-import com.pixelmed.dicom.AttributeList;
-import com.pixelmed.dicom.AttributeTag;
-import com.pixelmed.dicom.Attribute;
-import com.pixelmed.dicom.DicomException;
-
-public class demo_main {
-
- public static void main(String[] args) {
- String filepath = args[0];
- AttributeList attList = new AttributeList();
- AttributeTag instanceCreatorTag = new AttributeTag(0x0008, 0x0014);
- AttributeTag transferSyntaxTag = new AttributeTag(0x0002, 0x0010);
- try {
- // Read DICOM
- attList.read(filepath);
-
- // Get TransferSyntaxUID
- Attribute transferSyntaxAtt = attList.get(transferSyntaxTag);
- String transferSyntaxUID = transferSyntaxAtt.getSingleStringValueOrEmptyString();
-
- // Remove Tag
- attList.remove(instanceCreatorTag);
-
- // Write DICOM
- attList.write("test.dcm", transferSyntaxUID, true, true);
- } catch (IOException | DicomException e) {
- System.out.println("Oops! Error: " + e.getMessage());
- }
- }
-}
-```
-
-This will output a new DICOM file `test.dcm` which doesn't contain the
-_InstanceCreatorUID_.
-
-To confirm this we can run **gdcmdiff** which dumps the difference between two
-DICOM files:
-
-```bash
-gdcmdiff <file-1.dcm> <file-2.dcm>
-```
-
-The output is:
-
-```
-(0008,0014) UI [only file 2] [1.3.6.1.4.1.5962.3] # Instance Creator UID
- -------------
-```
-
-as expected.
-
-You may notice that as we deal with many attributes, defining a specific tag as:
-
-```java
-AttributeTag transferSyntaxTag = new AttributeTag(0x0002, 0x0010);
-```
-
-becomes tedious. Fortunately, there are better ways. One of them is to use the
-`TagFromName` which contains constants that map names to tags, and the other
-is to use the `DicomDictionary` to do a lookup:
-
-```java
-// Using TagFromName
-AttributeTag transferSyntaxTag = TagFromName.TransferSyntaxUID;
-
-// Using DicomDictionary
-AttributeTag transferSyntaxTag = DicomDictionary.StandardDictionary.getTagFromName("TransferSyntaxUID");
-```
-
-#### pydicom
-
-Let's do the same using pydicom. To do this, we will use `dcmread` to
-read a DICOM file, which returns a `FileDataset` instance that we can edit and
-then save.
-
-```python
-from pydicom import dcmread
-
-if __name__ == "__main__":
-    with open("path/to/input.dcm", "rb") as f_in:
- ds = dcmread(f_in)
- del ds.InstanceCreatorUID
- ds.save_as("test.dcm")
-```
-
-### Modify/Add Data Element
-
-#### PixelMed
-
-To modify/add a Data Element in the Data Set, we create an `AttributeList`
-instance and an `Attribute` instance, then put the attribute into the list:
-
-```java
-package pixelmed_demo;
-
-import java.io.IOException;
-import com.pixelmed.dicom.AttributeList;
-import com.pixelmed.dicom.AttributeTag;
-import com.pixelmed.dicom.Attribute;
-import com.pixelmed.dicom.DicomException;
-import com.pixelmed.dicom.TagFromName;
-import com.pixelmed.dicom.PersonNameAttribute;
-import com.pixelmed.dicom.CodeStringAttribute;
-
-public class demo_main {
-
- public static void main(String[] args) {
- String filepath = args[0];
- AttributeList attList = new AttributeList();
- try {
- // Read DICOM
- attList.read(filepath);
-
- // Get TransferSyntaxUID
- Attribute transferSyntaxAtt = attList.get(TagFromName.TransferSyntaxUID);
- String transferSyntaxUID = transferSyntaxAtt.getSingleStringValueOrEmptyString();
-
- // Modify Existing Tag
- Attribute patientName = new PersonNameAttribute(TagFromName.PatientName);
- patientName.addValue("TestName");
- attList.put(TagFromName.PatientName, patientName);
-
- // Add New Tag
- Attribute newAtt = new CodeStringAttribute(new AttributeTag(0x0011, 0x0010));
- newAtt.addValue("SomeRandomString");
- attList.put(newAtt);
-
- // Write DICOM
- attList.write("test.dcm", transferSyntaxUID, true, true);
- } catch (IOException | DicomException e) {
- System.out.println("Oops! Error: " + e.getMessage());
- }
- }
-}
-```
-
-This will create a new DICOM file with a modified _PatientName_ tag and a new tag
-`(0011,0010)`. If we use **gdcmdiff** to check the differences between the old
-and the new file, we get exactly what we expect:
-
-```text
-(0010,0010) PN [from file 1] [TestName] # Patient's Name
-(0010,0010) PN [from file 2] [Test^FluroWithDisplayShutter] # Patient's Name
- -------------
-(0011,0010) CS [only file 1] [SomeRandomString] # Private Creator
- -------------
-```
-
-#### pydicom
-
-To do the same in python:
-
-```python
-from pydicom import dcmread
-
-if __name__ == "__main__":
-    with open("path/to/input.dcm", "rb") as f_in:
- ds = dcmread(f_in)
-
- # Modify Existing Tag
- ds.PatientName = "TestName"
-
- # Add New Tag
- ds.add_new([0x0011, 0x0010], "CS", "SomeRandomString")
-
- # Write DICOM
- ds.save_as("test.dcm")
-```
-
-which gives us the same result.
-
-If we want to add a tag from the DICOM Standard but we are not sure about its
-VR, we can use the `dictionary_VR`:
-
-```python
->>> from pydicom.datadict import dictionary_VR
->>> dictionary_VR([0x0028, 0x1050])
-'DS'
-```
-
-### Add Nested Data Element
-
-Let's add a private nested data element that contains two items which contain
-the same private attributes but with different values. The process is the same
-for the tags from the DICOM Standard.
-
-#### PixelMed
-
-To add a nested tag using PixelMed, we have to define a `SequenceAttribute`.
-This attribute will contain Sequence Items which we create using the
-`AttributeList`. After adding Data Elements to the `AttributeList` we add the list
-to the `SequenceAttribute` using `addItem(AttributeList item)`:
-
-```java
-package pixelmed_demo;
-
-import java.io.IOException;
-import com.pixelmed.dicom.AttributeList;
-import com.pixelmed.dicom.AttributeTag;
-import com.pixelmed.dicom.Attribute;
-import com.pixelmed.dicom.DicomException;
-import com.pixelmed.dicom.TagFromName;
-import com.pixelmed.dicom.SequenceAttribute;
-import com.pixelmed.dicom.CodeStringAttribute;
-import com.pixelmed.dicom.DateAttribute;
-
-public class demo_main {
-
- public static void main(String[] args) {
- String filepath = args[0];
- AttributeList attList = new AttributeList();
- try {
- // Read DICOM
- attList.read(filepath);
-
- // Get TransferSyntaxUID
- Attribute transferSyntaxAtt = attList.get(TagFromName.TransferSyntaxUID);
- String transferSyntaxUID = transferSyntaxAtt.getSingleStringValueOrEmptyString();
-
- // Sequence Attribute
- SequenceAttribute seq = new SequenceAttribute(new AttributeTag(0x0011, 0x0010));
-
- // Sequence Item 1
- AttributeList seqItemOne = new AttributeList();
- Attribute attSeqOneOne = new CodeStringAttribute(new AttributeTag(0x0011, 0x0100));
- attSeqOneOne.addValue("Sequence One Value");
- Attribute attSeqOneTwo = new DateAttribute(new AttributeTag(0x0011, 0x0102));
- attSeqOneTwo.addValue("2022-01-01");
- seqItemOne.put(attSeqOneOne);
- seqItemOne.put(attSeqOneTwo);
-
- // Sequence Item 2
- AttributeList seqItemTwo = new AttributeList();
- Attribute attSeqTwoOne = new CodeStringAttribute(new AttributeTag(0x0011, 0x0100));
- attSeqTwoOne.addValue("Sequence Two Value");
- Attribute attSeqTwoTwo = new DateAttribute(new AttributeTag(0x0011, 0x0102));
- attSeqTwoTwo.addValue("2022-01-02");
- seqItemTwo.put(attSeqTwoOne);
- seqItemTwo.put(attSeqTwoTwo);
-
- // Add Sequence Items
- seq.addItem(seqItemOne);
- seq.addItem(seqItemTwo);
- attList.put(seq);
-
- // Write DICOM
- attList.write("test.dcm", transferSyntaxUID, true, true);
- } catch (IOException | DicomException e) {
- System.out.println("Oops! Error: " + e.getMessage());
- }
- }
-}
-```
-
-Each sequence item is a Data Set of its own. For each
-item we create an `AttributeList` instance where we add attributes; in the
-end, we add these lists as sequence items to a sequence attribute.
-
-If we check the result with **gdcmdump**, we should see:
-
-```text
-(0011,0010) SQ (LO) (Sequence with undefined length) # u/l,1 Private Creator
- (fffe,e000) na (Item with undefined length)
- (0011,0100) CS [Sequence One Value] # 18,? (1) Private Element With Empty Private Creator
- (0011,0102) DA [2022-01-01] # 10,? (1) Private Element With Empty Private Creator
- (fffe,e00d)
- (fffe,e000) na (Item with undefined length)
- (0011,0100) CS [Sequence Two Value] # 18,? (1) Private Element With Empty Private Creator
- (0011,0102) DA [2022-01-02] # 10,? (1) Private Element With Empty Private Creator
- (fffe,e00d)
-(fffe,e0dd)
-```
-
-which is what we wanted.
-
-#### pydicom
-
-The same can be done using pydicom. Here we define a `seq` variable that is
-just a list of `Dataset` instances; these data sets have the same meaning as
-`AttributeList` in PixelMed. After that, we access the individual data sets,
-i.e. sequence items, and add attributes to them:
-
-
-```python
-from pydicom import dcmread, Dataset
-
-if __name__ == "__main__":
- with open("test-dicom-files/image-2.dcm", "rb") as f_in:
- ds = dcmread(f_in)
-
- # Sequence Attribute
- seq = [Dataset(), Dataset()]
-
- # Sequence Item 1
- seq[0].add_new([0x0011, 0x0100], "CS", "Sequence One Value")
- seq[0].add_new([0x0011, 0x0102], "DA", "2021-01-01")
-
- # Sequence Item 2
- seq[1].add_new([0x0011, 0x0100], "CS", "Sequence Two Value")
- seq[1].add_new([0x0011, 0x0102], "DA", "2021-01-02")
-
- # Add Sequence Attribute
- ds.add_new([0x0011, 0x0010], 'SQ', seq)
-
- # Write DICOM
- ds.save_as("test.dcm")
-```
-
-The result is the same as above.
-
-### Change Transfer Syntax
-
-Let's first change the Transfer Syntax from Explicit to Implicit VR Little Endian.
-
-Of course, before writing a new DICOM file, we have to update the information
-about the new _TransferSyntaxUID_ and _SourceApplicationEntityTitle_.
-
-```java
-package pixelmed_demo;
-
-import java.io.IOException;
-import com.pixelmed.dicom.AttributeList;
-import com.pixelmed.dicom.DicomException;
-import com.pixelmed.dicom.TransferSyntax;
-import com.pixelmed.dicom.FileMetaInformation;
-
-public class demo_main {
-
- public static void main(String[] args) {
- String filepath = args[0];
- AttributeList attList = new AttributeList();
- try {
- // Read DICOM
- attList.read(filepath);
-
- // Update File Meta Information Header
- FileMetaInformation.addFileMetaInformation(attList, TransferSyntax.ImplicitVRLittleEndian, "DicomPlayGround");
-
- // Write DICOM
- attList.write("test.dcm", TransferSyntax.ImplicitVRLittleEndian, true, true);
- } catch (IOException | DicomException e) {
- System.out.println("Oops! Error: " + e.getMessage());
- }
- }
-}
-```
-
-The new file looks like:
-
-```text
-# Dicom-File-Format
-
-# Dicom-Meta-Information-Header
-# Used TransferSyntax:
-(0002,0000) UL 210 # 4,1 File Meta Information Group Length
-(0002,0001) OB 00\01 # 2,1 File Meta Information Version
-(0002,0002) UI [1.2.840.10008.5.1.4.1.1.12.2] # 28,1 Media Storage SOP Class UID
-(0002,0003) UI [1.3.6.1.4.1.5962.1.1.0.0.0.1168612284.20369.0.3] # 48,1 Media Storage SOP Instance UID
-(0002,0010) UI [1.2.840.10008.1.2] # 18,1 Transfer Syntax UID
-(0002,0012) UI [1.3.6.1.4.1.5962.99.2] # 22,1 Implementation Class UID
-(0002,0013) SH [PIXELMEDJAVA001 ] # 16,1 Implementation Version Name
-(0002,0016) AE [DicomPlayGround ] # 16,1 Source Application Entity Title
-
-# Dicom-Data-Set
-# Used TransferSyntax: 1.2.840.10008.1.2
-(0008,0005) ?? (CS) [ISO_IR 100] # 10,1-n Specific Character Set
-(0008,0008) ?? (CS) [ORIGINAL\PRIMARY\SINGLE PLANE ] # 30,2-n Image Type
-(0008,0012) ?? (DA) [20070112] # 8,1 Instance Creation Date
-(0008,0013) ?? (TM) [093126] # 6,1 Instance Creation Time
-(0008,0014) ?? (UI) [1.3.6.1.4.1.5962.3] # 18,1 Instance Creator UID
-(0008,0016) ?? (UI) [1.2.840.10008.5.1.4.1.1.12.2] # 28,1 SOP Class UID
-```
-
-Compare this to the same file in the [Explicit format](#dicom-file-in-explicit);
-the changes are obvious and expected.
-
-However, it's not so simple if we have compressed pixel data. Depending on the
-compression and the desired output, it can be hard to find the libraries
-needed to do the conversion.
-
-PixelMed relies on Java libraries for compressing and decompressing pixel data.
-You could use `imageIO`, PixelMed's stand-alone _Java JPEG Selective Block
-Redaction Codec and Lossless JPEG Decoder_, or any other library but, as far
-as I know, things can get complicated and inefficient for certain
-TransferSyntaxUIDs, i.e. compression algorithms.
-
-In my opinion, when it comes to compression, the better option is to use GDCM
-alone or GDCM in combination with pydicom. To get a better understanding
-of the supported transfer syntaxes, please see the table at
-[Supported Transfer Syntaxes](https://pydicom.github.io/pydicom/dev/old/image_data_handlers.html#supported-transfer-syntaxes){:target="_blank"}.
-
-For now, we can use **gdcmconv** tool to play with different transfer syntaxes.
-There are a lot of options, and you can explore them at
-[gdcmconv](http://gdcm.sourceforge.net/html/gdcmconv.html){:target="_blank"}.
-
-To convert to JPEG Lossless i.e. `1.2.840.10008.1.2.4.70` we can use:
-
-```bash
-gdcmconv --jpeg -i <input.dcm> -o <output.dcm>
-```
-
-If your input file was uncompressed, you should see a significant reduction in
-the size of the output file.
-
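-If pydicom can decode the input compression (with a suitable image handler such
-as GDCM or Pillow installed), a conversion to Explicit VR Little Endian can be
-sketched like this; the paths are placeholders:
-
-```python
-from pydicom import dcmread
-
-ds = dcmread("path/to/compressed.dcm")  # placeholder path
-
-# Decode the pixel data and rewrite the data set as Explicit VR Little Endian;
-# this requires an installed handler that supports the input transfer syntax
-ds.decompress()
-ds.save_as("decompressed.dcm")
-```
-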
-### Create DICOM from an Image
-
-Creating a DICOM file from an image can be really useful in many cases.
-Especially when it comes to testing:
-- An application should be tested in a test/dev environment where you cannot
-use real world scans
-- A de-identification process that masks certain parts of images should be
-tested for different image resolutions, etc.
-
-To achieve this you can use the `ImageToDicom`:
-
-```java
-package pixelmed_demo;
-
-import java.io.IOException;
-import com.pixelmed.dicom.ImageToDicom;
-import com.pixelmed.dicom.DicomException;
-
-public class demo_main {
-
- public static void main(String[] args) {
- String filepath = args[0];
- try {
- new ImageToDicom(
- filepath,
- "test.dcm", // Output path
- "Vladsiv", // Patient Name
- "TEST-98341-VladSiv", // Patient ID
- "847542", // Study ID
- "1", // Series Number
- "1", // Instance Number
- "US", // Modality
- "1.2.840.10008.5.1.4.1.1.6.1" // SOP Class UID
- );
- } catch (IOException | DicomException e) {
- System.out.println("Oops! Error: " + e.getMessage());
- }
- }
-}
-```
-
-The result is:
-
-{% include image.html
- src="/assets/images/posts/dicom-playground/avatar.png"
- alt="image-to-dicom"
- caption="Image to DICOM MicroDicom Preview"
-%}
-
-Of course, you can then process the output DICOM file, add/modify data
-elements, and tailor the file for your specific needs.
-
-To do the same using pydicom, please see this thread:
-[PNG to DICOM with pydicom](https://github.com/pydicom/pydicom/issues/939){:target="_blank"}
-
-### Blackout Image
-
-In certain circumstances, we want to black out parts of DICOM images.
-This is usually done to remove the
-[Protected Health Information - PHI](https://en.wikipedia.org/wiki/Protected_health_information){:target="_blank"}
-that can be found on an image.
-
-To do this using PixelMed we can use the `ImageEditUtilities` class and the method:
-`blackout(SourceImage srcImg, AttributeList list, java.util.Vector shapes)`:
-
-```java
-package pixelmed_demo;
-
-import java.awt.Shape;
-import java.awt.Rectangle;
-import java.util.Vector;
-import com.pixelmed.display.ImageEditUtilities;
-import com.pixelmed.display.SourceImage;
-import com.pixelmed.dicom.AttributeList;
-import com.pixelmed.dicom.Attribute;
-import com.pixelmed.dicom.TagFromName;
-
-public class demo_main {
-
- public static void main(String[] args) {
- String filepath = args[0];
- AttributeList attList = new AttributeList();
- try {
- // Read DICOM file
- attList.read(filepath);
-
- // Get Transfer Syntax
- Attribute transferSyntaxAtt = attList.get(TagFromName.TransferSyntaxUID);
- String transferSyntaxUID = transferSyntaxAtt.getSingleStringValueOrEmptyString();
-
- // Create Area to blackout
- Vector shapes = new Vector();
- Shape shape = new Rectangle(35, 60, 140, 50);
- shapes.add(shape);
-
- // Define Image and Perform Blackout
- SourceImage sImg = new SourceImage(attList);
- ImageEditUtilities.blackout(sImg, attList, shapes);
-
- // Write DICOM
- attList.write("test.dcm", transferSyntaxUID, true, true);
-
- } catch (Exception e) {
- System.out.println("Oops! Error: " + e.getMessage());
- }
- }
-}
-```
-
-This will give us:
-
-{% include image.html
- src="/assets/images/posts/dicom-playground/avatarblackout.png"
- alt="dicom-blackout "
- caption="Blackout MicroDicom Preview"
-%}
-
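-A comparable blackout can be sketched with pydicom and NumPy for uncompressed
-pixel data, zeroing out the same rectangular region (paths and coordinates are
-illustrative):
-
-```python
-from pydicom import dcmread
-
-ds = dcmread("path/to/input.dcm")  # placeholder path
-
-# Load the pixels as a NumPy array and zero out the rectangle
-# x=35, y=60, width=140, height=50 (rows come first when indexing)
-arr = ds.pixel_array
-arr[60:60 + 50, 35:35 + 140] = 0
-
-# Write the modified pixels back; assumes native (uncompressed) pixel data
-ds.PixelData = arr.tobytes()
-ds.save_as("test.dcm")
-```
-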
-## Final Words
-
-This article gives a brief introduction to the basics of processing DICOM files
-using some of the open source libraries.
-
-There are many awesome libraries and DICOM applications that I didn't mention,
-and I encourage you to go through the provided material and play with them on your
-own.
-
-I hope this helps you get a better understanding of how you can process DICOM
-files and start building your own DICOM applications.
-
-If you have any questions or suggestions, please reach out, I'm always
-available.
diff --git a/_posts/2022-04-01-big-data-file-formats.md b/_posts/2022-04-01-big-data-file-formats.md
deleted file mode 100644
index 2f1d2f2..0000000
--- a/_posts/2022-04-01-big-data-file-formats.md
+++ /dev/null
@@ -1,379 +0,0 @@
----
-title: "Understanding Big Data File Formats"
-page_title: "Understanding Big Data File Formats"
-excerpt: "Dive into the structure of popular Big Data file formats like
-Parquet, Avro, and ORC. Understand their unique features and advantages.
-Learn how these formats optimize data storage and processing."
-date: April 1, 2022
-toc: true
-toc_label: "Content"
-toc_sticky: true
-last_modified_at: April 1, 2022
-og_image: /assets/images/posts/big-data-file-formats/header.jpg
----
-
-{% include image.html
- src="/assets/images/posts/big-data-file-formats/header.jpg"
- alt="similarity-header"
- caption="Image Source: Pexels"
-%}
-
-## Introduction
-
-As we all know, juggling data from one system
-to another, transforming it through data pipelines, storing, and later
-performing analytics can easily become expensive and inefficient as the data
-grows. We often think about scalability and how to deal with the 4Vs[^1] of
-data, making sure that our system can handle dynamic flows and doesn't get
-overwhelmed with the amount of data.
-
-There are many aspects of Big Data systems that come into consideration when we
-talk about scalability. Not surprisingly, one of them is how we define the
-storage strategy. A huge bottleneck for many applications is the time it takes
-to find relevant data, process it, and write it back to another location. As
-the data grows, managing large datasets and evolving schemas becomes a huge
-issue.
-
-Whether we are running Big Data analytics on on-prem clusters with dedicated or
-bare metal servers, or on cloud infrastructure, one thing is certain: the way
-we store our data and which file format we use will have an immense impact. How
-we store the data in our datalake or warehouse is critical.
-
-Among many things, choosing an appropriate file format can:
-- Improve read/write performance
-- Enable file splitting
-- Support schema evolution
-- Support compression
-
-In this article we will cover some of the most common big data formats,
-their structure, when to use them, and what benefits they can have.
-
-[^1]: Volume, Variety, Velocity, and Veracity
-
-
-
-## AVRO
-
-As we all know, to transfer the data or store it, we need to serialize it
-first. [Avro](https://avro.apache.org/docs/current/){:target="_blank"}
-is one of the systems that implements data serialization. It's a
-row-based remote procedure call and data serialization framework. It was
-developed by Doug Cutting[^2] and managed within Apache's Hadoop project.
-
-[^2]: The father of Hadoop and Lucene. Ref: [Wikipedia - Doug Cutting](https://en.wikipedia.org/wiki/Doug_Cutting){:target="_blank"}
-
-It uses JSON for defining data types and serializes data in a compact binary
-format. Additionally, Avro is a language-neutral data serialization system which
-means that theoretically any language could use Avro.
-
-### Structure
-
-An Avro file consists of:
-- File header and
-- One or more file data blocks
-
-{% include image.html
- src="/assets/images/posts/big-data-file-formats/avro_format.png"
- alt="similarity-header"
- caption="Depiction of an Avro file - Image Source"
-%}
-
-The header contains information about the schema and codec, while each block
-contains:
-- A count of the objects in the block
-- The size of the serialized objects
-- The objects themselves
-
-> If you are interested, take a look at
-[Java implementation](https://avro.apache.org/docs/current/api/java/index.html){:target="_blank"}
-
-
-### Schema
-
-A schema can be defined as:
-- JSON string,
-- JSON object, of the form: `{"type": "typeName" ...attributes...}`
-- JSON array
-
-> or Avro IDL for human readable schema
-
-`typeName` specifies a type name which can be a primitive or complex type.
-
-Primitive types are: `null`, `boolean`, `int`, `long`, `float`, `double`,
-`bytes`, and `string`.
-
-There are 6 complex types: `record`, `enum`, `array`,
-`map`, `union`, and `fixed`. Each of them supports `attributes`. In case of the
-`record` type, they are:
-- `name`: name of the record (required)
-- `namespace`: string that qualifies the name
-- `doc`: documentation of the schema (optional)
-- `aliases`: alternative names of the schema (optional)
-- `fields`: list of fields, where each field is a JSON object
-
-> The `attributes` of other complex types can be found in the
-[Apache Avro Specification](https://avro.apache.org/docs/current/spec.html){:target="_blank"}
-
-Now that we understand the structure of the schema, we can define it as, for
-example:
-
-```json
- {
- "namespace": "example.vladsiv",
- "type": "record",
- "name": "vladsiv",
- "doc": "Just a test schema",
- "fields": [
- {"name": "name", "type": "string"},
- {"name": "year", "type": ["null", "int"]}
- ]
- }
-```
-
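-As an illustration, here is how that schema might be used to write and read a
-few records with the fastavro Python library (the library and file name are
-just examples):
-
-```python
-from fastavro import parse_schema, reader, writer
-
-schema = parse_schema({
-    "namespace": "example.vladsiv",
-    "type": "record",
-    "name": "vladsiv",
-    "doc": "Just a test schema",
-    "fields": [
-        {"name": "name", "type": "string"},
-        {"name": "year", "type": ["null", "int"]},
-    ],
-})
-
-records = [{"name": "test", "year": 2022}, {"name": "unknown", "year": None}]
-
-# Write a container file: a header with schema/codec, followed by data blocks
-with open("example.avro", "wb") as out:
-    writer(out, schema, records)
-
-# Read it back; the schema is taken from the file header
-with open("example.avro", "rb") as inp:
-    for record in reader(inp):
-        print(record)
-```
-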
-### Why Avro?
-
-- It is a very fast serialization/deserialization format, great for storing raw
-data
-- Schema is included in the file header, which allows downstream systems to
-easily retrieve the schema (no need for external metastores)
-- Supports evolutionary schemas - any source schema change is easily handled
-- It's more optimized for reading series of entire rows - since it's row-based
-- Good option for landing zones where we store the data for further processing
-(which usually means that we read whole files)
-- Great integration with Kafka
-- Supports file splitting
-
-
-
-## Parquet
-
-Apache Parquet is a free and open-source column-oriented data storage format
-and it began as a joint effort between Twitter and Cloudera. It's designed
-for efficient data storage and retrieval.
-
-Running queries on Parquet-based storage is extremely efficient since
-the column-oriented layout allows you to focus on the relevant data very quickly.
-This means that the amount of data scanned is much smaller, which results in
-more efficient I/O usage.
-
-### Structure
-
-A Parquet file consists of 3 parts:
-
-- Header: which has 4-byte magic number `PAR1`
-- Data Body: contains blocks of data stored in row groups
-- Footer: where all the metadata is stored
-
-The 4-byte magic number `PAR1`, stored in the header and footer, indicates that
-the file is in parquet format.
-
-The metadata includes the version of the format, the schema, and information about the
-columns in the data: type, path, encoding, number of values, etc.
-An interesting fact is that the metadata is stored in the footer, which allows
-single-pass writing.
-
-So how do we read a Parquet file? At the end of the file we have the footer length,
-so an initial seek is performed to read that length and then, since we
-know the length, we jump to the beginning of the footer metadata.
-
-{% include image.html
- src="/assets/images/posts/big-data-file-formats/parquet_format.png"
- alt="similarity-header"
- caption="Depiction of a Parquet file - Image Source"
-%}
-
-Why is this important?
-
-Since the body consists of blocks and each block has a boundary, writing the
-metadata at the end (when all the blocks have been written) allows us to
-store these boundaries. This gives us a huge performance boost when it comes to
-locating the blocks and processing them in parallel.
-
-Each block is stored in the form of a _row group_. As we can see in the image
-above, row groups contain multiple _columnar chunks_, one per column. These chunks
-are further divided into _pages_. The pages store the values for a particular
-column and can be compressed, since the values can repeat.
-
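-As a short sketch, the pyarrow Python library (one of several Parquet
-implementations, used here just as an example) exposes both the footer metadata
-and column-pruned reads:
-
-```python
-import pyarrow as pa
-import pyarrow.parquet as pq
-
-# Write a tiny table to Parquet
-table = pa.table({"name": ["a", "b", "c"], "year": [2020, 2021, 2022]})
-pq.write_table(table, "example.parquet")
-
-# The footer metadata describes the schema, row groups, and column chunks
-meta = pq.ParquetFile("example.parquet").metadata
-print(meta.num_row_groups, meta.num_rows)
-
-# Column pruning: only the "year" column chunks are read
-print(pq.read_table("example.parquet", columns=["year"]))
-```
-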
-### Schema
-
-Parquet supports schema evolution. For example, we can start with a simple
-schema, and then add more columns as needed. By doing this, we end up with
-multiple Parquet files with different schemas. This is not an
-issue since the schemas are mutually compatible and Parquet supports automatic
-schema merging among those files.
-
-When it comes to data types, Parquet defines a set of types that is intended to
-be as minimal as possible:
-
-- `BOOLEAN`: 1 bit boolean
-- `INT32`: 32 bit signed ints
-- `INT64`: 64 bit signed ints
-- `INT96`: 96 bit signed ints
-- `FLOAT`: IEEE 32-bit floating point values
-- `DOUBLE`: IEEE 64-bit floating point values
-- `BYTE_ARRAY`: arbitrarily long byte arrays.
-
-These are further extended by _Logical Types_. This keeps the set of primitive
-types to a minimum and reuses parquet's efficient encodings. These extended
-types are:
-
-- String Types: `STRING`, `ENUM`, `UUID`
-- Numerical Types: `Signed Integers`, `Unsigned Integers`, `DECIMAL`
-- Temporal Types: `DATE`, `TIME`, `TIMESTAMP`, `INTERVAL`
-- Embedded Types: `JSON`, `BSON`
-- Nested Types: `Lists`, `Maps`
-- Unknown Types: `UNKNOWN` - always null
-
-If you are interested in details regarding Logical Types and encodings, please see:
-[Parquet Logical Type Definitions](https://github.com/apache/parquet-format/blob/2e23a1168f50e83cacbbf970259a947e430ebe3a/LogicalTypes.md){:target="_blank"}
-and
-[Parquet encoding definitions](https://github.com/apache/parquet-format/blob/2e23a1168f50e83cacbbf970259a947e430ebe3a/Encodings.md){:target="_blank"}.
-
-### Why Parquet?
-
-- When we are dealing with many columns but only want to query some of them -
-Since Parquet is column-based it's great for analytics. As the business expands
-we usually tend to increase the number of fields in our datasets but most of our
-queries just use a subset of them
-- Great for very large amounts of data - Techniques like data skipping
-increase data throughput and performance on large datasets
-- Low storage consumption by implementing efficient column-wise compression
-- Free and open source - It's language agnostic and decouples storage from
-compute services (since most of data analytic services have support for
-Parquet out of the box)
-- Works great with serverless cloud technologies like AWS Athena,
-Amazon Redshift Spectrum, Google BigQuery, Google Dataproc etc
-
-
-
-## ORC
-
-Apache ORC (Optimized Row Columnar) is a free and open-source column-oriented
-data storage format. It was first announced in February 2013 by Hortonworks in
-collaboration with Facebook[^3].
-
-[^3]: A month later, the Apache Parquet format was announced. Ref: [Wikipedia - Apache ORC](https://en.wikipedia.org/wiki/Apache_ORC){:target="_blank"}
-
-ORC provides a highly efficient way to store data using block-mode compression
-based on data types. It has great reading, writing, and processing performance
-thanks to data skipping and indexing.
-
-### Structure
-
-The ORC file format consists of:
-- Groups of row data called _stripes_
-- File footer
-- Postscript
-
-{% include image.html
- src="/assets/images/posts/big-data-file-formats/orc_format.png"
- alt="similarity-header"
- caption="Depiction of an ORC file - Image Source"
-%}
-
-The process of reading an ORC file starts at the end of the file. The final
-byte of the file contains the length of the Postscript. The Postscript is
-never compressed and provides the information about the file: metadata,
-version, compression etc. Once we parse the Postscript we can get the
-compressed form of the File Footer, decompress it, and learn more about the
-stripes stored in the file.
-
-The File Footer contains information about the stripes, the number of rows per
-stripe, the type schema information, and some column-level statistics:
-count, min, max, and sum.
-
-Stripes contain only entire rows and are divided into three sections:
-- Index data: a set of indexes for the rows within the stripe
-- Row data
-- Stripe footer: directory of stream locations
-
-What's important here is that both the indexes and the row data sections are
-in column-oriented format. This allows us to read the data only for the required
-columns.
-
-Index data provides information about the columns stored in row data. It
-includes min and max values for each column, but also row positions which
-provide offsets that enable row-skipping within a stripe for fast reads.
-
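-As an illustration, here's a minimal sketch that reads an ORC file with
-`pyarrow`'s ORC reader (the file and column names are made up); the stripe
-count and schema come from the file footer, and only the listed columns are
-scanned:
-
-```python
-from pyarrow import orc
-
-f = orc.ORCFile("measurements.orc")  # hypothetical file
-print(f.nstripes)                    # number of stripes, taken from the footer
-print(f.schema)                      # type schema, also stored in the footer
-
-# Only the listed columns are read, thanks to the columnar stripe layout.
-table = f.read(columns=["sensor_id", "value"])
-print(table.num_rows)
-```
-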
-> If you are interested in more details about the ORC file format, please see:
-[ORC Specification v1](https://orc.apache.org/specification/ORCv1/){:target="_blank"}
-
-### Schema
-
-Like Avro and Parquet, ORC also supports schema evolution. This allows us to
-merge the schemas of multiple ORC files that are different but mutually
-compatible.
-
-ORC provides a rich set of scalar and compound types:
-
-- Integer: `boolean`, `tinyint`, `smallint`, `int`, `bigint`
-- Floating point: `float`, `double`
-- String types: `string`, `char`, `varchar`
-- Binary blobs: `binary`
-- Date/time: `timestamp`, `timestamp` with local time zone, `date`
-- Compound types: `struct`, `list`, `map`, `union`
-
-All scalar and compound types in ORC can take null values.
-
-There is a nice example in the ORC documentation:
-[Types](https://orc.apache.org/docs/types.html){:target="_blank"}, that
-illustrates how it works.
-
-Let's say that we have the table `Foobar`:
-
-```sql
-create table Foobar (
- myInt int,
- myMap map<string,struct<myString:string,myDouble:double>>,
- myTime timestamp
-);
-```
-
-The columns in the file would form the following tree:
-
-{% include image.html
- src="/assets/images/posts/big-data-file-formats/orc-schema-tree.png"
- alt="similarity-header"
- caption="ORC schema tree - Image Source"
-%}
-
-### Why ORC?
-
-- Really efficient compression - It saves a lot of storage space
-- Support for ACID transactions - [ACID Support](https://orc.apache.org/docs/acid.html){:target="_blank"}
-- Predicate Pushdown efficiency - Pushes filters into reads so that a minimal
-number of columns and rows is read
-- Schema evolution and merging
-- Efficient and fast reads - Thanks to built-in indexes and column aggregates,
-we can skip entire stripes and focus on the data we need
-
-## Final Words
-
-Understanding how big data file formats work helps us make the right decisions
-that will impact the efficiency and scalability of our data applications. Each
-file format has its own unique internal structure and could be the right choice
-for our storage strategy, depending on the use case.
-
-In this article I've tried to give a brief overview of some of the key points
-that will help you understand the underlying structure and benefits of popular
-big data file formats. Of course, there are many details which I didn't cover
-and I encourage you to go through all of the referenced material to learn more.
-
-I hope you enjoyed reading this article and, as always, feel free to reach
-out to me if you have any questions or suggestions.
-
-> **Bonus**: AliORC (Alibaba ORC) is a deeply optimized file format based on the
-open-source Apache ORC. It is fully compatible with open-source ORC, extended
-with additional features, and optimized for Async Prefetch, I/O mode management,
-and adaptive dictionary encoding. Ref:
-[AliORC: A Combination of MaxCompute and Apache ORC](https://www.alibabacloud.com/blog/aliorc-a-combination-of-maxcompute-and-apache-orc_595359){:target="_blank"}
-
-## Resources
-
-- [Big Data File Formats](https://www.clairvoyant.ai/blog/big-data-file-formats){:target="_blank"}
-- [Performance comparison of different file formats and storage engines in the Hadoop ecosystem ](https://db-blog.web.cern.ch/blog/zbigniew-baranowski/2017-01-performance-comparison-different-file-formats-and-storage-engines){:target="_blank"}
-- [Databricks - Parquet](https://databricks.com/glossary/what-is-parquet){:target="_blank"}
-- [All You Need To Know About Parquet File Structure In Depth ](https://www.linkedin.com/pulse/all-you-need-know-parquet-file-structure-depth-rohan-karanjawala){:target="_blank"}
-- [Apache Hive - ORC](https://cwiki.apache.org/confluence/display/hive/languagemanual+orc){:target="_blank"}
diff --git a/_posts/2022-12-18-python-development-environment.md b/_posts/2022-12-18-python-development-environment.md
deleted file mode 100644
index 419f220..0000000
--- a/_posts/2022-12-18-python-development-environment.md
+++ /dev/null
@@ -1,540 +0,0 @@
----
-title: "Local Python Development Environment"
-page_title: "Local Python Development Environment"
-excerpt: "Discover a comprehensive guide on configuring your local machine
-for Python projects. This guide provides an overview of the most commonly
-used tools throughout the development process."
-date: December 18, 2022
-toc: true
-toc_label: "Content"
-toc_sticky: true
-last_modified_at: December 18, 2022
-og_image: /assets/images/posts/python-development-environment/header.jpg
----
-
-{% include image.html
- src="/assets/images/posts/python-development-environment/header.jpg"
- alt="siesta-and-libraries-header"
- caption="Image Source: Pexels"
-%}
-
-## Introduction
-
-Some of my colleagues and friends had trouble setting up their local machines
-for working on Python projects. Things got even messier when there were multiple
-Python projects with different versions of packages or of Python itself.
-Consequently, I've decided to write a small post that will serve as a quick
-guide and reference for those who are just starting with Python or have switched
-from other programming languages and are not quite sure how things work in the
-Python ecosystem.
-
-I hope this will help you, and please don't hesitate to reach out with
-questions or suggestions!
-
-## Local Machine
-
-This guide is primarily for Unix-like systems. If you are using Windows,
-I suggest you use the Windows Subsystem for Linux (WSL) or run a Docker image,
-which will allow you to experiment without breaking your OS environment.
-
-The choice of OS shouldn't matter. I'm running Manjaro on my personal PC,
-Ubuntu on my work laptop, and a colleague is using the same setup on
-WSL-Ubuntu.
-
-## Pyenv
-
-[Pyenv](https://github.com/pyenv/pyenv){:target="_blank"}
-is a simple Python version management tool. It lets you easily switch
-between multiple versions of Python.
-
-### Installation
-
-To install `pyenv` you can use
-[pyenv installer](https://github.com/pyenv/pyenv-installer){:target="_blank"},
-and run:
-
-```bash
-curl https://pyenv.run | bash
-```
-
-If the installation was successful, you'll see the following message:
-
-```
-WARNING: seems you still have not added 'pyenv' to the load path.
-
-# Load pyenv automatically by appending
-# the following to
-~/.bash_profile if it exists, otherwise ~/.profile (for login shells)
-and ~/.bashrc (for interactive shells) :
-
-export PYENV_ROOT="$HOME/.pyenv"
-command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"
-eval "$(pyenv init -)"
-
-# Restart your shell for the changes to take effect.
-
-# Load pyenv-virtualenv automatically by adding
-# the following to ~/.bashrc:
-
-eval "$(pyenv virtualenv-init -)"
-```
-
-This will export the `PYENV_ROOT` variable, add `pyenv` to `$PATH`, and load
-it automatically in the shell. After restarting your shell, running
-`pyenv` should output:
-
-```
-pyenv 2.3.7
-Usage: pyenv <command> [<args>]
-
-Some useful pyenv commands are:
- --version Display the version of pyenv
- activate Activate virtual environment
- commands List all available pyenv commands
- deactivate Deactivate virtual environment
- doctor Verify pyenv installation and development tools to build pythons.
- exec Run an executable with the selected Python version
- global Set or show the global Python version(s)
- help Display help for a command
- hooks List hook scripts for a given pyenv command
- init Configure the shell environment for pyenv
- install Install a Python version using python-build
- latest Print the latest installed or known version with the given prefix
- local Set or show the local application-specific Python version(s)
- prefix Display prefixes for Python versions
- rehash Rehash pyenv shims (run this after installing executables)
- root Display the root directory where versions and shims are kept
- shell Set or show the shell-specific Python version
- shims List existing pyenv shims
- uninstall Uninstall Python versions
- version Show the current Python version(s) and its origin
- version-file Detect the file that sets the current pyenv version
- version-name Show the current Python version
- version-origin Explain how the current Python version is set
- versions List all Python versions available to pyenv
- virtualenv Create a Python virtualenv using the pyenv-virtualenv plugin
- virtualenv-delete Uninstall a specific Python virtualenv
- virtualenv-init Configure the shell environment for pyenv-virtualenv
- virtualenv-prefix Display real_prefix for a Python virtualenv version
- virtualenvs List all Python virtualenvs found in `$PYENV_ROOT/versions/*'.
- whence List all Python versions that contain the given executable
- which Display the full path to an executable
-```
-
-### Python versions
-
-It's important to mention that `pyenv` builds Python from source, which means
-that we'll need some library dependencies installed before we use it.
-Dependencies will vary depending on your OS. If you are using Ubuntu:
-
-```bash
-apt-get install -y make build-essential libssl-dev zlib1g-dev \
-libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev \
-libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python-openssl
-```
-
-Now that we have everything that we need, we can build a Python version,
-for example `3.10.4`:
-
-```bash
-pyenv install 3.10.4
-```
-
-If that worked, you can list installed versions using:
-
-```bash
-> pyenv versions
-* system (set by /root/.pyenv/version)
- 3.10.4
-```
-
-As you can see, we are using the system's Python version. To set `3.10.4` as a
-global Python version use `pyenv global 3.10.4`, and check `python -V` to
-ensure that the version is correct.
-
-> If you run `which python`, you'll see an interesting path:
-> `~/.pyenv/shims/python`. This `shims` directory is inserted at the
-> front of your `$PATH` (check with `echo $PATH`) and it contains a shim for
-> every Python command across every installed version of Python. These shims are
-> lightweight executables that `pyenv` maintains to pass Python commands along to it.
-
-If we have multiple versions and want to set a specific version on a folder
-level, we can use `pyenv local`, for example `pyenv local 3.9.15`. This will
-create a file `.python-version` which basically tells `pyenv` which version to
-use. It also works for subdirectories.
-
-## Poetry
-
-Alright, now we can control multiple Python versions on our system. But how do
-we approach a scenario with multiple projects that require different package
-versions?
-
-The idea is to create a separate environment for each project so that the
-dependencies of one project don't collide with the dependencies of other
-projects. This is done through a virtual environment.
-
-> Python already has its built-in `venv` module for creating virtual
-> environments. To create a virtual environment use
-> `python -m venv /path/to/new/virtual/environment` and then activate it.
-> See [venv](https://docs.python.org/3/library/venv.html){:target="_blank"} for
-> more information. However, there are tools like Poetry and Pipenv that manage
-> that for you.
-
-Tools like [Poetry](https://python-poetry.org/){:target="_blank"}
-not only manage virtual environments but also handle dependencies that can
-help us manage projects in a deterministic way.
-
-> If you had experience with JavaScript, you can think of
-> Poetry as npm.
-
-### Template
-
-To start with Poetry, we can create a new project using
-`poetry new <project-name>`, for example: `poetry new test-app`.
-This will create a basic boilerplate as a starter for any Python project.
-This boilerplate includes:
-- `tests` folder
-- `test-app` folder
-- `README.rst` file
-- `pyproject.toml` file
-
-`pyproject.toml` is basically a file that defines the project:
-
-```toml
-[tool.poetry]
-name = "test-app"
-version = "0.1.0"
-description = ""
-authors = ["Vladsiv "]
-
-[tool.poetry.dependencies]
-python = "^3.7"
-
-[tool.poetry.dev-dependencies]
-pytest = "^5.2"
-
-[build-system]
-requires = ["poetry-core>=1.0.0"]
-build-backend = "poetry.core.masonry.api"
-```
-
-The first section `[tool.poetry]` defines general information about the
-package: name, version, description, license, authors, maintainers,
-keywords, etc.
-
-The second section `[tool.poetry.dependencies]` defines production
-dependencies. If you build a package wheel, it will include only production
-libraries as dependencies.
-
-The third section `[tool.poetry.dev-dependencies]` defines development
-dependencies. As you can see in the example, it specifies the version for
-`pytest`, which is a library for writing tests. We don't need that in a
-production environment.
-
-The fourth section `[build-system]` is a
-[PEP-517](https://peps.python.org/pep-0517/){:target="_blank"}
-specification that is used to define alternative build systems to build a
-Python project. In other words, other libraries will know that your project
-is managed by Poetry.
-
-### Environment
-
-Now that we have a new Python project with some boilerplate code, let's run
-`poetry env info`. This command gives us some information about the project's
-environment. The output looks something like:
-
-```text
-Virtualenv
-Python: 3.7.12
-Implementation: CPython
-Path: NA
-
-System
-Platform: linux
-OS: posix
-Python: /home/vladimir/.pyenv/versions/3.7.12
-```
-
-As we can see, we are using a Python version managed by `pyenv`. However, the
-virtual environment `Path` is `NA`, which tells us that our project does not
-have an environment yet.
-
-To create it, we can use `poetry shell`. It will create a virtual environment if
-it doesn't exist and activate it. In the terminal it looks like this:
-
-{% include image.html
- src="/assets/images/posts/python-development-environment/poetry_shell_terminal.jpg"
- alt="siesta-and-libraries-header"
- caption="Image Source: Pexels"
-%}
-
-The first line says that it's creating an environment in
-`/home/vladimir/.cache/pypoetry/virtualenvs`. This is the standard location for
-all virtual environments managed by Poetry. If you want to delete
-an environment and recreate it, just go to that path, run `rm -rf <env-name>`,
-and create it again using the same poetry command.
-
-> If you take a closer look at the image, you'll see the environment
-> name on the right hand side (in blue). That's automatically added to the
-> terminal line using a zsh theme. If you want a lit terminal, see
-> [romkatv/powerlevel10k](https://github.com/romkatv/powerlevel10k){:target="_blank"}.
-
-Since we activated a virtual environment, we can run `poetry env info` again
-to confirm the setup:
-
-```
-Virtualenv
-Python: 3.7.12
-Implementation: CPython
-Path: /home/vladimir/.cache/pypoetry/virtualenvs/test-app-NOA-EWXm-py3.7
-Valid: True
-
-System
-Platform: linux
-OS: posix
-Python: /home/vladimir/.pyenv/versions/3.7.12
-```
-
-This is the setup that we want: `pyenv` manages Python versions and Poetry
-takes care of virtual environments and packages. Using this approach, we can have
-multiple projects running on different Python versions in completely isolated
-virtual environments.
-
-### Dependencies
-
-There is another really important file: `poetry.lock`. This file is used to
-specify exact versions and hashes of packages that are used as dependencies. If
-you look at our `test-app` project folder, we don't have it. This is because we
-haven't installed anything yet. To install dependencies, use `poetry install`:
-
-```text
-Updating dependencies
-Resolving dependencies... (2.6s)
-
-Writing lock file
-
-Package operations: 10 installs, 0 updates, 0 removals
-
- • Installing typing-extensions (4.4.0)
- • Installing zipp (3.11.0)
- • Installing importlib-metadata (5.1.0)
- • Installing attrs (22.1.0)
- • Installing more-itertools (9.0.0)
- • Installing packaging (22.0)
- • Installing pluggy (0.13.1)
- • Installing py (1.11.0)
- • Installing wcwidth (0.2.5)
- • Installing pytest (5.4.3)
-
-Installing the current project: test-app (0.1.0)
-```
-
-In this case it installs `pytest` and its dependencies, and creates a `poetry.lock`
-file. At the bottom of the `poetry.lock` you'll find SHA hashes of the package
-wheels, which allow exact recreation of a virtual environment across multiple
-machines, from development to production.
-
-Use `poetry add <package>` to add a new package dependency, for example
-`poetry add requests`:
-
-```text
-Using version ^2.28.1 for requests
-
-Updating dependencies
-Resolving dependencies... (5.1s)
-
-Writing lock file
-
-Package operations: 5 installs, 0 updates, 0 removals
-
- • Installing certifi (2022.12.7)
- • Installing charset-normalizer (2.1.1)
- • Installing idna (3.4)
- • Installing urllib3 (1.26.13)
- • Installing requests (2.28.1)
-```
-
-This will update `poetry.lock` but also `pyproject.toml`, which now
-specifies `requests` as a production dependency:
-
-```toml
-[tool.poetry.dependencies]
-python = "^3.7"
-requests = "^2.28.1"
-```
-
-If we want a development dependency, use `poetry add <package> --dev`; this
-will add the package to the `[tool.poetry.dev-dependencies]` section.
-
-### Scripts
-
-Poetry offers a way to run scripts using `poetry run <script>`. This is
-especially useful for CICD pipelines where we can define scripts for various
-things, such as tests, linters, building, deploying, etc.
-
-#### Simple Function
-
-The simplest way is to create a folder `scripts` that acts as a module of
-functions, for example `build.py` which has `build_documentation` function.
-Then in `pyproject.toml` create the `[tool.poetry.scripts]` section:
-
-```toml
-[tool.poetry.scripts]
-build-documentation = "scripts.build:build_documentation"
-```
-
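-For completeness, here is a minimal sketch of what `scripts/build.py` could
-contain; the Sphinx command is just a hypothetical example of a documentation
-build:
-
-```python
-import subprocess
-
-def build_documentation():
-    # Hypothetical example: build the project docs with Sphinx.
-    subprocess.run(["sphinx-build", "docs", "docs/_build"], check=True)
-```
-
-With this in place, `poetry run build-documentation` executes the function
-inside the project's virtual environment.
-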
-#### Click
-
-Personally, I like the setup with
-[click](https://click.palletsprojects.com/en/8.1.x/){:target="_blank"}.
-It offers elegant solutions for building CLI tools.
-
-Add it as a development dependency: `poetry add click --dev`, and create
-the following structure under `scripts` folder:
-
-```text
-scripts/
-├─ __init__.py
-├─ cli.py
-├─ build.py
-├─ linters.py
-├─ tests.py
-└─ deploy.py
-```
-
-Where `cli.py` looks, for example, like this:
-
-```python
-import click
-from scripts.build import build_documentation, build_package, build_wheel
-from scripts.deploy import deploy_to_aws, deploy_to_azure
-from scripts.tests import test_integration, test_package
-from scripts.linters import run_pylint, run_mypy
-
-@click.group()
-def build():
-    pass
-
-@click.group()
-def tests():
-    pass
-
-@click.group()
-def linters():
-    pass
-
-@click.group()
-def deploy():
-    pass
-
-@build.command()
-def package():
-    build_package()
-
-@build.command()
-def wheel():
-    build_wheel()
-
-@build.command()
-def documentation():
-    build_documentation()
-
-@deploy.command()
-def aws():
-    deploy_to_aws()
-
-# ...
-```
-
-Then in `pyproject.toml` specify the scripts:
-
-```toml
-[tool.poetry.scripts]
-build = "scripts.cli:build"
-tests = "scripts.cli:tests"
-deploy = "scripts.cli:deploy"
-linters = "scripts.cli:linters"
-```
-
-This allows you to use all the nice CLI features that `click` provides, both
-in local development and CICD scripts.
-
-## Pytest
-
-[Pytest](https://docs.pytest.org/en/7.2.x/){:target="_blank"} is a framework
-for writing Python tests. It has a lot of features that allow us to write
-complex and scalable functional tests for applications and libraries.
-Additionally, pytest supports third-party plugins that offer additional
-functionality -
-[How to install and use plugins](https://docs.pytest.org/en/7.2.x/how-to/plugins.html){:target="_blank"}.
-
-Pytest's documentation and how-to guides are extensive and well-written,
-especially when it comes to fixtures. If you are not familiar with fixtures
-and how to use them, please go through
-[How to use fixtures](https://docs.pytest.org/en/7.2.x/how-to/fixtures.html){:target="_blank"}.
-
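-As a tiny illustration of fixtures (the test data here is made up), pytest
-injects a fixture into a test by matching the argument name:
-
-```python
-import pytest
-
-@pytest.fixture
-def sample_user():
-    # Fixtures prepare reusable test data or resources.
-    return {"name": "Ada", "role": "admin"}
-
-def test_user_is_admin(sample_user):
-    # Pytest calls sample_user() and passes its return value here.
-    assert sample_user["role"] == "admin"
-```
-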
-There are two plugins that I consider essential:
-
-- [pytest-cov](https://github.com/pytest-dev/pytest-cov){:target="_blank"} -
-Takes care of coverage reports. Basically, it checks how well an application
-is covered with tests. It is especially valuable in CICD pipelines. If a
-developer pushes new code, you want to make sure that each new line is covered
-with tests. So if coverage is not 100%, the pipeline should fail with a detailed
-report.
-- [pytest-xdist](https://github.com/pytest-dev/pytest-xdist){:target="_blank"} -
-Takes care of parallel testing. As an application grows, the number of tests and
-the time it takes to run them increase significantly. The performance of test
-suites can directly impact development cost and time. For example, if you
-are running CICD pipelines on GitLab using EC2 or Fargate instances on AWS,
-you are essentially paying for X number of CPUs but running tests on only one.
-This plugin extends pytest with additional modes that distribute
-tests across multiple CPUs to speed up test execution. Additionally,
-`pytest-cov` also works with `pytest-xdist`.
-
-## Pylint
-
-Another tool that's essential for Python projects is
-[pylint](https://pylint.pycqa.org/en/latest/){:target="_blank"}. It's a
-library that provides a Python linter.
-
-> If you are not sure what a linter is, please see
-> [What is a linter and why your team should use it?](https://sourcelevel.io/blog/what-is-a-linter-and-why-your-team-should-use-it){:target="_blank"}
-
-Pylint runs through the source code without executing it (static analysis) and
-checks for potential bugs, performance issues, coding style misalignments,
-poorly designed code,
-[code smells](https://en.wikipedia.org/wiki/Code_smell){:target="_blank"}, etc.
-
-It supports extensions, plugins, and extensive configuration, which you
-can use to enforce some of the project's requirements and coding styles.
-
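-As a small illustration, a snippet like the following would trigger several
-typical pylint messages:
-
-```python
-import os  # flagged as unused-import; the module also lacks a docstring
-
-def AddItem(item, bucket=[]):  # invalid-name and dangerous-default-value
-    bucket.append(item)
-    return bucket
-```
-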
-## MyPy
-
-[MyPy](https://mypy-lang.org/){:target="_blank"} is another tool that checks
-your code for potential bugs. It is an optional static type checker for Python.
-
-Using it, we can combine the benefits of both dynamic ("duck") typing and
-static typing. For example, the following code (an example from MyPy's documentation):
-
-```python
-def fib(n):
- a, b = 0, 1
- while a < n:
- yield a
- a, b = b, a+b
-```
-
-can be statically typed as:
-
-```python
-from typing import Iterator
-
-def fib(n: int) -> Iterator[int]:
- a, b = 0, 1
- while a < n:
- yield a
- a, b = b, a+b
-```
-
-Apart from bug checking, this tool can greatly improve the maintainability of the
-project and serve as machine-checked documentation. If you are just
-starting out with MyPy, check this awesome guide:
-[The Comprehensive Guide to mypy](https://dev.to/tusharsadhwani/the-comprehensive-guide-to-mypy-561m){:target="_blank"}.
\ No newline at end of file
diff --git a/_posts/2023-05-20-numerical-calculations-automation.md b/_posts/2023-05-20-numerical-calculations-automation.md
deleted file mode 100644
index 90a32b7..0000000
--- a/_posts/2023-05-20-numerical-calculations-automation.md
+++ /dev/null
@@ -1,311 +0,0 @@
----
-title: "Personal Project - Automating numerical calculations and implementing ML models"
-page_title: "Personal Project - Automating numerical calculations"
-excerpt: "The objective of this project is to develop a system enabling
-scientists to automate numerical calculations on remote clusters and build
-an internal database of calculation outcomes. It also involves training
-machine learning models on these calculations and seamlessly integrating them
-for numerical predictions."
-date: May 20, 2023
-toc: true
-toc_label: "Content"
-toc_sticky: true
-last_modified_at: May 20, 2023
-og_image: /assets/images/posts/numerical-calculations-automation/header.jpg
----
-
-{% include image.html
- src="/assets/images/posts/numerical-calculations-automation/header.jpg"
- alt="numerical-calculations-automation"
- caption="Image Source: Pexels"
-%}
-
-## Introduction
-
-As technology advances, most scientists are looking for ways to use modern solutions
-in their everyday work. The web is filled with different kinds of tools that can boost
-the productivity of the whole team and speed up the research process. Usually those
-tools are highly specialized for a particular field of research and require some
-technical knowledge to set up.
-
-If the research team is heavily invested in numerical analysis and long-running
-calculations, that means setting up a specialized cluster, compiling all the
-necessary libraries, and handling updates - which requires some experience[^1].
-
-Once everything is up and running, most of the time goes into:
-- configuring calculations
-- copying required files (especially between multiple calculations)
-- running calculations
-- checking if everything is going well and monitoring resources
-- reviewing output data
-- confirming that the calculation converged and the output makes sense
-- sharing results with the team and comparing it between different calculations etc.
-
-Some of this work can be automated using custom scripts, but that's not a
-solution that provides a high level of automation and frees up scientists' time
-to focus on the important parts of their research.
-
-Another important topic is the ML hype train, which is taking research papers by
-storm. Most research teams are already working on incorporating ML models into
-their work, but dealing with the tech required for training and running models can
-be daunting at first. Moreover, experimenting with ML models and training requires
-easy access to well-structured data, not to mention the resources for managing and
-running models for inference.
-
-Realizing that all of this can be a huge burden and a waste of time for researchers,
-I've decided to create a personal project in collaboration with a group
-from the Institute for Multidisciplinary Research in Belgrade. The goal is to
-create a system that allows scientists to automate numerical calculations and
-easily integrate ML models for numerical predictions.
-
-This post gives an overview of the system, its features, and how it works. The
-source code is not available for now. The idea is still in progress and the
-architecture presented here may change significantly.
-
-If you have any additional questions and suggestions, please don't hesitate
-to reach out.
-
-[^1]: Unless they are paying for computation time on some pre-configured cloud cluster, but that is expensive in most cases.
-
-## Features
-
-This system allows scientists to use a web application to:
-- Run calculations
-- Monitor cluster resources
-- Check calculation configurations and logs
-- Review and download analysis reports
-- Start and stop ML models
-- Use models for predictions etc.
-
-The web application acts as an admin dashboard where a user can create and edit
-different components. However, it has a hierarchy of permissions: user, moderator,
-and admin. The main difference is, of course, the level of access to certain parts
-of the dashboard; for example, only admins can manage users.
-
-Another important feature is notifications. Instead of constantly checking
-what's going on with the calculation process, scientists can receive email
-notifications informing them that a calculation was successful and an analysis
-report is ready, or that something went wrong and an error was raised.
-
-All the calculation data and analysis reports are stored in a database. This
-gives a good overview and provides access to well-structured data for further
-research, especially for training models.
-
-Training of models is done outside of the system, since that process highly
-depends on the use case, requirements, and experimentation. However, once the
-team is satisfied with a model, they can "plug it in", and use it
-within the system by issuing commands through the web application.
-
-## Architecture
-
-The system consists of the following components:
-- Numerical Software
-- Cluster
-- Service
-- API
-- Database
-- Web Application
-- ML-API
-- Models
-
-
-{% include image.html
- src="/assets/images/posts/numerical-calculations-automation/architecture.png"
- alt="numeric-calculations-automation-architecture"
- caption="Architecture"
-%}
-
-### Service
-
-Written in Python, runs on a cluster as a background service, communicates
-with the API, and takes care of the following:
-- **Cluster heartbeat** - Sends information about the cluster's state every X
-seconds (CPU, memory ...). This data is then graphed in the web application for
-a particular calculation or cluster (see the sketch after this list)
-- **Sets up the folder structure** - Manages folder structure for different
-calculations and copies required files and configurations. If you are familiar
-with
-[SIESTA](https://en.wikipedia.org/wiki/SIESTA_(computer_program)){:target="_blank"},
-think of .fdf, .psf files etc
-- **Runs a calculation** - Checks the API to see if someone created a calculation
-and starts the calculation process
-- **Runs multiple calculations** - Sometimes a calculation run consists of multiple
-calculations with slightly different configurations, for example the same molecule
-but a different voltage or rotation. In this case, the service will create multiple
-folders for the different calculations and run them in a specific order while
-copying required files between them (since one calculation depends on the
-result of another)
-- **Calculation state** - Keeps track of the running calculation and informs the
-API of what's happening. These state updates are used for email notifications
-- **Validates results** - Getting output results doesn't mean that the
-calculation was successful. Sometimes the process went fine but the calculation
-didn't converge. The service validates results to prevent cases where further
-calculations work with "invalid" results
-- **Extracts data for analysis** - Most of the time the calculation output is
-quite large, on the order of gigabytes. Transferring this amount of data is not
-efficient and actually not needed, since research teams are usually interested
-in a small subset of those results. Therefore, the service can extract and prepare
-data for further analysis. This data then gets stored in a database and is
-used for graphs on the frontend side
-- **Deletes raw files** - Since the output data is quite large, storage needs to be
-closely monitored. Keeping raw data files for old calculations doesn't make sense
-in many cases. Therefore, the service can also clean up the calculations folder,
-keeping the extracted data but removing the large raw data files
-
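-As an illustration of the heartbeat idea, here's a minimal sketch assuming the
-`psutil` and `requests` libraries and a hypothetical API endpoint and key; the
-real service is, of course, more involved:
-
-```python
-import time
-
-import psutil
-import requests
-
-API_URL = "https://example.com/api/clusters/heartbeat"  # hypothetical endpoint
-API_KEY = "changeme"                                     # hypothetical key
-
-def send_heartbeats(interval_seconds: int = 30) -> None:
-    while True:
-        payload = {
-            "cpu_percent": psutil.cpu_percent(interval=1),
-            "memory_percent": psutil.virtual_memory().percent,
-        }
-        requests.post(API_URL, json=payload,
-                      headers={"x-api-key": API_KEY}, timeout=10)
-        time.sleep(interval_seconds)
-```
-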
-Some of the things mentioned here are very specific and highly dependent on a
-particular use case. The service currently runs SIESTA and performs a set
-of analyses defined by the research team I'm collaborating with. However, it can
-be easily extended to support different numerical software and data preparation
-steps.
-
-### API
-
-Written in Node.js using
-[express.js](https://expressjs.com/){:target="_blank"}
-framework, takes care of the following:
-- Receives all kinds of updates from clusters
-- Stores/Retrieves data in/from the database
-- Manages authentication and authorization for the web application
-- Integrates with third party systems for email notifications
-- Gets model heartbeat from the ML-API
-
-### Web Application
-
-Written in React, serves as a control dashboard, contains the following
-sections:
-- Dashboard - Overview of clusters' state, running calculations...
-- Analyses - Calculation reports. Includes graphs, tables...
-- Calculations - List of calculations, details, configurations, state...
-- Molecules - This is specific for the team that I'm working with, but we have
-models of molecules which are connected to calculations. So the team can easily
-see all the calculations for a particular model and compare results.
-- Models - List of ML models that are present in the system and their
-configuration, state, etc... Basically, panel for managing and interacting with
-models.
-- Clusters - Panel for managing clusters
-- Logs - Live feed of service logs
-- Users - Panel for managing users
-
-The idea behind this web application is to give a user-friendly overview of
-the whole system where scientists can easily review calculations, set
-configurations, compare results, download reports, manage models,
-and use them for predictions.
-
-The aforementioned permissions limit the level of access, for example:
-- Only admins can delete logs, manage multiple cluster configurations etc
-- Moderators can change users' passwords and block them, but they cannot
-create or delete them
-- Users can view/create/edit everything related to calculations and analyses
-
-### ML-API
-
-If you take a closer look at the [architecture](#architecture-graph) you'll
-see that the web application communicates with two APIs. One of them is the
-ML-API. This API is written in Python using the
-[FastAPI](https://fastapi.tiangolo.com/){:target="_blank"}
-framework, and it takes care of the following:
-- **Manages ML models** - Scientists can run, stop, or delete models
-- **Interacts with docker engine** - Models are implemented as docker images and
-this API controls them via docker engine
-- **Model heartbeat** - Checks the state of running models and sends it to API
-- **Proxies inference** - Inference calls from a web application are proxied to
-an appropriate docker container
-
-## Models
-
-The experimental phase of training and validating models is done outside of the
-system. Once the team is satisfied with the results, they can package the model
-in a docker image and "plug it in".
-
-The docker image has to satisfy certain requirements. It has to run a web server
-with the following endpoints:
-- `/ping` - GET - Responds with 200 if model is loaded and everything is fine
-- `/invocations` - POST - Receives inference parameters and returns prediction
-data
-
-The `/ping` endpoint is required for the heartbeat. The ML-API periodically checks
-running containers and requires a 200 status; otherwise it will restart the
-container.
-
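-A minimal sketch of such a model server, assuming FastAPI and a hypothetical
-`predict` function standing in for the packaged model:
-
-```python
-from fastapi import FastAPI
-
-app = FastAPI()
-
-def predict(params: dict) -> float:
-    # Placeholder for the real model; a packaged image would load weights here.
-    return 0.0
-
-@app.get("/ping")
-def ping():
-    # A 200 response tells the ML-API heartbeat that the model is loaded and healthy.
-    return {"status": "ok"}
-
-@app.post("/invocations")
-def invocations(payload: dict):
-    # The payload carries the inference parameters proxied by the ML-API.
-    return {"prediction": predict(payload)}
-```
-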
-If you've ever built custom BYOC models for AWS Sagemaker, you'll recognize
-this pattern, meaning you can run the same model image in this system and in others.
-
-> Check
-> [SageMaker Serverless Inference using BYOC](https://www.vladsiv.com/sagemaker-serverless-inference-byoc/){:target="_blank"}
-> for more information
-
-## How it works
-
-### Running Calculations
-
-Scientists can create a new calculation in the web application by providing a
-name, description, and configuration, and by choosing a cluster and a set of
-analyses to be performed. This will then be stored in the database with a
-`processed=False` field.
-
-Services running on multiple clusters will query the API asking for
-calculations that have `processed=False&cluster=XYZ`, start the
-process based on the configuration, and mark it as `running`.
-
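-A minimal sketch of that polling step, assuming the `requests` library and a
-hypothetical endpoint and API key:
-
-```python
-import requests
-
-API_URL = "https://example.com/api/calculations"  # hypothetical endpoint
-API_KEY = "changeme"                              # hypothetical key
-
-def fetch_pending_calculations(cluster: str) -> list:
-    # Ask the API for calculations on this cluster that haven't been processed yet.
-    response = requests.get(
-        API_URL,
-        params={"processed": "false", "cluster": cluster},
-        headers={"x-api-key": API_KEY},
-        timeout=10,
-    )
-    response.raise_for_status()
-    return response.json()
-```
-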
-This process is very basic for now, but the plan is to extend it with:
-- `approved` - A calculation needs to be approved by a moderator or admin before
-the service can pull it. This would be valuable for larger teams with less
-experienced researchers
-- `priority` - A basic mechanism to push one calculation ahead of
-another
-
-### Authentication and Authorization
-
-Authentication and authorization are done using JSON Web Tokens (JWTs). When a
-user logs in, they get a refresh token via a cookie and an access token which
-the application stores in memory. Every X minutes the refresh token is used
-to refresh the access token. This is a standard way to manage JWTs, and admins
-can easily disable refresh tokens if they ever have to.
-
-Another benefit of using JWTs is managing authentication across multiple APIs.
-For example, the ML-API just checks whether a JWT is provided and valid; it
-doesn't have to issue or manage tokens.
-
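-As an illustration, token validation on the ML-API side could be as simple as the
-following sketch, assuming the PyJWT library and a hypothetical shared secret:
-
-```python
-import jwt  # PyJWT
-
-SECRET = "changeme"  # hypothetical shared signing secret
-
-def verify_access_token(token: str) -> dict:
-    # The ML-API only validates tokens; it never issues or refreshes them.
-    # An invalid or expired token raises jwt.InvalidTokenError.
-    return jwt.decode(token, SECRET, algorithms=["HS256"])
-```
-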
-However, communication between the cluster services and the API is done using an
-API key. The keys can be easily changed in the web application and set in
-configuration files on the clusters' side. The same goes for the ML-API.
-
-### Emails
-
-The service constantly updates the API and, for certain types of events, the API
-will send an email based on the user's configuration. Users can choose an
-INFO or ERROR level for emails. Besides the user who created a calculation, all
-the admins will receive emails as well.
-
-Emails are sent using third party email delivery platforms such as
-[Mailtrap](https://mailtrap.io/){:target="_blank"} or
-[SendGrid](https://sendgrid.com/){:target="_blank"}.
-
-### Analysis
-
-Once the calculation is done, the service will perform a set of analyses on the
-extracted data and send them to the API. This data is then fetched by the
-frontend application and used to create graphs, tables, charts, etc.
-
-Currently, the research team I'm collaborating with already has a well-defined
-set of data analyses they perform each time, and the process is straightforward,
-but that won't be the case for all teams. In such situations, the service can
-easily be extended and the web application adjusted to support checkboxes
-or other selectors.
-
-However, there are cases where `analysis_2` depends on the results of
-`analysis_1` and careful ordering of actions is required. The web application
-should make sure that the set of analyses is valid, i.e. `analysis_2` cannot be
-selected without `analysis_1`, and so on.
-
-### Multiple clusters
-
-Some research groups have access to multiple clusters or machines. In that case
-they can configure the system to work with all of them:
-- Add cluster to the system through the web application
-- Generate API keys
-- Install and configure the service on clusters
-- Check the logs in the UI to see if the service is working and the connection is established
-
-Having multiple clusters is beneficial when it comes to updates. Numerical
-software usually needs to be recompiled with updated libraries and that takes
-time. In such situations, the team can disable the service during the update and
-enable it once the cluster is ready.